Data Store Schema

Collection

The data store uses a dedicated collection/database:

| Collection Name | Purpose | Primary Key |
|---|---|---|
| agent_data_store | Agent-accessible key-value storage | _id (composite) |

Document ID Structure

Document IDs are constructed as:

{user_id}:{namespace}:{base64_encoded_key}

Components:

  • user_id: The owner's user ID (e.g., user_google_123456789)
  • namespace: The logical namespace (e.g., default, files:my-repo)
  • base64_encoded_key: URL-safe base64 encoding of the original key

Example:

user_google_123456789:files:my-repo:c3JjL21haW4ucHk=
                                    ^-- base64("src/main.py")

The base64 encoding ensures keys with special characters (slashes, colons, etc.) are safe for document IDs.
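A minimal sketch of how such an ID could be built and split apart again, assuming UTF-8 keys and user IDs that never contain a colon. The helper names build_doc_id and parse_doc_id are hypothetical and not part of the data store API. Because a namespace such as files:my-repo may itself contain colons, the parser splits on the first and the last colon rather than on every colon.

```python
import base64


def build_doc_id(user_id: str, namespace: str, key: str) -> str:
    """Build a composite document ID (hypothetical helper, UTF-8 keys assumed)."""
    encoded_key = base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")
    return f"{user_id}:{namespace}:{encoded_key}"


def parse_doc_id(doc_id: str) -> tuple[str, str, str]:
    """Split a composite ID into (user_id, namespace, key).

    Assumes user_id contains no colon. URL-safe base64 never contains a colon,
    so the last colon always separates the namespace from the encoded key.
    """
    user_id, _, rest = doc_id.partition(":")
    namespace, _, encoded_key = rest.rpartition(":")
    key = base64.urlsafe_b64decode(encoded_key).decode("utf-8")
    return user_id, namespace, key


doc_id = build_doc_id("user_google_123456789", "files:my-repo", "src/main.py")
print(doc_id)
print(parse_doc_id(doc_id))
```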

Document Schema

Data Store Entry

{
  "_id": "user_123:default:bXkta2V5",  # Composite ID
  "_rev": "1-abc123def456",  # CouchDB revision (if using CouchDB)

  # Identity fields
  "userId": "user_123",  # Owner's user ID
  "namespace": "default",  # Namespace name
  "key": "my-key",  # Original key (decoded)

  # Data fields
  "value": {  # Any JSON-serializable data
    "result": "analysis complete",
    "score": 95,
    "items": ["a", "b", "c"]
  },
  "metadata": {  # Optional user-provided metadata
    "version": "1.0",
    "source": "analyzer-agent"
  },

  # Agent tracking
  "createdByAgent": "data-processor",  # Agent that created this entry
  "lastAccessedByAgent": "report-generator",  # Last agent to read/write
  "accessCount": 5,  # Total number of accesses

  # Timestamps (ISO 8601 format)
  "createdAt": "2026-02-01T10:30:00Z",
  "updatedAt": "2026-02-05T14:22:00Z",
  "lastAccessedAt": "2026-02-05T14:22:00Z"
}
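
The agent tracking and timestamp fields change as agents use an entry. The following is a minimal sketch of one way such bookkeeping could work, assuming every read or write increments accessCount by one and that only writes touch updatedAt. The function record_access is hypothetical, and the real service may behave differently.

```python
from datetime import datetime, timezone


def record_access(doc: dict, agent_id: str, is_write: bool = False) -> dict:
    """Return a copy of doc with the tracking fields updated (illustrative only)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    updated = dict(doc)
    updated["lastAccessedByAgent"] = agent_id
    updated["lastAccessedAt"] = now
    # Assumption: reads and writes both count as one access.
    updated["accessCount"] = doc.get("accessCount", 0) + 1
    if is_write:
        updated["updatedAt"] = now
    return updated


entry = {"key": "my-key", "accessCount": 5, "lastAccessedByAgent": "report-generator"}
after_read = record_access(entry, "code-searcher")
print(after_read["accessCount"], after_read["lastAccessedByAgent"])
```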

Field Descriptions

| Field | Type | Required | Description |
|---|---|---|---|
| _id | string | Yes | Composite document ID |
| userId | string | Yes | Owner's user ID |
| namespace | string | Yes | Namespace (default: "default") |
| key | string | Yes | Original key name |
| value | any | Yes | Stored data (JSON-serializable) |
| metadata | object | No | User-provided metadata |
| createdByAgent | string | No | Agent that created this entry |
| lastAccessedByAgent | string | No | Last agent to access this entry |
| accessCount | integer | No | Number of times accessed |
| createdAt | string | Yes | ISO 8601 creation timestamp |
| updatedAt | string | Yes | ISO 8601 last update timestamp |
| lastAccessedAt | string | No | ISO 8601 last access timestamp |
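
The field requirements listed above can be checked mechanically. The sketch below is one possible validator; validate_entry and the two lookup tables are hypothetical names introduced for illustration, and the type mapping (string to str, object to dict, integer to int) is an assumption about how the schema maps to Python.

```python
REQUIRED_FIELDS = {
    "_id": str,
    "userId": str,
    "namespace": str,
    "key": str,
    "createdAt": str,
    "updatedAt": str,
}
OPTIONAL_FIELDS = {
    "metadata": dict,
    "createdByAgent": str,
    "lastAccessedByAgent": str,
    "accessCount": int,
    "lastAccessedAt": str,
}


def validate_entry(doc: dict) -> list[str]:
    """Return a list of problems found; an empty list means the entry looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in doc:
            problems.append(f"missing required field: {name}")
        elif not isinstance(doc[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # "value" is required but may hold any JSON type, including null.
    if "value" not in doc:
        problems.append("missing required field: value")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in doc and not isinstance(doc[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


greeting = {
    "_id": "user_123:default:Z3JlZXRpbmc=",
    "userId": "user_123",
    "namespace": "default",
    "key": "greeting",
    "value": "Hello, World!",
    "createdAt": "2026-02-05T10:00:00Z",
    "updatedAt": "2026-02-05T10:00:00Z",
}
print(validate_entry(greeting))
```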

Value Field

The value field accepts any JSON-serializable data:

# Primitives
"value": "string value"
"value": 42
"value": 3.14
"value": true
"value": null

# Arrays
"value": [1, 2, 3, "mixed", {"nested": "object"}]

# Objects
"value": {
  "nested": {
    "deeply": {
      "data": [1, 2, 3]
    }
  }
}

Limitations:

  • Maximum recommended size: 1 MB
  • No binary data (use base64 encoding if needed)
  • No circular references
  • No custom class instances (use dicts)
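
These limitations can be checked before writing. The sketch below relies on the standard json module, which raises TypeError for bytes and custom class instances and ValueError for circular references. The helper check_value is hypothetical, the 10**6 byte threshold is one reading of the recommended size, and treating a recommendation as a hard error is a choice made only for this sketch.

```python
import base64
import json

MAX_RECOMMENDED_BYTES = 1_000_000  # assumption: 1 MB taken as 10**6 bytes


def check_value(value) -> int:
    """Serialize value to JSON and return its encoded size in bytes.

    Raises TypeError for bytes or custom class instances, and ValueError for
    circular references, NaN/Infinity, or values above the recommended size.
    """
    # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
    encoded = json.dumps(value, allow_nan=False).encode("utf-8")
    if len(encoded) > MAX_RECOMMENDED_BYTES:
        raise ValueError(f"value is {len(encoded)} bytes, above the recommended maximum")
    return len(encoded)


print(check_value({"nested": {"deeply": {"data": [1, 2, 3]}}}))

# Binary data has to be base64-encoded into a string first.
blob = base64.b64encode(b"\x00\x01binary").decode("ascii")
print(check_value(blob))
```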

Metadata Field

The metadata field stores user-provided context:

"metadata": {
  "version": "1.0",
  "format": "json",
  "source_url": "https://api.example.com/data",
  "processed_by": ["agent-a", "agent-b"],
  "tags": ["important", "quarterly"],
  "custom_field": "any_value"
}

Metadata is merged on updates: new fields are added, and existing fields are overwritten with the new values:

# Original metadata
{"version": "1.0", "author": "alice"}

# Update with
{"version": "2.0", "reviewer": "bob"}

# Result
{"version": "2.0", "author": "alice", "reviewer": "bob"}
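
The merge shown above behaves like a dictionary update. A minimal sketch follows, assuming a shallow merge in which nested objects are replaced wholesale rather than merged recursively; the source does not say how nested metadata is handled, and merge_metadata is a hypothetical name.

```python
from typing import Optional


def merge_metadata(existing: Optional[dict], update: Optional[dict]) -> dict:
    """Shallow merge: keys in update win, all other existing keys are kept."""
    return {**(existing or {}), **(update or {})}


original = {"version": "1.0", "author": "alice"}
merged = merge_metadata(original, {"version": "2.0", "reviewer": "bob"})
print(merged)
```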

Indexes

For optimal query performance, the data store creates indexes automatically. On CouchDB, a Mango index is created at service startup via _ensure_standard_indexes() and ensured again on the first write to each new namespace via _ensure_namespace_indexed(). Both calls are idempotent.

CouchDB Mango Index

This index is created automatically, so no manual setup is required:

{
  "index": { "fields": ["userId", "namespace"] },
  "name": "idx-user-namespace",
  "type": "json"
}

This index covers list_keys(), list_namespaces(), and get_all(). All three use find() with a Mango selector instead of scanning _all_docs.

You can verify the index exists:

curl -s http://admin:password@localhost:5984/agent_data_store/_index | jq

If you need to recreate it manually (e.g. after a database restore):

curl -X POST http://admin:password@localhost:5984/agent_data_store/_index \
  -H "Content-Type: application/json" \
  -d '{
    "index": { "fields": ["userId", "namespace"] },
    "name": "idx-user-namespace",
    "type": "json"
  }'
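
Selector queries that benefit from this index can also be issued directly against CouchDB's _find endpoint. The sketch below builds such a request with the standard library only, assuming the same host and credentials as the curl commands above; build_find_request is a hypothetical helper and not part of the data store. Note that a raw _find call returns at most 25 documents unless a limit is given, so the sketch sets one explicitly.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_find_request(base_url: str, username: str, password: str,
                       user_id: str, namespace: Optional[str] = None,
                       fields: Optional[list] = None,
                       limit: int = 1000) -> urllib.request.Request:
    """Build (but do not send) a POST to CouchDB's /{db}/_find endpoint."""
    selector = {"userId": user_id}
    if namespace is not None:
        selector["namespace"] = namespace
    body = {"selector": selector, "limit": limit}
    if fields:
        body["fields"] = fields
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/agent_data_store/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_find_request("http://localhost:5984", "admin", "password",
                             "user_123", namespace="default", fields=["key"])
print(request.full_url)
# Sending it requires a running CouchDB instance:
# with urllib.request.urlopen(request) as response:
#     keys = [row["key"] for row in json.load(response)["docs"]]
```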

DynamoDB

| Index Type | Partition Key | Sort Key | Purpose |
|---|---|---|---|
| Primary | _id | - | Direct document access |
| GSI | userId | namespace | List keys in namespace |

Firestore

Collection: agent_data_store
Document ID: {composite_id}

Composite Index:
- userId (Ascending)
- namespace (Ascending)

Query Patterns

All queries below use the idx-user-namespace Mango index (on CouchDB) or equivalent backend mechanism.

List Keys in Namespace

# list_keys(user_id, namespace): uses find() with field projection
db.find("agent_data_store",
        selector={"userId": user_id, "namespace": namespace},
        fields=["key"])

List Namespaces

# list_namespaces(user_id): uses find() with field projection
db.find("agent_data_store",
        selector={"userId": user_id},
        fields=["namespace"])
# Deduplicated in Python to return unique namespace names
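
The deduplication step could look like the following sketch. It assumes the projected rows arrive as dictionaries with a single namespace field; unique_namespaces is a hypothetical name, and sorting the result is a choice made here for stable output rather than documented behavior.

```python
def unique_namespaces(rows: list[dict]) -> list[str]:
    """Collapse projected rows such as {"namespace": "default"} into unique names."""
    return sorted({row["namespace"] for row in rows})


rows = [
    {"namespace": "default"},
    {"namespace": "files:my-repo"},
    {"namespace": "default"},
    {"namespace": "cache:github"},
]
print(unique_namespaces(rows))
```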

Get All (Bulk Retrieve)

# get_all(user_id, namespace): single indexed query, no projection
db.find("agent_data_store",
        selector={"userId": user_id, "namespace": namespace})
# Returns full documents; mapped to {key: value} dict in Python
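
The mapping step mentioned in the comment above can be sketched as a dictionary comprehension. This assumes each returned document carries the key and value fields from the schema; to_key_value_map is a hypothetical name.

```python
def to_key_value_map(docs: list[dict]) -> dict:
    """Map full documents to {original key: stored value}."""
    return {doc["key"]: doc["value"] for doc in docs}


docs = [
    {"_id": "user_123:default:Z3JlZXRpbmc=", "key": "greeting", "value": "Hello, World!"},
    {"_id": "user_123:default:bXkta2V5", "key": "my-key", "value": {"score": 95}},
]
print(to_key_value_map(docs))
```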

Get Document

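# get(user_id, namespace, key): direct ID lookup, no index needed
# URL-safe base64 as described under Document ID Structure; UTF-8 keys are an assumption
import base64
encoded_key = base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")
doc_id = f"{user_id}:{namespace}:{encoded_key}"
db.get("agent_data_store", doc_id)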

Migration Considerations

When migrating data between database backends:

  1. Document IDs: Preserve the composite ID format
  2. Timestamps: Ensure ISO 8601 format consistency
  3. Metadata merging: Verify that the target backend's merge logic matches the source's
  4. Access tracking: accessCount may need recalculation
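
The first two points can be verified per document after a migration. The sketch below recomputes the composite ID from the identity fields and checks that each timestamp parses as ISO 8601. It assumes UTF-8 keys and URL-safe base64 with padding, as in the examples in this document; check_migrated_doc is a hypothetical helper.

```python
import base64
from datetime import datetime


def check_migrated_doc(doc: dict) -> list[str]:
    """Return a list of problems with a migrated document (empty if none)."""
    problems = []
    encoded_key = base64.urlsafe_b64encode(doc["key"].encode("utf-8")).decode("ascii")
    expected_id = f'{doc["userId"]}:{doc["namespace"]}:{encoded_key}'
    if doc["_id"] != expected_id:
        problems.append(f"_id mismatch: expected {expected_id}")
    for field in ("createdAt", "updatedAt", "lastAccessedAt"):
        if field in doc:
            try:
                # fromisoformat() accepts a trailing "Z" only from Python 3.11,
                # so it is normalized to an explicit UTC offset first.
                datetime.fromisoformat(doc[field].replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{field} is not ISO 8601: {doc[field]!r}")
    return problems


cached = {
    "_id": "user_123:cache:github:cmVwb3M=",
    "userId": "user_123",
    "namespace": "cache:github",
    "key": "repos",
    "createdAt": "2026-02-05T12:00:00Z",
    "updatedAt": "2026-02-05T12:00:00Z",
}
print(check_migrated_doc(cached))
```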

Example Documents

Simple String Value

{
  "_id": "user_123:default:Z3JlZXRpbmc=",
  "userId": "user_123",
  "namespace": "default",
  "key": "greeting",
  "value": "Hello, World!",
  "createdByAgent": "hello-agent",
  "accessCount": 1,
  "createdAt": "2026-02-05T10:00:00Z",
  "updatedAt": "2026-02-05T10:00:00Z"
}

File Analysis Result

{
  "_id": "user_123:files:my-repo:c3JjL21haW4ucHk=",
  "userId": "user_123",
  "namespace": "files:my-repo",
  "key": "src/main.py",
  "value": {
    "content": "def main():\n print('Hello')\n",
    "lines": 2,
    "language": "python",
    "functions": ["main"]
  },
  "metadata": {
    "indexed_at": "2026-02-05T09:00:00Z",
    "file_size": 42,
    "sha256": "abc123..."
  },
  "createdByAgent": "repo-indexer",
  "lastAccessedByAgent": "code-searcher",
  "accessCount": 15,
  "createdAt": "2026-02-05T09:00:00Z",
  "updatedAt": "2026-02-05T09:00:00Z",
  "lastAccessedAt": "2026-02-05T14:30:00Z"
}

Cached API Response

{
  "_id": "user_123:cache:github:cmVwb3M=",
  "userId": "user_123",
  "namespace": "cache:github",
  "key": "repos",
  "value": [
    {"name": "repo-a", "stars": 100},
    {"name": "repo-b", "stars": 50}
  ],
  "metadata": {
    "cached_at": "2026-02-05T12:00:00Z",
    "ttl_seconds": 3600,
    "api_endpoint": "/user/repos"
  },
  "createdByAgent": "github-fetcher",
  "accessCount": 3,
  "createdAt": "2026-02-05T12:00:00Z",
  "updatedAt": "2026-02-05T12:00:00Z"
}
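
The cached_at and ttl_seconds values in this example are user-provided metadata, and nothing in this document says the data store expires entries on its own. An agent reading the entry could apply the TTL itself, as in this sketch; is_cache_fresh is a hypothetical helper that assumes both metadata fields are present.

```python
from datetime import datetime, timedelta, timezone


def is_cache_fresh(doc: dict, now: datetime) -> bool:
    """Return True while now is earlier than cached_at + ttl_seconds."""
    meta = doc["metadata"]
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    cached_at = datetime.fromisoformat(meta["cached_at"].replace("Z", "+00:00"))
    return now < cached_at + timedelta(seconds=meta["ttl_seconds"])


cache_doc = {
    "key": "repos",
    "metadata": {"cached_at": "2026-02-05T12:00:00Z", "ttl_seconds": 3600},
}
half_past_noon = datetime(2026, 2, 5, 12, 30, tzinfo=timezone.utc)
print(is_cache_fresh(cache_doc, half_past_noon))
```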