Data Modeling for AI Agents pt.2: the Context and Knowledge Layer

    How to design a CKMS that turns raw data into actionable context for AI agents, with metadata modeling and vector storage.

    Data Modeling
    10/22/2025

    Modern AI systems increasingly rely on multi-agent architectures, where specialized agents (planning, search, action, review, and response) collaborate to achieve complex goals. To make that collaboration reliable and efficient, we have found that data must flow cleanly across three key layers:

    1. Systems of Record (SoR): The source of truth for your business and operational data.
    2. Context & Knowledge Management System (CKMS): The intelligence layer that turns data into usable context for agents.
    3. Orchestration Layer: The process brain that coordinates how agents use that context to act.

    Each layer needs intentional data modeling. Done right, it eliminates duplication, reduces operational cost, and keeps your agents aligned with the real circumstances of your business, avoiding drift into disconnected silos.

👉 This series expands on the framework introduced in our main article, Data Modeling for Multi-Agent Architectures, diving deeper into the second layer and how to design it effectively.



    If Systems of Record are your memory…

    …then the CKMS is your working memory, where information becomes actionable.

    It makes the SoR’s structured data accessible and meaningful to AI agents by adding context, embeddings, and metadata that encode relationships, provenance, and semantic meaning.


    1. What the CKMS Does

    Think of the CKMS as the semantic bridge between raw data and reasoning agents. Its main jobs:

    • Make structured data retrievable by meaning, not just ID
    • Track how and when knowledge was last updated
    • Provide consistent “context snapshots” to planning, search, and response generation agents

    Without it, agents constantly re-embed data or operate with stale context.


    2. Core Data Model of a CKMS

    A minimal CKMS schema often includes:

| Entity | Purpose |
| --- | --- |
| KnowledgeObject | The atomic unit of knowledge (document, record, or fact) |
| Embedding | Vector representation linked to a KnowledgeObject |
| SourceRef | Pointer to where the data originated in the SoR |
| AgentContext | Snapshots of what each agent saw or used during an interaction |
| UsageEvent | Record of when knowledge was read, updated, or referenced |

    Each object carries metadata: source, timestamp, freshness, and relevance.

    This enables agents to reason over context that can be trusted, reducing duplication and inconsistencies.
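
A minimal DDL sketch of these entities, assuming Postgres with the pgvector extension; the table names, columns, and embedding dimension below are illustrative choices, not a prescribed design:

-- Requires Postgres 13+ (for gen_random_uuid) and the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

-- KnowledgeObject: the atomic unit of knowledge, plus its core metadata.
CREATE TABLE knowledge_objects (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title          TEXT NOT NULL,
    context_type   TEXT NOT NULL,           -- e.g. 'customer_feedback', 'document'
    source_system  TEXT NOT NULL,           -- provenance: which SoR produced it
    last_synced_at TIMESTAMPTZ NOT NULL,    -- freshness
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Embedding: vector representation linked to a KnowledgeObject.
CREATE TABLE embeddings (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id  UUID NOT NULL REFERENCES knowledge_objects(id),
    embedding  VECTOR(1536) NOT NULL,       -- dimension depends on the embedding model
    model_name TEXT NOT NULL
);

-- SourceRef: pointer to where the data originated in the SoR.
CREATE TABLE source_refs (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id UUID NOT NULL REFERENCES knowledge_objects(id),
    sor_table TEXT NOT NULL,
    sor_key   TEXT NOT NULL
);

-- AgentContext: snapshot of what an agent saw or used during an interaction.
CREATE TABLE agent_contexts (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_name  TEXT NOT NULL,
    object_ids  UUID[] NOT NULL,
    captured_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- UsageEvent: record of when knowledge was read, updated, or referenced.
CREATE TABLE usage_events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id   UUID NOT NULL REFERENCES knowledge_objects(id),
    event_type  TEXT NOT NULL,              -- 'read' | 'update' | 'reference'
    agent_name  TEXT,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()
);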


    3. Metadata Design Principles

    Metadata is what separates a good CKMS from a vector soup. Useful metadata includes:

    • Provenance: Which SoR, workflow, or agent created the data
    • Context type: e.g., customer insight, document, conversation summary
    • Freshness: Last sync time, staleness threshold
    • Quality metrics: Usage frequency, accuracy rating

Metadata can be stored in Postgres or MongoDB and linked to embeddings stored in PGVector, Weaviate, Qdrant, or Pinecone (a managed vector database platform for semantic retrieval at scale).
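
As a concrete, hypothetical illustration using the schema sketched above, registering a knowledge object together with its provenance, type, and freshness metadata might look like this:

-- Register a knowledge object together with its metadata.
INSERT INTO knowledge_objects (title, context_type, source_system, last_synced_at)
VALUES (
    'Q3 NPS survey summary',    -- human-readable handle
    'customer_feedback',        -- context type
    'crm',                      -- provenance: the originating SoR
    now()                       -- freshness: just synced
)
RETURNING id;                   -- the id that embeddings and usage events link back to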


    4. Choosing a Stack

    A practical CKMS stack could include:

    • Postgres for metadata and relationships
    • PGVector for embeddings in a single system, or Weaviate/Qdrant/Pinecone for scalable vector storage
    • LangChain or LlamaIndex as the retrieval abstraction layer
    • Prefect/Temporal for orchestration hooks and context updates

Keep the CKMS modular: agents should interact with it through an API, not through direct database calls.


    5. Practical Example: Metadata Linking

    Imagine your planning agent needs to find “recent customer sentiment shifts.”

    A CKMS query might retrieve:

-- Fresh customer-feedback objects, joined to their embeddings.
SELECT k.id, k.title, e.embedding
FROM knowledge_objects k
JOIN embeddings e ON e.object_id = k.id
WHERE k.context_type = 'customer_feedback'
  AND k.last_synced_at > NOW() - INTERVAL '7 days';

This keeps the candidate set fresh and scoped to the right context type, so stale or unrelated items never reach the agent; ranking those candidates by semantic relevance is then the embeddings' job, as in the variant below.
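
A sketch of that variant, assuming the embeddings live in PGVector (<=> is pgvector's cosine-distance operator, and $1 is a query embedding bound by the caller, e.g. an embedding of "recent customer sentiment shifts"):

-- Same freshness and type filters, now ranked by closeness to the query vector.
SELECT k.id, k.title
FROM knowledge_objects k
JOIN embeddings e ON e.object_id = k.id
WHERE k.context_type = 'customer_feedback'
  AND k.last_synced_at > NOW() - INTERVAL '7 days'
ORDER BY e.embedding <=> $1
LIMIT 10;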


    6. Common Pitfalls

    1. No metadata layer → embeddings lose meaning over time
    2. Overloading the CKMS → slower retrieval and higher costs
    3. Not tracking freshness or provenance → “hallucinated” context
4. Agents re-embedding raw data repeatedly → unnecessary compute waste (see the sketch below)
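
One cheap guard against pitfall 4, sketched against the illustrative schema above (the content_hash column is an assumption added here, not part of the earlier model): key each embedding on a hash of the content it encodes, and skip the work when nothing has changed.

-- Track a hash of the embedded content so unchanged objects are not re-embedded.
ALTER TABLE embeddings ADD COLUMN content_hash TEXT NOT NULL DEFAULT '';
CREATE UNIQUE INDEX IF NOT EXISTS embeddings_object_model_hash
    ON embeddings (object_id, model_name, content_hash);

-- Parameterized upsert run by the ingestion job ($1..$4: object id, vector, model, hash).
-- If this object/model/content combination already exists, no new embedding is written.
INSERT INTO embeddings (object_id, embedding, model_name, content_hash)
VALUES ($1, $2, $3, $4)
ON CONFLICT (object_id, model_name, content_hash) DO NOTHING;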

    7. Key Takeaway

    The CKMS is your system’s shared brain, where structure meets semantics.

    Model it well, and every agent can operate contextually without re-learning the world each time it runs.
