Data Modeling for AI Agents pt.2: the Context and Knowledge Layer

    How to design a CKMS that turns raw data into actionable context for AI agents, with metadata modeling and vector storage.

    Data Modeling
    10/22/2025

    Modern AI systems increasingly rely on multi-agent architectures, where specialized agents (planning, search, action, review, and response) collaborate to achieve complex goals. To make that collaboration reliable and efficient, we have found that data must flow cleanly across three key layers:

    1. Systems of Record (SoR): The source of truth for your business and operational data.
    2. Context & Knowledge Management System (CKMS): The intelligence layer that turns data into usable context for agents.
    3. Orchestration Layer: The process brain that coordinates how agents use that context to act.

    Each layer needs intentional data modeling. Done right, it eliminates duplication, reduces operational cost, and keeps your agents aligned with the real circumstances of your business, avoiding drift into disconnected silos.

👉 This series expands on the framework introduced in our main article, Data Modeling for Multi-Agent Architectures, diving deeper into the second layer and how to design it effectively.



    If Systems of Record are your memory…

    …then the CKMS is your working memory, where information becomes actionable.

    It makes the SoR’s structured data accessible and meaningful to AI agents by adding context, embeddings, and metadata that encode relationships, provenance, and semantic meaning.


    1. What the CKMS Does

    Think of the CKMS as the semantic bridge between raw data and reasoning agents. Its main jobs:

    • Make structured data retrievable by meaning, not just ID
    • Track how and when knowledge was last updated
    • Provide consistent “context snapshots” to planning, search, and response generation agents

    Without it, agents constantly re-embed data or operate with stale context.


    2. Core Data Model of a CKMS

    A minimal CKMS schema often includes:

| Entity | Purpose |
| --- | --- |
| KnowledgeObject | The atomic unit of knowledge (document, record, or fact) |
| Embedding | Vector representation linked to a KnowledgeObject |
| SourceRef | Pointer to where the data originated in the SoR |
| AgentContext | Snapshots of what each agent saw or used during an interaction |
| UsageEvent | Record of when knowledge was read, updated, or referenced |

    Each object carries metadata: source, timestamp, freshness, and relevance.

    This enables agents to reason over context that can be trusted, reducing duplication and inconsistencies.
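
A minimal DDL sketch of these entities, assuming Postgres with the pgvector extension; the table names, columns, and embedding dimension below are illustrative choices, not a prescribed design:

-- Requires Postgres 13+ (for gen_random_uuid) and the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

-- KnowledgeObject: the atomic unit of knowledge, plus its core metadata.
CREATE TABLE knowledge_objects (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title          TEXT NOT NULL,
    context_type   TEXT NOT NULL,           -- e.g. 'customer_feedback', 'document'
    source_system  TEXT NOT NULL,           -- provenance: which SoR produced it
    last_synced_at TIMESTAMPTZ NOT NULL,    -- freshness
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Embedding: vector representation linked to a KnowledgeObject.
CREATE TABLE embeddings (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id  UUID NOT NULL REFERENCES knowledge_objects(id),
    embedding  VECTOR(1536) NOT NULL,       -- dimension depends on the embedding model
    model_name TEXT NOT NULL
);

-- SourceRef: pointer to where the data originated in the SoR.
CREATE TABLE source_refs (
    id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id UUID NOT NULL REFERENCES knowledge_objects(id),
    sor_table TEXT NOT NULL,
    sor_key   TEXT NOT NULL
);

-- AgentContext: snapshot of what an agent saw or used during an interaction.
CREATE TABLE agent_contexts (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_name  TEXT NOT NULL,
    object_ids  UUID[] NOT NULL,
    captured_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- UsageEvent: record of when knowledge was read, updated, or referenced.
CREATE TABLE usage_events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_id   UUID NOT NULL REFERENCES knowledge_objects(id),
    event_type  TEXT NOT NULL,              -- 'read' | 'update' | 'reference'
    agent_name  TEXT,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()
);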


    3. Metadata Design Principles

    Metadata is what separates a good CKMS from a vector soup. Useful metadata includes:

    • Provenance: Which SoR, workflow, or agent created the data
    • Context type: e.g., customer insight, document, conversation summary
    • Freshness: Last sync time, staleness threshold
    • Quality metrics: Usage frequency, accuracy rating

Metadata can be stored in Postgres or MongoDB and linked to embeddings stored in PGVector, Weaviate, Qdrant, or Pinecone (a managed vector database platform for semantic retrieval at scale).
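
As a concrete, hypothetical illustration using the schema sketched above, registering a knowledge object together with its provenance, type, and freshness metadata might look like this:

-- Register a knowledge object together with its metadata.
INSERT INTO knowledge_objects (title, context_type, source_system, last_synced_at)
VALUES (
    'Q3 NPS survey summary',    -- human-readable handle
    'customer_feedback',        -- context type
    'crm',                      -- provenance: the originating SoR
    now()                       -- freshness: just synced
)
RETURNING id;                   -- the id that embeddings and usage events link back to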


    4. Choosing a Stack

    A practical CKMS stack could include:

    • Postgres for metadata and relationships
    • PGVector for embeddings in a single system, or Weaviate/Qdrant/Pinecone for scalable vector storage
    • LangChain or LlamaIndex as the retrieval abstraction layer
    • Prefect/Temporal for orchestration hooks and context updates

Keep the CKMS modular: agents should interact with it through an API, not through direct database calls.


    5. Practical Example: Metadata Linking

    Imagine your planning agent needs to find “recent customer sentiment shifts.”

    A CKMS query might retrieve:

-- Fresh customer-feedback objects, joined to their embeddings.
SELECT k.id, k.title, e.embedding
FROM knowledge_objects k
JOIN embeddings e ON e.object_id = k.id
WHERE k.context_type = 'customer_feedback'
  AND k.last_synced_at > NOW() - INTERVAL '7 days';

This keeps the candidate set fresh and scoped to the right context type, so stale or unrelated items never reach the agent; ranking those candidates by semantic relevance is then the embeddings' job, as in the variant below.
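
A sketch of that variant, assuming the embeddings live in PGVector (<=> is pgvector's cosine-distance operator, and $1 is a query embedding bound by the caller, e.g. an embedding of "recent customer sentiment shifts"):

-- Same freshness and type filters, now ranked by closeness to the query vector.
SELECT k.id, k.title
FROM knowledge_objects k
JOIN embeddings e ON e.object_id = k.id
WHERE k.context_type = 'customer_feedback'
  AND k.last_synced_at > NOW() - INTERVAL '7 days'
ORDER BY e.embedding <=> $1
LIMIT 10;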


    6. Common Pitfalls

    1. No metadata layer → embeddings lose meaning over time
    2. Overloading the CKMS → slower retrieval and higher costs
    3. Not tracking freshness or provenance → “hallucinated” context
4. Agents re-embedding raw data repeatedly → unnecessary compute waste (see the sketch below)
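
One cheap guard against pitfall 4, sketched against the illustrative schema above (the content_hash column is an assumption added here, not part of the earlier model): key each embedding on a hash of the content it encodes, and skip the work when nothing has changed.

-- Track a hash of the embedded content so unchanged objects are not re-embedded.
ALTER TABLE embeddings ADD COLUMN content_hash TEXT NOT NULL DEFAULT '';
CREATE UNIQUE INDEX IF NOT EXISTS embeddings_object_model_hash
    ON embeddings (object_id, model_name, content_hash);

-- Parameterized upsert run by the ingestion job ($1..$4: object id, vector, model, hash).
-- If this object/model/content combination already exists, no new embedding is written.
INSERT INTO embeddings (object_id, embedding, model_name, content_hash)
VALUES ($1, $2, $3, $4)
ON CONFLICT (object_id, model_name, content_hash) DO NOTHING;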

    7. Key Takeaway

    The CKMS is your system’s shared brain, where structure meets semantics.

    Model it well, and every agent can operate contextually without re-learning the world each time it runs.
