Memory Quarantine — Trust Lanes for Agent Memory
How MeshGuard's memory quarantine system classifies, gates, and promotes agent memory to prevent poisoned or low-trust data from driving high-impact actions.
The Problem with Agent Memory
AI agents remember things. They store conversation history, learned procedures, RAG documents, tool outputs, and user preferences. This memory directly influences future actions — an agent that "remembers" a user wants daily email summaries will keep sending them. An agent that learned a refund procedure from a tool output will follow it the next time a customer asks.
This creates a dangerous trust gap. Not all memory is equally trustworthy:
- A procedure approved by a human operator is not the same as a procedure an agent learned from scraping a blog post.
- A fact retrieved from your internal knowledge base is not the same as a claim extracted from a third-party API response.
- A constraint configured by an admin is not the same as a preference inferred from a single user conversation.
Without trust classification, all memory looks the same to the agent. A poisoned memory chunk — injected via a compromised data source, a prompt injection attack, or simple hallucination — can drive the agent to take actions you never intended.
Memory quarantine solves this by assigning every piece of agent memory a trust lane and gating access to that memory based on the sensitivity of the action it will influence.
Trust Lanes
MeshGuard classifies memory into four trust lanes, from least to most trusted:
| Lane | Name | Description | Example Sources |
|---|---|---|---|
| 0 | Untrusted | External or unverified sources | Web scrapes, third-party API responses, user input, RAG documents, tool outputs |
| 1 | Observed | Agent-generated, not yet verified | LLM-generated summaries, agent-learned procedures |
| 2 | Verified | Passed adversarial validation checks | Memory that survived injection scans and contradiction checks |
| 3 | Approved Procedural | Explicitly human-approved | Admin-approved procedures, system configuration, human-verified facts |
Trust lanes are assigned at intake based on the source type:
- Lane 3 is reserved for `human_approved` and `system_config` sources.
- Lane 1 is assigned to `agent_generation` and `learned_procedure` sources.
- Lane 0 is assigned to everything else: `external_api`, `web_scrape`, `user_input`, `tool_output`, `rag_document`.
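The intake mapping above can be sketched as a lookup table with a Lane 0 fallback. This is a minimal illustration, not MeshGuard's implementation: the `TrustLane` enum and the `assign_lane` helper are hypothetical names, and only the source type strings come from the text.

```python
from enum import IntEnum


class TrustLane(IntEnum):
    """The four lanes from the table above (enum names are illustrative)."""
    UNTRUSTED = 0
    OBSERVED = 1
    VERIFIED = 2
    APPROVED_PROCEDURAL = 3


# Source types named in the text. Lane 2 is absent on purpose:
# it is reached through promotion, not assigned at intake.
LANE_BY_SOURCE = {
    "human_approved": TrustLane.APPROVED_PROCEDURAL,
    "system_config": TrustLane.APPROVED_PROCEDURAL,
    "agent_generation": TrustLane.OBSERVED,
    "learned_procedure": TrustLane.OBSERVED,
}


def assign_lane(source_type: str) -> TrustLane:
    """Hypothetical helper: return the intake lane for a source type."""
    # Anything not listed above falls through to Lane 0.
    return LANE_BY_SOURCE.get(source_type, TrustLane.UNTRUSTED)


print(assign_lane("web_scrape").name)
print(assign_lane("human_approved").name)
```

Defaulting unknown source types to Lane 0 keeps the mapping fail-closed: a new or misspelled source type never receives more trust than an external source.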
Memory can be promoted up lanes through a structured pipeline, and demoted or quarantined when trust is lost.
The Three Guards
Memory quarantine operates through three enforcement points that form a complete trust perimeter around agent memory.
1. Intake Firewall
Every memory write passes through the intake firewall before it enters the memory store. The firewall:
- Classifies content into types: `claim` (factual assertion), `procedure` (how-to instructions), `evidence` (raw reference data), `context` (background information), `preference` (user/system preference), or `constraint` (rule or limitation). Classification is automatic based on text analysis, but can be overridden.
- Assigns a trust lane based on the memory's source type.
- Attaches provenance — full lineage tracking including source type, source URL, which agent fetched it, the session context, and timestamps.
- Deduplicates by content hash (SHA-256). If the same content already exists, the existing chunk is returned.
- Computes a content hash for integrity verification.
Human-approved content (where approvedBy is set) automatically receives Lane 3. Admin trust lane overrides are supported but require admin authentication and are fully audit-logged.
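As a rough sketch of the hashing, deduplication, and provenance steps, the following uses Python's standard `hashlib`. The `IntakeFirewall` class, the `Chunk` fields, and the provenance keys are assumptions made for illustration. Only the SHA-256 content hash, the return-existing-chunk behavior, and the Lane 3 rule for approved content come from the text.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Chunk:
    content: str
    content_hash: str
    lane: int
    provenance: dict = field(default_factory=dict)


class IntakeFirewall:
    """Illustrative in-memory store keyed by SHA-256 content hash."""

    def __init__(self) -> None:
        self._by_hash = {}  # content hash -> Chunk

    def write(self, content: str, source_type: str, lane: int,
              approved_by: Optional[str] = None) -> Chunk:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            return existing  # duplicate content: return the existing chunk
        if approved_by is not None:
            lane = 3  # human-approved content receives Lane 3
        chunk = Chunk(
            content=content,
            content_hash=digest,
            lane=lane,
            provenance={
                "source_type": source_type,
                "approved_by": approved_by,
                "ingested_at": datetime.now(timezone.utc).isoformat(),
            },
        )
        self._by_hash[digest] = chunk
        return chunk


firewall = IntakeFirewall()
first = firewall.write("Refunds require a receipt.", "web_scrape", lane=0)
second = firewall.write("Refunds require a receipt.", "external_api", lane=0)
print(first is second)  # the second write is deduplicated
```

Note that in this sketch a duplicate write keeps the provenance of the first write. Whether a real deployment should merge lineage from later writes is a design decision the text does not specify.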
2. Retrieval Guard
When an agent retrieves memory to influence an action, the retrieval guard filters chunks based on the action's sensitivity:
| Action Sensitivity | Required Minimum Lane | Example Actions |
|---|---|---|
| `low` | Lane 0 (any memory) | Read public data, log events |
| `medium` | Lane 1 (observed+) | Send notifications, update records |
| `high` | Lane 2 (verified+) | Process payments, modify permissions |
| `critical` | Lane 3 (approved only) | Delete data, change security settings |
The guard resolves the minimum trust lane by matching the intended action against configured trust requirements. If no explicit requirement exists, the default sensitivity-to-lane mapping applies.
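One way to picture this resolution step is a pattern lookup with a sensitivity fallback. The sketch below assumes glob-style matching (via Python's `fnmatch`) and first-match-wins ordering. Neither is confirmed by the text, and `resolve_min_lane` is a hypothetical name. The rule fields mirror the action trust requirement format shown under Configuration.

```python
from fnmatch import fnmatchcase

# Default sensitivity-to-lane mapping from the table above.
DEFAULT_LANE_BY_SENSITIVITY = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Example requirements, using the same fields as the configuration section.
REQUIREMENTS = [
    {"actionPattern": "delete:*", "minTrustLane": 3},
    {"actionPattern": "write:payment*", "minTrustLane": 2},
    {"actionPattern": "read:*", "minTrustLane": 0},
]


def resolve_min_lane(action: str, sensitivity: str) -> int:
    """Return the minimum lane for an action (assumes first match wins)."""
    for rule in REQUIREMENTS:
        if fnmatchcase(action, rule["actionPattern"]):
            return rule["minTrustLane"]
    # No explicit requirement: fall back to the default mapping.
    return DEFAULT_LANE_BY_SENSITIVITY[sensitivity]


print(resolve_min_lane("delete:user", "low"))      # explicit rule applies
print(resolve_min_lane("notify:email", "medium"))  # default mapping applies
```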
Chunks that fall below the required lane are filtered out of the retrieval results rather than raising an error. The response includes a count of filtered chunks, plus a warning if all available memory is below the required threshold.
The retrieval guard also detects conflicts between retrieved chunks — for example, two chunks about the same topic (matching tags) with significantly different trust lanes (difference of 2 or more), which may indicate contradictory information from sources of different reliability.
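The filtering and conflict checks described above can be sketched together. The `Chunk` shape, the `guard_retrieval` name, and the response keys are assumptions for illustration. The sketch also assumes conflicts are checked only among chunks that passed the lane filter, which the text does not state explicitly.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chunk:
    id: str
    lane: int
    tags: frozenset


def guard_retrieval(chunks, min_lane):
    """Filter chunks below min_lane and flag tag-matched lane conflicts."""
    allowed = [c for c in chunks if c.lane >= min_lane]
    filtered_count = len(chunks) - len(allowed)
    warnings = []
    if chunks and not allowed:
        warnings.append("all available memory is below the required lane")
    # Conflict: at least one shared tag and a lane difference of 2 or more.
    conflicts = [
        (a.id, b.id)
        for a, b in combinations(allowed, 2)
        if a.tags & b.tags and abs(a.lane - b.lane) >= 2
    ]
    return {
        "chunks": allowed,
        "filteredCount": filtered_count,
        "warnings": warnings,
        "conflicts": conflicts,
    }


memory = [
    Chunk("a", 0, frozenset({"refund"})),
    Chunk("b", 3, frozenset({"refund"})),
    Chunk("c", 1, frozenset({"billing"})),
]
result = guard_retrieval(memory, min_lane=0)
print(result["conflicts"])
```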
3. Action Guard
The action guard is the final enforcement point. Before an action executes, it validates that every memory chunk that influenced the decision meets the action's trust requirements. This catches cases where memory that was acceptable at retrieval time has since been demoted or quarantined.
The action guard:
- Records which chunks influenced each action (full influence audit trail)
- Checks the lowest trust lane in the influencing set against the action's requirement
- Blocks execution if any influencing chunk is below the threshold
- Supports preflight checks for dry-run evaluation
- Allows admin overrides with approval workflow
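A minimal sketch of the core check, assuming each influencing chunk is looked up again at execution time so that demotions and quarantines since retrieval are visible. The `check_action` name, its input shape, and the result keys are hypothetical.

```python
def check_action(influencing_lanes, min_lane):
    """Validate an action against the chunks that influenced it.

    influencing_lanes maps chunk ID -> current trust lane, or None if the
    chunk has been quarantined since it was retrieved.
    """
    blocking = sorted(
        chunk_id
        for chunk_id, lane in influencing_lanes.items()
        if lane is None or lane < min_lane
    )
    # Any chunk below the threshold blocks execution.
    return {"allowed": not blocking, "blockingChunks": blocking}


# Preflight-style dry run: chunk "c2" was demoted to Lane 1 after retrieval.
print(check_action({"c1": 3, "c2": 1}, min_lane=2))
```

Returning the blocking chunk IDs rather than a bare boolean gives reviewers something concrete to act on, for example when deciding whether to request a promotion or an override.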
The Promotion Pipeline
Memory doesn't stay at its initial lane forever. The promotion pipeline provides a structured path for elevating trust through validation:
Promotion Paths and Required Tests
| From | To | Required Tests |
|---|---|---|
| Lane 0 | Lane 1 | `injection_scan` |
| Lane 0 | Lane 2 | `injection_scan`, `contradiction_check` |
| Lane 0 | Lane 3 | `injection_scan`, `contradiction_check`, `human_review` |
| Lane 1 | Lane 2 | `injection_scan`, `contradiction_check` |
| Lane 1 | Lane 3 | `injection_scan`, `contradiction_check`, `human_review` |
| Lane 2 | Lane 3 | `human_review` |
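The table translates directly into a lookup keyed by (from, to) lane pairs. The `required_tests` helper is a hypothetical name, while the test identifiers are taken from the table.

```python
# (from_lane, to_lane) -> tests that must pass, as listed in the table.
REQUIRED_TESTS = {
    (0, 1): ["injection_scan"],
    (0, 2): ["injection_scan", "contradiction_check"],
    (0, 3): ["injection_scan", "contradiction_check", "human_review"],
    (1, 2): ["injection_scan", "contradiction_check"],
    (1, 3): ["injection_scan", "contradiction_check", "human_review"],
    (2, 3): ["human_review"],
}


def required_tests(from_lane: int, to_lane: int) -> list:
    """Return the tests required for a promotion between two lanes."""
    if to_lane <= from_lane:
        raise ValueError("target lane must be higher than the current lane")
    return REQUIRED_TESTS[(from_lane, to_lane)]


print(required_tests(1, 3))
```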
Promotion Tests
Injection scan — Runs the prompt injection detector against the memory content. If injection patterns are found above the confidence threshold, the test fails.
Contradiction check — Compares the memory chunk against known facts and existing verified memory. If the content contradicts established knowledge, the test fails.
Replay test — Replays the memory with synthetic data to verify it produces consistent, expected behavior.
Human review: Requires manual approval from a human reviewer. Every promotion to Lane 3 requires it; no amount of automated testing can substitute for explicit human approval of procedural memory.
Age threshold — Time-based promotion: memory must be at least 168 hours (1 week) old.
Usage threshold — Usage-based promotion: memory must have been accessed at least 10 times without incident.
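The two threshold tests reduce to simple comparisons. In this sketch the function names and the `incident_count` parameter are assumptions. The 168-hour and 10-access values come from the text.

```python
from datetime import datetime, timedelta, timezone

AGE_THRESHOLD_HOURS = 168  # 1 week
USAGE_THRESHOLD = 10       # accesses without incident


def passes_age_threshold(created_at: datetime, now: datetime) -> bool:
    """True once the memory is at least 168 hours old."""
    return now - created_at >= timedelta(hours=AGE_THRESHOLD_HOURS)


def passes_usage_threshold(access_count: int, incident_count: int) -> bool:
    """True after at least 10 accesses with no recorded incident."""
    return access_count >= USAGE_THRESHOLD and incident_count == 0


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(passes_age_threshold(created, created + timedelta(days=7)))
print(passes_usage_threshold(access_count=12, incident_count=1))
```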
Promotion Workflow
- Request promotion — Specify the chunk ID and target lane. The system validates the chunk is active and the target lane is higher.
- Run automated tests — The system runs all required automated tests (injection scan, contradiction check, etc.). Human review is skipped and must be completed separately.
- Complete human review (if required) — A reviewer approves or rejects the promotion with notes.
- Finalize — If all tests pass, the chunk's trust lane is updated. If any test fails, the request is auto-rejected.
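The finalize step can be sketched as a three-state decision over the recorded test results. The `finalize_promotion` name and the status strings are illustrative assumptions. The rule that any failure rejects the request and that human review may still be outstanding follows the workflow above.

```python
def finalize_promotion(required, results):
    """Decide the outcome of a promotion request.

    required: list of test names for the requested path.
    results:  dict of test name -> True (passed) or False (failed);
              tests not yet run, such as a pending human review, are absent.
    """
    if any(results.get(test) is False for test in required):
        return "rejected"  # any failed test auto-rejects the request
    if all(results.get(test) is True for test in required):
        return "approved"  # the chunk's trust lane can now be updated
    return "pending"       # waiting on a remaining test


path = ["injection_scan", "contradiction_check", "human_review"]
automated = {"injection_scan": True, "contradiction_check": True}
print(finalize_promotion(path, automated))                            # review outstanding
print(finalize_promotion(path, dict(automated, human_review=True)))  # all passed
```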
Quarantine and Demotion
When a memory chunk becomes suspect — due to a detected injection pattern, a contradiction with verified facts, or an admin decision — it can be quarantined or demoted.
Quarantine Status
| Status | Meaning |
|---|---|
| `active` | Available for retrieval and use |
| `quarantined` | Blocked from all retrieval and use |
| `pending_review` | Awaiting human review |
| `expired` | Past its TTL, no longer active |
Quarantine Actions
- Quarantine a single chunk or batch of chunks
- Quarantine by lineage — quarantine all chunks from a specific source (source type, source agent, date range)
- Unquarantine — restore a chunk after review
- Revoke — permanent removal
- Lane override — manually change a chunk's trust lane (admin only, audit-logged)
- Bulk update — change lanes for multiple chunks at once
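Quarantine by lineage can be pictured as a filter over provenance fields. The sketch below is illustrative only: the `Chunk` fields and the `quarantine_by_lineage` signature are assumptions, and it treats the supplied filters as a conjunction.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    id: str
    source_type: str
    source_agent: str
    created_at: datetime
    status: str = "active"


def quarantine_by_lineage(chunks, source_type=None, source_agent=None,
                          start=None, end=None):
    """Quarantine active chunks that match every filter provided."""
    if source_type is None and source_agent is None and start is None and end is None:
        raise ValueError("at least one lineage filter is required")
    quarantined = []
    for chunk in chunks:
        if chunk.status != "active":
            continue
        if source_type is not None and chunk.source_type != source_type:
            continue
        if source_agent is not None and chunk.source_agent != source_agent:
            continue
        if start is not None and chunk.created_at < start:
            continue
        if end is not None and chunk.created_at > end:
            continue
        chunk.status = "quarantined"
        quarantined.append(chunk.id)
    return quarantined


utc = timezone.utc
store = [
    Chunk("a", "web_scrape", "agent-7", datetime(2025, 3, 1, tzinfo=utc)),
    Chunk("b", "web_scrape", "agent-9", datetime(2025, 3, 2, tzinfo=utc)),
    Chunk("c", "rag_document", "agent-7", datetime(2025, 3, 3, tzinfo=utc)),
]
print(quarantine_by_lineage(store, source_agent="agent-7"))
```

Requiring at least one filter guards against an empty call quarantining the entire store by accident.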
Content TTLs
Memory chunks have default time-to-live values by content type:
| Content Type | Default TTL |
|---|---|
| `claim` | 1 week |
| `procedure` | 1 day |
| `evidence` | 30 days |
| `context` | 1 week |
| `preference` | 90 days |
| `constraint` | 1 year |
Procedures have a deliberately short TTL because stale procedures are one of the highest-risk memory types — an agent following an outdated procedure can cause significant harm.
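Expressed in hours, the default TTLs make expiry a single addition. The helper names below are hypothetical, and the hour values match the table (1 week = 168, 30 days = 720, 90 days = 2160, 1 year = 8760).

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_HOURS = {
    "claim": 168,
    "procedure": 24,
    "evidence": 720,
    "context": 168,
    "preference": 2160,
    "constraint": 8760,
}


def expires_at(content_type: str, created_at: datetime) -> datetime:
    """Return the moment a chunk of this type passes its default TTL."""
    return created_at + timedelta(hours=DEFAULT_TTL_HOURS[content_type])


def is_expired(content_type: str, created_at: datetime, now: datetime) -> bool:
    return now >= expires_at(content_type, created_at)


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
later = created + timedelta(hours=25)
print(is_expired("procedure", created, later))  # a procedure is stale after 1 day
print(is_expired("claim", created, later))      # a claim is still live
```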
Configuration
The memory quarantine system is configured per organization:
    # Default TTLs by content type (hours)
    defaultTtlHours:
      claim: 168
      procedure: 24
      evidence: 720
      context: 168
      preference: 2160
      constraint: 8760
    # Auto-promotion thresholds
    autoPromoteAfterUsageCount: 10
    autoPromoteAfterHours: 168
    # Quarantine behavior
    quarantineOnInjectionDetection: true
    quarantineOnContradiction: true
    # Audit retention
    influenceAuditRetentionDays: 90

Action Trust Requirements
Configure minimum trust lanes for specific action patterns:
    # High-impact actions require verified memory
    - actionPattern: "delete:*"
      sensitivity: critical
      minTrustLane: 3
      allowOverride: false
    - actionPattern: "write:payment*"
      sensitivity: high
      minTrustLane: 2
      allowOverride: true
      overrideRequiresApproval: true
    - actionPattern: "read:*"
      sensitivity: low
      minTrustLane: 0

When to Use Memory Quarantine
Memory quarantine is most valuable when your agents:
- Ingest external data — RAG pipelines, web scraping, API calls, or user-provided documents that could contain poisoned content.
- Learn procedures — Agents that learn how to do things from observation or instruction, where a bad procedure could cause repeated harm.
- Make high-impact decisions — Agents that process payments, modify access controls, or take actions that are difficult to reverse.
- Operate in multi-agent systems — Where one agent's output becomes another agent's memory, and a single poisoned source can propagate through the chain.
- Need compliance audit trails — Regulated industries where you need to prove which data influenced each decision.
If your agents only consume static, pre-approved knowledge bases and don't learn or ingest external data, memory quarantine adds overhead without significant benefit.
Monitoring
The memory quarantine system exposes statistics per organization:
- Chunk counts by lane, status, content type, and source type
- Trust gate evaluation counts (passes vs. blocks and block rate)
- Promotion request counts (approvals vs. rejections)
- Quarantine action counts and active quarantine count
Use these metrics to tune your trust requirements. A high gate block rate may indicate your trust requirements are too strict for your data sources, or that you need to invest in promotion pipelines to elevate trustworthy memory faster.
Where This Connects
- Preventing Prompt Injection — The injection scan used in promotion tests uses the same detection engine
- Monitoring Agent Behavior — Behavioral anomalies can trigger automated quarantine actions
- Securing Agent Delegation — Delegated agents inherit memory trust constraints from their delegator
- Trust Tiers Explained — Agent trust tiers are separate from memory trust lanes, but both feed into enforcement decisions
Memory quarantine is the difference between an agent that treats all data as gospel and one that knows the difference between a verified fact and an unverified claim. For agents that ingest external data or learn from their environment, it is a critical layer of defense.
