Show HN: Cortexa – Bloomberg terminal for agentic memory
Hi HN — I’m Prateek Rao. My cofounders and I built Cortexa, which we describe as a Bloomberg terminal for agentic memory.
A pattern I keep seeing: when agents misbehave, most teams iterate on prompts and then “fix” it by plugging in a memory layer (vector DB + RAG). That helps sometimes — but it doesn’t guarantee correctness. In practice it often introduces a new failure mode: the agent retrieves something dubious, writes it back to memory as if it’s truth, and that mistake becomes sticky. Over time you get memory pollution, circular hallucination loops, and debugging turns into log archaeology.
What Cortexa does:
1. Agent decision forensics (end-to-end “why”): trace outputs/actions back to the exact retrievals, memory writes, and tool calls that caused them.
2. Memory write governance: intercept and score memory writes (0–1), and optionally block/quarantine ungrounded entries before they poison future runs.
3. Memory hygiene + vector store noise control: automatically detect and remove near-duplicate / low-signal entries so retrieval stays high-quality and storage + inference costs don’t creep up.
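To make points 2 and 3 concrete, here is a minimal sketch of what a write-governance gate could look like. All names, thresholds, and the fuzzy-match scorer are hypothetical stand-ins, not Cortexa's actual API; a real system would score grounding against the retrieval trace and dedupe in embedding space.

```python
from difflib import SequenceMatcher

GROUNDING_THRESHOLD = 0.6   # hypothetical: below this, quarantine the write
DUPLICATE_THRESHOLD = 0.9   # hypothetical: above this, treat as near-duplicate noise

def grounding_score(entry: str, sources: list[str]) -> float:
    """Score 0-1: how well the proposed entry is supported by its sources.
    Toy version: best fuzzy overlap with any source string."""
    if not sources:
        return 0.0
    return max(SequenceMatcher(None, entry, s).ratio() for s in sources)

def govern_write(entry: str, sources: list[str], memory: list[str]):
    """Decide whether a proposed memory write is stored, quarantined, or dropped."""
    score = grounding_score(entry, sources)
    if score < GROUNDING_THRESHOLD:
        return ("quarantine", score)          # ungrounded: hold for review
    for existing in memory:
        if SequenceMatcher(None, entry, existing).ratio() > DUPLICATE_THRESHOLD:
            return ("drop_duplicate", score)  # near-duplicate: don't add noise
    memory.append(entry)
    return ("store", score)

memory = ["The billing API rate limit is 100 req/min."]
decision, score = govern_write(
    "The billing API rate limit is 100 req/min!",            # near-duplicate
    ["The billing API rate limit is 100 req/min (per docs)."],
    memory,
)
```

The point of the gate ordering is that an ungrounded write never even reaches storage, so it can't become the "sticky mistake" described above.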
Why this matters: Observability is the missing layer for agentic AI. Without it, autonomy is fragile: small errors silently compound, deployments become risky, and engineering cost goes up because failures aren’t reproducible or attributable.
Who this is for:
1. Teams shipping agentic workflows in production
2. Anyone fighting "unknown why" failures, memory pollution, or runaway context costs
3. Engineers who want auditability + faster debugging loops
Site: https://cortexa.ink/
Would love feedback from anyone running agents at scale:
1. What's the most painful agent failure mode you've seen in production?
2. What signals would you want in an "agent terminal" (retrieval diffs, memory blame, tool-call traces, alerts, etc.)?
First, congrats on the waitlist launch! At what scale or failure frequency does memory governance become necessary? In other words, how do teams know when they've crossed from prompt-tuning problems into systemic memory pollution?
Three signals you've crossed from prompt issues to systemic memory pollution:
1. Behavior drifts without prompt changes (memories are accumulating contradictions).
2. Failures aren't reproducible in a single call (the bad write happened sessions ago).
3. Failures spread across agents (Agent B confidently repeats Agent A's hallucination).
Scale isn't about total size; it's (Write Frequency × Agent Count × Session Overlap). Pollution can start at just a few hundred writes/day.
The leading indicator: Memory Contradiction Rate. If >15% of new writes conflict with existing memories, you're in systemic territory regardless of scale.
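As a sketch of how that indicator could be computed, assuming memories are reduced to (key, value) facts and "contradiction" means same key, different value (a toy heuristic; the 15% threshold and all names here are illustrative, not Cortexa's implementation):

```python
SYSTEMIC_THRESHOLD = 0.15  # illustrative: >15% of new writes conflicting = systemic

def contradicts(new: tuple[str, str], existing: tuple[str, str]) -> bool:
    """Toy heuristic: two facts about the same key that disagree on the value."""
    return new[0] == existing[0] and new[1] != existing[1]

def contradiction_rate(new_writes, memory) -> float:
    """Fraction of new (key, value) writes that conflict with any stored memory."""
    if not new_writes:
        return 0.0
    conflicting = sum(
        1 for w in new_writes if any(contradicts(w, m) for m in memory)
    )
    return conflicting / len(new_writes)

memory = [("rate_limit", "100/min"), ("plan", "enterprise")]
writes = [("rate_limit", "60/min"), ("region", "eu-west"), ("plan", "enterprise")]
rate = contradiction_rate(writes, memory)   # 1 of 3 writes conflicts
systemic = rate > SYSTEMIC_THRESHOLD
```

A real detector would need semantic comparison rather than exact key matching, but the shape of the metric (conflicting writes over total writes in a window) is the same.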
looks like a really cool idea ngl