39
Show HN: Ktx – Open-source executable context layer for data agents
Hi HN, we’re open-sourcing ktx. It’s an executable context layer that makes agents reliable on your data stack.
We built it after going through the experience of building production-grade data agents for dozens of companies. If you’ve also tried building them, or simply tried using Claude Code or Codex on your data warehouse, you’ll know that accuracy is the #1 issue. Agents are great at generating valid SQL, but it’s not always correct SQL.
To cite a few examples of “agents gone wrong”:
- Stale column + hidden business rule: when preparing a board report, a finance analyst asks Claude Code for “ARR by customer segment”, it derives ARR from multiple tables (subscriptions, plans, accounts), then groups by accounts.industry. But CC doesn’t know that this industry column was deprecated a few months prior, or that past board reports excluded paused subscriptions from the ARR calculation
- Join fanout: a data analyst at a retailer uses their company’s internal agent to prep a product revenue deck for a QBR. The agent joins orders to order_items, then sums orders.total_amount_cents grouped by order_items.product_id. The SQL runs fine, but each order’s revenue is repeated once per line item, which most people will miss if most orders only have 1 item
- Missing attribution logic: a marketing analyst asks Codex “Which campaigns drove the most revenue?” Codex joins marketing_touches to users to orders and groups by utm_campaign. But since each order can have multiple touches before purchase, the same order can be credited to first touch, last touch, every touch, or every campaign the user clicked before buying. If the agent chooses the method that doesn’t match the team’s attribution logic, they’ll make suboptimal decisions
To solve this at first we gave the agent more context through skills + a wiki-style knowledge base. That gives it some useful extra context but still relies on it writing the SQL without incorrect assumptions.
The next solution we explored was implementing a classic semantic layer. That solves the executable part, but they’re such a pain to build and maintain since they were made for legacy BI tools. Plus as a standalone tool, they lack all the useful context from unstructured data sources like internal docs.
So we built ktx and split it into 2 parts:
1. Business context goes in Markdown wiki pages that are auto-ingested and auto-populated
2. Queryable definitions go into YAML files that define tables, row grain, joins, measures, dimensions, filters, and filter groups
That way, when an agent needs a metric, it asks ktx for a measure, dimensions, filters, and filter groups instead of writing the whole query itself. ktx’s planner chooses the join path, uses grain and relationship metadata, catches issues like join fanout and chasm joins, and compiles the warehouse SQL, while utilizing the extra unstructured knowledge it has access to.
ktx is Apache 2.0. It can ingest from most warehouses (BigQuery, Snowflake, Postgres & others), modeling tools (dbt, MetricFlow, LookML), BI tools (Looker, Metabase), doc tools like Notion, and corrections from user interactions.
Install manually:
npm install -g @kaelio/ktx
ktx setup
Or give this prompt to your agent:
Run npx skills add Kaelio/ktx --skill ktx and use ktx skill to install and configure ktx
We’d especially like feedback from people who’ve tried using Claude Code, Codex, or building custom agents on analytics warehouses. Where did they fail? And what did you try to make the answers more reliable?
Interesting approach. Context management for agents is an underexplored area. I've been looking at similar problems - the key challenge is keeping token usage low while giving the agent enough context to be useful. Tiered retrieval (facts first, full text only whenneeded) seems to work well in practice.
Agreed, in addition to this I'd say it's better to spend tokens once while ingesting the sources and storing the canonical reusable definition rather than having agents do the full exploration over and over again
How does this compare with Wren 2.0, OpenVikings etc
Thanks for the question! Just to complete - OpenVikings is context/memory infrastructure for agents. ktx is analytics context for data agents specifically. So they're in different categories
The way we think about the space: there's a "semantic layer" side (Cube, dbt MetricFlow, Wren's engine) that compiles correct SQL but is hand-authored. And a "company brain" side (OpenViking, Glean, wikis) that indexes prose but can't query data warehouses safely. ktx is built to be both halves at once - a YAML semantic layer and a searchable wiki of business definitions, cross-linked (each wiki page references the metrics it explains) and auto-maintained
Hey, ktx maintainer here! Great question, we actually have quite a bit in common with Wren, they're in the same category. The main difference is that with Wren you author the semantic model (write MDL) and it executes SQL through its own Rust engine (similar to Cube as well). ktx instead builds the context for you - ingesting definitions already in dbt, Looker, Metabase, and Notion, auto-detecting joins and fan/chasm traps, flagging contradictions. It also builds both : the SL and wiki that accumulates and maintains free-form tribal knowledge (that’s also interlinked with ktx SL)
[dead]
[dead]