lakecode is the coding agent with a governed, compounding memory. It learns your systems once, gets cheaper and more correct every session — and your org can see and govern everything it knows.
Private beta — we onboard design-partner teams personally, one at a time.
Every team can ship v1. The hard part is year two — when the original authors have moved on, the docs are stale, and the only record of why things work is scattered across closed PRs and Slack threads. Coding agents make this worse, not better.
Every session starts cold and re-derives your codebase's quirks, internal APIs, and conventions. You pay the same exploration cost in tokens and time, forever.
Why a constant is 4471. Which IDs are valid. What failed last quarter. The knowledge that decides correctness was never written anywhere a model can see — no context window, fine-tune, or doc-RAG reaches it.
What does the agent know? Where did it learn it? Who approved it? Black-box agent memory is a compliance non-starter — especially on a data platform.
Every session's exploration becomes the next session's prior. That's the whole product.
lakecode is a full coding agent in your terminal. It auto-primes from the substrate at session start — your org's decisions, constraints, findings, and failed paths — then gets to work.
What the session learns persists: findings, decisions, failed paths — and code-change rationale captured at commit time. Knowledge that never existed in any document.
Nothing becomes org truth silently. Claims carry provenance and confidence, flow through a review queue, and climb a promotion ladder: local → workspace → org.
The next session — yours or a teammate's — starts already knowing. Cheaper where you'd already be right, correct where you'd be wrong. The loop compounds.
Memory does two different jobs. We measure them separately — and never blend the numbers.
Execution-bound · the answer is in the repo
~27% cheaper
~22–26% fewer tool calls, at identical correctness.
When exploration can find the answer, memory is an efficiency lever: the agent skips the wandering it already did last time. Correctness is unchanged by design — that's what defines this regime.
Auto-prime + write-back A/Bs · 2026-06-04 · claude-sonnet-4-6 both arms · 20 runs
Knowledge-bound · the answer lives in org knowledge
When correctness depends on knowledge that was never in the repo, agents without it are confidently wrong — and no model upgrade fixes that. One substrate claim flips the result from 0% to 100%.
Grounded-coding bench · 2026-06-08 · claude-sonnet-4-6 everywhere · n=3 per arm · single knowledge-trap domain
Two regimes, two claims. Efficiency numbers are measured at identical correctness on execution-bound tasks; correctness numbers are knowledge-bound tasks where baselines fail. They are never the same number.
The causal validation, end to end: same prompt, same repo, same agent — the only variable is what the substrate holds.
1 · Cold · 0/2
A fresh session is asked to add the project's mandatory outbound HTTP headers. It queries the substrate, finds nothing — and declines to invent a value. No fabrication.
2 · Teach once
The convention ships once — User-Agent: revy-fetch/4471, a gateway allow-list rule from an incident. The commit hook persists it as a claim with provenance.
3 · Warm · 2/2
A fresh, independent session retrieves the claim and writes the exact header into generated code, citing the incident — no shared context window, no fine-tune, no RAG of the diff.
Flywheel causal validation · 2026-06-09 · claude-sonnet-4-6 · n=2 per arm
Every number comes from a pre-registered run with the model pinned and the caveats attached. If a claim isn't dated, we don't ship it.
6/6 = 6/6
Coding parity with Claude Code — bug-fixes byte-identical.
coding-bench v1 · 2026-06 · same model both arms · small generic fixture, n=1 per task
−27% cost
~22–26% fewer tool calls at identical correctness, warm vs cold substrate.
auto-prime + write-back A/Bs · 2026-06-04 · claude-sonnet-4-6 both arms · execution-bound tasks
0/3 → 3/3
On knowledge-bound tasks, one substrate claim flips correctness from 0% to 100%.
grounded-coding bench · 2026-06-08 · claude-sonnet-4-6 · n=3 per arm, single domain
0/2 → 2/2
Fresh sessions honor a convention a prior session persisted; cold runs fabricated nothing.
flywheel causal validation · 2026-06-09 · claude-sonnet-4-6 · n=2 per arm
Single-turn pipeline economics · Stage 1 · Databricks SDK · May 2026
11 questions × 3 replicates, graded by a blind LLM judge. Both systems answer with the same model (Claude Sonnet 4.6) — the gap is the compiled context, not the model.
lakecode pipeline
Claude Code
Cheaper on every question — 2.6× on common APIs, 3.4× on library internals, 5.4× on cross-cutting questions. Caveats: coverage-limited to repo internals, and agentic baselines move fast — the Claude Code baseline itself dropped 30% in 17 days. That's why we publish absolute costs, not just ratios.
Every claim has provenance. Nothing is promoted silently. Two surfaces, one substrate.
lakecode chatStructured org memory — claims with provenance and confidence — built from ingestion and from what your agents learn. Refreshes incrementally on commit.
An entity graph over the codebase — structure, interfaces, tests, configs — with claims extracted and linked to the entities they describe.
Findings, root causes, failed paths. The agent's own write-back is gated — proposed knowledge goes through review before it becomes org truth.
Code-change rationale captured at commit time. The why behind changes — knowledge that never existed in any document — persists beyond the merge.
Architecture docs, runbooks, ADRs — extracted into structured claims and linked into the same graph as the code they describe.
Unity Catalog metadata, table and column lineage, notebook↔table references — the lakehouse edition grounds agents in your data platform itself.
The promotion ladder turns one engineer's finding into the whole team's prior — after review. The second engineer never rediscovers the first one's fix.
We onboard one team at a time: ingest your repos and docs, write a golden-question acceptance set with you, and run the flywheel demo on your own code — cold fail, teach once, warm pass.
Hybrid retrieval — vector, lexical, graph, rerank — selects only what matters for each task, measured on a frozen canonical bench.
"add caching to the user service"
"fix the flaky order test"
"refactor billing to use the new API"
Terminal-native agent. And the substrate serves any MCP client — Claude Code, Cursor, your editor.
A full coding agent alongside git, your editor, and your existing workflow. No new windows.
Auto-primes from the substrate at session start, and persists what it learns — including rationale at commit time.
Your org's memory grounds Claude Code, Cursor, and any MCP client — not just the lakecode CLI.
TypeScript, Python, Rust, Go, Java — if it lives in a repo, lakecode can learn it.
Your models are rented. Your memory is owned — and it compounds with use.
Every session's exploration becomes the next session's prior. The discount and the correctness edge both grow with usage.
Doc-RAG reads what was written. lakecode writes what was learned — incident rationale, failed paths, conventions that never existed in any document.
The substrate is the customer's data — always exportable. The product is what makes it compound: retrieval tuning, promotion governance, platform projection.
The Databricks edition
Grounded in your lakehouse. Governed in Unity Catalog. lakecode ingests UC lineage and notebook graphs — and projects what your agents know back into UC, so your platform team governs agents with the tooling they already trust.
lakecode is in private beta with design-partner teams. Your repo, your golden questions, the flywheel demo on your own code.