The Big AI Lock-In

OBTO Team · Insights from the Glass Box

On a Tuesday last November, two notices reached us in the same afternoon. One was an email: a model snapshot we'd wired into a customer workflow would be removed from the API in February. The other wasn't an email at all — it was every request to two of the big providers returning 5xx at once, because a configuration push at Cloudflare had taken roughly a fifth of the web down with it. Same afternoon, two reminders that the thing our software runs on is not ours.

We call it the big AI lock-in, and it's quieter than the version people argue about online. The loud version is "which model is best." The quiet version is a question almost nobody asks until it's too late: where does the state of your work actually live?

Lock-in isn't about the model. It's about the state.

For most teams, the state lives inside one provider's context window, inside one chat client. Every reasoning step, every correction, every hard-won fact about your codebase sits in a session owned by a vendor. When the model is deprecated, the provider reprices, or the client you used quietly changes, that state doesn't migrate. You start over.

And switching providers is never the base-URL change the marketing implies. Your prompts are tuned to one model's quirks. Your retrieval index is built on one embedding family. Your tools speak one function-calling dialect. Worse, the long tail of context — the stuff a teammate would call institutional memory — was never written down anywhere a different model could read it.
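To make the function-calling point concrete, here is a minimal sketch. It assumes a hypothetical provider-neutral `ToolSpec` type, which is our own invention for illustration and not part of any SDK. The two output shapes are modeled on the tool formats published for OpenAI's Chat Completions API and Anthropic's Messages API. The field names reflect our reading of those public docs and may change, so check the current references before relying on them.

```typescript
// Hypothetical provider-neutral tool definition (illustrative, not from any SDK).
type JsonSchema = Record<string, unknown>;

interface ToolSpec {
  name: string;
  description: string;
  parameters: JsonSchema; // JSON Schema describing the tool's arguments
}

// Shape modeled on an OpenAI Chat Completions `tools` entry.
function toOpenAiTool(spec: ToolSpec) {
  return {
    type: "function" as const,
    function: {
      name: spec.name,
      description: spec.description,
      parameters: spec.parameters,
    },
  };
}

// Shape modeled on an Anthropic Messages `tools` entry.
function toAnthropicTool(spec: ToolSpec) {
  return {
    name: spec.name,
    description: spec.description,
    input_schema: spec.parameters,
  };
}

// Example tool; the name and fields are made up for this sketch.
const issueRefund: ToolSpec = {
  name: "issue_refund",
  description: "Issue a refund for an order, keyed for idempotency.",
  parameters: {
    type: "object",
    properties: {
      orderId: { type: "string" },
      idempotencyKey: { type: "string" },
    },
    required: ["orderId", "idempotencyKey"],
  },
};

const openAiTool = toOpenAiTool(issueRefund);
const anthropicTool = toAnthropicTool(issueRefund);
```

The definition translates mechanically. The prompt wording around the tool, and how each model decides when to call it, does not, which is why the switch is more than a base-URL change.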

None of this is hypothetical. Anthropic commits to at least 60 days' notice before retiring a publicly released model. OpenAI pulled GPT-4o from ChatGPT this past February on roughly two weeks' warning, and gave API users of the chatgpt-4o-latest snapshot until February 17 before it was gone. Deprecation is a feature of this industry, not a bug. If a workflow can't survive the model it was built on going away, it isn't in production; it's on loan.

So while building OBTO we made a rule for ourselves: keep the durable parts of a workstream — the memory and the conversation — outside any single model or client. Two systems do that work. We built both because we needed them first.

The Memory Fabric: write once, recall from anywhere

The first is a cross-session vector memory we call the Memory Fabric, backed by a service named Hindsight. It is deliberately boring to use. Any model, in any session, can write a fact and later recall it by meaning instead of exact match:

// Write a fact once — from whatever model you happen to use today.
remember({
  appName: "checkout",
  key: "refund-idempotency",
  content: "Refund webhooks retry; the refund path needs an idempotency key."
});

// Weeks later, a different model in a different client asks by meaning.
recall({ appName: "checkout", query: "gotchas in the refund flow" });
// -> [{ score: 0.71, content: "Refund webhooks retry; the refund path..." }]

The point isn't the API; it's the boundary. The memory lives on the platform, not in a context window. It survives a session reset, a pod reboot, and — the part that matters here — a change of model. When we move a workflow from one provider to another, its memory doesn't have to move, because it was never inside the provider to begin with.

Honest tradeoffs, because we'd rather name them: recall is similarity-ranked, not exact. A top-k query returns the closest matches by meaning, which is forgiving when you phrase the same question three ways and unforgiving when two facts are near-duplicates. We cap a single memory at 60,000 characters and lean on short, declarative entries — "this is true about this app" — over dumped transcripts. Vector memory is a card catalog, not a database. Treat it like one and it earns its keep.
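The tradeoffs above are easier to see in a toy. What follows is a minimal in-memory sketch, not how Hindsight is implemented. The `MemoryStore` class, the `topK` parameter, and the overwrite-on-same-key behavior are assumptions for illustration. The bag-of-words `embed` function is a crude stand-in for a real embedding model: it only matches on shared words, where a learned embedding would also match paraphrases.

```typescript
// Toy stand-in for a vector memory. All names here are illustrative.
const MAX_MEMORY_CHARS = 60000; // the per-memory cap mentioned above

interface Memory { appName: string; key: string; content: string }
interface Hit { score: number; key: string; content: string }
type Vector = { [word: string]: number };

// Crude "embedding": word counts. A real service would call an embedding model.
function embed(text: string): Vector {
  const counts: Vector = Object.create(null);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const word of words) {
    counts[word] = (counts[word] || 0) + 1;
  }
  return counts;
}

// Cosine similarity between two word-count vectors, in [0, 1].
function cosine(a: Vector, b: Vector): number {
  let dot = 0, normA = 0, normB = 0;
  for (const word of Object.keys(a)) {
    dot += a[word] * (b[word] || 0);
    normA += a[word] * a[word];
  }
  for (const word of Object.keys(b)) normB += b[word] * b[word];
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}

class MemoryStore {
  private memories: Memory[] = [];

  remember(m: Memory): void {
    if (m.content.length > MAX_MEMORY_CHARS) {
      throw new Error("memory exceeds " + MAX_MEMORY_CHARS + " characters");
    }
    // Assumed behavior: same app + key overwrites, so a corrected fact
    // replaces the stale one instead of sitting beside it as a near-duplicate.
    this.memories = this.memories.filter(
      (old) => !(old.appName === m.appName && old.key === m.key)
    );
    this.memories.push(m);
  }

  // Returns the topK closest memories for one app, best match first.
  recall(q: { appName: string; query: string; topK?: number }): Hit[] {
    const queryVec = embed(q.query);
    return this.memories
      .filter((m) => m.appName === q.appName)
      .map((m) => ({ score: cosine(queryVec, embed(m.content)), key: m.key, content: m.content }))
      .sort((x, y) => y.score - x.score)
      .slice(0, q.topK === undefined ? 3 : q.topK);
  }
}

const store = new MemoryStore();
store.remember({ appName: "checkout", key: "refund-idempotency",
  content: "Refund webhooks retry; the refund path needs an idempotency key." });
store.remember({ appName: "checkout", key: "tax-rounding",
  content: "Tax is rounded per line item, not per order." });

const hits = store.recall({ appName: "checkout", query: "gotchas in the refund flow", topK: 1 });
```

Here `hits` holds the refund entry, because it is the only memory that shares words with the query. Ranking always returns the closest thing it has, even when nothing is a good match, which is one more reason to keep entries short and declarative.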

Agent Bridge: one workstream, many clients

Memory handles facts. The harder problem is the workstream — the live thread of a task, which today is trapped inside whichever chat window you happened to open. Agent Bridge is a shared message bus where agents and humans post to named threads. An agent asks a question, logs a result, or flags a decision; a human answers later from anywhere; another agent — different model, different client — picks the thread up and keeps going.

// An agent, mid-task, hits a real decision and posts it.
{ thread: "checkout-migration", author: "claude@laptop", role: "agent",
  kind: "question",
  body: "Refund path can double-insert under a retry. Fix now, or punt " +
        "to v2 with a known-race note?" }

// A human replies later — from a phone, not the agent's client.
{ thread: "checkout-migration", author: "james", role: "human",
  kind: "reply",
  body: "Punt to v2. Add the comment and keep moving." }

That exchange is real plumbing, not a diagram. Messages carry a role — agent or human — and a kind: status, question, result, or reply. Threads are named and durable. A cheap cursor lets a returning agent ask "what happened while I was gone?" in a single round trip. The effect is that a task stops being a property of a chat window. You can start a migration in one AI client at your desk, answer its question from your phone at lunch, and let a different model finish it that evening — same thread, same context, no copy-paste.
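As a rough illustration of the thread-plus-cursor idea, here is a minimal in-memory sketch. It is not Agent Bridge's actual interface: the `Bridge` class, the `seq` field, and the `readSince` method are hypothetical names, and the cursor is assumed to be a simple per-thread message count.

```typescript
type Role = "agent" | "human";
type Kind = "status" | "question" | "result" | "reply";

interface Message {
  seq: number; // assumed: 1-based position of the message within its thread
  thread: string;
  author: string;
  role: Role;
  kind: Kind;
  body: string;
}

class Bridge {
  // One append-only log per named thread.
  private threads: { [name: string]: Message[] } = Object.create(null);

  post(m: Omit<Message, "seq">): Message {
    const log = this.threads[m.thread] || (this.threads[m.thread] = []);
    const stored: Message = { ...m, seq: log.length + 1 };
    log.push(stored);
    return stored;
  }

  // Everything posted after `cursor`, plus the cursor to send next time.
  readSince(thread: string, cursor: number): { messages: Message[]; cursor: number } {
    const log = this.threads[thread] || [];
    return { messages: log.slice(cursor), cursor: log.length };
  }
}

const bridge = new Bridge();

// The agent posts its question and records how far it has read.
bridge.post({ thread: "checkout-migration", author: "claude@laptop", role: "agent",
  kind: "question", body: "Fix the retry race now, or punt to v2?" });
const seen = bridge.readSince("checkout-migration", 0).cursor;

// A human answers later, from a different client.
bridge.post({ thread: "checkout-migration", author: "james", role: "human",
  kind: "reply", body: "Punt to v2. Add the comment and keep moving." });

// Any agent holding the cursor catches up in one call.
const missed = bridge.readSince("checkout-migration", seen);
```

In this sketch `missed.messages` contains only the human's reply. The cursor is just a number, so whichever model or client picks the thread up next can carry it.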

Could you get a slice of this from a single provider's native memory and history? For a while, yes — right up until that provider is the thing that's down or deprecated. The whole point of a bus is that it doesn't belong to the model. It's the difference between writing your notes in someone else's notebook and writing them in your own.

What this doesn't buy you

Portability isn't free, and pretending otherwise would be the salesy move we try to avoid. You will still re-tune prompts when you switch models; tone and instruction-following differ, and no abstraction hides that. A retrieval index built on one embedding family still has to be re-embedded to move cleanly. And writing things down, whether to memory or to a thread, is a discipline, not an automation. What the tools remove is the worst case, where a deprecation email means rebuilding a workflow from nothing.

They also pair naturally with the routing we already do. We send each step to the model that fits it: planning to a frontier model, extraction and formatting to a fast, cheap one. We covered that in our article on multi-model orchestration. And because every step is captured in the same per-run ledger that powers our agent observability, the trace that tells you what an agent did is the same one a teammate reads when they pick the thread up tomorrow.
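A routing table and a ledger can be sketched in a few lines. Everything named here is an assumption for illustration, not our production schema: the step kinds, the placeholder model identifiers, and the `LedgerEntry` fields.

```typescript
type StepKind = "planning" | "extraction" | "formatting";

// Hypothetical model identifiers; swap in whatever your providers expose.
const ROUTES: { [K in StepKind]: string } = {
  planning: "frontier-model",
  extraction: "fast-cheap-model",
  formatting: "fast-cheap-model",
};

interface LedgerEntry {
  runId: string;
  step: StepKind;
  model: string;   // which model handled the step
  summary: string; // what the step did, in plain words
}

// One shared, append-only record per run.
const ledger: LedgerEntry[] = [];

// Picks the model for a step and records the step in the ledger.
// A real implementation would call the model here; this sketch only logs.
function runStep(runId: string, step: StepKind, summary: string): LedgerEntry {
  const entry: LedgerEntry = { runId, step, model: ROUTES[step], summary };
  ledger.push(entry);
  return entry;
}

runStep("run-1", "planning", "Split the checkout migration into three tasks.");
runStep("run-1", "extraction", "Pulled refund handlers out of the legacy module.");
```

Because the model is a field on the entry and not the owner of the record, changing a route changes where the next step runs without touching what has already been written down.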

Describe it. Ship it. Own it.

We think about all of this the way we think about everything on the platform, and the load-bearing word is the last one. Owning the work means that when the models underneath you change — and they will, on someone else's schedule — the memory and the workstreams you've built on top of them are still yours. Lock-in is rarely a decision anyone makes on purpose. It's the default you back into by letting your state live in a place you don't control.

If you want to see it on a real workflow, the getting-started guide takes about ten minutes, and the free Builder tier includes Glass Box tracing and memory. Our pricing is flat platform tiers plus metered tokens at published rates — deliberately structured so the thing you're trying to escape isn't quietly priced back in.