
OBTO vs LangSmith vs Langfuse: Choosing a Self-Hosted AI Stack

OBTO Team · Insights from the Glass Box

"OBTO vs LangSmith" is a question we hear regularly, and the honest first answer is that they are not the same kind of thing. LangSmith and Langfuse are observability and evaluation layers — they watch your LLM application from the outside. OBTO is a platform that runs your agents, with observability built into the runtime. If you're choosing a self-hosted stack for agents, you're really making two decisions at once: where your agents execute, and how you see what they did.

This comparison lays out what each tool actually is, what self-hosting genuinely costs for each, and where each one is the right answer — including the cases where OBTO is not.

What each tool actually is

LangSmith is the managed observability and evaluation platform from the LangChain team: tracing, datasets, LLM-as-judge evaluations, human annotation queues, and a prompt hub. It integrates most deeply with LangChain and LangGraph, though its SDK can be used with any framework. It is cloud-first by design.

Langfuse is an open-source LLM engineering platform: tracing, prompt management, evaluations, and a playground. Since June 2025, every product feature — including previously cloud-only ones like managed evaluations and annotation queues — has been MIT-licensed.

OBTO is a platform that executes agentic workloads — fully managed in OBTO's cloud by default, and self-hostable on your own Kubernetes at the Enterprise tier. It provides the agent runtime, MCP server hosting, full-stack app deployment, and scheduling, with Glass Box tracing and the per-run Glass Receipt cost ledger built in. It is not a LangChain-style framework and not an observability sidecar.

The structural difference matters: neither LangSmith nor Langfuse runs your agents. You still need compute, an MCP host, deployment, and a billing relationship with model providers. OBTO overlaps with them on the observability surface, but in deployment terms it competes with assembling that stack yourself.

Self-hosting: three very different stories

Langfuse is the gold standard for free self-hosting in this category. The MIT core is the full product — no license key, no usage metering. The real cost is infrastructure and operations: Postgres, ClickHouse, Redis, and object storage, plus ongoing DevOps time. Independent estimates put a mid-scale self-hosted deployment at roughly $3,000–4,000 per month all-in. A commercial license is only needed for enterprise extras like SCIM, extended audit logs, and retention policies.

LangSmith offers self-hosting only on its Enterprise plan, at custom pricing that is commonly reported in the six figures, with infrastructure requirements to match (a substantial Kubernetes cluster plus managed Postgres, Redis, and ClickHouse). For most teams the practical LangSmith answer is its managed cloud.

OBTO covers both ends. Builder, Team, and Business run fully managed in OBTO's cloud — nothing to operate, free tier included. The Enterprise plan then ports the entire runtime — apps, agents, MCP servers, tracing — to your own Kubernetes cluster with no code changes, with the same transparency guarantees either way. To be clear about the trade-off: OBTO's self-hosted option is a commercial plan, not free open source. If your requirement is "free, MIT-licensed, self-hosted observability," Langfuse wins that comparison outright.

Pricing: the structure is the story

Published rates change; structures change slowly. As of mid-2026:

The deeper difference is what you are billed for. LangSmith bills traces; Langfuse bills events; both sit on top of your separate model and hosting bills. OBTO meters tokens, requests, and storage because it is also your runtime — the platform bill and the inference bill are the same Glass Receipt. That makes per-task cost accounting a query rather than a reconciliation project across three invoices.
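To make "a query rather than a reconciliation project" concrete, here is a minimal sketch of per-task cost accounting over a single ledger. The `LedgerEntry` fields, the task names, and the dollar figures are all hypothetical and chosen for illustration; they are not the Glass Receipt schema or real prices.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """One metered line item for a run. All field names are hypothetical."""
    run_id: str
    task: str
    kind: str        # e.g. "tokens", "requests", "storage"
    cost_usd: float


def cost_per_task(entries):
    """Sum cost by task across every metered kind in a single pass."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry.task] += entry.cost_usd
    # Round to hide floating-point noise in the summed figures.
    return {task: round(total, 6) for task, total in totals.items()}


# Made-up sample data: platform and inference costs live in the same ledger.
ledger = [
    LedgerEntry("run-1", "summarize", "tokens", 0.012),
    LedgerEntry("run-1", "summarize", "requests", 0.001),
    LedgerEntry("run-2", "classify", "tokens", 0.004),
    LedgerEntry("run-3", "summarize", "tokens", 0.010),
]

print(cost_per_task(ledger))
```

When trace, model, and hosting costs arrive on separate invoices, the same answer needs a join on run identifiers that those invoices may not share. That join is the reconciliation work.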

Observability depth vs. operational breadth

A fair comparison has to concede depth where it exists. LangSmith and Langfuse are deeper evaluation platforms: curated datasets, regression suites, judge pipelines, and annotation workflows are their core business. If your team's bottleneck is systematic eval and prompt iteration, a dedicated tool is the stronger choice.

OBTO's tracing comes from the opposite direction: because the platform executes the workload, every tool call, token, and policy decision is captured at the runtime level rather than through SDK instrumentation you have to maintain. There is no "forgot to wrap this call" blind spot, and traces carry real cost figures, as we covered in our agent observability guide. The approaches also compose — some teams run agents on OBTO and export traces to Langfuse for eval workflows.
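The "forgot to wrap this call" blind spot is easy to reproduce in a few lines. The sketch below is a generic illustration, not OBTO's runtime or any vendor's SDK: `traced`, `Runtime`, and both tool functions are hypothetical names. It contrasts decorator-style instrumentation with capture at a single dispatch point.

```python
import functools

SDK_TRACE = []  # in-memory stand-in for a trace store


def traced(fn):
    """SDK-style instrumentation: only wrapped functions are recorded."""
    @functools.wraps(fn)
    def wrapper(*args):
        result = fn(*args)
        SDK_TRACE.append({"tool": fn.__name__, "args": args, "result": result})
        return result
    return wrapper


def search(query):
    return f"results for {query}"


def send_email(to):
    return f"sent to {to}"


class Runtime:
    """Runtime-level capture: every tool call passes through call()."""

    def __init__(self):
        self.tools = {}
        self.trace = []

    def register(self, fn):
        self.tools[fn.__name__] = fn

    def call(self, name, *args):
        result = self.tools[name](*args)
        self.trace.append({"tool": name, "args": args, "result": result})
        return result


# SDK style: search is wrapped, send_email was forgotten.
traced_search = traced(search)
traced_search("pricing")
send_email("ops@example.com")  # runs, but leaves no trace entry

# Runtime style: plain functions are registered and all calls are captured.
runtime = Runtime()
runtime.register(search)
runtime.register(send_email)
runtime.call("search", "pricing")
runtime.call("send_email", "ops@example.com")

print(len(SDK_TRACE), len(runtime.trace))
```

The trade-off is that the second pattern only works when the platform owns the dispatch point, which is why it belongs to a runtime and not to an observability layer.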

The headless angle: OBTO as an MCP backend

One difference that doesn't fit neatly into a pricing table: OBTO runs entirely headless. There is no UI you're forced to adopt — the platform works as a no-UI, API-and-MCP backend you drive from Claude Desktop, ChatGPT, Cursor, VS Code, or any MCP client your team already uses. Your agents' front door is whatever client your people are in, not another dashboard.

That matters most for companies whose deliverable is the MCP server. On OBTO, MCP tools are data, not deployments: tools can be created and updated at runtime, and new definitions become callable the moment they're saved — no build pipeline or release cycle per tool change. Standing up a company MCP server — tools, policies, auth, hosting — is the platform's native workflow, and every invocation lands in the same Glass Box trace. Our guide to building MCP tools shows what that looks like in practice.

To keep the comparison fair: this isn't a gap in LangSmith or Langfuse so much as a different category. An observability layer can watch your MCP server; it can't be one.

How to choose

And if the budget question is what's blocking you, all three have a $0 way in. OBTO's free Builder tier includes an application, an MCP endpoint, and Glass Box tracing — the getting-started guide takes about ten minutes, which is roughly the time it takes to read a pricing page.

Run it. Watch it. Audit it.

One platform — managed cloud or your own Kubernetes — for agent runtime, MCP hosting, and runtime-native tracing, with a receipt for every run.
