
OBTO vs LangSmith vs Langfuse: Choosing a Self-Hosted AI Stack

OBTO Team · Insights from the Glass Box

"OBTO vs LangSmith" is a question we hear regularly, and the honest first answer is that they are not the same kind of thing. LangSmith and Langfuse are observability and evaluation layers — they watch your LLM application from the outside. OBTO is a platform that runs your agents, with observability built into the runtime. If you're choosing a self-hosted stack for agents, you're really making two decisions at once: where your agents execute, and how you see what they did.

This comparison lays out what each tool actually is, what self-hosting genuinely costs for each, and where each one is the right answer — including the cases where OBTO is not.

What each tool actually is

LangSmith is the managed observability and evaluation platform from the LangChain team: tracing, datasets, LLM-as-judge evaluations, human annotation queues, and a prompt hub. It integrates most deeply with LangChain and LangGraph, though its SDK also instruments code built on other frameworks or on none. It is cloud-first by design.

Langfuse is an open-source LLM engineering platform: tracing, prompt management, evaluations, and a playground. Since June 2025, every product feature — including previously cloud-only ones like managed evaluations and annotation queues — has been MIT-licensed.

OBTO is a platform that executes agentic workloads — fully managed in OBTO's cloud by default, and self-hostable on your own Kubernetes at the Enterprise tier. It provides the agent runtime, MCP server hosting, full-stack app deployment, and scheduling, with Glass Box tracing and the per-run Glass Receipt cost ledger built in. It is not a LangChain-style framework and not an observability sidecar.

The structural difference matters: neither LangSmith nor Langfuse runs your agents. You still need compute, an MCP host, deployment, and a billing relationship with model providers. OBTO overlaps with them on the observability surface, but in deployment terms it competes with assembling that stack yourself.

Self-hosting: three very different stories

Langfuse is the gold standard for free self-hosting in this category. The MIT core is the full product — no license key, no usage metering. The real cost is infrastructure and operations: Postgres, ClickHouse, Redis, and object storage, plus ongoing DevOps time. Independent estimates put a mid-scale self-hosted deployment at roughly $3,000–4,000 per month all-in. A commercial license is only needed for enterprise extras like SCIM, extended audit logs, and retention policies.

LangSmith offers self-hosting only on its Enterprise plan, at custom pricing that is commonly reported in the six figures, with infrastructure requirements to match (a substantial Kubernetes cluster plus managed Postgres, Redis, and ClickHouse). For most teams the practical LangSmith answer is its managed cloud.

OBTO covers both ends. Builder, Team, and Business run fully managed in OBTO's cloud — nothing to operate, free tier included. The Enterprise plan then ports the entire runtime — apps, agents, MCP servers, tracing — to your own Kubernetes cluster with no code changes, with the same transparency guarantees either way. To be clear about the trade-off: OBTO's self-hosted option is a commercial plan, not free open source. If your requirement is "free, MIT-licensed, self-hosted observability," Langfuse wins that comparison outright.

Pricing: the structure is the story

Published rates change; structures change slowly. As of mid-2026:

LangSmith: a free Developer tier; Plus at roughly $39 per seat per month with usage-billed traces on top; Enterprise is custom. You pay per seat and per trace volume.
Langfuse Cloud: Hobby at $0, Core at $29, Pro at $199, Enterprise at $2,499 per month, with overage around $8 per 100k units. Self-hosting carries no license fee.
OBTO: Builder at $0, Team at $49/mo base with 5M tokens included and $0.38 per million over, Business at $149/mo, and custom Enterprise. Pricing scales with applications and metered usage rather than per-seat fees.
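As a worked example of the Team-plan structure above, here is a minimal sketch of the token portion of a monthly bill. It assumes the mid-2026 figures quoted in this article ($49 base, 5M tokens included, $0.38 per million over) and that overage is prorated per token rather than rounded up to whole millions. Requests and storage are also metered but are left out because their rates aren't listed here. The function name is ours for illustration, not an OBTO API.

```python
from decimal import Decimal

# Mid-2026 Team-plan figures as quoted in this article; check the live pricing page.
TEAM_BASE_USD = Decimal("49")
TEAM_INCLUDED_TOKENS = 5_000_000
TEAM_OVERAGE_PER_MILLION_USD = Decimal("0.38")


def team_token_bill(tokens_used: int) -> Decimal:
    """Hypothetical helper: base fee plus prorated token overage.

    Assumes overage is billed per token (not rounded up to whole millions)
    and ignores request and storage metering.
    """
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    overage_tokens = max(0, tokens_used - TEAM_INCLUDED_TOKENS)
    overage = TEAM_OVERAGE_PER_MILLION_USD * overage_tokens / 1_000_000
    return (TEAM_BASE_USD + overage).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for tokens in (2_000_000, 5_000_000, 25_000_000):
        print(f"{tokens:>12,} tokens -> ${team_token_bill(tokens)}")
```

Under these assumptions, a month at 25M tokens comes to $49 + 20M × $0.38/M = $56.60, and anything at or under 5M stays at the $49 base.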

The deeper difference is what you are billed for. LangSmith bills traces; Langfuse bills events; both sit on top of your separate model and hosting bills. OBTO meters tokens, requests, and storage because it is also your runtime — the platform bill and the inference bill are the same Glass Receipt. That makes per-task cost accounting a query rather than a reconciliation project across three invoices.
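To make "a query rather than a reconciliation project" concrete, here is a minimal sketch that totals cost per task from a single ledger. The LedgerLine record (task_id, meter, cost_usd) is a hypothetical stand-in for Glass Receipt line items, not OBTO's actual export format, and the sample figures are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerLine:
    """Hypothetical ledger line item; field names are illustrative only."""
    task_id: str
    meter: str  # e.g. "tokens", "requests", "storage"
    cost_usd: Decimal


def cost_per_task(lines: list[LedgerLine]) -> dict[str, Decimal]:
    """Sum every metered charge by task; all meters share one ledger."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for line in lines:
        totals[line.task_id] += line.cost_usd
    return dict(totals)


# Invented sample data: two tasks, charges from more than one meter.
ledger = [
    LedgerLine("summarize-inbox", "tokens", Decimal("0.05")),
    LedgerLine("summarize-inbox", "requests", Decimal("0.01")),
    LedgerLine("triage-tickets", "tokens", Decimal("0.02")),
]
print(cost_per_task(ledger))
```

With separate observability, hosting, and model invoices, the same answer would first require joining three differently keyed data sources; here it is one group-by.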

Observability depth vs. operational breadth

A fair comparison has to concede depth where it exists. LangSmith and Langfuse are deeper evaluation platforms: curated datasets, regression suites, judge pipelines, and annotation workflows are their core business. If your team's bottleneck is systematic eval and prompt iteration, a dedicated tool is the stronger choice.

OBTO's tracing comes from the opposite direction: because the platform executes the workload, every tool call, token, and policy decision is captured at the runtime level rather than through SDK instrumentation you have to maintain. There is no "forgot to wrap this call" blind spot, and traces carry real cost figures, as we covered in our agent observability guide. The approaches also compose — some teams run agents on OBTO and export traces to Langfuse for eval workflows.
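The difference between SDK instrumentation and runtime-level capture is easiest to see side by side. In this minimal sketch, every name is hypothetical and none of it is LangSmith's, Langfuse's, or OBTO's real API: the SDK-style tracer records only the calls a developer remembered to decorate, while a runtime that dispatches every tool call records them all by construction.

```python
import functools
from typing import Any, Callable

sdk_trace: list[str] = []
runtime_trace: list[str] = []


def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """SDK-style instrumentation: only functions wrapped with this are recorded."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        sdk_trace.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


@traced
def search_docs(query: str) -> str:
    return f"results for {query!r}"


def send_email(to: str) -> str:  # forgot @traced: invisible to the SDK tracer
    return f"sent to {to}"


class Runtime:
    """Runtime-style capture: every tool call passes through one dispatch point."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools

    def call_tool(self, name: str, **kwargs: Any) -> Any:
        runtime_trace.append(name)  # recorded regardless of how the tool was written
        return self.tools[name](**kwargs)


runtime = Runtime({"search_docs": search_docs, "send_email": send_email})
runtime.call_tool("search_docs", query="refund policy")
runtime.call_tool("send_email", to="ops@example.com")
print("SDK trace:    ", sdk_trace)
print("Runtime trace:", runtime_trace)
```

The SDK trace ends up with only search_docs, while the runtime trace contains both calls: the blind spot comes from where recording happens, not from how carefully anyone writes code.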

The headless angle: OBTO as an MCP backend

One difference that doesn't fit neatly into a pricing table: OBTO runs entirely headless. There is no UI you're forced to adopt; the platform works as an API-and-MCP backend you drive from Claude Desktop, ChatGPT, Cursor, VS Code, or any MCP client your team already uses. Your agents' front door is whatever client your people are in, not another dashboard.

That matters most for companies whose deliverable is the MCP server. On OBTO, MCP tools are data, not deployments: tools can be created and updated at runtime, and new definitions become callable the moment they're saved — no build pipeline or release cycle per tool change. Standing up a company MCP server — tools, policies, auth, hosting — is the platform's native workflow, and every invocation lands in the same Glass Box trace. Our guide to building MCP tools shows what that looks like in practice.
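Here is a minimal sketch of the "tools are data" idea, assuming tool definitions live in a store and are resolved at call time. The ToolStore class and its string-template handler are hypothetical illustrations, not OBTO's implementation; the name, description, and inputSchema fields mirror the shape of a tool definition in the MCP specification. Because nothing is compiled or deployed, saving a new definition changes the very next call.

```python
from typing import Any


class ToolStore:
    """Hypothetical store where each tool is a record, not a deployed artifact."""

    def __init__(self) -> None:
        self._tools: dict[str, dict[str, Any]] = {}

    def save(self, definition: dict[str, Any]) -> None:
        # Creating and updating are the same operation: overwrite the record.
        self._tools[definition["name"]] = definition

    def list_tools(self) -> list[dict[str, Any]]:
        # Expose only the MCP-style fields, not the internal template.
        return [
            {k: t[k] for k in ("name", "description", "inputSchema")}
            for t in self._tools.values()
        ]

    def call(self, name: str, arguments: dict[str, Any]) -> str:
        # Resolve the definition at call time, so the latest save wins.
        definition = self._tools[name]
        for required in definition["inputSchema"].get("required", []):
            if required not in arguments:
                raise ValueError(f"missing required argument: {required}")
        return definition["template"].format(**arguments)


store = ToolStore()
greeting_schema = {
    "type": "object",
    "properties": {"customer": {"type": "string"}},
    "required": ["customer"],
}
store.save({
    "name": "greet_customer",
    "description": "Greet a customer by name.",
    "inputSchema": greeting_schema,
    "template": "Hello, {customer}!",
})
print(store.call("greet_customer", {"customer": "Ada"}))

# Update the definition; no rebuild or redeploy happens before the next call.
store.save({
    "name": "greet_customer",
    "description": "Greet a customer by name, formally.",
    "inputSchema": greeting_schema,
    "template": "Good morning, {customer}. How can we help?",
})
print(store.call("greet_customer", {"customer": "Ada"}))
```

A real platform would store richer handlers, policies, and auth alongside each record, but the design point is the same: the tool's behavior is looked up, not shipped.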

To keep the comparison fair: this isn't a gap in LangSmith or Langfuse so much as a different category. An observability layer can watch your MCP server; it can't be one.

How to choose

Choose LangSmith if you're invested in LangChain or LangGraph, want best-in-class managed evaluation tooling, and are comfortable with cloud hosting at seat-plus-usage pricing.
Choose Langfuse if you already have a runtime story, want genuinely free MIT-licensed self-hosted observability, and have the DevOps capacity to operate ClickHouse and friends.
Choose OBTO if you want one platform that runs, hosts, meters, and traces your agents — fully managed in OBTO's cloud by default, self-hostable on your own Kubernetes when you need it — MCP-native, multi-model, with utility-style billing you can audit line by line.

And if the budget question is what's blocking you, all three have a $0 way in. OBTO's free Builder tier includes an application, an MCP endpoint, and Glass Box tracing — the getting-started guide takes about ten minutes, which is roughly the time it takes to read a pricing page.