From Span Hacks to Ingest Pipelines: Server-Side LLM Cost Enrichment with Elastic APM

In my last post, I showed a CostEnrichingSpanExporter that injects cost data into LLM spans. It works, but it does so by mutating span._attributes, a private field, in violation of the OpenTelemetry spec: the ReadableSpan contract says spans are immutable after on_end(). The Python SDK's _attributes happens to be a writable BoundedAttributes dict, but that's an implementation detail, not a contract. GitHub issue #4424 explicitly requests hooks for this use case and confirms the gap. What if cost enrichment happened in Elasticsearch, not in Python? ...
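
The server-side idea in the title can be sketched as an Elasticsearch ingest pipeline with a script processor that derives a cost field from token counts at index time, instead of mutating spans in Python. This is a minimal illustrative sketch, not the post's actual pipeline: the field names (gen_ai.usage.*, modeled on the OpenTelemetry GenAI semantic conventions), the derived gen_ai.usage.cost field, and the per-token rates are all assumptions.

```json
{
  "description": "Sketch: attach an estimated USD cost to LLM spans at ingest time (illustrative rates)",
  "processors": [
    {
      "script": {
        "if": "ctx.containsKey('gen_ai.usage.input_tokens') && ctx.containsKey('gen_ai.usage.output_tokens')",
        "lang": "painless",
        "source": "double inRate = params.input_rate; double outRate = params.output_rate; ctx['gen_ai.usage.cost'] = ctx['gen_ai.usage.input_tokens'] * inRate + ctx['gen_ai.usage.output_tokens'] * outRate;",
        "params": {
          "input_rate": 2.5e-6,
          "output_rate": 1.0e-5
        }
      }
    }
  ]
}
```

Created via PUT _ingest/pipeline/<name> and attached to the APM span index, a pipeline like this keeps the exporter untouched and spec-compliant; whether the token attributes arrive as literal dotted keys or nested objects depends on how the APM integration maps span attributes, so the ctx lookups here would need adjusting to the actual document shape.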

February 22, 2026 · 18 min · Mahesh Babu Gorantla

Monitoring LLM Usage with Elastic APM & OpenLLMetry

You shipped an LLM-powered feature. Users love it. Then the invoice arrives. Nobody can explain where $4,000 in API costs went last Tuesday. LLMs are black boxes in production. You can’t see how many tokens each request burns, which model is slower, or why a batch job at 3 AM quietly retried thousands of failed completions and doubled your daily spend. Traditional APM tools are starting to add LLM support, though coverage and pricing vary; some bundle it in, others charge extra. Dedicated LLM observability platforms offer deeper insight out of the box, though many require a proprietary SDK or proxy that ties your instrumentation to a single vendor. ...

February 15, 2026 · 17 min · Mahesh Babu Gorantla