Calling raw LLM APIs, reporting token usage

This is helpful if any of the following applies to you:

  • You are calling an LLM API directly, rather than using the provider's client library/SDK.
  • You are using a library that is not auto-instrumented by OpenLLMetry.
  • You want to report token usage for a specific API call.
  • You are using an open-source/self-hosted LLM.

There are several critical attributes that need to be set on a span to ensure it appears as an LLM span in the UI.

  • lmnr.span.type – must be set to 'LLM'.
  • gen_ai.response.model – must be set to the model name returned by an LLM API (e.g. gpt-4o-mini).
  • gen_ai.system – must be set to the provider name (e.g. 'openai', 'anthropic').

In addition, the following attributes can be manually added to report token usage and costs:

  • gen_ai.usage.input_tokens – number of tokens used in the input.
  • gen_ai.usage.output_tokens – number of tokens used in the output.
  • llm.usage.total_tokens – total number of tokens used in the call.
  • gen_ai.usage.input_cost, gen_ai.usage.output_cost, gen_ai.usage.cost – can all be reported explicitly. However, Laminar automatically calculates costs for the major providers from the values of gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.response.model. For the list of supported providers and models, see the Provider and model names section below.

All of these values can be set in Python and JavaScript/TypeScript using static methods on the Laminar class.

Example

Use observe to create a span and Laminar.setSpanAttributes to set its attributes.

import { Laminar, LaminarAttributes, observe } from '@lmnr-ai/lmnr';

await observe(
  {
    name: 'requestHandler',
    spanType: 'LLM'
  },
  async (message: string) => {
    const response = await fetch('https://custom-llm.com/v1/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'custom-model-1',
        messages: [
          {
            role: 'user',
            content: message
          }
        ]
      })
    }).then(res => res.json());
    Laminar.setSpanAttributes({
      [LaminarAttributes.PROVIDER]: 'custom-llm.com',
      [LaminarAttributes.REQUEST_MODEL]: 'custom-model-1',
      [LaminarAttributes.RESPONSE_MODEL]: response.model,
      [LaminarAttributes.INPUT_TOKEN_COUNT]: response.usage.input_tokens,
      [LaminarAttributes.OUTPUT_TOKEN_COUNT]: response.usage.output_tokens,
    });
    return response;
  },
  userMessage
);
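
If Laminar cannot calculate costs for your model automatically, you can also report them explicitly. Below is a minimal sketch that sets the raw attribute keys listed at the top of this page on the bare OpenTelemetry span returned by Laminar.startSpan; the token counts and dollar amounts are placeholder values you would compute yourself.

import { Laminar } from '@lmnr-ai/lmnr';

const span = Laminar.startSpan('customLlmCall');
// Required attributes for the span to appear as an LLM span.
span.setAttribute('lmnr.span.type', 'LLM');
span.setAttribute('gen_ai.system', 'custom-llm.com');
span.setAttribute('gen_ai.response.model', 'custom-model-1');
// Token usage and explicit costs (placeholder numbers).
span.setAttribute('gen_ai.usage.input_tokens', 1500);
span.setAttribute('gen_ai.usage.output_tokens', 200);
span.setAttribute('gen_ai.usage.input_cost', 0.0015);
span.setAttribute('gen_ai.usage.output_cost', 0.0004);
span.setAttribute('gen_ai.usage.cost', 0.0019);
span.end();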

Trace specific parts of code (Python only)

In Python, you can also use Laminar.start_as_current_span if you want to record a chunk of your code using a with statement.

Example

import os

from lmnr import Laminar
from openai import OpenAI

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

def request_handler(user_message: str):
    with Laminar.start_as_current_span(
        name="handler",
        input=user_message
    ) as span:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "user",
                    "content": user_message
                }
            ]
        )
        result = response.choices[0].message.content

        # this will set the output of the current span
        Laminar.set_span_output(result)

        return result

Laminar.start_as_current_span in detail

Laminar.start_as_current_span is a context manager that creates a new span and sets it as the current span. Under the hood, it uses the bare OpenTelemetry start_as_current_span method, but it also sets some Laminar-specific attributes.

Parameters

  • name (str): name of the span.
  • input (Any): input to the span. It will be serialized to JSON and recorded as span input.
  • span_type (Literal['DEFAULT'] | Literal['LLM']): type of the span. If not specified, it will be 'DEFAULT'.

Returns

  • Span: the span object. It is a bare OpenTelemetry span, so you can set attributes on it directly. We still recommend using Laminar.set_span_output and Laminar.set_span_attributes, unless you need to set custom attributes.

Examples

# [Recommended] `with` statement
with Laminar.start_as_current_span(name="handler", input=user_message):
    # your code here
    Laminar.set_span_attributes({"some_attribute": "some_value"})
    Laminar.set_span_output(result) # this sets the output of the current span

# [Recommended, advanced] `with` statement with custom attributes
with Laminar.start_as_current_span(name="handler", input=user_message) as span:
    # your code here
    span.set_attribute("some_attribute", "some_value")
    Laminar.set_span_output(result) # this sets the output of the current span

Continuing a trace

When you manually instrument your code, sometimes you may want to continue an existing trace. For example, a trace may start in one API route, and you may want to continue it in a different API route.

It is helpful to pass the span object between functions, so that you can continue the same trace.

Example

We will use two main methods here – Laminar.startSpan() and Laminar.withSpan(). Laminar.withSpan() is a thin convenience wrapper around opentelemetry.context.with.

The type of the span is Span. You can import it from @lmnr-ai/lmnr or directly from @opentelemetry/api.

import { Laminar, observe, Span } from '@lmnr-ai/lmnr';

const foo = async (span: Span) => {
  await Laminar.withSpan(span, async () => {
    // your code here, e.g. an LLM call
  }, false);
};

const bar = async (span: Span) => {
  await Laminar.withSpan(span, async () => {
    await observe({ name: 'bar' }, async () => {
      // your code here
    });
  });
};

const outer = Laminar.startSpan('outer');
await foo(outer);
await bar(outer);
// Don't forget to end the span!
outer.end();

Ending the span is required. If you don’t end the span, the trace will not be recorded. You can also end the span by passing true as the third argument (endOnExit) to the last Laminar.withSpan.
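
For instance, the final withSpan call in the example above could end outer for you; a short sketch:

await Laminar.withSpan(outer, async () => {
  // last chunk of work in this trace
}, true); // endOnExit: true ends `outer` when the callback finishes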

As a result, you will get a nested trace with outer as the top-level span, where any spans created inside use_span (Python) or Laminar.withSpan (TypeScript) will be children of outer.

In this example, outer is the top-level span, foo wraps an auto-instrumented LLM call (e.g. to OpenAI), and bar creates a custom span manually via observe.

Using LaminarSpanContext

LaminarSpanContext is Laminar's representation of an OpenTelemetry span context. You can pass it as a string between services in cases when you cannot pass the span object directly.

An example of such a case is a server that handles different parts of the same client session in different request handlers, but has access to a shared file. In this case, you can save the span context in the file and pass it to the next request handler.

Laminar provides several utilities to serialize and deserialize the span context.

Example

In TypeScript, LaminarSpanContext is simply a typed record, so you can pass its stringified JSON representation. You can use Laminar.serializeLaminarSpanContext to create and serialize the span context.

serializeLaminarSpanContext optionally takes a span object and returns a string. If no span is provided, it will use the current active span.

import { Laminar } from '@lmnr-ai/lmnr';

let _spanContext: string | null = null;

const firstHandler = async () => {
    const span = Laminar.startSpan({
        name: 'firstHandler',
    });
    _spanContext = Laminar.serializeLaminarSpanContext(span);
    // your code here
    span.end();
}

const secondHandler = async () => {
    const span = Laminar.startSpan({
        name: 'secondHandler',
        parentSpanContext: _spanContext ?? undefined,
    });
    // your code here
    span.end();
}

This will result in a nested trace with firstHandler as the top level span, and secondHandler as a child span.
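
If the handlers cannot share a variable in memory, as in the shared-file scenario described above, you can persist the serialized context and read it back later. A sketch of this, where the file path is an arbitrary placeholder:

import * as fs from 'fs';
import { Laminar } from '@lmnr-ai/lmnr';

// First handler: serialize the span's context to a shared file.
const first = Laminar.startSpan({ name: 'firstHandler' });
fs.writeFileSync('/tmp/span-context.json', Laminar.serializeLaminarSpanContext(first));
first.end();

// Later handler (possibly a different process): use the stored context as the parent.
const parentSpanContext = fs.readFileSync('/tmp/span-context.json', 'utf-8');
const second = Laminar.startSpan({ name: 'secondHandler', parentSpanContext });
second.end();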

Provider and model names

Here is the list of provider names that Laminar uses when calculating costs. We use the default names from the OpenLLMetry and OpenLLMetry JS packages. If you are not using the auto-instrumentations, set the gen_ai.system attribute manually to one of the following values.

For the model names, we use the names in the provider’s API.

| Provider | Names | Example model name | Link to other model names |
|---|---|---|---|
| Anthropic | anthropic | claude-3-5-sonnet, claude-3-5-sonnet-20241022, or claude-3-5-sonnet-20241022-v2:0 | docs.anthropic.com |
| Azure OpenAI | azure-openai | gpt-4o-mini or gpt-4o-mini-2024-07-18 | learn.microsoft.com |
| Bedrock Anthropic | bedrock-anthropic | claude-3-5-sonnet or claude-3-5-sonnet-20241022-v2:0 | docs.aws.amazon.com |
| Gemini | gemini, google-genai | models/gemini-1.5-pro | ai.google.dev |
| Groq | groq | llama-3.1-70b-versatile | console.groq.com |
| Mistral | mistral | mistral-large-2407 | docs.mistral.ai |
| OpenAI | openai | gpt-4o or gpt-4o-2024-11-20 | platform.openai.com |

For the full list of models and prices, and to suggest edits, see the initialization data in our GitHub repository.