Calling raw LLM APIs, reporting token usage

This is helpful if any of the following applies to you:

  • You are calling an LLM API directly, rather than using the provider's client library/SDK.
  • You are using a library that is not auto-instrumented by OpenLLMetry.
  • You want to report token usage for a specific API call.
  • You are using an open-source/self-hosted LLM.

There are several critical attributes that need to be set on a span to ensure it appears as an LLM span in the UI.

  • lmnr.span.type – must be set to 'LLM'.
  • gen_ai.response.model – must be set to the model name returned by an LLM API (e.g. gpt-4o-mini).
  • gen_ai.system – must be set to the provider name (e.g. 'openai', 'anthropic').

In addition, the following attributes can be manually added to report token usage and costs:

  • gen_ai.usage.input_tokens – number of tokens used in the input.
  • gen_ai.usage.output_tokens – number of tokens used in the output.
  • llm.usage.total_tokens – total number of tokens used in the call.
  • gen_ai.usage.input_cost, gen_ai.usage.output_cost, gen_ai.usage.cost – can all be reported explicitly. However, Laminar calculates the cost of the major providers using the values of gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.response.model. For the list of supported providers and models, see the Provider and model names section below.

All of these values can be set in Python and JavaScript/TypeScript using static methods on the Laminar class.

Example

Use observe to create a span and Laminar.setSpanAttributes to set its attributes.

import { Laminar, LaminarAttributes, observe } from '@lmnr-ai/lmnr';

await observe(
  {
    name: 'requestHandler',
    spanType: 'LLM'
  }, 
  async (message: string) => {
    const response = await fetch('https://custom-llm.com/v1/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'custom-model-1',
        messages: [
          {
            role: 'user',
            content: message
          }
        ]
      })
    }).then(res => res.json());
    Laminar.setSpanAttributes({
      [LaminarAttributes.PROVIDER]: 'custom-llm.com',
      [LaminarAttributes.REQUEST_MODEL]: 'custom-model-1',
      [LaminarAttributes.RESPONSE_MODEL]: response.model,
      [LaminarAttributes.INPUT_TOKEN_COUNT]: response.usage.input_tokens,
      [LaminarAttributes.OUTPUT_TOKEN_COUNT]: response.usage.output_tokens,
    });
    return response;
  }, userMessage);

Use observe to group separate LLM calls in one trace

Automatic instrumentation creates spans for LLM calls within the current trace context. If there is no active context, each LLM call creates its own trace.

If you want to group several auto-instrumented calls in one trace, simply observe the top-level function that makes these calls.

Example

In this example, requestHandler makes a call to OpenAI to determine the user intent. If the intent matches the expected value, it makes another call to OpenAI (possibly with additional RAG) to generate a response.

requestHandler is observed, so all calls to OpenAI are grouped in one trace.

import { Laminar, observe } from '@lmnr-ai/lmnr';
import { OpenAI } from 'openai';

Laminar.initialize({ projectApiKey: process.env.LMNR_PROJECT_API_KEY });

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The user message is appended to this prompt below, when the request is made.
const routerPrompt = '... Answer yes or no.\n';

const handle = async (userMessage: string) => 
    await observe({name: 'requestHandler'}, async () => {
        const routerResponse = await openai.chat.completions.create({
            messages: [
                {role: 'user', content: routerPrompt + userMessage}
            ],
            model: 'gpt-4o-mini',
        })
        
        const userIntent = routerResponse
            .choices[0]
            .message
            .content;

        if (userIntent === 'yes') {
            // Likely some RAG here to enrich the context
            // ...
            const modelResponse = await openai.chat.completions.create({
                messages: [
                    {role: 'user', content: userMessage}
                ],
                model: 'gpt-4o',
            });

            return modelResponse.choices[0].message.content;
        }

        return "the user is not asking for help with ...";
    });

// ...

// Finally, at exit from your program
await Laminar.shutdown();

As a result, you will get a nested trace with the requestHandler span as the top level span, and the OpenAI calls as child spans.

observe in detail

This is a reference for the Python @observe decorator and the JavaScript observe function.

The JS wrapper-function syntax is not as clean as Python decorators, but the two are functionally equivalent. TypeScript has decorators, but they are (1) experimental and (2) only available on classes and methods, so we've decided to provide a wrapper-function syntax instead. This is common in OpenTelemetry, but may be less familiar in LLM observability, so we provide a reference here.

General syntax

await observe({ ...options }, async (param1, param2) => {
  // your code here
}, arg1, arg2);

Parameters (ObserveOptions)

  • name (string): name of the span. If not passed, and the function to observe is not an anonymous arrow function, the function name will be used.
  • sessionId (string): session ID for the wrapped trace.
  • traceType ('DEFAULT'|'EVENT'|'EVALUATION'): type of the trace. Unless the trace is part of an evaluation, it must be 'DEFAULT'.
  • spanType ('DEFAULT'|'LLM') – type of the span. 'DEFAULT' is used if not specified. If the type is 'LLM', you must manually specify some attributes (see the section on calling raw LLM APIs above). This translates to the lmnr.span.type attribute on the span.
  • traceId (string): [experimental] trace ID for the current trace. This is useful if you want to continue an existing trace. IMPORTANT: it must be a valid UUID, i.e. it must consist of 8-4-4-4-12 hex digits.
  • input: a dictionary of input parameters. If specified, it is recorded as the span input instead of the function parameters.
  • ignoreInput: if true, the input will not be recorded.
  • ignoreOutput: if true, the output will not be recorded.
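
For instance, several of these options combined in one call (a minimal sketch; the handler body and the session ID are hypothetical):

await observe(
  {
    name: 'chatHandler',
    sessionId: 'session-123',
    ignoreInput: true, // the user message will not be recorded as span input
  },
  async (userMessage: string) => {
    // your code here
    return 'ok';
  },
  userMessage
);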

Inputs and outputs

  • Function parameters and their values are serialized to JSON and recorded as span input.
  • Function return value is serialized to JSON and recorded as span output.

For example:

const result = await observe({ name: 'my_span' }, async (param1, param2) => {
    return param1 + param2;
}, 1, 2);

In this case, the span will have the following attributes:

  • Span input (lmnr.span.input) will be {"param1": 1, "param2": 2}
  • Span output (lmnr.span.output) will be 3

Notes

  • observe is an async function, so you must await it.
  • Streaming responses are handled automatically, so if your function returns a readable stream, it will be observed correctly.
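
For instance, a hedged sketch of observing a streaming OpenAI call (assuming the openai client from the earlier example; per the note above, observe records the output even though the function returns a stream):

const stream = await observe({ name: 'streamingCall' }, async () =>
  openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true,
  })
);

// Consume the stream as usual.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}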

Additional examples

Trace specific parts of code (Python only)

In Python, you can also use Laminar.start_as_current_span if you want to record a chunk of your code using a with statement.

Example

import os

from lmnr import Laminar
from openai import OpenAI

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

def request_handler(user_message: str):
    with Laminar.start_as_current_span(
        name="handler",
        input=user_message
    ) as span:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "user",
                    "content": user_message
                }
            ]
        )
        result = response.choices[0].message.content

        # this will set the output of the current span
        Laminar.set_span_output(result)

        return result

Laminar.start_as_current_span in detail

Laminar.start_as_current_span is a context manager that creates a new span and sets it as the current span. Under the hood it uses the bare OpenTelemetry start_as_current_span method, but it also sets some Laminar-specific attributes.

Parameters

  • name (str): name of the span.
  • input (Any): input to the span. It will be serialized to JSON and recorded as span input.
  • span_type (Literal['DEFAULT'] | Literal['LLM']): type of the span. If not specified, it will be 'DEFAULT'.

Returns

  • Span: the span object. It is a bare OpenTelemetry span, so you can use it to set attributes. We still recommend using Laminar.set_span_output and Laminar.set_span_attributes to set attributes, unless you want to set some custom attributes.

Examples

# [Recommended] `with` statement
with Laminar.start_as_current_span(name="handler", input=user_message):
    # your code here
    Laminar.set_span_attributes({"some_attribute": "some_value"})
    Laminar.set_span_output(result) # this sets the output of the current span

# [Recommended, advanced] `with` statement with custom attributes
with Laminar.start_as_current_span(name="handler", input=user_message) as span:
    # your code here
    span.set_attribute("some_attribute", "some_value")
    Laminar.set_span_output(result) # this sets the output of the current span

Continuing a trace

When you manually instrument your code, sometimes you may want to continue an existing trace. For example, a trace may start in one API route, and you may want to continue it in a different API route.

It is helpful to pass the span object between functions, so that you can continue the same trace.

Example

We will use two main methods here – Laminar.startSpan() and Laminar.withSpan(). Laminar.withSpan() is a thin convenience wrapper around opentelemetry.context.with.

The type of the span is Span. You can import it from @lmnr-ai/lmnr or directly from @opentelemetry/api.

import { Laminar, observe, Span } from '@lmnr-ai/lmnr';

const foo = async (span: Span) => {
  await Laminar.withSpan(span, async () => {
    // your code here, e.g. an LLM call
  }, false);
};

const bar = async (span: Span) => {
  await Laminar.withSpan(span, async () => {
    await observe({ name: 'bar' }, async () => {
      // your code here
    });
  });
};

const outer = Laminar.startSpan('outer');
await foo(outer);
await bar(outer);
// Don't forget to end the span!
outer.end();

Ending the span is required. If you don’t end the span, the trace will not be recorded. You can also end the span by passing true as the third argument (endOnExit) to the last Laminar.withSpan.
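
For instance, a minimal sketch of the endOnExit variant, which ends outer when the callback exits instead of requiring an explicit outer.end():

await Laminar.withSpan(outer, async () => {
  // your code here
}, true); // endOnExit = true: `outer` is ended automatically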

As a result, you will get a nested trace with outer as the top level span, where any spans created within use_span (Python) or Laminar.withSpan (JavaScript) will be children of outer.

In this example, outer is the top level span, the LLM call inside foo is auto-instrumented, and bar is a custom span that we created manually with observe.

Using LaminarSpanContext

LaminarSpanContext is Laminar's representation of an OpenTelemetry span context. You can pass it as a string between services in cases where you cannot pass the span object directly.

An example of such a case is a server that handles different parts of the same client session in different request handlers, but has access to a shared file. In this case, you can save the span context in the file and pass it to the next request handler.

Laminar provides several utilities to serialize and deserialize the span context.

Example

In TypeScript, LaminarSpanContext is simply a typed record, so you can pass its stringified JSON representation. You can use Laminar.serializeLaminarSpanContext to create and serialize the span context.

serializeLaminarSpanContext optionally takes a span object and returns a string. If no span is provided, it will use the current active span.

import { Laminar } from '@lmnr-ai/lmnr';

let _spanContext: string | null = null;

const firstHandler = async () => {
    const span = Laminar.startSpan({
        name: 'firstHandler',
    });
    _spanContext = Laminar.serializeLaminarSpanContext(span);
    // your code here
    span.end();
}

const secondHandler = async () => {
    const span = Laminar.startSpan({
        name: 'secondHandler',
        parentSpanContext: _spanContext ?? undefined,
    });
    // your code here
    span.end();
}

This will result in a nested trace with firstHandler as the top level span, and secondHandler as a child span.

Provider and model names

Here is the list of provider names that Laminar uses when calculating costs. We use the default names from the OpenLLMetry and OpenLLMetry JS packages. If you are not using the auto-instrumentations, set the gen_ai.system attribute manually to one of the following values.

For the model names, we use the names in the provider’s API.

Provider          | Names                | Example model name                                                                | Link to other model names
Anthropic         | anthropic            | claude-3-5-sonnet, claude-3-5-sonnet-20241022, or claude-3-5-sonnet-20241022-v2:0 | docs.anthropic.com
Azure OpenAI      | azure-openai         | gpt-4o-mini or gpt-4o-mini-2024-07-18                                             | learn.microsoft.com
Bedrock Anthropic | bedrock-anthropic    | claude-3-5-sonnet or claude-3-5-sonnet-20241022-v2:0                              | docs.aws.amazon.com
Gemini            | gemini, google-genai | models/gemini-1.5-pro                                                             | ai.google.dev
Groq              | groq                 | llama-3.1-70b-versatile                                                           | console.groq.com
Mistral           | mistral              | mistral-large-2407                                                                | docs.mistral.ai
OpenAI            | openai               | gpt-4o or gpt-4o-2024-11-20                                                       | platform.openai.com

For the full list of models and prices, and to suggest edits, see the initialization data in our GitHub repository.