Calling raw LLM APIs, reporting token usage

This is helpful if any of the following applies to you:

  • You are calling an LLM API directly, without using the provider's client library/SDK.
  • You are using a library that is not auto-instrumented by OpenLLMetry.
  • You want to report token usage for a specific API call.
  • You are using an open-source/self-hosted LLM.

There are several critical attributes that need to be set on a span to ensure it appears as an LLM span in the UI.

  • lmnr.span.type – must be set to 'LLM'.
  • gen_ai.response.model – must be set to the model name returned by an LLM API (e.g. gpt-4o-mini).
  • gen_ai.system – must be set to the provider name (e.g. 'openai', 'anthropic').

In addition, the following attributes can be manually added to report token usage and costs:

  • gen_ai.usage.input_tokens – number of tokens used in the input.
  • gen_ai.usage.output_tokens – number of tokens used in the output.
  • llm.usage.total_tokens – total number of tokens used in the call.
  • gen_ai.usage.input_cost, gen_ai.usage.output_cost, gen_ai.usage.cost – can all be reported explicitly. However, Laminar calculates the cost of the major providers using the values of gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.response.model.

All of these values can be set in Python and JavaScript/TypeScript using static methods on the Laminar class.
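For example, if you want to report costs explicitly — say, for a self-hosted model whose pricing Laminar does not know — you can set the cost attributes by their raw names on the bare OpenTelemetry span. This is a minimal sketch (assuming Laminar.initialize has already been called); the token counts and per-token prices are hypothetical placeholders:

from lmnr import Laminar

with Laminar.start_as_current_span(name="custom_llm_call", span_type="LLM") as span:
    # ... make the LLM call and read token counts from its response ...
    input_tokens, output_tokens = 1200, 300
    # hypothetical USD-per-token prices for a self-hosted model
    input_price, output_price = 0.15 / 1_000_000, 0.60 / 1_000_000
    span.set_attribute("gen_ai.usage.input_cost", input_tokens * input_price)
    span.set_attribute("gen_ai.usage.output_cost", output_tokens * output_price)
    span.set_attribute(
        "gen_ai.usage.cost",
        input_tokens * input_price + output_tokens * output_price,
    )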

Example

Use Laminar.start_as_current_span in a with statement to create a span and set its attributes.

import requests

from lmnr import Laminar, Attributes

# assumes Laminar.initialize(...) has already been called

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is the longest river in the world?",
            },
        ],
    },
]
with Laminar.start_as_current_span(
    name="my_custom_llm_call",
    input=messages,
    span_type="LLM"
):
    response = requests.post(
        "https://api.custom-llm.com/v1/completions",
        json={
            "model": "custom-model-1",
            "messages": messages,
        },
    ).json()
    Laminar.set_span_output(response["choices"][0]["message"]["content"])
    Laminar.set_span_attributes(
        {
            Attributes.PROVIDER: "custom-llm.com",
            Attributes.REQUEST_MODEL: "custom-model-1",
            Attributes.RESPONSE_MODEL: response["model"],
            Attributes.INPUT_TOKEN_COUNT: response["usage"]["input_tokens"],
            Attributes.OUTPUT_TOKEN_COUNT: response["usage"]["output_tokens"],
        }
    )

Use observe to group separate LLM calls in one trace

Automatic instrumentation creates spans for LLM calls within the current trace context. If there is no active trace context, each LLM call creates a new trace by default.

If you want to group several auto-instrumented calls in one trace, simply observe the top-level function that makes these calls.

Example

In this example, the request_handler makes a call to OpenAI to determine the user intent. If the intent matches the expected value, it makes another call to OpenAI (possibly with additional RAG) to generate a response.

request_handler is observed, so all calls to OpenAI inside it are grouped in one trace.

from lmnr import Laminar, observe
from openai import OpenAI
import os

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])

openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

@observe()
def request_handler(user_message: str):
    router_prompt = f"... {user_message} Answer yes or no"
    user_intent = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": router_prompt
            }
        ]
    )
    if user_intent.choices[0].message.content == "yes":
        # likely some RAG here to enrich the context
        # ...

        model_response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "user",
                    "content": user_message
                }
            ]
        )
        return model_response.choices[0].message.content
    return "the user is not asking for help with ..."

As a result, you will get a nested trace with the request_handler span as the top level span, and the OpenAI calls as child spans.

observe in detail

This is a reference for the Python @observe decorator and the JavaScript observe function.

The JS wrapper function's syntax is not as clean as Python decorators, but the two are functionally equivalent. TypeScript has decorators, but they are (1) experimental and (2) only available on classes and methods, so we opted for a wrapper function syntax. This is common in OpenTelemetry, but may be less common in LLM observability, so we provide a reference here.

Parameters

  • name (str|None): name of the span. If not provided, the name of the wrapped function will be used. For example:
@observe(name="my_span")
def my_function(): # the span will be named "my_span"
    pass

@observe()
def my_function(): # the span will be named "my_function"
    pass
  • session_id (str|None): session ID for the current trace. If you know the session ID statically, you can pass it here.
  • user_id (str|None): user ID for the current trace. If you know the user ID statically, you can pass it here (see the sketch after this list).
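
For instance, a minimal sketch that pins both IDs statically (the ID values here are hypothetical placeholders):

from lmnr import observe

@observe(name="chat_turn", session_id="session-123", user_id="user-456")
def chat_turn(user_message: str):
    # spans created inside this call belong to a trace
    # tagged with the session and user IDs above
    ...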

Inputs and outputs

  • The function's parameters and their values are serialized to JSON and recorded as the span input.
  • The function's return value is serialized to JSON and recorded as the span output.

For example:

@observe()
def my_function(param1, param2):
    return param1 + param2

my_function(1, 2)

In this case, the span will have the following attributes:

  • Span input (lmnr.span.input) will be {"param1": 1, "param2": 2}
  • Span output (lmnr.span.output) will be 3

Notes

  • @observe is a decorator factory, so it must always be used with parentheses: @observe().
  • This decorator can be used with both synchronous and asynchronous functions.
  • Streaming responses are handled, so if your function returns a generator, it will be observed correctly.
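
For example, a minimal sketch of both cases (the function bodies are placeholders):

from lmnr import observe

@observe()
async def answer(prompt: str) -> str:
    # the decorator works on async functions as well
    return "..."

@observe()
def stream_answer(prompt: str):
    # functions returning a generator are also observed correctly
    yield "partial "
    yield "answer"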

Trace specific parts of code (Python only)

In Python, you can also use Laminar.start_as_current_span if you want to record a chunk of your code using a with statement.

Example

import os

from lmnr import Laminar
from openai import OpenAI

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

def request_handler(user_message: str):
    with Laminar.start_as_current_span(
        name="handler",
        input=user_message
    ) as span:
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "user",
                    "content": user_message
                }
            ]
        )
        result = response.choices[0].message.content

        # this will set the output of the current span
        Laminar.set_span_output(result)

        return result

Laminar.start_as_current_span in detail

Laminar.start_as_current_span is a context manager that creates a new span and sets it as the current span. Under the hood it uses the bare OpenTelemetry start_as_current_span method, but it also sets some Laminar-specific attributes.

Parameters

  • name (str): name of the span.
  • input (Any): input to the span. It will be serialized to JSON and recorded as span input.
  • span_type (Literal['DEFAULT'] | Literal['LLM']): type of the span. If not specified, it will be 'DEFAULT'.

Returns

  • Span: the span object. It is a bare OpenTelemetry span, so you can use it to set attributes directly. We still recommend Laminar.set_span_output and Laminar.set_span_attributes, unless you need to set custom attributes.

Examples

# [Recommended] `with` statement
with Laminar.start_as_current_span(name="handler", input=user_message):
    # your code here
    Laminar.set_span_attributes({"some_attribute": "some_value"})
    Laminar.set_span_output(result) # this sets the output of the current span

# [Recommended, advanced] `with` statement with custom attributes
with Laminar.start_as_current_span(name="handler", input=user_message) as span:
    # your code here
    span.set_attribute("some_attribute", "some_value")
    Laminar.set_span_output(result) # this sets the output of the current span

Continuing a trace

When you manually instrument your code, sometimes you may want to continue an existing trace. For example, a trace may start in one API route, and you may want to continue it in a different API route.

It is helpful to pass the span object between functions, so that you can continue the same trace.

Example

We will use two main methods here – Laminar.start_span() and use_span. You can import use_span from lmnr, but it is just a re-export of opentelemetry.trace.use_span.

The type of the span is Span. You can import it directly from opentelemetry.trace.

from lmnr import Laminar, use_span
import os

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])

def foo(span):
    with use_span(span):
        # your code here, e.g. an LLM call
        pass

def bar(span):
    with use_span(span):
        with Laminar.start_as_current_span(name="bar"):
            # your code here
            pass

outer = Laminar.start_span(name="outer")
foo(outer)
bar(outer)
# Don't forget to end the span!
outer.end()

Ending the span is required. If you don't end the span, the trace will not be recorded. Alternatively, you can end it in the last use_span call by passing end_on_exit=True, as sketched below.
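
For example, here is a sketch of the end_on_exit variant, reusing foo from the example above:

outer = Laminar.start_span(name="outer")
foo(outer)
# this use_span ends the span when the block exits,
# so no explicit outer.end() is needed
with use_span(outer, end_on_exit=True):
    # your code here, e.g. the last unit of work in the trace
    pass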

As a result, you will get a nested trace with outer as the top level span, where any spans created with use_span (or Laminar.withSpan in JavaScript) will be children of outer.

In this example, outer is the top level span, foo makes an OpenAI call that gets auto-instrumented, and bar creates a custom span manually.

Setting trace ID manually

This is not fully compatible with OpenTelemetry, so use it only when you have to. Prefer passing the span object, as shown above, whenever possible.

If there is no way for you to pass the span object between functions, you can set the trace ID manually. In the backend, Laminar will associate all spans with the same trace ID. Laminar’s trace IDs are UUIDs, so if you want to set a trace ID manually, you must pass a valid UUID.

Example

from lmnr import Laminar
import os
import uuid

Laminar.initialize(project_api_key=os.environ['LMNR_PROJECT_API_KEY'])

# a manually set trace ID must be a valid UUID
trace_id = uuid.uuid4()

def foo():
    with Laminar.start_as_current_span(name="foo", trace_id=trace_id):
        # your code here
        pass

def bar():
    with Laminar.start_as_current_span(name="bar", trace_id=trace_id):
        # your code here
        pass

foo()
bar()

Both spans, foo and bar, will be in the same trace, because they have the same trace ID.