Online evaluations are the practice of running code or an LLM as a judge on the results of your LLM calls in real time. This is useful for gathering statistics about the performance of your LLMs. Laminar achieves this by analyzing the inputs and outputs of your LLM spans.
Online evaluations allow you to gather statistics about the performance of your LLMs
in production. We like to think of online evaluations as canaries or
metrics in traditional applications, but for unstructured data. This approach allows you to constantly monitor your apps without having to collect data
and run evaluations after the fact.
Span path – an identifier of your LLM function. It is constructed from
the location of the call within your code and should ideally be unique. For more
information about what a span is, see the tracing documentation.
Span label – a label attached to a span. It can be a boolean or a categorical label,
and it can be set either manually or by an evaluator.
Evaluator – the function that evaluates the input and output of your LLM call. It
can be a code snippet or a prompt for an LLM-as-a-judge.
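
To make the distinction concrete, here is a minimal sketch of both evaluator kinds in Python. The function signature, names, and label values are illustrative assumptions, not Laminar's exact evaluator interface; conceptually, an evaluator receives a span's input and output and returns a label value.

```python
import json

# Illustrative sketch only: the signature and label values below are
# assumptions, not Laminar's exact evaluator interface.

def json_validity_evaluator(span_input: str, span_output: str) -> bool:
    """Code evaluator producing a boolean label: is the output valid JSON?"""
    try:
        json.loads(span_output)
        return True
    except json.JSONDecodeError:
        return False

# An LLM-as-a-judge evaluator is a prompt rather than code; the judge's
# one-word answer becomes a categorical label.
JUDGE_PROMPT = """You are grading the tone of an LLM response.
User request: {span_input}
Response: {span_output}
Reply with exactly one word: "formal" or "casual"."""
```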
You can register one evaluator per label class. For example, you could have a
code evaluator check for the presence of a certain keyword in the output of your LLM app and assign the keyword label class.
In that case, you would not be able to add another evaluator for the keyword label class. Every span path can have multiple span labels, but only one evaluator per label.
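
As a sketch of the keyword example above, a code evaluator that assigns the keyword label class could look like the following. The function name, signature, and the specific keyword ("refund") are hypothetical; the actual interface depends on how you register the evaluator in Laminar.

```python
# Hypothetical code evaluator for the "keyword" label class. The signature
# and the keyword ("refund") are assumptions for illustration only.
def keyword_evaluator(span_input: str, span_output: str) -> bool:
    """Boolean label: does the LLM output contain the required keyword?"""
    return "refund" in span_output.lower()
```

Because this evaluator already owns the keyword label class, registering a second evaluator for the same class on that span path would not be possible.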