Laminar Manual Evaluation
The Laminar Manual Evaluation SDK and API provide you with granular control over the evaluation process, allowing you to integrate Laminar directly into your existing evaluation pipeline or create flexible and complex evaluation workflows.
Manual vs. SDK Evaluation
Use manual evaluation when you need granular control over the evaluation lifecycle, custom tracing, or want to integrate evaluations with complex workflows.
The manual evaluation approach gives you fine-grained control over:
- Step-by-step execution tracking
- Flexible evaluation logic and scoring
- Integration with existing systems and workflows
How Manual Evaluation Works
Manual evaluation follows a structured workflow with three core components:
- Create Evaluation - Initialize a new evaluation
- Execute and Evaluate - Run your logic and evaluate it:
  - Run your core logic
  - Run your evaluation logic
- Save and Update - Store datapoints and evaluation results
Quickstart
Let’s walk through implementing manual evaluation with tracing, breaking down each component:
Step 1: Setup and Initialization
First, initialize Laminar and create your evaluation clients:
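A minimal setup sketch in Python is shown below. It uses the lmnr package's Laminar.initialize for tracing; the LaminarClient constructor and its project_api_key argument are assumptions based on this guide's workflow, so verify them against the API reference.

```python
import os

from lmnr import Laminar, LaminarClient
from openai import OpenAI

# Initialize tracing for this process.
Laminar.initialize(project_api_key=os.environ["LMNR_PROJECT_API_KEY"])

# Client for the manual evaluation API (create evaluation,
# create/update datapoints). Constructor arguments are an assumption;
# see the API reference for the exact signature.
client = LaminarClient(project_api_key=os.environ["LMNR_PROJECT_API_KEY"])

# OpenAI client used by the executor in the next step.
openai_client = OpenAI()
```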
Step 2: Create Your Executor and Evaluation logic
- Our executor function makes a call to OpenAI and is wrapped with tracing.
- Our evaluator function accuracy scores the accuracy of the executor's output.
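Here is a sketch of both functions, assuming the openai_client set up in Step 1. The @observe decorator comes from the lmnr SDK; the exact-match scoring in accuracy is a toy stand-in for whatever evaluation logic you need.

```python
from lmnr import observe


@observe()  # wraps the executor in a span so the OpenAI call is traced
def executor(question: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


@observe()  # the evaluator appears as its own span in the same trace
def accuracy(output: str, target: str) -> float:
    # Toy exact-match scoring; real evaluators might use an LLM judge
    # or string similarity instead.
    return 1.0 if output.strip().lower() == target.strip().lower() else 0.0
```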
Step 3: Create Evaluation and Datapoints
First, you need to create an evaluation session, then create datapoints with test data and update them with execution results and scores.
Create Evaluation
Before creating datapoints, you must initialize an evaluation session:
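A sketch of that call is below; the method name client.evals.create and its name argument are assumptions based on this guide's workflow, so verify them against the API reference.

```python
# Create an evaluation session and keep its id for the datapoint calls.
# Method and argument names are assumptions; check the API reference.
eval_id = client.evals.create(name="manual-eval-quickstart")
```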
Create/Update Datapoints
The most important aspect is connecting your evaluation datapoints to the current execution trace using trace_id / traceId.
This makes it possible to attach the created spans to a trace, which can then be inspected later in the trace view.
Call Laminar.getTraceId() / Laminar.get_trace_id() inside the span's context to get the trace ID.
We strongly advise calling createDatapoint / create_datapoint before running the executor and evaluator, so that the trace is associated as an evaluation trace type.
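Putting that together for a single datapoint looks roughly like the sketch below, which assumes the client, eval_id, executor, and accuracy from the previous steps. Laminar.start_as_current_span and the create_datapoint / update_datapoint argument names are assumptions to verify against the API reference; Laminar.get_trace_id() is the call described above.

```python
with Laminar.start_as_current_span(name="evaluation"):
    # Get the trace id from inside the span's context.
    trace_id = Laminar.get_trace_id()

    # Create the datapoint before running the executor so the trace is
    # associated as an evaluation trace.
    datapoint_id = client.evals.create_datapoint(
        eval_id=eval_id,
        data={"question": "What is the capital of France?"},
        target="Paris",
        trace_id=trace_id,
    )

    output = executor("What is the capital of France?")
    score = accuracy(output, "Paris")

    # Store the executor output and the evaluator's score.
    client.evals.update_datapoint(
        eval_id=eval_id,
        datapoint_id=datapoint_id,
        output=output,
        scores={"accuracy": score},
    )
```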
Complete Example
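Below is an end-to-end sketch assembling all the steps above. The method and argument names on client.evals, plus Laminar.start_as_current_span, are assumptions carried over from the earlier sketches; consult the API reference for the exact signatures.

```python
import os

from lmnr import Laminar, LaminarClient, observe
from openai import OpenAI

Laminar.initialize(project_api_key=os.environ["LMNR_PROJECT_API_KEY"])
client = LaminarClient(project_api_key=os.environ["LMNR_PROJECT_API_KEY"])
openai_client = OpenAI()


@observe()
def executor(question: str) -> str:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


@observe()
def accuracy(output: str, target: str) -> float:
    # Toy exact-match scoring; replace with your own evaluation logic.
    return 1.0 if output.strip().lower() == target.strip().lower() else 0.0


dataset = [
    {"question": "What is the capital of France?", "target": "Paris"},
    {"question": "What is the capital of Japan?", "target": "Tokyo"},
]

# Create the evaluation session (method name is an assumption).
eval_id = client.evals.create(name="manual-eval-quickstart")

for point in dataset:
    # One span per datapoint so the executor and evaluator spans share
    # a single trace.
    with Laminar.start_as_current_span(name="evaluation"):
        trace_id = Laminar.get_trace_id()

        # Create the datapoint before running the executor so the trace
        # is associated as an evaluation trace.
        datapoint_id = client.evals.create_datapoint(
            eval_id=eval_id,
            data={"question": point["question"]},
            target=point["target"],
            trace_id=trace_id,
        )

        output = executor(point["question"])
        score = accuracy(output, point["target"])

        # Store the output and score on the datapoint.
        client.evals.update_datapoint(
            eval_id=eval_id,
            datapoint_id=datapoint_id,
            output=output,
            scores={"accuracy": score},
        )
```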
Evaluation Results
When you run the complete example above, you’ll see detailed tracing and evaluation results in your Laminar dashboard.
API Reference
For detailed API specifications, including request/response schemas, see the Laminar API reference.