The Laminar Manual Evaluation SDK and API provide you with granular control over the evaluation process, allowing you to integrate Laminar directly into your existing evaluation pipeline or create flexible and complex evaluation workflows.

Manual vs. SDK Evaluation

Use manual evaluation when you need granular control over the evaluation lifecycle or custom tracing, or when you want to integrate evaluations into complex workflows.

The manual evaluation approach gives you fine-grained control over:

  • Step-by-step execution tracking
  • Flexible evaluation logic and scoring
  • Integration with existing systems and workflows
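For comparison, the standard SDK approach wraps this entire lifecycle in a single call. The snippet below is only an illustrative sketch: the evaluate helper and its data / executor / evaluators parameters are assumptions about the SDK's higher-level API, so check the SDK evaluation docs for the exact signature.

import { evaluate } from "@lmnr-ai/lmnr";

// Illustrative sketch of the higher-level SDK approach (assumed signature).
// Laminar creates the evaluation, runs the executor and evaluators,
// and stores the datapoints for you.
evaluate({
    data: [
        { data: { country: 'France' }, target: 'Paris' },
    ],
    executor: async (data) => {
        // your core logic goes here
        return 'Paris';
    },
    evaluators: {
        accuracy: async (output, target) => (output === target ? 1 : 0),
    },
});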

How Manual Evaluation Works

Manual evaluation follows a structured workflow with three core components (see the sketch after this list):

  1. Create Evaluation - Initialize a new evaluation
  2. Execute and Evaluate - Run your logic and evaluate it
    • Run your core logic
    • Run your evaluation logic
  3. Save and Update - Store datapoints and evaluation results
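Put together, these three steps map onto the client calls used throughout this guide. The sketch below is minimal and omits tracing and error handling; myExecutor and myEvaluator are placeholders for your own logic:

import { LaminarClient } from "@lmnr-ai/lmnr";

const client = new LaminarClient({ projectApiKey: 'your_project_api_key' });

// Placeholder implementations; replace with your own logic.
const myExecutor = async (data) => `output for ${data.country}`;
const myEvaluator = async (output, target) => (output.includes(target) ? 1 : 0);

async function lifecycleSketch() {
    // 1. Create Evaluation
    const evalId = await client.evals.create({ name: "My Eval" });

    // 2. Execute and Evaluate (repeat per test case)
    const datapointId = await client.evals.createDatapoint({
        evalId,
        data: { country: 'France' },
        target: 'Paris',
        index: 0,
    });
    const output = await myExecutor({ country: 'France' });
    const score = await myEvaluator(output, 'Paris');

    // 3. Save and Update
    await client.evals.updateDatapoint({
        evalId,
        datapointId,
        executorOutput: { response: output },
        scores: { accuracy: score },
    });
}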

Quickstart

Let’s walk through implementing manual evaluation with tracing, breaking down each component:

Step 1: Setup and Initialization

First, initialize Laminar and create your evaluation clients:

import { Laminar, LaminarClient, observe } from "@lmnr-ai/lmnr";
import { OpenAI } from "openai";

Laminar.initialize({
    projectApiKey: 'your_project_api_key',
    instrumentModules: {
        openAI: OpenAI  // Automatically traces OpenAI calls
    }
});

const client = new LaminarClient({
    projectApiKey: 'your_project_api_key',
});

const openai = new OpenAI({ apiKey: 'your_openai_api_key' });
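In practice, you'll probably load API keys from environment variables rather than hardcoding them. A minimal variant of the setup above, assuming you export LMNR_PROJECT_API_KEY and OPENAI_API_KEY in your environment:

// Same setup as above, reading keys from environment variables
// (the variable names here are our own choice).
Laminar.initialize({
    projectApiKey: process.env.LMNR_PROJECT_API_KEY,
    instrumentModules: {
        openAI: OpenAI
    }
});

const client = new LaminarClient({
    projectApiKey: process.env.LMNR_PROJECT_API_KEY,
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });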

Step 2: Create Your Executor and Evaluator Logic

  • Our executor function makes a call to OpenAI and is wrapped with observe for tracing:
const executeTestCase = async (testCase) => {
    return await observe(
        { name: 'executor', spanType: 'EXECUTOR', input: testCase.data },
        async () => {
            const response = await openai.chat.completions.create({
                model: 'gpt-4o-mini',
                messages: [
                    {
                        role: 'user',
                        content: `What is the capital of ${testCase.data.country}? ` +
                          'Answer only with the capital, no other text.'
                    }
                ]
            });
            return response.choices[0].message.content || '';
        }
    );
};
  • Then our evaluator function accuracy measures the accuracy of the executor's output:
const accuracy = async (output, target) => {
    return await observe(
        { name: 'accuracy', spanType: 'EVALUATOR', input: { output, target } },
        async () => {
            if (!target) return 0;
            return output.includes(target) ? 1 : 0;
        }
    );
};
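If you're writing this in TypeScript, it can help to give the test cases an explicit shape. The interface below is a hypothetical type matching the data used in this guide, not something exported by the SDK:

// Hypothetical shape of a test case as used in this guide (not part of the SDK).
interface TestCase {
    data: { country: string };
    target?: string;
}

const testData: TestCase[] = [
    { data: { country: 'France' }, target: 'Paris' },
    { data: { country: 'Germany' }, target: 'Berlin' },
];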

Step 3: Create Evaluation and Datapoints

You first create an evaluation, then create datapoints with your test data, and finally update them with execution results and scores.

Create Evaluation

Before creating datapoints, you must initialize an evaluation session:

const evalId = await client.evals.create({
    name: "Capital of Country Manual Eval", 
    groupName: "Manual API - Capital Cities" 
});

Create/Update Datapoints

The most important aspect is connecting your evaluation datapoints to the current execution trace using trace_id / traceId. This attaches the spans created during execution to the datapoint's trace, which you can then inspect later in the trace view.

You need to call Laminar.getTraceId() / Laminar.get_trace_id() inside the context of a span to get the ID of the current trace.

We strongly advise calling createDatapoint / create_datapoint before running your executor and evaluator, so that the trace is associated with the evaluation trace type.

const datapointId = await client.evals.createDatapoint({
    evalId,
    data: testCase.data,
    target: testCase.target,
    index: i,
    // Must be called within span context
    traceId: Laminar.getTraceId(),
});

await client.evals.updateDatapoint({
    evalId,
    datapointId,
    scores: { accuracy: accuracyScore },
    executorOutput: { 
        response: output, 
        model: 'gpt-4o-mini',
        country: testCase.data.country
    },
});
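Since Laminar.getTraceId() must be called within a span context, the createDatapoint call above is meant to run inside an observe block; the complete example below does exactly this. A condensed sketch of that wrapping:

await observe(
    { name: 'evaluation', spanType: 'EVALUATION', input: { testCase } },
    async () => {
        // Inside the span context, getTraceId() returns the ID of the current trace.
        const datapointId = await client.evals.createDatapoint({
            evalId,
            data: testCase.data,
            target: testCase.target,
            index: i,
            traceId: Laminar.getTraceId(),
        });
        // ... run your executor and evaluator, then call updateDatapoint
    }
);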

Complete Example

import { Laminar, LaminarClient, observe } from "@lmnr-ai/lmnr";
import { OpenAI } from "openai";

Laminar.initialize({
    projectApiKey: 'your_project_api_key',
    instrumentModules: {
        openAI: OpenAI
    }
});

const client = new LaminarClient({
    projectApiKey: 'your_project_api_key',
});
const openai = new OpenAI({ apiKey: 'your_openai_api_key' });

const executeTestCase = async (testCase) => {
    return await observe(
        { name: 'executor', spanType: 'EXECUTOR', input: testCase.data },
        async () => {
            const response = await openai.chat.completions.create({
                model: 'gpt-4o-mini',
                messages: [
                    {
                        role: 'user',
                        content: `What is the capital of ${testCase.data.country}? ` +
                          'Answer only with the capital, no other text.'
                    }
                ]
            });
            return response.choices[0].message.content || '';
        }
    );
};

const accuracy = async (output, target) => {
    return await observe(
        { name: 'accuracy', spanType: 'EVALUATOR', input: { output, target } },
        async () => {
            if (!target) return 0;
            return output.includes(target) ? 1 : 0;
        }
    );
};

async function runEvaluation() {
    try {
        const testData = [
            {
                data: { country: 'France' },
                target: 'Paris',
            },
            {
                data: { country: 'Germany' },
                target: 'Berlin',
            },
        ];

        const evalId = await client.evals.create({ 
            name: "Capital of Country Manual Eval", 
            groupName: "Manual API - Capital Cities" 
        });
        
        for (let i = 0; i < testData.length; i++) {
            await observe(
                { name: 'evaluation', spanType: 'EVALUATION', input: { testCase: testData[i] } },
                async () => {
                    const testCase = testData[i];

                    // Save datapoint first to associate trace as evaluation type
                    const datapointId = await client.evals.createDatapoint({
                        evalId,
                        data: testCase.data,
                        target: testCase.target,
                        index: i,
                        // Must be called within span context
                        traceId: Laminar.getTraceId(),
                    });

                    const output = await executeTestCase(testCase);

                    const accuracyScore = await accuracy(output, testCase.target);

                    await client.evals.updateDatapoint({
                        evalId,
                        datapointId,
                        scores: { accuracy: accuracyScore },
                        executorOutput: { 
                            response: output, 
                            model: 'gpt-4o-mini',
                            country: testCase.data.country
                        },
                    });
                }
            );
        }

        await Laminar.flush();
    } catch (error) {
        console.error("Error:", error.message);
    }
}

runEvaluation();

Evaluation Results

When you run the example above, you’ll see detailed tracing and evaluation results in your Laminar dashboard.

API Reference

For detailed API specifications, including request/response schemas, visit: