Evaluation data only
Evaluate results of previous runs stored in a dataset
Key concepts
- Evaluator pipeline - the pipeline evaluating the results
- Dataset format - two fixed JSON objects, target and data, each with any keys.
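As an illustration, a single datapoint might look like the sketch below. Only the two top-level keys, data and target, come from the format described above; the inner key names (question, answer, expected_answer) are made up for this example.

```python
import json

# Hypothetical datapoint: only the top-level "data" and "target" keys are fixed,
# the keys inside each object are arbitrary.
raw = """
{
  "data":   {"question": "What is 2 + 2?", "answer": "4"},
  "target": {"expected_answer": "4"}
}
"""

datapoint = json.loads(raw)

# Both top-level values are JSON objects (dicts once parsed).
assert set(datapoint) == {"data", "target"}
assert all(isinstance(datapoint[key], dict) for key in ("data", "target"))
```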
Flow overview
High-level overview of the evaluation flow
What happens if data and target have overlapping keys?
Values in target take priority and overwrite the values in data.
Detailed flow
For every datapoint in the dataset, evaluation does the following:
- Merge the data and target objects. If keys clash, the values in target are selected.
- Set the values of the merged object as the inputs to the input nodes of the evaluator pipeline. This is done by matching* the keys of the object to the input node names of the evaluator pipeline.
- Evaluator pipeline is run. All required env variables must be set for this to succeed.
- Evaluator pipeline produces a single numeric output. This is stored in the results of the evaluation.
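The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the function name evaluate_datapoint, the input_node_names list, and the run_evaluator callable are hypothetical stand-ins for the evaluator pipeline.

```python
from typing import Any, Callable, Dict, List


def evaluate_datapoint(
    datapoint: Dict[str, Dict[str, Any]],
    input_node_names: List[str],
    run_evaluator: Callable[[Dict[str, Any]], Any],
) -> float:
    # Step 1: merge data and target; target is unpacked last, so it wins on clashes.
    merged = {**datapoint["data"], **datapoint["target"]}
    # Step 2: match merged keys to input node names (exact, case-sensitive lookup).
    inputs = {name: merged[name] for name in input_node_names}
    # Step 3: run the evaluator (hypothetical callable standing in for the pipeline).
    output = run_evaluator(inputs)
    # Step 4: the single output is stored as a number.
    return float(output)


# Hypothetical evaluator: 1 if the answer equals the expected value, else 0.
def exact_match(inputs: Dict[str, Any]) -> int:
    return 1 if inputs["answer"] == inputs["expected"] else 0


datapoint = {
    "data": {"answer": "4", "expected": "5"},  # "expected" here is overwritten
    "target": {"expected": "4"},
}
score = evaluate_datapoint(datapoint, ["answer", "expected"], exact_match)
```

Because target overwrites the clashing expected key, the evaluator in this sketch compares "4" against "4" rather than against the stale value in data.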
Requirements
- Evaluator pipeline has at least one commit.
- Object obtained by merging target and data at least contains keys matching* the names of the input nodes on the evaluator pipeline.
- All required environment variables for the evaluator run are set in the Env vars page.
- Evaluator pipeline must produce one output. This output must be parsable into a number (8-bit float).
* Exact case-sensitive match
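Two of these requirements can be checked locally before starting an evaluation. The sketch below is one possible pre-flight check, assuming you already know the evaluator's input node names; the helper names missing_inputs and parse_score are hypothetical and not part of any documented API.

```python
from typing import Any, Dict, List


def missing_inputs(merged: Dict[str, Any], input_node_names: List[str]) -> List[str]:
    """Return input node names that have no exactly matching key (case-sensitive)."""
    return [name for name in input_node_names if name not in merged]


def parse_score(output: Any) -> float:
    """Parse the evaluator's single output into a number, or raise ValueError."""
    try:
        return float(output)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"evaluator output is not numeric: {output!r}") from exc


# "Answer" does not satisfy an input node named "answer": matching is case-sensitive.
missing = missing_inputs({"Answer": "4", "expected": "4"}, ["answer", "expected"])
parsed = parse_score("0.75")
```

Here missing contains the single name answer, and parsed holds the numeric value 0.75.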