Introduction
Laminar stores all queryable data in Clickhouse. Clickhouse is an analytical columnar database which provides a SQL-like query language. This guide will explain the SQL syntax used in Laminar SQL Editor. For the full Clickhouse SQL reference, see the Clickhouse documentation.Basic query structure
We only allowSELECT
queries, so we will focus on the syntax for this.
Here’s the basic syntax of a Clickhouse SELECT
query. Some parts that we don’t recommend or don’t support are omitted.
SELECT FROM WHERE GROUP BY HAVING ORDER BY LIMIT
query.
Data types
Clickhouse has numerous data types. Here are some of the most important and relevant ones.DateTime64
- represents a date and time with a precision of nanoseconds. All datetimes in Laminar are in UTC.String
- represents a string.UUID
- represents a universally unique identifier. This is used for most identifier columns.Float64
- represents a floating point number.UInt64
- represents a 64-bit unsigned integer.UInt8
- unsigned 8-bit integer. Used for enum values.
'value'::type
syntax, for example:
JSON
Laminar stores JSONs as strings in Clickhouse. See Working with JSONs for more details.Filtering by start_time
Spans inside each project are ordered bystart_time
, so adding a start_time
filter will speed up the query and prevent it from failing because of running out of memory.
This is relevant to both the spans
table and the traces
aggregation view on it.
Example
Suppose you want to have a look at all the tool call spans in a single trace. You know the trace ID.start_time
filter.
Avoiding joins
Clickhouse is a columnar database, so it’s not optimized for joins. If you need to join data, you can do it in the application layer.Example
Let’s say you want to have a look at LLM spans that took abnormally long time to complete and see the effect of this on their corresponding traces. In regular SQL, you would join thespans
table with the traces
table to get the trace duration.
traces
table for each trace_id.
Working with dates
See the full reference in Clickhouse documentation.Truncating datetimes
Clickhouse has a special syntax for truncating datetimes. The most general function for truncation istoStartOfInterval(value, interval_specifier)
.
Spans per day in the last month
date_trunc
function, notice how this is similar.
toStartOfInterval
is more flexible, because you can specify any interval, not just the ones supported by date_trunc
.
For example, you can group spans within last day to 15-minute intervals.
toStartOfSecond(value)
- truncates to the start of the secondtoStartOfMinute(value)
- truncates to the start of the minutetoStartOfTenMinutes(value)
- truncates to the start of the minutetoStartOfHour(value)
- truncates to the start of the hourtoStartOfDay(value)
- truncates to the start of the daytoStartOfWeek(value)
- truncates to the start of the weektoStartOfMonth(value)
- truncates to the start of the monthtoStartOfQuarter(value)
- truncates to the start of the monthtoStartOfYear(value)
- truncates to the start of the year
Working with JSON
Many columns, such asattributes
on spans
table, contain JSON values stored as strings.
Clickhouse provides a wide variety of functions to work with JSONs. See the full reference in Clickhouse documentation.
Generally, there are two families of functions to work with JSONs:
JSON*
functions - comprehensive set of functions that work on the JSON stringssimpleJSON*
functions - simpler subset that works much faster, and under a few assumptions.
We recommend using
simpleJSON*
functions, especially if you know the data type of the value you want to extract.Extracting values from JSON by key
Most of the time, you will want to extract a value from JSON bykey
. If you know the data type of the value, you can use the simpleJSONExtract*
functions.
For example,
Checking if a key exists
If you want to check if a key exists in the JSON, you can use thesimpleJSONHas
functions.
The query below will return the number of LLM spans that have a gen_ai.request.structured_output_schema
key in the attributes
column,
i.e. the number of LLM spans that have a structured output schema defined at request time.
Nested extraction by key
If you know that a value inside a JSON object is a stringified JSON, you can use thesimpleJSONExtract*
functions repeatedly to extract the nested value.
Suppose you pass a structured output schema to the LLM, and the schema always has a description
key.
More complex JSON functions
If you need more flexibility at the cost of query performance, you can use theJSON*
functions.
Here’s a quick list of things, you can do with JSON*
functions:
- Extract the last value with Python-like index syntax from an array, e.g.
JSONExtractRaw(input, -1)
to get the last input message - Extract a nested value from JSON by path, including array indices, e.g.
JSONExtractRaw(input, -1, 'content')
to get the last input message content - Extract keys from a JSON object, similar to
Object.keys
ordict.keys
, e.g.JSONExtractKeys(attributes)
- Extract keys and values from a JSON object, similar to
Object.entries
ordict.items
, e.g.JSONExtractKeysAndValues(attributes)
- Count the number of elements in an array, e.g.
JSONLength(input)
to get the number of input messages