Indexing
Index dataset for semantic search
Concept
Laminar datasets can be indexed for semantic search. Every datapoint is embedded and so a semantic similarity search can be performed.
Indexing a dataset
Requirements
- For every datapoint you want to index, the
data
field must be a JSON object. - All datapoints that you want to index must have the entry with a certain key.
Example
The below datapoints both have the key text
in the data
field. We can index on this key.
If we index on the text
field, we can search over the indexed dataset by semantic similarity between the query and the text
field.
For example, if we search for “turbulence”, we will get the second datapoint as the most similar one.
We could also index on the user_tag_key
field, but in that case the first datapoint would not be indexed,
because it does not have the user_tag_key
field.
Indexing
Go to the dataset page and click the “Index” button. Type in the key that you want to index on. Empty key will unindex all datapoints.
Once the dataset is indexed on a key, new datapoints will be indexed on that key automatically, if they have the key in data
field.
You can search over indexed dataset by semantic similarity between the query and the indexed value.
Searching
Semantic search over datapoint can be performed in two ways:
- In the Semantic Search node in pipelines. See Semantic Search node for more information.
- Via API. See Semantic Search API Reference for more information.