You can add datapoints either from file or by manually adding datapoints one-by-one.

1. File upload

You can upload datapoints either from a structured file, which contains datapoints, or from an unstructured file of various formats.

To do that, click “Add from source” at the top of the Datasets page. Then, select in the tab whether you want to upload in a structured or unstructured way.

1.1 Structured file

Supported file formats are: .csv, .json, .jsonl. We infer the format based on the file extension.

  • csv - header is required, default separator and minimal quoting are assumed. If a row has an empty value or less values than headers, missing values will be filled with empty strings.
  • json - the file must contain one array of datapoints.
  • jsonlines - each line must contain one datapoint.

For each datapoint, we first construct the key-value object, and then parse it according to the following rules:

  1. If the only key is "data", and the corresponding value is a JSON object, place that object in datapoint’s "data".
  2. If the keys are "data" and "target", and both corresponding values are JSON objects, populate the datapoint’s "data" and "target" accordingly.
  3. In ALL other cases, all keys and values will go inside "data".

This is needed to ensure that at the top level "data" and "target" are both JSON objects.

If there is an error parsing the file, no datapoints will be added. If a single value in the file does not conform to format, it will be silently ignored.

Examples

{"data": {"key1": "val1"}}

becomes:

{"data": {"key1": "val1"}}

1.2 Unstructured file

Laminar supports converting many other files to datasets including txt, pdf, and docx. Such files will be parsed, chunked, and added to the dataset. Every chunk will have one datapoint with the following info in "data":

  • "content" – the actual chunk content
  • "page_number" – for multipage documents, the number of the page this chunk belongs to
  • "page_content" – content of the entire page
Having trouble uploading a text file with an extension, such as .mdx or .json? Rename the file to .txt locally and re-upload

2. Add manually

Click “Add datapoint” at the top of the Datasets page and new empty row will be added. Then, you can fill out the content of “data” and “target” for newly created datapoint.