Creating a dataset

✅ Enterprise: AWS, Azure, GCP, and On-Premises

✅ Marketplace: AWS, Azure, and GCP

✅ SaaS: AWS, Azure, and GCP

✅ Single Node Installation (SNI)


This section details how to create a dataset in Kyvos by selecting a data connection, defining the input type (such as HDFS, table, or SQL), and preparing the data for use in semantic models and analysis.

When you create a dataset, choose the connection from which the data is to be fetched. As you apply settings to sort and filter the columns, you can preview the output to confirm that it matches what you expect.

The total number of columns is shown above the list of their names. If the column count has changed, for example, if any columns are hidden, the column total includes additional information and a link. For example, "Total 29 columns out of which 5 columns are hidden." When you click the link, the information dialog informs you that column details have been updated and lists the number of columns added and deleted, and recommends that you review the updated file. Click Dismiss to close the dialog.

Points to know: 

  • From Kyvos 2026.3 onwards, a search option is available when selecting a connection during dataset creation. The search box is displayed only when there are more than seven connections.

  • From Kyvos 2026.2 onwards, Kyvos Lakehouse can directly read Parquet and Apache Iceberg data stored in an Amazon S3 bucket, which eliminates the need for external catalogs or SQL engines.

  • Kyvos supports global parameters for datasets, making it easier to manage parameter values (such as dbname and tablename) during environment migrations (for example, from UAT to Production). You can set these parameters in the connection settings, and Kyvos automatically uses the correct values for the target environment. This reduces manual work, improves deployment flexibility, and simplifies environment management. When you define parameters as connection properties, the parameter names must be prefixed with kyvos.param. (for example, kyvos.param.dbname).

    Example:

    Suppose the query is:
    SELECT * FROM <dbname>.EMPLOYEE

    • On the dataset, you can define a parameter named dbname and assign the required value.

    • On the connection, you can add a property named kyvos.param.dbname and set its corresponding value.

    If Kyvos resolves the parameter value as devdb, the final query becomes:
    SELECT * FROM devdb.EMPLOYEE

  • You can preview the entire dataset. Earlier, only a partial dataset preview was available.
    NOTE: Opting to preview the full dataset may result in higher-than-expected costs. Additionally, the execution time required to generate the dataset preview could increase, depending on the size of the data.

  • When changes are made to a column in a dataset, the column is marked as "Modified", and you can view the details of those changes. This helps you quickly identify modified columns and review what changed.

  • You can mark a file as a Fact to use it as a fact table in relationships. You can also hide columns that are not required for analysis.

  • Use the Actions menu (...) to validate the dataset, share it, add a note, or show related entities.

  • If your instance of Kyvos has been configured via the portal.properties file to support it, you can register a file with a Presto connection. You can format columns and preview the result.

  • To register a file with a Presto connection, see Creating a file with a Presto connection. You can also register files as described in Writing SQL queries using data in Hive.

  • To learn more about the effects of some of the settings you can use while registering the file, see Logic for creating relationships and semantic model.
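The global parameter substitution described in the points above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Kyvos implements it: the function name resolve_query, the value uatdb, and the rule that a connection-level value overrides a dataset-level one are all assumptions made for the example. Only the <dbname> placeholder, the kyvos.param. prefix, and the devdb result come from the text.

```python
import re

# Prefix required on connection properties that carry parameter values.
PARAM_PREFIX = "kyvos.param."


def resolve_query(query, dataset_params, connection_props):
    """Replace <name> placeholders in a query with resolved parameter values."""
    # Pick out connection properties that start with the prefix and strip it,
    # so "kyvos.param.dbname" becomes the parameter "dbname".
    connection_params = {
        key[len(PARAM_PREFIX):]: value
        for key, value in connection_props.items()
        if key.startswith(PARAM_PREFIX)
    }
    # ASSUMPTION for this sketch: connection-level values take precedence
    # over values defined on the dataset.
    resolved = {**dataset_params, **connection_params}

    def substitute(match):
        name = match.group(1)
        if name not in resolved:
            raise KeyError(f"No value defined for parameter '{name}'")
        return resolved[name]

    return re.sub(r"<(\w+)>", substitute, query)


query = "SELECT * FROM <dbname>.EMPLOYEE"
dataset_params = {"dbname": "uatdb"}                 # hypothetical dataset-level value
connection_props = {"kyvos.param.dbname": "devdb"}   # connection-level property

print(resolve_query(query, dataset_params, connection_props))
# SELECT * FROM devdb.EMPLOYEE
```

Because the value lives on the connection, pointing the same dataset at a different connection (for example, one for Production) changes the resolved query without editing the dataset itself.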

CSV Support for Kyvos Compute

Starting from Kyvos 2026.5, the CSV properties available while creating a CSV dataset depend on the configured Global Compute Type (Kyvos Compute or External Compute).

Based on the selected compute type, Kyvos displays only the CSV properties that are applicable to that environment.

Option 1: External Compute

When Global Compute Type is set to External Compute:

  • Both Spark-supported and Datastore-supported CSV properties are displayed. 

  • Each property indicates its supported execution environment:

    • Spark-supported  

    • Datastore-supported  

    • Supported by both Spark and Datastore 

The following properties are available when using External Compute:

| Property | Supported Values | Default Value |
|---|---|---|
| Encoding | ASCII (External), ISO-8859-1 (External), UTF-8 (Both), UTF-16 (Kyvos), latin-1 (Kyvos) | ascii |
| Compression | None (Both), LZO (External), auto (Kyvos), gzip (Kyvos), zstd (Kyvos) | None |
| Line Separator | \r\n (Both), \n (Both) | \r\n |
| Escape | Any character (Both) | |
| Enclosed By | Any character (Both) | |
| Field Separator | Any character (Both) | |
| Skip Top Lines | Numeric value (Both) | |
| Contains header | true/false (Both) | true |
| Ignore Empty Rows | true/false (External only) | true |
| Key as Column | true/false (External only) | false |
| File Name as Column | true/false (Both) | false |
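The per-value support labels in the table above can be read as a simple lookup. The sketch below copies the Encoding and Compression rows into a Python dictionary and filters them by engine. It assumes that the labels External, Kyvos, and Both correspond to Spark-supported, Datastore-supported, and supported by both; the names CSV_VALUE_SUPPORT and values_for are hypothetical and exist only for this example.

```python
# Support labels copied from the Encoding and Compression rows of the table.
CSV_VALUE_SUPPORT = {
    "Encoding": {
        "ASCII": "External",
        "ISO-8859-1": "External",
        "UTF-8": "Both",
        "UTF-16": "Kyvos",
        "latin-1": "Kyvos",
    },
    "Compression": {
        "None": "Both",
        "LZO": "External",
        "auto": "Kyvos",
        "gzip": "Kyvos",
        "zstd": "Kyvos",
    },
}


def values_for(property_name, engine):
    """Return the values of a property usable on the given engine.

    engine is "External" or "Kyvos"; values labeled "Both" match either one.
    """
    return [
        value
        for value, support in CSV_VALUE_SUPPORT[property_name].items()
        if support in (engine, "Both")
    ]


print(values_for("Encoding", "Kyvos"))        # ['UTF-8', 'UTF-16', 'latin-1']
print(values_for("Compression", "External"))  # ['None', 'LZO']
```

Filtering for Kyvos yields the same encoding and compression choices that are listed for Kyvos Compute, apart from letter case.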

Option 2: Kyvos Compute

When Global Compute Type is set to Kyvos Compute:

  • Only Datastore-supported CSV properties are displayed. 

The following properties are available:

| Property | Supported Values | Default Value |
|---|---|---|
| Encoding | utf-8, utf-16, latin-1 | utf-8 |
| Compression | auto, none, gzip, zstd | auto |
| Line Separator | \n, \r\n | \n |
| Escape | Double Quote (") | Double Quote (") |
| Enclosed By | Double Quote (") | Double Quote (") |
| Field Separator | Any character | |
| Contains header | true/false | true |
| Skip Top Lines | Numeric value | |
| File Name as Column | true/false | false |
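To make the parsing-related properties concrete, here is a minimal sketch using Python's standard csv module. It is not Kyvos code; it only shows what Field Separator, Enclosed By, Escape, Contains header, and Skip Top Lines typically mean for a CSV reader. The function name read_csv_text and the sample data are hypothetical, the comma default for the field separator is an assumption (the table lists no default), and the Escape property is interpreted here as a doubled quote inside a quoted field. Encoding and compression are left out because the sketch works on text that is already decoded.

```python
import csv
import io


def read_csv_text(text, field_separator=",", enclosed_by='"',
                  contains_header=True, skip_top_lines=0):
    """Parse CSV text with options named after the properties in the table."""
    lines = io.StringIO(text, newline="")
    # Skip Top Lines: discard this many leading lines before parsing.
    for _ in range(skip_top_lines):
        lines.readline()
    reader = csv.reader(
        lines,
        delimiter=field_separator,  # Field Separator
        quotechar=enclosed_by,      # Enclosed By
        doublequote=True,           # Escape: "" inside a quoted field means "
    )
    rows = list(reader)
    # Contains header: treat the first remaining row as column names.
    if contains_header and rows:
        return rows[0], rows[1:]
    return None, rows


sample = (
    "exported by a hypothetical tool\n"
    "id,name,comment\n"
    '1,Asha,"said ""hello"", then left"\n'
    "2,Ben,plain\n"
)

header, data = read_csv_text(sample, skip_top_lines=1)
print(header)   # ['id', 'name', 'comment']
print(data[0])  # ['1', 'Asha', 'said "hello", then left']
```

The first data row shows why the enclosing and escape characters matter: the comment field contains both the field separator and the quote character, yet it is read back as a single value.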

Copyright Kyvos, Inc. 2026. All rights reserved.