Creating a dataset

✅ Enterprise: AWS, Azure, GCP, and On-Premises

✅ Marketplace: AWS, Azure, and GCP

✅ SaaS: AWS, Azure, and GCP

✅ Single Node Installation (SNI)

This section details how to create a dataset in Kyvos by selecting a data connection, defining the input type (such as HDFS, table, or SQL), and preparing the data for use in semantic models and analysis.

When you create a dataset, choose the connection from which the data is fetched. As you apply settings to sort and filter the columns, you can preview the output to confirm that it matches what you expect.

The total number of columns is shown above the list of column names. If the column count has changed (for example, because some columns are hidden), the total includes additional information and a link, such as "Total 29 columns out of which 5 columns are hidden." Clicking the link opens an information dialog, which states that column details have been updated, lists the number of columns added and deleted, and recommends that you review the updated file. Click Dismiss to close the dialog.

Points to know:

From Kyvos 2026.3 onwards, a search option is available when selecting a connection during dataset creation. The search box is displayed only when the number of connections exceeds seven.
From Kyvos 2026.2 onwards, Kyvos Lakehouse can read Parquet and Apache Iceberg data directly from an Amazon S3 bucket, which eliminates the need for external catalogs or SQL engines.
Kyvos supports global parameters for datasets, making it easier to manage parameter values (such as dbname and tablename) during environment migrations (for example, from UAT to Production). You can set these parameters in the connection settings, and Kyvos automatically uses the correct values for the target environment. This reduces manual work, improves deployment flexibility, and simplifies environment management. When you define parameters as connection properties, the parameter names must be prefixed with kyvos.param. (including the trailing period).
Example:
Suppose the query is:
SELECT * FROM <dbname>.EMPLOYEE
- On the dataset, you can define a parameter named dbname and assign the required value.
- On the connection, you can add a property named kyvos.param.dbname and set its corresponding value.
If Kyvos resolves the parameter value as devdb, the final query becomes:
SELECT * FROM devdb.EMPLOYEE
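The substitution described above can be sketched in Python. This is only an illustration of the idea, not how Kyvos implements it: the function name resolve_query, the value uatdb, and the rule that a connection property takes precedence over the dataset-level value are all assumptions made for the example. Only the <dbname> placeholder syntax and the kyvos.param. prefix come from the text above.

```python
import re

# Prefix required on connection properties (taken from the text above).
PARAM_PREFIX = "kyvos.param."


def resolve_query(query, dataset_params, connection_props):
    """Replace <name> placeholders in a query with parameter values.

    Hypothetical helper for illustration; not part of any Kyvos API.
    """
    # Keep only connection properties that carry the prefix, and strip it
    # so that "kyvos.param.dbname" becomes "dbname".
    connection_params = {
        key[len(PARAM_PREFIX):]: value
        for key, value in connection_props.items()
        if key.startswith(PARAM_PREFIX)
    }

    # ASSUMPTION: a value set on the connection overrides the dataset value.
    # The documentation does not state the precedence rule.
    resolved = {**dataset_params, **connection_params}

    def substitute(match):
        name = match.group(1)
        if name not in resolved:
            raise KeyError(f"No value defined for parameter '{name}'")
        return resolved[name]

    return re.sub(r"<(\w+)>", substitute, query)


query = "SELECT * FROM <dbname>.EMPLOYEE"
dataset_params = {"dbname": "uatdb"}                 # hypothetical dataset-level value
connection_props = {"kyvos.param.dbname": "devdb"}   # hypothetical connection property

print(resolve_query(query, dataset_params, connection_props))
# prints: SELECT * FROM devdb.EMPLOYEE
```

Under these assumptions, moving the dataset to another environment only requires a different kyvos.param.dbname value on that environment's connection; the query text itself stays unchanged.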
You can preview the entire dataset; earlier releases were limited to a partial dataset preview.
NOTE: Opting to preview the full dataset may result in higher-than-expected costs. Additionally, the execution time required to generate the dataset preview could increase, depending on the size of the data.
A column is marked as "Modified" when changes are made to it in a dataset, and you can view the details of those changes. This helps you quickly identify modified columns and review their specifics.
You can mark a file as a Fact to use it as a fact table in relationships. You can also hide columns that are not required for analysis.
Use the Actions menu (...) to validate the dataset, share it, add a note, or show related entities.
If your instance of Kyvos has been configured via the portal.properties file to support it, you can register a file with a Presto connection. You can format columns and preview the result.
To register a file with a Presto connection, see Creating a file with a Presto connection. You can also create registered files by Writing SQL queries using data in Hive.
To learn more about the effects of some of the settings you can use while registering the file, see Logic for creating relationships and semantic model.

CSV Support for Kyvos Compute

Starting from Kyvos 2026.5, the CSV properties available while creating a CSV dataset depend on the configured Global Compute Type (Kyvos Compute or External Compute).

Based on the selected compute type, Kyvos displays only the CSV properties that are applicable to that environment.

Option 1: External Compute

When Global Compute Type is set to External Compute:

Both Spark-supported and Datastore-supported CSV properties are displayed.
Each property indicates its supported execution environment:
- Spark-supported
- Datastore-supported
- Supported by both Spark and Datastore

The following properties are available when using External Compute:

Property	Supported Values	Default Value
Encoding	ASCII (External), ISO-8859-1 (External), UTF-8 (Both), UTF-16 (Kyvos), latin-1 (Kyvos)	ascii
Compression	None (Both), LZO (External), auto (Kyvos), gzip (Kyvos), zstd (Kyvos)	None
Line Separator	\r\n (Both), \n (Both)	\r\n
Escape	Any character (Both)	\
Enclosed By	Any character (Both)	"
Field Separator	Any character (Both)	,
Skip Top Lines	Numeric value (Both)	0
Contains header	true/false (Both)	true
Ignore Empty Rows	true/false (External only)	true
Key as Column	true/false (External only)	false
File Name as Column	true/false (Both)	false

Option 2: Kyvos Compute

When Global Compute Type is set to Kyvos Compute:

Only Datastore-supported CSV properties are displayed.

The following properties are available:

Property	Supported Values	Default Value
Encoding	utf-8, utf-16, latin-1	utf-8
Compression	auto, none, gzip, zstd	auto
Line Separator	\n, \r\n	\n
Escape	Double Quote (")	Double Quote (")
Enclosed By	Double Quote (")	Double Quote (")
Field Separator	Any character	,
Contains header	true/false	true
Skip Top Lines	Numeric value	0
File Name as Column	true/false	false
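
To make the meaning of a few of these properties concrete, the following Python sketch parses CSV text using the Kyvos Compute defaults for Field Separator, Enclosed By, Escape, Contains header, and Skip Top Lines. It uses Python's standard csv module purely as an illustration and does not represent how Kyvos reads files. The function name read_csv_text and the sample data are hypothetical, and two behaviors are assumptions: that Skip Top Lines is applied before the header row is read, and that a double-quote Escape means a quote inside an enclosed field is written as two double quotes.

```python
import csv


def read_csv_text(text, field_separator=",", enclosed_by='"',
                  contains_header=True, skip_top_lines=0):
    """Parse CSV text and return (header, rows).

    Hypothetical helper for illustration; defaults mirror the table above.
    """
    # ASSUMPTION: the top lines are skipped before the header is read.
    lines = text.splitlines(keepends=True)[skip_top_lines:]

    # doublequote=True treats "" inside an enclosed field as one literal ",
    # which is this sketch's interpretation of Escape = Double Quote.
    reader = csv.reader(lines, delimiter=field_separator,
                        quotechar=enclosed_by, doublequote=True)
    rows = list(reader)

    if contains_header and rows:
        return rows[0], rows[1:]
    return None, rows


# Hypothetical sample: one banner line to skip, a header, and one data row
# whose second field contains both the separator and escaped quotes.
sample = 'exported by a hypothetical tool\nid,name\n1,"Smith, ""Jo"""\n'

header, rows = read_csv_text(sample, skip_top_lines=1)
print(header)  # ['id', 'name']
print(rows)    # [['1', 'Smith, "Jo"']]
```

With Skip Top Lines left at its default of 0, the banner line in this sample would be read as the header, which is why files with leading non-data lines need that property set.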