Creating a dataset
This section details how to create a dataset in Kyvos by selecting a data connection, defining the input type (such as HDFS, table, or SQL), and preparing the data for use in semantic models and analysis.
When you create a dataset, choose the connection from which the data is fetched. As you apply settings to sort and filter the columns, you can preview the results to confirm that they match what you expect.
The total number of columns is shown above the list of their names. If the column count has changed, for example, if any columns are hidden, the column total includes additional information and a link. For example, "Total 29 columns out of which 5 columns are hidden." When you click the link, the information dialog informs you that column details have been updated and lists the number of columns added and deleted, and recommends that you review the updated file. Click Dismiss to close the dialog.
Points to know:
From Kyvos 2026.3 onwards, a search option is available when selecting a connection during dataset creation. The search box is displayed only when there are more than seven connections.
From Kyvos 2026.2 onwards, Kyvos Lakehouse can directly read Parquet and Apache Iceberg data stored in an Amazon S3 bucket, which eliminates the need for external catalogs or SQL engines.
Kyvos supports global parameters for datasets, making it easier to manage parameter values (like dbname and tablename) during environment migrations (e.g., from UAT to Production). You can set these parameters in the connection settings, and Kyvos will automatically use the correct values based on the target environment. This reduces manual work, improves deployment flexibility, and simplifies environment management. When users define parameters as connection properties, the parameter names must be prefixed with kyvos.param.
Example:
Suppose the query is:
SELECT * FROM <dbname>.EMPLOYEE
On the dataset, you can define a parameter named dbname and assign the required value.
On the connection, users can add a property named kyvos.param.dbname and set its corresponding value.
If Kyvos resolves the parameter value as devdb, the final query becomes:
SELECT * FROM devdb.EMPLOYEE
You can preview the entire dataset, replacing the previous limitation of a partial dataset preview.
NOTE: Opting to preview the full dataset may result in higher-than-expected costs. Additionally, the execution time required to generate the dataset preview could increase, depending on the size of the data.
You can view a column marked as "Modified" when changes are made to it in a dataset. Additionally, you can view the details of those changes. This feature helps users quickly identify modified columns and review their specifics.
You can mark a file as a Fact to use it as a fact table in relationships. You can also hide columns that are not required for analysis.
Use the Actions menu (...) to validate the dataset, share it, add a note, or show related entities.
If your instance of Kyvos has been configured via the portal.properties file to support it, you can register a file with a Presto connection. You can format columns and preview the result.
To register a file with a Presto connection, see Creating a file with a Presto connection. You can also register files by Writing SQL queries using data in Hive.
To learn more about the effects of some of the settings you can use while registering the file, see Logic for creating relationships and semantic model.
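The global-parameter behavior described in the points above can be illustrated with a short Python sketch. This is not Kyvos code: the function name resolve_parameters, the <name> placeholder pattern, and the rule that a connection property overrides a dataset-level value are assumptions made for illustration. Only the kyvos.param. prefix and the sample query come from the text.

```python
import re

# Prefix that connection properties must carry, as stated in the text above.
PARAM_PREFIX = "kyvos.param."


def resolve_parameters(query, connection_properties, dataset_parameters=None):
    """Replace <name> placeholders in a query with resolved parameter values.

    Hypothetical helper: dataset-level values are loaded first, then any
    connection property named kyvos.param.<name> overrides them. That
    precedence is an assumption of this sketch.
    """
    values = dict(dataset_parameters or {})
    for key, value in connection_properties.items():
        if key.startswith(PARAM_PREFIX):
            values[key[len(PARAM_PREFIX):]] = value

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value defined for parameter '{name}'")
        return values[name]

    # Placeholders are written as <name>, matching the sample query.
    return re.sub(r"<(\w+)>", substitute, query)


connection = {"kyvos.param.dbname": "devdb"}
print(resolve_parameters("SELECT * FROM <dbname>.EMPLOYEE", connection))
# prints: SELECT * FROM devdb.EMPLOYEE
```

Pointing the same dataset at a different connection (for example, one whose kyvos.param.dbname is set to a production database name) would change the resolved query without any edit to the dataset itself, which is the migration benefit the text describes.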
CSV Support for Kyvos Compute
Starting from Kyvos 2026.5, the CSV properties available while creating a CSV dataset depend on the configured Global Compute Type (Kyvos Compute or External Compute).
Based on the selected compute type, Kyvos displays only the CSV properties that are applicable to that environment.
Option 1: External Compute
When Global Compute Type is set to External Compute:
Both Spark-supported and Datastore-supported CSV properties are displayed.
Each property indicates its supported execution environment:
Spark-supported
Datastore-supported
Supported by both Spark and Datastore
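The following properties are available when using External Compute. In the Supported Values column, (External) marks Spark-supported values, (Kyvos) marks Datastore-supported values, and (Both) marks values supported by both: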
Property | Supported Values | Default Value |
Encoding | ASCII (External), ISO-8859-1 (External), UTF-8 (Both), UTF-16 (Kyvos), latin-1 (Kyvos) | ascii |
Compression | None (Both), LZO (External), auto (Kyvos), gzip (Kyvos), zstd (Kyvos) | None |
Line Separator | \r\n (Both), \n (Both) | \r\n |
Escape | Any character (Both) | \ |
Enclosed By | Any character (Both) | " |
Field Separator | Any character (Both) | , |
Skip Top Lines | Numeric value (Both) | 0 |
Contains header | true/false (Both) | true |
Ignore Empty Rows | true/false (External only) | true |
Key as Column | true/false (External only) | false |
File Name as Column | true/false (Both) | false |
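One way to read the per-value tags in the table above is as a filter keyed on the compute type. The Python sketch below copies the Encoding and Compression rows from the table; the dictionary layout and the function name values_for are hypothetical and do not represent a Kyvos API.

```python
# Tags copied from the External Compute table: "External", "Kyvos", or "Both".
CSV_VALUE_TAGS = {
    "Encoding": {
        "ASCII": "External",
        "ISO-8859-1": "External",
        "UTF-8": "Both",
        "UTF-16": "Kyvos",
        "latin-1": "Kyvos",
    },
    "Compression": {
        "None": "Both",
        "LZO": "External",
        "auto": "Kyvos",
        "gzip": "Kyvos",
        "zstd": "Kyvos",
    },
}


def values_for(compute_type, prop):
    """Return the values of one CSV property listed for a compute type.

    Assumption of this sketch: External Compute lists every tagged value,
    while Kyvos Compute lists only values tagged "Kyvos" or "Both".
    """
    tags = CSV_VALUE_TAGS[prop]
    if compute_type == "External Compute":
        return list(tags)
    if compute_type == "Kyvos Compute":
        return [value for value, tag in tags.items() if tag in ("Kyvos", "Both")]
    raise ValueError(f"Unknown compute type: {compute_type}")


print(values_for("Kyvos Compute", "Encoding"))
# prints: ['UTF-8', 'UTF-16', 'latin-1']
```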
Option 2: Kyvos Compute
When Global Compute Type is set to Kyvos Compute:
Only Datastore-supported CSV properties are displayed.
The following properties are available:
Property | Supported Values | Default Value |
Encoding | utf-8, utf-16, latin-1 | utf-8 |
Compression | auto, none, gzip, zstd | auto |
Line Separator | \n, \r\n | \n |
Escape | Double Quote (") | Double Quote (") |
Enclosed By | Double Quote (") | Double Quote (") |
Field Separator | Any character | , |
Contains header | true/false | true |
Skip Top Lines | Numeric value | 0 |
File Name as Column | true/false | false |
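To check how a file would be interpreted under the Kyvos Compute defaults listed above, the same settings can be applied locally with Python's standard csv module. This is a minimal sketch, not the Kyvos parser: the names KYVOS_COMPUTE_DEFAULTS and read_csv_bytes are hypothetical, compression and File Name as Column are not handled, and Skip Top Lines is approximated by skipping parsed rows.

```python
import csv
import io

# Defaults copied from the Kyvos Compute table above.
KYVOS_COMPUTE_DEFAULTS = {
    "encoding": "utf-8",
    "enclosed_by": '"',
    "field_separator": ",",
    "contains_header": True,
    "skip_top_lines": 0,
}


def read_csv_bytes(raw, **overrides):
    """Parse raw CSV bytes and return (header, rows).

    header is None when contains_header is False or the input is empty.
    """
    props = {**KYVOS_COMPUTE_DEFAULTS, **overrides}
    text = raw.decode(props["encoding"])
    reader = csv.reader(
        io.StringIO(text, newline=""),  # let csv handle \n and \r\n itself
        delimiter=props["field_separator"],
        quotechar=props["enclosed_by"],
        doublequote=True,  # a doubled quote escapes a quote, per the Escape row
    )
    rows = list(reader)[props["skip_top_lines"]:]
    if props["contains_header"] and rows:
        return rows[0], rows[1:]
    return None, rows


sample = b'name,quote\nAda,"She said ""hi"""\n'
header, rows = read_csv_bytes(sample)
print(header)  # prints: ['name', 'quote']
print(rows)    # prints: [['Ada', 'She said "hi"']]
```

Passing overrides such as encoding="latin-1" or skip_top_lines=2 lets you try the other supported values from the table against a sample file before registering it.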