Combine datasets 

General overview

Create a new dataset by combining two different datasets. The tool is useful for adding new records and columns to an existing dataset, e.g., every month, when new sensor readings are available. 

Select the primary dataset and the secondary dataset you want to combine. The order of datasets is important! The output dataset will contain a full copy of the primary dataset data (no loss) plus column values or additional rows from the secondary dataset. The tool creates a new output dataset in the selected database.

Combine datasets does not fill gaps in the primary dataset; it only adds new columns or new time stamps from the secondary dataset to the primary dataset.

# | Component | Description
--- | --- | ---
1 | Primary dataset | Select a dataset in the current project.
2 | Secondary dataset | Select a dataset in the current project.
3 | Clip secondary dataset | Optionally, the secondary dataset can be restricted by a period, so you have more control over which records will be added.
4 | Name of combined dataset | Output dataset name. The suggested name avoids duplicate names in the selected database or in the current project.
5 | Database | Select the database to store the combined dataset. By default, it is the database where the primary dataset is stored.
6 | Combine records | Option to select which rows in the secondary dataset should be filtered out: • Skip rows (time stamps) in the secondary dataset which do not exist in the primary dataset • Include all rows (time stamps) in the secondary dataset and combine them with the primary dataset
7 | Combine columns | Option to select which columns in the secondary dataset should be filtered out: • Skip columns (parameters) in the secondary dataset which do not exist in the primary dataset • Include all columns (parameters) in the secondary dataset and combine them with the primary dataset
8 | Combine columns with the same name | Option to handle columns with the same name: • Columns from the secondary dataset will be renamed and added as a new column • Columns can be merged into one column
9 | Note | In case the value for one particular time stamp exists in both primary and secondary datasets, the value from the primary dataset is taken and the value from the secondary dataset is ignored.
10 | Combine | Combines the two datasets.

Combining measured values

The datasets can be combined in several ways:

  • skip columns or rows in the secondary dataset that do not exist in the primary dataset, or

  • include all columns or all rows from the secondary dataset

A column with the same name in both datasets can be handled in two ways:

  • the column from the secondary dataset is renamed and added as a new column, or

  • the two columns are merged into one column

If a value for a particular time stamp exists in both the primary and the secondary dataset, the value from the primary dataset is taken and the value from the secondary dataset is ignored.

Optionally, the secondary dataset can be restricted by a period, so you have more control over which records will be added. 

The different combination options are visualized below:

image-20240730-093526.png
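The rules in this section can also be illustrated with a minimal Python sketch. It is not the tool's implementation: the `combine` function, its parameters, the `_secondary` rename suffix, and the dictionary layout (`{time stamp: {column: value}}`) are assumptions made for illustration only. In line with the rule that gaps in the primary dataset are not filled, a merged column takes secondary values only at time stamps that are new to the primary dataset.

```python
from datetime import datetime


def combine(primary, secondary, include_all_rows=True,
            include_all_columns=True, merge_same_name=True,
            clip_start=None, clip_end=None, suffix="_secondary"):
    """Combine two datasets given as {timestamp: {column: value}}."""
    primary_columns = {c for row in primary.values() for c in row}
    # Start from a full copy of the primary dataset (nothing is lost).
    combined = {ts: dict(row) for ts, row in primary.items()}

    for ts, row in secondary.items():
        # Optional clipping of the secondary dataset to a period.
        if clip_start is not None and ts < clip_start:
            continue
        if clip_end is not None and ts > clip_end:
            continue
        # "Skip rows": ignore time stamps that the primary dataset lacks.
        if ts not in primary and not include_all_rows:
            continue
        target = combined.setdefault(ts, {})
        for column, value in row.items():
            if column in primary_columns:
                if merge_same_name:
                    # Merge: primary rows stay untouched, so secondary
                    # values are used only at new time stamps.
                    if ts not in primary:
                        target[column] = value
                else:
                    # Rename: keep the secondary column under a new name
                    # (the suffix is a hypothetical naming scheme).
                    target[column + suffix] = value
            elif include_all_columns:
                target[column] = value
            # Otherwise ("skip columns") the new column is dropped.
    return combined


t1 = datetime(2024, 1, 1, 0)
t2 = datetime(2024, 1, 1, 1)
t3 = datetime(2024, 1, 1, 2)
primary = {t1: {"GHI": 100.0}, t2: {"GHI": 200.0}}
secondary = {t2: {"GHI": 999.0, "TEMP": 15.0},
             t3: {"GHI": 300.0, "TEMP": 16.0}}

result = combine(primary, secondary)
```

With the default options, the value of `GHI` at `t2` stays 200.0 from the primary dataset, the `TEMP` column is added, and `t3` is appended as a new row.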

Combining flags

Flag columns are combined with the same logic as measured columns (parameters). Flag column names are taken from the primary dataset. If flag columns have identical names, some of them are renamed automatically.

Flag columns that are not assigned to any parameter are also included in the combined dataset. Such flag columns are never merged; if their names are identical, they are always renamed.

Combining maintenance log

Maintenance logs are combined as well. Depending on the tool settings, the maintenance log from the secondary dataset can be clipped to a shorter date range.
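As a rough illustration of clipping a maintenance log, the sketch below keeps only the secondary entries that fall inside a date range before appending them to the primary log. The entry layout (a dictionary with `date` and `note` keys), the function name `clip_log`, and the inclusive bounds are assumptions, not the tool's actual data model.

```python
from datetime import date


def clip_log(entries, start=None, end=None):
    """Keep log entries dated within [start, end]; None means unbounded."""
    return [
        entry for entry in entries
        if (start is None or entry["date"] >= start)
        and (end is None or entry["date"] <= end)
    ]


primary_log = [{"date": date(2024, 1, 10), "note": "Sensor cleaned"}]
secondary_log = [
    {"date": date(2024, 2, 5), "note": "Logger restarted"},
    {"date": date(2024, 3, 20), "note": "Sensor replaced"},
]

# Combined log: all primary entries plus the clipped secondary entries.
combined_log = primary_log + clip_log(
    secondary_log, start=date(2024, 2, 1), end=date(2024, 2, 29)
)
```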

Combining QC status

QC statuses are combined too. A QC status is kept only if both the primary and the secondary dataset have it; otherwise it is removed. A missing status means that one of the datasets has not had that quality control done, so the combined dataset is also considered not to have had it done.
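Read this way, the rule behaves like a set intersection. The short sketch below shows the idea; the status names and the function `combine_qc_statuses` are hypothetical.

```python
def combine_qc_statuses(primary_qc, secondary_qc):
    """Keep only the QC statuses that both datasets have."""
    return set(primary_qc) & set(secondary_qc)


combined_qc = combine_qc_statuses(
    {"automatic tests", "visual check"},  # primary dataset
    {"automatic tests"},                  # secondary dataset
)
```

Here `visual check` is dropped because the secondary dataset does not have it.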

When combining datasets is not allowed

In the following cases, it is not allowed to combine datasets: 

  • The database of the primary, secondary, or combined dataset is not available

  • One of the datasets is empty

In case the option to merge columns is selected, the following scenarios are not allowed: 

  • Merging columns of the same name but different parameter type

  • Merging GTI columns of the same name but different GTI configuration

  • Merging columns of the same name but values in different units
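The merge restrictions above amount to a compatibility check on columns that share a name. The following sketch shows one possible way to express it; the `ColumnMeta` class, its fields, and the `merge_conflicts` function are hypothetical and do not describe the tool's internal metadata. A GTI configuration that is missing on one side is not treated as a conflict here, because that case is listed as a warning rather than a blocking error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnMeta:
    parameter_type: str
    unit: str
    gti_config: Optional[str] = None  # hypothetical description, e.g. tilt


def merge_conflicts(primary, secondary):
    """Return the reasons why same-name columns cannot be merged."""
    problems = []
    for name in sorted(primary.keys() & secondary.keys()):
        p, s = primary[name], secondary[name]
        if p.parameter_type != s.parameter_type:
            problems.append(f"{name}: different parameter type")
        if p.unit != s.unit:
            problems.append(f"{name}: different units")
        # Only two existing but different GTI configurations block a merge.
        if (p.gti_config is not None and s.gti_config is not None
                and p.gti_config != s.gti_config):
            problems.append(f"{name}: different GTI configuration")
    return problems


primary_columns = {
    "GTI": ColumnMeta("GTI", "W/m2", "tilt=30"),
    "TEMP": ColumnMeta("TEMP", "degC"),
}
secondary_columns = {
    "GTI": ColumnMeta("GTI", "W/m2", "tilt=20"),
    "TEMP": ColumnMeta("TEMP", "K"),
}

problems = merge_conflicts(primary_columns, secondary_columns)
```

An empty list would mean that the same-name columns pass these three checks.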

Warnings while combining datasets

Users are informed about the following cases before datasets are combined. It is up to the user to reconsider whether to continue and combine datasets or to cancel the operation and check the dataset or metadata first.

Reason for a warning | Combined dataset
--- | ---
Site location that is not identical. | Combined dataset will have location from the primary dataset.
Merging GTI columns with same name, but one dataset has a missing GTI config. | Combined dataset will contain the existing GTI config on the column.
The time step is not identical. | Combined dataset will have the smaller time step from both datasets.
The instrument name is not the same. | Combined dataset will have instruments from the primary dataset.
The primary or secondary dataset contains flag column(s) not linked to any parameter. | Combined dataset contains all the flag columns which are not linked to any column.

To see more details about which columns are affected, use the "Show Details..." option.
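The metadata rules for the location, time step, and instrument warnings can be summarized in a small sketch. The dictionary keys, the sample values, and the function `combined_metadata` are assumptions for illustration; the tool's real metadata model may differ.

```python
from datetime import timedelta


def combined_metadata(primary, secondary):
    """Return the combined metadata and the warnings raised on the way."""
    warnings = []
    if primary["location"] != secondary["location"]:
        warnings.append("Site location is not identical.")
    if primary["time_step"] != secondary["time_step"]:
        warnings.append("The time step is not identical.")
    if primary["instruments"] != secondary["instruments"]:
        warnings.append("The instrument name is not the same.")
    metadata = {
        # Location and instruments always come from the primary dataset.
        "location": primary["location"],
        "instruments": primary["instruments"],
        # The smaller of the two time steps is used.
        "time_step": min(primary["time_step"], secondary["time_step"]),
    }
    return metadata, warnings


primary_meta = {
    "location": (48.1, 17.1),
    "time_step": timedelta(minutes=10),
    "instruments": ["Pyranometer A"],
}
secondary_meta = {
    "location": (48.1, 17.1),
    "time_step": timedelta(minutes=1),
    "instruments": ["Pyranometer A"],
}

metadata, warnings = combined_metadata(primary_meta, secondary_meta)
```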