/
Importing Dataset from a Text File

Importing Dataset from a Text File

General overview

In Analyst you can import datasets from delimited text files (e.g. CSV). The process ends up with the newly created Analyst dataset linked with the current project, i.e., accessible to the user. One of the main advantages of using Analyst file importer is that all used import instructions are automatically saved as a template for later use on similar files. Some data providers have logs about maintenance of sensors, which can be imported in Analyst too.

To start importing files go to Menu > Dataset > Import > Import files.

Reading structure of files

After selecting the data file you got from your data provider, the import procedure will try to automatically detect its structure and content. In most cases, it requires some minimal effort to describe file structure for the first time.

#

Component

Description

1

Load template

Import template is a JSON text file (with the ".impt" file extension) storing all the import instructions used in the import process, see also the "Create new template" button. When the import template is loaded from the disk, the import instructions will be pre-filled according to the template. Users can always modify the values. Loading templates are only available at the start of the import process.

2

Add files

Add one or multiple files for importing. Selected files are added into a table named "Include files for import". Supported file types are all text data files and Excel files. Collect all necessary files for your dataset.

Remove files

Removes selected files from the "Include files for import" table. To remove all files, press CTRL+A and push the button or press the Del key. A single file can be also removed by the Del key.

3

Include files

Includes selected files for importing. After collecting all data files needed for your dataset (in the "Include files for import" table), you can find certain files with a different structure that will require another run of import procedure. You will include them in the next run.

Exclude files

Exclude selected files from the current import procedure. You can still include them in the next procedure.

4

Files for import

Table with three sortable columns: file (filename), worksheet (Excel sheet name if applicable), and folder. Each row represents one data file or one Excel worksheet. Users can multi-select the data files to perform include or exclude actions, remove files and push the "Use with each selected file" button (see below). Users can also include or exclude files by clicking the checkboxes on each table row. It is mandatory to include at least one file to continue with the import.

5

Column separators

Automatically detected one or multiple column separators. Not applicable for importing Excel worksheets.

6

Skip lines from top

Users can skip lines from the top of the text file or Excel worksheet.

Skip lines starting with

Another way of exclusion of certain lines. In some data formats, the header lines are indicated by a certain character e.g, "#". Works with text files and Excel worksheets.

7

Decimal separator

Decimal numbers formatting options. In some cultures, a comma is the decimal separator.

8

Thousands separator

Allows importing culture-specific numeric formats. Use one or more separators according to your input data.

9

File encoding

Only for text files. Users can select original encoding to properly read the data, especially the header. Default is UTF-8.

10

View original file

Only for text files. A simple preview window is provided for a selected file. The First 500 kilobytes of data is displayed for content inspection. 

11

Select columns for import

Data preview is displayed for any selected file (or Excel worksheet). Users can check if the reading instructions were properly applied. The First 1000 rows are displayed in the preview. The user is asked to select at least one column for import (mandatory step to continue). Usually, the user will select column(s) with time information plus data columns. IMPORTANT: current selection of columns will be applied across all files included in the import process. If you have multiple different structures in your files, perform more separate import procedures.

Clear

Clear the selected columns in the "Select columns for import" table.

Select all

Selects all columns in the "Select columns for import" table. The same action can be performed by pressing CTRL+A keys.

12

Record number with column names

The record where the column names are. Use zero for files without a header or if you want to not keep existing columns names.

13

Read files in chunks

Reading big files in chunks prevents the import process from consuming all the available RAM memory. This option has effect only with text files and it will be enabled automatically based on data size and available RAM.

Recognizing timestamps

One the of key steps during the import process is to make sure that time stamps will be correctly imported. For that purpose, Analyst provides a specific tool.

#

Component

Description

1

Select columns with date and time

In the data preview table (showing 1000 top records) user is asked to select all columns having time information. This step is mandatory. Analyst will not store data without time information.

2

Date and time format examples

When building your actual time format (in the "Active formats" table) users can get inspired by common date and time formats. Analyst uses format notations presented here: Date and time formats. You can make selected formats active either by double-clicking the item or by pushing the button "Use selected format".

3

Use selected format

Make selected example time format as active. The "Results" section is immediately refreshed for the top 1000 records.

4

Active formats

Analyst will try to auto-detect the time format according to data. Users can modify the format by double-clicking on it. Users can add more formats from examples and later modify them. If no explicit format is included, Analyst will try to parse timestamps anyway (can be slow). It is recommended to use explicit format unless the format is ISO 8601 in all records.

Remove selected active formats

Removes formats from the table of active formats.

5

Results

This table shows values from the selected date and time columns that haven't been recognized as timestamps. Users can review the values and modify active formats accordingly. First, only results for 1000 data records are displayed. The advantage of this approach is to be able to quickly fine-tune time formats without demanding to parse the whole data. At any time the user can press the button "Get results for all records" to go over all imported records. It is mandatory to have some timestamps recognized to continue with the import process. See also additional results below (duplicates, time steps, etc.).

Get results for all records

Recognizes timestamps in all imported records and displays complete results. Otherwise, results represent only the top 1000 records. This action is enforced when the user leaves this screen by clicking the button "Next".

6

 

Duplicate timestamps

Explore duplicate timestamps after the timestamp recognition process. Records with redundant timestamps will be removed. Only the first occurrence is kept. Note the validity of the information, either for all data or for the top 1000 records.

First / Last timestamp

First and last valid timestamp found in the data.

Detected time steps

Information about how many different time steps (time-frequency) have been found in the data.

Prevailing time step

Prevailing time step in the data e.g., majority of records can have 5-minute frequency, but there can still be some records with 10-minute step.

7

Import period from / to 

Limit the period (in dates) of the imported data if you want only the subset of all records.

8

 

 

Other options - Get date from

There are special cases when the date information is included only as part of file names or Excel worksheet names. These type of data files contains only time information in the records, so day, month and year information is not present in data.

Other options - Custom date format

Applicable only when the date information is included as part of file names or Excel worksheet names. You can specify the date format here. Use the same notation as described above.

Treat data as TMY (one non-leap year)

Specific feature for importing typical meteorological year data (TMY). TMY providers use multiple approaches for the time representation e.g., sorted timestamps with the 'unexpected' year 1900 or 2059, unsorted timestamps where each month comes from a different year, day of year instead of date, ordinals. To make TMY datasets visually comparable in Analyst, the Analyst will align all imported TMYs with the year 1900 at the beginning of data (1st January).  After the import, the user can observe also the year 1901 or 1899 on the edges of the data. The reason is the original time alignment or the subsequent time zone manipulation.

9

Show table

Opens a tabular preview of the top 5000 records as they will look after the complete timestamp recognition process.

Show plot

Opens a first plot of the imported data. From the plotted data series you can quickly spot some problems in timestamp recognition e.g., swapped day and month information in some DateTime formats.

Date and time formats

More information can be found here: Date and Time Formats

Adjusting time reference

Analyst internally stores timestamps in the UTC time reference. This will be checked in the time reference check anyway as a part of the quality control. If your data is not in UTC and you know it has a constant offset from UTC, you can shift the data during the import process. More importantly, if the data is in a legal time zone with daylight saving time (DST) shifts, then you can select the time zone name from a combo box and the data will be converted to UTC according to the time zone rules.

 

#

Component

Description

1

Offset from UTC

Set the UTC offset in hours and minutes (negative or positive). The preview table will be refreshed providing the shifted timestamps for the user to review.

2

Legal time zone

Select legal time zone from the list. Some time zones can observe daylight saving time shifts. The preview table will be refreshed.

3

Interval representation

A timestamp typically serves as a label for a bigger interval of time. Measured values that occurred within the interval are aggregated before the data logger writes the output. The timestamp label can be put at the start, at the end, or at the center of the interval (the Analyst default). Analyst requires timestamps to be placed at the centers of intervals. The reason is the representativeness of solar geometry calculation. If you choose the "start of interval" option, your timestamps will be shifted forward by half of the interval size to reach the centers of intervals. The same is done with the "end of interval" but in the backward direction. No action is performed when you choose "center of interval". For instantaneous values, measured at the exact time, use the "center of interval" option. 

4

Preview table

Users can preview how the timestamps will look after options on this screen are applied.

 

Used imported data

Before the import is finished, you can choose how to use the new data:

  1. Store as a new dataset, creating the metadata

  2. Add the imported data to an existing dataset

  3. Attach the data as a maintenance log

Creating a new dataset from the imported data

 

 

#

Component

Description

1

Active template

Name of import template, if used.

2

Decide how to use the imported data

In this stage of the import process, the imported data are ready to be interpreted and stored. Here the user will decide how to go further. The options are:

  1. creating a new dataset

  2. append data into an existing dataset (e.g., the provider sent new measurements from the existing station)

  3. attach data as maintenance log (e.g., cleaning events) 

3

Name

Dataset name. This property is required.

Database

Where the new dataset will be stored. This property is required.

Add existing database

Handy option to link an existing database file from your computer.

Create new database

Handy option to create a new empty database file.

Type

Types of Analyst datasets: ground measurements (the default), modeled time series, TMYs

Latitude and Longitude

Required properties. Identification of the site location. Enter the value in decimal degrees, optimally with at least 6 decimal places.

Altitude

Required property. The altitude of the site location in meters above sea level.

Description

Descriptive i