Importing Dataset from a Text File
General overview
In Analyst you can import datasets from delimited text files (e.g., CSV). The process ends with the newly created Analyst dataset linked to the current project, i.e., accessible to the user. One of the main advantages of the Analyst file importer is that all import instructions used are automatically saved as a template for later use on similar files. Some data providers also supply sensor maintenance logs, and these can be imported into Analyst as well.
To start importing files go to Menu > Dataset > Import > Import files.
Reading structure of files
After selecting the data file you received from your data provider, the import procedure will try to detect its structure and content automatically. In most cases, only minimal effort is required to describe the file structure the first time.
# | Component | Description |
1 | Load template | An import template is a JSON text file (with the ".impt" file extension) storing all the import instructions used in the import process; see also the "Create new template" button. When an import template is loaded from disk, the import instructions are pre-filled according to the template. Users can always modify the values. Loading a template is only possible at the start of the import process. |
2 | Add files | Add one or multiple files for importing. Selected files are added into a table named "Include files for import". Supported file types are all text data files and Excel files. Collect all necessary files for your dataset. |
Remove files | Removes selected files from the "Include files for import" table. A single file can also be removed with the Del key; to remove all files, press CTRL+A first and then push the button or press the Del key. | |
3 | Include files | Includes selected files for importing. After collecting all data files needed for your dataset (in the "Include files for import" table), you may find that some files have a different structure and will require a separate run of the import procedure. Include those in the next run. |
Exclude files | Exclude selected files from the current import procedure. You can still include them in the next procedure. | |
4 | Files for import | Table with three sortable columns: file (filename), worksheet (Excel sheet name if applicable), and folder. Each row represents one data file or one Excel worksheet. Users can multi-select the data files to perform include or exclude actions, remove files and push the "Use with each selected file" button (see below). Users can also include or exclude files by clicking the checkboxes on each table row. It is mandatory to include at least one file to continue with the import. |
5 | Column separators | Automatically detected one or multiple column separators. Not applicable for importing Excel worksheets. |
6 | Skip lines from top | Users can skip lines from the top of the text file or Excel worksheet. |
Skip lines starting with | Another way to exclude certain lines. In some data formats, header lines are indicated by a certain character, e.g., "#". Works with text files and Excel worksheets. | |
7 | Decimal separator | Decimal numbers formatting options. In some cultures, a comma is the decimal separator. |
8 | Thousands separator | Allows importing culture-specific numeric formats. Use one or more separators according to your input data. |
9 | File encoding | Only for text files. Users can select original encoding to properly read the data, especially the header. Default is UTF-8. |
10 | View original file | Only for text files. A simple preview window is provided for a selected file. The first 500 kilobytes of data are displayed for content inspection. |
11 | Select columns for import | A data preview is displayed for any selected file (or Excel worksheet). Users can check whether the reading instructions were applied properly. The first 1000 rows are displayed in the preview. The user is asked to select at least one column for import (a mandatory step to continue). Usually, the user will select column(s) with time information plus data columns. IMPORTANT: the current selection of columns will be applied across all files included in the import process. If your files have multiple different structures, perform separate import procedures. |
Clear | Clear the selected columns in the "Select columns for import" table. | |
Select all | Selects all columns in the "Select columns for import" table. The same action can be performed by pressing CTRL+A. | |
12 | Record number with column names | The record where the column names are located. Use zero for files without a header or if you do not want to keep the existing column names. |
13 | Read files in chunks | Reading big files in chunks prevents the import process from consuming all available RAM. This option takes effect only with text files and is enabled automatically based on data size and available RAM. |
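The reading options above map closely onto what a typical delimited-file reader exposes. Below is a minimal sketch using pandas to illustrate the idea only; the separator, comment prefix, decimal mark, header record, and chunk size are hypothetical example values, not Analyst's actual internals.

```python
import io
import pandas as pd

# Example file mimicking a provider export: a "#" header line,
# semicolon-separated columns, comma as the decimal separator.
raw = "# station: demo\ntime;ghi\n2024-01-01 00:00;0,0\n2024-01-01 00:05;1,5\n"

chunks = pd.read_csv(
    io.StringIO(raw),
    sep=";",       # detected column separator
    comment="#",   # "Skip lines starting with"
    decimal=",",   # culture-specific decimal separator
    header=0,      # "Record number with column names"
    chunksize=1,   # "Read files in chunks" for large files
)
df = pd.concat(chunks)  # reassemble the chunks into one table
print(df)
```

Reading in chunks keeps peak memory bounded by the chunk size rather than the file size, which is why the option matters for big text files.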
Recognizing timestamps
One of the key steps during the import process is to make sure that timestamps will be correctly imported. For that purpose, Analyst provides a specific tool.
# | Component | Description |
1 | Select columns with date and time | In the data preview table (showing the top 1000 records), the user is asked to select all columns holding time information. This step is mandatory; Analyst will not store data without time information. |
2 | Date and time format examples | When building the actual time format (in the "Active formats" table), users can take inspiration from common date and time formats. Analyst uses the format notations presented here: Date and time formats. You can make a selected format active either by double-clicking the item or by pushing the "Use selected format" button. |
3 | Use selected format | Makes the selected example time format active. The "Results" section is immediately refreshed for the top 1000 records. |
4 | Active formats | Analyst will try to auto-detect the time format according to data. Users can modify the format by double-clicking on it. Users can add more formats from examples and later modify them. If no explicit format is included, Analyst will try to parse timestamps anyway (can be slow). It is recommended to use explicit format unless the format is ISO 8601 in all records. |
Remove selected active formats | Removes formats from the table of active formats. | |
5 | Results | This table shows values from the selected date and time columns that have not been recognized as timestamps. Users can review the values and modify the active formats accordingly. Initially, only results for the top 1000 data records are displayed; the advantage of this approach is that time formats can be fine-tuned quickly without having to parse all the data. At any time, the user can press the "Get results for all records" button to go over all imported records. It is mandatory to have some timestamps recognized to continue with the import process. See also the additional results below (duplicates, time steps, etc.). |
Get results for all records | Recognizes timestamps in all imported records and displays complete results. Otherwise, results represent only the top 1000 records. This action is enforced when the user leaves this screen by clicking the button "Next". | |
6 | Duplicate timestamps | Explore duplicate timestamps after the timestamp recognition process. Records with redundant timestamps will be removed; only the first occurrence is kept. Note the validity of the information, either for all data or for the top 1000 records. |
First / Last timestamp | First and last valid timestamp found in the data. | |
Detected time steps | Information about how many different time steps (time-frequency) have been found in the data. | |
Prevailing time step | Prevailing time step in the data e.g., majority of records can have 5-minute frequency, but there can still be some records with 10-minute step. | |
7 | Import period from / to | Limit the period (in dates) of the imported data if you want only the subset of all records. |
8 | Other options - Get date from | There are special cases when the date information is included only as part of file names or Excel worksheet names. These types of data files contain only time information in the records, so day, month, and year information is not present in the data. |
Other options - Custom date format | Applicable only when the date information is included as part of file names or Excel worksheet names. You can specify the date format here. Use the same notation as described above. | |
Treat data as TMY (one non-leap year) | A specific feature for importing typical meteorological year (TMY) data. TMY providers use multiple approaches to time representation, e.g., sorted timestamps with an 'unexpected' year such as 1900 or 2059, unsorted timestamps where each month comes from a different year, day of year instead of date, or ordinals. To make TMY datasets visually comparable, Analyst aligns all imported TMYs with the year 1900 at the beginning of the data (1st January). After the import, the user may also observe the year 1901 or 1899 at the edges of the data; the reason is the original time alignment or subsequent time zone manipulation. | |
9 | Show table | Opens a tabular preview of the top 5000 records as they will look after the complete timestamp recognition process. |
Show plot | Opens a first plot of the imported data. From the plotted data series you can quickly spot some problems in timestamp recognition e.g., swapped day and month information in some DateTime formats. |
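The recognition steps above (explicit formats, unrecognized values, duplicates, prevailing time step) can be sketched with pandas. This is an illustrative approximation only; the sample values and the 5-minute step are invented for the example, and Analyst's actual parser is not shown here.

```python
import pandas as pd

stamps = pd.Series([
    "01/02/2024 00:00", "01/02/2024 00:05", "01/02/2024 00:05",
    "01/02/2024 00:10", "01/02/2024 00:15", "not-a-time",
])

# An explicit format is much faster than letting the parser guess.
parsed = pd.to_datetime(stamps, format="%d/%m/%Y %H:%M", errors="coerce")

# "Results": values that were not recognized as timestamps.
unrecognized = stamps[parsed.isna()]

# Duplicate timestamps: keep only the first occurrence.
clean = parsed.dropna().drop_duplicates(keep="first")

# Prevailing time step: the most common difference between timestamps.
step = clean.sort_values().diff().mode().iloc[0]
print(unrecognized.tolist(), step)
```

Running the parse on a small sample first, then on all records, mirrors the "Get results for all records" workflow described above.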
Date and time formats
More information can be found here: Date and Time Formats
Adjusting time reference
Analyst internally stores timestamps in the UTC time reference. This is verified again during the time reference check, which is part of quality control. If your data is not in UTC and you know it has a constant offset from UTC, you can shift the data during the import process. More importantly, if the data is in a legal time zone with daylight saving time (DST) shifts, you can select the time zone name from a combo box and the data will be converted to UTC according to the time zone rules.
# | Component | Description |
1 | Offset from UTC | Set the UTC offset in hours and minutes (negative or positive). The preview table will be refreshed providing the shifted timestamps for the user to review. |
2 | Legal time zone | Select legal time zone from the list. Some time zones can observe daylight saving time shifts. The preview table will be refreshed. |
3 | Interval representation | A timestamp typically serves as a label for a longer interval of time. Measured values that occurred within the interval are aggregated before the data logger writes the output. The timestamp label can be put at the start, at the end, or at the center of the interval (the Analyst default). Analyst requires timestamps to be placed at the centers of intervals, to keep solar geometry calculations representative. If you choose the "start of interval" option, your timestamps will be shifted forward by half of the interval size to reach the centers of intervals. The same is done with the "end of interval" option, but in the backward direction. No action is performed when you choose "center of interval". For instantaneous values, measured at the exact time, use the "center of interval" option. |
4 | Preview table | Users can preview how the timestamps will look after options on this screen are applied. |
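The time-reference adjustments above can be sketched with pandas time-zone handling. This is only an illustration of the concept; the "Europe/Prague" zone, the 10-minute interval, and the sample timestamps are arbitrary example values, not Analyst defaults.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2024-06-01 00:00", "2024-06-01 00:10", "2024-06-01 00:20",
]))

# Legal time zone: localize, let DST rules apply, then convert to UTC.
utc = ts.dt.tz_localize("Europe/Prague").dt.tz_convert("UTC")

# "Start of interval" labels are shifted forward by half the 10-minute
# step so each timestamp sits at the center of its interval.
centered = utc + pd.Timedelta(minutes=5)
print(centered.iloc[0])
```

A fixed "Offset from UTC" would instead be a plain `pd.Timedelta` subtraction, with no DST rules involved.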
Using the imported data
Before the import is finished, you can choose how to use the new data:
Store as a new dataset, creating the metadata
Add the imported data to an existing dataset
Attach the data as a maintenance log
Creating a new dataset from the imported data
# | Component | Description |
1 | Active template | Name of import template, if used. |
2 | Decide how to use the imported data | At this stage of the import process, the imported data is ready to be interpreted and stored. Here the user decides how to proceed. The options are: store as a new dataset, add to an existing dataset, or attach as a maintenance log. |
3 | Name | Dataset name. This property is required. |
Database | Where the new dataset will be stored. This property is required. | |
Add existing database | Handy option to link an existing database file from your computer. | |
Create new database | Handy option to create a new empty database file. | |
Type | Types of Analyst datasets: ground measurements (the default), modeled time series, TMYs | |
Latitude and Longitude | Required properties. Identification of the site location. Enter the value in decimal degrees, optimally with at least 6 decimal places. | |
Altitude | Required property. The altitude of the site location in meters above sea level. | |
Description | Descriptive information about the new dataset. | |
4 | Columns selector | List of column names (with a parameter type). It is required to include at least one dataset column. You may not want all imported columns to be stored in the new dataset. Column order can be changed (by drag & drop) as you may want to have the most important columns first. |
5 | Column - Original name | When building metadata for columns, it is convenient to still have access to the original name of the column. |
Column - Name | Users can rename the column. By default, the original name is used, as it comes from the data file header. Analyst will try to auto-detect column properties based on typical names, e.g., Ghi, Temperature, Temp, etc. | |
Column - Parameter type | Analyst uses the concept of fixed-parameter types for columns. Choose from the pre-defined parameter types. | |
Column - Unit | Choose a unit for the selected parameter type. Analyst brings a pre-defined set of units. | |
Column - Flag column | Sometimes quality flag columns are included in data files. This option allows making a binding between the data column and its flag column. Flag columns are later used in quality control procedures. | |
Column - Test group | Users can assign test group for given column. | |
Column - Instrument | Analyst brings a pre-defined set of measuring devices. | |
Column - Height | Height of the measuring instrument above the ground in meters. | |
6 | Other options - Aggregate to | Before the imported data is stored as a dataset, the original data can be reduced (or, less commonly, densified) to the desired time frequency. |
Other options - Required number of original records | By default, 50% occurrence of original records is required when building a new aggregated interval. For example, at least 5 of the 10 possible 1-minute records are required to build one 10-minute aggregated interval. | |
Other options - Edit rules | Users can review or modify aggregation rules controlling the aggregation process. | |
Other options - Create empty records where there are not enough original records | To have a harmonious time step in your aggregated dataset enable this option. | |
Other options - Following values will be replaced by an empty value | Some data providers use specific "filling" values for missing data e.g., -9999. Such extreme values make it difficult to analyze and plot data series. The convenient Analyst approach is to convert them to 'nan', which is an ignored empty value. Users can enter a single numeric value or multiple values separated by whitespace. | |
7 | GTI configuration | In the GTI configuration, users can configure GTI parameters such as azimuth, tilt, rotation limits, axis tilt, relative spacing, etc. Currently we support:
If necessary, users can edit this configuration in the Metadata editor. |
8 | Detect parameter type | Parameter type detection helps easily identify unknown radiation types in the dataset. Currently supported parameter types are GHI, DNI, DIF, RHI, and fixed GTI. |
Close | Close the window. | |
Create new template | Import instructions can be stored as a template JSON file to facilitate repetitive imports of similar data files. | |
Begin again | Return to the first screen of the import process (reading data files). | |
Create new dataset | Will store the new dataset (metadata and data) into the selected database. |
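The "Other options" described above (aggregation with a minimum record count, and replacing provider filling values) can be sketched as follows. This is a simplified illustration under assumed values: the -9999 filling value, the 1-minute source step, and the 50% (5-of-10) threshold are the examples from the text, not a fixed rule.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01 00:00", periods=10, freq="1min")
ghi = pd.Series([0, 5, -9999, 15, 20, 25, 30, 35, -9999, 45],
                index=idx, dtype=float)

# "Following values will be replaced by an empty value":
# convert provider filling values to NaN so they are ignored.
ghi = ghi.replace(-9999, np.nan)

# "Aggregate to": reduce 1-minute records to 10-minute intervals,
# requiring at least 50% of the original records (5 of 10).
agg = ghi.resample("10min").mean()
counts = ghi.resample("10min").count()
agg[counts < 5] = np.nan  # "Create empty records" where data is sparse
print(agg)
```

Emptying sparse intervals instead of dropping them keeps a harmonious time step in the aggregated dataset, as the option description suggests.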
Adding imported data into the existing dataset
This case is much simpler than creating a new dataset. It is very important to make sure that the added data belongs to the selected dataset (most importantly, the same location and the same meaning of matched columns). This screen can also be used for updating data in an existing dataset; this behaviour is controlled by the radio button named "In case of overlapping timestamps". For "Other options", please see the description above, as this section is shared.
# | Component | Description |
1 | Select existing dataset | Dataset for data addition must be included in the Analyst project to be accessible on this screen. The overview of dataset metadata is provided in the section below. |
2 | Match columns | Mapping between existing column names and imported column names. You do not need to add values for all columns. |
3 | In case of overlapping timestamps - keep original record | Keeps existing records; data updates are not allowed. |
In case of overlapping timestamps - replace by imported record | Preference is given to imported records. With this approach, users can update part of the existing data, or even all of it. | |
4 | Other options | Shows available options. |
5 | Show plot | Preview of how the data will look after the addition of imported records. Imported records are visually emphasized. |
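The two overlap policies above can be sketched with pandas index alignment. This is a conceptual illustration only; the dates and values are invented, and Analyst's actual merge logic is not shown.

```python
import pandas as pd

existing = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]))
imported = pd.Series(
    [20.0, 30.0, 40.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]))

# "Keep original record": existing values win on overlapping timestamps.
keep_original = existing.combine_first(imported)

# "Replace by imported record": imported values win on overlaps.
replace_by_imported = imported.combine_first(existing)
print(keep_original.tolist(), replace_by_imported.tolist())
```

In both cases the result covers the union of timestamps; only the values at overlapping timestamps differ.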
Attach imported data as maintenance log
A maintenance log is a collection of maintenance events performed on an instrument or the whole station. A maintenance log can optionally be included with any dataset. Such data either comes as separate data files, or a sensor maintenance event is included with every data record (e.g., as a cleaning column, which is most often empty). Analyst can handle both situations. The latter case is imported in two steps. First, the new dataset is created and stored (simply omitting the cleaning column). Then the user switches the mode by clicking the "attach as maintenance log" radio button and continues with importing the cleaning events. In Analyst, maintenance events are stored separately from data records (see the dataset metadata editor).
# | Component | Description |
1 | Select existing dataset | Dataset for maintenance log attachment must be included in the Analyst project to be accessible on this screen. The overview of dataset metadata is provided in the section below. |
2 | Attach events to existing column | Maintenance events can be performed on a specific instrument (i.e., one or a couple of columns) or the whole station (all columns, pre-selected option) |
Filter events by | Extract only relevant data from imported data records. Users can filter by specific values of a column. | |
3 | In case of overlapping timestamps - keep original record | Keeps existing records; data updates are not allowed. |
In case of overlapping timestamps - replace by imported record | Preference is given to imported records. With this approach, users can update part of the existing data, or even all of it. | |
4 | Show plot | Shows plot of the maintenance log events placed on top of data series. New additions are visually emphasized. |
Attach maintenance log | Saves the imported maintenance events into an existing log. |
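The "Filter events by" step, extracting only the records that carry an actual event from a mostly empty cleaning column, can be sketched as follows. The column name "cleaning" and the event value "done" are hypothetical examples, not values Analyst prescribes.

```python
import pandas as pd

records = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 09:00", "2024-01-02 08:00"]),
    "ghi": [100.0, 250.0, 120.0],
    "cleaning": ["", "done", ""],  # event column, most often empty
})

# "Filter events by": keep only the records where the cleaning column
# carries an actual maintenance event value.
events = records.loc[records["cleaning"] == "done", ["time", "cleaning"]]
print(len(events))
```

Filtering first keeps the maintenance log small and stored separately from the data records, matching the two-step workflow described above.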