Description of the data set for the competition

Statistical classification/clustering of data from a flame plasma electrochemical sensor

Original by Christian Hennig

Prepared for AG DANK16 by Achim Mucha (22.08.2016)

Original data

The data describe 120 particulates. For every particulate there are eight time series of voltage values, one per electrode. Every time series consists of 301 voltage values, so each particulate is represented by a 2408-dimensional vector (8 * 301 = 2408). The value at the first time point has been subtracted from each time series, so that every series starts at zero on each of the eight electrodes; according to the chemists, the initial voltage level is not informative.
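The transformation and the resulting dimension can be illustrated with a minimal sketch. It uses made-up random voltages as a stand-in for one particulate, and the function names (subtract_baseline, flatten_particulate) are hypothetical helpers for illustration only, not part of the original preparation of the data.

```python
import random

N_ELECTRODES = 8
N_TIMEPOINTS = 301


def subtract_baseline(series):
    """Shift one electrode's time series so that its first value is zero."""
    first = series[0]
    return [value - first for value in series]


def flatten_particulate(electrode_series):
    """Concatenate the baseline-corrected series, electrode 1 first, into one vector."""
    vector = []
    for series in electrode_series:
        vector.extend(subtract_baseline(series))
    return vector


# Synthetic stand-in for one particulate: 8 electrodes x 301 voltages (made-up values).
rng = random.Random(0)
raw = [
    [rng.uniform(-0.5, 0.5) for _ in range(N_TIMEPOINTS)]
    for _ in range(N_ELECTRODES)
]

vector = flatten_particulate(raw)
print(len(vector))  # 2408 = 8 * 301
```

After this step the first entry of every electrode block is exactly zero, which matches the description of the published values.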

File format

The file contains a standard data matrix with 121 rows and 2409 columns. The elements of each row are separated by semicolons. The first column contains the names of the particulates, which are simply 1, 2, ..., 120. The first row contains the names of the variables: "Name", "E1T1", "E1T2", ..., "E6T38", ..., "E8T301". The remaining 120 rows contain the data: first the 301 voltage measurements for electrode 1, then those for electrode 2, and so on. For example, "E6T38" means time point 38 on electrode 6. Altogether the data matrix contains 288960 (= 120 * 8 * 301) measurement values. As a check that the data have been read correctly: the minimum is -0.3487, the maximum is 0.64583, and the average is 0.004037086.
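A reader for this layout could look like the following sketch, which uses only the Python standard library. It assumes a dot as the decimal separator (as in the control values quoted above) and is demonstrated on a tiny synthetic two-row file built in memory, because the name of the actual data file is not given here. All function names are hypothetical.

```python
import csv
import io

N_ELECTRODES = 8
N_TIMEPOINTS = 301


def expected_header():
    """Build the variable names "Name", "E1T1", ..., "E8T301"."""
    return ["Name"] + [
        f"E{e}T{t}"
        for e in range(1, N_ELECTRODES + 1)
        for t in range(1, N_TIMEPOINTS + 1)
    ]


def column_index(electrode, timepoint):
    """0-based column of "E<electrode>T<timepoint>"; column 0 holds the name."""
    return 1 + (electrode - 1) * N_TIMEPOINTS + (timepoint - 1)


def read_matrix(handle):
    """Read a semicolon-delimited matrix; return (header, names, rows of floats)."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    names, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        names.append(record[0])
        rows.append([float(x) for x in record[1:]])  # assumes "." decimals
    return header, names, rows


def summarize(rows):
    """Return (minimum, maximum, mean) over all measurement values."""
    values = [v for row in rows for v in row]
    return min(values), max(values), sum(values) / len(values)


# Synthetic two-particulate file with made-up values, for demonstration only.
sample = "\n".join([
    ";".join(expected_header()),
    ";".join(["1"] + ["0.0"] * (N_ELECTRODES * N_TIMEPOINTS)),
    ";".join(["2"] + ["0.5"] * (N_ELECTRODES * N_TIMEPOINTS)),
]) + "\n"

header, names, rows = read_matrix(io.StringIO(sample))
print(len(header), header[column_index(6, 38)])  # 2409 E6T38
print(summarize(rows))  # (0.0, 0.5, 0.25)
```

For the real file, read_matrix would be called on an opened file handle instead of the in-memory sample, and summarize should then reproduce the minimum, maximum, and average quoted above, up to rounding.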
