enmap-box-idl / Data Format Definition

Concept of Storing Raster Data

The EnMAP-Box stores raster data in a flat binary format with metadata in an associated text file. The structure of the text file corresponds to ENVI type headers. This way, an easy interchange of image data is possible between the EnMAP-Box and several other software applications, while the simple structure allows for easy programmatic implementation.

Data and metadata are kept separate: the raster data is stored as a flat byte stream in a binary file, usually in band sequential order, and an associated ASCII header file contains all information necessary to read the data file. The header can be extended easily, e.g. by user-defined values.

The following section gives a short introduction to this file format and how it is used by the EnMAP-Box.

Naming Convention for Raster Data Files

The header file associated with a binary data file must meet one of the following conditions:

  1. It has the file name of the binary file plus the extension “.hdr”, or
  2. It has the file name of the binary file, where the last part starting with a dot is replaced by “.hdr”.

If a header file matches the first condition, the EnMAP-Box ignores any header file that matches only the second condition. The following table gives examples of file names that follow or violate this convention.

Table 1: Examples for names of raster files

| File Encoding | File Names | Filename as shown in FileList |
|---|---|---|
| Matching file names | | |
| Binary, Text | Biomass, Biomass.hdr | Biomass |
| Binary, Text | Biomass.sample1, Biomass.sample1.hdr | Biomass.sample1 |
| Binary, Text | Biomass.sample1, Biomass.hdr | Biomass.sample1 |
| Binary, Text | Biomass.hdr, Biomass.hdr.hdr | Biomass.hdr |
| Binary, Binary, Text | Biomass.sample1, Biomass.sample2, Biomass.hdr | Biomass.sample1, Biomass.sample2 |
| Binary, Text, Text | Biomass.sample1, Biomass.hdr, Biomass.sample1.hdr | Biomass.sample1 |
| Non-matching file names | | |
| Binary, Text | Biomass.hdr, Biomass | |
| Text, Text | Biomass, Biomass.hdr | |
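
The naming convention above can be sketched as a small lookup routine. This is only an illustration in Python, not the EnMAP-Box implementation; the function name find_header is hypothetical, and the check that the result differs from the data file itself is an assumption made to reproduce the non-matching rows of Table 1.

```python
from pathlib import Path
from typing import Optional


def find_header(data_file: str) -> Optional[Path]:
    """Return the header path for a binary raster file, or None if none exists."""
    data_path = Path(data_file)

    # Condition 1: full file name plus ".hdr". It takes precedence.
    appended = data_path.with_name(data_path.name + ".hdr")
    if appended.is_file():
        return appended

    # Condition 2: the last part starting with a dot is replaced by ".hdr".
    if data_path.suffix:
        replaced = data_path.with_suffix(".hdr")
        # Assumption: a file cannot serve as its own header.
        if replaced != data_path and replaced.is_file():
            return replaced

    return None
```

For a directory that contains Biomass.sample1, Biomass.hdr and Biomass.sample1.hdr, calling find_header on Biomass.sample1 would select Biomass.sample1.hdr, in line with the precedence rule.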

Binary Data File

On physical storage, the raster data is stored as a flat binary stream. The order of pixel value positions in this stream depends on the selected type of data interleave. Three types are supported: Band Sequential (BSQ), Band Interleaved by Line (BIL) and Band Interleaved by Pixel (BIP).

Figure 1 gives an example of a raster data set that consists of nine pixels. Each pixel has a numeric value for each of the three bands, represented by the colors red, green and blue. The shading indicates a pixel's position within a row, i.e. its column number. Since uninterrupted read and write operations on physical storage are usually faster than interrupted ones, choosing the interleave to match the access pattern can speed up processing and reduce memory requirements.

image001.gif
Figure 1: Illustration of a raster image with nine pixels and three bands.

With the BSQ interleave, pixel values are stored sequentially by pixel column first, then by pixel row, and finally by band. This leads to a byte stream in which all pixel values of one band are contiguous and can be read at once, as shown in Figure 2.

image002.gif
Figure 2: Byte stream of pixel values according to the used interleave.

The BIP interleave results in a byte stream in which the pixel values are ordered by band first, followed by column and line. This allows the fastest access to the full spectral profile of a single pixel. The BIL interleave is a compromise between BSQ and BIP: the pixel values are ordered by column first, then by band, and finally by row (or line). This can be advantageous for applications that require the full spectral information of a complete image line.
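
The three storage orders can be expressed as index arithmetic. The following Python sketch computes the zero-based position of a pixel value in the stream, counted in values rather than bytes; the function name value_index and the zero-based row, column and band indices are choices made for this illustration.

```python
def value_index(interleave, row, col, band, lines, samples, bands):
    """Zero-based position of one pixel value in the flat stream (in values)."""
    if interleave == "bsq":
        # Band varies slowest, column fastest.
        return (band * lines + row) * samples + col
    if interleave == "bil":
        # Row varies slowest, then band, column fastest.
        return (row * bands + band) * samples + col
    if interleave == "bip":
        # Row varies slowest, then column, band fastest.
        return (row * samples + col) * bands + band
    raise ValueError(f"unknown interleave: {interleave!r}")
```

To obtain a byte position, the result would be multiplied by the number of bytes per value and added to any header offset. For the 3 x 3 x 3 example of Figure 1, the value in the first row, second column, third band sits at position 19 in BSQ, 7 in BIL and 5 in BIP.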

Header File Information

The header file is an ASCII-coded text file that describes the data stored in a binary data file. It always starts with the text ENVI in the first row. The following rows contain the required and optional metadata tags. A single tag is structured as either:

1.	<tag name> = <tag value>

or

2.	<tag name> = {<tag value1>,<tag value2>,…, <tag value n>}
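
The two tag forms can be read with a few lines of code. The following is a minimal Python sketch, not the parser used by the EnMAP-Box; the function name parse_envi_header is hypothetical. It assumes that tag names are case-insensitive, that a list value may continue over several rows until the closing brace, and that list elements contain no commas (a free-text description with commas would be split).

```python
def parse_envi_header(text):
    """Parse ENVI-style header text into a dict of tag name -> str or list of str."""
    lines = iter(text.splitlines())
    if next(lines, "").strip() != "ENVI":
        raise ValueError('header must start with "ENVI" in the first row')

    tags = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank or unrecognized rows
        name, _, value = line.partition("=")
        name, value = name.strip().lower(), value.strip()

        # Form 2: a list value may span several rows until "}" is found.
        while value.startswith("{") and "}" not in value:
            following = next(lines, None)
            if following is None:
                raise ValueError(f"unterminated list for tag {name!r}")
            value += " " + following.strip()

        if value.startswith("{"):
            inner = value[1:value.rindex("}")]
            tags[name] = [item.strip() for item in inner.split(",")]
        else:
            # Form 1: a single value, kept as text.
            tags[name] = value
    return tags
```

All values are returned as strings, so a caller would convert them as needed, e.g. int(tags["samples"]).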

Depending on the type of file and its usage in the EnMAP-Box different tags are required. Table 2 lists the tags that are necessary for all files in ENVI File Format that are supported and used by the EnMAP-Box. The tags in Table 3 are not obligatory but helpful for the “daily work” with the EnMAP-Box.

Table 2: Obligatory meta tags in an ENVI File Format header file

| Meta tag | Description / Tag Value |
|---|---|
| ENVI | First row of the header file |
| samples | Number of samples / columns / pixels in a row |
| lines | Number of lines / rows / pixels in a column |
| bands | Number of bands / layers / spectral dimension |
| data type | IDL data type |
| | 1 : byte (1 byte, from 0 to 255) |
| | 2 : integer (2 bytes, from -2^15 to 2^15-1) |
| | 3 : long integer (4 bytes, from -2^31 to 2^31-1) |
| | 4 : float (4 bytes) |
| | 5 : double (8 bytes) |
| | 12 : unsigned integer (2 bytes, from 0 to 2^16-1) |
| | 13 : unsigned long integer (4 bytes, from 0 to 2^32-1) |
| | 14 : long integer 64bit (8 bytes, from -2^63 to 2^63-1) |
| | 15 : unsigned long integer 64bit (8 bytes, from 0 to 2^64-1) |
| interleave | Data storage order / interleave type |
| | bsq : band sequential |
| | bil : band interleaved by line |
| | bip : band interleaved by pixel |
| byte order | Byte order |
| | 0 : little endian |
| | 1 : big endian |
| file type | ENVI Standard |
| | ENVI Classification |
| | ENVI Spectral Library |
| | unknown or other types are mapped to ENVI Standard implicitly |
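
The data type and byte order tags together determine how the bytes of a single value are decoded. As an illustration, the sketch below maps the codes from Table 2 to format characters of Python's struct module and derives the expected size of a data file. The names STRUCT_CODES, value_format and expected_file_size are hypothetical, and only the data types listed in Table 2 are covered.

```python
import struct

# Data type code from Table 2 -> struct format character.
STRUCT_CODES = {
    1: "B",   # byte
    2: "h",   # integer, 2 bytes
    3: "i",   # long integer, 4 bytes
    4: "f",   # float
    5: "d",   # double
    12: "H",  # unsigned integer, 2 bytes
    13: "I",  # unsigned long integer, 4 bytes
    14: "q",  # long integer 64bit
    15: "Q",  # unsigned long integer 64bit
}


def value_format(data_type, byte_order):
    """struct format for one value; byte order 0 = little, 1 = big endian."""
    prefix = "<" if byte_order == 0 else ">"
    return prefix + STRUCT_CODES[data_type]


def expected_file_size(samples, lines, bands, data_type, header_offset=0):
    """Number of bytes a data file should have for the given header values."""
    bytes_per_value = struct.calcsize("<" + STRUCT_CODES[data_type])
    return header_offset + samples * lines * bands * bytes_per_value
```

Comparing expected_file_size with the actual size on disk is a cheap consistency check before reading, e.g. 300 samples, 300 lines, 114 bands and data type 2 give 20520000 bytes.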


Table 3: Optional meta tags

| Meta tag | Description / Tag Value |
|---|---|
| map info | Geographic coordinate information of the raster image using the format: |
| | map info = {reference, pixel x, pixel y, pixel easting, pixel northing, x-size of pixel, y-size of pixel, projection zone, North or South, Datum, size unit} |
| | Example: map info = {UTM, 1, 1, 390749.250, 5820819.800, 3.5, 3.5, 33, North, WGS-84, units=Meters} |
| | Providing this tag allows the EnMAP-Box to link the representation of different images. |
| data ignore value | A pixel value that is given to undefined, invalid or masked pixels. |
| description | General file description |
| band names | Description for each single band |
| default bands | Default bands to display using the RGB color scheme: default bands = {<band R>, <band G>, <band B>} |
| wavelength | List of wavelengths; the number of elements must be equal to the number of bands (tag wavelength units must be set) |
| wavelength units | Physical unit of spectral values, usually Nanometers or Micrometers |
| fwhm | Full width at half maximum values for each band (tag wavelength units must be set) |
| data gain values | Gain values for each band |
| data offset values | Data offset value for each band |
| spectra names | Names for each spectrum in a Spectral Library |
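
The map info tag ties one reference pixel to a map coordinate, from which the coordinates of any other pixel follow. The Python sketch below shows that arithmetic under explicit assumptions that the source does not state: the image is north-up without rotation, the northing decreases with increasing row, and col and row use the same numbering as the reference pixel given in the tag. The function name pixel_to_map is hypothetical.

```python
def pixel_to_map(map_info, col, row):
    """Map coordinate (easting, northing) of a pixel position.

    map_info is the list of values of the map info tag, as text or numbers.
    Assumes a north-up image without rotation.
    """
    ref_x, ref_y = float(map_info[1]), float(map_info[2])
    ref_easting, ref_northing = float(map_info[3]), float(map_info[4])
    x_size, y_size = float(map_info[5]), float(map_info[6])

    easting = ref_easting + (col - ref_x) * x_size
    # Rows count downwards, so the northing decreases with the row number.
    northing = ref_northing - (row - ref_y) * y_size
    return easting, northing
```

With the example tag from Table 3, the position two columns to the right of and one row below the reference pixel lies 7 m further east and 3.5 m further south.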

Files used by EnMAP-Box

Standard Image Files

Standard image files can contain multiple bands of categorical or continuous values. They use the meta tag file type = ENVI Standard, which is also assumed implicitly if the value of file type is unknown or unspecified.


Example header for file type ENVI Standard: Hymap_Berlin-A_Image.hdr

ENVI
samples = 300
lines = 300
bands = 114
data type = 2
interleave = bsq
byte order = 0
file type = envi standard
default bands = { 26, 73, 15}
map info = { UTM, 1.000, 1.000, 390749.250, 5820819.800, 
3.5000000000e+000,3.5000000000e+000, 
33, North, WGS-84, units=Meters}
wavelength units = micrometers
fwhm = {0.015000000, 0.015000000, ... , 0.017000000}
wavelength = {0.45200000, 0.46440000, ... , 2.4546000}

Classification Files

These files store categorical data values, e.g. classification results. They use the meta tag file type = ENVI Classification. The raster image is expected to contain pixel values from 0 to classes - 1, where classes is the number of classes including the class unclassified.

The value zero is reserved for unclassified pixels. For instance, a reference data set might have all pixels set to zero that are not used for the validation of a classification. Files of this type must provide all meta tags listed in Table 2 and Table 4.

Table 4: Meta tags additionally required for Classification Files

| Meta tag | Description / Tag Value |
|---|---|
| file type | file type = ENVI Classification |
| classes | The number of classes including the class unclassified |
| class lookup | Specification of color representation. Each class is assigned to a specific RGB value |
| class names | A name for each class including class unclassified |


Example for file type ENVI Classification: Hymap_Berlin-A_Classification-Training-Sample.hdr

ENVI
samples = 300
lines   = 300
bands   = 1
file type = ENVI Classification
data type = 1
interleave = bsq
classes = 6
class lookup = {
   0,   0,   0,   
   0, 255,   0,
 255,   0,   0,
 255, 255,   0,
   0, 255, 255,
   0,   0, 255 }
class names = {
Unclassified, vegetation, built-up, impervious, soil, water}
byte order = 0
map info = {UTM, 1.000, 1.000, 390749.250, 5820819.800, 
3.5000000000e+000, 3.5000000000e+000, 33, North, WGS-84, 
units=Meters}
band names = {Classification Band}
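
In the header above, class lookup is a flat list in which every three consecutive values form the RGB color of one class, in the same order as class names. A short Python sketch of that pairing follows; the function name class_table is hypothetical, and the inputs are assumed to be already split into lists.

```python
def class_table(class_names, class_lookup):
    """Return {pixel value: (class name, (R, G, B))} for a classification file."""
    values = [int(v) for v in class_lookup]
    if len(values) != 3 * len(class_names):
        raise ValueError("class lookup needs exactly three values per class")
    return {
        index: (name, tuple(values[3 * index:3 * index + 3]))
        for index, name in enumerate(class_names)
    }
```

Applied to the example, pixel value 0 maps to Unclassified with color (0, 0, 0) and pixel value 5 maps to water with color (0, 0, 255).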

Regression Files

Regression files store continuous numerical values (in contrast to classification files), as is the case for many regression references and estimation results. These files must provide the data ignore value, e.g. data ignore value = -1. It marks unlabeled pixels and distinguishes valid pixels from pixels that were masked out. Even if every pixel of an image is valid, the data ignore value tells other routines which value they can use for this purpose. Files of this type must provide all meta tags listed in Table 2 and Table 5.

Table 5: Meta tags additionally required for Regression Files

| Meta tag | Description / Tag Value |
|---|---|
| file type | file type = ENVI Standard |
| data ignore value | The value of masked pixels |
| bands | bands = 1 |


Example for a regression image with file type ENVI Standard: Hymap_Berlin-B_Regression-GroundTruth.hdr (in the test data the file type is still set to regression)

ENVI
samples = 300
lines   = 300
bands   = 1
file type = ENVI Standard
data type = 4
interleave = bsq
byte order = 0
map info = {UTM, 1.000, 1.000, 385271.750, 5821155.750, 1.0500010500e+001, 1.0500010500e+001, 33, North, WGS-84, units=Meters}
data ignore value = -1
band names = {Regression Band}
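
Any routine that summarizes a regression image has to skip the pixels carrying the data ignore value. As a minimal Python illustration, with the hypothetical function name valid_mean and the pixel values assumed to be already read into a list:

```python
def valid_mean(values, data_ignore_value):
    """Mean of all pixel values that differ from the data ignore value."""
    valid = [v for v in values if v != data_ignore_value]
    if not valid:
        return None  # every pixel is unlabeled or masked
    return sum(valid) / len(valid)
```

With data ignore value = -1 as in the header above, the values 0.5, -1, 1.5 and -1 would yield a mean of 1.0 over the two valid pixels.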

Masks

Mask files can be used to exclude pixels from certain processes. This restricts operations to regions of interest and reduces computational costs. Any raster file that is described by the meta tags given in Table 2 can be used as a mask file. By default, all pixels with a value of zero mark positions that are masked and ignored during an operation. This can be changed by setting the mask value explicitly with data ignore value = <your mask value>.

Table 6: Meta tags additionally required for Mask Files

| Meta tag | Description / Tag Value |
|---|---|
| file type | Any file type, even classification and regression files |
| bands | bands = 1 |
| data ignore value | Explicit value of masked pixels. If not defined, a value of zero is used by default. |
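
The masking rule can be sketched in a few lines of Python. The function name apply_mask is hypothetical; both rasters are assumed to be single-band, of equal size and already flattened into lists, and mask_value stands for the data ignore value of the mask file with the default of zero described above.

```python
def apply_mask(values, mask, mask_value=0):
    """Keep only the pixel values whose mask pixel differs from mask_value."""
    if len(values) != len(mask):
        raise ValueError("image and mask must have the same number of pixels")
    return [v for v, m in zip(values, mask) if m != mask_value]
```

Because any raster can act as a mask, a classification file could be passed as mask directly, which would exclude all unclassified pixels.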

Spectral Libraries

Spectral Libraries store spectra without a spatial context. ENVI Spectral Library files store each spectral profile in a separate image line. Therefore the number of lines is lines = <number of profiles> and the number of samples is samples = <number of wavebands>. Furthermore, bands = 1 and interleave = bsq. As a consequence, although the header declares a BSQ interleave, the spectra are physically stored in BIP order. The EnMAP-Box makes use of this by permuting the header information as follows: interleave = bip, bands = <number of wavebands>, samples = 1, lines = <number of profiles>. This way, EnMAP-Box applications can handle Spectral Libraries like normal images, e.g. for parameterizing a supervised classifier.

Table 7: Meta tags additionally required for Spectral Libraries

| Meta tag | Description / Tag Value |
|---|---|
| file type | file type = ENVI Spectral Library |
| interleave | interleave = bsq |
| bands | bands = 1 |
| lines | Number of spectral profiles |
| samples | Number of wavebands |


Example for file type ENVI Spectral Library: Spectral Library with five spectra, each having values within 235 channels.

ENVI
description = {Spectral Library Example}
samples = 235
lines   = 5
bands   = 1
header offset = 0
file type = ENVI Spectral Library
data type = 5
interleave = bsq
byte order = 0
reflectance scale factor = 1.000000
band names = { Spectral Library}
spectra names = {
 Spectrum1, Spectrum2, Spectrum3, 
 Spectrum4, Spectrum5}
Wavelength units = Nanometers
wavelength = {
  423.709991, 429.450012, 434.910004, 440.179993,  445.299988,    
  450.320007, . . . 
}
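
The header permutation described for Spectral Libraries can be sketched as follows. This Python fragment only illustrates the idea and is not the EnMAP-Box code; the function name permute_library_header is hypothetical, and the header is assumed to be a dict with numeric values for samples, lines and bands.

```python
def permute_library_header(tags):
    """Return a copy of a Spectral Library header rewritten as a BIP image."""
    if tags.get("file type", "").lower() != "envi spectral library":
        raise ValueError("not a Spectral Library header")
    permuted = dict(tags)
    permuted["interleave"] = "bip"
    permuted["bands"] = tags["samples"]  # wavebands become the band dimension
    permuted["samples"] = 1              # one pixel per line
    # lines stays the number of spectral profiles
    return permuted
```

For the example above, this yields bands = 235, samples = 1 and lines = 5. The binary file itself is untouched: value number w of profile p sits at position p * 235 + w under both descriptions.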
