2. HDF5 format
2.1. Introduction
Like many scientific data formats (CGNS, MED, SILO), Amelet HDF is based upon HDF5.
HDF5 (http://www.hdfgroup.org/HDF5) is a very flexible file format developed by the HDF Group.
According to the web page:
“HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.”
The main features of HDF5 are:
The data model can represent very complex data objects
A portable data format
A library that runs on all platforms and implements a high-level API with C, C++, Fortran and Java interfaces
Access time and storage optimization
Tools for viewing the data collection
A complete documentation and set of examples (tests) for all languages
XML would be a really good candidate to express such data, and queries could be performed with technologies like XPath. Unfortunately, Amelet HDF aims to be scalable, portable and cross-language, and there is no XML solution in the Fortran world for reading and writing XML documents.
2.1.1. Editing tools
Furthermore, the HDF Group provides tools to view or manipulate HDF5 files:
hdfview ( http://www.hdfgroup.org/hdf-java-html/hdfview/ ) can:
- view a file
- create a new file
- modify the content
- modify attributes
gif2h5
- Converts a GIF file into HDF5
h5import
- Imports ASCII or binary data into HDF5
h5diff
- Compares two HDF5 files and reports the differences
h5repack
- Copies an HDF5 file to a new file with or without compression/chunking
h52gif
- Converts an HDF5 file into GIF
h5cc, h5fc, h5c++
- Simplifies compiling an HDF5 application
h5dump
- Enables the user to examine the contents of an HDF5 file and dump those contents to an ASCII file
h5jam/h5unjam
- Add/Remove text to/from the user block at the beginning of an HDF5 file
h5ls
- Lists selected information about file objects in the specified format
h5repart
- Repartitions a file or family of files
h5copy
- Copies objects to a new HDF5 file
h5mkgrp
- Makes a group in an HDF5 file
h5stat
- Displays object and metadata information for an HDF5 file
The Python world offers very good tools for editing HDF5 documents:
h5py: “The HDF5 library is a versatile, mature library designed for the storage of numerical data. The h5py package provides a simple, Pythonic interface to HDF5. A straightforward high-level interface allows the manipulation of HDF5 files, groups and datasets using established Python and NumPy metaphors.” ( http://h5py.alfven.org/ )
pytables is a Python module that handles the HDF5 format, as h5py does ( http://www.pytables.org/moin )
vitables (http://vitables.berlios.de) is a graphical interface based upon the Python pytables module (http://www.pytables.org/moin)
Languages for technical computing also have some HDF5 capabilities:
Matlab provides capabilities to read/write HDF5 with the functions hdf5info, hdf5read and hdf5write.
2.1.2. Data organization
An HDF5 file is hierarchically organized like a file system (there are directories and files). The main kinds of objects are:
Group. It looks like a directory in a file system. It can contain other objects.
Dataset. It represents a multidimensional typed matrix and is contained in a group, as a file is contained in a directory in a file system.
Table. It is a special dataset and represents multi-column data.
Each object is located by an absolute or relative path from the root node or from another node.
Each object can be described by attributes. An attribute is a (key, value) pair, and its value can be of any of the types supported by HDF5, such as integer, real, boolean, string.
A file can then be represented by a tree structure, like directories and files in a file system explorer tool. Groups are directories and datasets (and tables) are files:
data.h5/
|-- dataset1[@type=a_type]
|-- table1
|-- group1
| |-- dataset2
| |-- dataset3
| |-- table2
| |-- group2
| | |-- table3
| | `-- table4
| `-- table5
|-- table2
`-- dataset3
The h5 extension is often associated with HDF5 files. Elements are located by their absolute path from the root or by their relative path from a parent group, for instance:
/group1/group2/table3 is a valid absolute path to reach table3 in group2 in group1
group2/table3 is a valid relative path to reach table3 from /group1
Therefore, two elements can have the same name if they do not have the same parent: /dataset3 and /group1/dataset3 can coexist in an HDF5 file.
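The relation between relative and absolute paths can be sketched in a few lines of C. This is only an illustration: resolve_path is a hypothetical helper, not an HDF5 function, and HDF5 itself resolves relative names against the identifier passed to each call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of HDF5): builds the absolute path of
 * `name` as seen from the group `parent`.
 * Returns 0 on success, -1 if `out` is too small. */
int resolve_path(const char *parent, const char *name,
                 char *out, size_t out_size)
{
    int written;

    if (name[0] == '/') {
        /* Already absolute: the parent group is irrelevant. */
        written = snprintf(out, out_size, "%s", name);
    } else if (strcmp(parent, "/") == 0) {
        /* Avoid a double slash when the parent is the root group. */
        written = snprintf(out, out_size, "/%s", name);
    } else {
        written = snprintf(out, out_size, "%s/%s", parent, name);
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Called with the parent /group1 and the name group2/table3, the helper produces /group1/group2/table3, which is the absolute path given above.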
In this document, attributes of HDF5 elements are represented like XML attributes: they are preceded by @ and enclosed in square brackets, and no quotes are used around the value.
All HDF5 examples can be opened with hdfview (version 2.4); the preceding example opened with it is presented below:
2.2. HDF5 modules
There are two versions of HDF5 in production :
the version 1.6, the last release is 1.6.4
the new version 1.8, the last release is 1.8.2. The main feature that comes with the version 1.8 is the Lite API : “The HDF5 Lite API consists of higher-level functions which do more operations per call than the basic HDF5 interface. The purpose is to wrap intuitive functions around certain sets of features in the existing APIs. This version of the API has two sets of functions: dataset and attribute related functions.”
The HDF5 format can be read and written with a library that is also developed by the HDF Group; this library can be downloaded from http://www.hdfgroup.org/HDF5/release/obtain5.html.
Note
Since the Amelet HDF specification is dedicated to scientific applications, examples will be given in Fortran and sometimes in C.
Amelet HDF can be read almost entirely with the Lite API.
First of all, to manipulate an HDF5 file, the modules which have to be loaded are:
( see example1.f90 )
! The HDF5 API
use hdf5
! The lite API
use h5lt
2.3. Open and close a file
The first step is the initialization of the HDF5 library, then we can open a file :
( see example2.f90 )
! Variable declaration
character(len=*), parameter :: filename = "data.h5"
integer(hid_t) :: file_id
integer :: hdferr
! Library initialization (native type reading)
call h5open_f(hdferr)
! Generally, if hdferr is negative a problem occurred
if (hdferr < 0) then
print *, "h5open_f, KO"
end if
! Open a file
call h5fopen_f(filename, H5F_ACC_RDONLY_F, file_id, hdferr, H5P_DEFAULT_F)
H5F_ACC_RDONLY_F is an HDF5 constant indicating that the file is opened in read-only mode
file_id is the file identifier returned by HDF5
hdferr is the error code returned by the function
Note
Pay attention to the unfamiliar hid_t kind of file_id: the Fortran type kind must be respected.
Finally, close the file:
! Close filename file
call h5fclose_f(file_id, hdferr)
As we can see, in Fortran the last mandatory argument is always hdferr (or whatever integer variable is used); optional arguments such as H5P_DEFAULT_F may follow it. This argument is the error code returned by HDF5 functions. If hdferr is negative, something went wrong.
It’s a good habit to check the hdferr value, though for the sake of clarity this is the last time we perform checks in the examples.
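The same convention holds in the C API, where functions returning an herr_t (a plain int) yield a negative value on failure. A small helper keeps the checks short enough that they need not be dropped. The sketch below is an assumption for illustration: hdf5_failed is a hypothetical name, and the status is passed as an int so that the block compiles without the HDF5 headers.

```c
#include <stdio.h>

/* Hypothetical helper: reports a failed call and returns 1 on failure,
 * 0 on success. `status` stands for the herr_t value returned by an
 * HDF5 C function, which is negative when the call failed. */
int hdf5_failed(int status, const char *call_name)
{
    if (status < 0) {
        fprintf(stderr, "%s failed (status %d)\n", call_name, status);
        return 1;
    }
    return 0;
}
```

A caller would typically write something like if (hdf5_failed(status, "H5Fclose")) return 1; right after each call.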
2.4. The HDF5 lite API
Amelet HDF is designed to be easily readable by a person, and this legibility carries over to the source code level. To help with this, HDF5 provides an API of higher-level functions which do more operations per call than the basic HDF5 interface; it therefore becomes straightforward to walk through an Amelet HDF file.
For instance, it is possible to read the number of records and the number of fields of a table with a single function :
! Table's name
character(len=*), parameter :: table_absolute_name = "/a_table"
! Number of columns (fields) in a table
integer(hsize_t) :: nfields
! Number of rows (records) in a table
integer(hsize_t) :: nrecords
! Error code
integer :: hdferr
call h5tbget_table_info_f(file_id, table_absolute_name, &
nfields, nrecords, hdferr)
Amelet HDF can be almost entirely read with the Lite API; the functions used are presented in the following sections.
2.4.1. Query for table’s information
It is possible to get a table’s information with the function h5tbget_table_info_f. The function returns:
The number of columns (fields) of a table
The number of rows (records) of a table
(see read-table.f90)
The signature of h5tbget_table_info_f is:
! The parent id
integer(hid_t) :: loc_id
! Table name
character(len=*), parameter :: table_name = "/a_table"
! Number of columns (fields) in a table
integer(hsize_t) :: nfields
! Number of rows (records) in a table
integer(hsize_t) :: nrecords
! Error code
integer :: hdferr
call h5tbget_table_info_f(loc_id, table_name, nfields, nrecords, hdferr)
2.4.2. Read the records of a table
Table records can be read with the function h5tbread_field_name_f. It takes an already allocated buffer and returns:
the buffer containing the values read from the named column
(see read-table.f90)
! The file id
integer(hid_t) :: file_id
! Table name
character(len=*), parameter :: table_name = "/a_table"
! The field's name to be read
character(len=*), parameter :: field_name = "a_field"
! the reading start row
integer(hsize_t) :: start
! Number of read rows
integer(hsize_t) :: nrecords
! The type size
integer(size_t) :: type_size
! If data are real (allocate data_buffer(nrecords) before the call)
real, dimension(:), allocatable :: data_buffer
! Error code
integer :: hdferr
call h5tbread_field_name_f(file_id, table_name, field_name, &
start, nrecords, type_size, data_buffer, hdferr)
2.4.3. Check the presence of an attribute
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Does attribute exist ?
logical :: attribute_exists
! Error code
integer :: hdferr
call h5aexists_by_name_f(file_id, element_name, attribute_name, &
attribute_exists, hdferr, H5P_DEFAULT_F)
2.4.4. Read attribute’s information
The function h5ltget_attribute_info_f reads attribute information; it returns:
The dimensions of the attribute (an attribute can be an array).
The class identifier
The size of the datatype in bytes
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Dimensions
integer(hsize_t), dimension(:), allocatable :: dims
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr
call h5ltget_attribute_info_f(file_id, element_name, attribute_name, &
dims, type_class, type_size, hdferr)
2.4.5. Read a string attribute
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value
character(len=20) :: attribute_value = ""
! Error code
integer :: hdferr
call h5ltget_attribute_string_f(file_id, element_name, attribute_name, &
attribute_value, hdferr)
2.4.6. Read an integer attribute
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer)
integer, dimension(1) :: attribute_value
! Error code
integer :: hdferr
call h5ltget_attribute_int_f(file_id, element_name, attribute_name, &
attribute_value, hdferr)
2.4.7. Read a float attribute
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer)
real, dimension(1) :: attribute_value
! Error code
integer :: hdferr
call h5ltget_attribute_float_f(file_id, element_name, attribute_name, &
attribute_value, hdferr)
2.4.8. Read a double attribute
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Attribute's name
character(len=*), parameter :: attribute_name = "an_attribute"
! Attribute's value (the Lite API expects an array buffer)
double precision, dimension(1) :: attribute_value
! Error code
integer :: hdferr
call h5ltget_attribute_double_f(file_id, element_name, attribute_name, &
attribute_value, hdferr)
2.4.9. Read a dataset’s information
The function h5ltget_dataset_info_f reads a dataset’s information; it returns:
The dimensions of the dataset
The class identifier
The size of the datatype in bytes
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Dimensions (allocate to the dataset rank before the call)
integer(hsize_t), dimension(:), allocatable :: dims
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr
call h5ltget_dataset_info_f(file_id, element_name, &
dims, type_class, type_size, hdferr)
2.4.10. Read a float dataset
Dataset values can be read with the function h5ltread_dataset_float_f; the data buffer memory must be allocated before the call.
! File id
integer(hid_t) :: file_id
! Element's name
character(len=*), parameter :: element_name = "/an_element"
! Dimensions (allocate and fill before the call)
integer(hsize_t), dimension(:), allocatable :: dims
! Dataset values (rank 1 here; allocate before the call)
real, dimension(:), allocatable :: dataset_value
! Type class
integer :: type_class
! Type size in bytes
integer(size_t) :: type_size
! Error code
integer :: hdferr
call h5ltread_dataset_float_f(file_id, element_name, &
dataset_value, dims, hdferr)
2.4.11. Inquire if a dataset exists
h5ltfind_dataset_f inquires whether a dataset exists. It returns 1 if the dataset exists and 0 otherwise.
! file or group identifier
integer(hid_t) :: loc_id
! name of the dataset
character(len=*), parameter :: dataset_name = "/an_element"
! 1 if the dataset exists, 0 otherwise
integer :: result
! error code
integer :: hdferr
result = h5ltfind_dataset_f(loc_id, dataset_name, hdferr)
2.4.12. Groups functions
In addition, some query functions about groups are used.
Read the number of members in a group :
! file or group identifier
integer(hid_t) :: loc_id
! name of the group
character(len=*), parameter :: group_name = "/an_element"
! number of members in the group
integer :: nmembers
! error code
integer :: hdferr
call h5gn_members_f(loc_id, group_name, nmembers, hdferr)
Read the name of the members of a group :
! File or group identifier
integer(hid_t) :: loc_id
! Name of the group
character(len=*), parameter :: group_name = "an_element"
! Index of the member (from 0 to nmembers-1)
integer :: index
! Name of the member (output, so it cannot be a parameter)
character(len=256) :: member_name
! Possible member types
! H5G_LINK_F if member is a link
! H5G_GROUP_F if member is a group
! H5G_DATASET_F if member is a dataset
! H5G_TYPE_F if member is a type
integer :: member_type
! Error code
integer :: hdferr
call h5gget_obj_info_idx_f(loc_id, group_name, index, &
member_name, member_type, hdferr)
2.5. Integers and reals
By default in Amelet HDF, all integers are 32-bit integers.
As for reals, the definition of Amelet HDF objects does not require reals written on more than 32 bits. So by default, all reals are 32-bit floats and complex numbers are 2 x 32-bit (see The complex type).
Longer reals can be used in Vector and DataSet (see Vector and DataSet) to take into account the precision of computed numerical data.
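Whether a value needs a longer real can be checked by a round trip through a 32-bit float. The following C sketch assumes that float is an IEEE 754 single-precision type, which holds on common platforms; fits_in_float32 is a hypothetical helper name.

```c
/* Hypothetical helper: returns 1 if `value` survives a round trip
 * through a 32-bit float without loss, 0 otherwise.
 * The volatile qualifier forces the value to be really narrowed. */
int fits_in_float32(double value)
{
    volatile float narrowed = (float)value;
    return (double)narrowed == value;
}
```

For instance, 16777216 (2^24) is stored exactly in a 32-bit float while 16777217 is not, so data of that magnitude and resolution would call for a 64-bit real.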
In practice, HDFView and h5dump show the data type, which is useful to check when writing data in Amelet HDF format.
2.6. The complex type
Natively, HDF5 does not provide a complex number type. However, it offers a very powerful mechanism for creating user-defined types.
There are two ways to organize a complex number:
as an array of two elements: A(dim=2) = (r, i) is a two-element array where A(0) (A(1) in Fortran) is the real part and A(1) (A(2) in Fortran) is the imaginary part.
as a dictionary (a compound datatype) with two key/value pairs, A(“r”) = r and A(“i”) = i.
Amelet HDF uses the compound approach. Although it is not the simplest formulation, because it is not accessible from the Lite API, it is the strategy followed by some other tools like Octave or PyTables.
That is to say, a complex number is always a compound datatype of two elements (r, i).
2.6.1. Read a complex type
Even if a complex type is a compound structure, the two real or double numbers are written as if they were two consecutive elements of an array:
! File or group identifier
integer(hid_t) :: loc_id
! Name of the group
character(len=*), parameter :: element_name = "/an_element"
! The complex attribute is a 2 elements array
real, dimension(2) :: complex_attribute = (/0.0, 0.0/)
! Error code
integer :: hdferr
call h5ltget_attribute_float_f(loc_id, element_name, "complex_attribute", &
complex_attribute, hdferr)
print *, "Complex attribute value :", complex_attribute
complex_attribute is defined as an array of two reals. The h5ltget_attribute_float_f function fills the array with the r field and the i field of the compound complex attribute structure. Therefore, complex_attribute(1) equals r and complex_attribute(2) equals i.
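The reason why a compound (r, i) can be read as a two-element array is that both have the same memory layout. The C sketch below illustrates this under the assumption that the compound holds two 32-bit reals in the order r, i with no padding; complex_f32 and the two functions are hypothetical names, not part of HDF5 or Amelet HDF.

```c
#include <stddef.h>
#include <string.h>

/* Assumed in-memory image of the compound complex type: two 32-bit
 * reals named r and i, in that order. */
typedef struct {
    float r;
    float i;
} complex_f32;

/* Returns 1 when the struct has no padding, i.e. when it has exactly
 * the layout of a float[2]. */
int complex_is_two_floats(void)
{
    return sizeof(complex_f32) == 2 * sizeof(float)
        && offsetof(complex_f32, r) == 0
        && offsetof(complex_f32, i) == sizeof(float);
}

/* Copies a compound value into a two-element array:
 * out[0] receives r and out[1] receives i. */
void complex_to_array(const complex_f32 *z, float out[2])
{
    memcpy(out, z, 2 * sizeof(float));
}
```

The layout check is worth running once on a new platform before relying on the array view.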
2.7. Table and Dataset
We have seen that HDF5 defines tables and datasets. A dataset is a multidimensional matrix in which each cell contains data of the same nature (integer, float, ...). A table is like a spreadsheet: it has several columns which can contain data of different natures.
In Amelet HDF, datasets are used by default when all the data are of the same nature, even if the data can be seen as columns.
For example, consider the data structure (name, path). A list of (name, path) can be written with two columns:
name | path
---|---
$name1 | $path1
$name2 | $path2
$name3 | $path3
Tables are presented with column headers as HDFView does.
One could create an HDF5 table with two string columns, but since both columns contain strings, a dataset is sufficient.
Warning
Amelet HDF has made the choice to use an (n x 2) dataset:
$name1 | $path1
$name2 | $path2
$name3 | $path3
Note
In fact, a table is a (n x 1) dataset with a compound datatype.
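The difference between the two layouts can be sketched with plain C types. The fixed string length, the row count and all the names below are assumptions made for the illustration; they are not imposed by HDF5 or Amelet HDF.

```c
#include <string.h>

#define N_ROWS  3
#define STR_LEN 16   /* assumed fixed string length, for illustration */

/* Table view: one record per row, i.e. an (n x 1) array of a compound type. */
typedef struct {
    char name[STR_LEN];
    char path[STR_LEN];
} name_path_record;

/* Dataset view chosen by Amelet HDF: an (n x 2) array of strings. */
typedef char name_path_dataset[N_ROWS][2][STR_LEN];

/* Copies the table records into the (n x 2) dataset layout. */
void table_to_dataset(const name_path_record table[N_ROWS],
                      name_path_dataset dataset)
{
    for (int row = 0; row < N_ROWS; ++row) {
        memcpy(dataset[row][0], table[row].name, STR_LEN);
        memcpy(dataset[row][1], table[row].path, STR_LEN);
    }
}
```

Both views hold the same information; the dataset view simply needs no compound datatype because every cell is a string.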