1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-09 23:45:36 +00:00

Documentation update (briefly) covering serialisation changes. For review

This commit is contained in:
Michael Marshall 2021-06-01 15:49:37 +01:00
parent 2b1fcd78c3
commit 393727b93b
3 changed files with 87 additions and 1 deletions

1
.gitignore vendored
View File

@ -88,6 +88,7 @@ Thumbs.db
# build directory #
###################
build*/*
Documentation/_build
# IDE related files #
#####################

Binary file not shown.

View File

@ -1787,7 +1787,7 @@ Hdf5Writer Hdf5Reader HDF5
Write interfaces, similar to the XML facilities in QDP++ are presented. However,
the serialisation routines are automatically generated by the macro, and a virtual
reader adn writer interface enables writing to any of a number of formats.
reader and writer interface enables writing to any of a number of formats.
**Example**::
@ -1814,6 +1814,91 @@ reader adn writer interface enables writing to any of a number of formats.
}
Eigen tensor support -- added 2019H1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Serialisation library was expanded in 2019 to support de/serialisation of
Eigen tensors. De/serialisation of existing types was not changed. Data files
without Eigen tensors remain compatible with earlier versions of Grid and other readers.
Conversely, data files containing serialised Eigen tensors is a breaking change.
Eigen tensor serialisation support was added to BaseIO, which was modified to provide a Traits class
to recognise Eigen tensors with elements that are either: primitive scalars (arithmetic and complex types);
or Grid tensors.
**Traits determining de/serialisable scalars**::
// Is this an Eigen tensor
template<typename T> struct is_tensor : std::integral_constant<bool,
std::is_base_of<Eigen::TensorBase<T, Eigen::ReadOnlyAccessors>, T>::value> {};
// Is this an Eigen tensor of a supported scalar
template<typename T, typename V = void> struct is_tensor_of_scalar : public std::false_type {};
template<typename T> struct is_tensor_of_scalar<T, typename std::enable_if<is_tensor<T>::value && is_scalar<typename T::Scalar>::value>::type> : public std::true_type {};
// Is this an Eigen tensor of a supported container
template<typename T, typename V = void> struct is_tensor_of_container : public std::false_type {};
template<typename T> struct is_tensor_of_container<T, typename std::enable_if<is_tensor<T>::value && isGridTensor<typename T::Scalar>::value>::type> : public std::true_type {};
Eigen tensors are regular, multidimensional objects, and each Reader/Writer
was extended to support this new datatype. Where the Eigen tensor contains
a Grid tensor, the dimensions of the data written are the dimensions of the
Eigen tensor plus the dimensions of the underlying Grid scalar. Dimensions
of size 1 are preserved.
**New Reader/Writer methods for multi-dimensional data**::
template <typename U>
void readMultiDim(const std::string &s, std::vector<U> &buf, std::vector<size_t> &dim);
template <typename U>
void writeMultiDim(const std::string &s, const std::vector<size_t> & Dimensions, const U * pDataRowMajor, size_t NumElements);
On readback, the Eigen tensor rank must match the data being read, but the tensor
dimensions will be resized if necessary. Resizing is not possible for Eigen::TensorMap<T>
because these tensors use a buffer provided at construction, and this buffer cannot be changed.
Deserialisation failures cause Grid to assert.
HDF5 Optimisations -- added June 2021
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Grid serialisation is intended to be light, deterministic and provide a layer of abstraction over
multiple file formats. HDF5 excels at handling multi-dimensional data, and the Grid HDF5Reader/HDF5Writer exploits this.
When serialising nested ``std::vector<T>``, where ``T`` is an arithmetic or complex type,
the Hdf5Writer writes the data as an Hdf5 DataSet object.
However, nested ``std::vector<std::vector<...T>>`` might be "ragged", i.e. not necessarily regular. E.g. a 3d nested
``std::vector`` might contain 2 rows, the first being a 2x2 block and the second row being a 1 x 2 block.
A bug existed whereby this was not checked on write, so nested, ragged vectors
were written as a regular dataset, with a buffer under/overrun and jumbled contents.
Clearly this was not used in production, as the bug went undetected until now. Fixing this bug
is an opportunity to further optimise the HDF5 file format.
The goals of this change are to:
* Make changes to the Hdf5 file format only -- i.e. do not impact other file formats
* Implement file format changes in such a way that they are transparent to the Grid reader
* Correct the bug for ragged vectors of numeric / complex types
* Extend the support of nested std::vector<T> to arbitrarily nested Grid tensors
The trait class ``element`` has been redefined to ``is_flattenable``, which is a trait class for
potentially "flattenable" objects. These are (possibly nested) ``std::vector<T>`` where ``T`` is
an arithmetic, complex or Grid tensor type. Flattenable objects are tested on write
(with the function ``isRegularShape``) to see whether they actually are regular.
Flattenable, regular objects are written to a multidimensional HDF5 DataSet.
Otherwise, an Hdf5 sub group is created with the object "name", and each element of the outer dimension is
recursively written to as object "name_n", where n is a 0-indexed number.
On readback (by Grid)), the presence of a subgroup containing the attribute ``Grid_vector_size`` triggers a
"ragged read", otherwise a read from a DataSet is attempted.
Data parallel field IO
-----------------------