diff --git a/.gitignore b/.gitignore
index 5338acb9..40156f9d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -88,6 +88,7 @@ Thumbs.db
 # build directory #
 ###################
 build*/*
+Documentation/_build
 
 # IDE related files #
 #####################
diff --git a/documentation/Grid.pdf b/documentation/Grid.pdf
index 868c6db4..df3304eb 100644
Binary files a/documentation/Grid.pdf and b/documentation/Grid.pdf differ
diff --git a/documentation/manual.rst b/documentation/manual.rst
index d51f07c1..e545bdaf 100644
--- a/documentation/manual.rst
+++ b/documentation/manual.rst
@@ -1787,7 +1787,7 @@ Hdf5Writer Hdf5Reader HDF5
 Write interfaces, similar to the XML facilities in QDP++ are presented. However, the serialisation routines are automatically generated by the macro, and a virtual
-reader adn writer interface enables writing to any of a number of formats.
+reader and writer interface enables writing to any of a number of formats.
 
 **Example**::
 
@@ -1814,6 +1814,91 @@ reader adn writer interface enables writing to any of a number of formats.
 }
 
+Eigen tensor support -- added 2019H1
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Serialisation library was expanded in 2019 to support de/serialisation of
+Eigen tensors. De/serialisation of existing types was not changed. Data files
+without Eigen tensors remain compatible with earlier versions of Grid and other readers.
+Conversely, data files that contain serialised Eigen tensors constitute a breaking change.
+
+Eigen tensor serialisation support was added to BaseIO, which was modified to provide traits classes
+that recognise Eigen tensors whose elements are either primitive scalars (arithmetic and complex types)
+or Grid tensors.
+
+**Traits determining de/serialisable scalars**::
+
+  // Is this an Eigen tensor
+  template<typename T> struct is_tensor : std::integral_constant<bool, std::is_base_of<Eigen::TensorBase<T, Eigen::ReadOnlyAccessors>, T>::value> {};
+  // Is this an Eigen tensor of a supported scalar
+  template<typename T, typename V = void> struct is_tensor_of_scalar : public std::false_type {};
+  template<typename T> struct is_tensor_of_scalar<T, typename std::enable_if<is_tensor<T>::value && is_scalar<typename T::Scalar>::value>::type> : public std::true_type {};
+  // Is this an Eigen tensor of a supported container
+  template<typename T, typename V = void> struct is_tensor_of_container : public std::false_type {};
+  template<typename T> struct is_tensor_of_container<T, typename std::enable_if<is_tensor<T>::value && isGridTensor<typename T::Scalar>::value>::type> : public std::true_type {};
+
+
+Eigen tensors are regular, multidimensional objects, and each Reader/Writer
+was extended to support this new datatype. Where the Eigen tensor contains
+a Grid tensor, the dimensions of the data written are the dimensions of the
+Eigen tensor followed by the dimensions of the contained Grid tensor. Dimensions
+of size 1 are preserved.
+
+**New Reader/Writer methods for multi-dimensional data**::
+
+  template <typename U>
+  void readMultiDim(const std::string &s, std::vector<U> &buf, std::vector<size_t> &dim);
+  template <typename U>
+  void writeMultiDim(const std::string &s, const std::vector<size_t> & Dimensions, const U * pDataRowMajor, size_t NumElements);
+
+
+On readback, the rank of the Eigen tensor must match the data being read, but the tensor
+is resized if necessary. Resizing is not possible for ``Eigen::TensorMap``,
+because such a tensor wraps a buffer supplied at construction, and that buffer cannot be changed.
+Deserialisation failures cause Grid to assert.
+
+
+HDF5 Optimisations -- added June 2021
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Grid serialisation is intended to be lightweight and deterministic, and to provide a layer of abstraction over
+multiple file formats. HDF5 excels at handling multi-dimensional data, and Grid's Hdf5Reader and Hdf5Writer exploit this.
+When serialising a nested ``std::vector<T>``, where ``T`` is an arithmetic or complex type,
+the Hdf5Writer writes the data as an HDF5 DataSet object.
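To make the write-side decision concrete, here is a minimal standalone C++11 sketch of it. It is not Grid's implementation. It checks whether a nested ``std::vector`` is regular and, if so, flattens it into a dimension list plus a row-major buffer, which is the shape/data pair that a multidimensional DataSet write such as ``writeMultiDim`` takes. The sketch assumes that the innermost elements are non-vector types such as ``double`` or ``std::complex<double>``; Grid tensors are not handled. All names (``innermost``, ``collectShape``, ``flattenRegular``, ``raggedChildName``) are hypothetical, and ``collectShape`` only plays the role that the text ascribes to ``isRegularShape``.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, NOT Grid's code: mimics the regular-vs-ragged decision
// for nested std::vector whose innermost elements are not vectors.

// innermost<V>::type is the non-vector element type at the bottom of V.
template <typename T> struct innermost { using type = T; };
template <typename T> struct innermost<std::vector<T>> {
  using type = typename innermost<T>::type;
};

// Leaf case: a non-vector element contributes no further dimensions.
template <typename T>
bool collectShape(const T &, std::vector<std::size_t> &, std::size_t) {
  return true;
}

// Vector case: record the length first seen at this depth, then require
// every other vector at the same depth to have exactly that length.
template <typename T>
bool collectShape(const std::vector<T> &v, std::vector<std::size_t> &dims,
                  std::size_t depth) {
  if (dims.size() == depth)
    dims.push_back(v.size());
  else if (dims[depth] != v.size())
    return false; // ragged: vectors at the same depth differ in length
  for (const auto &e : v)
    if (!collectShape(e, dims, depth + 1)) return false;
  return true;
}

// Depth-first traversal appends leaves in row-major order (last index fastest).
template <typename S>
void flattenInto(const S &x, std::vector<S> &out) { out.push_back(x); }

template <typename T, typename S>
void flattenInto(const std::vector<T> &v, std::vector<S> &out) {
  for (const auto &e : v) flattenInto(e, out);
}

// Returns false if `nested` is ragged; otherwise fills `dims` and a row-major
// `buf`, i.e. the shape/data pair a multidimensional DataSet write needs.
template <typename V>
bool flattenRegular(const V &nested, std::vector<std::size_t> &dims,
                    std::vector<typename innermost<V>::type> &buf) {
  dims.clear();
  buf.clear();
  if (!collectShape(nested, dims, 0)) return false;
  flattenInto(nested, buf);
  std::size_t count = 1;
  for (std::size_t d : dims) count *= d;
  assert(buf.size() == count); // a regular shape flattens to prod(dims) leaves
  return true;
}

// Name of the n-th (0-indexed) child object written for a ragged object `name`.
inline std::string raggedChildName(const std::string &name, std::size_t n) {
  return name + "_" + std::to_string(n);
}
```

For example, ``{{1, 2, 3}, {4, 5, 6}}`` yields dims ``{2, 3}`` and the buffer ``{1, 2, 3, 4, 5, 6}``. The ragged example of a 2x2 block followed by a 1x2 block is rejected. At that point, a writer following the subgroup scheme described in this section would fall back to writing one child object per outer element, named as ``raggedChildName`` shows.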
+
+However, a nested ``std::vector<std::vector<T>>`` might be "ragged", i.e. not regular. For example, a 3-d nested
+``std::vector`` might contain 2 rows, the first a 2x2 block and the second a 1x2 block.
+Previously, regularity was not checked on write, so ragged nested vectors
+were written as a regular DataSet, causing a buffer under/overrun and jumbled contents.
+
+This case evidently was not exercised in production, since the bug went undetected until now. Fixing the bug
+is also an opportunity to further optimise the HDF5 file format.
+
+The goals of this change are to:
+
+* Change the HDF5 file format only -- i.e. not affect any other file format
+
+* Implement the file format changes so that they are transparent to the Grid reader
+
+* Fix the bug for ragged vectors of numeric / complex types
+
+* Extend support for arbitrarily nested ``std::vector<T>`` to the case where ``T`` is a Grid tensor
+
+
+The trait class ``element`` has been redefined as ``is_flattenable``, a trait class for
+potentially "flattenable" objects. These are (possibly nested) ``std::vector<T>``, where ``T`` is
+an arithmetic, complex or Grid tensor type. On write, flattenable objects are tested
+(with the function ``isRegularShape``) to determine whether they are actually regular.
+
+Flattenable, regular objects are written to a multidimensional HDF5 DataSet.
+Otherwise, an HDF5 subgroup with the object's "name" is created, and each element of the outer dimension is
+recursively written to it as object "name_n", where n is a 0-indexed number.
+
+On readback by Grid, the presence of a subgroup containing the attribute ``Grid_vector_size`` triggers a
+"ragged read"; otherwise, a read from a DataSet is attempted.
+
+
 Data parallel field IO
 -----------------------