This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
preparing to pass integrator, smearing, bc's as policy classes to hmc.
Propose to unify "integrator" and integrator algorithm in a base/derived
way to override step. Want to read through ForceGradient to ensure
that abstraction covers the force gradient case.
to a little figure out what Guido had done & why -- but there is a neat saving of force
evaluations across the nesting time boundary making use of linearity of the leapP in dt.
I cleaned up the printing, reduced the volume of code, in the process sharing printing
between all integrators. Placed an assert that the total integration time for all integrators
must match at end of trajectory.
Have now verified e-dH = 1 for nested integrators in Wilson/Wilson runs with both
Omelyan and with Leapfrog so substantial confidence gained.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.
Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.
Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.
Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.
Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.
It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.
That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.
I suspect this is fine for system performance.
derived relationship. Have Text/Binary/Xml versions of
Reader & Writer.
Any new Reader/Writer class inheriting the interface can give object serialisation
to any desired format now.
new file: lib/serialisation/BaseIO.h
modified: lib/serialisation/BinaryIO.h
modified: lib/serialisation/Serialisation.h
modified: lib/serialisation/TextIO.h
modified: lib/serialisation/XmlIO.h
The test uses the Xml, Binary and Text formats as well as cout << Object.
Test_serialisation has an example of *code* *free* object serialisation
to both ostream and to XML using macro magic.
Implementing TextReader/TextWriter, YAML, JSON etc.. should be trivial
and we can use configure time options to select the default "Reader" typedef.
Present done with
"using XMLPolicy::Reader"
to pick up the default serialisation strategy.
Azusa is working hard on the rectangle term and we'll hopefully start reproducing plaquettes
from RBC-UKQCD parameters soon !
My new laptop is pretty warm and is starting to groan ;)