Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.
Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.
Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.
Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.
Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.
It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.
That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.
I suspect this is fine for system performance.
So valence sector looks ok.
FermionOperatorImpl.h provides the policy classes.
Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.
* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).
There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but
not with multi-level.
In any case there looks to be quite a bit to clean up.
This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is indept of the gauge fields. Easily fixed in future.
So valence sector looks ok.
FermionOperatorImpl.h provides the policy classes.
Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.
* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).
There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but
not with multi-level.
In any case there looks to be quite a bit to clean up.
This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is indept of the gauge fields. Easily fixed in future.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
Example of multiple levels in the WilsonFermion hmc test.
Merge remote-tracking branch 'upstream/master'
Conflicts:
lib/qcd/hmc/HMC.h
lib/qcd/hmc/integrators/Integrator.h
lib/qcd/hmc/integrators/Integrator_algorithm.h
tests/Test_simd.cc
Example of multiple levels in the WilsonFermion hmc test.
Merge remote-tracking branch 'upstream/master'
Conflicts:
lib/qcd/hmc/HMC.h
lib/qcd/hmc/integrators/Integrator.h
lib/qcd/hmc/integrators/Integrator_algorithm.h
tests/Test_simd.cc
Example of multiple levels in the WilsonFermion hmc test.
Merge remote-tracking branch 'upstream/master'
Conflicts:
lib/qcd/hmc/HMC.h
lib/qcd/hmc/integrators/Integrator.h
lib/qcd/hmc/integrators/Integrator_algorithm.h
tests/Test_simd.cc
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure