Binary IO file for generic Grid array parallel I/O.

The number of MPI tasks doing I/O can be varied by selecting which dimensions use parallel I/O and which dimensions use serial send-to-boss I/O. Thus we can neck down from, say, 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes doing the I/O. This interpolates nicely between ALL nodes writing their own data, a single boss per time-plane in processor space [the old UKQCD Fortran code did this], and a single node doing all I/O. I am not sure the transfer sizes are big enough, and I am not fully convinced that fstream is guaranteed to avoid buffer inconsistencies unless I set the streambuf size to zero. In practice it has worked on 8 tasks (2x1x2x2), writing/cloning NERSC configurations in my MacOS + OpenMPI + Clang environment. It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from each node in order to gather bigger chunks at the syscall level. That would push us up to a write chunk of circa 8 x 18*4*8 bytes == 4KB, and by taking, say, x/y non-parallel we get to 16MB contiguous chunks written in multiple 4KB transactions per I/O node on 64^3 lattices for configuration I/O. I suspect this is fine for system performance.
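A minimal sketch of the counting argument above, not code from this commit. The helper names `ioNodeCount` and `stripBytes` are hypothetical, and the sketch assumes each dimension flagged as parallel keeps one writer per processor coordinate while every other dimension funnels its data to a single boss. It also assumes double-precision links stored as full 3x3 complex matrices (18 reals each, 4 links per site).

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of Grid): number of MPI tasks that perform
// I/O when only the dimensions flagged in `parallel` write in parallel.
// Unflagged dimensions are assumed to send their data to one boss task.
inline int ioNodeCount(const std::array<int, 4>& processors,
                       const std::array<bool, 4>& parallel) {
  int count = 1;
  for (std::size_t d = 0; d < processors.size(); ++d) {
    if (parallel[d]) count *= processors[d];
  }
  return count;
}

// Hypothetical helper: bytes in a contiguous strip of `sites` lattice sites,
// assuming 4 links per site, 18 reals per link, and 8 bytes per real.
inline std::size_t stripBytes(std::size_t sites) {
  const std::size_t bytesPerSite = 4 * 18 * 8;  // 576 bytes
  return sites * bytesPerSite;
}
```

For the 4x4x8x8 processor grid, flagging no dimension gives 1 I/O task, flagging only the last gives 8, and flagging all four gives 1024. An x-strip of 8 sites comes to 4608 bytes under these assumptions, which matches the "circa 4KB" figure quoted above.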
2025-11-06 22:59:32 +00:00 · 2015-08-26 13:40:29 +01:00
parent 612957f057
commit dc814f30da
14 changed files with 840 additions and 410 deletions
--- a/tests/Test_GaugeAction.cc
+++ b/tests/Test_GaugeAction.cc
@@ -50,7 +50,7 @@ int main (int argc, char ** argv)
  NerscField header;
  
  std::string file("./ckpoint_lat.4000");
-  readNerscConfiguration(Umu,header,file);
+  NerscIO::readConfiguration(Umu,header,file);

  for(int mu=0;mu<Nd;mu++){
    U[mu] = PeekIndex<LorentzIndex>(Umu,mu);
--- a/tests/Test_cayley_ldop_cr.cc
+++ b/tests/Test_cayley_ldop_cr.cc
@@ -42,7 +42,7 @@ int main (int argc, char ** argv)

  NerscField header;
  std::string file("./ckpoint_lat.400");
-  readNerscConfiguration(Umu,header,file);
+  NerscIO::readConfiguration(Umu,header,file);

  //  SU3::ColdConfiguration(RNG4,Umu);
  //  SU3::TepidConfiguration(RNG4,Umu);
--- a/tests/Test_dwf_hdcr.cc
+++ b/tests/Test_dwf_hdcr.cc
@@ -388,7 +388,7 @@ int main (int argc, char ** argv)

  NerscField header;
  std::string file("./ckpoint_lat.4000");
-  readNerscConfiguration(Umu,header,file);
+  NerscIO::readConfiguration(Umu,header,file);

  //  SU3::ColdConfiguration(RNG4,Umu);
  //  SU3::TepidConfiguration(RNG4,Umu);
--- a/tests/Test_nersc_io.cc
+++ b/tests/Test_nersc_io.cc
@@ -28,7 +28,7 @@ int main (int argc, char ** argv)
  
  NerscField header;
  std::string file("./ckpoint_lat.4000");
-  readNerscConfiguration(Umu,header,file);
+  NerscIO::readConfiguration(Umu,header,file);

  for(int mu=0;mu<Nd;mu++){
    U[mu] = PeekIndex<LorentzIndex>(Umu,mu);
@@ -89,6 +89,13 @@ int main (int argc, char ** argv)
  TComplex TcP = sum(cPlaq);
  Complex ll= TensorRemove(TcP);
  std::cout<<GridLogMessage << "coarsened plaquettes sum to " <<ll*PlaqScale<<std::endl;
+
+  std::string clone2x3("./ckpoint_clone2x3.4000");
+  std::string clone3x3("./ckpoint_clone3x3.4000");
+  int precision32 = 0;
+
+  NerscIO::writeConfiguration(Umu,clone3x3,0,precision32);
+  NerscIO::writeConfiguration(Umu,clone2x3,1,precision32);
  
  Grid_finalize();
 }
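
Judging from the file names in the test above, the third argument to `NerscIO::writeConfiguration` appears to select two-row (2x3) storage of each SU(3) link instead of the full 3x3 matrix. In the NERSC 2x3 layout only the first two rows are stored, and a reader rebuilds the third row as the complex conjugate of their cross product. The sketch below illustrates that reconstruction only. It is not the code in this commit, and `reconstructThirdRow` is a hypothetical name.

```cpp
#include <array>
#include <complex>

using Cplx = std::complex<double>;
using Row = std::array<Cplx, 3>;

// Hypothetical helper: rebuild the third row of an SU(3) matrix from its
// first two rows a and b, using third = conj(a x b).
inline Row reconstructThirdRow(const Row& a, const Row& b) {
  return Row{std::conj(a[1] * b[2] - a[2] * b[1]),
             std::conj(a[2] * b[0] - a[0] * b[2]),
             std::conj(a[0] * b[1] - a[1] * b[0])};
}
```

For the unit matrix, rows (1,0,0) and (0,1,0) give back (0,0,1). The identity relies on the rows being orthonormal with unit determinant, so it only holds for links that are already in SU(3).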