1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 17:25:37 +01:00
Commit Graph

46 Commits

Author SHA1 Message Date
paboyle
bd08dc4f45 Pragma use for nvcc, warning elimination. 2018-01-24 13:15:43 +00:00
paboyle
dda151250f Emacs format 2018-01-12 23:59:58 +00:00
paboyle
e9be293444 Better messaging 2017-10-26 01:59:30 +01:00
paboyle
b8654be0ef 64 bit safe offsets 2017-10-25 23:49:23 +01:00
paboyle
be66e7dd95 Merge branch 'develop' into feature/multi-communicator 2017-08-19 23:12:38 +01:00
Guido Cossu
175f393f9d Binary IO error checking 2017-08-04 12:14:10 +01:00
Guido Cossu
8bd869da37 Correcting a bug in the IO routines 2017-07-27 15:12:50 +01:00
paboyle
54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
paboyle
8e9be9f84f Updates for SciDAC IO 2017-06-18 00:10:42 +01:00
paboyle
91199a8ea0 openmpi is not const safe 2017-06-13 12:21:29 +01:00
paboyle
3bfd1f13e6 I/O improvements 2017-06-11 23:14:10 +01:00
paboyle
092dcd4e04 MPI I/O only if MPI compiled 2017-06-02 22:50:25 +01:00
paboyle
094c3d091a Improved and RNG's now survive checkpoint 2017-06-02 00:38:58 +01:00
Peter Boyle
21421656ab Big changes improving the code to use MPI IO 2017-06-01 17:36:53 -04:00
paboyle
1e429a0d57 Added MPI version 2017-05-30 23:41:07 +01:00
paboyle
b8b5934193 Attempts to speed up the parallel IO 2017-05-25 13:32:24 +01:00
paboyle
a8c10b1933 Use a global-X x Local-Y chunksize for parallel binary I/O.
Gives O(32 x 8 x 18*8*8) chunk size on configuration I/O.

At 150KB should be getting close to packet sizes and 4MB filesystem
block sizes that are reasonably (!?) performant. We shall see once I move
this off my laptop and over to BNL and time it.
2017-05-25 11:43:33 +01:00
Guido Cossu
74f451715f Fix for Mac compilation on the size_t uint64_t types 2017-05-01 15:12:07 +01:00
Guido Cossu
8c540333d5 Merge branch 'develop' into feature/hmc_generalise 2017-04-05 14:41:04 +01:00
paboyle
75112a632a IO improvements to fail on IO error 2017-03-28 02:28:04 -04:00
Guido Cossu
a783282b8b Merge branch 'develop' into feature/hmc_generalise 2016-11-10 18:13:07 +00:00
Guido Cossu
1d666771f9 Debugging the RNG, eliminate the barrier after broadcast 2016-10-26 16:08:23 +01:00
Guido Cossu
d50055cd96 Making the ILDG support optional 2016-10-26 09:48:01 +01:00
Guido Cossu
47c7159177 ILDG reader/writer works
Fill the xml header with the required information, todo.
2016-10-24 21:57:54 +01:00
Guido Cossu
f415db583a Adding ILDG format 2016-10-24 15:48:22 +01:00
Guido Cossu
f55c16f984 Adding a barrier in the RNG save 2016-10-24 11:02:14 +01:00
Guido Cossu
df67e013ca More debug output for the RNG 2016-10-22 13:34:17 +01:00
Guido Cossu
3e990c9d0a Reverting the broadcast change 2016-10-22 13:26:43 +01:00
Guido Cossu
4b740fc8fd Debugging the RNG state save 2016-10-22 13:06:00 +01:00
997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
Guido Cossu
590675e2ca Csum in hex format 2016-10-19 17:26:25 +01:00
Guido Cossu
8c65bdf6d3 Printing checksum for the RNG file 2016-10-19 16:56:11 +01:00
paboyle
a123dcd7e9 Static required for shmem. Reading same object twice requires csum reset 2016-10-12 00:29:57 +01:00
Guido Cossu
eda4dd622e Some more edit 2016-10-11 15:45:20 +01:00
Guido Cossu
11b4c80b27 Added support for hmc and binary IO for a general field 2016-10-07 13:37:29 +01:00
Guido Cossu
d9b5fbd374 In the middle of adding a general binary writer 2016-10-04 11:24:08 +01:00
Guido Cossu
f76f281e58 Cleaning files after fix 2016-09-09 11:34:25 +01:00
paboyle
d4e57f4bc6 IO Bandwidth reporting 2016-03-16 02:30:16 -07:00
Peter Boyle
6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle
7f927a541c Shmem related fixes for shmem compile 2016-02-11 07:37:39 -06:00
paboyle
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle
31ca609d12 HMC checkpointing .
Need a general HMC framework to work in restart.
2015-12-20 02:29:51 +00:00
paboyle
5710966324 Options to use mersenne twister OR ranlux48 via --enable-rng flag at configure time.
Can save and restore RNG state via new (serial) I/O routines in a NERSC header style file.
Store a Parallel (one per site) and a single serial RNG file.
2015-12-19 18:32:25 +00:00
Peter Boyle
d35d63b171 Algorithm in 2015-11-04 04:27:44 -06:00
Peter Boyle
76d752585b Started a tidy up in the HMC sector. Now comfortable with the two level integrators;
to a little figure out what Guido had done & why -- but there is a neat saving of force
evaluations across the nesting time boundary making use of linearity of the leapP in dt.

I cleaned up the printing, reduced the volume of code, in the process sharing printing
between all integrators. Placed an assert that the total integration time for all integrators
must match at end of trajectory.

Have now verified e-dH = 1 for nested integrators in Wilson/Wilson runs with both
Omelyan and with Leapfrog so substantial confidence gained.
2015-08-29 17:18:43 +01:00
Peter Boyle
dc814f30da Binary IO file for generic Grid array parallel I/O.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.

Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.

Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.

Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.

Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.

I suspect this is fine for system performance.
2015-08-26 13:40:29 +01:00