mirror of https://github.com/paboyle/Grid.git synced 2025-06-10 03:17:07 +01:00
Commit Graph

62 Commits

Author SHA1 Message Date
a48ee6f0f2 Don't use MPI3_leader any more. No real gain and complex 2017-02-07 01:31:24 -05:00
73547cca66 MPI3 working, I think 2017-02-07 01:30:02 -05:00
123c673db7 Policy to control async or sync SendRecv 2017-02-07 01:24:54 -05:00
61f82216e2 Communicator Policy, NodeCount distinct from Rank count 2017-02-07 01:22:53 -05:00
f85b35314d Fix a routine for single node processor coor from rank 2016-11-08 11:49:13 +00:00
6e548a8ad5 Linux compile needed 2016-11-04 11:34:16 +00:00
f41a230b32 Decrease mpi3l verbose 2016-11-02 19:54:03 +00:00
757a928f9a Improvement to use own SHM_OPEN call to avoid openmpi bug. 2016-11-02 12:37:46 +00:00
32375aca65 Semaphore sleep/wake up on remote processes. 2016-11-02 09:27:20 +00:00
bb94ddd0eb Tidy up of mpi3; also some cleaning of the dslash controls. 2016-11-02 08:07:09 +00:00
791cb050c8 Comms improvements 2016-11-01 11:35:43 +00:00
09f66100d3 MPI 3 compile on non-linux 2016-10-25 06:01:12 +01:00
d7d92af09d Travis fail fix attempt 2016-10-25 01:45:53 +01:00
d97a27f483 Verbose 2016-10-25 01:05:31 +01:00
7c3363b91e Compiles all comms targets 2016-10-25 00:04:17 +01:00
b94478fa51 mpi, mpi3, shmem all compile.
mpi, mpi3 pass single node multi-rank
2016-10-24 23:45:31 +01:00
b6a65059a2 Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
910b8dd6a1 use simd type 2016-10-21 22:35:29 +01:00
09fd5c43a7 Reasonably fast version 2016-10-21 15:17:39 +01:00
fad96cf250 StencilBufs 2016-10-21 13:36:00 +01:00
f331809c27 Use variable type for loop 2016-10-21 13:35:37 +01:00
306160ad9a bcopy threaded 2016-10-21 12:07:28 +01:00
a762b1fb71 MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
b58adc6a4b commVector 2016-10-20 17:00:15 +01:00
5fe2b85cbd MPI3 and shared memory support 2016-10-20 16:58:01 +01:00
32bc7a6ab8 MPI back out of change that hangs
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
62601bb649 Bug fix 2016-07-08 20:46:29 +01:00
ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
d6b64f47d9 Uint64 sum for IO rates 2016-03-16 02:27:22 -07:00
a359f7a9f5 Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-11 16:07:07 -08:00
b606deb3f0 Uint64 gsum 2016-03-11 16:06:54 -08:00
090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
e55c35734b Fix a nocompile 2016-03-03 20:33:28 +00:00
6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
a3fbabf404 Bug fix 2016-02-18 18:08:24 +00:00
41c2b09184 Shmem comms [NO MPI] target added. The dwf test runs and passes.
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current state while I try a few experiments.
2016-02-14 14:24:38 -06:00
294dbf1bf0 Compile on OpenMPI shmem 2016-02-11 23:45:51 +00:00
7f927a541c Shmem related fixes for shmem compile 2016-02-11 07:37:39 -06:00
e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
dc814f30da Binary IO file for generic Grid array parallel I/O.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.

Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.

Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.

Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.

Practically it has worked on 8 tasks, 2x1x2x2, writing/cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.

I suspect this is fine for system performance.
2015-08-26 13:40:29 +01:00
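
The arithmetic in the dc814f30da message above can be sketched as follows. This is an illustration only, not code from Grid: all function names are hypothetical, and it assumes that any subset of the processor-grid dimensions may be marked as parallel, that ranks in a serial dimension send to the rank at coordinate 0 in that dimension, and that "18*4*8" means 18 reals per link, 4 links per site and 8 bytes per real.

```python
from itertools import combinations
from math import prod


def io_node_count(proc_grid, parallel_dims):
    """Ranks doing I/O when only `parallel_dims` use parallel I/O.

    Every other dimension is "serial send to boss", so it contributes
    a factor of 1 rather than its full extent.
    """
    return prod(proc_grid[d] for d in parallel_dims)


def io_boss_coor(proc_coor, parallel_dims):
    """Processor coordinate of the boss that a given rank sends to.

    Assumption: the boss sits at coordinate 0 in every serial dimension.
    """
    return tuple(c if d in parallel_dims else 0
                 for d, c in enumerate(proc_coor))


def possible_io_node_counts(proc_grid):
    """All I/O node counts reachable by choosing a subset of dimensions."""
    dims = range(len(proc_grid))
    return sorted({io_node_count(proc_grid, subset)
                   for r in range(len(proc_grid) + 1)
                   for subset in combinations(dims, r)})


def strip_bytes(sites, reals_per_link=18, links_per_site=4, bytes_per_real=8):
    """Bytes in a contiguous strip of `sites` lattice sites.

    The defaults are an assumed reading of "18*4*8" in the message.
    """
    return sites * reals_per_link * links_per_site * bytes_per_real


if __name__ == "__main__":
    grid = (4, 4, 8, 8)  # 1024 nodes, as in the message
    print(io_node_count(grid, (3,)))         # one boss per time-plane
    print(io_boss_coor((3, 2, 5, 7), (2, 3)))
    print(possible_io_node_counts(grid))
    print(strip_bytes(8))                    # an 8-site x-strip
```

Under these assumptions a 4x4x8x8 grid can neck down to 1, 4, 8, 16, 32, 64, 128, 256 or 1024 I/O nodes, which includes every count listed in the message, and an 8-site strip comes to 4608 bytes, matching the "circa 4KB" figure.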
neo
48bf4878c1 Experimental support for ARM 2015-06-09 15:46:21 +09:00
58a4f32298 merge to the head 2015-06-05 10:15:31 +01:00
1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00