mirror of https://github.com/paboyle/Grid.git synced 2025-06-10 03:17:07 +01:00
Commit Graph

57 Commits

Author SHA1 Message Date
6e548a8ad5 Linux compile needed 2016-11-04 11:34:16 +00:00
f41a230b32 Decrease mpi3l verbose 2016-11-02 19:54:03 +00:00
757a928f9a Improvement to use own SHM_OPEN call to avoid openmpi bug. 2016-11-02 12:37:46 +00:00
32375aca65 Semaphore sleep/wake up on remote processes. 2016-11-02 09:27:20 +00:00
bb94ddd0eb Tidy up of mpi3; also some cleaning of the dslash controls. 2016-11-02 08:07:09 +00:00
791cb050c8 Comms improvements 2016-11-01 11:35:43 +00:00
09f66100d3 MPI 3 compile on non-linux 2016-10-25 06:01:12 +01:00
d7d92af09d Travis fail fix attempt 2016-10-25 01:45:53 +01:00
d97a27f483 Verbose 2016-10-25 01:05:31 +01:00
7c3363b91e Compiles all comms targets 2016-10-25 00:04:17 +01:00
b94478fa51 mpi, mpi3, shmem all compile.
mpi, mpi3 pass single node multi-rank
2016-10-24 23:45:31 +01:00
b6a65059a2 Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
910b8dd6a1 use simd type 2016-10-21 22:35:29 +01:00
09fd5c43a7 Reasonably fast version 2016-10-21 15:17:39 +01:00
fad96cf250 StencilBufs 2016-10-21 13:36:00 +01:00
f331809c27 Use variable type for loop 2016-10-21 13:35:37 +01:00
306160ad9a bcopy threaded 2016-10-21 12:07:28 +01:00
a762b1fb71 MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
b58adc6a4b commVector 2016-10-20 17:00:15 +01:00
5fe2b85cbd MPI3 and shared memory support 2016-10-20 16:58:01 +01:00
32bc7a6ab8 MPI: back out of change that hangs
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
62601bb649 Bug fix 2016-07-08 20:46:29 +01:00
ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
d6b64f47d9 Uint64 sum for IO rates 2016-03-16 02:27:22 -07:00
a359f7a9f5 Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-11 16:07:07 -08:00
b606deb3f0 Uint64 gsum 2016-03-11 16:06:54 -08:00
090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
e55c35734b Fix a nocompile 2016-03-03 20:33:28 +00:00
6aeaf6f568 Parallel IO worked on. I'm puzzled because I thought I had already shaken this out on MacOS + OpenMPI, but then
problems turned up on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
a3fbabf404 Bug fix 2016-02-18 18:08:24 +00:00
41c2b09184 Shmem comms [NO MPI] target added. The dwf test runs and passes.
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current state while I try a few experiments.
2016-02-14 14:24:38 -06:00
294dbf1bf0 Compile on OpenMPI shmem 2016-02-11 23:45:51 +00:00
7f927a541c Shmem related fixes for shmem compile 2016-02-11 07:37:39 -06:00
e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
dc814f30da Binary IO file for generic Grid array parallel I/O.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which use serial send-to-boss
I/O.

Thus we can neck down from, say, 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.

Interpolates nicely between ALL nodes writing their data, a single boss per time-plane
in processor space [the old UKQCD Fortran code did this], and a single node doing all I/O.

Not sure the transfer sizes are big enough, and not overly convinced fstream
is guaranteed not to give buffer inconsistencies unless I set the streambuf size to zero.

In practice it has worked on 8 tasks (2x1x2x2) writing/cloning NERSC configurations
in my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.

I suspect this is fine for system performance.
2015-08-26 13:40:29 +01:00
neo
48bf4878c1 Experimental support for ARM 2015-06-09 15:46:21 +09:00
58a4f32298 merge to the head 2015-06-05 10:15:31 +01:00
1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00
3845f267cb Domain wall fermions now invert; have the basis set up for
Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.

All are done with space-time taking part in checkerboarding, Ls uncheckerboarded

Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is hermitian
vi) Mdag is the adjoint of M between stochastic vectors.

That said, the RB Schur solve, RB MpcDagMpc solve and unprec solve
all converge and the true residual becomes small, so these are pretty good tests.
2015-06-02 16:57:12 +01:00
neo
74e91cd925 Partial implementation of the vector types SIMD
Implementing SSE4 now
A systematic series of tests must be written.
2015-05-19 17:21:17 +09:00
neo
baa382f055 Added check of mpfr and gmp at configure time
It automatically generates the linker flags, or complains if they are not found.
2015-05-19 13:54:55 +09:00
neo
b4cd37276b Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass. 2015-05-18 16:48:14 +09:00
b1d2c60d07 Moving some things around for pretty 2015-05-11 19:09:49 +01:00