Azusa Yamaguchi
|
ee686a7d85
|
Compiles now
|
2016-11-03 16:58:23 +00:00 |
|
Azusa Yamaguchi
|
1c5b7a6be5
|
Staggered phases first cut, c1, c2, u0
|
2016-11-03 16:26:56 +00:00 |
|
Azusa Yamaguchi
|
164d3691db
|
Staggered
|
2016-11-01 14:24:22 +00:00 |
|
|
33d199a0ad
|
temporary thread safety in FFT
|
2016-10-25 12:56:40 +01:00 |
|
paboyle
|
b820076b91
|
Merge branch 'develop' into feature/mpi3
|
2016-10-25 06:02:33 +01:00 |
|
paboyle
|
09f66100d3
|
MPI 3 compile on non-linux
|
2016-10-25 06:01:12 +01:00 |
|
azusayamaguchi
|
d7d92af09d
|
Travis fail fix attempt
|
2016-10-25 01:45:53 +01:00 |
|
azusayamaguchi
|
460d0753a1
|
Merge branch 'develop' into feature/mpi3
Conflicts:
lib/simd/Grid_avx512.h
|
2016-10-25 01:08:51 +01:00 |
|
azusayamaguchi
|
8f8058f8a5
|
More random bits on parallel seeding
|
2016-10-25 01:05:52 +01:00 |
|
azusayamaguchi
|
d97a27f483
|
Verbose
|
2016-10-25 01:05:31 +01:00 |
|
azusayamaguchi
|
7c3363b91e
|
Compiles all comms targets
|
2016-10-25 00:04:17 +01:00 |
|
azusayamaguchi
|
b94478fa51
|
mpi, mpi3, shmem all compile.
mpi, mpi3 pass single node multi-rank
|
2016-10-24 23:45:31 +01:00 |
|
|
13bf0482e3
|
FFT optimisation
|
2016-10-24 19:25:40 +01:00 |
|
|
a795b5705e
|
memory optimisation
|
2016-10-24 19:25:15 +01:00 |
|
|
392e064513
|
fast local peek-poke
|
2016-10-24 19:24:21 +01:00 |
|
azusayamaguchi
|
b6a65059a2
|
Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1, 1.2.1.1, 4.1.1.1, 1.4.1.1, 2.2.1.1 subnode decompositions
|
2016-10-24 17:30:43 +01:00 |
|
azusayamaguchi
|
ea25a4d9ac
|
Works
|
2016-10-23 06:10:05 +01:00 |
|
azusayamaguchi
|
c190221fd3
|
Internal SHM comms in non-simd directions working
Need to fix simd directions
|
2016-10-22 18:14:27 +01:00 |
|
azusayamaguchi
|
0fcd2e7188
|
Simplify the comms structure prior to implementing shared memory direct bounce
|
2016-10-21 22:44:10 +01:00 |
|
azusayamaguchi
|
910b8dd6a1
|
use simd type
|
2016-10-21 22:35:29 +01:00 |
|
azusayamaguchi
|
75ebd3a0d1
|
Typo fixes and rotate for CLANG
|
2016-10-21 22:34:29 +01:00 |
|
azusayamaguchi
|
09fd5c43a7
|
Reasonably fast version
|
2016-10-21 15:17:39 +01:00 |
|
azusayamaguchi
|
f22317748f
|
Merge branch 'feature/mpi3' of https://github.com/paboyle/Grid into feature/mpi3
|
2016-10-21 13:36:35 +01:00 |
|
azusayamaguchi
|
6a9eae6b6b
|
Reporting improvements
|
2016-10-21 13:36:18 +01:00 |
|
azusayamaguchi
|
fad96cf250
|
StencilBufs
|
2016-10-21 13:36:00 +01:00 |
|
azusayamaguchi
|
f331809c27
|
Use variable type for loop
|
2016-10-21 13:35:37 +01:00 |
|
paboyle
|
2c54a53d0a
|
Compile verbose reduce
|
2016-10-21 12:12:14 +01:00 |
|
paboyle
|
306160ad9a
|
bcopy threaded
|
2016-10-21 12:07:28 +01:00 |
|
azusayamaguchi
|
20a091c3ed
|
Intel vs. Clang intrinsics differences absorbed
|
2016-10-21 09:08:36 +01:00 |
|
azusayamaguchi
|
202078eb1b
|
Cray / OpenSHMEM ordering differs
|
2016-10-21 09:07:20 +01:00 |
|
paboyle
|
a762b1fb71
|
MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on the same node.
|
2016-10-21 09:03:26 +01:00 |
|
paboyle
|
5b5925b8e5
|
Forgot to add
|
2016-10-20 17:09:40 +01:00 |
|
paboyle
|
b58adc6a4b
|
commVector
|
2016-10-20 17:00:15 +01:00 |
|
paboyle
|
f9d5e95d72
|
allocator template typedefs moved to AlignedAllocator
|
2016-10-20 16:59:39 +01:00 |
|
paboyle
|
4f8e636a43
|
commVector
|
2016-10-20 16:59:16 +01:00 |
|
paboyle
|
9b39f35ae6
|
commVector different for SHMEM compat
|
2016-10-20 16:58:53 +01:00 |
|
paboyle
|
5fe2b85cbd
|
MPI3 and shared memory support
|
2016-10-20 16:58:01 +01:00 |
|
paboyle
|
c7cccaaa69
|
Comm vector for shmem
|
2016-10-20 16:57:31 +01:00 |
|
paboyle
|
cbcfea466f
|
MPI3
|
2016-10-20 16:57:14 +01:00 |
|
paboyle
|
4955672fc3
|
MPI3
|
2016-10-20 16:57:00 +01:00 |
|
paboyle
|
8c043da5b7
|
SHMEM and comms allocator made different
|
2016-10-20 16:56:05 +01:00 |
|
paboyle
|
3cbe974eb4
|
Layout
|
2016-10-20 16:55:21 +01:00 |
|
paboyle
|
7af9b87318
|
Cache face tables to improve performance.
Extract merge now looking poor.
|
2016-10-18 09:51:37 +01:00 |
|
paboyle
|
811ca45473
|
GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support
|
2016-10-17 16:23:21 +01:00 |
|
paboyle
|
bc1a4d40ba
|
Faster integer handling; avoid push_back
|
2016-10-17 16:16:44 +01:00 |
|
paboyle
|
c8079e6621
|
Time the face gather in x-dir more carefully
|
2016-10-13 22:28:50 +01:00 |
|
azusayamaguchi
|
8b0d171c9a
|
32bit issue on the KNL code variant where byte offsets were stored
|
2016-10-12 17:49:32 +01:00 |
|
azusayamaguchi
|
8bbd9ebc27
|
Reversing changes to Stencil class
|
2016-10-12 13:47:20 +01:00 |
|
azusayamaguchi
|
6472b431f0
|
__rdpmc needed for gcc, clang++
|
2016-10-12 12:29:08 +01:00 |
|
azusayamaguchi
|
bd205a3293
|
Fixing for non x86 and non KNL
|
2016-10-12 12:09:15 +01:00 |
|