paboyle
cd0da81196
Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm
2017-02-16 18:52:30 -05:00
paboyle
d68907fc3e
Debug temp
2017-02-16 18:51:35 -05:00
paboyle
be3a8249c6
Faster gather
2017-02-16 23:51:15 +00:00
paboyle
bd600702cf
Vectorise the XYZT face gathering better.
...
Hard coded for simd_layout <= 2 in any given spread out direction; full generality is inconsistent
with efficiency.
2017-02-15 11:11:04 +00:00
paboyle
485ad6fde0
Stencil working in SHM MPI3
2017-02-07 01:20:39 -05:00
Peter Boyle
ff2f559a57
Remove inline on gather optimised path
2016-12-27 17:45:19 +00:00
Peter Boyle
3d21297bbb
Call the fast path compressor for wilson kernels to avoid if else on projector
2016-12-27 11:23:13 +00:00
Peter Boyle
83fa038bdf
Streaming stores
2016-12-08 16:58:42 +00:00
azusayamaguchi
d7d92af09d
Travis fail fix attempt
2016-10-25 01:45:53 +01:00
azusayamaguchi
b94478fa51
mpi, mpi3, shmem all compile.
...
mpi, mpi3 pass single node multi-rank
2016-10-24 23:45:31 +01:00
azusayamaguchi
b6a65059a2
Update to use shared memory to contain the stencil comms buffers
...
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
azusayamaguchi
ea25a4d9ac
Works
2016-10-23 06:10:05 +01:00
azusayamaguchi
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
azusayamaguchi
0fcd2e7188
Simplify the comms structure prior to implementing Shared memory direct bouncs
2016-10-21 22:44:10 +01:00
paboyle
c7cccaaa69
Comm vector for shmem
2016-10-20 16:57:31 +01:00
paboyle
7af9b87318
Cache face tables to improve performance.
...
Extract merge now looking poor.
2016-10-18 09:51:37 +01:00
paboyle
bc1a4d40ba
Faster integer handling avoid push_back
2016-10-17 16:16:44 +01:00
paboyle
c8079e6621
Time the face gateher in x-dir more carefully
2016-10-13 22:28:50 +01:00
azusayamaguchi
8b0d171c9a
32bit issue on the KNL code variant where byte offsets were stored
2016-10-12 17:49:32 +01:00
azusayamaguchi
8bbd9ebc27
Reversing changes to Stencil class
2016-10-12 13:47:20 +01:00
paboyle
2d4a45c758
Typecast pointer
2016-10-12 09:14:15 +01:00
paboyle
42cd148f5e
Base pointer for comms buffer under AVX512 assembly
2016-10-11 16:06:06 +01:00
Guido Cossu
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
paboyle
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
Antonin Portelli
0724f7af75
QPX single precision implementation
2016-09-19 18:09:12 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
paboyle
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
paboyle
1e554350ac
The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug.
2016-04-29 16:49:18 -07:00
paboyle
b27bac4669
Updates for simd in one dir
2016-04-19 15:34:10 -07:00
Peter Boyle
c9fadf97a5
Simplify the compressor interface again.
2016-02-17 18:16:45 -06:00
Peter Boyle
81395e85d1
Regressing to not overlap comms and compute becasue bluewaters, edison, and cori are so rubbish at it.
2016-02-16 13:56:44 -06:00
Peter Boyle
340a29b735
More careful sequencing of comms
2016-02-15 16:04:59 -06:00
Peter Boyle
42a9ac71d2
BUg fix, wait till complete.
2016-02-14 16:21:21 -06:00
Peter Boyle
41c2b09184
Shmem comms [NO MPI] target added. The dwf test runs and passes.
...
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current while I try a few experimentals.
2016-02-14 14:24:38 -06:00
Peter Boyle
7f927a541c
Shmem related fixes for shmem compile
2016-02-11 07:37:39 -06:00
paboyle
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
paboyle
dafc74020c
Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori
2016-01-10 16:54:27 -08:00
paboyle
d19321dfde
Overlap comms compute changes
2016-01-10 19:20:16 +00:00
paboyle
c99d748da6
Timing reports in benchmarks now reflect the asynch comms thread statistics
2016-01-04 14:42:16 +00:00
paboyle
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
paboyle
145a295231
Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not
...
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
Peter Boyle
880ff88362
Comms optimisation
2015-11-06 05:22:18 -06:00
Peter Boyle
ec5af35166
EO bug fix when spread out in x-direction
2015-11-04 09:56:58 +00:00
Peter Boyle
7d3512ab21
Gparity valence test now working.
...
Interface in FermionOperator will change a lot in future
2015-08-14 00:01:04 +01:00
Peter Boyle
1d0df449e8
Reorganise of file naming
2015-06-03 12:47:05 +01:00