e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
1a30455a10
1000 iters on bmark for more accurate timing
2017-02-20 17:47:01 -05:00
aca7a3ef0a
Optimisation control improvements
2017-02-10 18:22:31 -05:00
2bf4688e83
Running on BNL KNL
2017-02-07 01:32:10 -05:00
060da786e9
Comms benchmark improvements
2017-02-07 01:07:39 -05:00
a37e71f362
New automatic implementation of gamma matrices; Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
55cb22ad67
Z mobius bmark
2016-12-18 00:55:37 +00:00
ff71a8e847
Ready for sim
2016-12-08 17:00:32 +00:00
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
cd01c1dbe9
Ls 16 more relevant
2016-11-30 22:11:10 +00:00
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
2f92b4860b
Test the full Mooee sector
2016-11-29 00:15:08 +00:00
433afd36f5
Makefile rule for simple_* objects
2016-11-19 01:33:13 +01:00
042ae5b87c
generic 256bits SIMD
2016-11-15 12:16:15 +00:00
33dc1f51b5
Final sign off commits from Cori-1
2016-11-09 04:11:03 -08:00
757a928f9a
Improvement to use own SHM_OPEN call to avoid openmpi bug.
2016-11-02 12:37:46 +00:00
bb94ddd0eb
Tidy up of mpi3; also some cleaning of the dslash controls.
2016-11-02 08:07:09 +00:00
791cb050c8
Comms improvements
2016-11-01 11:35:43 +00:00
b6a65059a2
Update to use shared memory to contain the stencil comms buffers
...
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
a762b1fb71
MPI3 working with a bounce through shared memory on my laptop.
...
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
81f2aeaece
KNL streaming stores, and KNL performance counters
2016-10-12 11:45:22 +01:00
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
0fd179fb33
Merge branch 'develop' into feature/hirep
2016-09-01 12:59:53 +01:00
fd5614738d
Merge branch 'develop' into feature/hirep
2016-08-30 18:21:36 +01:00
5a68715be3
Richards sweep test
2016-08-05 10:51:57 +01:00
32bc7a6ab8
MPI back out of change that hangs
...
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
b65e72e521
Merge pull request #43 from rprollins/bench/output-format
...
Benchmark_dwf_sweep and Benchmark_zmm output formats
2016-08-04 16:47:01 +01:00
629283726b
build system: local Grid link flag moved to configure.ac
2016-08-03 15:07:42 +01:00
9e5b934d21
improved LAPACK configuration
2016-08-02 17:26:54 +01:00
e9f30cab2c
first working version for the new build system
2016-07-30 17:53:18 +01:00
df6c9f55d1
Use common benchmark output format for dwf_sweep and zmm
2016-07-20 17:38:56 +01:00
f4dd5062d7
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-07-15 19:26:06 +01:00
9db2c6525d
updating benchmarks for red black 4d for Ls vectorised code
2016-07-14 23:44:02 +01:00
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
5028969d4b
Added generators for the adjoint representation
2016-07-08 15:40:11 +01:00
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
fdfbf11c6d
Merge branch 'develop' into temporary-smearing
2016-07-04 18:45:10 +01:00
9cb90f714e
Merge remote-tracking branch 'origin/develop' into temporary-smearing
2016-07-04 17:28:40 +01:00
bfe14000a9
Double compile fix
2016-07-01 16:33:51 +01:00
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
565e9329ba
Changed the colouring classes
2016-06-30 16:51:03 +01:00
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
05acc22920
placeholder for non temporal loads optimisation
2016-06-07 13:18:21 -07:00