paboyle
e7cba358c2
Temporary update to reflect the new dropping of std::vector in Lattice
...
Will update again to hide the internals in an interface
2018-01-25 23:31:41 +00:00
paboyle
d74c21a386
GLobal edit for QCD namespace removal & NAMESPACE macros
2018-01-15 09:37:58 +00:00
Peter Boyle
bfb68e6f02
Merge pull request #130 from giltirn/gparity-handunroll
...
Gparity handunroll
2017-09-21 10:11:00 +01:00
Christopher Kelly
d36d2fb40d
Added ability to override default Ls in Benchmark_dwf
2017-08-28 06:53:56 -07:00
paboyle
ae56e556c6
finalise issue on new OPA revert
2017-08-20 02:53:12 +01:00
Peter Boyle
7d88198387
Merge branch 'develop' into feature/multi-communicator
2017-08-19 13:03:35 -04:00
Peter Boyle
14d53e1c9e
Threaded MPI calls patches
2017-07-29 13:08:10 -04:00
Peter Boyle
b73bd151bb
Switch off counters by default
2017-06-30 10:16:35 +01:00
Peter Boyle
694b305cab
Update to reporting
2017-06-30 10:16:13 +01:00
Guido Cossu
20999c1370
Merge branch 'develop' into feature/hmc_generalise
2017-05-05 12:47:17 +01:00
Peter Boyle
945767c6d8
More info
2017-05-03 20:26:35 -04:00
Peter Boyle
92e364a35f
Better reporting in benchmark for MPI3
2017-05-03 15:43:36 -04:00
Guido Cossu
4063238943
Adding HMC test file example for Mobius + smearing
2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
paboyle
738c1a11c2
longer nloop
2017-04-26 08:43:20 +01:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
c429ace748
Cleaner OpenMP use
2017-04-22 20:28:42 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
paboyle
e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
paboyle
1a30455a10
1000 iters on bmark for more accurate timing
2017-02-20 17:47:01 -05:00
paboyle
aca7a3ef0a
Optimisation control improvements
2017-02-10 18:22:31 -05:00
Guido Cossu
8b6a6c8236
Resolving small merge conflict
2017-02-09 16:20:24 +00:00
Guido Cossu
e0571c872b
Merge branch 'develop' into feature/hmc_generalise
2017-02-09 16:12:00 +00:00
paboyle
2bf4688e83
Running on BNL KNL
2017-02-07 01:32:10 -05:00
a37e71f362
New automatic implementation of gamma matrices, Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
Guido Cossu
0bd296dda4
Adding check of the Dag part in the benchmark
2016-12-14 03:15:09 +00:00
paboyle
33dc1f51b5
Final sign off commits from Cori-1
2016-11-09 04:11:03 -08:00
paboyle
bb94ddd0eb
Tidy up of mpi3; also some cleaning of the dslash controls.
2016-11-02 08:07:09 +00:00
azusayamaguchi
b6a65059a2
Update to use shared memory to contain the stencil comms buffers
...
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
azusayamaguchi
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
paboyle
a762b1fb71
MPI3 working with a bounce through shared memory on my laptop.
...
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores, and KNL performance coutners
2016-10-12 11:45:22 +01:00
Guido Cossu
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
paboyle
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
paboyle
9db2c6525d
updating benchmarks for red black 4d for Ls vectorised code
2016-07-14 23:44:02 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
Guido Cossu
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
paboyle
8ac021de73
Added a test an fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
paboyle
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
paboyle
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
paboyle
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
paboyle
e17c773a0b
Longer runs for vtune
2016-03-16 02:29:13 -07:00
Peter Boyle
f7be108e35
100 iters faster
2016-02-15 16:03:04 -06:00
paboyle
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
paboyle
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00