e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
1a30455a10
1000 iters on bmark for more accurate timing
2017-02-20 17:47:01 -05:00
aca7a3ef0a
Optimisation control improvements
2017-02-10 18:22:31 -05:00
2bf4688e83
Running on BNL KNL
2017-02-07 01:32:10 -05:00
a37e71f362
New automatic implementation of gamma matrices, Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
33dc1f51b5
Final sign off commits from Cori-1
2016-11-09 04:11:03 -08:00
bb94ddd0eb
Tidy up of mpi3; also some cleaning of the dslash controls.
2016-11-02 08:07:09 +00:00
b6a65059a2
Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
c190221fd3
Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
a762b1fb71
MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
81f2aeaece
KNL streaming stores, and KNL performance counters
2016-10-12 11:45:22 +01:00
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
9db2c6525d
updating benchmarks for red black 4d for Ls vectorised code
2016-07-14 23:44:02 +01:00
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
5e02392f9c
Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
e17c773a0b
Longer runs for vtune
2016-03-16 02:29:13 -07:00
f7be108e35
100 iters faster
2016-02-15 16:03:04 -06:00
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
1cc0d7b811
Bigger ncall as timing loops got small on cori
2015-11-07 00:04:40 -08:00
27813cf518
More timing detail reported
2015-11-06 05:27:13 -06:00
c26220e9ab
EO benchmark as well as non-eo
2015-11-04 09:54:48 +00:00
84a66476ab
Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
d1afebf71e
Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
638d2cda11
Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precision to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc.
2015-07-01 22:45:15 +01:00
8ad81bed32
big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). started getting too
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
84b5c7217d
CG test written and passes i.e. converges with small true residual
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of.
DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00