9db2c6525d
updating benchmarks for red black 4d for Ls vectorised code
2016-07-14 23:44:02 +01:00
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
bfe14000a9
Double compile fix
2016-07-01 16:33:51 +01:00
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
5e02392f9c
Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
05acc22920
placeholder for non temporal loads optimisation
2016-06-07 13:18:21 -07:00
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
f2ae9682ff
Remove some timing hacks
2016-04-19 15:14:32 -07:00
528eb773ad
Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
c323425496
Small change
2016-04-11 10:38:43 +01:00
650e02b344
Smaller vols too
2016-04-06 06:52:09 -07:00
a524ca2a4b
New benchmark update
2016-04-06 03:35:56 -07:00
23a7176b71
Loop over volumes
2016-04-06 03:22:11 -07:00
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
e17c773a0b
Longer runs for vtune
2016-03-16 02:29:13 -07:00
f7be108e35
100 iters faster
2016-02-15 16:03:04 -06:00
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
1cc0d7b811
Bigger ncall as timing loops got small on cori
2015-11-07 00:04:40 -08:00
27813cf518
More timing detail reported
2015-11-06 05:27:13 -06:00
16c7993434
Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
lib/simd/Grid_avx512.h
lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
32762346ad
Better run time on KNC
2015-11-04 03:25:34 -08:00
0f48658a27
Update minor
2015-11-04 03:23:46 -08:00
dfc1de6f60
Merge branch 'master' of github.com:paboyle/Grid
2015-11-04 05:14:26 -06:00
b3d70a3bb2
Ncall change
2015-11-04 09:55:21 +00:00
c26220e9ab
EO benchmark as well as non-eo
2015-11-04 09:54:48 +00:00
3726fe7481
Bigger vec length
2015-10-09 00:42:54 +02:00
af89c40462
Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes.
2015-09-28 16:09:04 -07:00
9f4f65cb46
Added a decoupled memory system benchmark to remove thread synch overhead
2015-09-26 18:23:57 -07:00
9183380946
Gparity test added; partial implementation -- this is Chris K's doubled lattice only
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
84a66476ab
Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
d1afebf71e
Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
31a0c8d783
Merge branch 'master' of https://github.com/paboyle/Grid
2015-07-01 22:51:04 +01:00
39271b02dd
Modified memory bw test to display word size
2015-07-01 22:46:53 +01:00
638d2cda11
Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precision to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
9143f071d7
Merge branch 'master' of https://github.com/paboyle/Grid
2015-06-30 15:17:46 +01:00
8ad81bed32
big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). started getting to
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00