paboyle
bfe14000a9
Double compile fix
2016-07-01 16:33:51 +01:00
paboyle
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
paboyle
05acc22920
placeholder for non temporal loads optimisation
2016-06-07 13:18:21 -07:00
paboyle
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
paboyle
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
paboyle
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
paboyle
f2ae9682ff
Remove some timing hacks
2016-04-19 15:14:32 -07:00
paboyle
528eb773ad
Merged.
...
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle
c323425496
Small change
2016-04-11 10:38:43 +01:00
paboyle
650e02b344
Smaller vols too
2016-04-06 06:52:09 -07:00
paboyle
a524ca2a4b
New benchmark update
2016-04-06 03:35:56 -07:00
paboyle
23a7176b71
Loop over volumes
2016-04-06 03:22:11 -07:00
paboyle
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
paboyle
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
paboyle
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
paboyle
e17c773a0b
Longer runs for vtune
2016-03-16 02:29:13 -07:00
Peter Boyle
f7be108e35
100 iters faster
2016-02-15 16:03:04 -06:00
paboyle
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
paboyle
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
paboyle
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
paboyle
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
paboyle
1cc0d7b811
Bigger ncall as timing loops got small on Cori
2015-11-07 00:04:40 -08:00
Peter Boyle
27813cf518
More timing detail reported
2015-11-06 05:27:13 -06:00
paboyle
16c7993434
Merge branch 'master' of github.com:paboyle/Grid
...
Conflicts:
lib/simd/Grid_avx512.h
lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
paboyle
32762346ad
Better run time on KNC
2015-11-04 03:25:34 -08:00
paboyle
0f48658a27
Update minor
2015-11-04 03:23:46 -08:00
Peter Boyle
dfc1de6f60
Merge branch 'master' of github.com:paboyle/Grid
2015-11-04 05:14:26 -06:00
Peter Boyle
b3d70a3bb2
Ncall change
2015-11-04 09:55:21 +00:00
Peter Boyle
c26220e9ab
EO benchmark as well as non-eo
2015-11-04 09:54:48 +00:00
Peter Boyle
3726fe7481
Bigger vec length
2015-10-09 00:42:54 +02:00
paboyle
af89c40462
Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes.
2015-09-28 16:09:04 -07:00
Peter Boyle
9f4f65cb46
Added a decoupled memory system benchmark to remove thread synch overhead
2015-09-26 18:23:57 -07:00
Peter Boyle
9183380946
Gparity test added; partial implementation -- this is Chris K's doubled lattice only
...
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
Peter Boyle
84a66476ab
Rework/global edit to enforce type templating of fermion operators.
...
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
Peter Boyle
d1afebf71e
Sizable improvement in multigrid for unsquared.
...
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
Peter Boyle
31a0c8d783
Merge branch 'master' of https://github.com/paboyle/Grid
2015-07-01 22:51:04 +01:00
paboyle
39271b02dd
Modified memory bw test to display word size
2015-07-01 22:46:53 +01:00
Peter Boyle
638d2cda11
Change the SIMD command correctly with precision = double vs. single and
...
connect the "Real" default precision to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
Peter Boyle
9143f071d7
Merge branch 'master' of https://github.com/paboyle/Grid
2015-06-30 15:17:46 +01:00
Peter Boyle
8ad81bed32
big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). started getting too
...
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
Peter Boyle
93916f400d
Update Benchmark_comms.cc
2015-06-25 10:59:53 +01:00
Peter Boyle
f07a17ba2c
Assist for generating file lists contained in Make.inc files for convenience when things are added
2015-06-03 13:07:00 +01:00
Peter Boyle
84b5c7217d
CG test written and passes i.e. converges with small true residual
...
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of:
DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00