1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-10-26 01:29:34 +00:00
Commit Graph

86 Commits

Author SHA1 Message Date
paboyle
2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle
51cb2d4328 update file lists 2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
paboyle
05acc22920 placeholder for non temporal loads optimisation 2016-06-07 13:18:21 -07:00
paboyle
8ac021de73 Added a test an fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle
786ca52c43 Problems remain in the red black preconditioning of the Ls vectorisation 2016-06-06 07:05:51 -07:00
paboyle
53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
paboyle
f2ae9682ff Remove some timing hacks 2016-04-19 15:14:32 -07:00
paboyle
528eb773ad Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle
c323425496 Small change 2016-04-11 10:38:43 +01:00
paboyle
650e02b344 Smaller vols too 2016-04-06 06:52:09 -07:00
paboyle
a524ca2a4b New benchmark update 2016-04-06 03:35:56 -07:00
paboyle
23a7176b71 Loop over volumes 2016-04-06 03:22:11 -07:00
paboyle
b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle
e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
paboyle
c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle
e17c773a0b Longer runs for vtune 2016-03-16 02:29:13 -07:00
Peter Boyle
f7be108e35 100 iters faster 2016-02-15 16:03:04 -06:00
paboyle
fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle
02452afd36 Optional overlap of comms with compute 2016-01-04 14:18:40 +00:00
paboyle
331768dcff Added overlap comms compute mode 2016-01-03 01:38:11 +00:00
paboyle
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle
3ce10aa975 Fix a regression failure on Mobius; chroma regression added 2015-12-10 22:55:00 +00:00
paboyle
1cc0d7b811 Bigger ncall as timing loops got small on cori 2015-11-07 00:04:40 -08:00
Peter Boyle
27813cf518 More timing detail reported 2015-11-06 05:27:13 -06:00
paboyle
16c7993434 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/simd/Grid_avx512.h
	lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
paboyle
32762346ad Better run time on KNC 2015-11-04 03:25:34 -08:00
paboyle
0f48658a27 Update minor 2015-11-04 03:23:46 -08:00
Peter Boyle
dfc1de6f60 Merge branch 'master' of github.com:paboyle/Grid 2015-11-04 05:14:26 -06:00
Peter Boyle
b3d70a3bb2 Ncall change 2015-11-04 09:55:21 +00:00
Peter Boyle
c26220e9ab EO benchmark as well as non-eo 2015-11-04 09:54:48 +00:00
Peter Boyle
3726fe7481 Bigger vec length 2015-10-09 00:42:54 +02:00
paboyle
af89c40462 Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes. 2015-09-28 16:09:04 -07:00
Peter Boyle
9f4f65cb46 Added a decoupled memory system benchmark to remove thread synch overhead 2015-09-26 18:23:57 -07:00
Peter Boyle
9183380946 Gparity test added; partial implementation -- this is Chris K's doubled lattice only
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
Peter Boyle
84a66476ab Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
Peter Boyle
d1afebf71e Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
Peter Boyle
31a0c8d783 Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-01 22:51:04 +01:00
paboyle
39271b02dd Modified memory bw test to display word size 2015-07-01 22:46:53 +01:00
Peter Boyle
638d2cda11 Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precisoin to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
Peter Boyle
9143f071d7 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-30 15:17:46 +01:00
Peter Boyle
8ad81bed32 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
Peter Boyle
93916f400d Update Benchmark_comms.cc 2015-06-25 10:59:53 +01:00
Peter Boyle
f07a17ba2c Assist for generating file lists contained in Make.inc files for convenience when things are added 2015-06-03 13:07:00 +01:00
Peter Boyle
84b5c7217d CG test written and passes i.e. converges with small true residual
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of.

DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00
Peter Boyle
69f4d58381 Reorg; moving prec/unprec/schur CG for Wilson and DWF into tests as these are really tests and not benchmarks
(no performance reports, only convergence test).
2015-06-02 17:25:26 +01:00
Peter Boyle
3845f267cb Domain wall fermions now invert ; have the basis set up for
Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.

All are done with space-time taking part in checkerboarding, Ls uncheckerboarded

Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is hermitan
vi) Mdag is the adjoint of M between stochastic vectors.

That said, the RB schur solve, RB MpcDagMpc solve, Unprec solve
all converge and the true residual becomes small; so pretty good tests.
2015-06-02 16:57:12 +01:00
Peter Boyle
5644ab1e19 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00