mirror of https://github.com/paboyle/Grid.git synced 2026-05-04 01:14:12 +01:00
Commit Graph

1752 Commits

Author SHA1 Message Date
paboyle adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D Cayley form chiral fermions for the 5D matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4D even-odd
checkerboarding is removed by simply applying the Ls^2 operations (vectorised
in many ways) as a dense matrix operation.

This should give similar throughput at a higher flop count (non-compulsory flops),
but enables use of the KNL cache-friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
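The commit above replaces the serial LDU sweep over s with a dense Ls x Ls operation. A minimal C++ sketch of that idea, assuming a toy scalar model (real Cayley-form operators carry spin projectors and Mobius coefficients, which are omitted here): the s-direction matrix is inverted once, and applying it then becomes a dense matrix-vector product whose Ls outputs are mutually independent. The names `build_m5`, `invert_dense`, `apply_dense` and `ls_fits_simd` are hypothetical and are not Grid's API.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Row-major dense n x n matrix (hypothetical helper type).
using Mat = std::vector<double>;

// Toy scalar stand-in for the s-direction matrix (assumes Ls >= 2):
// 'diag' on the diagonal, -1 hopping to s-1 and s+1, and 'mass' in the
// two corners (wrap-around coupling). Spin projectors are ignored.
Mat build_m5(int Ls, double diag, double mass) {
  Mat M(Ls * Ls, 0.0);
  for (int s = 0; s < Ls; s++) {
    M[s * Ls + s] = diag;
    if (s + 1 < Ls) M[s * Ls + s + 1] = -1.0;
    if (s > 0) M[s * Ls + s - 1] = -1.0;
  }
  M[0 * Ls + (Ls - 1)] = mass;
  M[(Ls - 1) * Ls + 0] = mass;
  return M;
}

// One-off Gauss-Jordan inversion with partial pivoting.
Mat invert_dense(Mat A, int n) {
  Mat I(n * n, 0.0);
  for (int i = 0; i < n; i++) I[i * n + i] = 1.0;
  for (int c = 0; c < n; c++) {
    int p = c;  // pick the largest pivot in column c
    for (int r = c + 1; r < n; r++)
      if (std::fabs(A[r * n + c]) > std::fabs(A[p * n + c])) p = r;
    for (int k = 0; k < n; k++) {
      std::swap(A[c * n + k], A[p * n + k]);
      std::swap(I[c * n + k], I[p * n + k]);
    }
    double piv = A[c * n + c];
    assert(std::fabs(piv) > 0.0);  // matrix must be non-singular
    for (int k = 0; k < n; k++) { A[c * n + k] /= piv; I[c * n + k] /= piv; }
    for (int r = 0; r < n; r++) {
      if (r == c) continue;
      double f = A[r * n + c];
      for (int k = 0; k < n; k++) {
        A[r * n + k] -= f * A[c * n + k];
        I[r * n + k] -= f * I[c * n + k];
      }
    }
  }
  return I;
}

// Dense apply: n^2 multiply-adds, but every output y[s] is computed
// independently, so there is no serial dependence along s.
std::vector<double> apply_dense(const Mat& A, const std::vector<double>& x, int n) {
  std::vector<double> y(n, 0.0);
  for (int s = 0; s < n; s++)
    for (int t = 0; t < n; t++) y[s] += A[s * n + t] * x[t];
  return y;
}

// Ls vectorisation requires Ls to be a multiple of the SIMD width Nsimd.
bool ls_fits_simd(int Ls, int Nsimd) { return Nsimd > 0 && Ls % Nsimd == 0; }
```

The trade-off mirrors the commit message: roughly Ls^2 multiply-adds per site instead of the O(Ls) work of an LDU sweep, but with no dependence between s slices, so they can be spread across SIMD lanes when Ls is a multiple of Nsimd.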
paboyle 62601bb649 Bug fix 2016-07-08 20:46:29 +01:00
paboyle ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
paboyle c667d9fdcc Trying to make compile clean on travis; seem to have a make -j 4 problem with fftw 2016-07-07 23:26:39 +01:00
paboyle 7dbb94bab2 Update 2016-07-07 22:51:37 +01:00
paboyle 236dcc820b typo fix 2016-07-07 22:46:11 +01:00
paboyle a42a441a6a Rename the reconfigure script to ./autogen.sh 2016-07-07 22:35:45 +01:00
paboyle a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
paboyle fc4a043663 Colors and banner clean up 2016-07-02 16:15:38 +01:00
paboyle 61ba50665e Merge branch 'hotfix/v0.5.1' into develop 2016-07-01 16:34:30 +01:00
paboyle bfe14000a9 Double compile fix 2016-07-01 16:33:51 +01:00
paboyle 1ceff48133 Merge branch 'release/v0.5.0' into develop 2016-06-30 15:15:59 -07:00
paboyle 680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
paboyle 3fc6e03ad1 Version file v0.5.0 2016-06-30 14:44:09 -07:00
paboyle 2d6614f3a1 Merge branch 'feature/knl-cache-opt' into develop 2016-06-30 14:36:20 -07:00
paboyle 4e041b5103 Merge branch 'feature/knl-cache-opt' of https://github.com/paboyle/Grid into feature/knl-cache-opt 2016-06-30 14:36:08 -07:00
paboyle 712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
paboyle bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
paboyle 8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
paboyle 1445189361 Control the prefetch strategy 2016-06-30 14:35:02 -07:00
paboyle 05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
paboyle a25bec87d9 Prefetch during save 2016-06-30 14:35:01 -07:00
paboyle 2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle 51cb2d4328 update file lists 2016-06-30 14:35:01 -07:00
paboyle 6d58cb2a68 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching, however. Do next-next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle c8b35d960c Merge branch 'develop' of https://github.com/paboyle/Grid into feature/knl-cache-opt 2016-06-30 14:30:49 -07:00
paboyle 532f41dd61 Asm only for avx512 2016-06-30 14:00:34 -07:00
paboyle 661b0ab45d Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 13:07:42 -07:00
paboyle 4bc08ed995 Improved the prefetching when using cache blocking codes 2016-06-26 12:54:14 -07:00
paboyle b2933a0557 Control the prefetch strategy 2016-06-25 12:55:25 -07:00
paboyle db057cc276 Prefetch change 2016-06-25 12:54:50 -07:00
paboyle 22e88eaf54 Prefetch during save 2016-06-25 12:54:14 -07:00
paboyle 09fe3caebd Tweaks 2016-06-25 11:08:05 -07:00
Guido Cossu 5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle 17a8f51a9b update file lists 2016-06-19 11:59:10 -07:00
paboyle 1b7f88dd00 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching, however. Do next-next link in stencil
prefetching.
2016-06-19 11:45:58 -07:00
portelli d6737e4bd8 Travis fix for Linux clang builds 2016-06-14 19:15:08 +01:00
portelli d539888e57 Merge pull request #37 from rprollins/fix/mpi_communicator
Removed write to stdout in constructor for MPI CartesianCommunicator
2016-06-14 17:25:40 +01:00
Richard Rollins 86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
paboyle 87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loops into the kernels. This will save on function call overhead and
guarantee the L2 prefetching strategy is right, since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
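The assembler-interface change described above moves chunks of the site and s loops inside the kernel. A hypothetical C++ sketch of that loop structure, not Grid's code: OpenMP distributes whole chunks of sites, and each kernel call walks its chunk and all Ls slices in a fixed order, so one call replaces many per-element calls and the traversal order within a chunk cannot be split by the OpenMP runtime. `site_s_work`, `kernel_chunk` and `apply_all` are placeholders; the pragma is simply ignored when compiled without OpenMP.

```cpp
#include <vector>

// Placeholder for the per-(site, s) arithmetic of a real kernel.
static inline double site_s_work(double in) { return 2.0 * in + 1.0; }

// One call processes nsite consecutive sites and all Ls slices of each,
// always in the same order (sites outer, s inner).
void kernel_chunk(const double* in, double* out, int site0, int nsite, int Ls) {
  for (int ss = site0; ss < site0 + nsite; ss++)
    for (int s = 0; s < Ls; s++) {
      int idx = ss * Ls + s;
      out[idx] = site_s_work(in[idx]);
    }
}

// Threads receive whole chunks; the work inside a chunk is never split.
void apply_all(const std::vector<double>& in, std::vector<double>& out,
               int nsites, int Ls, int chunk) {
  int nchunks = (nsites + chunk - 1) / chunk;
#pragma omp parallel for
  for (int c = 0; c < nchunks; c++) {
    int site0 = c * chunk;
    int n = (site0 + chunk <= nsites) ? chunk : nsites - site0;
    kernel_chunk(in.data(), out.data(), site0, n, Ls);
  }
}
```

Choosing the chunk size is the tuning knob in this sketch: larger chunks amortise call overhead further, while smaller ones give OpenMP more units to balance across threads.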
Azusa Yamaguchi d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle 05acc22920 placeholder for non temporal loads optimisation 2016-06-07 13:18:21 -07:00
paboyle 8ac021de73 Added a test and fixed it for red-black preconditioned, Ls-innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
paboyle d4c9d71fc8 Merge branch 'master' of https://github.com/paboyle/Grid 2016-06-06 07:06:54 -07:00
paboyle 786ca52c43 Problems remain in the red black preconditioning of the Ls vectorisation 2016-06-06 07:05:51 -07:00
Peter Boyle 048ac04abc Update Benchmark_dwf.cc 2016-06-03 13:44:41 +01:00
Peter Boyle f78d89bcbe Update Lebesgue.cc
kill verbose
2016-06-03 13:33:42 +01:00