mirror of https://github.com/paboyle/Grid.git synced 2025-06-13 04:37:05 +01:00
Commit Graph

1788 Commits

SHA1 Message Date
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
c8b35d960c Merge branch 'develop' of https://github.com/paboyle/Grid into feature/knl-cache-opt 2016-06-30 14:30:49 -07:00
532f41dd61 Asm only for avx512 2016-06-30 14:00:34 -07:00
661b0ab45d Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 13:07:42 -07:00
565e9329ba Changed the colouring classes 2016-06-30 16:51:03 +01:00
4bc08ed995 Improved the prefetching when using cache blocking codes 2016-06-26 12:54:14 -07:00
b2933a0557 Control the prefetch strategy 2016-06-25 12:55:25 -07:00
db057cc276 Prefetch change 2016-06-25 12:54:50 -07:00
22e88eaf54 Prefetch during save 2016-06-25 12:54:14 -07:00
09fe3caebd Tweaks 2016-06-25 11:08:05 -07:00
5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
17a8f51a9b update file lists 2016-06-19 11:59:10 -07:00
1b7f88dd00 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-19 11:45:58 -07:00
d6737e4bd8 Travis fix for Linux clang builds 2016-06-14 19:15:08 +01:00
d539888e57 Merge pull request #37 from rprollins/fix/mpi_communicator
Removed write to stdout in constructor for MPI CartesianCommunicator
2016-06-14 17:25:40 +01:00
86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
05acc22920 placeholder for non temporal loads optimisation 2016-06-07 13:18:21 -07:00
8ac021de73 Added a test and fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
d4c9d71fc8 Merge branch 'master' of https://github.com/paboyle/Grid 2016-06-06 07:06:54 -07:00
786ca52c43 Problems remain in the red black preconditioning of the Ls vectorisation 2016-06-06 07:05:51 -07:00
048ac04abc Update Benchmark_dwf.cc 2016-06-03 13:44:41 +01:00
f78d89bcbe Update Lebesgue.cc
kill verbose
2016-06-03 13:33:42 +01:00
53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
5d3a1a025d timers flag 2016-06-03 03:25:38 -07:00
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
1c0e922585 Merge pull request #35 from aportelli/master
empty SIMD fix
2016-05-27 16:49:13 +01:00
9d5f693cbe empty SIMD fix 2016-05-24 10:56:27 +01:00
5c90c3b457 Merge pull request #34 from aportelli/master
Polymorphic lattices & various small updates
2016-05-24 10:50:04 +01:00
91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
3789e3f31c additional fixes in slice functions 2016-05-12 18:35:38 +01:00
0c66719210 const fix in slice functions 2016-05-12 13:01:35 +01:00
3a5b5c8bec Save an old tar of tree 2016-05-12 03:20:17 -07:00
fdbe071213 space added 2016-05-12 02:59:51 -07:00
4bc21ec7cb thread CL argument fix 2016-05-11 15:21:29 +01:00
e3083b6dfc Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab' 2016-05-11 15:20:41 +01:00
ab89418658 Precision change going in; useful for mixed precision algorithms for example. 2016-05-11 15:18:47 +01:00
28cd99882c Subslicing 2016-05-11 15:06:54 +01:00
aceaee774c ExtractSlice / InsertSlice for lower dimensional lattices where the lattice is not
distributed in the orthogonal direction.
Useful for fermion 4d/5d etc.
2016-05-11 14:12:02 +01:00
f8f9fd6f22 Merge pull request #33 from aportelli/master
Travis for clang 3.8 + various updates/fixes
2016-05-05 22:57:13 +01:00
101aa769eb LatticeBase contains the grid pointer and a virtual destructor to allow polymorphic lattice pointers 2016-05-04 12:15:31 -07:00
0bf99bfde5 log polish 2016-05-04 12:14:49 -07:00
64bf6fe54e macro to dump NERSC header to a stream 2016-05-04 12:14:38 -07:00
1161d566b9 minor code cleaning 2016-05-02 19:32:11 -07:00
c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
c4c89336fe SliceSum: shutting down warning about non-threaded code for now 2016-05-01 18:29:57 -07:00