1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-14 09:45:36 +00:00
Commit Graph

14 Commits

Author SHA1 Message Date
Peter Boyle
4548523ecc This modification eliminates what looks like a compiler bug
on Intel 2017.
2018-03-08 04:41:16 -08:00
paboyle
3277bda130 View introduction to prepare for accelerator offload.
Probably same problem exists for stencil object
2018-03-04 16:38:08 +00:00
paboyle
70e276e1ab parallel_for elimination -> thread_loop 2018-01-28 01:01:14 +00:00
paboyle
c4f82e072b _grid becomes private ; use Grid()§ 2018-01-27 00:04:12 +00:00
paboyle
85771e97e9 Hide internal data 2018-01-26 23:04:46 +00:00
paboyle
19234fb40e Namespace, format 2018-01-14 23:44:16 +00:00
paboyle
fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
paboyle
b9e8ea3aaa conjugate coefficient on the dagger 2017-03-30 13:43:13 +09:00
paboyle
4e7ab3166f Refactoring header layout 2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
bd0430b34f Serialisation in malloc fixed 2016-11-29 22:27:55 +00:00
paboyle
90e70790f3 Feature for z-Mobius prep 2016-08-15 22:31:29 +01:00
paboyle
980ff18956 Solving the instantiation no compile issue 2016-07-15 17:19:44 +01:00
paboyle
adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00