paboyle
44188a5c6f
AVX512 fix
2018-03-05 00:32:24 +00:00
paboyle
3277bda130
View introduction to prepare for accelerator offload.
...
The same problem probably exists for the stencil object.
2018-03-04 16:38:08 +00:00
paboyle
70e276e1ab
parallel_for elimination -> thread_loop
2018-01-28 01:01:14 +00:00
paboyle
2d0bcc2606
Zero changes, accelerator on kernels and some thread-loop changes
2018-01-27 23:47:38 +00:00
paboyle
c4f82e072b
_grid becomes private; use Grid()
2018-01-27 00:04:12 +00:00
paboyle
85771e97e9
Hide internal data
2018-01-26 23:04:46 +00:00
paboyle
72acb0e48f
Namespace, indent
2018-01-14 23:41:59 +00:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
9fd23faadf
Pretty layout
2017-03-30 13:44:45 +09:00
paboyle
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
3e6945cd65
Fixing AVX Z-mobius
2016-12-18 02:05:11 +00:00
paboyle
87be03006a
AVX 512 code broke other compiles; fixing
2016-12-18 01:45:09 +00:00
Peter Boyle
fa6acccf55
Zmobius asm
2016-12-18 00:56:19 +00:00
Peter Boyle
fe187e9ed3
Compiles and passes under ZMobius with assembler
2016-12-10 00:47:48 +00:00
Peter Boyle
0091b50f49
Zmobius working -- not asm yet
2016-12-09 22:51:32 +00:00
Peter Boyle
fb8d4b2357
Lots of debug on performance Mobius
2016-12-08 17:28:28 +00:00
Peter Boyle
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
paboyle
6adf35da54
Faster Mobius
2016-12-01 11:39:04 +00:00
paboyle
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
paboyle
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
paboyle
980ff18956
Solving the instantiation no compile issue
2016-07-15 17:19:44 +01:00
paboyle
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
...
of the 5D Cayley-form chiral fermions for the 5D matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4D even-odd
checkerboarding is removed by simply applying the Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput with a higher flop count (non-compulsory flops),
but enables use of the KNL cache-friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision (8 single-precision complex numbers per 512-bit vector).
2016-07-14 22:59:21 +01:00
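A toy sketch of the idea described in the Ls-vectorisation commit above, not Grid's implementation. Assumptions, stated loudly: the 4x4 matrix is a made-up diagonally dominant stand-in for the s-direction operator (chiral projectors, boundary mass factors and all 4D structure are omitted), exact `Fraction` arithmetic stands in for SIMD vectors, and every function name (`lu_decompose`, `solve_serial`, `dense_inverse`, `apply_dense`, `rotate_lanes`) is hypothetical. It contrasts a serial LU (equivalently LDU) forward/back sweep, where element s depends on earlier elements, with applying a precomputed dense inverse, where each of the Ls outputs is an independent Ls-term dot product (Ls^2 multiply-adds in total). It also shows that an s-hop becomes a lane rotation once s lives in the vector lanes.

```python
from fractions import Fraction


def lu_decompose(m):
    """Doolittle LU without pivoting (toy: assumes a diagonally dominant matrix)."""
    m = [[Fraction(v) for v in row] for row in m]
    n = len(m)
    lower = [[Fraction(0)] * n for _ in range(n)]
    upper = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = Fraction(1)
        for j in range(i, n):
            upper[i][j] = m[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            lower[j][i] = (m[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def solve_serial(lower, upper, rhs):
    """Forward then backward sweep: each s waits on its neighbours (serial in s)."""
    rhs = [Fraction(v) for v in rhs]
    n = len(rhs)
    y = [Fraction(0)] * n
    for s in range(n):  # y[s] needs y[0..s-1]
        y[s] = rhs[s] - sum(lower[s][k] * y[k] for k in range(s))
    x = [Fraction(0)] * n
    for s in reversed(range(n)):  # x[s] needs x[s+1..n-1]
        x[s] = (y[s] - sum(upper[s][k] * x[k] for k in range(s + 1, n))) / upper[s][s]
    return x


def dense_inverse(m):
    """One-off setup: build the full Ls x Ls inverse column by column."""
    lower, upper = lu_decompose(m)
    n = len(m)
    cols = [solve_serial(lower, upper, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def apply_dense(minv, rhs):
    """Ls^2 multiply-adds with no dependence between rows, so it vectorises freely."""
    return [sum(row[k] * rhs[k] for k in range(len(rhs))) for row in minv]


def rotate_lanes(lanes):
    """With s stored across SIMD lanes, out[s] = in[(s - 1) mod Ls] is a lane rotation.

    A real domain-wall operator would also apply a boundary factor to the
    wrapped-around element; that is omitted in this toy.
    """
    return lanes[-1:] + lanes[:-1]


if __name__ == "__main__":
    m = [[4, 0, 0, 1], [-1, 4, 0, 0], [0, -1, 4, 0], [0, 0, -1, 4]]
    rhs = [1, 2, 3, 4]
    lower, upper = lu_decompose(m)
    x_serial = solve_serial(lower, upper, rhs)
    x_dense = apply_dense(dense_inverse(m), rhs)
    print(x_serial == x_dense)
    print(rotate_lanes([0, 1, 2, 3]))
```

The trade-off mirrors the commit message: the dense application does more arithmetic than the two sweeps, but it removes the serial dependence in s, which is what allows the same vectorised kernels to be used throughout.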