44188a5c6f
AVX512 fix
2018-03-05 00:32:24 +00:00
3277bda130
View introduction to prepare for accelerator offload.
...
Probably same problem exists for stencil object
2018-03-04 16:38:08 +00:00
70e276e1ab
parallel_for elimination -> thread_loop
2018-01-28 01:01:14 +00:00
2d0bcc2606
Zero changes, accelerator on kernels and some thread loop changes
2018-01-27 23:47:38 +00:00
c4f82e072b
_grid becomes private; use Grid()
2018-01-27 00:04:12 +00:00
85771e97e9
Hide internal data
2018-01-26 23:04:46 +00:00
72acb0e48f
Namespace, indent
2018-01-14 23:41:59 +00:00
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
9fd23faadf
Pretty layout
2017-03-30 13:44:45 +09:00
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
3e6945cd65
Fixing AVX Z-mobius
2016-12-18 02:05:11 +00:00
87be03006a
AVX 512 code broke other compiles; fixing
2016-12-18 01:45:09 +00:00
fa6acccf55
Zmobius asm
2016-12-18 00:56:19 +00:00
fe187e9ed3
Compiles and passes under ZMobius with assembler
2016-12-10 00:47:48 +00:00
0091b50f49
Zmobius working -- not asm yet
2016-12-09 22:51:32 +00:00
fb8d4b2357
Lots of debug on performance Mobius
2016-12-08 17:28:28 +00:00
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
6adf35da54
Faster Mobius
2016-12-01 11:39:04 +00:00
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
980ff18956
Solving the instantiation no-compile issue
2016-07-15 17:19:44 +01:00
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
...
of the 5D Cayley-form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even-odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput at a higher flop count (the extra flops
are non-compulsory) and enable use of the KNL cache-friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00