Yong-Chull Jang
3cb8cb7282
'typename' is added to compile with AVX512 using GCC7.2.0; a semicolon was missing in Grid_avx512.h and the bug is fixed. Option SKL is added to configure script for skylake processor specific AVX512 operations. Code can be compiled with --enable-simd=SKL using GCC 7.2.0, but Test_simd fails. AVX512 support for complex double type with non-intel compilers makes this error; it needs a review.
2017-12-23 14:54:07 -05:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
9fd23faadf
Pretty layout
2017-03-30 13:44:45 +09:00
paboyle
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
3e6945cd65
Fixing AVX Z-mobius
2016-12-18 02:05:11 +00:00
paboyle
87be03006a
AVX 512 code broke other compiles; fixing
2016-12-18 01:45:09 +00:00
Peter Boyle
fa6acccf55
Zmobius asm
2016-12-18 00:56:19 +00:00
Peter Boyle
fe187e9ed3
Compiles and passes under ZMobius with assembler
2016-12-10 00:47:48 +00:00
Peter Boyle
0091b50f49
Zmobius working -- not asm yet
2016-12-09 22:51:32 +00:00
Peter Boyle
fb8d4b2357
Lots of debug on performance Mobius
2016-12-08 17:28:28 +00:00
Peter Boyle
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
paboyle
6adf35da54
Faster Mobius
2016-12-01 11:39:04 +00:00
paboyle
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
paboyle
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
paboyle
980ff18956
Solving the instantiation no compile issue
2016-07-15 17:19:44 +01:00
paboyle
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
...
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00