portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 18:25:37 +00:00

Author	SHA1	Message	Date
Yong-Chull Jang	53a9260a94	Patch to compile with AVX512 for the SkyLake Xeon processor using GCC 7.2.0. Besides bug fixes in the source code, an option 'SKL' is added to configure.ac for SkyLake-specific AVX512 instruction flags when using GCC. Code can be compiled with --enable-simd=SKL using GCC 7.2.0, but Test_simd fails; the failure comes from AVX512 support for the complex double type with non-Intel compilers.	2018-01-27 10:00:38 -05:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	9fd23faadf	Pretty layout	2017-03-30 13:44:45 +09:00
paboyle	4e7ab3166f	Refactoring header layout	2017-02-22 18:09:33 +00:00
paboyle	3ae92fa2e6	Global changes to parallel_for structure. Move the comms flags to more sensible names	2017-02-21 05:24:27 -05:00
paboyle	3e6945cd65	Fixing AVX Z-mobius	2016-12-18 02:05:11 +00:00
paboyle	87be03006a	AVX 512 code broke other compiles; fixing	2016-12-18 01:45:09 +00:00
Peter Boyle	fa6acccf55	Zmobius asm	2016-12-18 00:56:19 +00:00
Peter Boyle	fe187e9ed3	Compiles and passes under ZMobius with assembler	2016-12-10 00:47:48 +00:00
Peter Boyle	0091b50f49	Zmobius working -- not asm yet	2016-12-09 22:51:32 +00:00
Peter Boyle	fb8d4b2357	Lots of debug on performance Mobius	2016-12-08 17:28:28 +00:00
Peter Boyle	e27c6b217c	Updating	2016-12-01 12:42:53 +00:00
paboyle	6adf35da54	Faster Mobius	2016-12-01 11:39:04 +00:00
paboyle	bd0430b34f	Serialisation in malloc fixed	2016-11-29 22:27:55 +00:00
paboyle	90e70790f3	Feature for z-Mobius prep	2016-08-15 22:31:29 +01:00
paboyle	980ff18956	Solving the instantiation no compile issue	2016-07-15 17:19:44 +01:00
paboyle	adbc7c1188	Adding files for multiple implementations (cache opt) and Ls vectorisation of the 5D Cayley-form chiral fermions for the 5d matrix. With Ls entirely in the vector direction, s-hopping terms involve rotations. The serial dependence of the LDU inversion for Mobius and 4d even-odd checkerboarding is removed by simply applying Ls^2 operations (vectorised many ways) as a dense matrix operation. This should give similar throughput at a higher flop count (non-compulsory flops), while enabling use of the KNL cache-friendly kernels throughout the code. Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512 with single precision.	2016-07-14 22:59:21 +01:00

17 Commits
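Commit adbc7c1188 trades the serial LDU inversion in the fifth dimension for a dense Ls x Ls matrix application. The sketch below illustrates that trade on a toy model and is not Grid's code or API. It assumes a hypothetical real-valued, single-chirality s-direction block shaped as a cyclic bidiagonal matrix (diagonal `a`, hop `b`, wrap-around corner `c`). The names `CyclicBidiagonal`, `solveSerial`, `denseInverse`, `applyDense`, `applyMatrix` and `lsIsVectorisable` are illustrative. The serial solve sweeps s one step at a time. The dense version precomputes M^{-1} once and then spends Ls^2 multiply-adds, with every output s independent so it can map onto SIMD lanes. The Nsimd = 8 figure is 512 bits divided by 64 bits per single-precision complex number.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy model of one chirality block of the 5D s-direction operator:
// M[s][s] = a, M[s][s+1] = b for s < Ls-1, and a wrap-around corner M[Ls-1][0] = c.
struct CyclicBidiagonal {
  int Ls;
  double a, b, c;
};

// y = M x, used to check residuals.
std::vector<double> applyMatrix(const CyclicBidiagonal& M, const std::vector<double>& x) {
  const int Ls = M.Ls;
  std::vector<double> y(Ls);
  for (int s = 0; s < Ls - 1; ++s) y[s] = M.a * x[s] + M.b * x[s + 1];
  y[Ls - 1] = M.c * x[0] + M.a * x[Ls - 1];
  return y;
}

// Serial solve of M x = y: each s depends on s+1 (the serial dependence).
std::vector<double> solveSerial(const CyclicBidiagonal& M, const std::vector<double>& y) {
  const int Ls = M.Ls;
  std::vector<double> p(Ls), q(Ls), x(Ls);
  // Express x_s = p_s + q_s * x_{Ls-1}, sweeping from s = Ls-1 down to 0.
  p[Ls - 1] = 0.0;
  q[Ls - 1] = 1.0;
  for (int s = Ls - 2; s >= 0; --s) {
    p[s] = (y[s] - M.b * p[s + 1]) / M.a;
    q[s] = -M.b * q[s + 1] / M.a;
  }
  // The last row, c*x_0 + a*x_{Ls-1} = y_{Ls-1}, fixes x_{Ls-1}.
  const double xLast = (y[Ls - 1] - M.c * p[0]) / (M.c * q[0] + M.a);
  for (int s = 0; s < Ls; ++s) x[s] = p[s] + q[s] * xLast;
  return x;
}

// Precompute M^{-1} column by column (done once, outside the hot loop).
std::vector<double> denseInverse(const CyclicBidiagonal& M) {
  const int Ls = M.Ls;
  std::vector<double> inv(Ls * Ls);
  for (int t = 0; t < Ls; ++t) {
    std::vector<double> e(Ls, 0.0);
    e[t] = 1.0;
    const std::vector<double> col = solveSerial(M, e);
    for (int s = 0; s < Ls; ++s) inv[s * Ls + t] = col[s];
  }
  return inv;
}

// Apply M^{-1} as a dense matrix-vector product: Ls^2 multiply-adds
// (non-compulsory flops), but no dependence between different outputs s.
std::vector<double> applyDense(const std::vector<double>& inv, const std::vector<double>& y) {
  const int Ls = static_cast<int>(y.size());
  std::vector<double> x(Ls, 0.0);
  for (int t = 0; t < Ls; ++t)
    for (int s = 0; s < Ls; ++s) x[s] += inv[s * Ls + t] * y[t];
  return x;
}

// A 512-bit register holds 512 / (2 * 32) = 8 single-precision complex numbers.
constexpr int kNsimdComplexF = 512 / (2 * 32);
static_assert(kNsimdComplexF == 8, "AVX512 holds 8 complex<float> lanes");

// With Ls laid out in the vector direction, Ls must be a multiple of Nsimd.
bool lsIsVectorisable(int Ls) { return Ls > 0 && Ls % kNsimdComplexF == 0; }
```

Both routes give the same solution up to rounding. The dense route costs more arithmetic per site, but it removes the sweep over s, which is the property the commit relies on to reuse cache-friendly vectorised kernels.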