portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-04-18 09:45:55 +01:00

Author	SHA1	Message	Date
paboyle	1b7f88dd00	Enable reordering of the loops in the assembler for cache friendliness. This gets in the way of L2 prefetching, however. Do next-next link in stencil prefetching.	2016-06-19 11:45:58 -07:00
paboyle	87418e7df1	Slightly faster prefetching perf.	2016-06-13 02:32:52 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
Azusa Yamaguchi	d9408893b3	Prefetching in the normal kernel implementation.	2016-06-08 05:43:48 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	c23375cd65	Testing Travis CI integration	2016-04-30 06:30:56 -07:00
paboyle	f473ef7591	Fixing the compile	2016-03-31 07:47:42 -07:00
paboyle	8052556275	Cleaning up the single/double kernel implementation switch	2016-03-31 14:51:32 +01:00
paboyle	83b15bfcdd	Better AVX-512 assembly sequence for SU3 using fmaddsub to get the sign of the imag*imag term	2016-03-30 08:39:39 +01:00
paboyle	c77b7ee897	AddSub based alternate SU3 routine	2016-03-28 17:55:22 -06:00
paboyle	b6c3bc574b	Moving to a more coherent organisation of the inline assembly and arch dependencies.	2016-03-28 16:24:37 +01:00