portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-06-16 14:57:05 +01:00

Author	SHA1	Message	Date
Peter Boyle	f8797e1e3e	bug fix. works now and great face performance	2017-04-26 03:14:02 -04:00
Peter Boyle	5b55867a7a	Slightly cheaper Ext assembly	2017-04-24 05:36:11 -04:00
Peter Boyle	3accb1ef89	Debugged assembly split phase with interior suppression	2017-04-23 19:30:19 -04:00
Peter Boyle	5812eb8a8c	Partially fixed. But the comms-overlap does not work yet.	2017-04-22 18:50:25 -04:00
paboyle	ac58565d0a	Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.	2017-04-22 19:31:04 +01:00
paboyle	2c246551d0	Overlap comms and compute options in wilson kernels	2017-02-07 01:37:10 -05:00
Peter Boyle	eabf316ed9	BGQ performance ASM	2016-12-22 21:56:08 +00:00
Guido Cossu	e1042aef77	First version of the double prec for testing purposes. It does not compile single and double version at the same time	2016-10-28 17:20:04 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
paboyle	90e70790f3	Feature for z-Mobius prep	2016-08-15 22:31:29 +01:00
paboyle	bdaa5b1767	Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.	2016-06-30 14:35:02 -07:00
paboyle	8fcefc021a	Improved the prefetching when using cache blocking codes	2016-06-30 14:35:02 -07:00
paboyle	05c884a62a	Prefetch change	2016-06-30 14:35:01 -07:00
paboyle	2d8bb4c594	Tweaks	2016-06-30 14:35:01 -07:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	87418e7df1	Slightly faster prefetching perf.	2016-06-13 02:32:52 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
Azusa Yamaguchi	d9408893b3	Prefetching in the normal kernel implementation.	2016-06-08 05:43:48 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00

19 Commits