Guido Cossu
e1042aef77
First version of the doube prec for testing purposes
...
It does not compile single and double version at the same time
2016-10-28 17:20:04 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores, and KNL performance coutners
2016-10-12 11:45:22 +01:00
paboyle
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
paboyle
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
paboyle
05c884a62a
Prefetch change
2016-06-30 14:35:01 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendly.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00