1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-10 03:17:07 +01:00
Commit Graph

93 Commits

Author SHA1 Message Date
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
9d5f693cbe empty SIMD fix 2016-05-24 10:56:27 +01:00
91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
c23375cd65 Testing travis CI integration 2016-04-30 06:30:56 -07:00
c79ea0dcef Fixingn IMCI 2016-04-22 21:52:54 -07:00
e3f141f82f Fixed SSE compile with typecasts 2016-04-22 10:30:30 -07:00
a6dfa2386b GCC choked on intrinsics calls that ICPC did not 2016-04-22 06:33:41 -07:00
587f80cd93 Updated to compile and pass under intel SDE 2016-04-19 15:13:54 -07:00
528eb773ad Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
e5657510b0 Rotate support for Ls simd-ized 2016-04-19 22:24:18 +01:00
f473919526 Rotate support 2016-04-19 22:23:51 +01:00
ab56ccdd25 -Complete and working implementation of Grid_empty 2016-04-15 13:17:42 -04:00
f473ef7591 Fixing the compile 2016-03-31 07:47:42 -07:00
8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
83b15bfcdd Better Avx512 assembly sequence for SU3 using fmaddsub to get the imag imag sign 2016-03-30 08:39:39 +01:00
c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
b6c3bc574b Moving to a more coherent organisation of the inline assembly and arch dependencies. 2016-03-28 16:24:37 +01:00
ad80f61fba AVX512 shaken out 2016-03-28 00:38:05 -06:00
165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
24a5a81c53 SSE compile fix 2015-12-16 09:09:37 +00:00
3ce10aa975 Fix a regression failure on Mobius; chroma regression added 2015-12-10 22:55:00 +00:00
fa01ae5980 integer divide 2015-11-28 17:00:34 -08:00
4690acc3c8 Don't know why peter committed these as they didn't compile 2015-11-06 10:31:48 +00:00
16c7993434 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/simd/Grid_avx512.h
	lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
dfc1de6f60 Merge branch 'master' of github.com:paboyle/Grid 2015-11-04 05:14:26 -06:00
9b5d31ffc1 mac , mult routines
Lines# with '#' will be ignored, and an empty message aborts the commit.
2015-11-04 03:10:34 -08:00
a38762159c Inline assembly hooks for AVX 512. Better way in some ways than BAGEL to generate assembly.
Updated Grid_avx512.h
2015-11-04 03:09:06 -08:00
ffc5dab17f AMD FMA4 support added for Interlagos/BlueWaters 2015-11-04 04:29:58 -06:00
814c79f38d SIMD improvements for mac and madd use in complex for avx, sse 2015-10-09 00:38:52 +02:00
f4b6d1dfea NGO stores reenabled 2015-09-30 16:02:14 -07:00
64d64d1ab6 Updating to modify non-inlining permute routines and hopefully get better reg use and
enhance performance.
2015-09-25 08:55:04 -07:00
5ef42add2d Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
neo
490009745c Small change in the HMC interface.
Example of multiple levels in the WilsonFermion hmc test.

Merge remote-tracking branch 'upstream/master'

Conflicts:
	lib/qcd/hmc/HMC.h
	lib/qcd/hmc/integrators/Integrator.h
	lib/qcd/hmc/integrators/Integrator_algorithm.h
	tests/Test_simd.cc
2015-07-30 17:16:57 +09:00
d9d4c5916a Elemental force term for Wilson dslash added and tests thereof passing.
Now need to construct pseudofermion two flavour, ratio, one flavour, ratio
action fragments.
2015-07-26 10:54:38 +09:00
neo
9adaeb061a More NEON functionalities 2015-07-21 11:52:15 +09:00
neo
0ffcdf6204 Debugged vector version of ProjectOnGroup 2015-07-06 02:24:58 +09:00
4deffd1ccb No compile fix 2015-07-02 02:03:09 +01:00
neo
e31dfa79d1 Merge remote-tracking branch 'upstream/master' 2015-06-17 02:02:51 +09:00
neo
6e5db0b1da Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
20fe866651 Critical bug fix of sin/cos typo 2015-06-16 14:17:45 +01:00
22c8185caa Binop assist and real/complex improvements 2015-06-14 00:59:07 +01:00
42f7e5b7f8 More functions broken out into element by element 2015-06-14 00:58:14 +01:00
neo
ecf3bae150 Merge remote-tracking branch 'upstream/master' 2015-06-09 19:01:07 +09:00