mirror of https://github.com/paboyle/Grid.git synced 2025-06-11 03:46:55 +01:00
Commit Graph

120 Commits

Author SHA1 Message Date
e1042aef77 First version of the double precision for testing purposes
It does not compile single and double versions at the same time
2016-10-28 17:20:04 +01:00
460d0753a1 Merge branch 'develop' into feature/mpi3
Conflicts:
	lib/simd/Grid_avx512.h
2016-10-25 01:08:51 +01:00
75ebd3a0d1 Typo fixes and rotate for CLANG 2016-10-21 22:34:29 +01:00
20a091c3ed Intel vs. Clang intrinsics differences absorbed 2016-10-21 09:08:36 +01:00
811ca45473 GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support 2016-10-17 16:23:21 +01:00
81f2aeaece KNL streaming stores, and KNL performance counters 2016-10-12 11:45:22 +01:00
611b5d74ba Fix for AVX+FMA3 compilation 2016-10-10 15:26:17 +01:00
0724f7af75 QPX single precision implementation 2016-09-19 18:09:12 +01:00
4d11a6f5f2 first commit for QPX intrinsics 2016-08-23 14:41:44 +01:00
17097a93ec FFTW test ran over 4 mpi processes. 2016-08-17 01:33:55 +01:00
93d29bb699 build system improvements after discussion with Peter 2016-08-04 16:19:59 +01:00
e9f30cab2c first working version for the new build system 2016-07-30 17:53:18 +01:00
4908b77d46 Fixed conflicts. PLEASE avoid making wholesale cosmetic-only changes; this created
a HUGE number of conflicts that are difficult to resolve and understand.

Wholesale formatting, reordering functions etc... in a central file like Tensor_class
or Grid_vector_types while others are also editing without making substantial functionality
changes creates pain.
2016-07-15 20:59:07 +01:00
f4dd5062d7 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-07-15 19:26:06 +01:00
8f47d0b5ab Rotation needed for hopping term in fifth dim with Ls vectorised fields 2016-07-14 23:45:36 +01:00
a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
e3d5319470 Debugged the real() and imag() functions and added tests to Test_Simd 2016-07-06 14:16:03 +01:00
fdfbf11c6d Merge branch 'develop' into temporary-smearing 2016-07-04 18:45:10 +01:00
9cb90f714e Merge remote-tracking branch 'origin/develop' into temporary-smearing 2016-07-04 17:28:40 +01:00
1a6d65c6a4 Converted set_uw and set_fj to all complex functions 2016-07-03 10:27:43 +01:00
bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
1445189361 Control the prefetch strategy 2016-06-30 14:35:02 -07:00
a25bec87d9 Prefetch during save 2016-06-30 14:35:01 -07:00
2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendliness.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
9d5f693cbe empty SIMD fix 2016-05-24 10:56:27 +01:00
91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
c23375cd65 Testing travis CI integration 2016-04-30 06:30:56 -07:00
c79ea0dcef Fixing IMCI 2016-04-22 21:52:54 -07:00
e3f141f82f Fixed SSE compile with typecasts 2016-04-22 10:30:30 -07:00
a6dfa2386b GCC choked on intrinsics calls that ICPC did not 2016-04-22 06:33:41 -07:00
587f80cd93 Updated to compile and pass under intel SDE 2016-04-19 15:13:54 -07:00
528eb773ad Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
e5657510b0 Rotate support for Ls simd-ized 2016-04-19 22:24:18 +01:00
f473919526 Rotate support 2016-04-19 22:23:51 +01:00
ab56ccdd25 -Complete and working implementation of Grid_empty 2016-04-15 13:17:42 -04:00
f473ef7591 Fixing the compile 2016-03-31 07:47:42 -07:00
8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
83b15bfcdd Better Avx512 assembly sequence for SU3 using fmaddsub to get the imag imag sign 2016-03-30 08:39:39 +01:00
c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
b6c3bc574b Moving to a more coherent organisation of the inline assembly and arch dependencies. 2016-03-28 16:24:37 +01:00
ad80f61fba AVX512 shaken out 2016-03-28 00:38:05 -06:00
165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
2d8bb356e3 Smearing routines compile (still untested) 2016-02-25 02:43:59 +09:00