7223753355
Rotate in a direction > 2 for simd_layout
2016-04-19 15:35:15 -07:00
b27bac4669
Updates for simd in one dir
2016-04-19 15:34:10 -07:00
c8a93d6a93
Cartesian changes to allow all simd in one direction
2016-04-19 15:18:12 -07:00
04072a5e1f
Rotate is a temporary hack. Would like to merge ALL
permutes as rotates of length 2, and make any rotate active
over any subset of lane bits. This is hard and requires a general
permute; with current intrinsics it is only really possible through specific
case-by-case encodings, as presently performed. A general permute from Intel
would help. IBM did it in VMX.
2016-04-19 15:15:34 -07:00
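One way to read the note above is that a permute on a single lane bit is the same thing as a rotate of length 2 acting on that bit alone. The scalar sketch below illustrates that reading only; it is not Grid's implementation. The names `Vec`, `permute`, `rotate_bits` and the 8-lane width are hypothetical, and the sketch assumes the lane bits involved are contiguous.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: a SIMD vector with 2^3 = 8 integer lanes.
constexpr std::size_t kLaneBits = 3;
constexpr std::size_t kLanes = std::size_t{1} << kLaneBits;
using Vec = std::array<int, kLanes>;

// Permute on one lane bit: lane i reads from the lane whose index
// differs from i only in bit `bit`.
Vec permute(const Vec& in, std::size_t bit) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    out[i] = in[i ^ (std::size_t{1} << bit)];
  }
  return out;
}

// Rotate restricted to lane bits [lo, lo + width): those bits form a
// sub-index of length 2^width that is rotated by `shift`; every other
// bit of the lane index is left unchanged.
Vec rotate_bits(const Vec& in, std::size_t lo, std::size_t width,
                std::size_t shift) {
  assert(lo + width <= kLaneBits);
  const std::size_t len = std::size_t{1} << width;
  const std::size_t mask = (len - 1) << lo;
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::size_t sub = (i & mask) >> lo;         // position inside the sub-index
    const std::size_t src_sub = (sub + shift) % len;  // rotated source position
    out[i] = in[(i & ~mask) | (src_sub << lo)];
  }
  return out;
}
```

In this model `rotate_bits(v, b, 1, 1)` gives the same result as `permute(v, b)` for every lane bit `b`, and `rotate_bits(v, 0, kLaneBits, s)` is an ordinary rotate of the whole vector by `s` lanes. Doing this with real vector registers needs a general permute instruction, which is the difficulty the commit message describes.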
574ea4f843
const safety
2016-04-19 15:15:11 -07:00
f2ae9682ff
Remove some timing hacks
2016-04-19 15:14:32 -07:00
587f80cd93
Updated to compile and pass under Intel SDE
2016-04-19 15:13:54 -07:00
528eb773ad
Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
e5657510b0
Rotate support for Ls simd-ized
2016-04-19 22:24:18 +01:00
f473919526
Rotate support
2016-04-19 22:23:51 +01:00
8f1b0afc2a
Merge pull request #28 from aportelli/master
Build system fix
2016-04-16 09:55:45 +01:00
1494b0f397
Merge pull request #29 from giltirn/master
Grid_empty implementation and Lanczos checkerboard fix
2016-04-16 09:55:24 +01:00
ab56ccdd25
-Complete and working implementation of Grid_empty
2016-04-15 13:17:42 -04:00
cf2f69812b
build system fix
2016-04-14 15:13:55 +01:00
c323425496
Small change
2016-04-11 10:38:43 +01:00
a646260e82
Merge remote-tracking branch 'origin/master' into ckelly-dec12-2015
2016-04-06 13:57:28 -04:00
af9c8d1372
-Checkerboard fixes for Lanczos
2016-04-06 13:50:56 -04:00
650e02b344
Smaller vols too
2016-04-06 06:52:09 -07:00
a524ca2a4b
New benchmark update
2016-04-06 03:35:56 -07:00
23a7176b71
Loop over volumes
2016-04-06 03:22:11 -07:00
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
c7ba47bdc7
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-06 02:56:28 +01:00
e67fc2be18
Adding a trial for OpenMP overhead minimisation
2016-03-31 16:00:37 +01:00
f473ef7591
Fixing the compile
2016-03-31 07:47:42 -07:00
f7b1060aed
Use headers to clear macros and sub precision
2016-03-31 14:52:37 +01:00
8052556275
Cleaning up the single/double kernel implementation switch
2016-03-31 14:51:32 +01:00
60d965f79e
AVX512 improvements; SIGFPE trapping too
2016-03-30 08:42:34 +01:00
83b15bfcdd
Better AVX512 assembly sequence for SU3 using fmaddsub to get the sign of the imag*imag term
2016-03-30 08:39:39 +01:00
1ecbf9794d
Merge branch 'master' of https://github.com/paboyle/Grid
2016-03-30 08:37:55 +01:00
2ded354403
configure
2016-03-30 00:17:43 -07:00
340428a1fe
Eigen fixes and HDCR work
2016-03-30 00:16:02 -07:00
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
b6c3bc574b
Moving to a more coherent organisation of the inline assembly and arch dependencies.
2016-03-28 16:24:37 +01:00
1e355a51e1
Interface change
2016-03-27 23:46:55 -07:00
ad80f61fba
AVX512 shaken out
2016-03-28 00:38:05 -06:00
61469252fe
AVX512 shaken out under SDE
2016-03-28 00:37:12 -06:00
02198ac5b5
Tolerance and more coverage
2016-03-28 00:36:17 -06:00
21abaf7e91
Gamma sign change
2016-03-28 00:35:45 -06:00
165bffc2e7
AVX512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
644fd6d32e
Build AVX512 clean
2016-03-25 09:35:33 -07:00
f54e0ec9bd
Try Lanczos to set up HDCR subspace
2016-03-17 10:36:16 +00:00
a155a362da
Update from HDCR tuning
2016-03-16 02:31:04 -07:00
60d4564151
ICC no compile fix
2016-03-16 02:30:40 -07:00
d4e57f4bc6
IO Bandwidth reporting
2016-03-16 02:30:16 -07:00
3920b2c0ab
HDCR updates
2016-03-16 02:29:58 -07:00
2733c4b93c
HDCR updates
2016-03-16 02:29:37 -07:00
e17c773a0b
Longer runs for VTune
2016-03-16 02:29:13 -07:00
36a800f26c
Microsecond granularity support
2016-03-16 02:28:51 -07:00
b75da563d9
Resurrect timestamp. Should make optional
2016-03-16 02:28:17 -07:00