Guido Cossu
b812d5e39c
Added single threaded version of the derivative for the Ls vectorised DWF
2016-12-06 16:31:13 +00:00
Peter Boyle
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
paboyle
6adf35da54
Faster Mobius
2016-12-01 11:39:04 +00:00
Lanny91
b18950f776
Added simd real divide test with QPX divide fixes
2016-11-25 13:21:33 +00:00
Lanny91
0acbf77bc6
Add QPX Div structure
2016-11-24 13:24:12 +00:00
a2cffb0304
AVXFMA target fixed
2016-11-21 17:47:18 +01:00
97cddda49e
Merge branch 'feature/gen-simd' into feature/doxygen
# Conflicts:
# Makefile.am
# configure.ac
2016-11-19 13:11:13 +01:00
b873504b90
fully generic SIMD
2016-11-19 01:32:39 +01:00
042ae5b87c
generic 256bits SIMD
2016-11-15 12:16:15 +00:00
azusayamaguchi
f7b60004f3
Merge branch 'develop' into release/v0.6.0
2016-11-04 16:08:07 +00:00
d5e95bc350
Merge branch 'release/v0.6.0' into feature/feynman-rules
2016-10-31 18:36:21 +00:00
Guido Cossu
e1042aef77
First version of double precision, for testing purposes
It does not compile the single and double versions at the same time
2016-10-28 17:20:04 +01:00
paboyle
aa6a839c60
avx512 build fix; detect clang/gcc intrinsics vs. ICPC
2016-10-28 09:13:09 +01:00
ca21003f01
Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
# lib/FFT.h
# lib/qcd/action/fermion/WilsonFermion5D.h
# tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
azusayamaguchi
460d0753a1
Merge branch 'develop' into feature/mpi3
Conflicts:
lib/simd/Grid_avx512.h
2016-10-25 01:08:51 +01:00
azusayamaguchi
75ebd3a0d1
Typo fixes and rotate for CLANG
2016-10-21 22:34:29 +01:00
bd6a228af6
Merge commit '20a091c3eddfdb67a82ece6413740a93650a2f98' into feature/feynman-rules
2016-10-21 13:10:30 +01:00
azusayamaguchi
20a091c3ed
Intel vs. Clang intrinsics differences absorbed
2016-10-21 09:08:36 +01:00
997fd882ff
Merge branch 'develop' into feature/feynman-rules
# Conflicts:
# lib/Threads.h
# lib/qcd/action/fermion/WilsonFermion.cc
# lib/qcd/action/fermion/WilsonFermion.h
# lib/qcd/utils/SUn.h
# lib/simd/Grid_avx.h
# lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
paboyle
811ca45473
GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support
2016-10-17 16:23:21 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores and KNL performance counters
2016-10-12 11:45:22 +01:00
paboyle
6f408256bc
FMA4 option moved on the align
2016-10-11 10:03:01 +01:00
paboyle
8d11681aac
verbose remove
2016-10-10 23:50:42 +01:00
paboyle
3d5c9a1ee9
No compile fix on clang++ 3.9
2016-10-10 23:50:13 +01:00
Guido Cossu
611b5d74ba
Fix for AVX+FMA3 compilation
2016-10-10 15:26:17 +01:00
cb02b7088f
Merge branch 'develop' into feature/doxygen
# Conflicts:
# configure.ac
2016-10-09 13:35:44 +01:00
paboyle
87acd06990
Use streaming stores
2016-09-26 10:11:34 +01:00
paboyle
836e929565
Divide handling improved
2016-09-26 09:42:22 +01:00
Antonin Portelli
0724f7af75
QPX single precision implementation
2016-09-19 18:09:12 +01:00
4d11a6f5f2
first commit for QPX intrinsics
2016-08-23 14:41:44 +01:00
paboyle
17097a93ec
FFTW test ran over 4 MPI processes.
2016-08-17 01:33:55 +01:00
b1cfb4d661
first try at a nicer Doxygen implementation
2016-08-05 15:29:18 +01:00
93d29bb699
build system improvements after discussion with Peter
2016-08-04 16:19:59 +01:00
e9f30cab2c
first working version for the new build system
2016-07-30 17:53:18 +01:00
paboyle
4908b77d46
Fixed conflicts. PLEASE avoid making wholesale cosmetic-only changes; this created
a HUGE number of conflicts that were difficult to resolve and understand.
Wholesale formatting, reordering of functions, etc. in a central file like Tensor_class
or Grid_vector_types, while others are also editing it and without substantial functionality
changes, creates pain.
2016-07-15 20:59:07 +01:00
paboyle
f4dd5062d7
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-07-15 19:26:06 +01:00
paboyle
8f47d0b5ab
Rotation needed for hopping term in fifth dim with Ls vectorised fields
2016-07-14 23:45:36 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
Guido Cossu
e3d5319470
Debugged the real() and imag() functions and added tests to Test_Simd
2016-07-06 14:16:03 +01:00
Guido Cossu
fdfbf11c6d
Merge branch 'develop' into temporary-smearing
2016-07-04 18:45:10 +01:00
Guido Cossu
9cb90f714e
Merge remote-tracking branch 'origin/develop' into temporary-smearing
2016-07-04 17:28:40 +01:00
Guido Cossu
1a6d65c6a4
Converted set_uw and set_fj to all complex functions
2016-07-03 10:27:43 +01:00
paboyle
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
paboyle
1445189361
Control the prefetch strategy
2016-06-30 14:35:02 -07:00
paboyle
a25bec87d9
Prefetch during save
2016-06-30 14:35:01 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler to be cache friendly.
This gets in the way of L2 prefetching, however. Do next-next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee that the L2 prefetching strategy is right, since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
9d5f693cbe
empty SIMD fix
2016-05-24 10:56:27 +01:00
91e04056f9
fix of the empty SIMD
2016-05-12 19:24:10 +01:00
paboyle
c23375cd65
Testing Travis CI integration
2016-04-30 06:30:56 -07:00
paboyle
c79ea0dcef
Fixing IMCI
2016-04-22 21:52:54 -07:00
paboyle
e3f141f82f
Fixed SSE compile with typecasts
2016-04-22 10:30:30 -07:00
paboyle
a6dfa2386b
GCC choked on intrinsics calls that ICPC did not
2016-04-22 06:33:41 -07:00
paboyle
587f80cd93
Updated to compile and pass under Intel SDE
2016-04-19 15:13:54 -07:00
paboyle
528eb773ad
Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle
e5657510b0
Rotate support for Ls simd-ized
2016-04-19 22:24:18 +01:00
paboyle
f473919526
Rotate support
2016-04-19 22:23:51 +01:00
Christopher Kelly
ab56ccdd25
-Complete and working implementation of Grid_empty
2016-04-15 13:17:42 -04:00
paboyle
f473ef7591
Fixing the compile
2016-03-31 07:47:42 -07:00
paboyle
8052556275
Cleaning up the single/double kernel implementation switch
2016-03-31 14:51:32 +01:00
paboyle
83b15bfcdd
Better Avx512 assembly sequence for SU3 using fmaddsub to get the imag imag sign
2016-03-30 08:39:39 +01:00
paboyle
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
paboyle
b6c3bc574b
Moving to a more coherent organisation of the inline assembly and arch dependencies.
2016-03-28 16:24:37 +01:00
paboyle
ad80f61fba
AVX512 shaken out
2016-03-28 00:38:05 -06:00
paboyle
165bffc2e7
Avx512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
paboyle
644fd6d32e
Build avx512 clean
2016-03-25 09:35:33 -07:00
2d8bb356e3
Smearing routines compile (still untested)
2016-02-25 02:43:59 +09:00
a7251f28c7
Stout smearing compiles (untested)
2016-02-24 03:16:50 +09:00
Antonin Portelli
497e7e4c53
BG/Q compatibility fix
2016-02-23 15:57:38 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
Azusa Yamaguchi
24a5a81c53
SSE compile fix
2015-12-16 09:09:37 +00:00
paboyle
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
paboyle
fa01ae5980
integer divide
2015-11-28 17:00:34 -08:00
Azusa Yamaguchi
4690acc3c8
Don't know why Peter committed these, as they didn't compile
2015-11-06 10:31:48 +00:00
paboyle
16c7993434
Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
lib/simd/Grid_avx512.h
lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
Peter Boyle
dfc1de6f60
Merge branch 'master' of github.com:paboyle/Grid
2015-11-04 05:14:26 -06:00
paboyle
9b5d31ffc1
mac, mult routines
2015-11-04 03:10:34 -08:00
paboyle
a38762159c
Inline assembly hooks for AVX512. In some ways a better way than BAGEL to generate assembly.
Updated Grid_avx512.h
2015-11-04 03:09:06 -08:00
Peter Boyle
ffc5dab17f
AMD FMA4 support added for Interlagos/BlueWaters
2015-11-04 04:29:58 -06:00
Peter Boyle
814c79f38d
SIMD improvements for mac and madd use in complex for avx, sse
2015-10-09 00:38:52 +02:00
paboyle
f4b6d1dfea
NGO stores reenabled
2015-09-30 16:02:14 -07:00
Peter Boyle
64d64d1ab6
Updating to modify non-inlining permute routines and hopefully get better register use and
enhance performance.
2015-09-25 08:55:04 -07:00
Peter Boyle
5ef42add2d
Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
neo
490009745c
Small change in the HMC interface.
Example of multiple levels in the WilsonFermion hmc test.
Merge remote-tracking branch 'upstream/master'
Conflicts:
lib/qcd/hmc/HMC.h
lib/qcd/hmc/integrators/Integrator.h
lib/qcd/hmc/integrators/Integrator_algorithm.h
tests/Test_simd.cc
2015-07-30 17:16:57 +09:00
Peter Boyle
d9d4c5916a
Elemental force term for Wilson dslash added and tests thereof passing.
Now need to construct pseudofermion two flavour, ratio, one flavour, ratio
action fragments.
2015-07-26 10:54:38 +09:00
neo
9adaeb061a
More NEON functionality
2015-07-21 11:52:15 +09:00
neo
0ffcdf6204
Debugged vector version of ProjectOnGroup
2015-07-06 02:24:58 +09:00
Peter Boyle
4deffd1ccb
No compile fix
2015-07-02 02:03:09 +01:00
neo
e31dfa79d1
Merge remote-tracking branch 'upstream/master'
2015-06-17 02:02:51 +09:00
neo
6e5db0b1da
Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'
Conflicts:
tests/Make.inc
2015-06-16 23:34:45 +09:00
Azusa Yamaguchi
20fe866651
Critical bug fix of sin/cos typo
2015-06-16 14:17:45 +01:00
Azusa Yamaguchi
22c8185caa
Binop assist and real/complex improvements
2015-06-14 00:59:07 +01:00
Azusa Yamaguchi
42f7e5b7f8
More functions broken out into element by element
2015-06-14 00:58:14 +01:00
neo
ecf3bae150
Merge remote-tracking branch 'upstream/master'
2015-06-09 19:01:07 +09:00
neo
e80012896a
Adding support for iMatrix exponentiation
2015-06-09 18:59:45 +09:00