1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-10-24 17:54:47 +01:00
Commit Graph

111 Commits

Author SHA1 Message Date
paboyle
ff6da364e8 FFT double and single precision gives good performance now in multithreaded code. 2016-08-24 15:05:00 +01:00
paboyle
e423a09974 FFT improved and test_FFT passing under MPI 8 processes, 8^4 for LatticeComplexD and LatticeSpinMatrixD 2016-08-18 02:23:21 +01:00
paboyle
17097a93ec FFTW test ran over 4 mpi processes. 2016-08-17 01:33:55 +01:00
629283726b build system: local Grid link flag moved to configure.ac 2016-08-03 15:07:42 +01:00
9e5b934d21 improved LAPACK configuration 2016-08-02 17:26:54 +01:00
e9f30cab2c first working version for the new build system 2016-07-30 17:53:18 +01:00
paboyle
f4dd5062d7 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-07-15 19:26:06 +01:00
paboyle
c0e878705e Updated file list 2016-07-15 00:02:39 +01:00
paboyle
de3e79d300 red black for Ls vectorised is 4d red black. Update accordingly now I've made this choice 2016-07-14 23:49:42 +01:00
paboyle
adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
paboyle
a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00