bd6a228af6
Merge commit '20a091c3eddfdb67a82ece6413740a93650a2f98' into feature/feynman-rules
2016-10-21 13:10:30 +01:00
63d219498b
first (dirty) implementation of Feynman stochastic EM field
2016-10-21 13:10:13 +01:00
paboyle
2c54a53d0a
Compile verbose reduce
2016-10-21 12:12:14 +01:00
paboyle
306160ad9a
bcopy threaded
2016-10-21 12:07:28 +01:00
azusayamaguchi
20a091c3ed
Intel vs. Clang intrinsics differences absorbed
2016-10-21 09:08:36 +01:00
azusayamaguchi
202078eb1b
Cray / OpenSHMEM ordering differs
2016-10-21 09:07:20 +01:00
paboyle
a762b1fb71
MPI3 working with a bounce through shared memory on my laptop.
...
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
paboyle
5b5925b8e5
Forgot to add
2016-10-20 17:09:40 +01:00
paboyle
b58adc6a4b
commVector
2016-10-20 17:00:15 +01:00
paboyle
f9d5e95d72
allocator template typedefs moved to AlignedAllocator
2016-10-20 16:59:39 +01:00
paboyle
4f8e636a43
commVector
2016-10-20 16:59:16 +01:00
paboyle
9b39f35ae6
commVector different for SHMEM compat
2016-10-20 16:58:53 +01:00
paboyle
5fe2b85cbd
MPI3 and shared memory support
2016-10-20 16:58:01 +01:00
paboyle
c7cccaaa69
Comm vector for shmem
2016-10-20 16:57:31 +01:00
paboyle
cbcfea466f
MPI3
2016-10-20 16:57:14 +01:00
paboyle
4955672fc3
MPI3
2016-10-20 16:57:00 +01:00
paboyle
8c043da5b7
SHMEM and comms allocator made different
2016-10-20 16:56:05 +01:00
paboyle
3cbe974eb4
Layout
2016-10-20 16:55:21 +01:00
997fd882ff
Merge branch 'develop' into feature/feynman-rules
...
# Conflicts:
# lib/Threads.h
# lib/qcd/action/fermion/WilsonFermion.cc
# lib/qcd/action/fermion/WilsonFermion.h
# lib/qcd/utils/SUn.h
# lib/simd/Grid_avx.h
# lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
paboyle
7af9b87318
Cache face tables to improve performance.
...
Extract merge now looking poor.
2016-10-18 09:51:37 +01:00
paboyle
811ca45473
GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support
2016-10-17 16:23:21 +01:00
paboyle
bc1a4d40ba
Faster integer handling; avoid push_back
2016-10-17 16:16:44 +01:00
paboyle
c8079e6621
Time the face gather in x-dir more carefully
2016-10-13 22:28:50 +01:00
azusayamaguchi
8b0d171c9a
32bit issue on the KNL code variant where byte offsets were stored
2016-10-12 17:49:32 +01:00
azusayamaguchi
8bbd9ebc27
Reversing changes to Stencil class
2016-10-12 13:47:20 +01:00
azusayamaguchi
6472b431f0
__rdpmc needed for gcc, clang++
2016-10-12 12:29:08 +01:00
azusayamaguchi
bd205a3293
Fixing for non x86 and non KNL
2016-10-12 12:09:15 +01:00
azusayamaguchi
496beffa88
Fix non-KNL build
2016-10-12 12:06:08 +01:00
azusayamaguchi
9b63e97108
align not absolutely required and confuses clang++
2016-10-12 11:51:21 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores, and KNL performance counters
2016-10-12 11:45:22 +01:00
paboyle
2d4a45c758
Typecast pointer
2016-10-12 09:14:15 +01:00
paboyle
a123dcd7e9
Static required for shmem. Reading same object twice requires csum reset
2016-10-12 00:29:57 +01:00
paboyle
6b27c42dfe
Cosmetic
2016-10-12 00:29:39 +01:00
paboyle
f7c2aa3ba5
runtime by default
2016-10-12 00:29:13 +01:00
paboyle
7240d73184
Parallelise the x faces; fix the segv on KNL with comms
2016-10-11 22:21:07 +01:00
paboyle
42cd148f5e
Base pointer for comms buffer under AVX512 assembly
2016-10-11 16:06:06 +01:00
paboyle
6e01264bb7
don't use static by default
2016-10-11 10:03:39 +01:00
paboyle
6f408256bc
FMA4 option moved on the align
2016-10-11 10:03:01 +01:00
paboyle
8d11681aac
verbose remove
2016-10-10 23:50:42 +01:00
paboyle
3d5c9a1ee9
No compile fix on clang++ 3.9
2016-10-10 23:50:13 +01:00
paboyle
dc389e467c
axpy_ssp for any coeff type via template
2016-10-10 23:48:05 +01:00
paboyle
3619167d62
Mass parameter
2016-10-10 23:47:33 +01:00
paboyle
96f1d1b828
Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).
2016-10-10 23:46:45 +01:00
paboyle
657e0a8f4d
Mass parameter
2016-10-10 23:46:10 +01:00
paboyle
616e7cd83e
Mass parameter
2016-10-10 23:45:48 +01:00
paboyle
6f26d2e8d4
Overlap tree level feynman rule
2016-10-10 23:45:18 +01:00
paboyle
c014574504
A "please implement me" feynman rule. If this were abstract virtual it would
require/force implementation
2016-10-10 23:44:00 +01:00
paboyle
d7ce164e6e
Feynman rule for DWF
2016-10-10 23:43:36 +01:00
paboyle
c0d5b99016
Dminus
2016-10-10 23:43:19 +01:00
paboyle
09ca32d678
Dminus added for Cayley
2016-10-10 23:42:55 +01:00
paboyle
082ae350c6
static schedule by default
2016-10-10 23:42:30 +01:00
Guido Cossu
611b5d74ba
Fix for AVX+FMA3 compilation
2016-10-10 15:26:17 +01:00
Guido Cossu
b56c9ffa52
Fix for AVXFMA
2016-10-10 14:43:37 +01:00
cb02b7088f
Merge branch 'develop' into feature/doxygen
...
# Conflicts:
# configure.ac
2016-10-09 13:35:44 +01:00
Guido Cossu
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
paboyle
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
Guido Cossu
c78bbd0f8c
Fix ASM compilation
2016-10-04 15:37:32 +01:00
536e2ff073
*.inc removed: please don't commit these files either!
2016-09-27 11:54:03 +01:00
paboyle
87acd06990
Use streaming stores
2016-09-26 10:11:34 +01:00
paboyle
9353b6edfe
Fenv out of grid namespace
2016-09-26 10:09:13 +01:00
paboyle
167cc2650e
GNU SOURCE problem on travis
2016-09-26 09:58:09 +01:00
paboyle
7089b6d5a5
Set up, but not yet implemented, some QED rules
2016-09-26 09:43:40 +01:00
paboyle
2ba7d43ddd
Divide handling
2016-09-26 09:43:14 +01:00
paboyle
836e929565
Divide handling improved
2016-09-26 09:42:22 +01:00
paboyle
b6713ecb60
Momentum space rules for Overlap, DWF untested to date
2016-09-26 09:39:09 +01:00
paboyle
52a39f0fcd
Divide in ET
2016-09-26 09:38:38 +01:00
paboyle
81a7a03076
Integer <<
2016-09-26 09:38:17 +01:00
paboyle
16b37b956c
divide goes to ET
2016-09-26 09:37:59 +01:00
paboyle
567b6cf23f
demangle moves to logging
2016-09-26 09:36:51 +01:00
paboyle
296396646d
FPE's on macos set up
2016-09-26 09:36:14 +01:00
Guido Cossu
5c190a1b8c
Merge branch 'develop' into feature/hirep
2016-09-23 11:06:06 +01:00
Guido Cossu
c4ac6e7e8f
Consolidating HMC interface
...
Unified interface for standard action in fundamental rep and Hirep
2016-09-23 10:47:42 +01:00
Guido Cossu
510e340e16
Debugged last commit for the Two index representation
2016-09-22 22:16:21 +01:00
Guido Cossu
6ffadca153
Restored number of colours to 3
2016-09-22 14:22:54 +01:00
Guido Cossu
b6597b74e7
Added support for the Two index Symmetric and Antisymmetric representations
...
Tested for HMC convergence: OK
Added also a test file showing an example for mixed representations
2016-09-22 14:17:37 +01:00
a034e9901b
Merge branch 'develop' into feature/hadrons
2016-09-20 13:49:33 +01:00
Antonin Portelli
0724f7af75
QPX single precision implementation
2016-09-19 18:09:12 +01:00
2e74520821
removed libtool use (BG/Q compatibility)
2016-09-16 15:25:49 +01:00
Antonin Portelli
6dd75ad9e5
Merge branch 'develop' of github.com:paboyle/Grid into feature/bgq
2016-09-16 15:07:54 +01:00
Guido Cossu
fda408ee6f
Added first lines for supporting Two Index representations
2016-09-13 10:43:30 +01:00
Guido Cossu
b9c80318a2
Merge branch 'develop' into feature/hirep
2016-09-13 10:01:51 +01:00
Guido Cossu
5df5d52d41
Fix for the Intel compiler
2016-09-12 17:17:20 +01:00
Guido Cossu
f76f281e58
Cleaning files after fix
2016-09-09 11:34:25 +01:00
Guido Cossu
aa20cc8b52
Fixing compilation error with AVX512 flag
2016-09-09 02:58:52 -07:00
Guido Cossu
0fd179fb33
Merge branch 'develop' into feature/hirep
2016-09-01 12:59:53 +01:00
Guido Cossu
f45ef8d114
Minor modification in ActionBase.h
2016-09-01 11:46:46 +01:00
paboyle
8535d433a7
Cold or hot must support any precision
2016-08-31 00:27:53 +01:00
paboyle
b573d1f35a
Wilson tree level added
2016-08-31 00:27:04 +01:00
paboyle
0c1d7e4daf
Mom space prop for Wilson action
2016-08-31 00:26:36 +01:00
paboyle
02e983a0cd
Momentum space prop and free prop convolution
2016-08-31 00:26:02 +01:00
paboyle
d15ab66aae
FFT moves higher in include order
2016-08-31 00:25:22 +01:00
paboyle
9005b82c6d
Multi dim FFT, and normalisation fix
2016-08-31 00:24:52 +01:00
paboyle
3475f45ce7
Demangle support for typeid stuff
2016-08-31 00:23:48 +01:00
paboyle
0744f38866
Demangle support is useful
2016-08-31 00:23:28 +01:00
Guido Cossu
fd5614738d
Merge branch 'develop' into feature/hirep
2016-08-30 18:21:36 +01:00
Guido Cossu
b0d3e4bb2c
Separating travis builds
2016-08-30 13:44:07 +01:00
Guido Cossu
b512ccbee6
HMC for Adjoint fermions works
...
Accepts and reproduces known results
Check initial instability of inverters
when starting from hot configurations
2016-08-30 11:31:25 +01:00
paboyle
8c89391c02
FFTW unresolved fixed when no fftw3.h
2016-08-24 16:41:47 +01:00
paboyle
bfac5195b8
tidy up
2016-08-24 16:38:36 +01:00
paboyle
744691097f
Printing
2016-08-24 15:05:56 +01:00
paboyle
ff6da364e8
FFT double and single precision gives good performance now in multithreaded code.
2016-08-24 15:05:00 +01:00
4d11a6f5f2
first commit for QPX intrinsics
2016-08-23 14:41:44 +01:00
paboyle
88be3b39bb
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-08-22 18:29:36 +01:00
paboyle
356e7940fd
fftw can be switched off
2016-08-22 16:24:49 +01:00
paboyle
73ce476890
Include fftw headers
2016-08-22 16:24:21 +01:00
paboyle
e423a09974
FFT improved and test_FFT passing under MPI 8 processes, 8^4 for LatticeComplexD and LatticeSpinMatrixD
2016-08-18 02:23:21 +01:00
paboyle
17097a93ec
FFTW test ran over 4 mpi processes.
2016-08-17 01:33:55 +01:00
paboyle
4ab7dbfd57
Instantiate
2016-08-15 23:00:40 +01:00
paboyle
90e70790f3
Feature for z-Mobius prep
2016-08-15 22:31:29 +01:00
Guido Cossu
9c2e8d5e28
Nc=3 just to let all the tests pass in Travis
2016-08-09 15:46:57 +01:00
Guido Cossu
147e2025b9
Added unit tests on the representation transformations
...
Status: Passing all tests
2016-08-08 16:54:22 +01:00
b1cfb4d661
first try at a nicer Doxygen implementation
2016-08-05 15:29:18 +01:00
paboyle
32bc7a6ab8
MPI back out of change that hangs
...
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
7ff7c7d90d
Merge branch 'develop' into feature/hadrons
2016-08-04 16:22:10 +01:00
93d29bb699
build system improvements after discussion with Peter
2016-08-04 16:19:59 +01:00
2485ef9c9c
Merge branch 'feature/new-build' into feature/hadrons
...
# Conflicts:
# Makefile.am
# scripts/copyright
2016-08-03 16:49:16 +01:00
9e5b934d21
improved LAPACK configuration
2016-08-02 17:26:54 +01:00
Guido Cossu
49b5c49851
Checked the hermiticity of the op in derivative, ok
...
Still CG fails to converge
2016-07-31 12:37:33 +01:00
e9f30cab2c
first working version for the new build system
2016-07-30 17:53:18 +01:00
Guido Cossu
089f0ab582
Debugged HMC for Creutz relation
2016-07-28 16:44:41 +01:00
Guido Cossu
b93e18ed50
Modified the Dirac Kernel class to compile with different number of colours
...
Added the general push_back functionality to accommodate all defined representations
Compiles, not tested
2016-07-18 16:36:28 +01:00
Guido Cossu
9c77bb69a5
Added all elements for Hirep HMC
...
TODO: Test and debug
2016-07-18 12:05:23 +01:00
paboyle
f9e90eeb1f
Sign error on the force for 4d fields fixed
2016-07-16 01:52:44 +01:00
paboyle
fad5c675eb
sign error on the 4d gparity force
2016-07-16 01:51:56 +01:00
paboyle
4908b77d46
Fixed conflicts. PLEASE avoid making wholesale cosmetic only changes, this created
a HUGE amount of difficult to resolve and understand conflicts.
Wholesale formatting, reordering functions etc... in a central file like Tensor_class
or Grid_vector_types while others are also editing without making substantial functionality
changes creates pain.
2016-07-15 20:59:07 +01:00
paboyle
f4dd5062d7
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-07-15 19:26:06 +01:00
paboyle
980ff18956
Solving the instantiation no compile issue
2016-07-15 17:19:44 +01:00
Guido Cossu
7edf4c6c04
Added HMC utilities for the higher representations
...
TODO: Inherit types for the pseudofermions, Debugging, testing
2016-07-15 13:39:47 +01:00
paboyle
1a6c7204ac
Disable instantiation; Use cache version instead
2016-07-15 00:34:39 +01:00
paboyle
49310fbab3
Done with red black change over
2016-07-15 00:08:43 +01:00
paboyle
5c0c8efb9e
Updated file list
2016-07-15 00:02:11 +01:00
paboyle
dfd714e1ef
Multiple implementations for the 5d hopping terms, depending on cache friendly
ops and/or the 5th direction being vectorised
All use 4d redblack.
2016-07-15 00:00:09 +01:00
paboyle
79a8ca1a62
Rewrite for performance. Impl dependent instantiations give
4d linalg impls of the 5d hopping terms (and inverse)
Cache friendly loop orderings of the above
Dense matrix stored and apply to the above
-- Switch to Ls vectorised, and use dense matrix approach for the MooeeInv
and rotate/shift of the Mooee M5D routines.
2016-07-14 23:58:15 +01:00
paboyle
fb45eb2eb2
5d ls vec rename of impl class
2016-07-14 23:57:26 +01:00
paboyle
a307274c96
Fermion impl rename for ls vectorised 5d approaches
2016-07-14 23:56:13 +01:00
paboyle
3f2c44a5fe
Updating the class to 5d selection based on impl type
2016-07-14 23:55:26 +01:00
paboyle
48fb1cdc11
Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle
8a79e93cc2
Rename the 5d domain wall fermion vectorised Ls impl class
2016-07-14 23:53:00 +01:00
paboyle
dd62a61c5c
Added broadcast and rotation of simd vectors
2016-07-14 23:49:00 +01:00
paboyle
8f47d0b5ab
Rotation needed for hopping term in fifth dim with Ls vectorised fields
2016-07-14 23:45:36 +01:00
paboyle
42af132dab
Fix for Chris Kelly's request to peek poke on checkerboarded fields
2016-07-14 23:44:48 +01:00
paboyle
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
Guido Cossu
9dc345e8e8
Debugged smearing and adding HMC functions for hirep
2016-07-13 17:51:18 +01:00
Christopher Kelly
6f47fbb1e2
Disabled parallel for loops in ExtractSlice and InsertSlice due to race conditions. Likely will need to do so for localConvert too.
2016-07-13 10:49:18 -04:00
Guido Cossu
a9ae30f868
Added representations definitions for the HMC
2016-07-12 13:36:10 +01:00
Christopher Kelly
a3c0fb79b6
Fix to iVector and iMatrix pokeIndex and checkerboard local site indexing.
2016-07-11 17:15:22 -04:00
paboyle
62601bb649
Bug fix
2016-07-08 20:46:29 +01:00
paboyle
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
Guido Cossu
daea5297ee
Wrote the projector in the adjoint representation algebra
2016-07-08 16:14:16 +01:00
Guido Cossu
5028969d4b
Added generators for the adjoint representation
2016-07-08 15:40:11 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
Christopher Kelly
c5106d0c03
Bugfix
2016-07-07 16:06:30 -04:00
Guido Cossu
fbf96b1bbb
Merge branch 'develop' into feature/hirep
2016-07-07 14:20:10 +01:00
Guido Cossu
3c49ddfaa4
Merge branch 'temporary-smearing' into develop
2016-07-07 14:04:59 +01:00
Guido Cossu
ffb8b3116c
Tested smeared RHMC Wilson1p1, accepting
2016-07-07 11:49:36 +01:00
Christopher Kelly
dd8cfff111
Another fix for pedantic compilers
2016-07-06 18:22:15 -04:00
Christopher Kelly
184642adb0
Fix for pedantic compilers
2016-07-06 18:15:15 -04:00
Christopher Kelly
4774a3bcd2
Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default.
2016-07-06 18:01:08 -04:00
Christopher Kelly
25fafa9a89
Comment
2016-07-06 16:19:41 -04:00
Christopher Kelly
85ed8175cb
Implemented mixed precision CG. Fixed filelist to exclude lib/Old directory and include Config.h.
2016-07-06 15:57:04 -04:00
Christopher Kelly
df5c788ef2
Merge branch 'develop' into feature/multi_prec
2016-07-06 14:52:28 -04:00
Christopher Kelly
15f22425c8
Added option to prevent CG from exiting when it fails to converge
2016-07-06 14:50:01 -04:00
Guido Cossu
e87182cf98
Debugged the copy constructor of the Lattice class
2016-07-06 15:31:00 +01:00
Guido Cossu
e3d5319470
Debugged the real() and imag() functions and added tests to Test_Simd
2016-07-06 14:16:03 +01:00
Guido Cossu
ffedeb1c58
Minor modifications
2016-07-06 11:41:27 +01:00
Guido Cossu
3e3b367aa9
Small changes in the Log files
2016-07-05 15:05:28 +01:00
Guido Cossu
3e80947c2b
Cleaned up HMC output. Tested smeared HMCs for single precision (OK)
2016-07-05 12:03:54 +01:00
Guido Cossu
fdfbf11c6d
Merge branch 'develop' into temporary-smearing
2016-07-04 18:45:10 +01:00
Guido Cossu
9cb90f714e
Merge remote-tracking branch 'origin/develop' into temporary-smearing
2016-07-04 17:28:40 +01:00
Guido Cossu
2daffdf95d
Tested smeared WilsonRatio action, accepts
2016-07-04 16:17:28 +01:00
Guido Cossu
149f826601
Tested smearing for Nf2 WilsonFermionAction, non EO: accepts
2016-07-04 16:09:19 +01:00
Guido Cossu
cd8ee27080
Simple change in iGamma for smearing
2016-07-04 16:02:57 +01:00
Guido Cossu
0fa66e8f3c
Debugged smearing for EOWilson, accepts
2016-07-04 15:35:37 +01:00
Guido Cossu
8dd099267d
Corrected a bug in the Expression Templates (acos and asin were wrong)
2016-07-03 12:28:25 +01:00
Guido Cossu
1a6d65c6a4
Converted set_uw and set_fj to all complex functions
2016-07-03 10:27:43 +01:00
paboyle
fc4a043663
Colors and banner clean up
2016-07-02 16:15:38 +01:00
Guido Cossu
092fa0d8da
Debugged set_fj,
...
to be fixed: BUG in imag()
2016-07-01 16:06:20 +01:00
e0b7004f96
Merge branch 'master' into feature/hadrons
2016-07-01 15:54:34 +01:00
paboyle
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
paboyle
712b9a3489
Asm only for avx512
2016-06-30 14:35:02 -07:00
paboyle
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
paboyle
1445189361
Control the prefetch strategy
2016-06-30 14:35:02 -07:00
paboyle
05c884a62a
Prefetch change
2016-06-30 14:35:01 -07:00
paboyle
a25bec87d9
Prefetch during save
2016-06-30 14:35:01 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu
565e9329ba
Changed the colouring classes
2016-06-30 16:51:03 +01:00
Guido Cossu
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
75fc295f6e
Merge branch 'hadrons' into feature/hadrons
2016-06-14 17:51:15 +01:00
Richard Rollins
86187d7cca
Removed write to stdout in constructor for MPI CartesianCommunicator
2016-06-14 15:34:20 +01:00
paboyle
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
paboyle
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
paboyle
e503ef5590
Cleaned up
2016-06-07 00:11:36 +01:00
paboyle
a7682b0060
Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS
2016-06-06 23:48:21 +01:00
paboyle
d4c9d71fc8
Merge branch 'master' of https://github.com/paboyle/Grid
2016-06-06 07:06:54 -07:00
paboyle
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00