mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 17:25:37 +01:00
Commit Graph

1606 Commits

Author SHA1 Message Date
azusayamaguchi
202078eb1b Cray / OpenSHMEM ordering differs 2016-10-21 09:07:20 +01:00
paboyle
a762b1fb71 MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
Guido Cossu
deef2673b2 Separating the Lattice theories stub from the QCD.h file 2016-10-20 17:24:08 +01:00
paboyle
5b5925b8e5 Forgot to add 2016-10-20 17:09:40 +01:00
Guido Cossu
977b0a6dd9 Merge branch 'develop' into feature/hmc_generalise 2016-10-20 17:04:41 +01:00
Guido Cossu
977d844394 Few modifications on stdout messages 2016-10-20 17:01:59 +01:00
paboyle
b58adc6a4b commVector 2016-10-20 17:00:15 +01:00
paboyle
f9d5e95d72 allocator template typedefs moved to AlignedAllocator 2016-10-20 16:59:39 +01:00
paboyle
4f8e636a43 commVector 2016-10-20 16:59:16 +01:00
paboyle
9b39f35ae6 commVector different for SHMEM compat 2016-10-20 16:58:53 +01:00
paboyle
5fe2b85cbd MPI3 and shared memory support 2016-10-20 16:58:01 +01:00
paboyle
c7cccaaa69 Comm vector for shmem 2016-10-20 16:57:31 +01:00
paboyle
cbcfea466f MPI3 2016-10-20 16:57:14 +01:00
paboyle
4955672fc3 MPI3 2016-10-20 16:57:00 +01:00
paboyle
8c043da5b7 SHMEM and comms allocator made different 2016-10-20 16:56:05 +01:00
paboyle
3cbe974eb4 Layout 2016-10-20 16:55:21 +01:00
997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
Guido Cossu
590675e2ca Csum in hex format 2016-10-19 17:26:25 +01:00
Guido Cossu
8c65bdf6d3 Printing checksum for the RNG file 2016-10-19 16:56:11 +01:00
Guido Cossu
74f1ed3bc5 Adding some documentation for HMC 2016-10-19 10:51:13 +01:00
paboyle
7af9b87318 Cache face tables to improve performance.
Extract merge now looking poor.
2016-10-18 09:51:37 +01:00
paboyle
811ca45473 GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support 2016-10-17 16:23:21 +01:00
paboyle
bc1a4d40ba Faster integer handling avoid push_back 2016-10-17 16:16:44 +01:00
Guido Cossu
e250e6b7bb Moving parameters outside of the HMCrunner 2016-10-14 17:22:32 +01:00
paboyle
c8079e6621 Time the face gather in x-dir more carefully 2016-10-13 22:28:50 +01:00
azusayamaguchi
8b0d171c9a 32bit issue on the KNL code variant where byte offsets were stored 2016-10-12 17:49:32 +01:00
azusayamaguchi
8bbd9ebc27 Reversing changes to Stencil class 2016-10-12 13:47:20 +01:00
azusayamaguchi
6472b431f0 __rdpmc needed for gcc, clang++ 2016-10-12 12:29:08 +01:00
azusayamaguchi
bd205a3293 Fixing for non x86 and non KNL 2016-10-12 12:09:15 +01:00
azusayamaguchi
496beffa88 Fix non-KNL build 2016-10-12 12:06:08 +01:00
azusayamaguchi
9b63e97108 align not absolutely required and confuses clang++ 2016-10-12 11:51:21 +01:00
azusayamaguchi
81f2aeaece KNL streaming stores, and KNL performance counters 2016-10-12 11:45:22 +01:00
paboyle
2d4a45c758 Typecast pointer 2016-10-12 09:14:15 +01:00
paboyle
a123dcd7e9 Static required for shmem. Reading same object twice requires csum reset 2016-10-12 00:29:57 +01:00
paboyle
6b27c42dfe Cosmetic 2016-10-12 00:29:39 +01:00
paboyle
f7c2aa3ba5 runtime by default 2016-10-12 00:29:13 +01:00
paboyle
7240d73184 Parallelise the x faces; fix the segv on KNL with comms 2016-10-11 22:21:07 +01:00
paboyle
42cd148f5e Base pointer for comms buffer under AVX512 assembly 2016-10-11 16:06:06 +01:00
Guido Cossu
eda4dd622e Some more edit 2016-10-11 15:45:20 +01:00
paboyle
6e01264bb7 don't use static by default 2016-10-11 10:03:39 +01:00
paboyle
6f408256bc FMA4 option moved on the align 2016-10-11 10:03:01 +01:00
paboyle
8d11681aac verbose remove 2016-10-10 23:50:42 +01:00
paboyle
3d5c9a1ee9 No compile fix on clang++ 3.9 2016-10-10 23:50:13 +01:00
paboyle
dc389e467c axpy_ssp for any coeff type via template 2016-10-10 23:48:05 +01:00
paboyle
3619167d62 Mass parameter 2016-10-10 23:47:33 +01:00
paboyle
96f1d1b828 Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass). 2016-10-10 23:46:45 +01:00
paboyle
657e0a8f4d Mass parameter 2016-10-10 23:46:10 +01:00
paboyle
616e7cd83e Mass parameter 2016-10-10 23:45:48 +01:00
paboyle
6f26d2e8d4 Overlap tree level feynman rule 2016-10-10 23:45:18 +01:00
paboyle
c014574504 A "please implement me" feynman rule. If this were abstract virtual it would
require/force implementation
2016-10-10 23:44:00 +01:00
paboyle
d7ce164e6e Feynman rule for DWF 2016-10-10 23:43:36 +01:00
paboyle
c0d5b99016 Dminus 2016-10-10 23:43:19 +01:00
paboyle
09ca32d678 Dminus added for Cayley 2016-10-10 23:42:55 +01:00
paboyle
082ae350c6 static schedule by default 2016-10-10 23:42:30 +01:00
Guido Cossu
611b5d74ba Fix for AVX+FMA3 compilation 2016-10-10 15:26:17 +01:00
Guido Cossu
b56c9ffa52 Fix for AVXFMA 2016-10-10 14:43:37 +01:00
Guido Cossu
c68a2b9637 Minor fix 2016-10-10 11:54:58 +01:00
Guido Cossu
293df6cd20 Generalising the HMCRunner and moving parameters to the user level 2016-10-10 11:49:55 +01:00
Guido Cossu
65f61bb3bf Reset QCD colours to 3 2016-10-10 09:46:17 +01:00
Guido Cossu
26b9740d53 Some fix for the GenericHMCrunner 2016-10-10 09:43:05 +01:00
cb02b7088f Merge branch 'develop' into feature/doxygen
# Conflicts:
#	configure.ac
2016-10-09 13:35:44 +01:00
Guido Cossu
6eb873dd96 Added scalar action phi^4
Check Norm2 output (Complex type assumption)
2016-10-07 17:28:46 +01:00
Guido Cossu
11b4c80b27 Added support for hmc and binary IO for a general field 2016-10-07 13:37:29 +01:00
Guido Cossu
2e453dfbf5 Added some instrumentation to benchmark the force computation 2016-10-06 17:52:45 +01:00
Guido Cossu
c065e454c3 Adding Binary IO, untested 2016-10-06 10:12:11 +01:00
paboyle
4089984431 Timing hooks 2016-10-06 09:25:12 +01:00
Guido Cossu
c78bbd0f8c Fix ASM compilation 2016-10-04 15:37:32 +01:00
Guido Cossu
d9b5fbd374 In the middle of adding a general binary writer 2016-10-04 11:24:08 +01:00
Guido Cossu
cfbc1a26b8 Now the gauge implementation has to take care of the Nexp 2016-10-03 16:20:06 +01:00
Guido Cossu
257f69f931 One more function to generalise the HMC integrator 2016-10-03 15:50:04 +01:00
Guido Cossu
e415260961 First cut on generalised HMC
Backward compatibility OK
2016-10-03 15:28:00 +01:00
536e2ff073 *.inc removed: please don't commit these files either! 2016-09-27 11:54:03 +01:00
paboyle
87acd06990 Use streaming stores 2016-09-26 10:11:34 +01:00
paboyle
9353b6edfe Fenv out of grid namespace 2016-09-26 10:09:13 +01:00
paboyle
167cc2650e GNU SOURCE problem on travis 2016-09-26 09:58:09 +01:00
paboyle
7089b6d5a5 Setting up but not implemented some QED rules 2016-09-26 09:43:40 +01:00
paboyle
2ba7d43ddd Divide handling 2016-09-26 09:43:14 +01:00
paboyle
836e929565 Divide handling improved 2016-09-26 09:42:22 +01:00
paboyle
b6713ecb60 Momentum space rules for Overlap, DWF untested to date 2016-09-26 09:39:09 +01:00
paboyle
52a39f0fcd Divide in ET 2016-09-26 09:38:38 +01:00
paboyle
81a7a03076 Integer << 2016-09-26 09:38:17 +01:00
paboyle
16b37b956c divide goes to ET 2016-09-26 09:37:59 +01:00
paboyle
567b6cf23f demangle moves to logging 2016-09-26 09:36:51 +01:00
paboyle
296396646d FPE's on macos set up 2016-09-26 09:36:14 +01:00
Guido Cossu
5c190a1b8c Merge branch 'develop' into feature/hirep 2016-09-23 11:06:06 +01:00
Guido Cossu
c4ac6e7e8f Consolidating HMC interface
Uniformed interface for standard action in fundamental rep and Hirep
2016-09-23 10:47:42 +01:00
Guido Cossu
510e340e16 Debugged last commit for the Two index representation 2016-09-22 22:16:21 +01:00
Guido Cossu
6ffadca153 Restored number of colours to 3 2016-09-22 14:22:54 +01:00
Guido Cossu
b6597b74e7 Added support for the Two index Symmetric and Antisymmetric representations
Tested for HMC convergence: OK
Added also a test file showing an example for mixed representations
2016-09-22 14:17:37 +01:00
a034e9901b Merge branch 'develop' into feature/hadrons 2016-09-20 13:49:33 +01:00
Antonin Portelli
0724f7af75 QPX single precision implementation 2016-09-19 18:09:12 +01:00
2e74520821 removed libtool use (BG/Q compatibility) 2016-09-16 15:25:49 +01:00
Antonin Portelli
6dd75ad9e5 Merge branch 'develop' of github.com:paboyle/Grid into feature/bgq 2016-09-16 15:07:54 +01:00
Guido Cossu
fda408ee6f Added first lines for supporting Two Index representations 2016-09-13 10:43:30 +01:00
Guido Cossu
b9c80318a2 Merge branch 'develop' into feature/hirep 2016-09-13 10:01:51 +01:00
Guido Cossu
5df5d52d41 Fix for the Intel compiler 2016-09-12 17:17:20 +01:00
Guido Cossu
f76f281e58 Cleaning files after fix 2016-09-09 11:34:25 +01:00
Guido Cossu
aa20cc8b52 Fixing compilation error with AVX512 flag 2016-09-09 02:58:52 -07:00
Guido Cossu
0fd179fb33 Merge branch 'develop' into feature/hirep 2016-09-01 12:59:53 +01:00
Guido Cossu
f45ef8d114 Minor modification in ActionBase.h 2016-09-01 11:46:46 +01:00
paboyle
8535d433a7 Cold or hot must support any precision 2016-08-31 00:27:53 +01:00
paboyle
b573d1f35a Wilson tree level added 2016-08-31 00:27:04 +01:00
paboyle
0c1d7e4daf Mom space prop for Wilson action 2016-08-31 00:26:36 +01:00
paboyle
02e983a0cd Momentum space prop and free prop convolution 2016-08-31 00:26:02 +01:00
paboyle
d15ab66aae FFT moves higher in include order 2016-08-31 00:25:22 +01:00
paboyle
9005b82c6d Multi dim FFT, and normalisation fix 2016-08-31 00:24:52 +01:00
paboyle
3475f45ce7 Demangle support for typeid stuff 2016-08-31 00:23:48 +01:00
paboyle
0744f38866 Demangle support is useful 2016-08-31 00:23:28 +01:00
Guido Cossu
fd5614738d Merge branch 'develop' into feature/hirep 2016-08-30 18:21:36 +01:00
Guido Cossu
b0d3e4bb2c Separating travis builds 2016-08-30 13:44:07 +01:00
Guido Cossu
b512ccbee6 HMC for Adjoint fermions works
Accepts and reproduces known results

Check initial instability of inverters
when starting from hot configurations
2016-08-30 11:31:25 +01:00
paboyle
8c89391c02 FFTW unresolved fixed when no fftw3.h 2016-08-24 16:41:47 +01:00
paboyle
bfac5195b8 tidy up 2016-08-24 16:38:36 +01:00
paboyle
744691097f Printing 2016-08-24 15:05:56 +01:00
paboyle
ff6da364e8 FFT double and single precision gives good performance now in multithreaded code. 2016-08-24 15:05:00 +01:00
4d11a6f5f2 first commit for QPX intrinsics 2016-08-23 14:41:44 +01:00
paboyle
88be3b39bb Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-08-22 18:29:36 +01:00
paboyle
356e7940fd fftw can be switched off 2016-08-22 16:24:49 +01:00
paboyle
73ce476890 Include fftw headers 2016-08-22 16:24:21 +01:00
paboyle
e423a09974 FFT improved and test_FFT passing under MPI 8 processes, 8^4 for LatticeComplexD and LatticeSpinMatrixD 2016-08-18 02:23:21 +01:00
paboyle
17097a93ec FFTW test ran over 4 mpi processes. 2016-08-17 01:33:55 +01:00
paboyle
4ab7dbfd57 Instantiate 2016-08-15 23:00:40 +01:00
paboyle
90e70790f3 Feature for z-Mobius prep 2016-08-15 22:31:29 +01:00
Guido Cossu
9c2e8d5e28 Nc=3 just to let all the test pass in Travis 2016-08-09 15:46:57 +01:00
Guido Cossu
147e2025b9 Added unit tests on the representation transformations
Status: Passing all tests
2016-08-08 16:54:22 +01:00
b1cfb4d661 first try at a nicer Doxygen implementation 2016-08-05 15:29:18 +01:00
paboyle
32bc7a6ab8 MPI back out of change that hangs
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
7ff7c7d90d Merge branch 'develop' into feature/hadrons 2016-08-04 16:22:10 +01:00
93d29bb699 build system improvements after discussion with Peter 2016-08-04 16:19:59 +01:00
2485ef9c9c Merge branch 'feature/new-build' into feature/hadrons
# Conflicts:
#	Makefile.am
#	scripts/copyright
2016-08-03 16:49:16 +01:00
9e5b934d21 improved LAPACK configuration 2016-08-02 17:26:54 +01:00
Guido Cossu
49b5c49851 Checked the hermiticity of the op in derivative, ok
Still CG fails to converge
2016-07-31 12:37:33 +01:00
e9f30cab2c first working version for the new build system 2016-07-30 17:53:18 +01:00
Guido Cossu
089f0ab582 Debugged HMC for Creutz relation 2016-07-28 16:44:41 +01:00
Guido Cossu
b93e18ed50 Modified the Dirac Kernel class to compile with different number of colours
Added the general push_back functionality to accommodate all defined representations

Compiles, not tested
2016-07-18 16:36:28 +01:00
Guido Cossu
9c77bb69a5 Added all elements for Hirep HMC
TODO: Test and debug
2016-07-18 12:05:23 +01:00
paboyle
f9e90eeb1f Sign error on the force for 4d fields fixed 2016-07-16 01:52:44 +01:00
paboyle
fad5c675eb sign error on the 4d gparity force 2016-07-16 01:51:56 +01:00
paboyle
4908b77d46 Fixed conflicts. PLEASE avoid making wholesale cosmetic only changes, this created
a HUGE number of conflicts that are difficult to resolve and understand.

Wholesale formatting, reordering functions etc... in a central file like Tensor_class
or Grid_vector_types while others are also editing without making substantial functionality
changes creates pain.
2016-07-15 20:59:07 +01:00
paboyle
f4dd5062d7 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-07-15 19:26:06 +01:00
paboyle
980ff18956 Solving the instantiation no compile issue 2016-07-15 17:19:44 +01:00
Guido Cossu
7edf4c6c04 Added HMC utilities for the higher representations
TODO: Inherit types for the pseudofermions, Debugging, testing
2016-07-15 13:39:47 +01:00
paboyle
1a6c7204ac Disable instantiation; Use cache version instead 2016-07-15 00:34:39 +01:00
paboyle
49310fbab3 Done with red black change over 2016-07-15 00:08:43 +01:00
paboyle
5c0c8efb9e Updated file list 2016-07-15 00:02:11 +01:00
paboyle
dfd714e1ef Multiple implementations for the 5d hopping terms, depending on cache friendly
ops and/or the 5th direction being vectorised
All use 4d redblack.
2016-07-15 00:00:09 +01:00
paboyle
79a8ca1a62 Rewrite for performance. Impl dependent instantiations give
4d linalg impls of the 5d hopping terms (and inverse)
Cache friendly loop orderings of the above
Dense matrix stored and apply to the above

-- Switch to Ls vectorised, and use dense matrix approach for the MooeeInv
   and rotate/shift of the Mooee M5D routines.
2016-07-14 23:58:15 +01:00
paboyle
fb45eb2eb2 5d ls vec rename of impl class 2016-07-14 23:57:26 +01:00
paboyle
a307274c96 Fermion impl rename for ls vectorised 5d approaches 2016-07-14 23:56:13 +01:00
paboyle
3f2c44a5fe Updating the class to 5d selection based on impl type 2016-07-14 23:55:26 +01:00
paboyle
48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle
8a79e93cc2 Rename the 5d domain wall fermion vectorised Ls impl class 2016-07-14 23:53:00 +01:00
paboyle
dd62a61c5c Added broadcast and rotation of simd vectors 2016-07-14 23:49:00 +01:00
paboyle
8f47d0b5ab Rotation needed for hopping term in fifth dim with Ls vectorised fields 2016-07-14 23:45:36 +01:00
paboyle
42af132dab Fix for chris kellys request to peek poke on checkerboarded fields 2016-07-14 23:44:48 +01:00
paboyle
adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
Guido Cossu
9dc345e8e8 Debugged smearing and adding HMC functions for hirep 2016-07-13 17:51:18 +01:00
Christopher Kelly
6f47fbb1e2 Disabled parallel for loops in ExtractSlice and InsertSlice due to race conditions. Likely will need to do so for localConvert too. 2016-07-13 10:49:18 -04:00
Guido Cossu
a9ae30f868 Added representations definitions for the HMC 2016-07-12 13:36:10 +01:00
Christopher Kelly
a3c0fb79b6 Fix to iVector and iMatrix pokeIndex and checkerboard local site indexing. 2016-07-11 17:15:22 -04:00
paboyle
62601bb649 Bug fix 2016-07-08 20:46:29 +01:00
paboyle
ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
Guido Cossu
daea5297ee Wrote the projector in the adjoint representation algebra 2016-07-08 16:14:16 +01:00
Guido Cossu
5028969d4b Added generators for the adjoint representation 2016-07-08 15:40:11 +01:00
paboyle
a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
Christopher Kelly
c5106d0c03 Bugfix 2016-07-07 16:06:30 -04:00
Guido Cossu
fbf96b1bbb Merge branch 'develop' into feature/hirep 2016-07-07 14:20:10 +01:00
Guido Cossu
3c49ddfaa4 Merge branch 'temporary-smearing' into develop 2016-07-07 14:04:59 +01:00
Guido Cossu
ffb8b3116c Tested smeared RHMC Wilson1p1, accepting 2016-07-07 11:49:36 +01:00
Christopher Kelly
dd8cfff111 Another fix for pedantic compilers 2016-07-06 18:22:15 -04:00
Christopher Kelly
184642adb0 Fix for pedantic compilers 2016-07-06 18:15:15 -04:00
Christopher Kelly
4774a3bcd2 Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default. 2016-07-06 18:01:08 -04:00
Christopher Kelly
25fafa9a89 Comment 2016-07-06 16:19:41 -04:00
Christopher Kelly
85ed8175cb Implemented mixed precision CG. Fixed filelist to exclude lib/Old directory and include Config.h. 2016-07-06 15:57:04 -04:00
Christopher Kelly
df5c788ef2 Merge branch 'develop' into feature/multi_prec 2016-07-06 14:52:28 -04:00
Christopher Kelly
15f22425c8 Added option to prevent CG from exiting when it fails to converge 2016-07-06 14:50:01 -04:00
Guido Cossu
e87182cf98 Debugged the copy constructor of the Lattice class 2016-07-06 15:31:00 +01:00
Guido Cossu
e3d5319470 Debugged the real() and imag() functions and added tests to Test_Simd 2016-07-06 14:16:03 +01:00
Guido Cossu
ffedeb1c58 Minor modifications 2016-07-06 11:41:27 +01:00
Guido Cossu
3e3b367aa9 Small changes in the Log files 2016-07-05 15:05:28 +01:00
Guido Cossu
3e80947c2b Cleaned up HMC output. Tested smeared HMCs for single precision (OK) 2016-07-05 12:03:54 +01:00
Guido Cossu
fdfbf11c6d Merge branch 'develop' into temporary-smearing 2016-07-04 18:45:10 +01:00
Guido Cossu
9cb90f714e Merge remote-tracking branch 'origin/develop' into temporary-smearing 2016-07-04 17:28:40 +01:00
Guido Cossu
2daffdf95d Tested smeared WilsonRatio action, accepts 2016-07-04 16:17:28 +01:00
Guido Cossu
149f826601 Tested smearing for Nf2 WilsonFermionAction, non EO: accepts 2016-07-04 16:09:19 +01:00
Guido Cossu
cd8ee27080 Simple change in iGamma for smearing 2016-07-04 16:02:57 +01:00
Guido Cossu
0fa66e8f3c Debugged smearing for EOWilson, accepts 2016-07-04 15:35:37 +01:00
Guido Cossu
8dd099267d Corrected a bug in the Expression Templates (acos and asin were wrong) 2016-07-03 12:28:25 +01:00
Guido Cossu
1a6d65c6a4 Converted set_uw and set_fj to all complex functions 2016-07-03 10:27:43 +01:00
paboyle
fc4a043663 Colors and banner clean up 2016-07-02 16:15:38 +01:00
Guido Cossu
092fa0d8da Debugged set_fj,
to be fixed: BUG in imag()
2016-07-01 16:06:20 +01:00
e0b7004f96 Merge branch 'master' into feature/hadrons 2016-07-01 15:54:34 +01:00
paboyle
680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
paboyle
712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
paboyle
bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
paboyle
1445189361 Control the prefetch strategy 2016-06-30 14:35:02 -07:00
paboyle
05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
paboyle
a25bec87d9 Prefetch during save 2016-06-30 14:35:01 -07:00
paboyle
2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle
51cb2d4328 update file lists 2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu
565e9329ba Changed the colouring classes 2016-06-30 16:51:03 +01:00
Guido Cossu
5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
75fc295f6e Merge branch 'hadrons' into feature/hadrons 2016-06-14 17:51:15 +01:00
Richard Rollins
86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
paboyle
87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle
55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi
d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle
8ac021de73 Added a test and fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle
e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle
a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
paboyle
d4c9d71fc8 Merge branch 'master' of https://github.com/paboyle/Grid 2016-06-06 07:06:54 -07:00
paboyle
786ca52c43 Problems remain in the red black preconditioning of the Ls vectorisation 2016-06-06 07:05:51 -07:00
Peter Boyle
f78d89bcbe Update Lebesgue.cc
kill verbose
2016-06-03 13:33:42 +01:00
paboyle
53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
1826ed06a3 Merge branch 'master' into hadrons 2016-05-27 16:50:31 +01:00
1c0e922585 Merge pull request #35 from aportelli/master
empty SIMD fix
2016-05-27 16:49:13 +01:00
9d5f693cbe empty SIMD fix 2016-05-24 10:56:27 +01:00
Peter Boyle
5c90c3b457 Merge pull request #34 from aportelli/master
Polymorphic lattices & various small updates
2016-05-24 10:50:04 +01:00
3ff96c502b Merge branch 'master' into hadrons 2016-05-12 19:24:18 +01:00
91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
15a0908bfc Merge branch 'master' into hadrons 2016-05-12 18:35:46 +01:00
3789e3f31c additional fixes in slice functions 2016-05-12 18:35:38 +01:00
07f0b69784 Merge branch 'master' into hadrons 2016-05-12 13:02:18 +01:00
0c66719210 const fix in slice functions 2016-05-12 13:01:35 +01:00
362f255100 Hadrons: module parameters can now be accessed from outside 2016-05-12 11:59:28 +01:00
paboyle
3a5b5c8bec Save an old tar of tree 2016-05-12 03:20:17 -07:00
3d78ed03ef Merge branch 'master' into hadrons 2016-05-11 15:21:46 +01:00
4bc21ec7cb thread CL argument fix 2016-05-11 15:21:29 +01:00
e3083b6dfc Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab' 2016-05-11 15:20:41 +01:00
paboyle
ab89418658 Precision change going in; useful for mixed precision algorithms for example. 2016-05-11 15:18:47 +01:00
paboyle
28cd99882c Subslicing 2016-05-11 15:06:54 +01:00
paboyle
aceaee774c ExtractSlice / InsertSlice for lower dimensional lattices where the lattice is not
distributed in the orthogonal direction.
Useful for fermion 4d/5d etc..
2016-05-11 14:12:02 +01:00
312637e5fb Merge branch 'master' into hadrons
# Conflicts:
#	lib/Log.h
2016-05-04 12:16:18 -07:00
101aa769eb LatticeBase contain the grid pointer and a virtual destructor to allow polymorphic lattice pointers 2016-05-04 12:15:31 -07:00
0bf99bfde5 log polish 2016-05-04 12:14:49 -07:00
64bf6fe54e macro to dump NERSC header to a stream 2016-05-04 12:14:38 -07:00
1161d566b9 minor code cleaning 2016-05-02 19:32:11 -07:00
d08d93c44c Merge branch 'master' into hadrons 2016-05-01 18:30:44 -07:00
c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
c4c89336fe SliceSum: shutting down warning about non-threaded code for now 2016-05-01 18:29:57 -07:00
fa59789580 ConjugateGradient: cleaner output 2016-05-01 18:29:20 -07:00
0ab10cdedb Merge branch 'master' into hadrons 2016-05-01 16:08:05 -07:00
92c2c7d3b5 SchurRedBlackDiagMooeeSolve: fix: guess was not initialised from input 2016-05-01 16:07:55 -07:00
e99ce0875f directly exit when using '--help' option 2016-05-01 16:05:16 -07:00
beb11fd4ef Merge branch 'master' into hadrons 2016-05-01 10:32:24 -07:00
paboyle
c23375cd65 Testing travis CI integration 2016-04-30 06:30:56 -07:00
paboyle
f7ca6ca889 Bernoulli reenabled -- using integral type for the discrete_distribution, but
then casts in the fill
2016-04-30 03:48:28 -07:00