portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-07-19 00:23:28 +01:00

Author	SHA1	Message	Date
paboyle	dfd714e1ef	Multiple implementations for the 5d hopping terms, depending on cache friendly ops and/or the 5th direction being vectorised. All use 4d redblack.	2016-07-15 00:00:09 +01:00
paboyle	79a8ca1a62	Rewrite for performance. Impl dependent instantiations give 4d linalg impls of the 5d hopping terms (and inverse); cache friendly loop orderings of the above; dense matrix stored and applied to the above. Switch to Ls vectorised, and use dense matrix approach for the MooeeInv and rotate/shift of the Mooee M5D routines.	2016-07-14 23:58:15 +01:00
paboyle	fb45eb2eb2	5d ls vec rename of impl class	2016-07-14 23:57:26 +01:00
paboyle	a307274c96	Fermion impl rename for ls vectorised 5d approaches	2016-07-14 23:56:13 +01:00
paboyle	3f2c44a5fe	Updating the class to 5d selection based on impl type	2016-07-14 23:55:26 +01:00
paboyle	48fb1cdc11	Update domain 5d vectorised impl type, move the type over to 4d redblack with the dense OO inverse	2016-07-14 23:54:35 +01:00
paboyle	8a79e93cc2	Rename the 5d domain wall fermion vectorised Ls impl class	2016-07-14 23:53:00 +01:00
paboyle	adbc7c1188	Adding files for multiple implementations (cache opt) and Ls vectorisation of the 5D Cayley form chiral fermions for the 5d matrix. With Ls entirely in the vector direction, s-hopping terms involve rotations. The serial dependence of the LDU inversion for Mobius and 4d even odd checkerboarding is removed by simply applying Ls^2 operations (vectorised many ways) as a dense matrix operation. This should give similar throughput at a higher flop count (non-compulsory flops), but enables use of the KNL cache friendly kernels throughout the code. Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512 with single precision.	2016-07-14 22:59:21 +01:00
Guido Cossu	9dc345e8e8	Debugged smearing and adding HMC functions for hirep	2016-07-13 17:51:18 +01:00
Guido Cossu	a9ae30f868	Added representations definitions for the HMC	2016-07-12 13:36:10 +01:00
paboyle	ef97e32152	Adding persistent communicators	2016-07-08 17:16:08 +01:00
Guido Cossu	daea5297ee	Wrote the projector in the adjoint representation algebra	2016-07-08 16:14:16 +01:00
Guido Cossu	5028969d4b	Added generators for the adjoint representation	2016-07-08 15:40:11 +01:00
paboyle	a0676beeb1	Open up dependency on Eigen and FFTW	2016-07-07 22:31:07 +01:00
Guido Cossu	fbf96b1bbb	Merge branch 'develop' into feature/hirep	2016-07-07 14:20:10 +01:00
Guido Cossu	3c49ddfaa4	Merge branch 'temporary-smearing' into develop	2016-07-07 14:04:59 +01:00
Guido Cossu	ffb8b3116c	Tested smeared RHMC Wilson1p1, accepting	2016-07-07 11:49:36 +01:00
Christopher Kelly	4774a3bcd2	Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default.	2016-07-06 18:01:08 -04:00
Guido Cossu	e87182cf98	Debugged the copy constructor of the Lattice class	2016-07-06 15:31:00 +01:00
Guido Cossu	e3d5319470	Debugged the real() and imag() functions and added tests to Test_Simd	2016-07-06 14:16:03 +01:00
Guido Cossu	ffedeb1c58	Minor modifications	2016-07-06 11:41:27 +01:00
Guido Cossu	3e80947c2b	Cleaned up HMC output. Tested smeared HMCs for single precision (OK)	2016-07-05 12:03:54 +01:00
Guido Cossu	fdfbf11c6d	Merge branch 'develop' into temporary-smearing	2016-07-04 18:45:10 +01:00
Guido Cossu	9cb90f714e	Merge remote-tracking branch 'origin/develop' into temporary-smearing	2016-07-04 17:28:40 +01:00
Guido Cossu	2daffdf95d	Tested smeared WilsonRatio action, accepts	2016-07-04 16:17:28 +01:00
Guido Cossu	149f826601	Tested smearing for Nf2 WilsonFermionAction, non EO: accepts	2016-07-04 16:09:19 +01:00
Guido Cossu	cd8ee27080	Simple change in iGamma for smearing	2016-07-04 16:02:57 +01:00
Guido Cossu	0fa66e8f3c	Debugged smearing for EOWilson, accepts	2016-07-04 15:35:37 +01:00
Guido Cossu	8dd099267d	Corrected a bug in the Expression Templates (acos and asin were wrong)	2016-07-03 12:28:25 +01:00
Guido Cossu	1a6d65c6a4	Converted set_uw and set_fj to all complex functions	2016-07-03 10:27:43 +01:00
Guido Cossu	092fa0d8da	Debugged set_fj, to be fixed: BUG in imag()	2016-07-01 16:06:20 +01:00
paboyle	680645f849	Merge branch 'release/v0.5.0'	2016-06-30 15:15:03 -07:00
paboyle	712b9a3489	Asm only for avx512	2016-06-30 14:35:02 -07:00
paboyle	bdaa5b1767	Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.	2016-06-30 14:35:02 -07:00
paboyle	8fcefc021a	Improved the prefetching when using cache blocking codes	2016-06-30 14:35:02 -07:00
paboyle	05c884a62a	Prefetch change	2016-06-30 14:35:01 -07:00
paboyle	2d8bb4c594	Tweaks	2016-06-30 14:35:01 -07:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
Guido Cossu	565e9329ba	Changed the colouring classes	2016-06-30 16:51:03 +01:00
Guido Cossu	5e02392f9c	Fixed compilation error for benchmark_dwf. Some parts were assuming floating point precision	2016-06-20 12:30:51 +01:00
paboyle	87418e7df1	Slightly faster prefetching perf.	2016-06-13 02:32:52 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
Azusa Yamaguchi	d9408893b3	Prefetching in the normal kernel implementation.	2016-06-08 05:43:48 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	e503ef5590	Cleaned up	2016-06-07 00:11:36 +01:00
paboyle	a7682b0060	Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS	2016-06-06 23:48:21 +01:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
portelli	c698b16d75	function to generate Chroma-style gamma matrix products	2016-05-01 18:30:35 -07:00
paboyle	5341977948	IMCI fixes. Thought I had committed these. The "real" disambiguation between std::real and Grid::real shouldn't have been necessary and I don't know why only the icpc v16.0 on babbage hits it. May need a longer term rename of Grid::real or some careful EnableIf work.	2016-04-30 03:34:16 -07:00