portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-16 02:35:36 +00:00

Author	SHA1	Message	Date
Peter Boyle	1584e17b54	Revert to fast version	2018-05-02 14:10:55 +01:00
pretidav	6a15e2e8ef	Added WilsonTwoIndexAntiSymmImpl instantiation in WilsonKernelsHand.cc (should not be necessary)	2017-11-12 14:16:19 +01:00
pretidav	59d9ccf70c	restored WilsonKernelsHand.cc and added Qtop to production codes	2017-11-08 22:02:32 +01:00
pretidav	a493429218	added Production tests for MixedRep, Adj, 2S, 2AS. Still missing QObs. The HMC is not printing correctly all the actions and forces.	2017-11-04 18:16:54 +01:00
Christopher Kelly	59bd1fe21b	Fix for 'perm' and 'local' not being set for hand-unrolled external-site Dslash, which caused incorrect behavior of G-parity kernel	2017-08-29 13:07:37 -07:00
Christopher Kelly	f365a83fae	In G-parity unrolled kernel, replaced calls to permute and exchange with run-time-evaluated permute type with explicit calls to appropriate underlying functions	2017-08-25 14:24:11 -04:00
Christopher Kelly	34a9aeb331	Reduced number of if-statement evaluations in G-parity unrolled kernel	2017-08-24 13:53:50 -07:00
Christopher Kelly	ce5df177ee	Removed superfluous implementation of G-parity twist for hand-unrolled kernel from GparityWilsonImpl	2017-08-23 15:05:22 -04:00
Christopher Kelly	a0bb8e5b46	Added hand-unrolled kernel implementations of all the other dslash precision / comms precision combinations with G-parity	2017-08-23 14:44:40 -04:00
Christopher Kelly	46f88e6d72	G-parity hand-unrolled intrinsics twist now uses one less permute and one less temporary	2017-08-23 13:21:10 -04:00
Christopher Kelly	b61835c1a5	Added inplace version of intrinsic G-parity twist to hand-unrolled kernel	2017-08-23 12:33:48 -04:00
Christopher Kelly	ab50145001	Implemented first, unoptimized version of hand-unrolled G-parity kernels. Improved Test_gparity.	2017-08-22 17:12:25 -04:00
paboyle	f301be94ce	Fixed	2017-04-22 17:42:31 +01:00
Peter Boyle	1d1b225497	Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).	2017-04-22 09:05:28 -04:00
paboyle	736bf3c866	Major rework of stencil. Half precision and MPI3 now working.	2017-04-22 11:33:50 +01:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	4e7ab3166f	Refactoring header layout	2017-02-22 18:09:33 +00:00
paboyle	2c246551d0	Overlap comms and compute options in wilson kernels	2017-02-07 01:37:10 -05:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working. Need to fix simd directions.	2016-10-22 18:14:27 +01:00
paboyle	b58adc6a4b	commVector	2016-10-20 17:00:15 +01:00
Guido Cossu	0fd179fb33	Merge branch 'develop' into feature/hirep	2016-09-01 12:59:53 +01:00
Guido Cossu	fd5614738d	Merge branch 'develop' into feature/hirep	2016-08-30 18:21:36 +01:00
paboyle	4ab7dbfd57	Instantiate	2016-08-15 23:00:40 +01:00
Guido Cossu	b93e18ed50	Modified the Dirac Kernel class to compile with different number of colours. Added the general push_back functionality to accommodate all defined representations. Compiles, not tested.	2016-07-18 16:36:28 +01:00
paboyle	8a79e93cc2	Rename the 5d domain wall fermion vectorised Ls impl class	2016-07-14 23:53:00 +01:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	9b6ab6db16	simd in 5th dimension support	2016-04-19 15:38:01 -07:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
paboyle	3425751cb8	Missing return value	2016-02-19 01:06:03 +00:00
Peter Boyle	81395e85d1	Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it.	2016-02-16 13:56:44 -06:00
Peter Boyle	a0fc47c6f9	Cheaper implementation	2016-02-15 16:02:36 -06:00
neo	6371676a75	Correcting some compilation errors for clang-sse	2016-02-10 11:37:03 +09:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	dafc74020c	Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori	2016-01-10 16:54:27 -08:00
paboyle	331768dcff	Added overlap comms compute mode	2016-01-03 01:38:11 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	34a0fde2ad	Fixes to fermion force terms after sign of gamma_mu (0...3) change. Thought I had already committed these. Believe I have got the Gparity fermion force working. * tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion based on a small dt update of U field. * tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21. Need to accumulate a full plaquette log to believe fully which will take some hours of run time.	2015-12-15 23:14:12 +00:00
paboyle	3ce10aa975	Fix a regression failure on Mobius; chroma regression added	2015-12-10 22:55:00 +00:00
Peter Boyle	28022755ae	Stencil class name global change to StencilImpl typedef	2015-11-06 05:30:17 -06:00
Peter Boyle	64d64d1ab6	Updating to modify non-inlining permute routines and hopefully get better reg use and enhance performance.	2015-09-25 08:55:04 -07:00
Peter Boyle	2f38ebc446	Reintroducing the hand unrolled loops	2015-09-08 17:45:30 +01:00
Peter Boyle	84a66476ab	Rework/global edit to enforce type templating of fermion operators. Allows multi-precision work and paves the way for alternate BC's and such like allowing for example G-parity which is important for K pipi programme. In particular, can drive an extra flavour index into the fermion fields using template types.	2015-08-10 20:47:44 +01:00
Peter Boyle	d1afebf71e	Sizable improvement in multigrid for unsquared. 6000 matmuls CG unprec; 2000 matmuls CG prec (4000 eo muls); 1050 matmuls PGCR on 16^3 x 32 x 8 m=.01. Substantial effort on timing and logging infrastructure.	2015-07-24 01:31:13 +09:00
Peter Boyle	98c817df1b	big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). started getting to near the bleeding edge I guess	2015-06-30 15:03:11 +01:00
Peter Boyle	5644ab1e19	Large scale change to support 5d fermion formulations. Have 5d replicated wilson with 4d gauge working and matrix regressing to Ls copies of wilson.	2015-05-31 15:09:02 +01:00

47 Commits