portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-14 09:45:36 +00:00

Author	SHA1	Message	Date
paboyle	73bb2d5128	Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile	2018-06-13 20:35:28 +01:00
Peter Boyle	b15db11c60	Kernels -> pure static object to enable device execution	2018-03-24 19:35:20 -04:00
Peter Boyle	607dc2d3c6	Remove lebesgue order	2018-03-22 18:23:09 -04:00
Peter Boyle	8a1d303ab9	GPU friendly stencil improvements	2018-03-19 07:11:03 -04:00
paboyle	3277bda130	View introduction to prepare for accelerator offload. Probably same problem exists for stencil object	2018-03-04 16:38:08 +00:00
paboyle	aa6de818e2	Copy data needed by Kernels out of the grid object to avoid host reference	2018-02-02 11:36:11 +00:00
paboyle	2d0bcc2606	Zero changes, accelerator on kernels and some thread loop changes	2018-01-27 23:47:38 +00:00
paboyle	c4f82e072b	_grid becomes private; use Grid()	2018-01-27 00:04:12 +00:00
paboyle	85771e97e9	Hide internal data	2018-01-26 23:04:46 +00:00
paboyle	87ee592176	Pragma changes and layout and warning elimination for nvcc	2018-01-24 13:14:09 +00:00
paboyle	71c8c9e4fb	Pretty	2018-01-14 23:03:01 +00:00
paboyle	a935ef7b39	Namespace	2018-01-14 23:01:07 +00:00
Christopher Kelly	59bd1fe21b	Fix for 'perm' and 'local' not being set for hand-unrolled external-site Dslash, which caused incorrect behavior of G-parity kernel	2017-08-29 13:07:37 -07:00
Christopher Kelly	f365a83fae	In G-parity unrolled kernel, replaced calls to permute and exchange with run-time-evaluated permute type with explicit calls to appropriate underlying functions	2017-08-25 14:24:11 -04:00
Christopher Kelly	34a9aeb331	Reduced number of if-statement evaluations in G-parity unrolled kernel	2017-08-24 13:53:50 -07:00
Christopher Kelly	ce5df177ee	Removed superfluous implementation of G-parity twist for hand-unrolled kernel from GparityWilsonImpl	2017-08-23 15:05:22 -04:00
Christopher Kelly	a0bb8e5b46	Added hand-unrolled kernel implementations of all the other dslash precision / comms precision combinations with G-parity	2017-08-23 14:44:40 -04:00
Christopher Kelly	46f88e6d72	G-parity hand-unrolled intrinsics twist now uses one less permute and one less temporary	2017-08-23 13:21:10 -04:00
Christopher Kelly	b61835c1a5	Added inplace version of intrinsic G-parity twist to hand-unrolled kernel	2017-08-23 12:33:48 -04:00
Christopher Kelly	ab50145001	Implemented first, unoptimized version of hand-unrolled G-parity kernels; improved Test_gparity	2017-08-22 17:12:25 -04:00
paboyle	f301be94ce	Fixed	2017-04-22 17:42:31 +01:00
Peter Boyle	1d1b225497	Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).	2017-04-22 09:05:28 -04:00
paboyle	736bf3c866	Major rework of stencil. Half precision and MPI3 now working.	2017-04-22 11:33:50 +01:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	4e7ab3166f	Refactoring header layout	2017-02-22 18:09:33 +00:00
paboyle	2c246551d0	Overlap comms and compute options in wilson kernels	2017-02-07 01:37:10 -05:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working; need to fix simd directions	2016-10-22 18:14:27 +01:00
paboyle	b58adc6a4b	commVector	2016-10-20 17:00:15 +01:00
Guido Cossu	0fd179fb33	Merge branch 'develop' into feature/hirep	2016-09-01 12:59:53 +01:00
Guido Cossu	fd5614738d	Merge branch 'develop' into feature/hirep	2016-08-30 18:21:36 +01:00
paboyle	4ab7dbfd57	Instantiate	2016-08-15 23:00:40 +01:00
Guido Cossu	b93e18ed50	Modified the Dirac Kernel class to compile with different number of colours. Added the general push_back functionality to accommodate all defined representations. Compiles, not tested	2016-07-18 16:36:28 +01:00
paboyle	8a79e93cc2	Rename the 5d domain wall fermion vectorised Ls impl class	2016-07-14 23:53:00 +01:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	9b6ab6db16	simd in 5th dimension support	2016-04-19 15:38:01 -07:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
paboyle	3425751cb8	Missing return value	2016-02-19 01:06:03 +00:00
Peter Boyle	81395e85d1	Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it.	2016-02-16 13:56:44 -06:00
Peter Boyle	a0fc47c6f9	Cheaper implementation	2016-02-15 16:02:36 -06:00
neo	6371676a75	Correcting some compilation errors for clang-sse	2016-02-10 11:37:03 +09:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	dafc74020c	Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori	2016-01-10 16:54:27 -08:00
paboyle	331768dcff	Added overlap comms compute mode	2016-01-03 01:38:11 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	34a0fde2ad	Fixes to fermion force terms after sign of gamma_mu (0...3) change. Thought I had already committed these. Believe I have got the Gparity fermion force working. * tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion based on a small dt update of U field. * tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21. Need to accumulate a full plaquette log to believe fully which will take some hours of run time.	2015-12-15 23:14:12 +00:00
paboyle	3ce10aa975	Fix a regression failure on Mobius; chroma regression added	2015-12-10 22:55:00 +00:00
Peter Boyle	28022755ae	Stencil class name global change to StencilImpl typedef	2015-11-06 05:30:17 -06:00
Peter Boyle	64d64d1ab6	Updating to modify non-inlining permute routines and hopefully get better reg use and enhance performance.	2015-09-25 08:55:04 -07:00

55 Commits