portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-19 18:33:43 +01:00

Author	SHA1	Message	Date
Peter Boyle	c5f93abcd7	GPU clean up	2018-05-14 19:40:33 -04:00
paboyle	db988301d0	Introduce view objects for indexing lattices. Used to pass the view to accelerators	2018-03-04 15:55:16 +00:00
paboyle	eed9aa9f0c	Extract merge gpu ready	2018-02-24 22:23:01 +00:00
paboyle	e657f9a344	OMP collapse changes to make NVCC happy	2018-01-28 01:21:53 +00:00
paboyle	70e276e1ab	parallel_for elimination -> thread_loop	2018-01-28 01:01:14 +00:00
paboyle	c4f82e072b	_grid becomes private ; use Grid()	2018-01-27 00:04:12 +00:00
paboyle	32523a229c	Hide internals	2018-01-26 23:08:02 +00:00
paboyle	5609624b44	Threading constructs replaced	2018-01-24 13:32:24 +00:00
paboyle	6bf5fb1924	Clean up and format NAMESPACE	2018-01-13 00:08:25 +00:00
paboyle	54e94360ad	Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit	2017-06-24 23:10:24 +01:00
paboyle	180c732b4c	Move compressors out of Cshift. Slice iterators would help	2017-04-20 13:17:55 +01:00
paboyle	4a340aa5ca	Massive compressor rework to support reduced precision comms	2017-04-20 09:28:27 +01:00
paboyle	4e7ab3166f	Refactoring header layout	2017-02-22 18:09:33 +00:00
paboyle	3ae92fa2e6	Global changes to parallel_for structure. Move the comms flags to more sensible names	2017-02-21 05:24:27 -05:00
paboyle	41009cc142	Move exchange into the stencil only; keep Cshift fully general	2017-02-20 17:48:04 -05:00
paboyle	8a29c16bde	Faster gather exchange	2017-02-16 23:52:22 +00:00
paboyle	bd600702cf	Vectorise the XYZT face gathering better. Hard coded for simd_layout <= 2 in any given spread out direction; full generality is inconsistent with efficiency.	2017-02-15 11:11:04 +00:00
paboyle	85c7bc4321	Bug fixes for cases that physics code couldn't hit but latent and discovered on KNL (long vector, y SIMD dir) and checker dir set to y. Remove the assertions on these code paths now they are tested.	2017-02-07 01:01:15 -05:00
paboyle	4f8e636a43	commVector	2016-10-20 16:59:16 +01:00
paboyle	9b39f35ae6	commVector different for SHMEM compat	2016-10-20 16:58:53 +01:00
paboyle	7240d73184	Parallelise the x faces; fix the segv on KNL with comms	2016-10-11 22:21:07 +01:00
paboyle	7223753355	Rotate in a direction > 2 for simd_layout	2016-04-19 15:35:15 -07:00
paboyle	db5e8050a8	Attempts at some optimisation	2016-02-18 22:33:58 +00:00
Peter Boyle	c9fadf97a5	Simplify the compressor interface again.	2016-02-17 18:16:45 -06:00
Peter Boyle	c650bb3f3d	Very small merge speed up.	2016-02-16 18:41:53 -06:00
Peter Boyle	41c2b09184	Shmem comms [NO MPI] target added. The dwf test runs and passes. Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working. But committing my current while I try a few experimentals.	2016-02-14 14:24:38 -06:00
paboyle	d19321dfde	Overlap comms compute changes	2016-01-10 19:20:16 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	145a295231	Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not impact Wilson based nearest neighbour stencils.	2015-12-30 19:29:48 +00:00
Peter Boyle	473fa28a6c	Partial optimisation; comms in x-dir for red black dslash will be slow as the checker skipping block strided loops are non threadable. Will need to write a kernel for these instead and drive them with a lookup table to make a loop sufficiently simple to thread.	2015-11-06 05:23:23 -06:00
Peter Boyle	12c5ec813c	Useful debug messages (commented out) are included for preservation in case I need to revisit this	2015-11-04 09:59:27 +00:00
Peter Boyle	1271508ca2	Bug fix for spread out in x (EO) direction. This is really annoying -- it is very hard to thread the loops with the index recursion on buffer offset in the red-black case. Must think of a good threading solution here.	2015-11-04 09:57:57 +00:00
Peter Boyle	0a9ebac514	Gparity modifications in the Gparity compressor variant.	2015-08-11 06:22:20 +01:00
Peter Boyle	1d0df449e8	Reorganise of file naming	2015-06-03 12:47:05 +01:00
Azusa Yamaguchi	b00a40dd65	Const safety	2015-06-01 12:25:59 +01:00
Azusa Yamaguchi	12c2562b96	No compile fix on mpi target	2015-05-31 22:50:03 +01:00
Peter Boyle	5644ab1e19	Large scale change to support 5d fermion formulations. Have 5d replicated wilson with 4d gauge working and matrix regressing to Ls copies of wilson.	2015-05-31 15:09:02 +01:00
Peter Boyle	67fa5691e5	Weak scale the benchmarks automatically.	2015-05-28 13:47:01 +01:00
neo	da46b56e85	Adding support for doxygen generation	2015-05-27 10:34:56 +09:00
neo	1a24801246	checked performance of new vector libraries. Added check for c++11 support on the configure.ac	2015-05-26 12:02:54 +09:00
neo	9e29ac6549	Completed implementation of new Grid_simd classes Tested performance for SSE4, Ok. AVX1/2, AVX512 yet untested	2015-05-22 17:33:15 +09:00
Peter Boyle	b00622302b	gcc doesn't like collapse(2) for some reason I can't figure	2015-05-15 11:36:22 +01:00
Peter Boyle	48f425d31c	I have made the Cshift work successfully with open mp threading in every routine. Collapse(2) is now working under clang-omp++.	2015-05-13 00:31:00 +01:00
Peter Boyle	6103c29ee3	Threading support rework. Placed parallel pragmas as macros; implemented deterministic thread reduction in style of BFM.	2015-05-12 07:51:41 +01:00
Peter Boyle	5555a852be	Lots of changes required to compile for MIC under ICPC	2015-05-10 23:29:21 +01:00
Peter Boyle	25d523c0f4	Shaken out stencil to the point where I think wilson dslash is correct. Need to audit code carefully, consolidate between stencil and cshift, and then benchmark and optimise.	2015-04-28 08:11:59 +01:00
Peter Boyle	f159495a9d	Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required	2015-04-27 13:45:07 +01:00
Peter Boyle	b32c14b433	Got the NERSC IO working and fixed a bug in cshift.	2015-04-22 22:46:48 +01:00
Peter Boyle	e5a25dfcb1	Build reorg with which I am a bit happier	2015-04-18 21:22:50 +01:00

49 Commits