portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-06-17 15:27:06 +01:00

Author	SHA1	Message	Date
Peter Boyle	f617468e04	Update Lattice_base.h	2024-10-11 10:39:16 -04:00
Peter Boyle	2c9878fc3a	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2024-08-27 12:05:46 -04:00
Peter Boyle	3668e81c5e	Extract slice working on checkerboard field for Block Lanczos	2024-08-27 11:31:30 -04:00
Peter Boyle	d66b2423cb	Move slice operations to GPU for BlockCG	2024-08-27 11:28:47 -04:00
Peter Boyle	15cc78f0b6	peek/poke local site on checkerboard arrays	2024-08-27 11:23:42 -04:00
Peter Boyle	06db4ddea2	Fast init on GPU	2024-08-27 11:22:33 -04:00
Peter Boyle	a3322b470f	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2024-08-20 14:30:52 +00:00
Peter Boyle	6f1328160c	Remove SVM use	2024-07-25 18:05:40 +00:00
Peter Boyle	a66973163f	Device vector not UVM	2024-07-11 15:24:11 +00:00
Peter Boyle	9563238e9b	Force initial to identity	2024-06-11 17:51:58 +00:00
Peter Boyle	1739146599	Property to initialise reduction	2024-06-11 16:47:35 +00:00
Peter Boyle	a49a161f8d	SYCL update to use buffer on reduction variable	2024-06-08 16:05:18 +00:00
Peter Boyle	cfe1b13225	Back out zero change	2024-05-21 01:14:08 +01:00
Peter Boyle	832fc08809	Merge pull request #459 from dbollweg/sycl_slicesum_update Sycl slicesum bugfix	2024-05-20 15:06:53 -04:00
Peter Boyle	5c3ace7c3e	Merge branch 'develop' into feature/scidac-wp1	2024-04-30 05:26:06 -04:00
Peter Boyle	57552d8ca3	Assign from non-lattice made accelerator resident	2024-04-05 01:05:12 -04:00
Peter Boyle	d1e9fe50d2	Xor csum for repro testing	2024-03-22 15:42:57 +00:00
Peter Boyle	e49e95b037	Upgrade of the Britney test with flight recorder and fast xor checksum	2024-03-22 15:39:27 +00:00
Peter Boyle	fab1efb48c	More britney logging improvements	2024-03-19 14:36:21 +00:00
dbollweg	461cd045c6	sliceSum cleanup	2024-03-13 18:18:44 -04:00
dbollweg	fee65d7a75	Merge branch 'paboyle:develop' into sycl_slicesum_update	2024-03-13 18:06:17 -04:00
dbollweg	31f9971dbf	avoid PI_ERROR_OUT_OF_RESOURCES in sycl sliceSum	2024-03-13 13:39:26 -04:00
Peter Boyle	95f3d69cf9	Extra hardware test hook	2024-03-12 20:09:37 +00:00
Peter Boyle	cf8632bbac	Britney test option	2024-03-12 15:15:35 +00:00
dbollweg	d87296f3e8	Merge branch 'develop' of https://github.com/dbollweg/Grid into develop	2024-03-06 16:54:22 -05:00
dbollweg	be94cf1c6f	Fewer wait-calls in sycl slicesum	2024-03-06 16:53:13 -05:00
Peter Boyle	cc04dc42dc	Merge branch 'develop' into feature/scidac-wp1	2024-03-06 14:55:21 -05:00
Peter Boyle	976c3e9b59	Hack for flight logging CG inner products. Can be made to work, but could put in some more serious infrastructure for repro testing and blame attribution (Britney test) if necessary	2024-03-05 23:59:57 +00:00
Peter Boyle	3f1636637d	Merge pull request #453 from dbollweg/feature/sliceSum_gpu Feature/slice sum gpu	2024-02-28 14:04:43 -05:00
Christoph Lehner	22b43b86cb	Make GPT test suite work with SYCL	2024-02-28 12:57:17 +01:00
dbollweg	3c9012676a	CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case.	2024-02-27 12:41:45 -05:00
Dennis Bollweg	6cd2d8fcd5	Replace cuda/hip memcpy with Grid functions	2024-02-26 09:55:07 -05:00
dbollweg	0a816b5509	Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu	2024-02-22 21:43:06 -05:00
dbollweg	1c8b807c2e	free malloc'd memory	2024-02-22 21:42:44 -05:00
Peter Boyle	44b466e072	Make InsertSliceFast the default at some point in future. Should I do this now?	2024-02-21 14:51:24 -05:00
Christoph Lehner	66391f84f2	Merge branch 'feature/gpt' of ../Grid into develop	2024-02-21 19:05:00 +01:00
Dennis Bollweg	15878f7613	sliceSumReduction_cub_large now also faster than CPU on Frontier	2024-02-16 13:55:21 -05:00
dbollweg	6f3455900e	Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs	2024-02-16 13:15:02 -05:00
dbollweg	09af8c25a2	Merge branch 'paboyle:develop' into feature/sliceSum_gpu	2024-02-09 13:02:59 -05:00
dbollweg	9514035b87	refactor slicesum: slicesum uses GPU version by default now	2024-02-09 13:02:28 -05:00
Peter Boyle	7019916294	RNG seed change safer for large volumes; this is a long term solution	2024-02-07 00:56:39 +00:00
dbollweg	1514b4f137	slicesum_sycl passes test	2024-02-06 19:08:44 -05:00
dbollweg	ab2de131bd	work towards sliceSum for sycl backend	2024-02-06 13:24:45 -05:00
Dennis Bollweg	5af8da76d7	Fix cuda compilation of Lattice_slicesum_gpu.h	2024-02-01 18:02:30 -05:00
Dennis Bollweg	b8b9dc952d	Async memcpy's and cleanup	2024-02-01 17:55:35 -05:00
Dennis Bollweg	79a6ed32d8	Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies	2024-02-01 16:41:03 -05:00
dbollweg	caa5f97723	Add sliceSum gpu using cub/hipcub	2024-01-31 16:50:06 -05:00
Peter Boyle	addc638856	Fast localCopyRegion, blockProjectFast	2024-01-22 17:40:38 -05:00
Peter Boyle	ca5ae8a2e6	Revert to working.	2024-01-17 16:32:05 -05:00
Peter Boyle	b7c7000d0d	Don't need the numerical rounding tolerance in multigrid	2023-12-22 18:10:23 -05:00

268 Commits