portelli/Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-18 09:53:43 +01:00

Author	SHA1	Message	Date
paboyle	10116b3be8	Force device copyable and tell SYCL to shut it.	2024-03-06 01:13:27 +00:00
paboyle	a46a0f0882	force device copyable and don't take crap from SYCL	2024-03-06 01:12:49 +00:00
paboyle	1b93a9be88	Print out the hostname	2024-03-06 00:01:58 +00:00
paboyle	783a66b348	Deterministic reduction please	2024-03-06 00:01:37 +00:00
paboyle	976c3e9b59	Hack for flight logging CG inner products. Can be made to work, but could put in some more serious infrastructure for repro testing and blame attribution (Britney test) if necessary	2024-03-05 23:59:57 +00:00
paboyle	f8ca971dae	Use of a bare PRECISION macro is not namespace safe and collides with SYCL	2024-03-05 23:59:13 +00:00
paboyle	21bc8c24df	OneMKL batched blas starting	2024-03-05 23:58:20 +00:00
paboyle	30228214f7	SYCL conflict with Eigen	2024-03-05 23:56:10 +00:00
Peter Boyle	c805f86343	USQCD benchmark	2024-03-01 00:05:04 -05:00
Peter Boyle	88d8fa43d7	Benchmark development	2024-02-29 20:01:44 -05:00
Peter Boyle	3c49762875	Propagate in the blas routine	2024-02-29 15:33:06 -05:00
Peter Boyle	436bf1d9d3	Merge pull request #455 from clarkedavida/hisq_fat_links Hisq fat links	2024-02-29 15:29:39 -05:00
david clarke	f70df6e195	changed NO_SHIFT and BACKWARD_CONST from define to enum	2024-02-29 12:29:30 -07:00
Peter Boyle	ee1b8bbdbd	Merge pull request #454 from edbennett/adjoint-broke fix HMC for non-fundamental representations	2024-02-28 14:05:27 -05:00
Peter Boyle	3f1636637d	Merge pull request #453 from dbollweg/feature/sliceSum_gpu Feature/slice sum gpu	2024-02-28 14:04:43 -05:00
Christoph Lehner	9f89486df5	remove unnecessary code path	2024-02-28 19:56:23 +01:00
Christoph Lehner	22b43b86cb	Make GPT test suite work with SYCL	2024-02-28 12:57:17 +01:00
dbollweg	3c9012676a	CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case.	2024-02-27 12:41:45 -05:00
Dennis Bollweg	6cd2d8fcd5	Replace cuda/hip memcpy with Grid functions	2024-02-26 09:55:07 -05:00
david clarke	b02d022993	fixed race condition (thx michael)	2024-02-23 17:14:28 -07:00
david clarke	94581e3c7a	accelerator_for is broken	2024-02-23 15:58:33 -07:00
david clarke	88b52cc045	Merge branch 'develop' into hisq_fat_links	2024-02-23 14:47:15 -07:00
dbollweg	0a816b5509	Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu	2024-02-22 21:43:06 -05:00
dbollweg	1c8b807c2e	free malloc'd memory	2024-02-22 21:42:44 -05:00
Christoph Lehner	66391f84f2	Merge branch 'feature/gpt' of ../Grid into develop	2024-02-21 19:05:00 +01:00
edbennett	97f7a9ecb3	fix HMC for non-fundamental representations	2024-02-21 08:27:55 +00:00
Dennis Bollweg	15878f7613	sliceSumReduction_cub_large now also faster than CPU on Frontier	2024-02-16 13:55:21 -05:00
dbollweg	e0d5e3c6c7	Merge branch 'paboyle:develop' into feature/sliceSum_gpu	2024-02-16 13:16:37 -05:00
dbollweg	6f3455900e	Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs	2024-02-16 13:15:02 -05:00
david clarke	56827d6ad6	accelerator_inline bug	2024-02-14 13:56:57 -07:00
paboyle	73c0b29535	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2024-02-13 20:19:32 +00:00
paboyle	303b83cdb8	Scaling benchmarks, verbosity and MPICH aware in acceleratorInit() For some reason Dirichlet benchmark fails on several nodes; need to debug this.	2024-02-13 19:48:03 +00:00
portelli	62055e04dd	missing semicolon generates error with some compilers	2024-02-13 18:18:27 +01:00
david clarke	db420525b3	fix Simd::Nsimd typo	2024-02-12 15:03:53 -07:00
dbollweg	4b43307402	Undo include path changes for level zero api header	2024-02-09 13:07:56 -05:00
dbollweg	09af8c25a2	Merge branch 'paboyle:develop' into feature/sliceSum_gpu	2024-02-09 13:02:59 -05:00
dbollweg	9514035b87	refactor slicesum: slicesum uses GPU version by default now	2024-02-09 13:02:28 -05:00
david clarke	2da09ae99b	acceleration compiles and doesn't break scalar mode	2024-02-06 18:40:13 -07:00
david clarke	a38fb0e04a	first effort toward accelerators	2024-02-06 18:24:55 -07:00
paboyle	7019916294	RNG seed change safer for large volumes; this is a long term solution	2024-02-07 00:56:39 +00:00
dbollweg	1514b4f137	slicesum_sycl passes test	2024-02-06 19:08:44 -05:00
david clarke	0a6e2f42c5	small amount of cleanup	2024-02-06 16:32:07 -07:00
dbollweg	ab2de131bd	work towards sliceSum for sycl backend	2024-02-06 13:24:45 -05:00
Dennis Bollweg	5af8da76d7	Fix cuda compilation of Lattice_slicesum_gpu.h	2024-02-01 18:02:30 -05:00
Dennis Bollweg	b8b9dc952d	Async memcpy's and cleanup	2024-02-01 17:55:35 -05:00
Dennis Bollweg	79a6ed32d8	Use accelerator_for2d and DeviceSegmentedRecude to avoid kernel launch latencies	2024-02-01 16:41:03 -05:00
dbollweg	caa5f97723	Add sliceSum gpu using cub/hipcub	2024-01-31 16:50:06 -05:00
david clarke	4924b3209e	projectU3 yields a unitary matrix	2024-01-23 14:43:58 -07:00
david clarke	00f24f8765	already found some bugs in projection, still needs testing	2024-01-22 05:50:16 -07:00
david clarke	f5b3d582b0	first attempt at U3 projection	2024-01-22 02:49:40 -07:00

1801 Commits