portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-27 20:44:16 +01:00

Author	SHA1	Message	Date
paboyle	e5fa3d887f	Compile on CUDA	2025-08-21 22:10:27 +01:00
Peter Boyle	fe0db53842	FFT offload to GPU and MUCH faster comms. 40x speed up on Frontier	2025-08-21 16:45:38 -04:00
paboyle	9e6a4a4737	Assertion updates to macros (mostly) with backtrace. Wilson flow to include options for DBW2, Iwasaki, Symanzik. View logging for data assurance	2025-08-07 15:48:38 +00:00
paboyle	41f344bbd3	Merge with Christoph GPT checksum debug	2025-07-15 03:06:09 +00:00
paboyle	bfae14d035	More flight logging	2025-06-27 06:07:34 +00:00
Peter Boyle	6ec5cee368	Preparing for compressed comms	2025-06-17 16:38:10 +02:00
paboyle	d418f78352	Making running on Aurora more debuggable	2025-05-23 20:58:16 +00:00
Peter Boyle	adc90d3a86	NVLINK GET/PUT on cuda aware mpi	2025-04-04 18:35:05 -04:00
paboyle	19f9378b98	Should work on Aurora now	2025-03-11 13:50:43 +00:00
paboyle	1d22841811	Working on aurora, GPT issue turned up is fixed	2025-03-06 03:20:18 +00:00
paboyle	1cc5f221f3	GET not put ordering is better as I know when I've got all MY data	2025-02-12 14:53:05 +00:00
paboyle	0baaddbe98	Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384 nodes. More concurrency/fine grained scheduling is possible.	2025-02-04 19:27:26 +00:00
paboyle	8cf809e231	Best results on Aurora so far	2025-01-31 16:14:45 +00:00
paboyle	94019a922e	Significantly better performance on Aurora without using pipeline mode	2025-01-30 16:36:46 +00:00
paboyle	d6b2727f86	Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora	2025-01-29 09:22:21 +00:00
paboyle	febfe4e77f	Make my own reduction a configure flag	2024-10-15 14:32:35 +00:00
Peter Boyle	5c3ace7c3e	Merge branch 'develop' into feature/scidac-wp1	2024-04-30 05:26:06 -04:00
Peter Boyle	434c3e7f1d	We have a choice of GET or PUT across NVlink	2024-03-25 14:32:44 +00:00
Peter Boyle	b6ad1bafc7	Normal memory SendToRecvFrom asynchronous for use in general stencil code	2023-10-20 19:27:13 -04:00
Peter Boyle	2376156fbc	Merge branch 'develop' into feature/dirichlet	2023-03-27 21:33:50 -07:00
Peter Boyle	dd3bbb8fa2	Move the synchronise out to the stencil so one call instead of one call per packet	2023-03-27 17:27:45 -07:00
Peter Boyle	a11c12e2e7	Modifications for partial dirichlet BCs	2022-11-15 16:20:01 -05:00
Peter Boyle	1177b8f661	Merge branch 'develop' into feature/dirichlet	2022-08-31 19:05:57 -04:00
Peter Boyle	06d9ce1a02	Synch ranks on node here for GPU - GPU memcopy	2022-08-04 13:35:56 -04:00
Peter Boyle	8137cc7049	Always concurrent comms	2022-07-28 12:01:51 -04:00
Peter Boyle	2ab1af5754	Ensure no synchronize and not option dependent	2022-07-19 09:51:06 -07:00
Peter Boyle	f7217d12d2	World barrier for clock synch	2022-07-11 13:45:31 -04:00
Peter Boyle	7eb29cf529	MPI fix	2022-05-28 15:51:34 -07:00
Peter Boyle	3f31afa4fc	Clean up verbose	2022-05-24 18:18:51 -07:00
Peter Boyle	aab3bcb46f	Dirichlet first cut - wrong answers on dagger multiply. Struggling to get a compute node so changing systems	2022-02-22 19:58:33 +00:00
Peter Boyle	135808dcfa	Less verbose	2021-12-07 16:24:24 -05:00
Peter Boyle	2bf3b4d576	Update to reduce memory footprint in benchmark test	2021-12-07 09:02:02 -08:00
Peter Boyle	16c2a99965	Overlap cudamemcpy - didn't set up stream right	2021-10-11 13:31:26 -07:00
Peter Boyle	c0d56a1c04	Perlmutter tune up	2021-09-22 06:02:34 -07:00
Peter Boyle	ca9816bfbb	Typo	2021-09-21 04:12:04 +02:00
Peter Boyle	109507888b	Option to force use of MPI over Nvlink	2021-09-21 00:53:25 +02:00
Peter Boyle	8195890640	Force MPI over NVLINK	2021-09-14 05:00:17 +01:00
Peter Boyle	cd99edcc5f	maxLocalNorm2()	2021-02-04 18:25:49 -05:00
Peter Boyle	d05ce01809	TOFU behaviour now optional THREAD_MULTIPLE or THREAD_SERIALIZED	2020-11-13 03:52:19 +01:00
Peter Boyle	a8309638d4	UVM check in MPI calls	2020-09-03 20:29:26 -04:00
Peter Boyle	0c3095e173	Comms buffers to device memory	2020-09-03 15:45:35 -04:00
Christoph Lehner	197612bc7a	fast cpu basisRotate and other small cleanups	2020-07-30 07:08:54 -04:00
nmeyer-ur	8726e94ea7	merge upstream develop	2020-07-07 20:26:47 +02:00
nmeyer-ur	1635c263ee	disable TOFU by default	2020-06-30 19:27:08 +02:00
nmeyer-ur	465856331a	switch back to serialized; wrong results on single too	2020-06-15 15:39:39 +02:00
nmeyer-ur	cc958aa9ed	switch back to standard MPI_init due to wrong results in Benchmark_wilson using comms-overlap	2020-06-15 14:21:38 +02:00
nmeyer-ur	4fedd8d29f	switch to MPI_THREAD_SERIALIZED instead of SINGLE	2020-05-27 14:08:34 +02:00
nmeyer-ur	9a86059761	symmetrize VLA and fixed size build messages	2020-05-20 20:05:42 +02:00
nmeyer-ur	b780b7b7a0	guard prevents multiple TOFU messages	2020-05-20 19:20:59 +02:00
nmeyer-ur	fc2e9850d3	temporarily enable TOFU by default when using A64FX or A64FXFIXEDSIZE	2020-05-11 13:25:02 +02:00

58 Commits