portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-25 04:53:30 +01:00

Author	SHA1	Message	Date
paboyle	a7a16df9d0	GET not put has kinder barrier sequence for NVLINK type access as when GET is done, I can use it without barrier. Moves a barrier to a nicer place, overlapped with DtoH DMA	2025-02-12 14:59:28 +00:00
paboyle	382e0abefd	Was issuing a double fence -- the gather also fences	2025-02-12 14:57:28 +00:00
paboyle	6fdefe5b90	Barrier sequencing if doing "GET" not "PUT" is different. This is somewhat better timing for Barriers	2025-02-12 14:55:20 +00:00
paboyle	4788dd8e2e	More states in packet progression for GPU non aware MPI	2025-02-12 14:53:57 +00:00
paboyle	1cc5f221f3	GET not put ordering is better as I know when I've got all MY data	2025-02-12 14:53:05 +00:00
paboyle	93251bfba0	GET not put for better ordering in the downstream dependent kernels -- I know when I'm done, so we can move a barrier / handshake between ranks intranode to a point off critical path	2025-02-12 14:50:21 +00:00
paboyle	18b79508b8	New line better for pretty print	2025-02-12 14:49:48 +00:00
paboyle	4de5ed1613	Remove vector view. The std::vector will not inform Memory manager of deletion and so a stale entry could be left. It is not and should not be used.	2025-02-12 14:48:46 +00:00
paboyle	0baaddbe98	Pipeline mode commit on Aurora. 5+ TF/s on 16^3x32 per tile at 384 nodes. More concurrency/fine grained scheduling is possible.	2025-02-04 19:27:26 +00:00
paboyle	b50fb34e71	Perf on Aurora	2025-02-01 18:39:34 +00:00
paboyle	de84d730ff	Fastest run config on Aurora to date	2025-02-01 18:08:40 +00:00
Peter Boyle	c74d11e3d7	PVdagM MG	2025-02-01 11:04:13 -05:00
paboyle	c4fc972fec	Merge branch 'feature/deprecate-uvm' into develop	2025-01-31 16:32:36 +00:00
paboyle	8cf809e231	Best results on Aurora so far	2025-01-31 16:14:45 +00:00
paboyle	94019a922e	Significantly better performance on Aurora without using pipeline mode	2025-01-30 16:36:46 +00:00
paboyle	d6b2727f86	Pipeline mode getting better -- 2 nodes @ 10TF/s per node on Aurora	2025-01-29 09:22:21 +00:00
paboyle	74a4f43946	Optional host buffer bounce for no CUDA aware MPI	2025-01-28 15:22:46 +00:00
paboyle	1caf8b0f86	Rename	2025-01-28 15:22:37 +00:00
Peter Boyle	3f3661a86f	Heading towards PVdagM multigrid	2025-01-17 14:33:35 +00:00
paboyle	8fe429346f	Dslash testing for reproduce	2024-11-11 23:11:11 +00:00
Peter Boyle	5a4f9bf2e3	Force the ROCM version	2024-10-29 18:12:31 -04:00
Peter Boyle	b91fc1b6b4	Merge branch 'feature/boosted' into feature/deprecate-uvm Fixed boosted free field test	2024-10-28 16:53:09 -04:00
Peter Boyle	eafc150034	Test fft asserts	2024-10-23 16:46:26 -04:00
Peter Boyle	2877f1a268	Verbose reduce	2024-10-23 15:14:16 -04:00
Peter Boyle	1e893af775	GPU happy	2024-10-23 14:52:15 -04:00
Peter Boyle	d9f430a575	Happy GPU	2024-10-23 14:51:16 -04:00
Peter Boyle	63abe87f36	Memory manager verbose improvements that were useful to track an error	2024-10-23 14:49:13 -04:00
Peter Boyle	368d649c8a	feature/deprecate-uvm happier -- preallocate device resident neighbour table	2024-10-23 14:47:55 -04:00
Peter Boyle	5603464f39	Fix in partial fraction import/export physical and make the GPU happier on the deprecate-uvm -- don't use static vectors, make member of class	2024-10-23 14:45:58 -04:00
Peter Boyle	655c79f39e	Suppress warning on partial override	2024-10-23 14:44:41 -04:00
Peter Boyle	565b231c03	Nvcc happy	2024-10-23 14:44:17 -04:00
Peter Boyle	62a9f180fa	NVCC happy	2024-10-23 14:44:04 -04:00
Peter Boyle	5ae77876a8	Meson field and Aslash field on GPU; some compiler warning removed	2024-10-18 19:08:06 -04:00
Peter Boyle	4ed2c2c74f	Config command	2024-10-18 13:58:33 -04:00
Peter Boyle	955da582b6	Working on NVCC	2024-10-18 13:58:03 -04:00
Peter Boyle	11b07b950d	Vanilla linux compile, assuming spack prerequisites	2024-10-18 13:57:40 -04:00
Peter Boyle	8f70cfeda9	Clean up	2024-10-18 13:56:53 -04:00
Peter Boyle	ce64271048	Remove the copying version	2024-10-18 13:56:24 -04:00
paboyle	5cc4f3241d	Meson field test	2024-10-18 15:42:30 +00:00
Peter Boyle	6815e138b4	Boosted fermion attempt	2024-10-17 18:37:33 +01:00
paboyle	a78a61d76f	Update configure	2024-10-15 14:38:45 +00:00
paboyle	2eff3f34ed	Alternate reduction; default to Grid's own but make a configure flag --enable-reduction=grid|mpi	2024-10-15 14:36:06 +00:00
paboyle	03687c1d62	Final version of test, closer to original again	2024-10-15 14:35:17 +00:00
paboyle	febfe4e77f	Make my own reduction a configure flag	2024-10-15 14:32:35 +00:00
paboyle	4d1aa134b5	Use normal reduction, configure flag to force deterministic	2024-10-15 14:32:11 +00:00
paboyle	5ec879860a	Odd rounding issue - bears looking into	2024-10-15 14:30:54 +00:00
Peter Boyle	f617468e04	Update Lattice_base.h	2024-10-11 10:39:16 -04:00
paboyle	b728af903c	Fast axpy norm under CFLAG	2024-10-11 03:23:09 +00:00
paboyle	54f1999030	axpy_norm_fast -- wasn't using the deterministic MPI sum causing issues	2024-10-11 03:22:18 +00:00
paboyle	fd58f0b669	Return ok	2024-10-11 03:21:21 +00:00


7990 Commits