portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-19 18:33:43 +01:00

Author	SHA1	Message	Date
Peter Boyle	003fec509c	Fix Zero() used on thrust::complex in WordBundle4 initialisation Grid's Zero() sentinel is not assignable to thrust::complex<double>; use scalarD(0) instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 18:10:17 -04:00
Peter Boyle	773a82d87f	Reinstate large/small dispatch in CUB reduction path; radix-4 word-bundle for large types rocPRIM's DeviceReduce requires warpSize(64) threads each holding one element in shared memory, so sizeof(T)*64 must fit in sharedMemPerBlock. LatticePropagator::scalar_objectD is 2304 bytes (64*2304 = 147 KB), exceeding the budget and triggering a compile-time static_assert in limit_block_size. Introduce sumD_gpu_direct (the original direct-CUB path, safe for small types) and a new sumD_gpu_large that groups the vobj's vector_type words in bundles of 4, reducing each bundle as WordBundle4<scalarD> (64 bytes, 64*64 = 4 KB, always within budget). If words % 4 != 0, the final partial bundle is zero-padded. sumD_gpu dispatches at compile time via if constexpr on sizeof(sobjD) > 512. For LatticePropagator (144 words) this gives 36 CUB launches instead of 144. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 16:55:58 -04:00
Peter Boyle	286c29d6fb	Add Test_reduction to tests/debug Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D, LatticeColourMatrixF/D and LatticePropagatorF/D. Part a) gaussian random field: checks that old and new agree to within float/double roundoff tolerance. Part b) constant field (= 1.0, identity-matrix init): verifies innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero diagonal scalar components per site (1 / Nc / Ns*Nc respectively). Make.inc is auto-generated by scripts/filelist on bootstrap and is not tracked; the new .cc file is all that is needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 14:31:33 -04:00
Peter Boyle	969b0a3922	Rewrite lattice GPU reduction to use CUB, hipCUB, and SYCL reduction Replace hand-rolled shared-memory reduction kernels (reduceBlock/reduceBlocks/ reduceKernel) and the global device variable retirementCount with a unified CUB/hipCUB DeviceReduce::Reduce path for CUDA/HIP and sycl::reduction for SYCL. No small/large split is needed: both CUB and sycl::reduction handle arbitrary object sizes internally. Old implementations preserved as sum_gpu_old / sumD_gpu_old etc. in the original files for regression testing on GPU hardware. Also add CLAUDE.md with build, test, and architecture guidance. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-15 13:41:56 -04:00
Peter Boyle	c6c2834e03	Hip Happy	2026-05-15 11:30:29 -04:00
Peter Boyle	856545a1db	Support ROCM 7.0.2	2026-05-15 11:30:29 -04:00
Peter Boyle	e2d607f6c7	Merge pull request #490 from jdmaia/hip-guard-acceleratorfor2dNB [HIP] Including kernel launch parameter guard on accelerator_for2dNB	2026-05-06 14:51:30 -04:00
Julio Maia	66da4e0657	Including guard on accelerator_for2dNB against invalid kernel configurations if GRID_HIP	2026-05-06 13:26:33 -05:00
Peter Boyle	b37390bb5a	4 node usqcd run	2026-04-27 14:40:11 -07:00
Peter Boyle	829dc8cceb	32 node	2026-04-27 14:38:02 -07:00
Peter Boyle	13cc2c39f5	FOM run	2026-04-27 14:20:49 -07:00
Peter Boyle	66ea3b271c	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2026-04-27 13:55:52 -07:00
Peter Boyle	d293b58a20	384 node baseline run	2026-04-27 13:54:40 -07:00
Peter Boyle	ce093b2bf3	rdtsc	2026-04-27 13:54:06 -07:00
Peter Boyle	e4404efe5a	Perlmutter compile update	2026-04-27 13:53:28 -07:00
Peter Boyle	5ce270f1de	Adding Claude related files	2026-04-21 10:41:18 -04:00
Peter Boyle	af43b067a0	New CLAUDE controllable visualiser	2026-04-10 11:23:25 -04:00
Quadro	34b44d1fee	New file for animation in MD time direction	2026-04-02 13:55:38 -04:00
Peter Boyle	595ceaac37	Include grid header and make the ENABLE correct	2026-03-11 17:24:44 -04:00
Peter Boyle	daf5834e8e	Fixing incorrect PR about disable fermion instantiations	2026-03-11 17:05:46 -04:00
Peter Boyle	0d8658a039	Optimised	2026-03-05 06:06:32 -05:00
Peter Boyle	095e004d01	Setup change GCR	2026-03-05 06:06:32 -05:00
Peter Boyle	0acabee7f6	Modest change	2026-03-05 06:06:32 -05:00
Peter Boyle	76fbcffb60	Improvement to 16^3 hdcg	2026-03-05 06:06:32 -05:00
Peter Boyle	a0a62d7ead	Merge pull request #478 from vataspro/PolyakovUpstream Spatial Polyakov Loop implementation	2026-02-24 20:45:42 -05:00
Peter Boyle	c5038ea6a5	Merge pull request #483 from cmcknigh/bugfix/rocm7-rocblas-type-refactor Adding a version check to handle rocBlas type refactor	2026-02-24 20:45:03 -05:00
Peter Boyle	a5120903eb	Merge pull request #486 from RChrHill/fix/sp4-fp32 Define Sp4 ProjectOnGeneralGroup for generic vtype	2026-02-24 20:44:08 -05:00
Peter Boyle	00b286a08a	Merge pull request #488 from RChrHill/feature/additional-ET-traces Add ET support for Lattice spin- and colour-traces	2026-02-24 20:43:45 -05:00
Peter Boyle	24a9759353	Merge pull request #485 from edbennett/skip-fermion-instantiations Be able to skip compiling fermion instantiations altogether	2026-02-24 20:43:20 -05:00
edbennett	1b56f6f46d	be able to skip compiling fermion instantiations altogether	2026-02-24 23:52:18 +00:00
Peter Boyle	2a8084d569	Subspace setup	2026-02-13 17:26:11 -05:00
Peter Boyle	6ff29f9d4f	Alternate multigrids	2026-02-13 17:25:45 -05:00
RChHill	c4d3e79193	Add ET support for Lattice spin- and colour-traces	2026-01-29 14:46:52 +00:00
Peter Boyle	7cd3f21e6b	preserving a bunch of experiments on setup and g5 subspace doubling	2026-01-06 05:57:39 -05:00
paboyle	4a0aaf0786	Fix issue with Aurora compilers	2025-11-21 21:41:13 +00:00
paboyle	9c3835524c	Fix compile warn	2025-11-21 21:41:12 +00:00
paboyle	549351bb8a	Stag verbose clean up	2025-11-20 18:22:57 +00:00
RChHill	b650b89682	Define Sp4 ProjectOnGeneralGroup for generic vtype	2025-11-19 13:26:52 +00:00
Peter Boyle	74e6b19f83	Looks like the reuse of xfers in staggered has bugs or corner cases depending on volume	2025-11-17 22:29:06 -05:00
Peter Boyle	2e684028de	Improvements	2025-11-14 18:12:27 -05:00
paboyle	c54d87a472	Aurora compile fix for new compiler	2025-11-06 18:17:33 +00:00
Allen McKnight	4304245c1b	Merge branch 'develop' into bugfix/rocm7-rocblas-type-refactor	2025-11-04 08:50:11 -06:00
Peter Boyle	6165931afa	Update GridStd.h	2025-10-03 14:35:37 -04:00
Your Name	1d1fd3bcaf	adding a version check to handle rocblas type change	2025-10-02 15:24:24 -05:00
paboyle	23581333e6	link cufft	2025-08-21 22:25:55 +01:00
paboyle	e5fa3d887f	Compile on CUDA	2025-08-21 22:10:27 +01:00
paboyle	583fa7bb0a	FFTW guarded after CUDA and HIP	2025-08-21 22:00:12 +01:00
Peter Boyle	fe0db53842	FFT offload to GPU and MUCH faster comms. 40x speed up on Frontier	2025-08-21 16:45:38 -04:00
Peter Boyle	76c0ada1e1	Benchmark for En Hung	2025-08-21 16:45:38 -04:00
Peter Boyle	92f49e9194	Merge pull request #482 from g-simonetti/wflow_sp2n_paboyle Fixed Wilson flow for Nc not equal to 3	2025-08-21 09:10:25 -04:00

8191 Commits
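
The message of commit 773a82d87f describes a compile-time dispatch between a direct reduction and a radix-4 word-bundle reduction. The following is a minimal CPU-only sketch of that idea, not Grid's implementation: `Word`, `Site`, `Bundle4`, `sum_direct`, `sum_bundled` and `sum_dispatch` are hypothetical stand-ins for `scalarD`, the site object, `WordBundle4<scalarD>`, `sumD_gpu_direct`, `sumD_gpu_large` and `sumD_gpu`, and plain loops replace the CUB `DeviceReduce` launches. It assumes a word is a 16-byte complex double, which is consistent with the 144 words and 2304 bytes quoted in the commit message, and it needs C++17 for `if constexpr`.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

// Assumption: one "word" is a complex double (16 bytes).
using Word = std::complex<double>;

// Hypothetical site object holding NWORDS words per lattice site.
template <std::size_t NWORDS>
using Site = std::array<Word, NWORDS>;

// Hypothetical stand-in for WordBundle4<scalarD>: four words reduced together.
struct Bundle4 {
  std::array<Word, 4> w{};  // value-initialised, so every word starts at zero
  Bundle4& operator+=(const Bundle4& o) {
    for (std::size_t k = 0; k < 4; ++k) w[k] += o.w[k];
    return *this;
  }
};

// Direct path: reduce whole site objects in a single pass.
template <std::size_t NWORDS>
Site<NWORDS> sum_direct(const std::vector<Site<NWORDS>>& field) {
  Site<NWORDS> acc{};
  for (const auto& site : field)
    for (std::size_t i = 0; i < NWORDS; ++i) acc[i] += site[i];
  return acc;
}

// Large path: one reduction pass per bundle of 4 words.
// If NWORDS is not a multiple of 4, the last bundle is zero-padded.
template <std::size_t NWORDS>
Site<NWORDS> sum_bundled(const std::vector<Site<NWORDS>>& field,
                         std::size_t* passes = nullptr) {
  constexpr std::size_t nbundles = (NWORDS + 3) / 4;
  Site<NWORDS> acc{};
  for (std::size_t b = 0; b < nbundles; ++b) {
    Bundle4 total;  // each pass stands in for one device reduction launch
    for (const auto& site : field) {
      Bundle4 x;  // slots past the end of the site stay zero (padding)
      for (std::size_t k = 0; k < 4; ++k) {
        const std::size_t i = 4 * b + k;
        if (i < NWORDS) x.w[k] = site[i];
      }
      total += x;
    }
    for (std::size_t k = 0; k < 4; ++k) {
      const std::size_t i = 4 * b + k;
      if (i < NWORDS) acc[i] = total.w[k];
    }
  }
  if (passes) *passes = nbundles;
  return acc;
}

// Compile-time dispatch on object size, using the 512-byte threshold
// quoted in the commit message.
template <std::size_t NWORDS>
Site<NWORDS> sum_dispatch(const std::vector<Site<NWORDS>>& field) {
  if constexpr (sizeof(Site<NWORDS>) > 512) {
    return sum_bundled(field);
  } else {
    return sum_direct(field);
  }
}
```

With `Site<144>` (2304 bytes) the dispatch selects the bundled path, which runs 36 passes, matching the launch count in the commit message. A `Site<6>` (96 bytes) takes the direct path, and calling `sum_bundled` on it explicitly exercises the zero-padded second bundle.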