portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-07-21 11:03:27 +01:00

Author	SHA1	Message	Date
Peter Boyle	42cd9eda71	Some improvements that should have been there if in synch with develop, and also some staggered hdcg type work	2026-05-29 13:36:57 -04:00
Thomas Blum and Claude Sonnet 4.5	34d8d003a8	staggered-hdcg: smoother shift tuning, CG baseline, Lanczos diagnostics Smoother shift: - Replace hard-coded mass^2 = 0.0025 with fine_lambda_max / divisor, measured at runtime via PowerMethod on the SchurStaggeredOperator. - Current divisor = 200 (tunable); concentrates the O(8) CG polynomial zeros on the high-frequency end of the spectrum [shift, lambda_max], repairing the spectral leakage introduced at coarse-cell boundaries when the coarse-grid solution is promoted back to the fine grid. - Add explanatory comment on the lego-block edge / covariant-derivative physics behind the high-mode smoothing requirement. Chebyshev filter (IRL): - Fix lambda_lo = 0.02 (was mass^2 * 0.5 = 0.00125). Tuning history logged in comments: lo=0.005 → 0/24 modes (T_70~53); lo=0.01 → 24/24 but 2 restarts; lo=0.02 → 24/24 in 1 restart. - Reduce Nk/Nm from 48/96 to 24/48 (target 24 near-null modes only). - Print Chebyshev filter parameters at run time. CG baseline: - Add sequential single-RHS CG loop before the HDCG solve to establish unpreconditioned iteration count and wall time for direct comparison. ImplicitlyRestartedBlockLanczosCoarse: - Print Ritz values before and after implicit shift at each restart. - Print alpha/beta block-diagonal elements at each Lanczos step. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-28 16:43:23 -04:00
Thomas Blum and Claude Sonnet 4.6	905651deaa	Test_staggered_hdcg: fix GridParallelRNG and Lanczos grid bugs - GridParallelRNG must be constructed on full (non-checkerboarded) UGrid, not UrbGrid; fill() recurses infinitely when _grid is checkerboarded. - evec and c_srcs for ImplicitlyRestartedBlockLanczosCoarse must both be on f_grid (Coarse4d), not CoarseMrhs; calc_irbl asserts evec[0].Grid() == src[0].Grid(). - Switch subspace generation from CreateSubspaceChebyshevNew to CreateSubspace (CG inverse iteration), which requires no spectral bound tuning and adapts automatically to the matrix spectrum. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:41:41 -04:00
Thomas Blum and Claude Sonnet 4.6	119308c42a	Test_staggered_hdcg: add missing ImplicitlyRestartedBlockLanczos.h include IRBLdiagonalisation, SortEigen, and LanczosType are defined in ImplicitlyRestartedBlockLanczos.h, which must be included before ImplicitlyRestartedBlockLanczosCoarse.h. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 20:55:51 -04:00
Thomas Blum and Claude Sonnet 4.6	520b90259d	Add staggered HDCG multigrid test and mac-arm Homebrew build scripts Test_staggered_hdcg.cc implements a two-level ADEF2 multigrid solver for NaiveStaggeredFermion using SchurStaggeredOperator, following the mrhs hermitian multigrid approach of arXiv:2409.03904. Uses a 33-point coarse stencil (NextToNearestStencilGeometry4D) with nbasis=24, block={4,4,4,4}, and Chebyshev subspace generation with hi=5.0 (lambda_max ~4.6). Also adds systems/mac-arm/sourceme-homebrew.sh and config-command-homebrew for building Grid on Apple Silicon with Homebrew-installed dependencies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 15:52:49 -04:00
Peter Boyle	4d527e81fa	Remove hip specific files	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	0493656e86	debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	66fd504c4d	tests/debug: add G=4 to hipfft fail reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	be4dd2b52f	tests/debug: test hipMemset variant before cache is populated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	707d059766	tests/debug: extend hipfft fail reproducer with hipMemset and sync variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	dbbfdd4e4b	tests/debug: add minimal hipfft ordering bug fail/pass pair Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	f967fb40bf	tests/debug: test plan-before-malloc vs malloc-before-plan ordering Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	74e0f846cb	tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	303a4d26e5	tests/debug: add minimal hipfft plan-creation reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	a1119266c1	Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h Remove the CUB/hipCUB direction entirely. Restore Lattice_reduction_gpu.h, Lattice_reduction_sycl.h, and Lattice_reduction.h to the state before the CUB rewrite (commit `969b0a39`), recovering the original primary function names (sumD_gpu_small, sumD_gpu_large, sumD_gpu, sum_gpu, sum_gpu_large) and the hand-rolled shared-memory reduction kernel. Delete Lattice_reduction_gpu_cub.h. Update Test_reduction to remove the old/new comparison sections that depended on sum_gpu_old. The lesson: CUB DeviceReduce is slower than the hand-rolled kernel for small types, and the smem sizing problem for the extraction pass has no clean solution within the accelerator_for abstraction. The right improvement is a higher radix (12 then 4) in sumD_gpu_large, applied directly to the existing hand-rolled kernel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	f3c3b1c04b	Test_reduction: add timing benchmark for new vs old reduction paths Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and sum_gpu_old (hand-rolled shared-memory) for each field type, with 5-call warmup and 100-call timed loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	dfd0503eae	Test_reduction: use separate float and double grids Float fields require a grid constructed with vComplexF::Nsimd(); using a double grid causes grid->_gsites to undercount the sites in float vobjF, making the constant-field expected value wrong. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	c629b2e87e	Rename scalarNorm2 to squaredSum in Test_reduction.cc The function computes |sum|^2 — the squared magnitude of an aggregate sum — not a norm. squaredSum makes clear that squaring is applied to the sum, not to individual site values before summing, distinguishing it from sumOfSquares (the squared L2 norm). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	bba328fac5	Add Test_reduction to tests/debug Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D, LatticeColourMatrixF/D and LatticePropagatorF/D. Part a) gaussian random field: checks that old and new agree to within float/double roundoff tolerance. Part b) constant field (= 1.0, identity-matrix init): verifies innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero diagonal scalar components per site (1 / Nc / Ns*Nc respectively). Make.inc is auto-generated by scripts/filelist on bootstrap and is not tracked; the new .cc file is all that is needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle	0d8658a039	Optimised	2026-03-05 06:06:32 -05:00
Peter Boyle	76fbcffb60	Improvement to 16^3 hdcg	2026-03-05 06:06:32 -05:00
Peter Boyle	6ff29f9d4f	Alternate multigrids	2026-02-13 17:25:45 -05:00
Peter Boyle	7cd3f21e6b	preserving a bunch of experiments on setup and g5 subspace doubling	2026-01-06 05:57:39 -05:00
paboyle	9e6a4a4737	Assertion updates to macros (mostly) with backtrace. Wilson flow to include options for DBW2, Iwasaki, Symanzik. View logging for data assurance	2025-08-07 15:48:38 +00:00
Peter Boyle	677b4cc5b0	Make all tests compile	2025-04-24 20:33:26 -04:00
Peter Boyle	6fec3c15ca	Cleaner printing	2025-04-04 18:35:06 -04:00
Peter Boyle	c74d11e3d7	PVdagM MG	2025-02-01 11:04:13 -05:00
Peter Boyle	3f3661a86f	Heading towards PVdagM multigrid	2025-01-17 14:33:35 +00:00
Peter Boyle	2a9cfeb9ea	New files	2024-09-26 14:23:29 -04:00
Peter Boyle	575eb72182	Converges on 16^3	2024-08-27 19:20:38 +00:00
Peter Boyle	29f6b8a74a	Setup	2024-08-27 12:02:49 -04:00
Peter Boyle	9779aaea33	16^3 optimise	2024-08-27 11:38:35 -04:00
Peter Boyle	ec25604a67	Fastest solver for mrhs multigrid	2024-08-27 11:32:34 -04:00
Peter Boyle	b461184797	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2024-07-23 09:53:58 -04:00
Peter Boyle	486412635a	8^4 test for PETSc	2024-07-22 15:25:17 -04:00
Peter Boyle	8b23a1546a	Force compile temporarily	2024-07-22 15:24:56 -04:00
Peter Boyle	a901e4e369	Regressed performance for paper	2024-07-22 15:24:04 -04:00
Peter Boyle	804d9367d4	Regressed performance	2024-07-22 15:23:25 -04:00
Peter Boyle	7c246606c1	Schur additional case	2024-07-10 22:04:32 +00:00
Peter Boyle	12b8be7cb9	Best so far on 96^3 350 Evecs converged on 4^4 block	2024-06-18 16:31:37 -04:00
Peter Boyle	dc80b08969	96^3 test	2024-06-10 15:07:29 -04:00
Peter Boyle	0e607a55e7	Updated for 8^4 test	2024-05-26 20:53:05 +00:00
Peter Boyle	ad14a82742	Working as good as possible on 48^3 in double	2024-05-16 10:55:45 -04:00
Peter Boyle	98cf247f33	prepare to switch to mixed precision	2024-04-30 05:23:45 -04:00
Peter Boyle	0cf16522d1	Refine with HDCG choice	2024-04-30 05:22:14 -04:00
Peter Boyle	5147a42818	Updated hdcg	2024-04-05 01:05:57 -04:00
Peter Boyle	5b79d51c22	Improvements	2024-04-01 14:18:40 -04:00
Peter Boyle	cc04dc42dc	Merge branch 'develop' into feature/scidac-wp1	2024-03-06 14:55:21 -05:00
Peter Boyle	070b61f08f	Simplifying the MultiRHS solver to make it do SRHS and MRHS	2024-03-06 14:04:33 -05:00
Peter Boyle	cd15abe9d1	Mrhs prep	2024-02-27 11:41:13 -05:00
