portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-07-17 23:53:27 +01:00

Author	SHA1	Message	Date
Peter Boyle	42cd9eda71	Some improvements that should have been there if in synch with develop, and also some staggered hdcg type work	2026-05-29 13:36:57 -04:00
Thomas Blum and Claude Sonnet 4.5	34d8d003a8	staggered-hdcg: smoother shift tuning, CG baseline, Lanczos diagnostics Smoother shift: - Replace hard-coded mass^2 = 0.0025 with fine_lambda_max / divisor, measured at runtime via PowerMethod on the SchurStaggeredOperator. - Current divisor = 200 (tunable); concentrates the O(8) CG polynomial zeros on the high-frequency end of the spectrum [shift, lambda_max], repairing the spectral leakage introduced at coarse-cell boundaries when the coarse-grid solution is promoted back to the fine grid. - Add explanatory comment on the lego-block edge / covariant-derivative physics behind the high-mode smoothing requirement. Chebyshev filter (IRL): - Fix lambda_lo = 0.02 (was mass^2 * 0.5 = 0.00125). Tuning history logged in comments: lo=0.005 → 0/24 modes (T_70~53); lo=0.01 → 24/24 but 2 restarts; lo=0.02 → 24/24 in 1 restart. - Reduce Nk/Nm from 48/96 to 24/48 (target 24 near-null modes only). - Print Chebyshev filter parameters at run time. CG baseline: - Add sequential single-RHS CG loop before the HDCG solve to establish unpreconditioned iteration count and wall time for direct comparison. ImplicitlyRestartedBlockLanczosCoarse: - Print Ritz values before and after implicit shift at each restart. - Print alpha/beta block-diagonal elements at each Lanczos step. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-05-28 16:43:23 -04:00
Thomas Blum and Claude Sonnet 4.6	905651deaa	Test_staggered_hdcg: fix GridParallelRNG and Lanczos grid bugs - GridParallelRNG must be constructed on full (non-checkerboarded) UGrid, not UrbGrid; fill() recurses infinitely when _grid is checkerboarded. - evec and c_srcs for ImplicitlyRestartedBlockLanczosCoarse must both be on f_grid (Coarse4d), not CoarseMrhs; calc_irbl asserts evec[0].Grid() == src[0].Grid(). - Switch subspace generation from CreateSubspaceChebyshevNew to CreateSubspace (CG inverse iteration), which requires no spectral bound tuning and adapts automatically to the matrix spectrum. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-28 11:41:41 -04:00
Thomas Blum and Claude Sonnet 4.6	119308c42a	Test_staggered_hdcg: add missing ImplicitlyRestartedBlockLanczos.h include IRBLdiagonalisation, SortEigen, and LanczosType are defined in ImplicitlyRestartedBlockLanczos.h, which must be included before ImplicitlyRestartedBlockLanczosCoarse.h. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 20:55:51 -04:00
Thomas Blum and Claude Sonnet 4.6	520b90259d	Add staggered HDCG multigrid test and mac-arm Homebrew build scripts Test_staggered_hdcg.cc implements a two-level ADEF2 multigrid solver for NaiveStaggeredFermion using SchurStaggeredOperator, following the mrhs hermitian multigrid approach of arXiv:2409.03904. Uses a 33-point coarse stencil (NextToNearestStencilGeometry4D) with nbasis=24, block={4,4,4,4}, and Chebyshev subspace generation with hi=5.0 (lambda_max ~4.6). Also adds systems/mac-arm/sourceme-homebrew.sh and config-command-homebrew for building Grid on Apple Silicon with Homebrew-installed dependencies. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-27 15:52:49 -04:00
Peter Boyle	4d527e81fa	Remove hip specific files	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	32654db366	Test_planned_fft: fix PlannedFFT template parameter to use ::vector_object Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	cd340cfab3	tests: add Test_planned_fft exercising PlannedFFT<vobj> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	f32866b2ff	tests/fft: remove PlanDestroy calls (FFT handles plans per-call) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	0493656e86	debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	66fd504c4d	tests/debug: add G=4 to hipfft fail reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	be4dd2b52f	tests/debug: test hipMemset variant before cache is populated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	707d059766	tests/debug: extend hipfft fail reproducer with hipMemset and sync variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	dbbfdd4e4b	tests/debug: add minimal hipfft ordering bug fail/pass pair Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	f967fb40bf	tests/debug: test plan-before-malloc vs malloc-before-plan ordering Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	74e0f846cb	tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	303a4d26e5	tests/debug: add minimal hipfft plan-creation reproducer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	e79adc9d31	FFT: cache plans per vobj type across calls Plans are created lazily on the first FFT_dim call and reused for all subsequent calls on the same FFT object. PlanCreate<vobj>() can be called explicitly to pre-warm the cache. PlanDestroy() must be called before switching to a different vobj type; the destructor cleans up any live plans automatically. Update Test_fft.cc and Test_fftf.cc to call PlanDestroy() between the LatticeComplex and LatticeSpinMatrix sections that reuse the same FFT object. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	a1119266c1	Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h Remove the CUB/hipCUB direction entirely. Restore Lattice_reduction_gpu.h, Lattice_reduction_sycl.h, and Lattice_reduction.h to the state before the CUB rewrite (commit `969b0a39`), recovering the original primary function names (sumD_gpu_small, sumD_gpu_large, sumD_gpu, sum_gpu, sum_gpu_large) and the hand-rolled shared-memory reduction kernel. Delete Lattice_reduction_gpu_cub.h. Update Test_reduction to remove the old/new comparison sections that depended on sum_gpu_old. The lesson: CUB DeviceReduce is slower than the hand-rolled kernel for small types, and the smem sizing problem for the extraction pass has no clean solution within the accelerator_for abstraction. The right improvement is a higher radix (12 then 4) in sumD_gpu_large, applied directly to the existing hand-rolled kernel. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	f3c3b1c04b	Test_reduction: add timing benchmark for new vs old reduction paths Reports us/call and GB/s for sum_gpu (CUB/sycl::reduction) and sum_gpu_old (hand-rolled shared-memory) for each field type, with 5-call warmup and 100-call timed loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	dfd0503eae	Test_reduction: use separate float and double grids Float fields require a grid constructed with vComplexF::Nsimd(); using a double grid causes grid->_gsites to undercount the sites in float vobjF, making the constant-field expected value wrong. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	c629b2e87e	Rename scalarNorm2 to squaredSum in Test_reduction.cc The function computes |sum|^2 — the squared magnitude of an aggregate sum — not a norm. squaredSum makes clear that squaring is applied to the sum, not to individual site values before summing, distinguishing it from sumOfSquares (the squared L2 norm). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle and Claude Sonnet 4.6	bba328fac5	Add Test_reduction to tests/debug Tests the new CUB/hipCUB/SYCL lattice reduction (sum_gpu) against the preserved hand-rolled implementation (sum_gpu_old) for LatticeComplexF/D, LatticeColourMatrixF/D and LatticePropagatorF/D. Part a) gaussian random field: checks that old and new agree to within float/double roundoff tolerance. Part b) constant field (= 1.0, identity-matrix init): verifies innerProduct(sum, sum) = Ncomp * V^2 where Ncomp counts the nonzero diagonal scalar components per site (1 / Nc / Ns*Nc respectively). Make.inc is auto-generated by scripts/filelist on bootstrap and is not tracked; the new .cc file is all that is needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 12:34:30 -04:00
Peter Boyle	595ceaac37	Include grid header and make the ENABLE correct	2026-03-11 17:24:44 -04:00
Peter Boyle	daf5834e8e	Fixing incorrect PR about disable fermion instantiations	2026-03-11 17:05:46 -04:00
Peter Boyle	0d8658a039	Optimised	2026-03-05 06:06:32 -05:00
Peter Boyle	76fbcffb60	Improvement to 16^3 hdcg	2026-03-05 06:06:32 -05:00
edbennett	1b56f6f46d	be able to skip compiling fermion instantiations altogether	2026-02-24 23:52:18 +00:00
Peter Boyle	6ff29f9d4f	Alternate multigrids	2026-02-13 17:25:45 -05:00
Peter Boyle	7cd3f21e6b	preserving a bunch of experiments on setup and g5 subspace doubling	2026-01-06 05:57:39 -05:00
Peter Boyle	2e684028de	Improvements	2025-11-14 18:12:27 -05:00
Peter Boyle	fe0db53842	FFT offload to GPU and MUCH faster comms. 40x speed up on Frontier	2025-08-21 16:45:38 -04:00
Peter Boyle	76c0ada1e1	Benchmark for En Hung	2025-08-21 16:45:38 -04:00
paboyle	9e6a4a4737	Assertion updates to macros (mostly) with backtrace. WIlson flow to include options for DBW2, Iwasaki, Symanzik. View logging for data assurance	2025-08-07 15:48:38 +00:00
paboyle	73af020f98	improved	2025-06-27 06:08:54 +00:00
Peter Boyle	3737a24096	Updated python output	2025-06-03 14:09:29 -04:00
Peter Boyle	5364d580c9	Output chirality, eigenvector density files and python source lego plot	2025-05-13 18:44:47 -04:00
Peter Boyle	677b4cc5b0	Make all tests compile	2025-04-24 20:33:26 -04:00
Peter Boyle and GitHub	ab3de50d5e	Merge pull request #473 from UCL-ARC/gauge_action_deriv WilsonGagueAction deriv	2025-04-24 14:39:10 -04:00
Chulwoo Jung	a957e7bfa1	Adding DWF evec Chirality measurement	2025-04-22 22:17:51 +00:00
Chulwoo Jung	cee4c8ce8c	Merge branch 'develop' of https://github.com/paboyle/Grid into specflow	2025-04-18 19:55:36 +00:00
Peter Boyle	6fec3c15ca	Cleaner printing	2025-04-04 18:35:06 -04:00
Mashy Green	d41542c64b	reverted sp2n test wilsonfundfermiongauge to original	2025-03-24 08:29:15 +00:00
Mashy Green	0000d2e558	Merge branch 'develop' into gauge_action_deriv	2025-03-10 08:35:57 +00:00
Muhammad Asif and GitHub	b1ba209696	Latest upstream with np-su3 patch and modified Sp_WilsonFunfFermionGauge test to be small (#22) Co-authored-by: Mashy Green <mashy@me.com> merging no-su3 patch	2025-02-24 11:38:42 +00:00
Mashy Green	717f647418	added the WilsonFlow patch from upstream PR #471	2025-02-24 08:41:31 +00:00
Peter Boyle	c74d11e3d7	PVdagM MG	2025-02-01 11:04:13 -05:00
paboyle	c4fc972fec	Merge branch 'feature/deprecate-uvm' into develop	2025-01-31 16:32:36 +00:00
Chulwoo Jung	570b72a47b	Bugfix. Sorry!	2025-01-21 15:37:39 -05:00
Chulwoo Jung	a5798a89ed	Merge branch 'develop' into specflow	2025-01-21 12:13:24 -05:00
