mirror of https://github.com/paboyle/Grid.git synced 2026-06-04 11:14:38 +01:00

Commit Graph

  • 4aa0bca4dc Change sum operation to use gpucub mistake in PR from Chris develop Peter Boyle 2026-06-01 14:12:25 -04:00
  • 94d8ee4268 More info feature/staggered-hdcg Peter Boyle 2026-05-29 16:10:11 -04:00
  • 3fac61fc55 benchmarks: add DhopEO benchmarks to Benchmark_staggered and Benchmark_staggeredF Peter Boyle 2026-05-29 13:55:27 -04:00
  • 42cd9eda71 Some improvements that should have been there if in synch with develop, and also some staggered hdcg type work Peter Boyle 2026-05-29 13:36:57 -04:00
  • 34d8d003a8 staggered-hdcg: smoother shift tuning, CG baseline, Lanczos diagnostics Thomas Blum 2026-05-28 16:43:23 -04:00
  • 8540b2a85d Test_extended_meson_field: add view_open timers to measure MemoryManager H2D transfers feature/Kpipi-masaaki-offload Peter Boyle 2026-05-28 14:04:38 -04:00
  • 905651deaa Test_staggered_hdcg: fix GridParallelRNG and Lanczos grid bugs Thomas Blum 2026-05-28 11:41:41 -04:00
  • dbd3a0e612 A2ALoopPropagator: fuse outer product sum into single accelerator_for kernel Peter Boyle 2026-05-28 10:39:37 -04:00
  • 5b58d1da62 Test_extended_meson_field: add --Ni and --Nj command line options Peter Boyle 2026-05-27 22:46:11 -04:00
  • 377db1bc08 Tensor_inner: move scalar innerProductD overloads before norm2 for ADL visibility Peter Boyle 2026-05-27 22:24:52 -04:00
  • 699564997e Test_extended_meson_field: use decltype(coalescedRead) for arch-portable kernel types Peter Boyle 2026-05-27 22:18:43 -04:00
  • f2750fae09 Test_extended_meson_field: use Grid norm2 instead of std::norm for HIP compatibility Peter Boyle 2026-05-27 22:04:10 -04:00
  • ed12fa09c5 Not sure how this old lattice slice sum core fix didn't propagate Peter Boyle 2026-05-27 21:54:24 -04:00
  • b914403bbe Better setup Frontier Peter Boyle 2026-05-27 21:31:54 -04:00
  • c1566fb9a2 Merge branch 'develop' into feature/Kpipi-masaaki-offload Peter Boyle 2026-05-27 21:03:16 -04:00
  • 905da6f083 Merge branch 'feature/reduction-reorganisation' into develop Peter Boyle 2026-05-27 21:01:30 -04:00
  • cb199c127c Merge branch 'develop' into feature/Kpipi-masaaki-offload Peter Boyle 2026-05-27 20:59:30 -04:00
  • 119308c42a Test_staggered_hdcg: add missing ImplicitlyRestartedBlockLanczos.h include Thomas Blum 2026-05-27 20:55:51 -04:00
  • 1a932ea33b Merge Peter Boyle 2026-05-27 20:45:00 -04:00
  • 89a32799e3 mac-arm: align --enable-Sp=no with upstream config-command-mpi style Thomas Blum 2026-05-27 16:21:02 -04:00
  • ce8b52749d Merge remote-tracking branch 'origin/develop' into feature/staggered-hdcg Thomas Blum 2026-05-27 16:20:47 -04:00
  • 86c7f29183 Config command update Peter Boyle 2026-05-27 16:19:33 -04:00
  • bbdc8e95f4 mac-arm: disable Sp, fermion-reps, gparity for faster dev builds Thomas Blum 2026-05-27 16:19:28 -04:00
  • 1284acf37a Merge remote-tracking branch 'origin/develop' into feature/staggered-hdcg Thomas Blum 2026-05-27 16:19:19 -04:00
  • b0c99f876e Configure on mac update Peter Boyle 2026-05-27 16:16:55 -04:00
  • bf5fcdc860 Ease of use for std::complex interchangable with thrust Peter Boyle 2026-05-27 16:05:37 -04:00
  • 520b90259d Add staggered HDCG multigrid test and mac-arm Homebrew build scripts Thomas Blum 2026-05-27 15:52:49 -04:00
  • 5822a6599c skills: add GPU/A2A reference skill documents Peter Boyle 2026-05-27 11:12:47 -04:00
  • 0eeb334fe0 systems/mac-arm: add MPI configure command and spack sourceme Peter Boyle 2026-05-27 11:12:42 -04:00
  • d8d16407e9 A2ASpatialSum: extended meson field kernel and test Peter Boyle 2026-05-27 11:12:29 -04:00
  • b58a1508fa Perlmutter cuda version update Peter Boyle 2026-05-21 13:25:13 -07:00
  • 4d527e81fa Remove hip specific files Peter Boyle 2026-05-21 12:34:15 -04:00
  • 7803580aa6 Lattice_reduction_gpu: demote timing logs to Debug, disable by default Peter Boyle 2026-05-21 12:05:36 -04:00
  • 32654db366 Test_planned_fft: fix PlannedFFT template parameter to use ::vector_object Peter Boyle 2026-05-20 18:13:38 -04:00
  • cd340cfab3 tests: add Test_planned_fft exercising PlannedFFT<vobj> Peter Boyle 2026-05-20 17:59:24 -04:00
  • f32866b2ff tests/fft: remove PlanDestroy calls (FFT handles plans per-call) Peter Boyle 2026-05-20 17:54:41 -04:00
  • 1cd1dc091e FFT: add FFTbase, PlannedFFT; factor FFT_dim_execute free function Peter Boyle 2026-05-20 17:53:17 -04:00
  • 0493656e86 debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7 Peter Boyle 2026-05-19 22:27:27 -04:00
  • 66fd504c4d tests/debug: add G=4 to hipfft fail reproducer Peter Boyle 2026-05-19 22:21:52 -04:00
  • be4dd2b52f tests/debug: test hipMemset variant before cache is populated Peter Boyle 2026-05-19 22:10:16 -04:00
  • 707d059766 tests/debug: extend hipfft fail reproducer with hipMemset and sync variants Peter Boyle 2026-05-19 22:02:02 -04:00
  • f08c755ae6 FFT: use host stack buffer in PlanCreate, not deviceVector Peter Boyle 2026-05-19 21:49:06 -04:00
  • dbbfdd4e4b tests/debug: add minimal hipfft ordering bug fail/pass pair Peter Boyle 2026-05-19 21:48:23 -04:00
  • f967fb40bf tests/debug: test plan-before-malloc vs malloc-before-plan ordering Peter Boyle 2026-05-19 21:40:17 -04:00
  • 74e0f846cb tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests Peter Boyle 2026-05-19 19:19:59 -04:00
  • 303a4d26e5 tests/debug: add minimal hipfft plan-creation reproducer Peter Boyle 2026-05-19 17:52:59 -04:00
  • 119888653c FFT HIP: use hipfftCreate+hipfftMakePlanMany instead of hipfftPlanMany Peter Boyle 2026-05-19 17:29:28 -04:00
  • a9f42c08f9 FFT: pass nullptr for inembed/onembed in hipfftPlanMany to avoid HIPFFT_PARSE_ERROR Peter Boyle 2026-05-19 17:15:21 -04:00
  • e79adc9d31 FFT: cache plans per vobj type across calls Peter Boyle 2026-05-19 15:12:10 -04:00
  • 5a9056cd93 Accelerator: lower default accelerator_threads from 16 to 8 Peter Boyle 2026-05-19 13:41:03 -04:00
  • 012c36ab5a Accelerator: raise default accelerator_threads from 2 to 16 Peter Boyle 2026-05-19 10:15:53 -04:00
  • 5c4574f9aa skills: add gpu-memory-performance.md Peter Boyle 2026-05-19 10:03:32 -04:00
  • a424775884 sumD_gpu_reduce_words: fuse pack+reduce into single packReduceKernel Peter Boyle 2026-05-19 09:46:43 -04:00
  • d6b1388741 Modified repack Peter Boyle 2026-05-19 08:53:13 -04:00
  • 796c6cae4e Enable GRID_REDUCTION_TIMING unconditionally Peter Boyle 2026-05-18 22:14:00 -04:00
  • 1a8064d6d9 Lattice_reduction_gpu: add GRID_REDUCTION_TIMING instrumentation Peter Boyle 2026-05-18 22:13:30 -04:00
  • 43648924c3 sumD_gpu_large: radix-12 word-bundle reduction replacing radix-1 Peter Boyle 2026-05-18 21:56:45 -04:00
  • bf2140e74d Lattice_reduction_sycl: fix double-precision accumulation in sumD_gpu_tensor Peter Boyle 2026-05-18 21:53:40 -04:00
  • a1119266c1 Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h Peter Boyle 2026-05-18 21:52:18 -04:00
  • a0f00c0eca sumD_gpu_direct: revert to per-lane write; CUB handles Nsimd*osites inputs Peter Boyle 2026-05-18 21:23:15 -04:00
  • d358954a84 sumD_gpu_direct: shared-memory lane reduction with acceleratorThreads(1) Peter Boyle 2026-05-18 21:08:10 -04:00
  • aee00bdfb5 sumD_gpu_direct: one thread per SIMD lane using extractLane Peter Boyle 2026-05-18 16:21:50 -04:00
  • cf324b0fa1 Lattice_reduction_gpu_cub: define GRID_REDUCTION_TIMING in header Peter Boyle 2026-05-18 14:54:08 -04:00
  • b314dc224d Lattice_reduction_gpu_cub: add GRID_REDUCTION_TIMING instrumentation Peter Boyle 2026-05-18 14:23:44 -04:00
  • 1bbd62498e Lattice_reduction_gpu_cub: replace WordBundle4 with iVector<iScalar<scalarD>,4> Peter Boyle 2026-05-18 13:55:28 -04:00
  • f3c3b1c04b Test_reduction: add timing benchmark for new vs old reduction paths Peter Boyle 2026-05-18 12:31:13 -04:00
  • 069f98b253 skills: HPC battle-hardening skill files for GPU+MPI correctness Peter Boyle 2026-05-18 12:10:44 -04:00
  • dfd0503eae Test_reduction: use separate float and double grids Peter Boyle 2026-05-18 12:09:35 -04:00
  • c629b2e87e Rename scalarNorm2 to squaredSum in Test_reduction.cc Peter Boyle 2026-05-15 23:15:11 -04:00
  • 7c8462abd1 Fix Zero() used on thrust::complex in WordBundle4 initialisation Peter Boyle 2026-05-15 18:10:17 -04:00
  • 95a6a0bde7 Reinstate large/small dispatch in CUB reduction path; radix-4 word-bundle for large types Peter Boyle 2026-05-15 16:55:58 -04:00
  • bba328fac5 Add Test_reduction to tests/debug Peter Boyle 2026-05-15 14:31:33 -04:00
  • 41362349f3 Rewrite lattice GPU reduction to use CUB, hipCUB, and SYCL reduction Peter Boyle 2026-05-15 13:41:56 -04:00
  • 12e3499b6d Updated rocm 7 compile for ORNL feature/reduction-reorganisation Peter Boyle 2026-05-21 12:28:42 -04:00
  • 9576011011 Changed setup for ROCM 7, nasty LD_LIBRARY_PATH issues were committing evils Peter Boyle 2026-05-21 12:28:04 -04:00
  • 155b34c1aa File list lost Peter Boyle 2026-05-15 15:06:58 -04:00
  • 982ffe9ebe Lattice_reduction_gpu: demote timing logs to Debug, disable by default Peter Boyle 2026-05-21 12:05:36 -04:00
  • 0251ecaeab Test_planned_fft: fix PlannedFFT template parameter to use ::vector_object Peter Boyle 2026-05-20 18:13:38 -04:00
  • 372a27d645 tests: add Test_planned_fft exercising PlannedFFT<vobj> Peter Boyle 2026-05-20 17:59:24 -04:00
  • 72b4a061f3 tests/fft: remove PlanDestroy calls (FFT handles plans per-call) Peter Boyle 2026-05-20 17:54:41 -04:00
  • 29198efabe FFT: add FFTbase, PlannedFFT; factor FFT_dim_execute free function Peter Boyle 2026-05-20 17:53:17 -04:00
  • 50aa51f93a debug: add Test_hipfft_repro — reproducer for hipFFT PARSE_ERROR on ROCm 7 Peter Boyle 2026-05-19 22:27:27 -04:00
  • 79ccc81a86 tests/debug: add G=4 to hipfft fail reproducer Peter Boyle 2026-05-19 22:21:52 -04:00
  • 3f0fdbb597 tests/debug: test hipMemset variant before cache is populated Peter Boyle 2026-05-19 22:10:16 -04:00
  • ea57bd8f03 tests/debug: extend hipfft fail reproducer with hipMemset and sync variants Peter Boyle 2026-05-19 22:02:02 -04:00
  • bdba5b8403 FFT: use host stack buffer in PlanCreate, not deviceVector Peter Boyle 2026-05-19 21:49:06 -04:00
  • 58cc6ca9c0 tests/debug: add minimal hipfft ordering bug fail/pass pair Peter Boyle 2026-05-19 21:48:23 -04:00
  • e5996b440d tests/debug: test plan-before-malloc vs malloc-before-plan ordering Peter Boyle 2026-05-19 21:40:17 -04:00
  • ad9d03fd85 tests/debug: extend hipfft reproducer with Grid-realistic howmany and exec tests Peter Boyle 2026-05-19 19:19:59 -04:00
  • 4de160ce20 tests/debug: add minimal hipfft plan-creation reproducer Peter Boyle 2026-05-19 17:52:59 -04:00
  • fc8c8ce6e7 FFT HIP: use hipfftCreate+hipfftMakePlanMany instead of hipfftPlanMany Peter Boyle 2026-05-19 17:29:28 -04:00
  • ddbb7f07c8 FFT: pass nullptr for inembed/onembed in hipfftPlanMany to avoid HIPFFT_PARSE_ERROR Peter Boyle 2026-05-19 17:15:21 -04:00
  • a5a04929fb Merge pull request #492 from giltirn/develop Peter Boyle 2026-05-19 15:26:58 -04:00
  • 1e29c59bcc FFT: cache plans per vobj type across calls Peter Boyle 2026-05-19 15:12:10 -04:00
  • b6abdc3845 Accelerator: lower default accelerator_threads from 16 to 8 Peter Boyle 2026-05-19 13:41:03 -04:00
  • 77b8657fcc Fixes to support CUDA > 13. Specifically, the CUDA header is no longer accidentally included within Grid's namespace, and the breaking change to cub::Sum() -> ::cuda::std::plus<>{} in CUDA-13 has been worked around Christopher Kelly 2026-05-19 12:22:14 -04:00
  • 2fadd8bb62 Accelerator: raise default accelerator_threads from 2 to 16 Peter Boyle 2026-05-19 10:15:53 -04:00
  • 60df2dd5d0 skills: add gpu-memory-performance.md Peter Boyle 2026-05-19 10:03:32 -04:00
  • 66b529b345 sumD_gpu_reduce_words: fuse pack+reduce into single packReduceKernel Peter Boyle 2026-05-19 09:46:43 -04:00
  • 1304172a93 Modified repack Peter Boyle 2026-05-19 08:53:13 -04:00