mirror of
https://github.com/paboyle/Grid.git
synced 2026-06-04 19:24:36 +01:00
Add staggered HDCG multigrid test and mac-arm Homebrew build scripts
Test_staggered_hdcg.cc implements a two-level ADEF2 multigrid solver for
NaiveStaggeredFermion using SchurStaggeredOperator, following the mrhs
hermitian multigrid approach of arXiv:2409.03904. Uses a 33-point coarse
stencil (NextToNearestStencilGeometry4D) with nbasis=24, block={4,4,4,4},
and Chebyshev subspace generation with hi=5.0 (lambda_max ~4.6).
Also adds systems/mac-arm/sourceme-homebrew.sh and config-command-homebrew
for building Grid on Apple Silicon with Homebrew-installed dependencies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -30,6 +30,9 @@ Key configure options:
|
||||
| `--enable-Nc=` | `3` (default), `2`, `4`, `5` |
|
||||
| `--with-gmp=`, `--with-mpfr=`, `--with-fftw=`, `--with-lime=` | paths to libs |
|
||||
| `--enable-hdf5`, `--enable-mkl`, `--enable-lapack` | optional features |
|
||||
| `--enable-debug=yes` | adds `-g`, removes `-O3` |
|
||||
|
||||
Use `make V=1` for verbose compiler output (shows full flags; required for bug reports).
|
||||
|
||||
Platform recipes from `README.md`:
|
||||
- **KNL**: `--enable-simd=KNL --enable-comms=mpi3-auto --enable-mkl`
|
||||
@@ -96,3 +99,17 @@ GPU support is injected via macros (`accelerator_for`, `accelerator_for2dNB`). T
|
||||
- The `RealD`/`RealF`/`ComplexD`/`ComplexF` typedefs are used everywhere; avoid raw `double`/`float`.
|
||||
- Logging uses `Grid_log`, `Grid_error` macros (from `Grid/log/`); performance-critical paths use the `GRID_TRACE` / timer macros from `Grid/perfmon/`.
|
||||
- Reductions across MPI ranks go through `GridBase::GlobalSum` / `GlobalMax`; never reduce with bare MPI calls inside library code.
|
||||
|
||||
## Skills
|
||||
|
||||
`skills/` contains seven user-invocable Claude Code skills encoding deep domain knowledge for HPC work in this repo. Invoke with `/skill-name` or ask Claude to use them by name:
|
||||
|
||||
| Skill | When to use |
|
||||
|-------|-------------|
|
||||
| `gpu-memory-performance` | Bandwidth/occupancy problems — `acceleratorThreads()` pitfalls, `coalescedRead/Write`, fused vs staged HBM patterns |
|
||||
| `gpu-runtime-correctness` | Wrong answers, non-deterministic results, premature `q.wait()` returns |
|
||||
| `communication-overlap` | Designing GPU+MPI overlap pipelines; replacing broken accelerator-aware MPI paths with host-staged 7-phase pipeline |
|
||||
| `mpi-heterogeneous` | Collective hangs, buffer aliasing in `MPI_Sendrecv`, heterogeneous topology bugs |
|
||||
| `hang-diagnosis` | Distinguishing ioctl hangs, infinite poll loops, collective deadlocks, and silent wrong-answer races |
|
||||
| `correctness-verification` | Reproducibility checksums, double-wait testing, bisecting non-deterministic failures |
|
||||
| `compiler-validation` | Confirming compiler/optimisation flags are safe before production runs |
|
||||
|
||||
Reference in New Issue
Block a user