# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

Grid is a data-parallel C++ library for lattice QCD. It provides SIMD-vectorised lattice containers, MPI-based domain decomposition, GPU acceleration (CUDA/HIP/SYCL), and a full suite of QCD algorithms including HMC.

## Build

Uses GNU Autotools. The bootstrap step only needs to run once (or after `configure.ac` changes).

```bash
./bootstrap.sh            # downloads Eigen 3.4.0, generates configure
mkdir build && cd build
../configure [options]
make -j$(nproc)
make check                # run root-level tests
make install
```

Key configure options:

| Option | Common values |
|--------|---------------|
| `--enable-simd=` | `AVX2`, `AVX512`, `KNL`, `A64FX`, `NEONv8`, `GPU` |
| `--enable-comms=` | `mpi-auto`, `mpi3-auto`, `none` |
| `--enable-accelerator=` | `cuda`, `hip`, `sycl` |
| `--enable-shm=` | `shmopen`, `hugetlbfs`, `nvlink` |
| `--enable-Nc=` | `3` (default), `2`, `4`, `5` |
| `--with-gmp=`, `--with-mpfr=`, `--with-fftw=`, `--with-lime=` | paths to libs |
| `--enable-hdf5`, `--enable-mkl`, `--enable-lapack` | optional features |

Platform recipes from `README.md`:

- **KNL**: `--enable-simd=KNL --enable-comms=mpi3-auto --enable-mkl`
- **Skylake/Haswell**: `--enable-simd=AVX512` or `AVX2` + `--enable-comms=mpi3-auto`
- **AMD EPYC**: `--enable-simd=AVX2 --enable-comms=mpi3`
- **A64FX (Fugaku)**: `--enable-simd=A64FX --enable-comms=mpi3 --enable-shm=shmget` (see `SVE_README.txt`)

Required external libs: GMP, MPFR, OpenSSL, zlib.

## Running Tests

```bash
# From build directory
make check                # root-level tests (Test_simd, Test_cshift, etc.)
make -C tests/ tests      # build tests in a subdirectory
./tests/core/Test_simd    # run a single test binary directly
```

Test subdirectories and their focus: `core` (SIMD, stencil, comms), `solver` (CG, GMRES, eigensolvers), `hmc` (MD integrators), `forces` (fermion forces), `lanczos`, `IO`, `smearing`, `sp2n`, `debug`.

## Architecture

### Layer stack (bottom to top)

1. **SIMD layer** (`Grid/simd/`) — platform-specific intrinsics wrapped into `vRealF`, `vComplexD`, etc. The SIMD width and layout are compile-time constants controlled by `--enable-simd`.
2. **Tensor layer** (`Grid/tensors/`) — Lorentz/colour/spin tensor algebra built on top of SIMD types. `iMatrix`, `iVector`, `iScalar` templates compose into QCD types like `ColourMatrix`, `SpinColourVector`.
3. **Lattice layer** (`Grid/lattice/`) — `Lattice` container: a site-local tensor replicated across a distributed Cartesian grid. All arithmetic is site-parallel and expression-template-fused (see the sketch after this list).
4. **Cartesian/comms layer** (`Grid/cartesian/`, `Grid/communicator/`) — `GridCartesian` holds the MPI topology and local/global geometry. `Grid/cshift/` implements nearest-neighbour halo exchange; `Grid/stencil/` is the optimised multi-hop stencil used by Dirac operators.
5. **Algorithm layer** (`Grid/algorithms/`) — iterative solvers (CG, GMRES, BiCGSTAB, mixed-precision), eigensolvers (Lanczos, LAPACK), FFT, smearing.
6. **QCD layer** (`Grid/qcd/`) — gauge and fermion actions, HMC integrators, observables.
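
To make the layer stack concrete, here is a minimal sketch written in the style of the binaries under `tests/`. It is illustrative rather than a verbatim test: it assumes the standard Grid initialisation pattern (`Grid_init`, `SpaceTimeGrid::makeFourDimGrid`, `GridDefault*`) and default-precision typedefs such as `LatticeColourMatrix`.

```cpp
#include <Grid/Grid.h>

using namespace Grid;

int main(int argc, char **argv)
{
  Grid_init(&argc, &argv);

  // Cartesian/comms layer: distributed 4-d grid built from the run-time geometry
  GridCartesian *UGrid = SpaceTimeGrid::makeFourDimGrid(GridDefaultLatt(),
                                                        GridDefaultSimd(Nd, vComplex::Nsimd()),
                                                        GridDefaultMpi());

  // Lattice layer: site-local tensors replicated across the grid
  GridParallelRNG pRNG(UGrid);
  pRNG.SeedFixedIntegers(std::vector<int>({1, 2, 3, 4}));

  LatticeColourMatrix A(UGrid), B(UGrid), C(UGrid);
  random(pRNG, A);
  random(pRNG, B);

  // Expression templates fuse this into one site-parallel kernel
  C = A * B + adj(A);

  // cshift layer: nearest-neighbour shift by one site in direction 0
  LatticeColourMatrix Ashift = Cshift(A, 0, 1);

  // norm2 performs the global reduction through the communicator
  std::cout << GridLogMessage << "norm2(C) = " << norm2(C) << std::endl;

  Grid_finalize();
  return 0;
}
```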
### QCD subsystem (`Grid/qcd/`)

- `action/fermion/` — Wilson, Clover, DWF (Mobius), Staggered, twisted-mass, G-parity variants
- `action/gauge/` — Wilson gauge, Symanzik, Iwasaki, DBW2, plaquette+rect
- `representations/` — Fundamental, Adjoint, Two-index, Sp(2n)
- `hmc/` — Leapfrog, OMF2/OMF4 integrators; pseudofermion refreshment; Metropolis accept/reject
- `smearing/` — APE, Stout, HEX, gradient flow
- `observables/` — Polyakov loop, plaquette, topological charge

### GPU acceleration

GPU support is injected via macros (`accelerator_for`, `accelerator_for2dNB`). The `Grid/simd/` SIMD types map to scalar types in GPU device code; host code paths remain vectorised. Unified virtual memory is on by default (`--enable-unified=yes`); device-aware MPI (`--enable-accelerator-aware-mpi`) avoids device→host copies on transfers.

### Memory and I/O

- `Grid/allocator/` — aligned/NUMA-aware allocators; caching allocator via `--enable-alloc-cache`
- `Grid/parallelIO/` — distributed parallel reader/writer for ILDG (via LIME), SciDAC, and native binary formats
- `Grid/serialisation/` — text, binary, HDF5, XML/JSON serialisation of arbitrary Grid objects

### HMC applications

`HMC/` contains production-ready HMC driver programmes (e.g. `Mobius2p1f.cc`, `DWF_plus_DSDR_nf2plus1_Shamir_Gparity.cc`). These are built separately from the library tests.

## Key Conventions

- **C++17** is required throughout.
- Template structure: most classes are templated on a fermion-implementation (`FImpl`) or gauge-implementation (`Gimpl`) type, which encodes the representation and precision. Instantiation is controlled by `--enable-fermion-instantiations`.
- The `RealD`/`RealF`/`ComplexD`/`ComplexF` typedefs are used everywhere; avoid raw `double`/`float`.
- Logging uses `Grid_log`, `Grid_error` macros (from `Grid/log/`); performance-critical paths use the `GRID_TRACE` / timer macros from `Grid/perfmon/`.
- Reductions across MPI ranks go through `GridBase::GlobalSum` / `GlobalMax`; never reduce with bare MPI calls inside library code.
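
As an illustration of the reduction convention in the last bullet, here is a small hedged sketch. `volumeAverage` is a hypothetical helper, not a library function; it assumes `GridBase::GlobalSum(RealD&)` and `gSites()` in their usual Grid form.

```cpp
#include <Grid/Grid.h>

using namespace Grid;

// Hypothetical helper (not part of the library): average a per-rank partial sum
// over the whole lattice, routing the reduction through the Grid communicator.
RealD volumeAverage(GridBase *grid, RealD localContribution)
{
  RealD sum = localContribution;
  grid->GlobalSum(sum);                 // GridBase reduction, never a bare MPI_Allreduce
  return sum / RealD(grid->gSites());   // normalise by the global site count
}
```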