mirror of
https://github.com/paboyle/Grid.git
synced 2026-06-04 11:14:38 +01:00
5822a6599c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| name | description | metadata |
|---|---|---|
| ref_accelerator_for | Grid accelerator_for usage: converting block-strided thread_for to GPU-portable oSites loops | |
Pattern: block-strided thread_for → accelerator_for over oSites
Old CPU-only pattern (block-strided over orthog dimension):

    thread_for(r, rd, {
      int so = r * grid->_ostride[orthogdim];
      for (int n = 0; n < e1; n++)
        for (int b = 0; b < e2; b++) {
          int ss = so + n * stride + b;
          // work on site ss
        }
    });

GPU-portable replacement:

    accelerator_for(ss, grid->oSites(), Nsimd, {
      // work on site ss
      // GPU: one SIMT thread per (osite, lane)
      // CPU: one thread per osite (lane loop implicit via GRID_SIMT)
    });

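To see why the flat loop can stand in for the loop nest, the sketch below checks in plain C++ (no Grid dependency) that the block-strided index `so + n * stride + b` visits every outer site exactly once, and shows one way to recover the orthogonal coordinate `r` from a flat `ss`. The layout relations used here (`ostride` equal to the block length `e2`, `stride = ostride * rd`, `oSites = rd * e1 * e2`) and the names `SliceLayout`, `visit_counts`, and `orthog_coord` are assumptions made for illustration, not Grid's API.

```cpp
#include <vector>

// Hypothetical stand-in for the grid quantities used by the old loop.
// Assumes a lexicographic layout with ostride == e2 and stride == ostride * rd.
struct SliceLayout {
  int rd;  // reduced extent of the orthogonal dimension
  int e1;  // number of blocks (product of extents above orthogdim)
  int e2;  // block length (product of extents below orthogdim)
  int ostride() const { return e2; }
  int stride() const { return e2 * rd; }
  int oSites() const { return rd * e1 * e2; }
};

// Run the old block-strided loop nest and count how often each site is hit.
std::vector<int> visit_counts(const SliceLayout &g) {
  std::vector<int> hits(g.oSites(), 0);
  for (int r = 0; r < g.rd; r++) {
    int so = r * g.ostride();
    for (int n = 0; n < g.e1; n++)
      for (int b = 0; b < g.e2; b++)
        hits[so + n * g.stride() + b]++;
  }
  return hits;
}

// Recover the orthogonal coordinate r from a flat site index ss.
int orthog_coord(const SliceLayout &g, int ss) {
  return (ss / g.ostride()) % g.rd;
}
```

If every count is 1, the nest and a flat loop over `oSites()` cover the same sites, so a body that still needs `r` can compute it from `ss` instead of relying on the loop structure.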
Key rules:
- `accelerator_for(iter, count, Nsimd, body)`: Nsimd is `vobj::Nsimd()` or `grid->Nsimd()`
- On CPU: expands to `thread_for` over count, and `acceleratorSIMTlane` always returns 0; you must use the `#ifdef GRID_SIMT` pattern if iterating lanes explicitly (see ref_grid_simt_pattern)
- On GPU: one SIMT thread per (iter × lane), and `acceleratorSIMTlane(Nsimd)` returns the actual lane
- Loop body must capture only scalar/POD by value or via device-accessible pointers; no `std::vector` or host containers inside the body
- `Coordinate` inside `accelerator_for` must be `AcceleratorVector<int, MaxDims>` (stack-allocated, device-safe); Grid's `Coordinate` typedef already satisfies this
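The CPU/GPU difference in the rules above can be mimicked in plain C++. The sketch below is a mock, not Grid code: `mock_for_simt`, `mock_for_cpu`, and `scale_sites` are hypothetical names. It emulates a GPU-style launch (one invocation per site and lane) and a CPU-style launch (one invocation per site, lane fixed at 0), and the body writes through a raw pointer rather than capturing a `std::vector`.

```cpp
#include <functional>
#include <vector>

using Body = std::function<void(int ss, int lane)>;

// GPU-like launch: the body runs once per (site, lane) pair.
void mock_for_simt(int count, int nsimd, const Body &body) {
  for (int ss = 0; ss < count; ss++)
    for (int lane = 0; lane < nsimd; lane++) body(ss, lane);
}

// CPU-like launch: the body runs once per site and always sees lane 0.
void mock_for_cpu(int count, int /*nsimd*/, const Body &body) {
  for (int ss = 0; ss < count; ss++) body(ss, 0);
}

// Double every element of a (site, lane) array under either launch model.
std::vector<double> scale_sites(int count, int nsimd, bool simt) {
  std::vector<double> data(count * nsimd, 1.0);
  double *p = data.data();  // raw pointer: the only thing the body captures
  if (simt) {
    // Each invocation owns exactly one lane.
    mock_for_simt(count, nsimd, [p, nsimd](int ss, int lane) {
      p[ss * nsimd + lane] *= 2.0;
    });
  } else {
    // Lane is always 0 here, so the body loops over lanes itself.
    mock_for_cpu(count, nsimd, [p, nsimd](int ss, int) {
      for (int l = 0; l < nsimd; l++) p[ss * nsimd + l] *= 2.0;
    });
  }
  return data;
}
```

Both branches produce the same array. Dropping the explicit lane loop in the CPU-style branch would update only lane 0 of each site, which is the kind of mistake the `#ifdef GRID_SIMT` pattern is meant to prevent.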
Where defined
`Grid/threads/Accelerator.h`: CPU path near line 607; GPU paths are in the conditional blocks above it.
Model file
`Grid/algorithms/blas/MomentumProject.h`: `ImportVector` is the canonical example of correct `accelerator_for` usage with SIMD lane extraction.