mirror of https://github.com/paboyle/Grid.git synced 2026-06-04 11:14:38 +01:00
Grid/skills/ref_coalesced_views.md
Peter Boyle 5822a6599c skills: add GPU/A2A reference skill documents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 11:12:47 -04:00

name: ref_coalesced_views
description: Grid coalescedRead/coalescedWrite and autoView for GPU-portable field access inside accelerator_for
metadata:
  node_type: memory
  type: reference
  originSessionId: 956e80aa-401d-481a-80bb-17f8abe1c131

View access modes

autoView(v, field, AcceleratorRead);   // read-only, device-accessible
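autoView(v, field, AcceleratorWrite);  // write access, device-accessible (existing contents stay valid, so read-modify-write is possible)
autoView(v, field, AcceleratorWriteDiscard); // write-only, device-accessible (existing contents need not be copied to the device)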
autoView(v, field, CpuRead);           // CPU only (avoids GPU migration)
autoView(v, field, CpuWrite);          // CPU only

Open views on the host before accelerator_for and let them close (go out of scope) after the loop has finished. Never open a view inside the accelerator_for body: opening a view is a host-side operation.
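
The scoping rule above can be pictured with a small stand-alone model. This is only a sketch: ToyField, ToyView and run_with_view are hypothetical names invented for illustration and do not reproduce Grid's autoView macro or its view classes. It shows a guard object that opens a view when constructed and closes it when it leaves scope, with the kernel body sitting between the two.

```cpp
#include <cassert>

// Toy stand-in for a lattice field: it only counts how many views are open.
// (Hypothetical type, not Grid's Lattice.)
struct ToyField {
  int open_views = 0;
};

// Toy view guard: opens on construction, closes on destruction.
struct ToyView {
  ToyField &f;
  explicit ToyView(ToyField &field) : f(field) { ++f.open_views; }
  ~ToyView() { --f.open_views; }
  ToyView(const ToyView &) = delete;
  ToyView &operator=(const ToyView &) = delete;
};

// Returns the number of views that were open while the "kernel" would run.
int run_with_view(ToyField &field) {
  assert(field.open_views == 0);  // nothing is open before the block
  int during = 0;
  {
    ToyView v(field);             // open the view before the loop
    during = field.open_views;    // the kernel body would run here
  }                               // the view closes when v leaves scope
  return during;
}
```

Calling run_with_view on a fresh ToyField reports one open view during the block and leaves none open afterwards, which is the lifetime the rule asks for.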

coalescedRead / coalescedWrite

Inside accelerator_for(ss, oSites, Nsimd, { ... }):

auto site = coalescedRead(v[ss]);          // reads SIMT lane; returns scalar_object on GPU, vobj on CPU
coalescedWrite(v[ss], site);               // writes SIMT lane
  • On GPU, coalescedRead(v[ss]) returns extractLane(lane, v[ss]): each SIMT thread reads one lane, and the lanes read by neighbouring threads are contiguous in memory, so the access is coalesced. The view's v(ss) is shorthand for the same read.
  • On CPU it returns the full vobj (no lane extraction needed; handled transparently)
  • The returned type is decltype(coalescedRead(v[ss])), which is the scalar_object on GPU and the vobj on CPU; use auto rather than spelling it out
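
To make the lane picture concrete, here is a minimal stand-alone sketch of extracting and inserting a single lane. Everything in it is an assumption made for illustration: ToyVobj, ToyScalarObj, toyExtractLane, toyInsertLane, laneOffset and the values of Nsimd and Ncomp are hypothetical and are not Grid's types or functions. The model stores the Nsimd lanes of each component next to each other, so threads with consecutive lane numbers touch consecutive addresses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t Nsimd = 4;  // lanes per vector word (illustrative value)
constexpr std::size_t Ncomp = 3;  // components per site object (illustrative value)

// Toy vector object: Ncomp components, each holding Nsimd contiguous lanes.
struct ToyVobj {
  std::array<std::array<double, Nsimd>, Ncomp> c;
};

// Toy scalar object: one lane's worth of every component.
struct ToyScalarObj {
  std::array<double, Ncomp> c;
};

// Pull a single lane out of a vector object (models the GPU read path).
ToyScalarObj toyExtractLane(std::size_t lane, const ToyVobj &v) {
  assert(lane < Nsimd);
  ToyScalarObj s{};
  for (std::size_t i = 0; i < Ncomp; ++i) s.c[i] = v.c[i][lane];
  return s;
}

// Put a single lane back into a vector object (models the GPU write path).
void toyInsertLane(std::size_t lane, ToyVobj &v, const ToyScalarObj &s) {
  assert(lane < Nsimd);
  for (std::size_t i = 0; i < Ncomp; ++i) v.c[i][lane] = s.c[i];
}

// Flat offset, in doubles, of (component, lane) inside one ToyVobj.
std::size_t laneOffset(std::size_t comp, std::size_t lane) {
  return comp * Nsimd + lane;
}
```

In this model, laneOffset(c, l + 1) - laneOffset(c, l) is always 1, which is the stride-one access pattern that coalescing relies on.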

Typical kernel pattern

autoView(out_v, out, AcceleratorWrite);
autoView(in_v,  in,  AcceleratorRead);
accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
  auto x = coalescedRead(in_v[ss]);
  // modify x ...
  coalescedWrite(out_v[ss], x);
});
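
For readers without a GPU build to hand, the read, modify, write shape of this pattern can be emulated on the host. The sketch below is a plain C++ stand-in under stated assumptions: toyScaleKernel, ToyVec and kLanes are hypothetical names, the outer loop plays the role of the site index ss, and the inner loop plays the role of the SIMT lane that accelerator_for would hand to each thread. It does not use Grid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;           // illustrative SIMD width
using ToyVec = std::array<double, kLanes>;  // toy vector object with one component

// Host emulation of the kernel pattern. Raw pointers stand in for the
// views, which are what the kernel body would capture.
void toyScaleKernel(ToyVec *out, const ToyVec *in, std::size_t osites,
                    double factor) {
  assert(out != nullptr && in != nullptr);
  for (std::size_t ss = 0; ss < osites; ++ss) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      double x = in[ss][lane];  // stands in for coalescedRead(in_v[ss])
      x *= factor;              // "modify x"
      out[ss][lane] = x;        // stands in for coalescedWrite(out_v[ss], x)
    }
  }
}
```

Each (ss, lane) pair reads one value, modifies it, and writes it back to the matching slot, so no iteration touches data owned by another lane.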

Free function kernel signature

template<class vobj>
void MyKernel(Lattice<vobj> &out, const Lattice<vobj> &in)
{
  GridBase *grid = in.Grid();
  autoView(out_v, out, AcceleratorWrite);
  autoView(in_v,  in,  AcceleratorRead);
  accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
    auto x = coalescedRead(in_v[ss]);
    coalescedWrite(out_v[ss], x);
  });
}

What NOT to do

  • Do not access std::vector elements inside accelerator_for: their storage is not device-accessible
  • Do not use CpuRead/CpuWrite views inside accelerator_for: the GPU will fault
  • Do not assign to v[ss] directly inside accelerator_for; use coalescedWrite, which on GPU writes only the calling thread's lane
  • Do not open multiple write views on the same field simultaneously

See also: ref_accelerator_for, ref_lattice_vs_vector