mirror of
https://github.com/paboyle/Grid.git
synced 2026-06-04 19:24:36 +01:00
5822a6599c
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2.4 KiB
| name | description | metadata |
|---|---|---|
| ref_coalesced_views | Grid coalescedRead/coalescedWrite and autoView: GPU-portable field access inside accelerator_for | |
View access modes
autoView(v, field, AcceleratorRead); // read-only, device-accessible
autoView(v, field, AcceleratorWrite); // write-only, device-accessible
autoView(v, field, AcceleratorReadWrite); // read-write, device-accessible
autoView(v, field, CpuRead); // CPU only (avoids GPU migration)
autoView(v, field, CpuWrite); // CPU only
Views must be opened before accelerator_for and closed (go out of scope) after. Never open a view inside the accelerator_for body.
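The scoping rule above can be illustrated without Grid itself. The following is a minimal sketch, assuming a view behaves like an RAII handle that is opened on construction and closed when it leaves scope; ToyField, ToyView and fillWithIndex are hypothetical names invented for this illustration and are not part of Grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy field that counts how many views are currently open on it.
// ToyField / ToyView are hypothetical stand-ins, not Grid classes.
struct ToyField {
  std::vector<double> data;
  int openViews = 0;
  explicit ToyField(std::size_t n) : data(n, 0.0) {}
};

// RAII view: "opens" on construction, "closes" when it goes out of scope.
class ToyView {
 public:
  explicit ToyView(ToyField &f) : field_(f) { ++field_.openViews; }
  ~ToyView() { --field_.openViews; }
  ToyView(const ToyView &) = delete;
  ToyView &operator=(const ToyView &) = delete;
  double &operator[](std::size_t i) { return field_.data[i]; }
  std::size_t size() const { return field_.data.size(); }

 private:
  ToyField &field_;
};

// Open the view before the loop, run the loop, and let the closing
// brace of the enclosing block close the view.
inline void fillWithIndex(ToyField &f) {
  {
    ToyView v(f);  // opened before the "kernel"
    for (std::size_t ss = 0; ss < v.size(); ++ss) {
      v[ss] = static_cast<double>(ss);  // body only uses the already-open view
    }
  }  // view closed here
  assert(f.openViews == 0);  // no view is left open after the block
}
```

Wrapping the views and the loop in one brace-delimited block keeps the open/close points explicit, which helps when the same field is accessed again in a different mode later in the function.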
coalescedRead / coalescedWrite
Inside accelerator_for(ss, oSites, Nsimd, { ... }):
auto site = coalescedRead(v[ss]); // reads SIMT lane; returns scalar_object on GPU, vobj on CPU
coalescedWrite(v[ss], site); // writes SIMT lane
- coalescedRead(v[ss]) calls v.operator()(ss), which on GPU returns extractLane(lane, v[ss]): one lane per SIMT thread, contiguous across threads, so the accesses are coalesced
- On CPU it returns the full vobj (no lane extraction needed; handled transparently)
- The returned type is decltype(coalescedRead(v[ss])); use auto or match with scalar_object
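The lane-per-thread idea in the bullets above can be modelled on a plain CPU. This is a toy sketch only: ToyVec, toyExtractLane, toyInsertLane and toyScale are hypothetical names made up for the illustration, and the inner loop over lanes stands in for adjacent SIMT threads. It is not Grid's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a SIMD-vectorised site object: N scalars packed together.
// All names here are illustrative, not Grid's API.
template <std::size_t N>
struct ToyVec {
  static constexpr std::size_t Nsimd = N;
  std::array<double, N> lanes{};
};

// "GPU-style" read: each thread pulls out only its own lane.
template <std::size_t N>
double toyExtractLane(std::size_t lane, const ToyVec<N> &v) {
  assert(lane < N);
  return v.lanes[lane];
}

// "GPU-style" write: each thread stores only its own lane.
template <std::size_t N>
void toyInsertLane(std::size_t lane, ToyVec<N> &v, double s) {
  assert(lane < N);
  v.lanes[lane] = s;
}

// Serial emulation of a kernel over outer sites: the loop over `lane`
// plays the role of neighbouring SIMT threads touching neighbouring scalars.
template <std::size_t N>
void toyScale(std::vector<ToyVec<N>> &out, const std::vector<ToyVec<N>> &in,
              double a) {
  assert(out.size() == in.size());
  for (std::size_t ss = 0; ss < in.size(); ++ss) {
    for (std::size_t lane = 0; lane < N; ++lane) {
      double x = toyExtractLane(lane, in[ss]);  // analogue of coalescedRead
      toyInsertLane(lane, out[ss], a * x);      // analogue of coalescedWrite
    }
  }
}
```

Because lanes of one site sit next to each other in memory, threads with consecutive lane indices read consecutive addresses, which is the access pattern the bullets describe as coalesced.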
Typical kernel pattern
autoView(out_v, out, AcceleratorWrite);
autoView(in_v, in, AcceleratorRead);
accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
auto x = coalescedRead(in_v[ss]);
// modify x ...
coalescedWrite(out_v[ss], x);
});
Free function kernel signature
template<class vobj>
void MyKernel(Lattice<vobj> &out, const Lattice<vobj> &in)
{
GridBase *grid = in.Grid();
autoView(out_v, out, AcceleratorWrite);
autoView(in_v, in, AcceleratorRead);
accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
auto x = coalescedRead(in_v[ss]);
coalescedWrite(out_v[ss], x);
});
}
What NOT to do
- Do not access std::vector elements inside accelerator_for: they are not device-accessible
- Do not use CpuRead/CpuWrite views inside accelerator_for: the GPU will fault
- Do not assign to v[ss] directly inside accelerator_for; use coalescedWrite
- Do not open multiple write views on the same field simultaneously