mirror of https://github.com/paboyle/Grid.git synced 2026-06-04 19:24:36 +01:00
Grid/skills/ref_coalesced_views.md
Peter Boyle 5822a6599c skills: add GPU/A2A reference skill documents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 11:12:47 -04:00

---
name: ref_coalesced_views
description: Grid coalescedRead/coalescedWrite and autoView — GPU-portable field access inside accelerator_for
metadata:
node_type: memory
type: reference
originSessionId: 956e80aa-401d-481a-80bb-17f8abe1c131
---
## View access modes
```cpp
autoView(v, field, AcceleratorRead); // read-only, device-accessible
autoView(v, field, AcceleratorWrite); // write-only, device-accessible
autoView(v, field, AcceleratorReadWrite); // read-write, device-accessible
autoView(v, field, CpuRead); // CPU only (avoids GPU migration)
autoView(v, field, CpuWrite); // CPU only
```
Open views **before** `accelerator_for` and let them close (go out of scope) **after** it. Never open a view inside the `accelerator_for` body.
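The scoping rule can be illustrated with a toy model in plain C++. This is only a sketch: `ToyField`, `ToyView` and `scaleInPlace` are hypothetical stand-ins invented here, not Grid classes, and an ordinary host loop takes the place of `accelerator_for`. It assumes the view behaves like an RAII handle that is released when it leaves scope.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lattice field; counts how many views are open.
struct ToyField {
    std::vector<double> data;
    int open_views = 0;
};

// Hypothetical RAII view: opened by the constructor, closed by the destructor.
class ToyView {
public:
    explicit ToyView(ToyField &f) : field_(f) { ++field_.open_views; }
    ~ToyView() { --field_.open_views; }
    ToyView(const ToyView &) = delete;
    ToyView &operator=(const ToyView &) = delete;
    double &operator[](std::size_t i) { return field_.data[i]; }
private:
    ToyField &field_;
};

// Opens the view before the loop and lets scope exit close it afterwards.
// Returns the number of open views observed while the loop body was live.
inline int scaleInPlace(ToyField &f, double a) {
    int seen = 0;
    {
        ToyView v(f);                                   // open before the loop
        for (std::size_t i = 0; i < f.data.size(); ++i) {
            v[i] *= a;                                  // stands in for the kernel body
        }
        seen = f.open_views;
    }                                                   // view closes here
    return seen;
}
```

Wrapping the view and the loop in one brace-delimited block keeps the view's lifetime no longer than the kernel that uses it.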
## coalescedRead / coalescedWrite
Inside `accelerator_for(ss, oSites, Nsimd, { ... })`:
```cpp
auto site = coalescedRead(v[ss]); // reads SIMT lane; returns scalar_object on GPU, vobj on CPU
coalescedWrite(v[ss], site); // writes SIMT lane
```
- On GPU, `coalescedRead(v[ss])` returns `extractLane(lane, v[ss])`: one lane per SIMT thread, with adjacent threads reading adjacent lanes, so the access is coalesced. The view's `v(ss)` (`operator()`) is shorthand for the same read.
- On CPU it returns the full `vobj`; no lane extraction is needed, and the same kernel source works for both targets.
- The returned type is `decltype(coalescedRead(v[ss]))`, which is `scalar_object` on GPU and `vobj` on CPU, so declare the result with `auto`.
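To make the two behaviours above concrete, here is a small host-only sketch. The names `ToyVec`, `toyExtractLane`, `toyInsertLane`, `addPerLane` and the lane count `kNsimd` are all hypothetical and are not Grid's implementation. The per-lane path mimics what each SIMT thread does on GPU, while `operator+` mimics the whole-object CPU path; both should give the same result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNsimd = 4;   // hypothetical lane count

// Toy stand-in for a vectorised object: one float per SIMD lane.
struct ToyVec {
    std::array<float, kNsimd> lane{};
};

// CPU-like path: arithmetic acts on the whole object at once.
inline ToyVec operator+(const ToyVec &a, const ToyVec &b) {
    ToyVec r;
    for (std::size_t l = 0; l < kNsimd; ++l) r.lane[l] = a.lane[l] + b.lane[l];
    return r;
}

// GPU-like path: each thread reads and writes a single lane as a scalar.
inline float toyExtractLane(std::size_t l, const ToyVec &v) { return v.lane[l]; }
inline void toyInsertLane(std::size_t l, ToyVec &v, float s) { v.lane[l] = s; }

// Emulates kNsimd SIMT threads working on one outer site.
inline ToyVec addPerLane(const ToyVec &a, const ToyVec &b) {
    ToyVec out;
    for (std::size_t l = 0; l < kNsimd; ++l) {                    // one iteration per "thread"
        float x = toyExtractLane(l, a) + toyExtractLane(l, b);    // read analogue
        toyInsertLane(l, out, x);                                 // write analogue
    }
    return out;
}
```

Because kernel code written with `auto` never names the type it gets back, the same body can operate on a scalar per thread or on a full vector object.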
## Typical kernel pattern
```cpp
autoView(out_v, out, AcceleratorWrite);
autoView(in_v, in, AcceleratorRead);
accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
  auto x = coalescedRead(in_v[ss]);
  // modify x ...
  coalescedWrite(out_v[ss], x);
});
```
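The index space this pattern runs over can be emulated on the host. The sketch below is an assumption-laden toy, not Grid code: it uses a flat `std::vector<float>` with a hypothetical layout in which lane `lane` of outer site `ss` sits at offset `ss * kNsimd + lane`, and two nested loops stand in for the (site, lane) threads. The helper names `toyOffset` and `toyDoubleKernel` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNsimd = 4;   // hypothetical SIMD width

// Flat offset of (outer site, lane): lanes of one site are adjacent in memory,
// so neighbouring "threads" touch neighbouring words.
inline std::size_t toyOffset(std::size_t ss, std::size_t lane) {
    return ss * kNsimd + lane;
}

// Host emulation of the read / modify / write pattern: out = 2 * in.
inline void toyDoubleKernel(std::vector<float> &out, const std::vector<float> &in) {
    assert(out.size() == in.size() && in.size() % kNsimd == 0);
    const std::size_t oSites = in.size() / kNsimd;
    for (std::size_t ss = 0; ss < oSites; ++ss) {              // outer-site index
        for (std::size_t lane = 0; lane < kNsimd; ++lane) {    // one SIMT thread each
            float x = in[toyOffset(ss, lane)];                 // read analogue
            x *= 2.0f;                                         // modify x
            out[toyOffset(ss, lane)] = x;                      // write analogue
        }
    }
}
```

On a real device the inner loop disappears: each thread gets its own lane index, which is why the kernel body only ever mentions `ss`.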
## Free function kernel signature
```cpp
template<class vobj>
void MyKernel(Lattice<vobj> &out, const Lattice<vobj> &in)
{
  GridBase *grid = in.Grid();
  autoView(out_v, out, AcceleratorWrite);
  autoView(in_v, in, AcceleratorRead);
  accelerator_for(ss, grid->oSites(), vobj::Nsimd(), {
    auto x = coalescedRead(in_v[ss]);
    coalescedWrite(out_v[ss], x);
  });
}
```
## What NOT to do
- Do not access `std::vector` elements inside `accelerator_for` — not device-accessible
- Do not use `CpuRead`/`CpuWrite` views inside `accelerator_for` — GPU will fault
- Do not assign to `v[ss]` directly inside `accelerator_for` — use `coalescedWrite`
- Do not open multiple write views on the same field simultaneously
## Related
[[ref_accelerator_for]] [[ref_lattice_vs_vector]]