Mirror of https://github.com/paboyle/Grid.git, synced 2026-05-27 12:34:16 +01:00
5c4574f9aa
Documents the acceleratorThreads() default=2 trap, the LambdaApply thread mapping, the coalescedRead/coalescedWrite idiom, when to use __global__ versus accelerator_for, and fused versus staged HBM access patterns. Includes observed MI250X bandwidth numbers for a LatticePropagatorD reduction (50 → 297 → 546 GB/s progression).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>