portelli/Grid - Grid - DiRAC Tursa git server

portelli/Grid

Mirror of https://github.com/paboyle/Grid.git, last synced 2026-06-30 07:23:29 +01:00

Commit Graph

Author

SHA1

Message

Date

Peter Boyle

5c4574f9aa

skills: add gpu-memory-performance.md

Documents the acceleratorThreads() default=2 trap, LambdaApply thread
mapping, coalescedRead/Write idiom, when to use __global__ vs
accelerator_for, and fused vs staged HBM access patterns.

Includes observed MI250X numbers from LatticePropagatorD reduction
(50 → 297 → 546 GB/s progression).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-21 12:34:30 -04:00

Peter Boyle

069f98b253

skills: HPC battle-hardening skill files for GPU+MPI correctness

Six skill files encoding expertise for making codebases robust on
problematic HPC systems, covering: correctness verification
(double-run, fingerprinting, flight recorder), hang diagnosis,
GPU runtime correctness (premature barrier, infinite poll),
MPI correctness on heterogeneous systems (device buffer aliasing,
AARCH64 PLT corruption, deterministic reductions),
compiler validation, and communication/computation overlap pipeline
design.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-21 12:34:30 -04:00

2 Commits
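
Commit 069f98b253 lists "double-run" and "fingerprinting" among its correctness-verification topics, but the skill files themselves are not shown here. As a rough illustration of what such a check could look like, the sketch below hashes the raw bytes of a result buffer and compares two runs of the same computation. The names `fingerprint` and `double_run_matches`, the choice of FNV-1a, and the use of `std::vector<double>` are all assumptions made for this example; they are not taken from Grid or from the skill files.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a hash over the raw bytes of a buffer.
// Any bitwise difference between two equally sized buffers that is confined
// to a single byte is guaranteed to change the hash; larger differences are
// detected with overwhelming probability.
inline std::uint64_t fingerprint(const std::vector<double>& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(data.data());
    const std::size_t nbytes = data.size() * sizeof(double);
    for (std::size_t i = 0; i < nbytes; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical helper: run the same computation twice and report whether
// the two results are bitwise identical according to their fingerprints.
inline bool double_run_matches(
    const std::function<std::vector<double>()>& compute) {
    const std::uint64_t first = fingerprint(compute());
    const std::uint64_t second = fingerprint(compute());
    return first == second;
}
```

A bitwise comparison like this is only meaningful if the computation is expected to be deterministic, which is presumably why the same commit also mentions deterministic reductions.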