mirror of https://github.com/paboyle/Grid.git synced 2026-06-18 18:03:44 +01:00
Grid/Grid
Peter Boyle 747c167658 sumD_gpu_direct: one thread per SIMD lane using extractLane
Replaces one thread per outer site calling Reduce() (sequential Nsimd-wide
loop) with one thread per lane calling extractLane() — O(1) per thread.
CUB now reduces over osites*Nsimd elements. Avoids serial lane reduction
but leaves the per-lane sobjD store stride as a known remaining concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 16:21:50 -04:00
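
The change of thread mapping described in the commit message can be sketched on the CPU as below. This is an illustrative model only, not Grid's code. `VecObj`, `extract_lane`, `sum_per_site`, `sum_per_lane`, and the fixed `Nsimd = 4` are hypothetical stand-ins. The index split `t / Nsimd`, `t % Nsimd` is an assumed layout. `std::accumulate` stands in for the CUB device reduction.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a SIMD-vectorised site object with Nsimd lanes.
constexpr std::size_t Nsimd = 4;
using VecObj = std::array<double, Nsimd>;

// Hypothetical analogue of extractLane(): reads one lane in O(1).
inline double extract_lane(std::size_t lane, const VecObj &v) { return v[lane]; }

// Old mapping: one work item per outer site, each running a serial
// Nsimd-wide loop before the final reduction over osites partial sums.
double sum_per_site(const std::vector<VecObj> &field) {
  std::vector<double> partial(field.size());
  for (std::size_t os = 0; os < field.size(); ++os) {  // one "thread" per outer site
    double acc = 0.0;
    for (std::size_t l = 0; l < Nsimd; ++l) acc += field[os][l];  // serial lane loop
    partial[os] = acc;
  }
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// New mapping: one work item per (outer site, lane) pair, so each does
// O(1) work and the final reduction runs over osites * Nsimd scalars.
double sum_per_lane(const std::vector<VecObj> &field) {
  const std::size_t n = field.size() * Nsimd;
  std::vector<double> scalars(n);
  for (std::size_t t = 0; t < n; ++t) {  // one "thread" per lane
    const std::size_t os = t / Nsimd;    // assumed layout: lanes of a site are adjacent
    const std::size_t lane = t % Nsimd;
    scalars[t] = extract_lane(lane, field[os]);
  }
  // Stand-in for the device-wide reduction over all extracted scalars.
  return std::accumulate(scalars.begin(), scalars.end(), 0.0);
}
```

Both functions return the same sum. The second trades the per-thread serial loop for a reduction input that is Nsimd times longer, and the scattered writes into `scalars` loosely correspond to the store-stride concern the commit message leaves open.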