Grid/Grid/lattice at ad9d03fd8536a42692acc176a0aa3d856fe8b89e - Grid - DiRAC Tursa git server

portelli/Grid

mirror of https://github.com/paboyle/Grid.git synced 2026-05-20 00:54:30 +01:00

Peter Boyle 66b529b345 sumD_gpu_reduce_words: fuse pack+reduce into single packReduceKernel

Replace the two-kernel pack+reduce sequence with a single fused kernel
packReduceKernel<R> that reads R words of each vobj at offset 'base'
and accumulates directly into iVector<iScalar<scalarD>,R>, eliminating
the intermediate bundle buffer entirely.

HBM access per word-group drops from 3x (pack-read + pack-write +
reduce-read) to 1x.  Thread count comes from getNumBlocksAndThreads
(warpSize..256) rather than acceleratorThreads(), so occupancy is
correct regardless of the --accelerator-threads setting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-19 09:46:43 -04:00
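The fusion described in this commit message can be illustrated with a small host-side sketch. This is not Grid's kernel and runs no GPU code. `SiteObject`, `packThenReduce` and `fusedPackReduce` are hypothetical names, and the single-precision site storage is an assumption made for illustration. It only shows why dropping the intermediate bundle buffer cuts memory traffic from three passes to one while giving the same double-precision sums.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lattice site object (vobj): Words floats per site.
template <std::size_t Words>
using SiteObject = std::array<float, Words>;

// Two-pass version: pack R words per site into a temporary bundle buffer,
// then reduce the buffer. Memory traffic per word-group: read + write + read.
template <std::size_t R, std::size_t Words>
std::array<double, R> packThenReduce(const std::vector<SiteObject<Words>>& sites,
                                     std::size_t base) {
    assert(base + R <= Words);
    std::vector<std::array<float, R>> bundle(sites.size());  // intermediate buffer
    for (std::size_t s = 0; s < sites.size(); ++s)
        for (std::size_t w = 0; w < R; ++w)
            bundle[s][w] = sites[s][base + w];                // pack-read + pack-write
    std::array<double, R> sum{};
    for (const auto& b : bundle)
        for (std::size_t w = 0; w < R; ++w)
            sum[w] += static_cast<double>(b[w]);              // reduce-read
    return sum;
}

// Fused version: read the R words at offset base once and accumulate them
// directly in double precision. No intermediate buffer is allocated.
template <std::size_t R, std::size_t Words>
std::array<double, R> fusedPackReduce(const std::vector<SiteObject<Words>>& sites,
                                      std::size_t base) {
    assert(base + R <= Words);
    std::array<double, R> sum{};
    for (const auto& site : sites)
        for (std::size_t w = 0; w < R; ++w)
            sum[w] += static_cast<double>(site[base + w]);    // single read
    return sum;
}
```

Calling `fusedPackReduce<2>(sites, 1)` on a vector of `SiteObject<4>` sums words 1 and 2 of every site. On a GPU the outer loop would be spread across threads and followed by a block-level reduction, which this sketch leaves out.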

Lattice_arith.h

Fast axpy norm under CFLAG

2024-10-11 03:23:09 +00:00

Lattice_base.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_basis.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_comparison_utils.h

GPU reductions first cut; use thrust, non-reproducible. Inclusive scan can fix this if desired.

2019-01-01 13:53:37 +00:00

Lattice_comparison.h

Remove dead commented out code

2020-08-31 23:40:29 -04:00

Lattice_conformable.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_coordinate.h

Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.

2020-06-05 18:52:35 -04:00

Lattice_crc.h

Merge branch 'develop' into feature/scidac-wp1

2024-03-06 14:55:21 -05:00

Lattice_ET.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_local.h

Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.

2020-06-05 18:52:35 -04:00

Lattice_matrix_reduction.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_peekpoke.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_real_imag.h

real and imag part not in ET

2020-08-31 23:56:26 -04:00

Lattice_reality.h

happy compile

2020-10-14 22:59:41 -04:00

Lattice_reduction_gpu.h

sumD_gpu_reduce_words: fuse pack+reduce into single packReduceKernel

2026-05-19 09:46:43 -04:00

Lattice_reduction_sycl.h

Lattice_reduction_sycl: fix double-precision accumulation in sumD_gpu_tensor

2026-05-18 21:53:40 -04:00

Lattice_reduction.h

Revert to hand-rolled reduction; drop Lattice_reduction_gpu_cub.h

2026-05-18 21:52:18 -04:00

Lattice_rng.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00

Lattice_slicesum_core.h

No compile fix

2025-04-04 18:35:05 -04:00

Lattice_trace.h

Merge remote-tracking branch 'LupoA/develop' into LupoA-develop

2023-10-02 16:22:35 -04:00

Lattice_transfer.h

Missed one

2025-08-14 20:25:54 +00:00

Lattice_transpose.h

Merge branch 'develop' into sycl

2020-06-09 04:00:12 -04:00

Lattice_unary.h

Global edit with change to View usage. autoView() creates a wrapper object that closes the view when scope closes.

2020-06-05 18:52:35 -04:00

Lattice_view.h

Updated to compile and run fast on CUDA

2025-08-10 00:00:13 +01:00

Lattice_where.h

Update thread issue

2021-03-12 14:55:07 +01:00

Lattice.h

Hack for flight logging CG inner products.

2024-03-05 23:59:57 +00:00

PaddedCell.h

Assertion updates to macros (mostly) with backtrace.

2025-08-07 15:48:38 +00:00