mirror of https://github.com/paboyle/Grid.git synced 2024-11-13 01:05:36 +00:00

TODO updates

Peter Boyle 2019-07-12 17:11:15 +01:00
parent 78ebd93281
commit c0d89a2dbb

TODO

@@ -1,66 +1,58 @@
- Lattice_arith - are the mult, mac etc.. still needed after ET engine? - Lattice_arith - are the mult, mac etc.. still needed after ET engine?
- LinalgUtils ssp loop not offloaded
- Mobius/Domain EOFA cache header implementation has thread_loop
- ImprovedStaggered accelerate - ImprovedStaggered accelerate
- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code
Lattice_rng Lattice_rng
Lattice_transfer.h Lattice_transfer.h
- Stencil.h : Thread loops in exchange code. Need to offload these | - accelerate A2Autils -- off critical path for HMC
- Lebesgue order reintroduction. StencilView should have pointer to it
- Lebesgue order reintroduction. StencilView should have pointer
- accelerate A2Autils
GPU branch code item work list GPU branch code item work list
----------------------------- -----------------------------
7) Accelerate the cshift | 7) Accelerate the cshift & benchmark
* 0) Single GPU * 0) Single GPU
- 128 bit integer table load in GPU code. - 128 bit integer table load in GPU code.
- coalescedRead <- threadIdx.x | - Staggered kernels -> GPU coalesced loop, loop in kernels
- Gianluca's changes to Cayley into gpu-port
- GPU accelerate EOFA
- Staggered kernels -> GPU coalesced loop
- Staggered kernels inline for GPU -- DONE - Staggered kernels inline for GPU -- DONE
* Gianluca merger
* 2) 5D terms & Gianluca
- Cayley coefficients -> GPU retention or prefetch - Cayley coefficients -> GPU retention or prefetch
- Mobius kernel fusion. -- Gianluca? | - Gianluca's changes to Cayley into gpu-port
- Make GPU offload reductions optionally deterministic -- Gianluca | - Mobius kernel fusion. -- Gianluca?
- Make GPU offload reductions deterministic -- Gianluca merge
- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code
* 3) Comms/NVlink * 3) Comms/NVlink
- OpenMP tasks to run comms threads. | - OpenMP tasks to run comms threads. Experiment with it
- Remove explicit OpenMP in staggered. - Remove explicit OpenMP in staggered.
- Single parallel region around both the Kernel call | - Single parallel region around both the Kernel call and the comms.
and the comms.
- Fix the halo exchange SIMT loop - Fix the halo exchange SIMT loop
- Stencil gather | - Stencil gather ??
- SIMD dirs in stencil - SIMD dirs in stencil
* 4) ET enhancements * 4) ET enhancements
- eval -> scalar ops in ET engine - eval -> scalar ops in ET engine
- coalescedRead, coalescedWrite in expressions. - coalescedRead, coalescedWrite in expressions.
* 5) Misc * 5) Misc
- Conserved current clean up. - Conserved current clean up.
- multLinkProp eliminate - multLinkProp eliminate
8) Merge develop and test HMC 8) Merge develop and test HMC
9) Gamma tables on GPU; check this. Appear to work, but no idea why. Are these done on CPU?
9) Gamma tables on GPU; check this.
10) Audit 10) Audit
- pragma once uniformly - pragma once uniformly
- Audit NAMESPACE CHANGES - Audit NAMESPACE CHANGES
- Audit changes - Audit changes
============================================================================================= =============================================================================================
- GPU accelerate EOFA -- DONE
- LinalgUtils ssp loop not offloaded -- DONE
- coalescedRead <- threadIdx.x -- DONE
- Stencil.h : Thread loops in exchange code. Need to offload these -- DONE ; pending debug
- Mobius/Domain EOFA cache header implementation has thread_loop -- DONE ; pending test
- Differentiate non-temporal coalescedWrite from temporal -- DONE
- Clean up PRAGMAS, and SIMT_loop -- DONE - Clean up PRAGMAS, and SIMT_loop -- DONE
thread_loop interface revisit. thread_loop interface revisit.
_foreach _foreach