mirror of https://github.com/paboyle/Grid.git (synced 2024-11-09 23:45:36 +00:00)
TODO updates
parent 78ebd93281
commit c0d89a2dbb
50
TODO
@@ -1,66 +1,58 @@
- Lattice_arith - are the mult, mac etc. still needed after ET engine?
- LinalgUtils ssp loop not offloaded
- Mobius/Domain EOFA cache header implementation has thread_loop
- ImprovedStaggered accelerate
- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code
Lattice_rng
Lattice_transfer.h

- Stencil.h : Thread loops in exchange code. Need to offload these

- Lebesgue order reintroduction. StencilView should have pointer

- accelerate A2Autils
- accelerate A2Autils -- off critical path for HMC
- Lebesgue order reintroduction. StencilView should have pointer to it

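Note on the Lebesgue order item: a minimal sketch, assuming the ordering meant here is a Z-order (Morton) style space-filling curve. The helper names spread_bits and morton2d are hypothetical and are not Grid's API; the sketch only shows how a 2D site coordinate could be mapped to a curve index, from which a reordering table might be built.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Grid): take the low 16 bits of x and
// spread them out so that a zero bit sits between each original bit.
inline std::uint32_t spread_bits(std::uint32_t x) {
  x &= 0x0000FFFFu;
  x = (x | (x << 8)) & 0x00FF00FFu;
  x = (x | (x << 4)) & 0x0F0F0F0Fu;
  x = (x | (x << 2)) & 0x33333333u;
  x = (x | (x << 1)) & 0x55555555u;
  return x;
}

// Hypothetical helper: interleave the bits of x (even positions) and
// y (odd positions) to get the Z-order index of the site (x, y).
inline std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
  return spread_bits(x) | (spread_bits(y) << 1);
}
```

Sites that are close in (x, y) tend to get nearby indices, which is the usual motivation for visiting sites in this order.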
GPU branch code item work list
-----------------------------

7) Accelerate the cshift

7) Accelerate the cshift & benchmark
* 0) Single GPU
- 128 bit integer table load in GPU code.
- coalescedRead <- threadIdx.x
- Gianluca's changes to Cayley into gpu-port
- GPU accelerate EOFA
- Staggered kernels -> GPU coalesced loop
- Staggered kernels -> GPU coalesced loop, loop in kernels
- Staggered kernels inline for GPU -- DONE


* 2) 5D terms & Gianluca
* Gianluca merger
- Cayley coefficients -> GPU retention or prefetch
- Mobius kernel fusion. -- Gianluca?
- Make GPU offload reductions optionally deterministic -- Gianluca
- Gianluca's changes to Cayley into gpu-port
- Mobius kernel fusion. -- Gianluca?
- Make GPU offload reductions deterministic -- Gianluca merge
- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code

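Note on the deterministic reductions item: floating-point addition is not associative, so a reduction whose summation order depends on thread scheduling can give slightly different results from run to run. A minimal sketch of one common remedy, a fixed pairwise tree, is given here in plain C++. The function name deterministic_sum is hypothetical, and this is not the scheme used in Grid or in Gianluca's changes, which this list does not describe.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not Grid's API): sum the input with a fixed pairwise
// tree, so the order of additions depends only on the element indices.
double deterministic_sum(const std::vector<double>& data) {
  if (data.empty()) return 0.0;
  std::vector<double> work(data);  // scratch copy; the input is left untouched
  const std::size_t n = work.size();
  for (std::size_t stride = 1; stride < n; stride *= 2) {
    // Iterations of this inner loop read and write disjoint elements, so they
    // could run in parallel without changing which operands are added.
    for (std::size_t i = 0; i + stride < n; i += 2 * stride) {
      work[i] += work[i + stride];
    }
  }
  return work[0];
}
```

Because the pairing is fixed by index, repeated calls on the same input give bitwise identical results, at the cost of a scratch buffer and log2(n) passes.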
* 3) Comms/NVlink
- OpenMP tasks to run comms threads.
- OpenMP tasks to run comms threads. Experiment with it
- Remove explicit OpenMP in staggered.
- Single parallel region around both the Kernel call
and the comms.
- Single parallel region around both the Kernel call and the comms.
- Fix the halo exchange SIMT loop
- Stencil gather
- Stencil gather ??
- SIMD dirs in stencil

* 4) ET enhancements
- eval -> scalar ops in ET engine
- coalescedRead, coalescedWrite in expressions.
- coalescedRead, coalescedWrite in expressions.

* 5) Misc

- Conserved current clean up.
- multLinkProp eliminate


8) Merge develop and test HMC

9) Gamma tables on GPU; check this.

9) Gamma tables on GPU; check this. Appear to work, but no idea why. Are these done on CPU?
10) Audit
- pragma once uniformly
- Audit NAMESPACE CHANGES
- Audit changes


=============================================================================================
- GPU accelerate EOFA -- DONE
- LinalgUtils ssp loop not offloaded -- DONE
- coalescedRead <- threadIdx.x -- DONE
- Stencil.h : Thread loops in exchange code. Need to offload these -- DONE ; pending debug
- Mobius/Domain EOFA cache header implementation has thread_loop -- DONE ; pending test
- Differentiate non-temporal coalescedWrite from temporal -- DONE

- Clean up PRAGMAS, and SIMT_loop -- DONE
thread_loop interface revisit.
_foreach