Mirror of https://github.com/paboyle/Grid.git, synced 2024-11-13 01:05:36 +00:00
TODO list update
Parent: cb336aa8f8
Commit: f710d7bd45
TODO (28 lines changed)
@@ -1,17 +1,27 @@
+- Lattice_arith - are the mult, mac etc. still needed after ET engine?
+- LinalgUtils ssp loop not offloaded
+- Mobius/Domain EOFA cache header implementation has thread_loop
+- ImprovedStaggered accelerate
+- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code
+Lattice_rng
+Lattice_transfer.h

+- Stencil.h : Thread loops in exchange code. Need to offload these

+- Lebesgue order reintroduction. StencilView should have pointer

+- accelerate A2Autils

 GPU branch code item work list
 -----------------------------

+7) Accelerate the cshift

 * 0) Single GPU
 - 128 bit integer table load in GPU code.
 - coalescedRead <- threadIdx.x
 - Gianluca's changes to Cayley into gpu-port
 - GPU accelerate EOFA
-- Clean up PRAGMAS, and SIMT_loop
-thread_loop interface revisit.
-for_n
-for
 - Staggered kernels -> GPU coalesced loop
 - Staggered kernels inline for GPU -- DONE

@@ -23,9 +33,12 @@ GPU branch code item work list

 * 3) Comms/NVlink
 - OpenMP tasks to run comms threads.
+- Remove explicit OpenMP in staggered.
 - Single parallel region around both the Kernel call
 and the comms.
 - Fix the halo exchange SIMT loop
+- Stencil gather
+- SIMD dirs in stencil

 * 4) ET enhancements
 - eval -> scalar ops in ET engine
@@ -35,9 +48,7 @@ GPU branch code item work list

 - Conserved current clean up.
 - multLinkProp eliminate
-- SIMD dirs in stencil

-7) Accelerate the cshift

 8) Merge develop and test HMC

@@ -50,6 +61,11 @@ GPU branch code item work list


 =============================================================================================
+- Clean up PRAGMAS, and SIMT_loop -- DONE
+thread_loop interface revisit.
+_foreach
+_for

 -- Figure what to do about "multLinkGpu" etc. in FermionOperatorImpl. -- DONE
 -- Gparity is the awkward one -- DONE
 -- Solve non-Gparity first. -- DONE
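The "Lebesgue order reintroduction" item concerns the order in which lattice sites are traversed. Assume it refers to a Z-order (Morton) space-filling curve, which keeps neighbouring sites close together in traversal order so that stencil reads reuse cache. Under that assumption, a minimal C++ sketch of such an ordering for a 4D lattice looks like this. The names `mortonKey4` and `lebesgueOrder` are hypothetical illustrations and are not taken from Grid's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: interleave the bits of a 4D coordinate (x,y,z,t) into one
// Morton (Z-order) key. Bit b of dimension mu lands at position 4*b + mu.
// Assumes every coordinate fits in 16 bits.
std::uint64_t mortonKey4(const std::array<std::uint32_t, 4>& c) {
  std::uint64_t key = 0;
  for (int bit = 0; bit < 16; ++bit)
    for (int mu = 0; mu < 4; ++mu)
      key |= std::uint64_t((c[mu] >> bit) & 1u) << (4 * bit + mu);
  return key;
}

// Hypothetical helper: return lexicographic site indices (x fastest) sorted into
// Morton order, i.e. the sequence in which a Z-ordered sweep would visit them.
std::vector<int> lebesgueOrder(const std::array<int, 4>& dims) {
  const int vol = dims[0] * dims[1] * dims[2] * dims[3];
  std::vector<std::uint64_t> keys(vol);
  for (int site = 0; site < vol; ++site) {
    std::array<std::uint32_t, 4> c{};
    int rem = site;
    for (int mu = 0; mu < 4; ++mu) {  // decode lexicographic index
      c[mu] = static_cast<std::uint32_t>(rem % dims[mu]);
      rem /= dims[mu];
    }
    keys[site] = mortonKey4(c);
  }
  std::vector<int> order(vol);
  std::iota(order.begin(), order.end(), 0);
  // Keys are unique for coordinates below 2^16, so the order is well defined.
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return keys[a] < keys[b]; });
  return order;
}
```

For a 4x4 slice (`dims = {4,4,1,1}`), this sketch visits lexicographic sites 0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15. That is, it finishes one 2x2 block before moving to the next, instead of sweeping whole rows.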
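The "coalescedRead <- threadIdx.x" item reflects the SIMT model of the GPU port. Assume a SIMD-packed field in which each outer site stores its `Nsimd` lanes contiguously. If GPU thread `t` reads lane `t` (with `t` taken from `threadIdx.x`), adjacent threads touch adjacent addresses, and the hardware can then merge those accesses into as few memory transactions as possible. The host-side C++ sketch below only emulates that access pattern. `SimdVec`, `coalescedReadLane` and `laneOffsets` are hypothetical names, not Grid's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD-packed site: Nsimd lanes stored contiguously.
template <class T, int Nsimd>
struct SimdVec {
  T lane[Nsimd];
};

// Hypothetical stand-in for a coalesced read. On a GPU, `lane` would come from
// threadIdx.x, so thread t of a group of Nsimd threads reads lane t of one site.
template <class T, int Nsimd>
T coalescedReadLane(const SimdVec<T, Nsimd>& v, int lane) {
  return v.lane[lane];
}

// Byte offsets (from the start of the field) touched when threads 0..Nsimd-1
// each read their own lane of outer site `site`.
template <class T, int Nsimd>
std::vector<std::size_t> laneOffsets(const std::vector<SimdVec<T, Nsimd>>& field,
                                     int site) {
  std::vector<std::size_t> offs;
  const char* base = reinterpret_cast<const char*>(field.data());
  for (int t = 0; t < Nsimd; ++t) {
    const char* addr = reinterpret_cast<const char*>(&field[site].lane[t]);
    offs.push_back(static_cast<std::size_t>(addr - base));
  }
  return offs;
}
```

With `float` lanes and `Nsimd = 8`, the offsets for a given site are consecutive multiples of `sizeof(float)`. That contiguous pattern is what makes selecting the lane by `threadIdx.x` coalescing-friendly.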