From f710d7bd45b7ad2e59953006bf4bbfc0b5055416 Mon Sep 17 00:00:00 2001
From: Peter Boyle
Date: Sat, 15 Jun 2019 12:54:27 +0100
Subject: [PATCH] TODO list update

---
 TODO | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/TODO b/TODO
index ca20470c..46fd9b47 100644
--- a/TODO
+++ b/TODO
@@ -1,17 +1,27 @@
+- Lattice_arith - are the mult, mac etc.. still needed after ET engine?
+- LinalgUtils ssp loop not offloaded
+- Mobius/Domain EOFA cache header implementaiotn has thread_loop
+- ImprovedStaggered accelerate
+- Lattice_reduction - remnant thread_loops must offload. Audit thread_loop in main code for non-accelerated code
+ Lattice_rng
+ Lattice_transfer.h
+- Stencil.h : Thread loops in exchange code. Need to offload these
+
+- Lebesque order reintroduction. StencilView should have pointer
+
+- accelerate A2Autils
 GPU branch code item work list
 -----------------------------
+7) Accelerate the cshift
+
 * 0) Single GPU
 - 128 bit integer table load in GPU code.
 - coalescedRead <- threadIdx.x
 - Gianluca's changes to Cayley into gpu-port
 - GPU accelerate EOFA
-- Clean up PRAGMAS, and SIMT_loop
- thread_loop interface revisit.
- for_n
- for
 - Staggered kernels -> GPU coalesced loop
 - Staggered kernels inline for GPU -- DONE
@@ -23,9 +33,12 @@ GPU branch code item work list
 * 3) Comms/NVlink
 - OpenMP tasks to run comms threads.
+- Remove explicit openMP in staggered.
 - Single parallel region around both the Kernel call and the comms.
 - Fix the halo exchange SIMT loop
+- Stencil gather
+- SIMD dirs in stencil
 * 4) ET enhancements
 - eval -> scalar ops in ET engine
@@ -35,9 +48,7 @@ GPU branch code item work list
 - Conserved current clean up.
 - multLinkProp eliminate
-- SIMD dirs in stencil
-7) Accelerate the cshift
 8) Merge develop and test HMC
@@ -50,6 +61,11 @@ GPU branch code item work list
 =============================================================================================
+- Clean up PRAGMAS, and SIMT_loop -- DONE
+ thread_loop interface revisit.
+ _foreach
+ _for
+
 -- Figure what to do about "multLinkGpu" etc.. in FermionOperatorImpl. -- DONE
 -- Gparity is the awkward one -- DONE
 -- Solve non-Gparity first. -- DONE