mirror of
https://github.com/paboyle/Grid.git
synced 2024-11-10 07:55:35 +00:00
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
commit
3c67d626ba
75
TODO
@@ -1,3 +1,6 @@
+-- comms threads issue??
+-- Part done: Staggered kernel performance on GPU
+
 =========================================================
 General
 =========================================================
@@ -5,28 +8,18 @@ General
 - Make representations code take Gimpl
 - Simplify the HMCand remove modules
 - Lattice_arith - are the mult, mac etc.. still needed after ET engine?
-- Lattice_rng
-- Lattice_transfer.h
-- accelerate A2Autils -- off critical path for HMC
+- Lattice_rng - faster local only loop in init
+- Audit: accelerate A2Autils -- off critical path for HMC

 =========================================================
-GPU branch code item work list
+GPU work list
 =========================================================

-* sum_cpu promote to double during summation for increased precisoin.
+* sum_cpu promote to double during summation for increased precision.
 * Introduce sumD & ReduceD
 * GPU sum is probably better currently.

 * Accelerate the cshift & benchmark

-* 0) Single GPU
-- 128 bit integer table load in GPU code.
-- ImprovedStaggered accelerate & measure perf
-- Gianluca's changes to Cayley into gpu-port
-- Mobius kernel fusion. -- Gianluca?
-- Lebesque order reintroduction. StencilView should have pointer to it
-- Lebesgue reorder in all kernels
-
 * 3) Comms/NVlink
 - OpenMP tasks to run comms threads. Experiment with it
 - Remove explicit openMP in staggered.
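The sum_cpu item in the hunk above asks for promotion to double during summation. A minimal standalone sketch of that idea follows: a single-precision array is reduced with a double accumulator, so rounding error does not build up at float precision. The names sumD_sketch and sumF_naive are hypothetical and are not Grid's sumD or sum_cpu; this only illustrates the promotion, under that assumption.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <vector>

// Hypothetical stand-in (not Grid's API): reduce float data with a
// double accumulator so each addition rounds at double precision.
inline double sumD_sketch(const std::vector<float>& v) {
  double acc = 0.0;  // accumulator promoted to double
  for (float x : v) acc += static_cast<double>(x);
  return acc;
}

// Naive reference: accumulate in float. Rounding error grows with the
// number of terms because every partial sum is rounded to 24 bits.
inline float sumF_naive(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x;
  return acc;
}
```

As a usage check, sumD_sketch on 1000 copies of 0.1f should agree with 1000.0 * double(0.1f) to well within 1e-9, since every partial sum fits in a double mantissa; sumF_naive carries no such guarantee.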
@@ -35,14 +28,6 @@ GPU branch code item work list
 - Stencil gather ??
 - SIMD dirs in stencil

-* 4) ET enhancements
-- eval -> scalar ops in ET engine
-- coalescedRead, coalescedWrite in expressions.
-
-* 5) Misc
-- Conserved current clean up.
-- multLinkProp eliminate
-
 8) Merge develop and test HMC

 9) Gamma tables on GPU; check this. Appear to work, but no idea why. Are these done on CPU?
@@ -52,7 +37,7 @@ GPU branch code item work list
 - Audit NAMESPACE CHANGES
 - Audit changes

------
+---------
 Gianluca's changes
 - Performance impact of construct in aligned allocator???
 ---------
@@ -62,6 +47,33 @@ Gianluca's changes
 -----------------------------
 DONE:
 -----------------------------
+=====
+-- Done: Remez X^-1/2 X^-1/2 X = 1 test.
+Feed in MdagM^2 as a test and take its sqrt.
+Automated test that MdagM invsqrt(MdagM)invsqrt(MdagM) = 1 in HMC for bounds satisfaction.
+
+-- Done: Sycl Kernels into develop. Compare to existing unroll and just use.
+-- Done: sRNG into refresh functions
+-- Done: Tuned decomposition on CUDA into develop
+-- Done: Sycl friend accessor. Const view attempt via typedef??
+
+
+* Done 5) Misc
+- Conserved current clean up.
+- multLinkProp eliminate
+
+* Done 0) Single GPU
+- 128 bit integer table load in GPU code.
+- ImprovedStaggered accelerate & measure perf
+- Gianluca's changes to Cayley into gpu-port
+- Mobius kernel fusion. -- Gianluca?
+- Lebesque order reintroduction. StencilView should have pointer to it
+- Lebesgue reorder in all kernels
+
+* 4) ET enhancements
+- Done eval -> scalar ops in ET engine
+- Done coalescedRead, coalescedWrite in expressions.
+
 =============================================================================================
 AUDIT ContractWWVV with respect to develop -- DONE
 - GPU accelerate EOFA -- DONE
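The Remez item in the hunk above checks that X^-1/2 X^-1/2 X = 1. On a scalar spectrum this reduces to checking f(x)^2 x = 1 across the approximation interval. A minimal sketch, assuming a hypothetical helper max_invsqrt_residual and a Newton iteration standing in for the rational approximation (this is not Grid's Remez code, and any interval passed in is the caller's choice):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Stand-in for an approximation to x^{-1/2}: Newton iteration
// y <- y * (1.5 - 0.5 * x * y * y). This is NOT a Remez rational function.
inline double invsqrt_newton(double x, int iters = 60) {
  assert(x > 0.0);
  double y = 1.0 / (1.0 + x);  // starts below x^{-1/2}, inside the convergence region
  for (int i = 0; i < iters; ++i) y = y * (1.5 - 0.5 * x * y * y);
  return y;
}

// Hypothetical helper: largest |f(x)*f(x)*x - 1| over n+1 evenly spaced
// sample points in [lo, hi]. A good inverse square root keeps this small.
inline double max_invsqrt_residual(const std::function<double(double)>& f,
                                   double lo, double hi, int n) {
  assert(n > 0 && lo > 0.0 && lo < hi);
  double worst = 0.0;
  for (int i = 0; i <= n; ++i) {
    const double x = lo + (hi - lo) * i / n;
    const double fx = f(x);
    worst = std::max(worst, std::fabs(fx * fx * x - 1.0));
  }
  return worst;
}
```

A call such as max_invsqrt_residual([](double x) { return invsqrt_newton(x); }, 0.1, 10.0, 1000) gives the worst-case residual on [0.1, 10]; an automated bounds check would compare that value against a tolerance and fail if the spectrum leaves the interval the approximation was built for.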
@@ -125,23 +137,6 @@ AUDIT ContractWWVV with respect to develop -- DONE
 - - (4) omp parallel for collapse(n)
 - - Only (1) has a natural mirror in accelerator_loop
 - - Nested loop macros get cumbersome made a generic interface for N deep
-- - Don't like thread_region and thread_loop_in_region
-- - Could replace with
-
-thread_nested(1,
-for {
-
-}
-);
-thread_nested(2,
-for (){
-for (){
-
-}
-}
-);
-
-and same "in_region".


 -----------------------------
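The note removed in the hunk above proposes a thread_nested(N, ...) wrapper for N-deep loops. One way such a macro could be written, assuming it simply forwards to OpenMP's collapse clause, is sketched here. The macro bodies and the fill_2d helper are guesses for illustration, not Grid's definitions; without OpenMP enabled the pragma is ignored and the loops run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macros named after the note; not Grid's actual definitions.
// SKETCH_PRAGMA stringifies its argument so _Pragma can take the clause.
#define SKETCH_PRAGMA(x) _Pragma(#x)
#define thread_nested(n, ...) \
  SKETCH_PRAGMA(omp parallel for collapse(n)) __VA_ARGS__

// Fill an nx-by-ny row-major array with its own linear index, using a
// two-deep loop nest wrapped in thread_nested(2, ...).
inline std::vector<int> fill_2d(int nx, int ny) {
  assert(nx >= 0 && ny >= 0);
  std::vector<int> out(static_cast<std::size_t>(nx) * ny, 0);
  int* p = out.data();
  thread_nested(2,
    for (int x = 0; x < nx; ++x) {
      for (int y = 0; y < ny; ++y) {
        p[x * ny + y] = x * ny + y;  // each (x, y) writes a distinct element
      }
    }
  );
  return out;
}
```

The variadic parameter lets the loop body contain commas, and the trailing semicolon after the macro call is just an empty statement. The loops must stay perfectly nested for collapse(2) to be valid.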