Christopher Kelly
1db58a8acc
Precision change improvements
Added a new, much faster implementation of precision change. It optionally uses a precomputed, device-resident workspace of pointer offsets, so that all lattice copying occurs on the device and the only host<->device transfer required is the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When the new precisionChange is called without a workspace, one is computed on the fly; this is still considerably faster than the original implementation.
In the special case where double2 is used and the Grids are the same, calls to the new precisionChange automatically use precisionChangeFast, so that there is a single API call for all precision changes.
Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces
Renamed the original precisionChange as precisionChangeOrig
Fixed incorrect pointer offset bug in copyLane
Added a test and a benchmark for precisionChange
Added a test for reliable update CG
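The workspace idea described in this commit can be illustrated with a minimal, host-only C++ sketch. It is not Grid's implementation: the names PrecisionChangeWorkspace, layout_offset, make_workspace and precision_change are hypothetical, and the lane layout is a simplified assumption. The sketch only shows the pattern of building an offset table once and then converting precision with a single indexed copy, with no per-call unpack and repack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout rule (an assumption, not Grid's actual one):
// a field with `lanes` SIMD lanes stores site s at
// (s % outer) * lanes + (s / outer), where outer = n_sites / lanes.
inline std::size_t layout_offset(std::size_t site, std::size_t n_sites,
                                 std::size_t lanes) {
    assert(lanes > 0 && n_sites % lanes == 0);
    const std::size_t outer = n_sites / lanes;
    return (site % outer) * lanes + (site / outer);
}

// Hypothetical workspace: one (source, destination) offset pair per site,
// computed once and reused for every subsequent precision change.
struct PrecisionChangeWorkspace {
    std::vector<std::size_t> src_offset;
    std::vector<std::size_t> dst_offset;
};

inline PrecisionChangeWorkspace make_workspace(std::size_t n_sites,
                                               std::size_t lanes_in,
                                               std::size_t lanes_out) {
    PrecisionChangeWorkspace w;
    w.src_offset.reserve(n_sites);
    w.dst_offset.reserve(n_sites);
    for (std::size_t s = 0; s < n_sites; ++s) {
        w.src_offset.push_back(layout_offset(s, n_sites, lanes_in));
        w.dst_offset.push_back(layout_offset(s, n_sites, lanes_out));
    }
    return w;
}

// Convert double -> float using only the precomputed table. In a device
// version, this loop body is what each thread would execute.
inline void precision_change(std::vector<float>& out,
                             const std::vector<double>& in,
                             const PrecisionChangeWorkspace& w) {
    assert(out.size() == in.size() && in.size() == w.src_offset.size());
    for (std::size_t i = 0; i < w.src_offset.size(); ++i) {
        out[w.dst_offset[i]] = static_cast<float>(in[w.src_offset[i]]);
    }
}
```

In this sketch the table depends only on the two layouts, so a solver that changes precision repeatedly can build it once up front and reuse it on every call.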
2023-02-21 10:52:42 -05:00
920a51438d
Added batched Mixed precision CG
2023-02-14 17:04:13 +00:00
be528b6d27
Add batched block project/promote functions
2023-02-14 14:37:10 +00:00
Alessandro Lupo
f73691ec47
Merge pull request #18 from nickforce989/sp2n/newbranch
Sp2n/newbranch
2023-02-13 10:22:27 +01:00
Peter Boyle
ccd21f96ff
Plaquette agreeing and moving to final form (slowly); need to optimise
2023-02-01 22:57:44 -05:00
Peter Boyle
4b90cb8888
First cut passes combining padded cell with general stencil towards fast plaquette and staggered force
2023-02-01 22:14:10 -05:00
Niccolo Forzano
7ebda3e9ec
Merge commit 'b10e1b7bc8bec809f874e9e48a3ccc7b2619c9d1' into sp2n/newbranch
2023-01-19 12:10:18 +00:00
Niccolo Forzano
b10e1b7bc8
Fixed files giving zero force computation on GPU, issue #8
2023-01-18 18:04:47 +00:00
Peter Boyle
796abfad80
Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT
Disable diagnostic pragma warnings for CUDA 12+
2023-01-17 09:34:49 -05:00
ad0270ac8c
fix: diagnostic pragma warnings fixed for CUDA 12+
2023-01-12 12:36:30 +00:00
Makis Kappas
7d62f1d6d2
Populate the Cshift_table in the GPU
The Cshift table is allocated in Unified memory and used
in the LambdaApply kernels, but it is also populated
from the host. This creates a lot of Unified HtoD
and DtoH memory operations and has a negative effect
on performance. With this commit we populate the
Cshift table on the device with the
populate_Cshift_table() kernel.
2023-01-11 21:26:25 +00:00
Christoph Lehner
458c943987
merged upstream
2022-12-31 11:16:21 +02:00
Christoph Lehner
88015b0858
Split sum in rankSum and GlobalSum
2022-12-26 10:01:32 +01:00
Peter Boyle
4ca1bf7cca
Added gauge invariance test
2022-12-21 07:23:16 -05:00
Peter Boyle
2ff868f7a5
CPU open doesn't need to free space
2022-12-20 05:10:23 -05:00
Peter Boyle
ede02b6883
Memory manager debug Felix case
2022-12-20 05:10:23 -05:00
Peter Boyle
1822ced302
Bug fix
2022-12-20 05:10:23 -05:00
Peter Boyle
37ba32776f
More logging
2022-12-20 05:10:23 -05:00
Peter Boyle
99b3697b03
More logging
2022-12-20 05:10:23 -05:00
Peter Boyle
43a45ec97b
SSC_START
2022-12-20 05:10:23 -05:00
Peter Boyle
b00a4142e5
A=A fix
2022-12-20 05:10:23 -05:00
Peter Boyle
3791bc527b
Logging pulled in from dirichlet branch
2022-12-20 05:10:23 -05:00
Alessandro Lupo
d7dea44ce7
Merge pull request #17 from chillenzer/unify_gauge_groups
Fix compilation error in nvcc (closes #15)
2022-12-19 16:24:03 +00:00
Peter Boyle
d8c29f5fcf
Updated FFT test for PETSc
2022-12-18 12:05:00 -05:00
Julian Lenz
37b6b82869
Fix file extensions
2022-12-18 16:12:56 +00:00
Julian Lenz
92ad5b8f74
Compiler error fix: NVCC requires names for template parameters
2022-12-18 15:50:19 +00:00
Peter Boyle
281f8101fe
Matt FFT test
2022-12-17 20:35:33 -05:00
Peter Boyle
472ed2dd5c
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2022-12-17 20:17:09 -05:00
Peter Boyle
4f85672674
Simpler test for PETSc
2022-12-17 20:16:11 -05:00
Peter Boyle
dc747c54be
Merge branch 'develop' into feature/dirichlet
Conflicts:
Grid/qcd/action/fermion/WilsonCompressor.h
Grid/stencil/Stencil.h
2022-12-13 08:24:58 -05:00
Peter Boyle
140684d706
Head to head vs HMC
2022-12-13 08:15:38 -05:00
Peter Boyle
5bb7ba92fa
Test for DDHMC force term
2022-12-13 08:15:11 -05:00
Peter Boyle
b54d0f3c73
Smaller deltaH down to 7000s on t=0.5 trajectory
2022-12-13 08:14:27 -05:00
Peter Boyle
ff6777a98d
Variable depth experiments
2022-12-13 08:13:51 -05:00
Peter Boyle
07acfe89f2
Merge pull request #417 from rrhodgson/feature/fermtoprop
Feature/fermtoprop
2022-12-06 12:45:03 -05:00
40234f531f
FermToProp accelerator_for -> thread_for
2022-12-06 17:34:51 +00:00
d49694f38f
PropToFerm fix
2022-12-06 15:48:54 +00:00
Alessandro Lupo
8c80f1c168
Merge pull request #14 from chillenzer/unify_gauge_groups
Unify gauge groups (closes #5)
2022-12-01 17:35:46 +00:00
Chulwoo Jung
dc6a38f177
Minor cleanup
2022-11-30 17:13:12 -05:00
Chulwoo Jung
82c1ecf60f
Block lanczos added
2022-11-30 16:08:40 -05:00
Peter Boyle
67f569354e
Partial dirichlet changes
2022-11-30 15:51:13 -05:00
Peter Boyle
97a098636d
FermToProp
2022-11-30 15:36:35 -05:00
Peter Boyle
e13930c8b2
Faster fermtoprop case
2022-11-30 15:11:29 -05:00
Julian Lenz
0af7d5a793
Rename Grid/qcd/utils/<Group>_impl.h -> Grid/qcd/utils/<Group>.h
2022-11-30 17:12:00 +00:00
Julian Lenz
505fa49983
Renamed SUn.h -> GaugeGroup.h
2022-11-30 17:09:48 +00:00
Julian Lenz
7bcf33def9
Removed Sp2n.h
2022-11-30 16:59:46 +00:00
Julian Lenz
a13820656a
Removed iSUnMatrix, etc.
2022-11-30 15:09:03 +00:00
Julian Lenz
fa71b46a41
Hide nsp
2022-11-30 14:44:23 +00:00
Julian Lenz
b8b3ae6ac1
Make helper functions private
2022-11-30 13:29:14 +00:00
Julian Lenz
55c008da21
Removed forward declaration
2022-11-30 13:12:21 +00:00