1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-10-26 01:29:34 +00:00
Commit Graph

7286 Commits

Author SHA1 Message Date
Peter Boyle
eeb6e0a6e3 Renable cache blocking and efficient UPI type SHM comms 2023-03-14 09:10:27 -07:00
Peter Boyle
cad5b187dd Cleanup 2023-03-14 09:08:16 -07:00
Peter Boyle
87697eb07e SHared compile 2023-03-14 09:07:36 -07:00
a3e935c902 Batched block project/promote size checks 2023-02-27 11:38:16 +00:00
7731c7db8e Add huge cache type and allow Ncache==0 2023-02-26 14:15:28 +00:00
ff97340324 Expose cached bytes 2023-02-26 12:22:45 +00:00
Christopher Kelly
83d86943db Fixed compile bug in MemoryManagerShared caused by Audit function not being passed a string 2023-02-23 13:09:45 -05:00
Christopher Kelly
e82cf1d311 Further prec-change improvements
Mixed prec CG algorithm has been modified to precompute precision change workspaces

As the original Test_dwf_mixedcg_prec has been coopted to do a performance stability and reproducibility test, requiring the single-prec CG to be run 200 times, I have created a new version of Test_dwf_mixedcg_prec in the solver subdirectory that just does the mixed vs double CG test
2023-02-23 09:45:29 -05:00
Christopher Kelly
1db58a8acc Precision change improvements
Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation.

In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.

Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces

Renamed the original precisionChange as precisionChangeOrig

Fixed incorrect pointer offset bug in copyLane

Added a test and a benchmark for precisionChange

Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
920a51438d Added batched Mixed precision CG 2023-02-14 17:04:13 +00:00
be528b6d27 Add batched block project/promote functions 2023-02-14 14:37:10 +00:00
Peter Boyle
ccd21f96ff Plaquette agreeing and moving to final form (slowly) need to optimise 2023-02-01 22:57:44 -05:00
Peter Boyle
4b90cb8888 First cut passes combining padded cell with general stencil towards fast plaquette and staggered force 2023-02-01 22:14:10 -05:00
Peter Boyle
796abfad80 Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT
Disable diagnostic pragma warnings for CUDA 12+
2023-01-17 09:34:49 -05:00
ad0270ac8c fix: diagnostic pragma warnings fixed for CUDA 12+ 2023-01-12 12:36:30 +00:00
Makis Kappas
7d62f1d6d2 Populate the Cshift_table in the GPU
Cshift is allocated in Unified memory and used
in the LambdaApply kernels but also populated
from the host. This creates a lot of Unified HtoD
and DtoH mem operations and has a negative effect
in performance. With this commit we populate the
Cshift table in the device with the
populate_Cshift_table() kernel.
2023-01-11 21:26:25 +00:00
Christoph Lehner
458c943987 merged upstream 2022-12-31 11:16:21 +02:00
Christoph Lehner
88015b0858 Split sum in rankSum and GlobalSum 2022-12-26 10:01:32 +01:00
Peter Boyle
4ca1bf7cca Added gauge invariance test 2022-12-21 07:23:16 -05:00
Meifeng Lin
26ad759469 bug fix in HOWTO 2022-12-20 10:33:24 -08:00
Meifeng Lin
ed723909a2 Update HOWTO with an example config-command 2022-12-20 10:20:14 -08:00
Meifeng Lin
36ffe79093 Add simple HOWTO instructions and module load script for Cori GPU 2022-12-20 10:10:27 -08:00
Meifeng Lin
1df8669898 change loop counts to local variables so clang compiler doesn't complain 2022-12-20 10:09:50 -08:00
Peter Boyle
2ff868f7a5 CPU open doesn't need to free space 2022-12-20 05:10:23 -05:00
Peter Boyle
ede02b6883 Memory manager debug Felix case 2022-12-20 05:10:23 -05:00
Peter Boyle
1822ced302 Bug fix 2022-12-20 05:10:23 -05:00
Peter Boyle
37ba32776f More logging 2022-12-20 05:10:23 -05:00
Peter Boyle
99b3697b03 More loggin 2022-12-20 05:10:23 -05:00
Peter Boyle
43a45ec97b SSC_START 2022-12-20 05:10:23 -05:00
Peter Boyle
b00a4142e5 A=A fix 2022-12-20 05:10:23 -05:00
Peter Boyle
3791bc527b Logging pulled in from dirichlet branch 2022-12-20 05:10:23 -05:00
Peter Boyle
d8c29f5fcf Updated FFT test for PETSc 2022-12-18 12:05:00 -05:00
Peter Boyle
281f8101fe Matt FFT test 2022-12-17 20:35:33 -05:00
Peter Boyle
472ed2dd5c Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet 2022-12-17 20:17:09 -05:00
Peter Boyle
4f85672674 Simpler test for PETSc 2022-12-17 20:16:11 -05:00
Meifeng Lin
f6661ce29b Merged openmp offload implementation with develop 2022-12-13 18:32:55 -05:00
Meifeng Lin
9b3ac3c23f Added stdout for number of GPU threads; 2022-12-13 15:14:01 -08:00
Meifeng Lin
c33a3b3b40 Fixed --accelerator-threads input to omp target thread_limit() 2022-12-13 15:13:11 -08:00
meifeng
40ee605591 Merge pull request #1 from paboyle/develop
Update to Grid's main repo
2022-12-13 09:12:57 -05:00
Peter Boyle
dc747c54be Merge branch 'develop' into feature/dirichlet
Conflicts:
	Grid/qcd/action/fermion/WilsonCompressor.h
	Grid/stencil/Stencil.h
2022-12-13 08:24:58 -05:00
Peter Boyle
140684d706 Head to head vs HMC 2022-12-13 08:15:38 -05:00
Peter Boyle
5bb7ba92fa Test for DDHMC force term 2022-12-13 08:15:11 -05:00
Peter Boyle
b54d0f3c73 Smaller deltaH down to 7000s on t=0.5 trajectory 2022-12-13 08:14:27 -05:00
Peter Boyle
ff6777a98d Variable depth experiments 2022-12-13 08:13:51 -05:00
Peter Boyle
07acfe89f2 Merge pull request #417 from rrhodgson/feature/fermtoprop
Feature/fermtoprop
2022-12-06 12:45:03 -05:00
40234f531f FermToProp accelerator_for -> thread_for 2022-12-06 17:34:51 +00:00
d49694f38f PropToFerm fix 2022-12-06 15:48:54 +00:00
Chulwoo Jung
dc6a38f177 Minor cleanup 2022-11-30 17:13:12 -05:00
Chulwoo Jung
82c1ecf60f Block lanczos added 2022-11-30 16:08:40 -05:00
Peter Boyle
67f569354e Partial dirichlet changes 2022-11-30 15:51:13 -05:00