mirror of https://github.com/paboyle/Grid.git synced 2026-04-20 10:41:01 +01:00
Commit Graph

7186 Commits

Author SHA1 Message Date
Peter Boyle 68428fceab Integrator update 2023-03-21 15:58:49 -04:00
Peter Boyle 4135f2dcd1 Compressor 2023-03-21 15:41:41 -04:00
Peter Boyle c5bdf61215 Audit fix 2023-03-21 15:38:39 -04:00
Peter Boyle 88e218e8ee Stencil updates 2023-03-21 15:37:58 -04:00
Peter Boyle 0f2b786436 Vector -> vector 2023-03-21 15:36:11 -04:00
Peter Boyle e1c326558a Comms improvements 2023-03-21 08:53:56 -07:00
Peter Boyle bae0f8ea99 Merge pull request #425 from rrhodgson/feature/CacheLogging
Huge Cache
2023-03-21 08:59:08 -04:00
Peter Boyle bbbcd36ae5 Merge pull request #426 from rrhodgson/feature/LCDeflation
Batched Local Coherence Tools
2023-03-21 08:58:40 -04:00
Peter Boyle 39c0815d9e WriteDiscard 2023-03-21 08:57:29 -04:00
Peter Boyle a997d24743 Remove nofma 2023-03-14 12:10:31 -07:00
Peter Boyle 861e5d7f4c SYCL version update. Why do they keep making incompatible changes 2023-03-14 12:10:02 -07:00
Peter Boyle 14cc142a14 Warning remove 2023-03-14 12:09:26 -07:00
Peter Boyle f36b87deb5 syscall fix 2023-03-14 12:09:00 -07:00
Peter Boyle eeb6e0a6e3 Re-enable cache blocking and efficient UPI type SHM comms 2023-03-14 09:10:27 -07:00
Peter Boyle cad5b187dd Cleanup 2023-03-14 09:08:16 -07:00
Peter Boyle 87697eb07e Shared compile 2023-03-14 09:07:36 -07:00
rhodgson a3e935c902 Batched block project/promote size checks 2023-02-27 11:38:16 +00:00
rhodgson 7731c7db8e Add huge cache type and allow Ncache==0 2023-02-26 14:15:28 +00:00
rhodgson ff97340324 Expose cached bytes 2023-02-26 12:22:45 +00:00
Christopher Kelly 83d86943db Fixed compile bug in MemoryManagerShared caused by Audit function not being passed a string 2023-02-23 13:09:45 -05:00
Christopher Kelly e82cf1d311 Further prec-change improvements
Mixed prec CG algorithm has been modified to precompute precision change workspaces

The original Test_dwf_mixedcg_prec has been co-opted to do a performance stability and reproducibility test, which requires the single-prec CG to be run 200 times. I have therefore created a new version of Test_dwf_mixedcg_prec in the solver subdirectory that just does the mixed vs double CG test
2023-02-23 09:45:29 -05:00
Christopher Kelly 1db58a8acc Precision change improvements
Added a new, much faster implementation of precision change that uses (optionally) a precomputed workspace containing pointer offsets that is device resident, such that all lattice copying occurs only on the device and no host<->device transfer is required, other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When this new precisionChange is called without a workspace, one will be computed on-the-fly; however it is still considerably faster than the original implementation.

In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.

Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces

Renamed the original precisionChange as precisionChangeOrig

Fixed incorrect pointer offset bug in copyLane

Added a test and a benchmark for precisionChange

Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
rhodgson 920a51438d Added batched Mixed precision CG 2023-02-14 17:04:13 +00:00
rhodgson be528b6d27 Add batched block project/promote functions 2023-02-14 14:37:10 +00:00
Peter Boyle 796abfad80 Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT
Disable diagnostic pragma warnings for CUDA 12+
2023-01-17 09:34:49 -05:00
fjosw ad0270ac8c fix: diagnostic pragma warnings fixed for CUDA 12+ 2023-01-12 12:36:30 +00:00
Makis Kappas 7d62f1d6d2 Populate the Cshift_table on the GPU
Cshift is allocated in Unified memory and used
in the LambdaApply kernels but also populated
from the host. This creates a lot of Unified HtoD
and DtoH mem operations and has a negative effect
on performance. With this commit we populate the
Cshift table on the device with the
populate_Cshift_table() kernel.
2023-01-11 21:26:25 +00:00
Christoph Lehner 458c943987 merged upstream 2022-12-31 11:16:21 +02:00
Christoph Lehner 88015b0858 Split sum in rankSum and GlobalSum 2022-12-26 10:01:32 +01:00
Peter Boyle 4ca1bf7cca Added gauge invariance test 2022-12-21 07:23:16 -05:00
Peter Boyle 2ff868f7a5 CPU open doesn't need to free space 2022-12-20 05:10:23 -05:00
Peter Boyle ede02b6883 Memory manager debug Felix case 2022-12-20 05:10:23 -05:00
Peter Boyle 1822ced302 Bug fix 2022-12-20 05:10:23 -05:00
Peter Boyle 37ba32776f More logging 2022-12-20 05:10:23 -05:00
Peter Boyle 99b3697b03 More logging 2022-12-20 05:10:23 -05:00
Peter Boyle 43a45ec97b SSC_START 2022-12-20 05:10:23 -05:00
Peter Boyle b00a4142e5 A=A fix 2022-12-20 05:10:23 -05:00
Peter Boyle 3791bc527b Logging pulled in from dirichlet branch 2022-12-20 05:10:23 -05:00
Peter Boyle d8c29f5fcf Updated FFT test for PETSc 2022-12-18 12:05:00 -05:00
Peter Boyle 281f8101fe Matt FFT test 2022-12-17 20:35:33 -05:00
Peter Boyle 472ed2dd5c Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet 2022-12-17 20:17:09 -05:00
Peter Boyle 4f85672674 Simpler test for PETSc 2022-12-17 20:16:11 -05:00
Peter Boyle dc747c54be Merge branch 'develop' into feature/dirichlet
Conflicts:
	Grid/qcd/action/fermion/WilsonCompressor.h
	Grid/stencil/Stencil.h
2022-12-13 08:24:58 -05:00
Peter Boyle 140684d706 Head to head vs HMC 2022-12-13 08:15:38 -05:00
Peter Boyle 5bb7ba92fa Test for DDHMC force term 2022-12-13 08:15:11 -05:00
Peter Boyle b54d0f3c73 Smaller deltaH down to 7000s on t=0.5 trajectory 2022-12-13 08:14:27 -05:00
Peter Boyle ff6777a98d Variable depth experiments 2022-12-13 08:13:51 -05:00
Peter Boyle 07acfe89f2 Merge pull request #417 from rrhodgson/feature/fermtoprop
Feature/fermtoprop
2022-12-06 12:45:03 -05:00
rhodgson 40234f531f FermToProp accelerator_for -> thread_for 2022-12-06 17:34:51 +00:00
rhodgson d49694f38f PropToFerm fix 2022-12-06 15:48:54 +00:00