Peter Boyle
2fbcf13c46
SYCL fix
2023-03-27 14:25:14 -07:00
Peter Boyle
4ea48ef0c4
Merge pull request #419 from lehner/feature/gpt
...
Separate rankSum from sum
2023-03-24 15:42:16 -04:00
Peter Boyle
5c85774ee3
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2023-03-24 15:40:57 -04:00
Peter Boyle
d8a9a745d8
stream synchronise
2023-03-24 15:40:30 -04:00
Peter Boyle
dcf172da3b
Merge pull request #415 from paboyle/feature/block_lanczos22
...
Feature/block lanczos22
2023-03-24 12:08:16 -04:00
Peter Boyle
d57ed25071
Merge branch 'feature/dirichlet' into feature/block_lanczos22
2023-03-24 12:08:09 -04:00
Peter Boyle
546be724e7
Merge pull request #421 from UniOfLeicester/feature/accel_Copy_plane
...
Populate the Cshift_table in the GPU
2023-03-24 12:04:06 -04:00
Peter Boyle
8a1b9073f9
Mshift update
2023-03-23 15:39:30 -04:00
Peter Boyle
1a7114d4b9
Temporary algorithm while sorting out mixed prec
2023-03-23 15:38:35 -04:00
Peter Boyle
3f385f717c
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
...
Conflicts:
systems/PVC/benchmarks/run-2tile-mpi.sh
systems/PVC/config-command
2023-03-23 14:52:53 -04:00
Peter Boyle
481bbaf1fc
Interface to query memory use
2023-03-23 12:55:31 -04:00
Peter Boyle
281488611a
WriteDiscard on construct
2023-03-23 10:28:50 -04:00
Peter Boyle
c180a52518
Merge branch 'feature/dirichlet' of https://www.github.com/paboyle/Grid into feature/dirichlet
2023-03-23 10:28:01 -04:00
Peter Boyle
90130e25e9
TODO list
2023-03-23 10:27:02 -04:00
Peter Boyle
23298acb81
Merge pull request #424 from giltirn/feature/dirichlet-precchange
...
Precision change implementation
2023-03-22 23:04:52 -04:00
Peter Boyle
52384e34cf
Discard on construct
2023-03-22 19:40:32 -04:00
Peter Boyle
d0bb033ea2
Device-resident GPU block buffer instead of UVM, as this likely hit a UVM bug
...
Code worked on CUDA 11.4 but fails on later drivers (certainly 530.30.02, but need to find the Perlmutter driver version).
2023-03-22 19:07:32 -04:00
Peter Boyle
c6621806ca
Compiling on laptop and running
2023-03-21 17:27:09 -04:00
Peter Boyle
0b6f0f6d2f
Merge branch 'feature/dirichlet' of https://www.github.com/paboyle/Grid into feature/dirichlet
2023-03-21 16:06:55 -04:00
Peter Boyle
b5b759df73
Merge branch 'develop' into feature/dirichlet
2023-03-21 16:05:46 -04:00
Peter Boyle
7db8dd7a95
Merge branch 'feature/dirichlet' of https://github.com/paboyle/Grid into feature/dirichlet
2023-03-21 16:04:27 -04:00
Peter Boyle
8b43be39c0
Config command
2023-03-21 16:00:52 -04:00
Peter Boyle
f17f879206
Test update
2023-03-21 15:59:29 -04:00
Peter Boyle
68428fceab
Integrator update
2023-03-21 15:58:49 -04:00
Peter Boyle
4135f2dcd1
Compressor
2023-03-21 15:41:41 -04:00
Peter Boyle
c5bdf61215
Audit fix
2023-03-21 15:38:39 -04:00
Peter Boyle
88e218e8ee
Stencil updates
2023-03-21 15:37:58 -04:00
Peter Boyle
0f2b786436
Vector -> vector
2023-03-21 15:36:11 -04:00
Peter Boyle
e1c326558a
Comms improvements
2023-03-21 08:53:56 -07:00
Peter Boyle
bae0f8ea99
Merge pull request #425 from rrhodgson/feature/CacheLogging
...
Huge Cache
2023-03-21 08:59:08 -04:00
Peter Boyle
bbbcd36ae5
Merge pull request #426 from rrhodgson/feature/LCDeflation
...
Batched Local Coherence Tools
2023-03-21 08:58:40 -04:00
Peter Boyle
39c0815d9e
WriteDiscard
2023-03-21 08:57:29 -04:00
Peter Boyle
a997d24743
Remove nofma
2023-03-14 12:10:31 -07:00
Peter Boyle
861e5d7f4c
SYCL version update. Why do they keep making incompatible changes?
2023-03-14 12:10:02 -07:00
Peter Boyle
14cc142a14
Warning remove
2023-03-14 12:09:26 -07:00
Peter Boyle
f36b87deb5
syscall fix
2023-03-14 12:09:00 -07:00
Peter Boyle
eeb6e0a6e3
Re-enable cache blocking and efficient UPI-type SHM comms
2023-03-14 09:10:27 -07:00
Peter Boyle
cad5b187dd
Cleanup
2023-03-14 09:08:16 -07:00
Peter Boyle
87697eb07e
Shared compile
2023-03-14 09:07:36 -07:00
a3e935c902
Batched block project/promote size checks
2023-02-27 11:38:16 +00:00
7731c7db8e
Add huge cache type and allow Ncache==0
2023-02-26 14:15:28 +00:00
ff97340324
Expose cached bytes
2023-02-26 12:22:45 +00:00
Christopher Kelly
83d86943db
Fixed compile bug in MemoryManagerShared caused by Audit function not being passed a string
2023-02-23 13:09:45 -05:00
Christopher Kelly
e82cf1d311
Further prec-change improvements
...
Mixed prec CG algorithm has been modified to precompute precision change workspaces.
The original Test_dwf_mixedcg_prec has been co-opted as a performance stability and reproducibility test, which requires the single-prec CG to be run 200 times. I have therefore created a new version of Test_dwf_mixedcg_prec in the solver subdirectory that does only the mixed vs double CG test.
2023-02-23 09:45:29 -05:00
Christopher Kelly
1db58a8acc
Precision change improvements
...
Added a new, much faster implementation of precision change that (optionally) uses a precomputed, device-resident workspace of pointer offsets, so that all lattice copying occurs on the device and no host<->device transfer is required other than the pointer table. It also avoids the need to unpack and repack the fields using explicit lane copying. When the new precisionChange is called without a workspace, one is computed on the fly; however, it is still considerably faster than the original implementation.
In the special case of using double2 and when the Grids are the same, calls to the new precisionChange will automatically use precisionChangeFast, such that there is a single API call for all precision changes.
Reliable update and mixed-prec multishift have been modified to precompute precision change workspaces
Renamed the original precisionChange as precisionChangeOrig
Fixed incorrect pointer offset bug in copyLane
Added a test and a benchmark for precisionChange
Added a test for reliable update CG
2023-02-21 10:52:42 -05:00
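The workspace idea described in the commit above can be illustrated with a minimal host-only sketch. It is not Grid's implementation: `WorkspaceSketch`, `buildWorkspace`, and `changePrecision` are hypothetical names, and the row-major to column-major reordering stands in for whatever layout difference exists between the two grids. The point is only that the index table is computed once and every later conversion is a single gather-and-cast pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workspace: for each output element, the index of the input
// element it is copied from. Built once, then reused for every conversion.
struct WorkspaceSketch {
  std::vector<std::size_t> src_index;
};

// Precompute the offset table for a toy layout change (an assumption made
// for illustration): the input is row-major, the output is column-major.
WorkspaceSketch buildWorkspace(std::size_t rows, std::size_t cols) {
  WorkspaceSketch ws;
  ws.src_index.resize(rows * cols);
  for (std::size_t c = 0; c < cols; ++c) {
    for (std::size_t r = 0; r < rows; ++r) {
      ws.src_index[c * rows + r] = r * cols + c;
    }
  }
  return ws;
}

// Apply the precomputed table: one gather plus a double -> float cast per
// element, with no index arithmetic repeated per call.
std::vector<float> changePrecision(const WorkspaceSketch &ws,
                                   const std::vector<double> &in) {
  assert(in.size() == ws.src_index.size());
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<float>(in[ws.src_index[i]]);
  }
  return out;
}
```

In a device setting the table would be copied to the accelerator once and the loop in `changePrecision` would run as a kernel; the sketch keeps everything on the host so that it stays self-contained.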
920a51438d
Added batched Mixed precision CG
2023-02-14 17:04:13 +00:00
be528b6d27
Add batched block project/promote functions
2023-02-14 14:37:10 +00:00
Peter Boyle
ccd21f96ff
Plaquette agreeing and moving to final form (slowly); need to optimise
2023-02-01 22:57:44 -05:00
Peter Boyle
4b90cb8888
First cut passes combining padded cell with general stencil towards fast plaquette and staggered force
2023-02-01 22:14:10 -05:00
Peter Boyle
796abfad80
Merge pull request #422 from fjosw/fix/NVCC_DIAG_PRAGMA_SUPPORT
...
Disable diagnostic pragma warnings for CUDA 12+
2023-01-17 09:34:49 -05:00