mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 02:05:37 +00:00
Commit Graph

7944 Commits

Author SHA1 Message Date
Peter Boyle
e07cb2b9de Accelerator memory 2024-01-17 16:24:31 -05:00
Peter Boyle
a1f8bbb078 accelerator memory print 2024-01-17 16:24:09 -05:00
Peter Boyle
7909683f3b MultiRHS 2024-01-17 16:21:07 -05:00
Peter Boyle
25f71913b7 MultiRHS coarse 2024-01-04 12:01:17 -05:00
Peter Boyle
34ddd2b7b1 MultiRHS coarse space 2024-01-04 12:00:53 -05:00
Peter Boyle
d5fd90b2f3 Add 48^3 rtest 2024-01-04 12:00:01 -05:00
Peter Boyle
b7c7000d0d Don't need the numerical rounding tolerance in multigrid 2023-12-22 18:10:23 -05:00
Peter Boyle
551f6c4edd Synchronise changes 2023-12-22 18:09:11 -05:00
Peter Boyle
defd814750 Speed up the coarsened matrix matrix evaluation.
It is block project limited.
Could be sped up with calls to Batched GEMM and a data layout change.
2023-12-22 18:07:03 -05:00
Peter Boyle
3d517bbd2a Synchronise decouple from the launch
Speeds up multileg stencils
2023-12-22 18:06:13 -05:00
Peter Boyle
78ab955fec Better padded cell exchange 2023-12-22 18:05:41 -05:00
Peter Boyle
dd13937bb6 Better opt face gather scatter 2023-12-22 18:03:38 -05:00
Peter Boyle
66a1b63aa9 Faster grid/blas layout change.
Halo exchange is now the only slow part.
Revisit
2023-12-21 20:50:18 -05:00
Peter Boyle
22c611bd1a Delete temp file 2023-12-21 18:32:31 -05:00
Peter Boyle
c9bb1bf8ea Passing new BLAS based 2023-12-21 18:31:17 -05:00
2a0d75bac2 Aurora files 2023-12-21 23:20:17 +00:00
Peter Boyle
9e489887cf General coarse multiRHS move to BLAS implementation 2023-12-21 15:24:48 -05:00
Peter Boyle
9feb801bb9 Much simpler GPU implementation 2023-12-21 15:24:06 -05:00
Peter Boyle
c00b495933 Multigrid 2023-12-21 15:23:31 -05:00
Peter Boyle
d22eebe553 BLAS options 2023-12-21 15:23:03 -05:00
Peter Boyle
8bcbd82680 BLAS based layout and implementation 2023-12-21 15:21:24 -05:00
Peter Boyle
dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM
Hip, Cuda version and vanilla CPU
One MKL stub in comments, to be tested as different.
2023-12-21 14:01:18 -05:00
Peter Boyle
48d1f0df89 Optimised partially, working 2023-12-21 12:33:47 -05:00
Peter Boyle
b75cb7a12c Blas batched partial implementation on Frontier only for now 2023-12-21 12:31:33 -05:00
Peter Boyle
332563e037 Debugged, reducing verbose 2023-12-21 12:30:57 -05:00
Peter Boyle
0cce97a4fe verbosity only 2023-12-20 21:30:10 -05:00
Peter Boyle
95a8e4be64 rocblas 2023-12-20 21:27:59 -05:00
Peter Boyle
abcd6b8cb6 Faster version 2023-12-19 15:17:46 -05:00
Peter Boyle
e8f21c9b6d Memory verbose control improvement 2023-12-19 15:16:58 -05:00
Peter Boyle
f48298ad4e Bug fix 2023-12-11 20:57:02 -05:00
root
645e47c1ba Config for Ampere Altra ARM 2023-12-08 16:17:56 -05:00
Peter Boyle
d1d9827263 Integrator logging update 2023-12-08 12:14:00 -05:00
Peter Boyle
e054078b11 Verbose 2023-12-05 16:15:17 -05:00
Peter Boyle
14643c0aab SDCC benchmarking scripts for A100 nodes and IceLake nodes (AVX512) 2023-12-04 15:45:57 -05:00
Peter Boyle
b77a9b8947 SDCC compiles starting 2023-11-30 14:31:51 -05:00
Peter Boyle
6835a7f208 Better logging, test on 81 point stencil 2023-11-29 19:20:47 -05:00
Peter Boyle
f59993b979 Nbasis 2023-11-29 09:47:36 -05:00
Peter Boyle
2290b8f680 Verbose 2023-11-29 09:47:04 -05:00
Peter Boyle
2c54be651c Further updates 2023-11-29 09:43:29 -05:00
Peter Boyle
e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain 2023-11-28 10:23:16 -05:00
Peter Boyle
0a3682ad0b MultiRHS work 2023-11-28 07:43:37 -05:00
Peter Boyle
59abaeb5cd Time stamp 2023-11-24 12:56:45 -05:00
Peter Boyle
3e448435d3 Restrict to interior 2023-11-23 18:23:29 -05:00
Peter Boyle
a294bc3c5b Relax constraints for multiRHS 2023-11-23 18:20:42 -05:00
Peter Boyle
b302ad3d49 multiRHS test in place, passes Yay! 2023-11-23 18:20:15 -05:00
Peter Boyle
82fc4b1e94 Finalise 2023-11-23 18:19:41 -05:00
Peter Boyle
b4f1740380 Finalise message 2023-11-23 18:19:16 -05:00
Peter Boyle
031f85247c multiRHS initial support -- needs optimisation for multi project/promote.
Bug fix in freeing intermediate grids to stop double free
2023-11-23 18:18:35 -05:00
Peter Boyle
639cc6f73a better support for multiRHS coarse space
Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher)
2023-11-23 18:16:26 -05:00
Peter Boyle
09946cf1ba Improved, works on 48^3 moving to multiRHS optimisations 2023-11-15 18:03:05 -05:00