mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 02:01:02 +01:00
Commit Graph

7577 Commits

Author SHA1 Message Date
Peter Boyle b754a152c6 Flag guard correctly 2024-01-17 16:25:28 -05:00
Peter Boyle e07cb2b9de Accelerator memory 2024-01-17 16:24:31 -05:00
Peter Boyle a1f8bbb078 accelerator memory print 2024-01-17 16:24:09 -05:00
Peter Boyle 7909683f3b MultiRHS 2024-01-17 16:21:07 -05:00
Peter Boyle 25f71913b7 MultiRHS coarse 2024-01-04 12:01:17 -05:00
Peter Boyle 34ddd2b7b1 MultiRHS coarse space 2024-01-04 12:00:53 -05:00
Peter Boyle d5fd90b2f3 Add 48^3 rtest 2024-01-04 12:00:01 -05:00
Peter Boyle b7c7000d0d Don't need the numerical rounding tolerance in multigrid 2023-12-22 18:10:23 -05:00
Peter Boyle 551f6c4edd Synchronise changes 2023-12-22 18:09:11 -05:00
Peter Boyle defd814750 Speed up the coarsened matrix matrix evaluation. It is block project limited. Could be sped up with calls to Batched GEMM and a data layout change. 2023-12-22 18:07:03 -05:00
Peter Boyle 3d517bbd2a Synchronise decouple from the launch. Speeds up multileg stencils. 2023-12-22 18:06:13 -05:00
Peter Boyle 78ab955fec Better padded cell exchange 2023-12-22 18:05:41 -05:00
Peter Boyle dd13937bb6 Better opt face gather scatter 2023-12-22 18:03:38 -05:00
Peter Boyle 66a1b63aa9 Faster grid/blas layout change. Halo exchange is now the only slow part. Revisit. 2023-12-21 20:50:18 -05:00
Peter Boyle 22c611bd1a Delete temp file 2023-12-21 18:32:31 -05:00
Peter Boyle c9bb1bf8ea Passing new BLAS based 2023-12-21 18:31:17 -05:00
Peter Boyle 9e489887cf General coarse multiRHS move to BLAS implementation 2023-12-21 15:24:48 -05:00
Peter Boyle 9feb801bb9 Much simpler GPU implementation 2023-12-21 15:24:06 -05:00
Peter Boyle c00b495933 Multigrid 2023-12-21 15:23:31 -05:00
Peter Boyle d22eebe553 BLAS options 2023-12-21 15:23:03 -05:00
Peter Boyle 8bcbd82680 BLAS based layout and implementation 2023-12-21 15:21:24 -05:00
Peter Boyle dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM. Hip, Cuda version and vanilla CPU. One MKL stub in comments, to be tested as different. 2023-12-21 14:01:18 -05:00
Peter Boyle 48d1f0df89 Optimised partially, working 2023-12-21 12:33:47 -05:00
Peter Boyle b75cb7a12c Blas batched partial implementation on Frontier only for now 2023-12-21 12:31:33 -05:00
Peter Boyle 332563e037 Debugged, reducing verbose 2023-12-21 12:30:57 -05:00
Peter Boyle 0cce97a4fe verbosity only 2023-12-20 21:30:10 -05:00
Peter Boyle 95a8e4be64 rocblas 2023-12-20 21:27:59 -05:00
Peter Boyle abcd6b8cb6 Faster version 2023-12-19 15:17:46 -05:00
Peter Boyle e8f21c9b6d Memory verbose control improvement 2023-12-19 15:16:58 -05:00
Peter Boyle e054078b11 Verbose 2023-12-05 16:15:17 -05:00
Peter Boyle 6835a7f208 Better logging, test on 81 point stencil 2023-11-29 19:20:47 -05:00
Peter Boyle f59993b979 Nbasis 2023-11-29 09:47:36 -05:00
Peter Boyle 2290b8f680 Verbose 2023-11-29 09:47:04 -05:00
Peter Boyle 2c54be651c Further updates 2023-11-29 09:43:29 -05:00
Peter Boyle e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain 2023-11-28 10:23:16 -05:00
Peter Boyle 0a3682ad0b MultiRHS work 2023-11-28 07:43:37 -05:00
Peter Boyle 59abaeb5cd Time stamp 2023-11-24 12:56:45 -05:00
Peter Boyle 3e448435d3 Restrict to interior 2023-11-23 18:23:29 -05:00
Peter Boyle a294bc3c5b Relax constraints for multiRHS 2023-11-23 18:20:42 -05:00
Peter Boyle b302ad3d49 multiRHS test in place, passes Yay! 2023-11-23 18:20:15 -05:00
Peter Boyle 82fc4b1e94 Finalise 2023-11-23 18:19:41 -05:00
Peter Boyle b4f1740380 Finalise message 2023-11-23 18:19:16 -05:00
Peter Boyle 031f85247c multiRHS initial support -- needs optimisation for multi project/promote. Bug fix in freeing intermediate grids to stop double free. 2023-11-23 18:18:35 -05:00
Peter Boyle 639cc6f73a better support for multiRHS coarse space. Still to add restriction of domain of last loop to interior of padded cell (expect about 4.5x on test volume on Crusher). 2023-11-23 18:16:26 -05:00
Peter Boyle 09946cf1ba Improved, works on 48^3 moving to multiRHS optimisations 2023-11-15 18:03:05 -05:00
Peter Boyle f4fa95e7cb Use 5.3.0 2023-11-15 18:01:38 -05:00
Peter Boyle 100e29e35e Allow expression as argument to norm2 2023-11-15 18:00:44 -05:00
Peter Boyle 4cbe471a83 devVector 2023-11-15 18:00:07 -05:00
Peter Boyle 8bece1f861 Faster to transpose the matrix and apply with column major order 2023-11-15 17:58:38 -05:00
Peter Boyle a3ca71ec01 Lots more setup options, still working on them 2023-11-15 17:58:04 -05:00