1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-09 23:45:36 +00:00
Commit Graph

7585 Commits

Author SHA1 Message Date
Peter Boyle
3d13fd56c5 Precompute phases, save memory in hermitian 2024-01-22 17:43:35 -05:00
Peter Boyle
6f51b49ef8 Use stderr 2024-01-22 17:41:09 -05:00
Peter Boyle
addc638856 Fast localCopyRegion, blockProjectFast 2024-01-22 17:40:38 -05:00
Peter Boyle
42ae36bc28 WOrking 2024-01-17 16:39:14 -05:00
Peter Boyle
c69f73ff9f Working 2024-01-17 16:38:46 -05:00
Peter Boyle
ca5ae8a2e6 Revert to working. 2024-01-17 16:32:05 -05:00
Peter Boyle
d967eb53de Working for first time 2024-01-17 16:31:12 -05:00
Peter Boyle
839f9f1bbe Don't log memory by default 2024-01-17 16:25:50 -05:00
Peter Boyle
b754a152c6 Flag guard correctly 2024-01-17 16:25:28 -05:00
Peter Boyle
e07cb2b9de Accelerator memory 2024-01-17 16:24:31 -05:00
Peter Boyle
a1f8bbb078 accelerator memory print 2024-01-17 16:24:09 -05:00
Peter Boyle
7909683f3b MultiRHS 2024-01-17 16:21:07 -05:00
Peter Boyle
25f71913b7 MultiRHS coarse 2024-01-04 12:01:17 -05:00
Peter Boyle
34ddd2b7b1 MultiRHS coarse space 2024-01-04 12:00:53 -05:00
Peter Boyle
d5fd90b2f3 Add 48^3 rtest 2024-01-04 12:00:01 -05:00
Peter Boyle
b7c7000d0d Don't need the numerical rounding tolerance in multigrid 2023-12-22 18:10:23 -05:00
Peter Boyle
551f6c4edd Synchronise changes 2023-12-22 18:09:11 -05:00
Peter Boyle
defd814750 Speed up the coarsened matrix matrix evaluation.
It is block project limited.
Could be sped up with calls to Batched GEMM and a data layout change.
2023-12-22 18:07:03 -05:00
Peter Boyle
3d517bbd2a Synchronise decouple from the launch
Speeds up multileg stencils
2023-12-22 18:06:13 -05:00
Peter Boyle
78ab955fec Better padded cell exchange 2023-12-22 18:05:41 -05:00
Peter Boyle
dd13937bb6 Better opt face gather scatter 2023-12-22 18:03:38 -05:00
Peter Boyle
66a1b63aa9 Faster grid/blas layout change.
Halo exchange is now the only slow part.
Revisit
2023-12-21 20:50:18 -05:00
Peter Boyle
22c611bd1a Delete temp file 2023-12-21 18:32:31 -05:00
Peter Boyle
c9bb1bf8ea Passing new BLAs based 2023-12-21 18:31:17 -05:00
Peter Boyle
9e489887cf General coarse multiRHS move to BLAS implementation 2023-12-21 15:24:48 -05:00
Peter Boyle
9feb801bb9 Much simpler GPU implementation 2023-12-21 15:24:06 -05:00
Peter Boyle
c00b495933 Multigrid 2023-12-21 15:23:31 -05:00
Peter Boyle
d22eebe553 BLas options 2023-12-21 15:23:03 -05:00
Peter Boyle
8bcbd82680 BLAS based layout and implementation 2023-12-21 15:21:24 -05:00
Peter Boyle
dfa617c439 Batched SGEMM/DGEMM/ZGEMM/CGEMM
Hip, Cuda version and vanilla CPU
One MKL stub in comments, to be tested as different.
2023-12-21 14:01:18 -05:00
Peter Boyle
48d1f0df89 Optimised partially, working 2023-12-21 12:33:47 -05:00
Peter Boyle
b75cb7a12c Blas batched partial implementation on Frontier only for now 2023-12-21 12:31:33 -05:00
Peter Boyle
332563e037 Debugged, reducing verbose 2023-12-21 12:30:57 -05:00
Peter Boyle
0cce97a4fe verbosity only 2023-12-20 21:30:10 -05:00
Peter Boyle
95a8e4be64 rocblas 2023-12-20 21:27:59 -05:00
Peter Boyle
abcd6b8cb6 Faster version 2023-12-19 15:17:46 -05:00
Peter Boyle
e8f21c9b6d Memmory verbose control improvement 2023-12-19 15:16:58 -05:00
Peter Boyle
e054078b11 Verbose 2023-12-05 16:15:17 -05:00
Peter Boyle
6835a7f208 Better logging, test on 81 point stencil 2023-11-29 19:20:47 -05:00
Peter Boyle
f59993b979 Nbasis§ 2023-11-29 09:47:36 -05:00
Peter Boyle
2290b8f680 Verbose 2023-11-29 09:47:04 -05:00
Peter Boyle
2c54be651c Further updates 2023-11-29 09:43:29 -05:00
Peter Boyle
e859a199df Reduce volume to interior for coarse stencil -- worth up to 4x gain 2023-11-28 10:23:16 -05:00
Peter Boyle
0a3682ad0b MultiRHS work 2023-11-28 07:43:37 -05:00
Peter Boyle
59abaeb5cd Time stamp 2023-11-24 12:56:45 -05:00
Peter Boyle
3e448435d3 Restrict to interior 2023-11-23 18:23:29 -05:00
Peter Boyle
a294bc3c5b Relax constraints for multiRHS 2023-11-23 18:20:42 -05:00
Peter Boyle
b302ad3d49 multiRHS test in place, passes Yay! 2023-11-23 18:20:15 -05:00
Peter Boyle
82fc4b1e94 Finalise 2023-11-23 18:19:41 -05:00
Peter Boyle
b4f1740380 Finalise message 2023-11-23 18:19:16 -05:00