Peter Boyle | 7909683f3b | MultiRHS | 2024-01-17 16:21:07 -05:00
Peter Boyle | 25f71913b7 | MultiRHS coarse | 2024-01-04 12:01:17 -05:00
Peter Boyle | 34ddd2b7b1 | MultiRHS coarse space | 2024-01-04 12:00:53 -05:00
Peter Boyle | d5fd90b2f3 | Add 48^3 test | 2024-01-04 12:00:01 -05:00
Peter Boyle | b7c7000d0d | Don't need the numerical rounding tolerance in multigrid | 2023-12-22 18:10:23 -05:00
Peter Boyle | 551f6c4edd | Synchronise changes | 2023-12-22 18:09:11 -05:00
Peter Boyle | defd814750 | Speed up the coarsened matrix-matrix evaluation. It is limited by block projection. Could be sped up with calls to batched GEMM and a data layout change. | 2023-12-22 18:07:03 -05:00
Peter Boyle | 3d517bbd2a | Decouple synchronise from the launch. Speeds up multileg stencils | 2023-12-22 18:06:13 -05:00
Peter Boyle | 78ab955fec | Better padded cell exchange | 2023-12-22 18:05:41 -05:00
Peter Boyle | dd13937bb6 | Better optimised face gather/scatter | 2023-12-22 18:03:38 -05:00
Peter Boyle | 66a1b63aa9 | Faster Grid/BLAS layout change. Halo exchange is now the only slow part; revisit. | 2023-12-21 20:50:18 -05:00
Peter Boyle | 22c611bd1a | Delete temp file | 2023-12-21 18:32:31 -05:00
Peter Boyle | c9bb1bf8ea | Passing; new BLAS-based | 2023-12-21 18:31:17 -05:00
Peter Boyle | 9e489887cf | General coarse multiRHS: move to BLAS implementation | 2023-12-21 15:24:48 -05:00
Peter Boyle | 9feb801bb9 | Much simpler GPU implementation | 2023-12-21 15:24:06 -05:00
Peter Boyle | c00b495933 | Multigrid | 2023-12-21 15:23:31 -05:00
Peter Boyle | d22eebe553 | BLAS options | 2023-12-21 15:23:03 -05:00
Peter Boyle | 8bcbd82680 | BLAS-based layout and implementation | 2023-12-21 15:21:24 -05:00
Peter Boyle | dfa617c439 | Batched SGEMM/DGEMM/ZGEMM/CGEMM: HIP and CUDA versions and a vanilla CPU version. One MKL stub left in comments, to be tested separately. | 2023-12-21 14:01:18 -05:00
Peter Boyle | 48d1f0df89 | Partially optimised, working | 2023-12-21 12:33:47 -05:00
Peter Boyle | b75cb7a12c | Partial batched BLAS implementation, on Frontier only for now | 2023-12-21 12:31:33 -05:00
Peter Boyle | 332563e037 | Debugged, reducing verbosity | 2023-12-21 12:30:57 -05:00
Peter Boyle | 0cce97a4fe | Verbosity only | 2023-12-20 21:30:10 -05:00
Peter Boyle | 95a8e4be64 | rocBLAS | 2023-12-20 21:27:59 -05:00
Peter Boyle | abcd6b8cb6 | Faster version | 2023-12-19 15:17:46 -05:00
Peter Boyle | e8f21c9b6d | Improved memory verbosity control | 2023-12-19 15:16:58 -05:00
Peter Boyle | e054078b11 | Verbose | 2023-12-05 16:15:17 -05:00
Peter Boyle | 6835a7f208 | Better logging, test on 81-point stencil | 2023-11-29 19:20:47 -05:00
Peter Boyle | f59993b979 | Nbasis | 2023-11-29 09:47:36 -05:00
Peter Boyle | 2290b8f680 | Verbose | 2023-11-29 09:47:04 -05:00
Peter Boyle | 2c54be651c | Further updates | 2023-11-29 09:43:29 -05:00
Peter Boyle | e859a199df | Reduce volume to interior for coarse stencil; worth up to 4x gain | 2023-11-28 10:23:16 -05:00
Peter Boyle | 0a3682ad0b | MultiRHS work | 2023-11-28 07:43:37 -05:00
Peter Boyle | 59abaeb5cd | Time stamp | 2023-11-24 12:56:45 -05:00
Peter Boyle | 3e448435d3 | Restrict to interior | 2023-11-23 18:23:29 -05:00
Peter Boyle | a294bc3c5b | Relax constraints for multiRHS | 2023-11-23 18:20:42 -05:00
Peter Boyle | b302ad3d49 | multiRHS test in place, passes. Yay! | 2023-11-23 18:20:15 -05:00
Peter Boyle | 82fc4b1e94 | Finalise | 2023-11-23 18:19:41 -05:00
Peter Boyle | b4f1740380 | Finalise message | 2023-11-23 18:19:16 -05:00
Peter Boyle | 031f85247c | multiRHS initial support; needs optimisation for multi project/promote. Bug fix in freeing intermediate grids to prevent a double free | 2023-11-23 18:18:35 -05:00
Peter Boyle | 639cc6f73a | Better support for multiRHS coarse space. Still to add: restriction of the domain of the last loop to the interior of the padded cell (expect about 4.5x on test volume on Crusher) | 2023-11-23 18:16:26 -05:00
Peter Boyle | 09946cf1ba | Improved, works on 48^3; moving to multiRHS optimisations | 2023-11-15 18:03:05 -05:00
Peter Boyle | f4fa95e7cb | Use 5.3.0 | 2023-11-15 18:01:38 -05:00
Peter Boyle | 100e29e35e | Allow an expression as argument to norm2 | 2023-11-15 18:00:44 -05:00
Peter Boyle | 4cbe471a83 | devVector | 2023-11-15 18:00:07 -05:00
Peter Boyle | 8bece1f861 | Faster to transpose the matrix and apply in column-major order | 2023-11-15 17:58:38 -05:00
Peter Boyle | a3ca71ec01 | Lots more setup options, still working on them | 2023-11-15 17:58:04 -05:00
Peter Boyle | e0543e8af5 | Implement flexible preconditioned CG | 2023-11-15 17:57:39 -05:00
Peter Boyle | c1eb80d01a | Print which have converged | 2023-11-15 17:57:08 -05:00
Peter Boyle | a26121d97b | Better printing | 2023-11-15 17:56:45 -05:00