| dbollweg | 3c9012676a | CUDA cub refuses to reduce vSpinColourMatrix, breaking up into smaller parts like already done for HIP case. | 2024-02-27 12:41:45 -05:00 |
| Peter Boyle | ee3b3c4c56 | relocate deflation support | 2024-02-27 11:52:23 -05:00 |
| Peter Boyle | 462d706a63 | Move to a blas directory | 2024-02-27 11:51:04 -05:00 |
| Peter Boyle | ee0d460c8e | Blas based block project & deflate for multiRHS | 2024-02-27 11:41:44 -05:00 |
| Peter Boyle | cd15abe9d1 | Mrhs prep | 2024-02-27 11:41:13 -05:00 |
| Peter Boyle | 9f40467e24 | Warning squash | 2024-02-27 11:40:36 -05:00 |
| Peter Boyle | d0b6593823 | More verbose on checksum | 2024-02-27 11:40:14 -05:00 |
| Peter Boyle | 79fc821d8d | reorg headers | 2024-02-27 11:39:37 -05:00 |
| Peter Boyle | d7fdb9a7e6 | Reorg headers | 2024-02-27 11:39:06 -05:00 |
| Peter Boyle | b74de51c18 | Reorder headers | 2024-02-27 11:38:52 -05:00 |
| Dennis Bollweg | b507fe209c | Added SpinColourMatrix case to sliceSum Test | 2024-02-27 11:28:32 -05:00 |
| Dennis Bollweg | 6cd2d8fcd5 | Replace cuda/hip memcpy with Grid functions | 2024-02-26 09:55:07 -05:00 |
| david clarke | b02d022993 | fixed race condition (thx michael) | 2024-02-23 17:14:28 -07:00 |
| david clarke | 94581e3c7a | accelerator_for is broken | 2024-02-23 15:58:33 -07:00 |
| david clarke | 88b52cc045 | Merge branch 'develop' into hisq_fat_links | 2024-02-23 14:47:15 -07:00 |
| dbollweg | 0a816b5509 | Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu | 2024-02-22 21:43:06 -05:00 |
| dbollweg | 1c8b807c2e | free malloc'd memory | 2024-02-22 21:42:44 -05:00 |
| Peter Boyle | 44b466e072 | Make InsertSliceFast the default at some point in future. Should I do this now? | 2024-02-21 14:51:24 -05:00 |
| Peter Boyle | 5e5b471bb2 | Put/Get and DEviceToDevice | 2024-02-21 14:47:06 -05:00 |
| Peter Boyle | 9c2565f64e | Working and faster version | 2024-02-21 14:46:43 -05:00 |
| Peter Boyle | e1d0a7cec3 | Batched blas | 2024-02-21 14:38:20 -05:00 |
| Peter Boyle | b19ae8f465 | Nbasis method for convenience | 2024-02-21 14:36:19 -05:00 |
| Peter Boyle | cdff2c8e18 | Updated mrhs adef | 2024-02-21 14:27:19 -05:00 |
| Christoph Lehner | 66391f84f2 | Merge branch 'feature/gpt' of ../Grid into develop | 2024-02-21 19:05:00 +01:00 |
|  | 97f7a9ecb3 | fix HMC for non-fundamental representations | 2024-02-21 08:27:55 +00:00 |
| Dennis Bollweg | 15878f7613 | sliceSumReduction_cub_large now also faster than CPU on Frontier | 2024-02-16 13:55:21 -05:00 |
| dbollweg | e0d5e3c6c7 | Merge branch 'paboyle:develop' into feature/sliceSum_gpu | 2024-02-16 13:16:37 -05:00 |
| dbollweg | 6f3455900e | Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs | 2024-02-16 13:15:02 -05:00 |
| david clarke | 56827d6ad6 | accelerator_inline bug | 2024-02-14 13:56:57 -07:00 |
|  | 73c0b29535 | Merge branch 'develop' of https://github.com/paboyle/Grid into develop | 2024-02-13 20:19:32 +00:00 |
|  | 303b83cdb8 | Scaling benchmarks, verbosity and MPICH aware in acceleratorInit() For some reason Dirichlet benchmark fails on several nodes; need to debug this. | 2024-02-13 19:48:03 +00:00 |
|  | 5ef4da3f29 | Silence verbose | 2024-02-13 19:47:36 +00:00 |
|  | 1502860004 | Benchmark scripts | 2024-02-13 19:47:02 +00:00 |
|  | 585efc6f3f | More benchmark scripts | 2024-02-13 19:40:49 +00:00 |
|  | 62055e04dd | missing semicolon generates error with some compilers | 2024-02-13 18:18:27 +01:00 |
|  | e4a641b64e | removing old Eigen tensor patch | 2024-02-13 10:37:14 +01:00 |
|  | 8849f187f1 | updating Eigen to 3.4.0 | 2024-02-13 10:30:22 +01:00 |
| david clarke | db420525b3 | fix Simd::Nsimd typo | 2024-02-12 15:03:53 -07:00 |
| dbollweg | b5659d106e | more test cases | 2024-02-09 13:37:14 -05:00 |
| dbollweg | 4b43307402 | Undo include path changes for level zero api header | 2024-02-09 13:07:56 -05:00 |
| dbollweg | 09af8c25a2 | Merge branch 'paboyle:develop' into feature/sliceSum_gpu | 2024-02-09 13:02:59 -05:00 |
| dbollweg | 9514035b87 | refactor slicesum: slicesum uses GPU version by default now | 2024-02-09 13:02:28 -05:00 |
| david clarke | 2da09ae99b | acceleration compiles and doesn't break scalar mode | 2024-02-06 18:40:13 -07:00 |
| david clarke | a38fb0e04a | first effort toward accelerators | 2024-02-06 18:24:55 -07:00 |
|  | 7019916294 | RNG seed change safer for large volumes; this is a long term solution | 2024-02-07 00:56:39 +00:00 |
| dbollweg | 1514b4f137 | slicesum_sycl passes test | 2024-02-06 19:08:44 -05:00 |
|  | 91cf5ee312 | Updated bench script | 2024-02-06 23:45:10 +00:00 |
| david clarke | 0a6e2f42c5 | small amount of cleanup | 2024-02-06 16:32:07 -07:00 |
| dbollweg | ab2de131bd | work towards sliceSum for sycl backend | 2024-02-06 13:24:45 -05:00 |
|  | 5bfa88be85 | Aurora MPI standalone benchmake and options that work well | 2024-02-06 16:28:40 +00:00 |