| 
							
							
								 dbollweg | 0a816b5509 | Merge branch 'feature/sliceSum_gpu' of https://github.com/dbollweg/Grid into feature/sliceSum_gpu | 2024-02-22 21:43:06 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | 1c8b807c2e | free malloc'd memory | 2024-02-22 21:42:44 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 44b466e072 | Make InsertSliceFast the default at some point in future. Should I do this now? | 2024-02-21 14:51:24 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 5e5b471bb2 | Put/Get and DeviceToDevice | 2024-02-21 14:47:06 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 9c2565f64e | Working and faster version | 2024-02-21 14:46:43 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | e1d0a7cec3 | Batched blas | 2024-02-21 14:38:20 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | b19ae8f465 | Nbasis method for convenience | 2024-02-21 14:36:19 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | cdff2c8e18 | Updated mrhs adef | 2024-02-21 14:27:19 -05:00 |  | 
			
				
					| 
							
							
								 Christoph Lehner | 66391f84f2 | Merge branch 'feature/gpt' of ../Grid into develop | 2024-02-21 19:05:00 +01:00 |  | 
			
				
					|  | 97f7a9ecb3 | fix HMC for non-fundamental representations | 2024-02-21 08:27:55 +00:00 |  | 
			
				
					| 
							
							
								 Dennis Bollweg | 15878f7613 | sliceSumReduction_cub_large now also faster than CPU on Frontier | 2024-02-16 13:55:21 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | e0d5e3c6c7 | Merge branch 'paboyle:develop' into feature/sliceSum_gpu | 2024-02-16 13:16:37 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | 6f3455900e | Adding sliceSumReduction_cub_small/large since hipcub cannot deal with arb. large vobjs | 2024-02-16 13:15:02 -05:00 |  | 
			
				
					| 
							
							
								 david clarke | 56827d6ad6 | accelerator_inline bug | 2024-02-14 13:56:57 -07:00 |  | 
			
				
					|  | 73c0b29535 | Merge branch 'develop' of https://github.com/paboyle/Grid into develop | 2024-02-13 20:19:32 +00:00 |  | 
			
				
					|  | 303b83cdb8 | Scaling benchmarks, verbosity and MPICH aware in acceleratorInit(). For some reason Dirichlet benchmark fails on several nodes; need to debug this. | 2024-02-13 19:48:03 +00:00 |  | 
			
				
					|  | 5ef4da3f29 | Silence verbose | 2024-02-13 19:47:36 +00:00 |  | 
			
				
					|  | 1502860004 | Benchmark scripts | 2024-02-13 19:47:02 +00:00 |  | 
			
				
					|  | 585efc6f3f | More benchmark scripts | 2024-02-13 19:40:49 +00:00 |  | 
			
				
					|  | 62055e04dd | missing semicolon generates error with some compilers | 2024-02-13 18:18:27 +01:00 |  | 
			
				
					|  | e4a641b64e | removing old Eigen tensor patch | 2024-02-13 10:37:14 +01:00 |  | 
			
				
					|  | 8849f187f1 | updating Eigen to 3.4.0 | 2024-02-13 10:30:22 +01:00 |  | 
			
				
					| 
							
							
								 david clarke | db420525b3 | fix Simd::Nsimd typo | 2024-02-12 15:03:53 -07:00 |  | 
			
				
					| 
							
							
								 dbollweg | b5659d106e | more test cases | 2024-02-09 13:37:14 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | 4b43307402 | Undo include path changes for level zero api header | 2024-02-09 13:07:56 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | 09af8c25a2 | Merge branch 'paboyle:develop' into feature/sliceSum_gpu | 2024-02-09 13:02:59 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | 9514035b87 | refactor slicesum: slicesum uses GPU version by default now | 2024-02-09 13:02:28 -05:00 |  | 
			
				
					| 
							
							
								 david clarke | 2da09ae99b | acceleration compiles and doesn't break scalar mode | 2024-02-06 18:40:13 -07:00 |  | 
			
				
					| 
							
							
								 david clarke | a38fb0e04a | first effort toward accelerators | 2024-02-06 18:24:55 -07:00 |  | 
			
				
					|  | 7019916294 | RNG seed change safer for large volumes; this is a long term solution | 2024-02-07 00:56:39 +00:00 |  | 
			
				
					| 
							
							
								 dbollweg | 1514b4f137 | slicesum_sycl passes test | 2024-02-06 19:08:44 -05:00 |  | 
			
				
					|  | 91cf5ee312 | Updated bench script | 2024-02-06 23:45:10 +00:00 |  | 
			
				
					| 
							
							
								 david clarke | 0a6e2f42c5 | small amount of cleanup | 2024-02-06 16:32:07 -07:00 |  | 
			
				
					| 
							
							
								 dbollweg | ab2de131bd | work towards sliceSum for sycl backend | 2024-02-06 13:24:45 -05:00 |  | 
			
				
					|  | 5bfa88be85 | Aurora MPI standalone benchmark and options that work well | 2024-02-06 16:28:40 +00:00 |  | 
			
				
					| 
							
							
								 Dennis Bollweg | 5af8da76d7 | Fix cuda compilation of Lattice_slicesum_gpu.h | 2024-02-01 18:02:30 -05:00 |  | 
			
				
					| 
							
							
								 Dennis Bollweg | b8b9dc952d | Async memcpy's and cleanup | 2024-02-01 17:55:35 -05:00 |  | 
			
				
					| 
							
							
								 Dennis Bollweg | 79a6ed32d8 | Use accelerator_for2d and DeviceSegmentedReduce to avoid kernel launch latencies | 2024-02-01 16:41:03 -05:00 |  | 
			
				
					| 
							
							
								 dbollweg | caa5f97723 | Add sliceSum gpu using cub/hipcub | 2024-01-31 16:50:06 -05:00 |  | 
			
				
					| 
							
							
								 david clarke | 4924b3209e | projectU3 yields a unitary matrix | 2024-01-23 14:43:58 -07:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | eb702f581b | Running on 12 rhs on 18 nodes of frontier | 2024-01-22 17:44:15 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 3d13fd56c5 | Precompute phases, save memory in hermitian | 2024-01-22 17:43:35 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 6f51b49ef8 | Use stderr | 2024-01-22 17:41:09 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | addc638856 | Fast localCopyRegion, blockProjectFast | 2024-01-22 17:40:38 -05:00 |  | 
			
				
					| 
							
							
								 david clarke | 00f24f8765 | already found some bugs in projection, still needs testing | 2024-01-22 05:50:16 -07:00 |  | 
			
				
					| 
							
							
								 david clarke | f5b3d582b0 | first attempt at U3 projection | 2024-01-22 02:49:40 -07:00 |  | 
			
				
					| 
							
							
								 david clarke | 981c93d67a | update Test_fatLinks to accept Naik | 2024-01-21 21:09:19 -07:00 |  | 
			
				
					| 
							
							
								 david clarke | c020b78e02 | Merge branch 'develop' into hisq_fat_links | 2024-01-21 20:21:08 -07:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 42ae36bc28 | Working | 2024-01-17 16:39:14 -05:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | c69f73ff9f | Working | 2024-01-17 16:38:46 -05:00 |  |