Peter Boyle
							
						 
					 | 
					
						
						
							
						
						b2b5137d28
					 | 
					
						
						
							
							Finally starting to get decent performance on Volta
						
						
						
						
						
						
					 | 
					
						2018-07-13 12:06:18 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						2cc07450f4
					 | 
					
						
						
							
							Fastest option for the dslash
						
						
						
						
						
						
					 | 
					
						2018-07-05 09:57:55 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						c0e8bc9da9
					 | 
					
						
						
							
							Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:10:25 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						b1265ae867
					 | 
					
						
						
							
							Prettify code
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:08:06 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						32bb85ea4c
					 | 
					
						
						
							
							Standard extractLane is fast
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:07:30 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						ca0607b6ef
					 | 
					
						
						
							
							Clearer kernel call meaning
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:06:15 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						19b527e83f
					 | 
					
						
						
							
							Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex
						
						
					 | 
					
						2018-07-05 07:05:13 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						4730d4692a
					 | 
					
						
						
							
							Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:03:33 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						1bb456c0c5
					 | 
					
						
						
							
							Minor GPU vector width change
						
						
						
						
						
						
					 | 
					
						2018-07-05 07:02:04 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						4b04ae3611
					 | 
					
						
						
							
							Printing improvement
						
						
						
						
						
						
					 | 
					
						2018-07-05 06:59:38 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						2f776d51c6
					 | 
					
						
						
							
							GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions, but it is a bit of (known) work.
						
						
					 | 
					
						2018-07-05 06:58:37 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						3a50afe7e7
					 | 
					
						
						
							
							GPU dslash updates
						
						
						
						
						
						
					 | 
					
						2018-06-27 22:32:21 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						f8e880b445
					 | 
					
						
						
							
							Loop for s and xyzt offlow
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:49:57 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						3e947527cb
					 | 
					
						
						
							
							Move looping over "s" and "site" into kernels for GPU optimisation
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:29:43 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						31f65beac8
					 | 
					
						
						
							
							Move site and Ls looping into the kernels
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:28:48 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						38e2a32ac9
					 | 
					
						
						
							
							Single SIMD lane operations for CUDA
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:28:06 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						efa84ca50a
					 | 
					
						
						
							
							Keep CUDA 9.1 happy
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:27:32 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						5e96d6d04c
					 | 
					
						
						
							
							Keep CUDA happy
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:27:11 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						df30bdc599
					 | 
					
						
						
							
							CUDA happy
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:26:49 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						7f45222924
					 | 
					
						
						
							
							Diagnostics on memory alloc fail
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:26:20 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						dd891f5e3b
					 | 
					
						
						
							
							Use NVCC to suppress device Eigen
						
						
						
						
						
						
					 | 
					
						2018-06-27 21:25:17 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						6c97a6a071
					 | 
					
						
						
							
							Coalescing version of the kernel
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:52:29 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						73bb2d5128
					 | 
					
						
						
							
							Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:35:28 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						b710fec6ea
					 | 
					
						
						
							
							GPU code: first version of specialised kernel
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:34:39 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						b2a8cd60f5
					 | 
					
						
						
							
							Doubled gauge field is useful
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:27:47 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						867ee364ab
					 | 
					
						
						
							
							Explicit instantiation hooks
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:27:12 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						25becc9324
					 | 
					
						
						
							
							GPU tweaks for benchmarking; really necessary?
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:26:07 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						94d1ae4c82
					 | 
					
						
						
							
							Some prep work for GPU shared memory. Need to be careful, as will try GPU direct RDMA and inter-GPU memory sharing on Summit later
						
						
					 | 
					
						2018-06-13 20:24:06 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						2075b177ef
					 | 
					
						
						
							
							CUDA_ARCH more careful treatment
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:22:34 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						847c761ccc
					 | 
					
						
						
							
							Move software IEEE fp16 into central location
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:22:01 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						8287ed8383
					 | 
					
						
						
							
							New GPU vector targets
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:21:35 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						e6be7416f4
					 | 
					
						
						
							
							Use managed memory
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:14:00 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						26863b6d95
					 | 
					
						
						
							
							User Managed memory
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:13:42 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						ebd730bd54
					 | 
					
						
						
							
							Adding 2D loops
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:13:01 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						066be31a3b
					 | 
					
						
						
							
							Optional GPU target SIMD types; work in progress and trying experiments
						
						
						
						
						
						
					 | 
					
						2018-06-13 20:07:55 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								paboyle
							
						 
					 | 
					
						
						
							
						
						7a4c142955
					 | 
					
						
						
							
							Add GPU specific simd targets
						
						
						
						
						
						
					 | 
					
						2018-06-13 19:55:30 +01:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						eb7d34a4cc
					 | 
					
						
						
							
							GPU version
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:41:47 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						aab27a655a
					 | 
					
						
						
							
							Start of GPU kernels
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:41:17 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						93280bae85
					 | 
					
						
						
							
							GPU option
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:40:58 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						c5f93abcd7
					 | 
					
						
						
							
							GPU clean up
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:40:33 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						d5deef782d
					 | 
					
						
						
							
							Useful debug comments
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:39:52 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						5f50473c0d
					 | 
					
						
						
							
							Clean up
						
						
						
						
						
						
					 | 
					
						2018-05-14 19:39:11 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						13f50406e3
					 | 
					
						
						
							
							Suppress print statement
						
						
						
						
						
						
					 | 
					
						2018-05-12 18:00:00 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						09cd46d337
					 | 
					
						
						
							
							Lane by Lane operation
						
						
						
						
						
						
					 | 
					
						2018-05-12 17:59:35 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						d3f51065c2
					 | 
					
						
						
							
							Give command line control of blocks/threads split
						
						
						
						
						
						
					 | 
					
						2018-05-12 17:58:56 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						925ac4173d
					 | 
					
						
						
							
							Thread count control for warp scheduler thingy doodaa thing
						
						
						
						
						
						
					 | 
					
						2018-05-12 17:58:22 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						eb921041d0
					 | 
					
						
						
							
							Perf count control
						
						
						
						
						
						
					 | 
					
						2018-05-12 17:57:32 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						87c5c0271b
					 | 
					
						
						
							
							Fixing Eigen
						
						
						
						
						
						
					 | 
					
						2018-04-16 19:08:07 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						a3f5a13591
					 | 
					
						
						
							
							Better Eigen handling
						
						
						
						
						
						
					 | 
					
						2018-04-16 18:02:55 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 | 
				
			
				
					
						
							
							
								 
								Peter Boyle
							
						 
					 | 
					
						
						
							
						
						9fe28f00eb
					 | 
					
						
						
							
							Eigen symlink off head revision
						
						
						
						
						
						
					 | 
					
						2018-04-16 17:54:46 -04:00 | 
					
					
						
						
						
							
							
							
							
							
							
						
					 |