| 
							
							
								 Peter Boyle | 6d0f1aabb1 | Fix the multi-node path | 2018-09-09 14:27:37 +01:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 28db0631ff | Hack to force 128bit accesses | 2018-07-23 06:10:27 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | b2b5137d28 | Finally starting to get decent performance on Volta | 2018-07-13 12:06:18 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | c0e8bc9da9 | Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume. | 2018-07-05 07:10:25 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | b1265ae867 | Prettify code | 2018-07-05 07:08:06 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 32bb85ea4c | Standard extractLane is fast | 2018-07-05 07:07:30 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | ca0607b6ef | Clearer kernel call meaning | 2018-07-05 07:06:15 -04:00 |  | 
			
				
					| 
							
							
								 paboyle | 3a50afe7e7 | GPU dslash updates | 2018-06-27 22:32:21 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 3e947527cb | Move looping over "s" and "site" into kernels for GPU optimisation | 2018-06-27 21:29:43 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 31f65beac8 | Move site and Ls looping into the kernels | 2018-06-27 21:28:48 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 38e2a32ac9 | Single SIMD lane operations for CUDA | 2018-06-27 21:28:06 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 6c97a6a071 | Coalescing version of the kernel | 2018-06-13 20:52:29 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 73bb2d5128 | Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile | 2018-06-13 20:35:28 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | b710fec6ea | Gpu code first version of specialised kernel | 2018-06-13 20:34:39 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | b2a8cd60f5 | Doubled gauge field is useful | 2018-06-13 20:27:47 +01:00 |  | 
			
				
					| 
							
							
								 paboyle | 867ee364ab | Explicit instantiation hooks | 2018-06-13 20:27:12 +01:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | eb7d34a4cc | GPU version | 2018-05-14 19:41:47 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | aab27a655a | Start of GPU kernels | 2018-05-14 19:41:17 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 13f50406e3 | Suppress print statement | 2018-05-12 18:00:00 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | b15db11c60 | Kernels -> pure static object to enable device execution | 2018-03-24 19:35:20 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | f6077f9d48 | Kernels -> not instantiated otherwise object ref on GPU | 2018-03-24 19:33:44 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 572954ef12 | Kernels not an instantiated object, just static | 2018-03-24 19:33:13 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | cedeaae7db | Lebesgue -> StencilView if necessary | 2018-03-24 19:32:41 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | e6cf0b1e17 | View typedefs go to OperatorImpl | 2018-03-24 19:32:11 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 1f70cedbab | Have to make all kernel called routines static since object reference will be a host pointer on GPU | 2018-03-24 19:29:26 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 4e1272fabf | Kernels need to be static to work on GPU. No reference to host resident data | 2018-03-22 18:44:53 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 607dc2d3c6 | Remove lebesgue order | 2018-03-22 18:23:09 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 23c880b009 | Remove lebesgue order; stick in stencil if need | 2018-03-22 18:13:41 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 334bb6792f | Lebesgue order removed. Stick in the stencil view | 2018-03-22 18:12:12 -04:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 8a1d303ab9 | GPU friendly stencil improvements | 2018-03-19 07:11:03 -04:00 |  | 
			
				
					| 
							
							
								 paboyle | 4d60b92b7f | Update oSites | 2018-03-08 21:00:25 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | c159c70c84 | View introduced | 2018-03-08 14:58:04 +00:00 |  | 
			
				
					| 
							
							
								 Peter Boyle | 4548523ecc | This modification eliminates what looks like a compiler bug on Intel 2017. | 2018-03-08 04:41:16 -08:00 |  | 
			
				
					| 
							
							
								 paboyle | 44188a5c6f | AVX512 fix | 2018-03-05 00:32:24 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 3277bda130 | View introduction to prepare for accelerator offload. Probably same problem exists for stencil object | 2018-03-04 16:38:08 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 078901278c | Coordinate handling gpu friendly | 2018-02-24 22:22:02 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | aa6de818e2 | Copy data needed by Kernels out of the grid object to avoid host reference | 2018-02-02 11:36:11 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | dcf6517a93 | Accelerator offload and copy Opt into the kernel for GPU host var safety | 2018-02-02 11:35:35 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | a308dff410 | accelerator loop, copy Opt into the GPU | 2018-02-02 11:34:37 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 14ba20898a | Accelerator loop the key kernel call | 2018-02-02 11:30:07 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | a53d3ee19a | Add Opt to the lambda capture to get it into the GPU | 2018-02-02 11:28:39 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | e4df025d01 | Accelerator related | 2018-02-01 23:20:05 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | cfeda9d536 | constexpr on const ints | 2018-02-01 22:59:12 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 8ae77d3706 | Small simplification of FermionOperatorImpl towards GPU but not there yet | 2018-02-01 22:41:54 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 70e276e1ab | parallel_for elimination -> thread_loop | 2018-01-28 01:01:14 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 2d0bcc2606 | Zero changes, accelerator on kernels and some thread loop changes | 2018-01-27 23:47:38 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | c4f82e072b | _grid becomes private; use Grid() | 2018-01-27 00:04:12 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 85771e97e9 | Hide internal data | 2018-01-26 23:04:46 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | 87ee592176 | Pragma changes and layout and warning elimination for nvcc | 2018-01-24 13:14:09 +00:00 |  | 
			
				
					| 
							
							
								 paboyle | e5535f4d72 | Namespace, indent | 2018-01-14 23:46:51 +00:00 |  |