fionnoh 
							
						 
					 
					
						
						
							
						
						355d4b58be 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons  
						
						
						
						
					 
					
						2018-07-19 16:07:54 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						2c54a536f3 
					 
					
						
						
							
							Moved the meson field inner product to its own header file  
						
						
						
						
					 
					
						2018-07-19 15:56:52 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						d868a45120 
					 
					
						
						
							
							Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now, as it speeds up meson field creation quite a lot for 24^3 tests
						
						
						
						
					 
					
						2018-07-16 16:19:59 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						9deae8c962 
					 
					
						
						
							
							A2A meson field contraction code  
						
						
						
						
					 
					
						2018-07-16 14:18:45 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b2b5137d28 
					 
					
						
						
							
							Finally starting to get decent performance on Volta  
						
						
						
						
					 
					
						2018-07-13 12:06:18 -04:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						db86cdd7bd 
					 
					
						
						
							
							Possible trash commit  
						
						
						
						
					 
					
						2018-07-10 13:30:45 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						ec9939c1ba 
					 
					
						
						
							
							Test for faster implementation of meson field inner loop  
						
						
						
						
						This should be possible to cache block at outer levels. The global sum across nodes is not performed;
it is deferred to the caller so that all sums can be blocked into one big all-reduce.
Nc=3 and Fermion are hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll, or performing the conjugate on the left once,
was the real source of the speed-up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
						
						
					 
					
						2018-07-10 12:38:51 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						2cc07450f4 
					 
					
						
						
							
							Fastest option for the dslash  
						
						
						
						
					 
					
						2018-07-05 09:57:55 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						c0e8bc9da9 
					 
					
						
						
							
							Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.  
						
						
						
						
					 
					
						2018-07-05 07:10:25 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b1265ae867 
					 
					
						
						
							
							Prettify code  
						
						
						
						
					 
					
						2018-07-05 07:08:06 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						32bb85ea4c 
					 
					
						
						
							
							Standard extractLane is fast  
						
						
						
						
					 
					
						2018-07-05 07:07:30 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						ca0607b6ef 
					 
					
						
						
							
							Clearer kernel call meaning  
						
						
						
						
					 
					
						2018-07-05 07:06:15 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						19b527e83f 
					 
					
						
						
							
							Better extract merge for GPU. Let the SIMD header files define the pointer type for access. GPU redirects through builtin float2, double2 for complex
						
						
					 
					
						2018-07-05 07:05:13 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						4730d4692a 
					 
					
						
						
							
							Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks  
						
						
						
						
					 
					
						2018-07-05 07:03:33 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						1bb456c0c5 
					 
					
						
						
							
							Minor GPU vector width change  
						
						
						
						
					 
					
						2018-07-05 07:02:04 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						4b04ae3611 
					 
					
						
						
							
							Printing improvement  
						
						
						
						
					 
					
						2018-07-05 06:59:38 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						2f776d51c6 
					 
					
						
						
							
							GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions, but a bit of (known) work.
						
						
					 
					
						2018-07-05 06:58:37 -04:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						f74617c124 
					 
					
						
						
							
							Added ZFIMPL to meson field module  
						
						
						
						
					 
					
						2018-07-03 14:04:53 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						8c6a3921ed 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons  
						
						
						
						
					 
					
						2018-07-03 11:35:14 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8a15dd9d0 
					 
					
						
						
							
							Hadrons: code cleaning  
						
						
						
						
					 
					
						2018-07-02 17:52:39 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ce68a751a 
					 
					
						
						
							
							Hadrons: stout smearing module  
						
						
						
						
					 
					
						2018-07-02 17:52:04 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						daa0977d01 
					 
					
						
						
							
							Included a print statement that indicates that the guess is being subtracted from the solve.  
						
						
						
						
					 
					
						2018-06-28 16:34:56 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						a2929f4384 
					 
					
						
						
							
							Removed A2A contraction module and replaced it with the beginnings of a meson field module  
						
						
						
						
					 
					
						2018-06-28 16:17:26 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						7fe3974c0a 
					 
					
						
						
							
							Included eigenPacks and action as references, not inputs, of A2A module. They now no longer need to be parameters in the meson field modules.
						
						
						
						
					 
					
						2018-06-28 16:14:49 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						f7e86f81a0 
					 
					
						
						
							
							Changes A2A class to make use of the new Solver class  
						
						
						
						
					 
					
						2018-06-28 16:14:16 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						fecec803d9 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
						
						
						
						
					 
					
						2018-06-28 16:13:43 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						8fe9a13cdd 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons
						
						
						
						
					 
					
						2018-06-28 16:13:07 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3a50afe7e7 
					 
					
						
						
							
							GPU dslash updates  
						
						
						
						
					 
					
						2018-06-27 22:32:21 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						f8e880b445 
					 
					
						
						
							
							Loop for s and xyzt offlow  
						
						
						
						
					 
					
						2018-06-27 21:49:57 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3e947527cb 
					 
					
						
						
							
							Move looping over "s" and "site" into kernels for GPU optimisation
						
						
						
						
					 
					
						2018-06-27 21:29:43 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						31f65beac8 
					 
					
						
						
							
							Move site and Ls looping into the kernels  
						
						
						
						
					 
					
						2018-06-27 21:28:48 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						38e2a32ac9 
					 
					
						
						
							
							Single SIMD lane operations for CUDA  
						
						
						
						
					 
					
						2018-06-27 21:28:06 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						efa84ca50a 
					 
					
						
						
							
							Keep CUDA 9.1 happy
						
						
						
						
					 
					
						2018-06-27 21:27:32 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						5e96d6d04c 
					 
					
						
						
							
							Keep CUDA happy  
						
						
						
						
					 
					
						2018-06-27 21:27:11 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						df30bdc599 
					 
					
						
						
							
							CUDA happy  
						
						
						
						
					 
					
						2018-06-27 21:26:49 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						7f45222924 
					 
					
						
						
							
							Diagnostics on memory alloc fail  
						
						
						
						
					 
					
						2018-06-27 21:26:20 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						dd891f5e3b 
					 
					
						
						
							
							Use NVCC to suppress device Eigen  
						
						
						
						
					 
					
						2018-06-27 21:25:17 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2c42e6f42 
					 
					
						
						
							
							Hadrons: scaled DWF action  
						
						
						
						
					 
					
						2018-06-26 14:59:33 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Richtmann 
							
						 
					 
					
						
						
							
						
						2881b3e8e5 
					 
					
						
						
							
							WilsonMG: Remove unnecessary static assertions  
						
						
						
						
					 
					
						2018-06-26 14:42:30 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						049cc518f4 
					 
					
						
						
							
							Hadrons: introduction message 2  
						
						
						
						
					 
					
						2018-06-25 19:08:39 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e1c66897f 
					 
					
						
						
							
							Hadrons: introduction message  
						
						
						
						
					 
					
						2018-06-25 19:08:22 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						adcef36189 
					 
					
						
						
							
							Hadrons: Möbius DWF action  
						
						
						
						
					 
					
						2018-06-25 15:58:35 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						2f121c41c9 
					 
					
						
						
							
							Committing creation of meson field code before a merge with the upstream branch feature/hadrons
						
						
						
						
					 
					
						2018-06-25 12:20:46 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e0ed7e300f 
					 
					
						
						
							
							Hadrons: spurious Dminus removed  
						
						
						
						
					 
					
						2018-06-22 16:33:43 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						485207901b 
					 
					
						
						
							
							Merge branch 'develop' into feature/hadrons  
						
						
						
						
					 
					
						2018-06-22 16:15:32 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c760f0a4c3 
					 
					
						
						
							
							Hadrons: remove make_5D/4D functions and FreeProp fix  
						
						
						
						
					 
					
						2018-06-22 16:12:46 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c84eeedec3 
					 
					
						
						
							
							Hadrons: GaugeProp module for z-Wilson actions  
						
						
						
						
					 
					
						2018-06-22 15:53:22 +02:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						1ac3526f33 
					 
					
						
						
							
							Small changes to the A2A header and module  
						
						
						
						
					 
					
						2018-06-22 12:29:42 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						0de090ee74 
					 
					
						
						
							
							Temporarily added in the contraction code that produced the working 2-pt function. This is committed for reference only and will be removed in the next push.
						
						
						
						
					 
					
						2018-06-22 12:28:41 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91405de3f7 
					 
					
						
						
							
							Hadrons: new solver exposing fermion matrix and generic source/solve import/export  
						
						
						
						
					 
					
						2018-06-22 12:14:37 +02:00