00f31ae83f 
					 
					
						
						
							
							Merge pull request  #163  from goracle/unstaged  
						
						
						
						
						Add printing of whether there are unstaged changes in the git hash print 
						
						
					 
					
						2018-07-25 19:00:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cce339deaf 
					 
					
						
						
							
							Merge pull request  #172  from fionnoh/feature/hadrons  
						
						
						
						
						feature/hadrons -> feature/hadrons-a2a 
						
						
					 
					
						2018-07-25 17:20:19 +00:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						24128ff109 
					 
					
						
						
							
							Changes needed for MF benchmark to work with comms correctly  
						
						
						
						
					 
					
						2018-07-23 15:51:37 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						da17a015c7 
					 
					
						
						
							
							Pack the stencil smaller for 128 bit access  
						
						
						
						
					 
					
						2018-07-23 06:12:45 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						1fd08c21ac 
					 
					
						
						
							
							Make SIMD width a configure-time option for GPU
						
						
						
						
					 
					
						2018-07-23 06:10:55 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						28db0631ff 
					 
					
						
						
							
							Hack to force 128bit accesses  
						
						
						
						
					 
					
						2018-07-23 06:10:27 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b35401b86b 
					 
					
						
						
							
							Fix CUDA_ARCH. Need to simplify. See when new eigen release happens  
						
						
						
						
					 
					
						2018-07-23 06:09:33 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						a0714de8ec 
					 
					
						
						
							
							Define vector length for GPU  
						
						
						
						
					 
					
						2018-07-23 06:09:05 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						21a1710b43 
					 
					
						
						
							
							Verbose vector length  
						
						
						
						
					 
					
						2018-07-23 06:08:39 -04:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						34e9d3f0ca 
					 
					
						
						
							
							Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contraction modules, and they will be freed from memory if they are no longer needed.
						
						
						
						
					 
					
						2018-07-22 14:40:31 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						c995788259 
					 
					
						
						
							
							Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code  
						
						
						
						
					 
					
						2018-07-21 00:08:11 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						94c7198001 
					 
					
						
						
							
							Added ZFIMPL to A2AMeson contraction  
						
						
						
						
					 
					
						2018-07-20 23:08:22 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						04d86fe9f3 
					 
					
						
						
							
							Removed overly verbose print statement  
						
						
						
						
					 
					
						2018-07-20 21:38:19 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						b78074b6a0 
					 
					
						
						
							
							Removed a Dminus from high mode v and removed duplication of D_oo code
						
						
						
						
					 
					
						2018-07-20 16:55:24 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						7dfd3cdae8 
					 
					
						
						
							
							Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors  
						
						
						
						
					 
					
						2018-07-20 15:45:43 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						cecee1ef2c 
					 
					
						
						
							
							Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons  
						
						
						
						
					 
					
						2018-07-20 13:37:50 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						355d4b58be 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons  
						
						
						
						
					 
					
						2018-07-19 16:07:54 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						2c54a536f3 
					 
					
						
						
							
							Moved the meson field inner product to its own header file  
						
						
						
						
					 
					
						2018-07-19 15:56:52 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						d868a45120 
					 
					
						
						
							
							Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests  
						
						
						
						
					 
					
						2018-07-16 16:19:59 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						9deae8c962 
					 
					
						
						
							
							A2A meson field contraction code  
						
						
						
						
					 
					
						2018-07-16 14:18:45 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b2b5137d28 
					 
					
						
						
							
							Finally starting to get decent performance on Volta  
						
						
						
						
					 
					
						2018-07-13 12:06:18 -04:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						db86cdd7bd 
					 
					
						
						
							
							Possible trash commit  
						
						
						
						
					 
					
						2018-07-10 13:30:45 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						ec9939c1ba 
					 
					
						
						
							
							Test for faster implementation of meson field inner loop  
						
						
						
						
						It should be possible to cache-block this at outer levels. The global sum across nodes is not performed here;
it is deferred to the caller, which can block the sums into one big all-reduce.
Nc=3 and Fermion are hard-coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll or performing the conjugate on the left once
was the real source of the speed-up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
						
						
					 
					
						2018-07-10 12:38:51 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						2cc07450f4 
					 
					
						
						
							
							Fastest option for the dslash  
						
						
						
						
					 
					
						2018-07-05 09:57:55 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						c0e8bc9da9 
					 
					
						
						
							
							Current version gets 250 - 320 GF/s on Volta on the target 12^4 volume.  
						
						
						
						
					 
					
						2018-07-05 07:10:25 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b1265ae867 
					 
					
						
						
							
							Prettify code  
						
						
						
						
					 
					
						2018-07-05 07:08:06 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						32bb85ea4c 
					 
					
						
						
							
							Standard extractLane is fast  
						
						
						
						
					 
					
						2018-07-05 07:07:30 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						ca0607b6ef 
					 
					
						
						
							
							Clearer kernel call meaning  
						
						
						
						
					 
					
						2018-07-05 07:06:15 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						19b527e83f 
					 
					
						
						
							
							Better extract merge for GPU. Let the SIMD header files define the pointer type for  
						
						
						
						
						access. GPU redirects through builtin float2, double2 for complex 
						
						
					 
					
						2018-07-05 07:05:13 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						4730d4692a 
					 
					
						
						
							
							Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks  
						
						
						
						
					 
					
						2018-07-05 07:03:33 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						1bb456c0c5 
					 
					
						
						
							
							Minor GPU vector width change  
						
						
						
						
					 
					
						2018-07-05 07:02:04 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						4b04ae3611 
					 
					
						
						
							
							Printing improvement  
						
						
						
						
					 
					
						2018-07-05 06:59:38 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						2f776d51c6 
					 
					
						
						
							
							GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions,
						
						
						
						
						but a bit of (known) work.
						
						
					 
					
						2018-07-05 06:58:37 -04:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						f74617c124 
					 
					
						
						
							
							Added ZFIMPL to meson field module  
						
						
						
						
					 
					
						2018-07-03 14:04:53 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						8c6a3921ed 
					 
					
						
						
							
							Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons  
						
						
						
						
					 
					
						2018-07-03 11:35:14 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8a15dd9d0 
					 
					
						
						
							
							Hadrons: code cleaning  
						
						
						
						
					 
					
						2018-07-02 17:52:39 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ce68a751a 
					 
					
						
						
							
							Hadrons: stout smearing module  
						
						
						
						
					 
					
						2018-07-02 17:52:04 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						daa0977d01 
					 
					
						
						
							
							Included a print statement that indicates that the guess is being subtracted from the solve.  
						
						
						
						
					 
					
						2018-06-28 16:34:56 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						a2929f4384 
					 
					
						
						
							
							Removed A2A contraction module and replaced it with the beginnings of a meson field module  
						
						
						
						
					 
					
						2018-06-28 16:17:26 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						7fe3974c0a 
					 
					
						
						
							
							Included eigenPacks and action as references, not inputs, of A2A module. They no longer need to be parameters in the meson field modules.
						
						
						
						
					 
					
						2018-06-28 16:14:49 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						f7e86f81a0 
					 
					
						
						
							
							Changes A2A class to make use of the new Solver class  
						
						
						
						
					 
					
						2018-06-28 16:14:16 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						fecec803d9 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of  https://github.com/paboyle/Grid  into feature/hadrons  
						
						
						
						
					 
					
						2018-06-28 16:13:43 +01:00 
						 
				 
			
				
					
						
							
							
								fionnoh 
							
						 
					 
					
						
						
							
						
						8fe9a13cdd 
					 
					
						
						
							
							Merge branch 'feature/hadrons' of  https://github.com/paboyle/Grid  into feature/hadrons  
						
						
						
						
					 
					
						2018-06-28 16:13:07 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3a50afe7e7 
					 
					
						
						
							
							GPU dslash updates  
						
						
						
						
					 
					
						2018-06-27 22:32:21 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						f8e880b445 
					 
					
						
						
							
							Loop for s and xyzt offlow  
						
						
						
						
					 
					
						2018-06-27 21:49:57 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3e947527cb 
					 
					
						
						
							
							Move looping over "s" and "site" into kernels for GPU optimisation
						
						
						
						
					 
					
						2018-06-27 21:29:43 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						31f65beac8 
					 
					
						
						
							
							Move site and Ls looping into the kernels  
						
						
						
						
					 
					
						2018-06-27 21:28:48 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						38e2a32ac9 
					 
					
						
						
							
							Single SIMD lane operations for CUDA  
						
						
						
						
					 
					
						2018-06-27 21:28:06 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						efa84ca50a 
					 
					
						
						
							
							Keep CUDA 9.1 happy
						
						
						
						
					 
					
						2018-06-27 21:27:32 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						5e96d6d04c 
					 
					
						
						
							
							Keep CUDA happy  
						
						
						
						
					 
					
						2018-06-27 21:27:11 +01:00