Guido Cossu 
							
						 
					 
					
						
						
							
						
						453cf2a1c6 
					 
					
						
						
							
							Moving the topological charge outside the HMC related routines  
						
						
						
						
					 
					
						2017-05-02 14:40:12 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						de7bbfa5f9 
					 
					
						
						
							
							Adding ParameterFile option for the HMC  
						
						
						
						
					 
					
						2017-05-02 12:16:16 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						74f451715f 
					 
					
						
						
							
							Fix for Mac compilation on the size_t uint64_t types  
						
						
						
						
					 
					
						2017-05-01 15:12:07 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						4063238943 
					 
					
						
						
							
							Adding HMC test file example for Mobius + smearing  
						
						
						
						
					 
					
						2017-05-01 13:44:00 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						3344788fa1 
					 
					
						
						
							
							Merge branch 'develop' into feature/hmc_generalise  
						
						
						
						
					 
					
						2017-05-01 12:13:56 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						99220f6531 
					 
					
						
						
							
							Fixes and better timing  
						
						
						
						
					 
					
						2017-04-26 17:24:11 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						f8797e1e3e 
					 
					
						
						
							
							bug fix. works now and great face performance  
						
						
						
						
					 
					
						2017-04-26 03:14:02 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						fd1eb7de13 
					 
					
						
						
							
							Clean implementation of the exterior faces listing only those points on the boundary  
						
						
						
						
					 
					
						2017-04-26 02:34:52 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						2ce898efa3 
					 
					
						
						
							
							Pretty code  
						
						
						
						
					 
					
						2017-04-26 02:34:25 -04:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						ab66bac4e6 
					 
					
						
						
							
							Think I'm getting on top of the reduced cost exterior precomputed list of links  
						
						
						
						
					 
					
						2017-04-25 08:50:26 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						56277a11c8 
					 
					
						
						
							
							Build a list of what's on the surface  
						
						
						
						
					 
					
						2017-04-24 17:06:15 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						5b55867a7a 
					 
					
						
						
							
							Slightly cheaper Ext assembly  
						
						
						
						
					 
					
						2017-04-24 05:36:11 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						3accb1ef89 
					 
					
						
						
							
							Debugged assembly split phase with interior suppression  
						
						
						
						
					 
					
						2017-04-23 19:30:19 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						e3d0e31525 
					 
					
						
						
							
							Debugged assembly split phase with interior suppression  
						
						
						
						
					 
					
						2017-04-23 19:29:27 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						5812eb8a8c 
					 
					
						
						
							
							Partially fixed. But the comms-overlap does not work yet.  
						
						
						
						
					 
					
						2017-04-22 18:50:25 -04:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						ac58565d0a 
					 
					
						
						
							
							Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.  
						
						
						
						
					 
					
						2017-04-22 19:31:04 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3703b718aa 
					 
					
						
						
							
							Mark up a table if a given site only receives from itself; including MPI3 splitting info.  
						
						
						
						
					 
					
						2017-04-22 19:28:37 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						b722889234 
					 
					
						
						
							
							Try a better load balancing loop  
						
						
						
						
					 
					
						2017-04-22 19:27:41 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						abba44a837 
					 
					
						
						
							
							Hand unrolled for overlapped comms  
						
						
						
						
					 
					
						2017-04-22 17:45:17 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						f301be94ce 
					 
					
						
						
							
							Fixed  
						
						
						
						
					 
					
						2017-04-22 17:42:31 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						1d1b225497 
					 
					
						
						
							
							Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).  
						
						
						
						
					 
					
						2017-04-22 09:05:28 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						53a785a3dd 
					 
					
						
						
							
							Fixing the KNL compile  
						
						
						
						
					 
					
						2017-04-22 08:11:51 -04:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						736bf3c866 
					 
					
						
						
							
							Major rework of stencil. Half precision and MPI3 now working.  
						
						
						
						
					 
					
						2017-04-22 11:33:50 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						b9bbe5d188 
					 
					
						
						
							
							L1p config bg/q  
						
						
						
						
					 
					
						2017-04-22 11:33:09 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3844bcf800 
					 
					
						
						
							
							If no f16c instructions supported must use software half precision conversion.  
						
						
						
						
						This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet. 
						
						
					 
					
						2017-04-20 15:30:52 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						e1a2319d01 
					 
					
						
						
							
							Simple compressor moved out of cshift into stencil  
						
						
						
						
					 
					
						2017-04-20 13:18:15 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						180c732b4c 
					 
					
						
						
							
							Move compressors out of Cshift.  
						
						
						
						
						Slice iterators would help 
						
						
					 
					
						2017-04-20 13:17:55 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						d2312e9874 
					 
					
						
						
							
							Drop compressor entirely from Cshift to only Stencil.  
						
						
						
						
					 
					
						2017-04-20 13:16:55 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						fc4ab9ccd5 
					 
					
						
						
							
							Working half precision comms  
						
						
						
						
					 
					
						2017-04-20 11:20:26 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						4a340aa5ca 
					 
					
						
						
							
							Massive compressor rework to support reduced precision comms  
						
						
						
						
					 
					
						2017-04-20 09:28:27 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3b7de792d5 
					 
					
						
						
							
							Type comparison in the traits work  
						
						
						
						
					 
					
						2017-04-18 13:28:04 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						557c3fa109 
					 
					
						
						
							
							Pretty change  
						
						
						
						
					 
					
						2017-04-18 13:27:38 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						8e161152e4 
					 
					
						
						
							
							MultiRHS solver improvements with slice operations moved into lattice and sped up.  
						
						
						
						
						Block solver requires a lot of performance work. 
						
						
					 
					
						2017-04-18 10:51:55 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3141ebac10 
					 
					
						
						
							
							MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.  
						
						
						
						
					 
					
						2017-04-17 10:50:19 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						7ede696126 
					 
					
						
						
							
							Non compile of tests fixed  
						
						
						
						
					 
					
						2017-04-16 23:40:00 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						bf516c3b81 
					 
					
						
						
							
							higher precision reduction variables in norm and inner product  
						
						
						
						
					 
					
						2017-04-15 12:27:28 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						441a52ee5d 
					 
					
						
						
							
							First cut at higher precision reduction  
						
						
						
						
					 
					
						2017-04-15 10:57:21 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						a8db024c92 
					 
					
						
						
							
							Cleaning up the dense matrix and lanczos sector  
						
						
						
						
					 
					
						2017-04-15 08:54:11 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3ca41458a3 
					 
					
						
						
							
							Fix to no USE_FP16 case  
						
						
						
						
					 
					
						2017-04-14 14:20:54 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						951be75292 
					 
					
						
						
							
							Half precision conversion working on AVX512 now too  
						
						
						
						
					 
					
						2017-04-13 17:35:11 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b9113ed310 
					 
					
						
						
							
							Patches for knl  
						
						
						
						
					 
					
						2017-04-13 12:02:12 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6a0da873f 
					 
					
						
						
							
							Merge branch 'feature/hadrons' into feature/qed-fvol  
						
						
						
						
					 
					
						2017-04-13 15:31:06 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						42fb49d3fd 
					 
					
						
						
							
							Merge branch 'develop' of https://github.com/paboyle/Grid into develop  
						
						
						
						
					 
					
						2017-04-13 14:12:47 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						db5ea001a3 
					 
					
						
						
							
							Update to use Xcode 8.3 since -mfp16 causes SIGILL  
						
						
						
						
					 
					
						2017-04-13 12:22:40 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						1d502e4ed6 
					 
					
						
						
							
							FP16 optional compile time  
						
						
						
						
					 
					
						2017-04-13 11:55:24 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						73cdf0fffe 
					 
					
						
						
							
							Drop f16c from SSE because of a macos compile error on travis  
						
						
						
						
					 
					
						2017-04-13 11:23:41 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						1c25773319 
					 
					
						
						
							
							Trap illegal instructions  
						
						
						
						
					 
					
						2017-04-13 10:51:40 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						94eb829d08 
					 
					
						
						
							
							Align cast fixed for __m128i; gcc complained  
						
						
						
						
					 
					
						2017-04-13 08:40:44 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						68392ddb5b 
					 
					
						
						
							
							Exchange in generic  
						
						
						
						
						Precision change in AVX, SSE, AVX512, Generic. QPX still to do. 
						
						
					 
					
						2017-04-13 08:38:12 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						cb6b81ae82 
					 
					
						
						
							
							Half precision conversion  
						
						
						
						
					 
					
						2017-04-12 19:32:37 +01:00