Peter Boyle
f8797e1e3e
bug fix. works now and great face performance
2017-04-26 03:14:02 -04:00

Peter Boyle
fd1eb7de13
Clean implementation of the exterior faces listing only those points on the boundary
2017-04-26 02:34:52 -04:00

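Commit fd1eb7de13 restricts the exterior-face list to the sites that actually lie on the boundary. As a rough illustration only (not Grid's implementation), the C++ sketch below assumes a local lattice stored in lexicographic order with coordinate 0 running fastest, plus a hypothetical per-dimension flag `commDim` saying whether that dimension is split across nodes. A site is treated as exterior if any of its coordinates in a split dimension is 0 or L-1. The function name `boundarySites` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Grid's API): return the lexicographic indices of
// the local sites lying on an exterior face. Coordinate 0 runs fastest.
// commDim[mu] marks dimensions assumed to be split across nodes; only faces
// in those dimensions need data from a neighbouring node.
std::vector<std::int64_t> boundarySites(const std::vector<int>& dims,
                                        const std::vector<bool>& commDim) {
  assert(dims.size() == commDim.size());

  std::int64_t volume = 1;
  for (int L : dims) volume *= L;

  std::vector<std::int64_t> sites;
  std::vector<int> coor(dims.size());
  for (std::int64_t idx = 0; idx < volume; ++idx) {
    // Decompose the linear index into per-dimension coordinates.
    std::int64_t rem = idx;
    for (std::size_t mu = 0; mu < dims.size(); ++mu) {
      coor[mu] = static_cast<int>(rem % dims[mu]);
      rem /= dims[mu];
    }
    // A site is exterior if it touches a face in any communicated dimension.
    bool exterior = false;
    for (std::size_t mu = 0; mu < dims.size(); ++mu) {
      if (commDim[mu] && (coor[mu] == 0 || coor[mu] == dims[mu] - 1)) {
        exterior = true;
        break;
      }
    }
    if (exterior) sites.push_back(idx);
  }
  return sites;
}
```

For a 4^4 local volume split in all four directions this returns 240 of the 256 sites; only the 2^4 = 16 interior sites are left out, and those are the ones that can be computed while communication is still in flight.
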
Peter Boyle
2ce898efa3
Pretty code
2017-04-26 02:34:25 -04:00

paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00

paboyle
56277a11c8
Build a list of what's on the surface
2017-04-24 17:06:15 +01:00

Peter Boyle
5b55867a7a
Slightly cheaper Ext assembly
2017-04-24 05:36:11 -04:00

Peter Boyle
3accb1ef89
Debugged assembly split phase with interior suppression
2017-04-23 19:30:19 -04:00

Peter Boyle
e3d0e31525
Debugged assembly split phase with interior suppression
2017-04-23 19:29:27 -04:00

Peter Boyle
5812eb8a8c
Partially fixed. But the comms-overlap does not work yet.
2017-04-22 18:50:25 -04:00

paboyle
ac58565d0a
Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.
2017-04-22 19:31:04 +01:00

paboyle
3703b718aa
Mark up a table if a given site only receives from itself; including MPI3 splitting info.
2017-04-22 19:28:37 +01:00

paboyle
b722889234
Try a better load balancing loop
2017-04-22 19:27:41 +01:00

paboyle
abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00

paboyle
f301be94ce
Fixed
2017-04-22 17:42:31 +01:00

Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00

Peter Boyle
53a785a3dd
Fixing the KNL compile
2017-04-22 08:11:51 -04:00

paboyle
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00

paboyle
b9bbe5d188
L1p config for BG/Q
2017-04-22 11:33:09 +01:00

paboyle
3844bcf800
If F16C instructions are not supported, we must use software half-precision conversion.
This will also become useful on BG/Q, so it will be moved out of SSE4 into a general area.
Lifted the Eigen half-precision conversion from the web. Looks sensible, but not yet extensively regressed against the intrinsics implementation.
2017-04-20 15:30:52 +01:00

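Commit 3844bcf800 falls back to a software float-to-half conversion when F16C is unavailable. The C++ sketch below is an independent, integer-only conversion with round-to-nearest-even, written for illustration; it is not the Eigen code the commit refers to, and the name `floatToHalf` is hypothetical. It assumes `float` is IEEE 754 binary32 and returns the bit pattern of the nearest binary16 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert an IEEE 754 binary32 value to the bit pattern
// of the nearest binary16 value, rounding ties to even.
std::uint16_t floatToHalf(float f) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);  // reinterpret the float's bits safely

  const std::uint32_t sign = (x >> 16) & 0x8000u;
  const std::uint32_t exp  = (x >> 23) & 0xffu;
  std::uint32_t mant       = x & 0x7fffffu;

  if (exp == 0xffu) {  // Inf stays Inf; NaN becomes a quiet NaN
    return static_cast<std::uint16_t>(sign | 0x7c00u | (mant ? 0x0200u : 0u));
  }

  const int e = static_cast<int>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 0x1f) {  // too large for binary16: overflow to Inf
    return static_cast<std::uint16_t>(sign | 0x7c00u);
  }

  if (e <= 0) {  // result is a binary16 subnormal, or zero
    // Magnitude below 2^-25 (half the smallest subnormal) rounds to zero.
    if (e < -10) return static_cast<std::uint16_t>(sign);
    mant |= 0x800000u;        // restore the implicit leading 1
    const int shift = 14 - e; // aligns the value to units of 2^-24
    assert(shift >= 14 && shift <= 24);
    std::uint32_t m = mant >> shift;
    const std::uint32_t rem  = mant & ((1u << shift) - 1u);
    const std::uint32_t half = 1u << (shift - 1);
    if (rem > half || (rem == half && (m & 1u))) ++m;  // may carry into the smallest normal
    return static_cast<std::uint16_t>(sign | m);
  }

  // Normal case: keep the top 10 mantissa bits, round on the 13 dropped bits.
  std::uint32_t h = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
  const std::uint32_t rem = mant & 0x1fffu;
  if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;  // a carry may correctly reach Inf
  return static_cast<std::uint16_t>(sign | h);
}
```

The reverse direction needs no rounding, since every binary16 value is exactly representable as a binary32 value. Note that 65520.0f is exactly halfway between the largest finite half (65504) and the next step, so ties-to-even sends it to Inf.
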
paboyle
e1a2319d01
Simple compressor moved out of cshift into stencil
2017-04-20 13:18:15 +01:00

paboyle
180c732b4c
Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00

paboyle
d2312e9874
Drop compressor entirely from Cshift to only Stencil.
2017-04-20 13:16:55 +01:00

paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00

paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00

paboyle
3b7de792d5
Type comparison in the traits work
2017-04-18 13:28:04 +01:00

paboyle
557c3fa109
Pretty change
2017-04-18 13:27:38 +01:00

paboyle
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00

paboyle
3141ebac10
MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.
2017-04-17 10:50:19 +01:00

paboyle
7ede696126
Non-compiling tests fixed
2017-04-16 23:40:00 +01:00

paboyle
bf516c3b81
higher precision reduction variables in norm and inner product
2017-04-15 12:27:28 +01:00

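Commit bf516c3b81 uses higher-precision reduction variables in the norm and inner product. The C++ sketch below illustrates why this matters, assuming single-precision data and an accumulator promoted to `double`; the function names are hypothetical and this is not Grid's code. Once a single-precision partial sum is large, any term smaller than half an ulp of that sum is lost entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squares with a single-precision accumulator (for comparison).
float norm2Float(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) acc += x * x;
  return acc;
}

// Same reduction, but the accumulator is promoted to double precision.
double norm2Double(const std::vector<float>& v) {
  double acc = 0.0;
  for (float x : v) acc += static_cast<double>(x) * x;
  return acc;
}

// Real inner product of two single-precision vectors, accumulated in double.
double innerProductDouble(const std::vector<float>& a,
                          const std::vector<float>& b) {
  assert(a.size() == b.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc += static_cast<double>(a[i]) * b[i];
  }
  return acc;
}
```

With `v = {2^24, 1, 1, ..., 1}` (1000 ones) the float accumulator stays at 2^48, because the ulp of a float at that magnitude is 2^25 and each added 1 rounds away; the double accumulator returns 2^48 + 1000 exactly.
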
paboyle
441a52ee5d
First cut at higher precision reduction
2017-04-15 10:57:21 +01:00

paboyle
a8db024c92
Cleaning up the dense matrix and Lanczos sector
2017-04-15 08:54:11 +01:00

paboyle
3ca41458a3
Fix to no USE_FP16 case
2017-04-14 14:20:54 +01:00

Peter Boyle
951be75292
Half precision conversion working on AVX512 now too
2017-04-13 17:35:11 +01:00

Peter Boyle
b9113ed310
Patches for KNL
2017-04-13 12:02:12 -04:00

paboyle
42fb49d3fd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2017-04-13 14:12:47 +01:00

paboyle
db5ea001a3
Update to use Xcode 8.3 since -mfp16 causes SIGILL
2017-04-13 12:22:40 +01:00

paboyle
1d502e4ed6
FP16 optional at compile time
2017-04-13 11:55:24 +01:00

paboyle
73cdf0fffe
Drop F16C from SSE because of a macOS compile error on Travis
2017-04-13 11:23:41 +01:00

paboyle
1c25773319
Trap illegal instructions
2017-04-13 10:51:40 +01:00

paboyle
94eb829d08
Align cast fixed for __m128i; gcc complained
2017-04-13 08:40:44 +01:00

paboyle
68392ddb5b
Exchange in generic
Precision change in AVX, SSE, AVX512, Generic. QPX still to do.
2017-04-13 08:38:12 +01:00

paboyle
cb6b81ae82
Half precision conversion
2017-04-12 19:32:37 +01:00

53e76b41d2
Merge branch 'develop' into feature/hadrons
2017-04-10 17:00:53 +01:00

8ef4300412
spurious .dirstamp files removed
2017-04-10 17:00:22 +01:00

98a24ebf31
The macro “magics” are very intensive for the preprocessor in the measurement code, which has numerous serialisable classes. Reducing the number of serialisable fields to 64 (instead of 1024) helps a lot; this is enough for now and can be extended trivially if needed in the future.
2017-04-10 16:58:54 +01:00

paboyle
b12dc89d26
Commenting and clean up
2017-04-10 20:38:20 +09:00

paboyle
d80d802f9d
MultiRHS solver test
2017-04-10 00:12:12 +09:00

paboyle
3d99b09dba
Start of blockCG
2017-04-09 23:42:10 +09:00

paboyle
db5f6d3ae3
Verbose fix
2017-04-09 23:41:30 +09:00