Guido Cossu 
							
						 
					 
					
						
						
							
						
						8b6a6c8236 
					 
					
						
						
							
							Resolving small merge conflict  
						
						
						
						
					 
					
						2017-02-09 16:20:24 +00:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						e0571c872b 
					 
					
						
						
							
							Merge branch 'develop' into feature/hmc_generalise  
						
						
						
						
					 
					
						2017-02-09 16:12:00 +00:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						17629b8d9e 
					 
					
						
						
							
							Merge branch 'develop' into feature/hmc_generalise  
						
						
						
						
					 
					
						2017-01-25 11:33:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a37e71f362 
					 
					
						
						
							
							New automatic implementation of gamma matrices, Meson and SeqGamma are broken  
						
						
						
						
					 
					
						2017-01-23 19:13:43 -08:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						55cb22ad67 
					 
					
						
						
							
							Z mobius bmark  
						
						
						
						
					 
					
						2016-12-18 00:55:37 +00:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						0bd296dda4 
					 
					
						
						
							
							Adding check of the Dag part in the benchmark  
						
						
						
						
					 
					
						2016-12-14 03:15:09 +00:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						ff71a8e847 
					 
					
						
						
							
							Ready for sim  
						
						
						
						
					 
					
						2016-12-08 17:00:32 +00:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						e27c6b217c 
					 
					
						
						
							
							Updating  
						
						
						
						
					 
					
						2016-12-01 12:42:53 +00:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						cd01c1dbe9 
					 
					
						
						
							
							Ls 16 more relevant  
						
						
						
						
					 
					
						2016-11-30 22:11:10 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						bd0430b34f 
					 
					
						
						
							
							Serialisation in malloc fixed  
						
						
						
						
					 
					
						2016-11-29 22:27:55 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						2f92b4860b 
					 
					
						
						
							
							Test the full Mooee sector  
						
						
						
						
					 
					
						2016-11-29 00:15:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						433afd36f5 
					 
					
						
						
							
							Makefile rule for simple_* objects  
						
						
						
						
					 
					
						2016-11-19 01:33:13 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						042ae5b87c 
					 
					
						
						
							
							generic 256bits SIMD  
						
						
						
						
					 
					
						2016-11-15 12:16:15 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						33dc1f51b5 
					 
					
						
						
							
							Final sign off commits from Cori-1  
						
						
						
						
					 
					
						2016-11-09 04:11:03 -08:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						757a928f9a 
					 
					
						
						
							
							Improvement to use own SHM_OPEN call to avoid openmpi bug.  
						
						
						
						
					 
					
						2016-11-02 12:37:46 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						bb94ddd0eb 
					 
					
						
						
							
							Tidy up of mpi3; also some cleaning of the dslash controls.  
						
						
						
						
					 
					
						2016-11-02 08:07:09 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						791cb050c8 
					 
					
						
						
							
							Comms improvements  
						
						
						
						
					 
					
						2016-11-01 11:35:43 +00:00 
						 
				 
			
				
					
						
							
							
								azusayamaguchi 
							
						 
					 
					
						
						
							
						
						b6a65059a2 
					 
					
						
						
							
							Update to use shared memory to contain the stencil comms buffers  
						
						
						
						
						Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions 
						
						
					 
					
						2016-10-24 17:30:43 +01:00 
						 
				 
			
				
					
						
							
							
								azusayamaguchi 
							
						 
					 
					
						
						
							
						
						c190221fd3 
					 
					
						
						
							
							Internal SHM comms in non-simd directions working  
						
						
						
						
						Need to fix simd directions 
						
						
					 
					
						2016-10-22 18:14:27 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						a762b1fb71 
					 
					
						
						
							
							MPI3 working with a bounce through shared memory on my laptop.  
						
						
						
						
						Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the send between ranks on same node. 
						
						
					 
					
						2016-10-21 09:03:26 +01:00 
						 
				 
			
				
					
						
							
							
								azusayamaguchi 
							
						 
					 
					
						
						
							
						
						81f2aeaece 
					 
					
						
						
							
							KNL streaming stores, and KNL performance counters  
						
						
						
						
					 
					
						2016-10-12 11:45:22 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						2e453dfbf5 
					 
					
						
						
							
							Added some instrumentation to benchmark the force computation  
						
						
						
						
					 
					
						2016-10-06 17:52:45 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						4089984431 
					 
					
						
						
							
							Timing hooks  
						
						
						
						
					 
					
						2016-10-06 09:25:12 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						0fd179fb33 
					 
					
						
						
							
							Merge branch 'develop' into feature/hirep  
						
						
						
						
					 
					
						2016-09-01 12:59:53 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						fd5614738d 
					 
					
						
						
							
							Merge branch 'develop' into feature/hirep  
						
						
						
						
					 
					
						2016-08-30 18:21:36 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						5a68715be3 
					 
					
						
						
							
							Richards sweep test  
						
						
						
						
					 
					
						2016-08-05 10:51:57 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						32bc7a6ab8 
					 
					
						
						
							
							MPI back out of change that hangs  
						
						
						
						
						AVX2 for clang, gcc needs the -mfma flag. 
						
						
					 
					
						2016-08-05 10:36:00 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b65e72e521 
					 
					
						
						
							
							Merge pull request #43 from rprollins/bench/output-format  
						
						
						
						
						Benchmark_dwf_sweep and Benchmark_zmm output formats 
						
						
					 
					
						2016-08-04 16:47:01 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						629283726b 
					 
					
						
						
							
							build system: local Grid link flag moved to configure.ac  
						
						
						
						
					 
					
						2016-08-03 15:07:42 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e5b934d21 
					 
					
						
						
							
							improved LAPACK configuration  
						
						
						
						
					 
					
						2016-08-02 17:26:54 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9f30cab2c 
					 
					
						
						
							
							first working version for the new build system  
						
						
						
						
					 
					
						2016-07-30 17:53:18 +01:00 
						 
				 
			
				
					
						
							
							
								Richard Rollins 
							
						 
					 
					
						
						
							
						
						df6c9f55d1 
					 
					
						
						
							
							Use common benchmark output format for dwf_sweep and zmm  
						
						
						
						
					 
					
						2016-07-20 17:38:56 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						f4dd5062d7 
					 
					
						
						
							
							Merge branch 'develop' of https://github.com/paboyle/Grid into develop  
						
						
						
						
					 
					
						2016-07-15 19:26:06 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						9db2c6525d 
					 
					
						
						
							
							updating benchmarks for red black 4d for Ls vectorised code  
						
						
						
						
					 
					
						2016-07-14 23:44:02 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						ef97e32152 
					 
					
						
						
							
							Adding persistent communicators  
						
						
						
						
					 
					
						2016-07-08 17:16:08 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						5028969d4b 
					 
					
						
						
							
							Added generators for the adjoint representation  
						
						
						
						
					 
					
						2016-07-08 15:40:11 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						a0676beeb1 
					 
					
						
						
							
							Open up dependency on Eigen and FFTW  
						
						
						
						
					 
					
						2016-07-07 22:31:07 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						fdfbf11c6d 
					 
					
						
						
							
							Merge branch 'develop' into temporary-smearing  
						
						
						
						
					 
					
						2016-07-04 18:45:10 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						9cb90f714e 
					 
					
						
						
							
							Merge remote-tracking branch 'origin/develop' into temporary-smearing  
						
						
						
						
					 
					
						2016-07-04 17:28:40 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						bfe14000a9 
					 
					
						
						
							
							Double compile fix  
						
						
						
						
					 
					
						2016-07-01 16:33:51 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						680645f849 
					 
					
						
						
							
							Merge branch 'release/v0.5.0'  
						
						
						
						
					 
					
						2016-06-30 15:15:03 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						2d8bb4c594 
					 
					
						
						
							
							Tweaks  
						
						
						
						
					 
					
						2016-06-30 14:35:01 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						51cb2d4328 
					 
					
						
						
							
							update file lists  
						
						
						
						
					 
					
						2016-06-30 14:35:01 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						6d58cb2a68 
					 
					
						
						
							
							Enable reordering of the loops in the assembler for cache friendliness.  
						
						
						
						
						This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching. 
						
						
					 
					
						2016-06-30 14:35:01 -07:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						565e9329ba 
					 
					
						
						
							
							Changed the colouring classes  
						
						
						
						
					 
					
						2016-06-30 16:51:03 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						5e02392f9c 
					 
					
						
						
							
							Fixed compilation error for benchmark_dwf  
						
						
						
						
						Some parts were assuming floating point precision 
						
						
					 
					
						2016-06-20 12:30:51 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						55f65b81b5 
					 
					
						
						
							
							Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work. 
						
						
					 
					
						2016-06-09 01:12:36 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						05acc22920 
					 
					
						
						
							
							placeholder for non temporal loads optimisation  
						
						
						
						
					 
					
						2016-06-07 13:18:21 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						8ac021de73 
					 
					
						
						
							
							Added a test and fixed it for red black precon Ls innermost vectorised DWF  
						
						
						
						
					 
					
						2016-06-07 13:16:56 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						786ca52c43 
					 
					
						
						
							
							Problems remain in the red black preconditioning of the Ls vectorisation  
						
						
						
						
					 
					
						2016-06-06 07:05:51 -07:00