paboyle 
							
						 
					 
					
						
						
							
						
						6d58cb2a68 
					 
					
						
						
							
							Enable reordering of the loops in the assembler for cache friendliness.  
						
						
						
						
						This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching. 
						
						
					 
					
						2016-06-30 14:35:01 -07:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						565e9329ba 
					 
					
						
						
							
							Changed the colouring classes  
						
						
						
						
					 
					
						2016-06-30 16:51:03 +01:00 
						 
				 
			
				
					
						
							
							
								Guido Cossu 
							
						 
					 
					
						
						
							
						
						5e02392f9c 
					 
					
						
						
							
							Fixed compilation error for benchmark_dwf  
						
						
						
						
						Some parts were assuming floating point precision 
						
						
					 
					
						2016-06-20 12:30:51 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						55f65b81b5 
					 
					
						
						
							
							Improvements to the assembler interface that let us move chunks of the  
						
						
						
						
						site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work. 
						
						
					 
					
						2016-06-09 01:12:36 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						05acc22920 
					 
					
						
						
							
							placeholder for non temporal loads optimisation  
						
						
						
						
					 
					
						2016-06-07 13:18:21 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						8ac021de73 
					 
					
						
						
							
							Added a test and fixed it for red black precon Ls innermost vectorised DWF  
						
						
						
						
					 
					
						2016-06-07 13:16:56 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						786ca52c43 
					 
					
						
						
							
							Problems remain in the red black preconditioning of the Ls vectorisation  
						
						
						
						
					 
					
						2016-06-06 07:05:51 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						53d06046b0 
					 
					
						
						
							
							Compiling updates for KNL  
						
						
						
						
					 
					
						2016-06-03 03:47:54 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						139cc5f1ae 
					 
					
						
						
							
							Large change with KNL preparation  
						
						
						
						
					 
					
						2016-06-03 03:24:26 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						f2ae9682ff 
					 
					
						
						
							
							Remove some timing hacks  
						
						
						
						
					 
					
						2016-04-19 15:14:32 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						528eb773ad 
					 
					
						
						
							
							Merged.  
						
						
						
						
						Merge branch 'master' of https://github.com/paboyle/Grid  
						
						
					 
					
						2016-04-19 22:24:34 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						c323425496 
					 
					
						
						
							
							Small change  
						
						
						
						
					 
					
						2016-04-11 10:38:43 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						650e02b344 
					 
					
						
						
							
							Smaller vols too  
						
						
						
						
					 
					
						2016-04-06 06:52:09 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						a524ca2a4b 
					 
					
						
						
							
							New benchmark update  
						
						
						
						
					 
					
						2016-04-06 03:35:56 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						23a7176b71 
					 
					
						
						
							
							Loop over volumes  
						
						
						
						
					 
					
						2016-04-06 03:22:11 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						b1192a8908 
					 
					
						
						
							
							Benchmark_zmm added  
						
						
						
						
					 
					
						2016-04-06 03:00:07 -07:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						e8dddb1596 
					 
					
						
						
							
							Adding extra benchmark  
						
						
						
						
					 
					
						2016-04-06 10:32:54 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						c77b7ee897 
					 
					
						
						
							
							AddSub based alternate SU3 routine  
						
						
						
						
					 
					
						2016-03-28 17:55:22 -06:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						e17c773a0b 
					 
					
						
						
							
							Longer runs for vtune  
						
						
						
						
					 
					
						2016-03-16 02:29:13 -07:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						f7be108e35 
					 
					
						
						
							
							100 iters faster  
						
						
						
						
					 
					
						2016-02-15 16:03:04 -06:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						fc6ad65751 
					 
					
						
						
							
							Pushed the overlap comms tweaks  
						
						
						
						
					 
					
						2016-01-11 06:34:22 -08:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						02452afd36 
					 
					
						
						
							
							Optional overlap of comms with compute  
						
						
						
						
					 
					
						2016-01-04 14:18:40 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						331768dcff 
					 
					
						
						
							
							Added overlap comms compute mode  
						
						
						
						
					 
					
						2016-01-03 01:38:11 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						aae8bf31a7 
					 
					
						
						
							
							Global edit adding copyright and license info to every source file.  
						
						
						
						
					 
					
						2016-01-02 14:51:32 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						3ce10aa975 
					 
					
						
						
							
							Fix a regression failure on Mobius; chroma regression added  
						
						
						
						
					 
					
						2015-12-10 22:55:00 +00:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						1cc0d7b811 
					 
					
						
						
							
							Bigger ncall as timing loops got small on cori  
						
						
						
						
					 
					
						2015-11-07 00:04:40 -08:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						27813cf518 
					 
					
						
						
							
							More timing detail reported  
						
						
						
						
					 
					
						2015-11-06 05:27:13 -06:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						16c7993434 
					 
					
						
						
							
							Merge branch 'master' of github.com:paboyle/Grid  
						
						
						
						
						Conflicts:
	lib/simd/Grid_avx512.h
	lib/simd/Grid_imci.h 
						
						
					 
					
						2015-11-04 03:32:10 -08:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						32762346ad 
					 
					
						
						
							
							Better run time on KNC  
						
						
						
						
					 
					
						2015-11-04 03:25:34 -08:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						0f48658a27 
					 
					
						
						
							
							Update minor  
						
						
						
						
					 
					
						2015-11-04 03:23:46 -08:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						dfc1de6f60 
					 
					
						
						
							
							Merge branch 'master' of github.com:paboyle/Grid  
						
						
						
						
					 
					
						2015-11-04 05:14:26 -06:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						b3d70a3bb2 
					 
					
						
						
							
							Ncall change  
						
						
						
						
					 
					
						2015-11-04 09:55:21 +00:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						c26220e9ab 
					 
					
						
						
							
							EO benchmark as well as non-eo  
						
						
						
						
					 
					
						2015-11-04 09:54:48 +00:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						3726fe7481 
					 
					
						
						
							
							Bigger vec length  
						
						
						
						
					 
					
						2015-10-09 00:42:54 +02:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						af89c40462 
					 
					
						
						
							
							Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes.  
						
						
						
						
					 
					
						2015-09-28 16:09:04 -07:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						9f4f65cb46 
					 
					
						
						
							
							Added a decoupled memory system benchmark to remove thread synch overhead  
						
						
						
						
					 
					
						2015-09-26 18:23:57 -07:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						9183380946 
					 
					
						
						
							
							Gparity test added; partial implementation -- this is Chris K's doubled lattice only  
						
						
						
						
						and have to regress this with the 2 flavour implementation. 
						
						
					 
					
						2015-08-12 09:49:33 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						84a66476ab 
					 
					
						
						
							
							Rework/global edit to enforce type templating of fermion operators.  
						
						
						
						
						Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types. 
						
						
					 
					
						2015-08-10 20:47:44 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						d1afebf71e 
					 
					
						
						
							
							Sizable improvement in multigrid for unsquared.  
						
						
						
						
						6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure 
						
						
					 
					
						2015-07-24 01:31:13 +09:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						31a0c8d783 
					 
					
						
						
							
							Merge branch 'master' of  https://github.com/paboyle/Grid  
						
						
						
						
					 
					
						2015-07-01 22:51:04 +01:00 
						 
				 
			
				
					
						
							
							
								paboyle 
							
						 
					 
					
						
						
							
						
						39271b02dd 
					 
					
						
						
							
							Modified memory bw test to display word size  
						
						
						
						
					 
					
						2015-07-01 22:46:53 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						638d2cda11 
					 
					
						
						
							
							Change the SIMD command correctly with precision = double vs. single and  
						
						
						
						
						connect the "Real" default precision to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double, etc. 
						
						
					 
					
						2015-07-01 22:45:15 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						9143f071d7 
					 
					
						
						
							
							Merge branch 'master' of  https://github.com/paboyle/Grid  
						
						
						
						
					 
					
						2015-06-30 15:17:46 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						8ad81bed32 
					 
					
						
						
							
							big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). started getting to  
						
						
						
						
						near the bleeding edge I guess 
						
						
					 
					
						2015-06-30 15:01:44 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						93916f400d 
					 
					
						
						
							
							Update Benchmark_comms.cc  
						
						
						
						
					 
					
						2015-06-25 10:59:53 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						f07a17ba2c 
					 
					
						
						
							
							Assist for generating file lists contained in Make.inc files for convenience when things are added  
						
						
						
						
					 
					
						2015-06-03 13:07:00 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						84b5c7217d 
					 
					
						
						
							
							CG test written and passes i.e. converges with small true residual  
						
						
						
						
						in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of:
DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion 
						
						
					 
					
						2015-06-03 10:54:03 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						69f4d58381 
					 
					
						
						
							
							Reorg; moving prec/unprec/schur CG for Wilson and DWF into tests as these are really tests and not benchmarks  
						
						
						
						
						(no performance reports, only convergence test). 
						
						
					 
					
						2015-06-02 17:25:26 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						3845f267cb 
					 
					
						
						
							
							Domain wall fermions now invert ; have the basis set up for  
						
						
						
						
						Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.
All are done with space-time taking part in checkerboarding, Ls uncheckerboarded
Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is Hermitian
vi) Mdag is the adjoint of M between stochastic vectors.
That said, the RB schur solve, RB MpcDagMpc solve, Unprec solve
all converge and the true residual becomes small; so pretty good tests. 
						
						
					 
					
						2015-06-02 16:57:12 +01:00 
						 
				 
			
				
					
						
							
							
								Peter Boyle 
							
						 
					 
					
						
						
							
						
						5644ab1e19 
					 
					
						
						
							
							Large scale change to support 5d fermion formulations.  
						
						
						
						
						Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson. 
						
						
					 
					
						2015-05-31 15:09:02 +01:00