portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-06-25 21:13:30 +01:00

Author	SHA1	Message	Date
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	c8b35d960c	Merge branch 'develop' of https://github.com/paboyle/Grid into feature/knl-cache-opt	2016-06-30 14:30:49 -07:00
paboyle	532f41dd61	Asm only for avx512	2016-06-30 14:00:34 -07:00
paboyle	661b0ab45d	Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.	2016-06-30 13:07:42 -07:00
Guido Cossu	565e9329ba	Changed the colouring classes	2016-06-30 16:51:03 +01:00
paboyle	4bc08ed995	Improved the prefetching when using cache blocking codes	2016-06-26 12:54:14 -07:00
paboyle	b2933a0557	Control the prefetch strategy	2016-06-25 12:55:25 -07:00
paboyle	db057cc276	Prefetch change	2016-06-25 12:54:50 -07:00
paboyle	22e88eaf54	Prefetch during save	2016-06-25 12:54:14 -07:00
paboyle	09fe3caebd	Tweaks	2016-06-25 11:08:05 -07:00
Guido Cossu	5e02392f9c	Fixed compilation error for benchmark_dwf Some parts were assuming floating point precision	2016-06-20 12:30:51 +01:00
paboyle	17a8f51a9b	update file lists	2016-06-19 11:59:10 -07:00
paboyle	1b7f88dd00	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-19 11:45:58 -07:00
portelli	d6737e4bd8	Travis fix for Linux clang builds	2016-06-14 19:15:08 +01:00
portelli	d539888e57	Merge pull request #37 from rprollins/fix/mpi_communicator Removed write to stdout in constructor for MPI CartesianCommunicator	2016-06-14 17:25:40 +01:00
Richard Rollins	86187d7cca	Removed write to stdout in constructor for MPI CartesianCommunicator	2016-06-14 15:34:20 +01:00
paboyle	87418e7df1	Slightly faster prefetching perf.	2016-06-13 02:32:52 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
Azusa Yamaguchi	d9408893b3	Prefetching in the normal kernel implementation.	2016-06-08 05:43:48 -07:00
paboyle	05acc22920	placeholder for non temporal loads optimisation	2016-06-07 13:18:21 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	e503ef5590	Cleaned up	2016-06-07 00:11:36 +01:00
paboyle	a7682b0060	Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS	2016-06-06 23:48:21 +01:00
paboyle	d4c9d71fc8	Merge branch 'master' of https://github.com/paboyle/Grid	2016-06-06 07:06:54 -07:00
paboyle	786ca52c43	Problems remain in the red black preconditioning of the Ls vectorisation	2016-06-06 07:05:51 -07:00
Peter Boyle	048ac04abc	Update Benchmark_dwf.cc	2016-06-03 13:44:41 +01:00
Peter Boyle	f78d89bcbe	Update Lebesgue.cc kill verbose	2016-06-03 13:33:42 +01:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	5d3a1a025d	timers flag	2016-06-03 03:25:38 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
portelli	1c0e922585	Merge pull request #35 from aportelli/master empty SIMD fix	2016-05-27 16:49:13 +01:00
portelli	9d5f693cbe	empty SIMD fix	2016-05-24 10:56:27 +01:00
Peter Boyle	5c90c3b457	Merge pull request #34 from aportelli/master Polymorphic lattices & various small updates	2016-05-24 10:50:04 +01:00
portelli	91e04056f9	fix of the empty SIMD	2016-05-12 19:24:10 +01:00
portelli	3789e3f31c	additional fixes in slice functions	2016-05-12 18:35:38 +01:00
portelli	0c66719210	const fix in slice functions	2016-05-12 13:01:35 +01:00
paboyle	3a5b5c8bec	Save an old tar of tree	2016-05-12 03:20:17 -07:00
paboyle	fdbe071213	space added	2016-05-12 02:59:51 -07:00
portelli	4bc21ec7cb	thread CL argument fix	2016-05-11 15:21:29 +01:00
portelli	e3083b6dfc	Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab'	2016-05-11 15:20:41 +01:00
paboyle	ab89418658	Precision change going in; useful for mixed precision algorithms for example.	2016-05-11 15:18:47 +01:00
paboyle	28cd99882c	Subslicing	2016-05-11 15:06:54 +01:00
paboyle	aceaee774c	ExtractSlice / InsertSlice for lower dimensional lattices where the lattice is not distributed in the orthogonal direction. Useful for fermion 4d/5d etc..	2016-05-11 14:12:02 +01:00
Peter Boyle	f8f9fd6f22	Merge pull request #33 from aportelli/master Travis for clang 3.8 + various updates/fixes	2016-05-05 22:57:13 +01:00
portelli	101aa769eb	LatticeBase contain the grid pointer and a virtual destructor to allow polymorphic lattice pointers	2016-05-04 12:15:31 -07:00
portelli	0bf99bfde5	log polish	2016-05-04 12:14:49 -07:00
portelli	64bf6fe54e	macro to dump NERSC header to a stream	2016-05-04 12:14:38 -07:00
portelli	1161d566b9	minor code cleaning	2016-05-02 19:32:11 -07:00
portelli	c698b16d75	function to generate Chroma-style gamma matrix products	2016-05-01 18:30:35 -07:00
portelli	c4c89336fe	SliceSum: shutting down warning about non-threaded code for now	2016-05-01 18:29:57 -07:00

1788 Commits