portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2025-06-19 16:27:05 +01:00

Author	SHA1	Message	Date
Peter Boyle	99220f6531	Fixes and better timing	2017-04-26 17:24:11 -04:00
Peter Boyle	fd1eb7de13	Clean implementation of the exterior faces listing only those points on the boundary	2017-04-26 02:34:52 -04:00
paboyle	ab66bac4e6	Think I'm getting on top of the reduced cost exterior precomputed list of links	2017-04-25 08:50:26 +01:00
paboyle	56277a11c8	Build a list of what's on the surface	2017-04-24 17:06:15 +01:00
Peter Boyle	e3d0e31525	Debugged assembly split phase with interior suppression	2017-04-23 19:29:27 -04:00
paboyle	b722889234	Try a better load balancing loop	2017-04-22 19:27:41 +01:00
paboyle	736bf3c866	Major rework of stencil. Half precision and MPI3 now working.	2017-04-22 11:33:50 +01:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	4a340aa5ca	Massive compressor rework to support reduced precision comms	2017-04-20 09:28:27 +01:00
paboyle	4b17e8eba8	Merge branch 'develop' into feature/bgq-asm Conflicts: lib/qcd/action/fermion/Fermion.h lib/qcd/action/fermion/WilsonFermion.cc lib/util/Init.cc tests/Test_cayley_even_odd_vec.cc	2017-03-28 04:49:30 -04:00
paboyle	18bde08d1b	Merge branch 'feature/staggering' into develop	2017-03-28 15:25:55 +09:00
paboyle	af230a1fb8	Average the time across the whole machine for outliers	2017-02-28 17:05:22 -05:00
paboyle	e099dcdae7	Merge branch 'develop' into feature/bgq-asm	2017-02-23 00:25:29 +00:00
paboyle	4e7ab3166f	Refactoring header layout	2017-02-22 18:09:33 +00:00
paboyle	3ae92fa2e6	Global changes to parallel_for structure. Move the comms flags to more sensible names	2017-02-21 05:24:27 -05:00
paboyle	2c246551d0	Overlap comms and compute options in wilson kernels	2017-02-07 01:37:10 -05:00
Antonin Portelli	a0cfbb6e88	Merge branch 'feature/gammas' into feature/hadrons # Conflicts: # .gitignore # lib/qcd/spin/Dirac.cc # scripts/filelist	2017-01-30 09:10:49 -08:00
Antonin Portelli	fad743fbb1	Build system sanity check: corrected several headers not in the <Grid/*> format	2017-01-26 17:00:41 -08:00
Antonin Portelli	a37e71f362	New automatic implementation of gamma matrices, Meson and SeqGamma are broken	2017-01-23 19:13:43 -08:00
Peter Boyle	03c81bd902	Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm	2016-12-27 11:25:35 +00:00
Peter Boyle	a869addef1	Stats switch off	2016-12-27 11:25:22 +00:00
Peter Boyle	3d21297bbb	Call the fast path compressor for wilson kernels to avoid if else on projector	2016-12-27 11:23:13 +00:00
Peter Boyle	25efefc5b4	Back to original thread policy post test	2016-12-23 09:49:04 +00:00
Peter Boyle	b8cdb3e90a	Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q	2016-12-22 17:50:14 +00:00
azusayamaguchi	eabc577940	Assembler possibly working	2016-12-16 16:55:36 +00:00
Peter Boyle	fb8d4b2357	Lots of debug on performance Mobius	2016-12-08 17:28:28 +00:00
Guido Cossu	ae9688e343	Reporting also the total mflops	2016-11-28 11:37:02 +00:00
Antonin Portelli	ca21003f01	Merge branch 'feature/fft-opt' into feature/feynman-rules # Conflicts: # lib/FFT.h # lib/qcd/action/fermion/WilsonFermion5D.h # tests/core/Test_fft.cc	2016-10-26 18:44:47 +01:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working Need to fix simd directions	2016-10-22 18:14:27 +01:00
azusayamaguchi	6a9eae6b6b	Reporting improvements	2016-10-21 13:36:18 +01:00
Antonin Portelli	997fd882ff	Merge branch 'develop' into feature/feynman-rules # Conflicts: # lib/Threads.h # lib/qcd/action/fermion/WilsonFermion.cc # lib/qcd/action/fermion/WilsonFermion.h # lib/qcd/utils/SUn.h # lib/simd/Grid_avx.h # lib/simd/Intel512common.h	2016-10-19 18:35:18 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
paboyle	96f1d1b828	Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).	2016-10-10 23:46:45 +01:00
Guido Cossu	2e453dfbf5	Added some instrumentation to benchmark the force computation	2016-10-06 17:52:45 +01:00
paboyle	4089984431	Timing hooks	2016-10-06 09:25:12 +01:00
paboyle	b6713ecb60	Momentum space rules for Overlap, DWF untested to date	2016-09-26 09:39:09 +01:00
paboyle	48fb1cdc11	Update domain 5d vectorised impl type, move the type over to 4d redblack with the dense OO inverse	2016-07-14 23:54:35 +01:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	1e554350ac	The threaded comms didn't agree with GCC. Surprised, and looks like GCC bug.	2016-04-29 16:49:18 -07:00
paboyle	9b6ab6db16	simd in 5th dimension support	2016-04-19 15:38:01 -07:00
paboyle	e8dddb1596	Adding extra benchmark	2016-04-06 10:32:54 +01:00
paboyle	e67fc2be18	Adding a trial for openmp overhead minimisation	2016-03-31 16:00:37 +01:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
paboyle	090e7aa930	Merge remote-tracking branch 'origin/chulwoo-dec12-2015' Merge Chulwoo's Lanczos related improvements. Merge Nd!=4 fixes for pure gauge HMC from Evan.	2016-03-08 09:55:14 +00:00
paboyle	61413565d0	Back off the inlined spin proj as not working	2016-03-02 07:03:09 -08:00
Peter Boyle	6aeaf6f568	Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then turned up problems on the BlueWaters Cray. Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel, and also to look at aggregating bigger writes for the parallel write. Not sure what the home filesystem is.	2016-02-21 08:03:21 -06:00

89 Commits