portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-05-21 01:24:16 +01:00

Author	SHA1	Message	Date
Peter Boyle	3d21297bbb	Call the fast path compressor for wilson kernels to avoid if else on projector	2016-12-27 11:23:13 +00:00
Peter Boyle	25efefc5b4	Back to original thread policy post test	2016-12-23 09:49:04 +00:00
Peter Boyle	b8cdb3e90a	Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q	2016-12-22 17:50:14 +00:00
Peter Boyle	fb8d4b2357	Lots of debug on performance Mobius	2016-12-08 17:28:28 +00:00
Guido Cossu	ae9688e343	Reporting also the total mflops	2016-11-28 11:37:02 +00:00
portelli	ca21003f01	Merge branch 'feature/fft-opt' into feature/feynman-rules # Conflicts: # lib/FFT.h # lib/qcd/action/fermion/WilsonFermion5D.h # tests/core/Test_fft.cc	2016-10-26 18:44:47 +01:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working Need to fix simd directions	2016-10-22 18:14:27 +01:00
azusayamaguchi	6a9eae6b6b	Reporting improvements	2016-10-21 13:36:18 +01:00
portelli	997fd882ff	Merge branch 'develop' into feature/feynman-rules # Conflicts: # lib/Threads.h # lib/qcd/action/fermion/WilsonFermion.cc # lib/qcd/action/fermion/WilsonFermion.h # lib/qcd/utils/SUn.h # lib/simd/Grid_avx.h # lib/simd/Intel512common.h	2016-10-19 18:35:18 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
paboyle	96f1d1b828	Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).	2016-10-10 23:46:45 +01:00
Guido Cossu	2e453dfbf5	Added some instrumentation to benchmark the force computation	2016-10-06 17:52:45 +01:00
paboyle	4089984431	Timing hooks	2016-10-06 09:25:12 +01:00
paboyle	b6713ecb60	Momentum space rules for Overlap, DWF untested to date	2016-09-26 09:39:09 +01:00
paboyle	48fb1cdc11	Update domain 5d vectorised impl type, move the type over to 4d redblack with the dense OO inverse	2016-07-14 23:54:35 +01:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	1e554350ac	The threaded comms didn't agree with GCC. Surprised, and looks like GCC bug.	2016-04-29 16:49:18 -07:00
paboyle	9b6ab6db16	simd in 5th dimension support	2016-04-19 15:38:01 -07:00
paboyle	e8dddb1596	Adding extra benchmark	2016-04-06 10:32:54 +01:00
paboyle	e67fc2be18	Adding a trial for openmp overhead minimisation	2016-03-31 16:00:37 +01:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
paboyle	090e7aa930	Merge remote-tracking branch 'origin/chulwoo-dec12-2015' Merge Chulwoo's Lanczos related improvements. Merge Nd!=4 fixes for pure gauge HMC from Evan.	2016-03-08 09:55:14 +00:00
paboyle	61413565d0	Back off the inlined spin proj as not working	2016-03-02 07:03:09 -08:00
Peter Boyle	6aeaf6f568	Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then turned up problems on the BlueWaters Cray. Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel, and also to look at aggregating bigger writes for the parallel write. Not sure what the home filesystem is.	2016-02-21 08:03:21 -06:00
Peter Boyle	22422a84d9	Small problem in compressor fix	2016-02-17 19:03:09 -06:00
Peter Boyle	c9fadf97a5	Simplify the compressor interface again.	2016-02-17 18:16:45 -06:00
Peter Boyle	81395e85d1	Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it.	2016-02-16 13:56:44 -06:00
Peter Boyle	a0fc47c6f9	Cheaper implementation	2016-02-15 16:02:36 -06:00
paboyle	e2f73e3ead	Updates for shmem	2016-02-10 16:50:32 -08:00
Jung	411ac49dd7	GparityWilsonTM typedef added. Not yet tested Conflicts: configure lib/qcd/action/fermion/WilsonKernels.h	2016-01-25 01:36:28 -05:00
Jung	5c57d4f403	Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2 Conflicts: lib/qcd/action/fermion/WilsonKernels.h	2016-01-11 11:36:45 -05:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	dafc74020c	Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori	2016-01-10 16:54:27 -08:00
paboyle	d19321dfde	Overlap comms compute changes	2016-01-10 19:20:16 +00:00
Jung	5924e5a562	Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2 Conflicts: configure lib/qcd/action/Actions.h lib/qcd/action/fermion/WilsonKernels.h	2016-01-06 03:44:57 -05:00
paboyle	c99d748da6	Timing reports in benchmarks now reflect the asynch comms thread statistics	2016-01-04 14:42:16 +00:00
paboyle	02452afd36	Optional overlap of comms with compute	2016-01-04 14:18:40 +00:00
paboyle	331768dcff	Added overlap comms compute mode	2016-01-03 01:38:11 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	34a0fde2ad	Fixes to fermion force terms after sign of gamma_mu (0...3) change. Thought I had already committed these. Believe I have got the Gparity fermion force working. * tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion based on a small dt update of U field. * tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21. Need to accumulate a full plaquette log to believe fully which will take some hours of run time.	2015-12-15 23:14:12 +00:00
Jung	f2b4edc090	Fixes for Gparity comparison with CPS (Instantiation, Gamma matrix convention)	2015-12-07 02:04:57 -05:00
paboyle	b2c02a6106	Runs fastest on cori	2015-11-28 16:58:16 -08:00
paboyle	e9ff25b06b	Small threading change makes a difference on Cori.	2015-11-07 00:07:05 -08:00
paboyle	899ca41cb8	Merge branch 'master' of github.com:paboyle/Grid Conflicts: lib/qcd/action/fermion/WilsonFermion5D.cc	2015-11-06 03:50:04 -08:00
paboyle	a2ff068e29	Asm and threading for many core	2015-11-06 03:47:14 -08:00
Peter Boyle	28022755ae	Stencil class name global change to StencilImpl typedef	2015-11-06 05:30:17 -06:00

67 Commits