portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-14 01:35:36 +00:00

Author	SHA1	Message	Date
azusayamaguchi	eabc577940	Assembler possibly working	2016-12-16 16:55:36 +00:00
Guido Cossu	ae9688e343	Reporting also the total mflops	2016-11-28 11:37:02 +00:00
Antonin Portelli	ca21003f01	Merge branch 'feature/fft-opt' into feature/feynman-rules # Conflicts: # lib/FFT.h # lib/qcd/action/fermion/WilsonFermion5D.h # tests/core/Test_fft.cc	2016-10-26 18:44:47 +01:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working Need to fix simd directions	2016-10-22 18:14:27 +01:00
azusayamaguchi	6a9eae6b6b	Reporting improvements	2016-10-21 13:36:18 +01:00
Antonin Portelli	997fd882ff	Merge branch 'develop' into feature/feynman-rules # Conflicts: # lib/Threads.h # lib/qcd/action/fermion/WilsonFermion.cc # lib/qcd/action/fermion/WilsonFermion.h # lib/qcd/utils/SUn.h # lib/simd/Grid_avx.h # lib/simd/Intel512common.h	2016-10-19 18:35:18 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
paboyle	96f1d1b828	Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).	2016-10-10 23:46:45 +01:00
Guido Cossu	2e453dfbf5	Added some instrumentation to benchmark the force computation	2016-10-06 17:52:45 +01:00
paboyle	4089984431	Timing hooks	2016-10-06 09:25:12 +01:00
paboyle	b6713ecb60	Momentum space rules for Overlap, DWF untested to date	2016-09-26 09:39:09 +01:00
paboyle	48fb1cdc11	Update domain 5d vectorised impl type, move the type over to 4d redblack with the dense OO inverse	2016-07-14 23:54:35 +01:00
paboyle	6d58cb2a68	Enable reordering of the loops in the assembler for cache friendly. This gets in the way of L2 prefetching however. Do next next link in stencil prefetching.	2016-06-30 14:35:01 -07:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	1e554350ac	The threaded comms didn't agree with GCC. Surprised, and looks like GCC bug.	2016-04-29 16:49:18 -07:00
paboyle	9b6ab6db16	simd in 5th dimension support	2016-04-19 15:38:01 -07:00
paboyle	e8dddb1596	Adding extra benchmark	2016-04-06 10:32:54 +01:00
paboyle	e67fc2be18	Adding a trial for openmp overhead minimisation	2016-03-31 16:00:37 +01:00
paboyle	165bffc2e7	Avx512 changes for assembler kernels	2016-03-26 22:25:45 -06:00
paboyle	090e7aa930	Merge remote-tracking branch 'origin/chulwoo-dec12-2015' Merge Chulwoo's Lanczos related improvements. Merge Nd!=4 fixes for pure gauge HMC from Evan.	2016-03-08 09:55:14 +00:00
paboyle	61413565d0	Back off the inlined spin proj as not working	2016-03-02 07:03:09 -08:00
Peter Boyle	6aeaf6f568	Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then turned up problems on the BlueWaters Cray. Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel, and also to look at aggregating bigger writes for the parallel write. Not sure what the home filesystem is.	2016-02-21 08:03:21 -06:00
Peter Boyle	22422a84d9	Small problem in compressor fix	2016-02-17 19:03:09 -06:00
Peter Boyle	c9fadf97a5	Simplify the compressor interface again.	2016-02-17 18:16:45 -06:00
Peter Boyle	81395e85d1	Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it.	2016-02-16 13:56:44 -06:00
Peter Boyle	a0fc47c6f9	Cheaper implementation	2016-02-15 16:02:36 -06:00
paboyle	e2f73e3ead	Updates for shmem	2016-02-10 16:50:32 -08:00
Jung	411ac49dd7	GparityWilsonTM typedef added. Not yet tested Conflicts: configure lib/qcd/action/fermion/WilsonKernels.h	2016-01-25 01:36:28 -05:00
Jung	5c57d4f403	Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2 Conflicts: lib/qcd/action/fermion/WilsonKernels.h	2016-01-11 11:36:45 -05:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	dafc74020c	Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori	2016-01-10 16:54:27 -08:00
paboyle	d19321dfde	Overlap comms compute changes	2016-01-10 19:20:16 +00:00
Jung	5924e5a562	Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2 Conflicts: configure lib/qcd/action/Actions.h lib/qcd/action/fermion/WilsonKernels.h	2016-01-06 03:44:57 -05:00
paboyle	c99d748da6	Timing reports in benchmarks now reflect the asynch comms thread statistics	2016-01-04 14:42:16 +00:00
paboyle	02452afd36	Optional overlap of comms with compute	2016-01-04 14:18:40 +00:00
paboyle	331768dcff	Added overlap comms compute mode	2016-01-03 01:38:11 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	34a0fde2ad	Fixes to fermion force terms after sign of gamma_mu (0...3) change. Thought I had already committed these. Believe I have got the Gparity fermion force working. * tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion based on a small dt update of U field. * tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21. Need to accumulate a full plaquette log to believe fully which will take some hours of run time.	2015-12-15 23:14:12 +00:00
Jung	f2b4edc090	Fixes for Gparity comparison with CPS (Instantiation, Gamma matrix convention)	2015-12-07 02:04:57 -05:00
paboyle	b2c02a6106	Runs fastest on Cori	2015-11-28 16:58:16 -08:00
paboyle	e9ff25b06b	Small threading change makes a difference on Cori.	2015-11-07 00:07:05 -08:00
paboyle	899ca41cb8	Merge branch 'master' of github.com:paboyle/Grid Conflicts: lib/qcd/action/fermion/WilsonFermion5D.cc	2015-11-06 03:50:04 -08:00
paboyle	a2ff068e29	Asm and threading for many core	2015-11-06 03:47:14 -08:00
Peter Boyle	28022755ae	Stencil class name global change to StencilImpl typedef	2015-11-06 05:30:17 -06:00
Peter Boyle	55cfc89459	* Finished the template/policy style introduction of gparity, except the gparity force terms. So valence sector looks ok. FermionOperatorImpl.h provides the policy classes. Expect HMC will introduce a smearing policy and a fermion representation change policy template param. Will also probably need multi-precision work. * HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge). There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but not with multi-level. In any case there looks to be quite a bit to clean up. This is the "const det" style implementation that is not appropriate yet for clover since it assumes that Mee is indept of the gauge fields. Easily fixed in future.	2015-08-15 23:25:49 +01:00
Peter Boyle	7d3512ab21	Gparity valence test now working. Interface in FermionOperator will change a lot in future	2015-08-14 00:01:04 +01:00
Peter Boyle	84a66476ab	Rework/global edit to enforce type templating of fermion operators. Allows multi-precision work and paves the way for alternate BC's and such like allowing for example G-parity which is important for K pipi programme. In particular, can drive an extra flavour index into the fermion fields using template types.	2015-08-10 20:47:44 +01:00

64 Commits