portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 10:15:36 +00:00

Author	SHA1	Message	Date
paboyle	738c1a11c2	longer nloop	2017-04-26 08:43:20 +01:00
paboyle	ab66bac4e6	Think I'm getting on top of the reduced cost exterior precomputed list of links	2017-04-25 08:50:26 +01:00
paboyle	c429ace748	Cleaner OpenMP use	2017-04-22 20:28:42 +01:00
Peter Boyle	1d1b225497	Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).	2017-04-22 09:05:28 -04:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	e099dcdae7	Merge branch 'develop' into feature/bgq-asm	2017-02-23 00:25:29 +00:00
paboyle	1a30455a10	1000 iters on bmark for more accurate timing	2017-02-20 17:47:01 -05:00
paboyle	aca7a3ef0a	Optimisation control improvements	2017-02-10 18:22:31 -05:00
paboyle	2bf4688e83	Running on BNL KNL	2017-02-07 01:32:10 -05:00
Antonin Portelli	a37e71f362	New automatic implementation of gamma matrices, Meson and SeqGamma are broken	2017-01-23 19:13:43 -08:00
paboyle	33dc1f51b5	Final sign off commits from Cori-1	2016-11-09 04:11:03 -08:00
paboyle	bb94ddd0eb	Tidy up of mpi3; also some cleaning of the dslash controls.	2016-11-02 08:07:09 +00:00
azusayamaguchi	b6a65059a2	Update to use shared memory to contain the stencil comms buffers. Tested on 2.1.1.1, 1.2.1.1, 4.1.1.1, 1.4.1.1, 2.2.1.1 subnode decompositions.	2016-10-24 17:30:43 +01:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working. Need to fix simd directions.	2016-10-22 18:14:27 +01:00
paboyle	a762b1fb71	MPI3 working with a bounce through shared memory on my laptop. Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the send between ranks on same node.	2016-10-21 09:03:26 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
Guido Cossu	2e453dfbf5	Added some instrumentation to benchmark the force computation	2016-10-06 17:52:45 +01:00
paboyle	4089984431	Timing hooks	2016-10-06 09:25:12 +01:00
paboyle	9db2c6525d	updating benchmarks for red black 4d for Ls vectorised code	2016-07-14 23:44:02 +01:00
paboyle	a0676beeb1	Open up dependency on Eigen and FFTW	2016-07-07 22:31:07 +01:00
Guido Cossu	5e02392f9c	Fixed compilation error for benchmark_dwf. Some parts were assuming floating point precision.	2016-06-20 12:30:51 +01:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	786ca52c43	Problems remain in the red black preconditioning of the Ls vectorisation	2016-06-06 07:05:51 -07:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	c77b7ee897	AddSub based alternate SU3 routine	2016-03-28 17:55:22 -06:00
paboyle	e17c773a0b	Longer runs for vtune	2016-03-16 02:29:13 -07:00
Peter Boyle	f7be108e35	100 iters faster	2016-02-15 16:03:04 -06:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	02452afd36	Optional overlap of comms with compute	2016-01-04 14:18:40 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	3ce10aa975	Fix a regression failure on Mobius; chroma regression added	2015-12-10 22:55:00 +00:00
paboyle	1cc0d7b811	Bigger ncall as timing loops got small on cori	2015-11-07 00:04:40 -08:00
Peter Boyle	27813cf518	More timing detail reported	2015-11-06 05:27:13 -06:00
Peter Boyle	c26220e9ab	EO benchmark as well as non-eo	2015-11-04 09:54:48 +00:00
Peter Boyle	84a66476ab	Rework/global edit to enforce type templating of fermion operators. Allows multi-precision work and paves the way for alternate BC's and such like allowing for example G-parity which is important for K pipi programme. In particular, can drive an extra flavour index into the fermion fields using template types.	2015-08-10 20:47:44 +01:00
Peter Boyle	d1afebf71e	Sizable improvement in multigrid for unsquared. 6000 matmuls CG unprec; 2000 matmuls CG prec (4000 eo muls); 1050 matmuls PGCR on 16^3 x 32 x 8 m=.01. Substantial effort on timing and logging infrastructure.	2015-07-24 01:31:13 +09:00
Peter Boyle	638d2cda11	Change the SIMD command correctly with precision = double vs. single and connect the "Real" default precision to a configure flag. Have RealF, RealD and Real types, where Real is compile target dependent single/double, RealF is single and RealD is double etc.	2015-07-01 22:45:15 +01:00
Peter Boyle	8ad81bed32	Big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). Started getting too near the bleeding edge I guess.	2015-06-30 15:01:44 +01:00
Peter Boyle	84b5c7217d	CG test written and passes, i.e. converges with small true residual in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for each of: DomainWallFermion, MobiusFermion, MobiusZolotarevFermion, ScaledShamirFermion, ScaledShamirZolotarevFermion.	2015-06-03 10:54:03 +01:00

41 Commits