portelli/Grid - Grid - DiRAC Tursa git server

Mirror of https://github.com/paboyle/Grid.git, synced 2024-11-15 02:05:37 +00:00

Author	SHA1	Message	Date
Guido Cossu	cff3bae155	Adding support for general Nc in the benchmark outputs	2018-01-25 13:46:31 +01:00
Peter Boyle	bfb68e6f02	Merge pull request #130 from giltirn/gparity-handunroll Gparity handunroll	2017-09-21 10:11:00 +01:00
Christopher Kelly	d36d2fb40d	Added ability to override default Ls in Benchmark_dwf	2017-08-28 06:53:56 -07:00
paboyle	ae56e556c6	finalise issue on new OPA revert	2017-08-20 02:53:12 +01:00
Peter Boyle	7d88198387	Merge branch 'develop' into feature/multi-communicator	2017-08-19 13:03:35 -04:00
Peter Boyle	14d53e1c9e	Threaded MPI calls patches	2017-07-29 13:08:10 -04:00
Peter Boyle	b73bd151bb	Switch off counters by default	2017-06-30 10:16:35 +01:00
Peter Boyle	694b305cab	Update to reporting	2017-06-30 10:16:13 +01:00
Guido Cossu	20999c1370	Merge branch 'develop' into feature/hmc_generalise	2017-05-05 12:47:17 +01:00
Peter Boyle	945767c6d8	More info	2017-05-03 20:26:35 -04:00
Peter Boyle	92e364a35f	Better reporting in benchmark for MPI3	2017-05-03 15:43:36 -04:00
Guido Cossu	4063238943	Adding HMC test file example for Mobius + smearing	2017-05-01 13:44:00 +01:00
Guido Cossu	3344788fa1	Merge branch 'develop' into feature/hmc_generalise	2017-05-01 12:13:56 +01:00
paboyle	738c1a11c2	longer nloop	2017-04-26 08:43:20 +01:00
paboyle	ab66bac4e6	Think I'm getting on top of the reduced cost exterior precomputed list of links	2017-04-25 08:50:26 +01:00
paboyle	c429ace748	Cleaner OpenMP use	2017-04-22 20:28:42 +01:00
Peter Boyle	1d1b225497	Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).	2017-04-22 09:05:28 -04:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
Guido Cossu	8c540333d5	Merge branch 'develop' into feature/hmc_generalise	2017-04-05 14:41:04 +01:00
paboyle	e099dcdae7	Merge branch 'develop' into feature/bgq-asm	2017-02-23 00:25:29 +00:00
paboyle	1a30455a10	1000 iters on bmark for more accurate timing	2017-02-20 17:47:01 -05:00
paboyle	aca7a3ef0a	Optimisation control improvements	2017-02-10 18:22:31 -05:00
Guido Cossu	8b6a6c8236	Resolving small merge conflict	2017-02-09 16:20:24 +00:00
Guido Cossu	e0571c872b	Merge branch 'develop' into feature/hmc_generalise	2017-02-09 16:12:00 +00:00
paboyle	2bf4688e83	Running on BNL KNL	2017-02-07 01:32:10 -05:00
Antonin Portelli	a37e71f362	New automatic implementation of gamma matrices, Meson and SeqGamma are broken	2017-01-23 19:13:43 -08:00
Guido Cossu	0bd296dda4	Adding check of the Dag part in the benchmark	2016-12-14 03:15:09 +00:00
paboyle	33dc1f51b5	Final sign off commits from Cori-1	2016-11-09 04:11:03 -08:00
paboyle	bb94ddd0eb	Tidy up of mpi3; also some cleaning of the dslash controls.	2016-11-02 08:07:09 +00:00
azusayamaguchi	b6a65059a2	Update to use shared memory to contain the stencil comms buffers. Tested on 2.1.1.1, 1.2.1.1, 4.1.1.1, 1.4.1.1, 2.2.1.1 subnode decompositions	2016-10-24 17:30:43 +01:00
azusayamaguchi	c190221fd3	Internal SHM comms in non-simd directions working. Need to fix simd directions	2016-10-22 18:14:27 +01:00
paboyle	a762b1fb71	MPI3 working with a bounce through shared memory on my laptop. Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the send between ranks on same node.	2016-10-21 09:03:26 +01:00
azusayamaguchi	81f2aeaece	KNL streaming stores, and KNL performance counters	2016-10-12 11:45:22 +01:00
Guido Cossu	2e453dfbf5	Added some instrumentation to benchmark the force computation	2016-10-06 17:52:45 +01:00
paboyle	4089984431	Timing hooks	2016-10-06 09:25:12 +01:00
paboyle	9db2c6525d	updating benchmarks for red black 4d for Ls vectorised code	2016-07-14 23:44:02 +01:00
paboyle	a0676beeb1	Open up dependency on Eigen and FFTW	2016-07-07 22:31:07 +01:00
Guido Cossu	5e02392f9c	Fixed compilation error for benchmark_dwf. Some parts were assuming floating point precision	2016-06-20 12:30:51 +01:00
paboyle	55f65b81b5	Improvements to the assembler interface that let us move chunks of the site and s loop into the kernels. This will save on function call overhead and guarantee L2 prefetching strategy is right since OMP can't distribute the sub-chunks of work.	2016-06-09 01:12:36 -07:00
paboyle	8ac021de73	Added a test and fixed it for red black precon Ls innermost vectorised DWF	2016-06-07 13:16:56 -07:00
paboyle	786ca52c43	Problems remain in the red black preconditioning of the Ls vectorisation	2016-06-06 07:05:51 -07:00
paboyle	53d06046b0	Compiling updates for KNL	2016-06-03 03:47:54 -07:00
paboyle	139cc5f1ae	Large change with KNL preparation	2016-06-03 03:24:26 -07:00
paboyle	c77b7ee897	AddSub based alternate SU3 routine	2016-03-28 17:55:22 -06:00
paboyle	e17c773a0b	Longer runs for vtune	2016-03-16 02:29:13 -07:00
Peter Boyle	f7be108e35	100 iters faster	2016-02-15 16:03:04 -06:00
paboyle	fc6ad65751	Pushed the overlap comms tweaks	2016-01-11 06:34:22 -08:00
paboyle	02452afd36	Optional overlap of comms with compute	2016-01-04 14:18:40 +00:00
paboyle	aae8bf31a7	Global edit adding copyright and license info to every source file.	2016-01-02 14:51:32 +00:00
paboyle	3ce10aa975	Fix a regression failure on Mobius; chroma regression added	2015-12-10 22:55:00 +00:00


58 Commits