Peter Boyle
b57a4d32aa
Merge branch 'develop' into feature/gpu-port
2018-12-13 05:11:34 +00:00
0ba3d469c7
Benchmark IO in single and double precision
2018-10-17 20:27:34 +01:00
291bc2a1f0
IO benchmark on a list of directories
2018-10-15 17:25:08 +01:00
Peter Boyle
adbdc4e65b
Half comms not working on GPU yet, so disable.
2018-09-11 05:15:22 +01:00
Peter Boyle
f4bfeb835d
Drop back to smaller Ls
2018-09-09 14:25:06 +01:00
a15a2dfd29
Merge branch 'develop' into feature/hadrons
2018-08-10 16:08:22 +01:00
paboyle
27cdb79063
Sha used to seed from a unique string
2018-08-10 15:11:01 +01:00
Peter Boyle
00b92a91b5
Optimising
2018-07-28 23:46:22 +01:00
paboyle
65533741f7
7 moms
2018-07-28 16:17:47 +01:00
Peter Boyle
131a6785d4
Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a
2018-07-27 23:03:42 +01:00
paboyle
44f4f5c8e2
Momentum loop
2018-07-27 23:00:16 +01:00
fionnoh
2679df034f
Changes to meson field benchmark. It now includes the gammas in the final part of the naive method; both methods compute lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and the results are checked.
2018-07-27 18:31:10 +01:00
paboyle
71e1006ba8
Updated meson field benchmark for dirac structures
2018-07-26 09:09:29 +01:00
fionnoh
24128ff109
Changes needed for MF benchmark to work with comms correctly
2018-07-23 15:51:37 +01:00
Peter Boyle
21a1710b43
Verbose vector length
2018-07-23 06:08:39 -04:00
paboyle
ec9939c1ba
Test for faster implementation of meson field inner loop
It should be possible to cache-block this at outer levels; the global sum across nodes is not performed here
and is deferred to the caller, which can block them all into one big all-reduce.
Nc=3 and Fermion are hard-coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.
It is not clear whether the explicit unroll or performing the conjugate on the left once
was the real source of the speed-up.
Gives 70-80 GF/s on my laptop in single precision, half that in double, and 70 GB/s to cache.
This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
Peter Boyle
4b04ae3611
Printing improvement
2018-07-05 06:59:38 -04:00
Peter Boyle
2f776d51c6
GPU-specific benchmark saturates memory. Can enhance Grid to do this for expressions, but it is a bit of (known) work.
2018-07-05 06:58:37 -04:00
paboyle
25becc9324
GPU tweaks for benchmarking; really necessary?
2018-06-13 20:26:07 +01:00
Peter Boyle
eb921041d0
Perf count control
2018-05-12 17:57:32 -04:00
bfbf2f1fa0
no threaded stencil benchmark if OpenMP is not supported
2018-05-03 16:20:01 +01:00
Dr Peter Boyle
1dddd17e3c
Benchmark improvements from tesseract
2018-04-27 11:44:46 +01:00
Peter Boyle
fa0d8feff4
Performance of CovariantCshift now non-embarrassing.
2018-04-26 17:56:27 +01:00
Peter Boyle
05b44aef6b
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle
91a0a3f820
Improvement
2018-04-26 14:48:35 +01:00
Peter Boyle
8f44c799a6
Saving the benchmarking tests for Cshift
2018-04-26 14:48:03 +01:00
Guido Cossu
43f5a0df50
More timers in the integrator
2018-04-26 12:01:56 +09:00
paboyle
2baf193031
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2018-04-25 00:14:03 +01:00
paboyle
362ba0443a
Cshift updates
2018-04-25 00:12:11 +01:00
Guido Cossu
c5b9147b53
Correction of a minor bug in the su3 benchmark
2018-04-24 08:03:57 -07:00
Guido Cossu
a1be533329
Corrected Flop count in Benchmark su3 and expanded the Wilson flow output
2018-04-24 01:19:53 -07:00
paboyle
b5510427f9
physical fermion interface, cshift benchmark in SU3.
2018-04-18 01:43:29 +01:00
paboyle
276f113f28
IO uses master boss node for metadata.
2018-03-30 16:17:05 +01:00
paboyle
ab6afd18ac
Still compile if no LIME
2018-03-30 13:39:20 +01:00
c5a885dcd6
I/O benchmark
2018-03-29 19:57:41 +01:00
Peter Boyle
6fe9b28a82
Cosmetic
2018-03-24 19:27:14 -04:00
Peter Boyle
b002587d7c
Simplify
2018-03-24 19:26:44 -04:00
Peter Boyle
6c08385782
Simplify
2018-03-24 19:26:19 -04:00
Peter Boyle
a3690071b4
Warm up GPU
2018-03-22 18:05:20 -04:00
Peter Boyle
5ac96dbdc6
Warm behaviour in SU3 benchmark
2018-03-20 07:18:31 -04:00
paboyle
aead94e9a7
View introduced
2018-03-04 16:39:29 +00:00
paboyle
36ea5f6b77
gpu friendly coordinates ; no std::vector on GPU
2018-02-24 22:20:14 +00:00
Guido Cossu
fb24e3a7d2
Adding utilities for perf profiling
2018-01-29 11:11:45 +01:00
paboyle
604c05f4b8
parallel_for elimination -> thread_loop
2018-01-28 01:01:36 +00:00
paboyle
ce4da83bc2
Zero changes, literally
2018-01-27 23:51:10 +00:00
paboyle
c4f82e072b
_grid becomes private; use Grid()
2018-01-27 00:04:12 +00:00
paboyle
2a4a0e43c1
Hide internals
2018-01-26 23:08:27 +00:00
paboyle
f4010023ca
Warning fixes
2018-01-25 23:46:47 +00:00
paboyle
e7cba358c2
Temporary update to reflect the new dropping of std::vector in Lattice
Will update again to hide the internals in an interface
2018-01-25 23:31:41 +00:00
Guido Cossu
cff3bae155
Adding support for general Nc in the benchmark outputs
2018-01-25 13:46:31 +01:00
paboyle
918c105c57
NVCC warning elimination
2018-01-24 13:23:59 +00:00
paboyle
d74c21a386
Global edit for QCD namespace removal & NAMESPACE macros
2018-01-15 09:37:58 +00:00
paboyle
9b32d51cd1
Simplify comms layer proliferation
2018-01-08 11:27:14 +00:00
paboyle
4f8b6f26b4
Merge branch 'develop' into feature/dwf-multirhs
2017-10-02 11:41:49 +01:00
Peter Boyle
bfb68e6f02
Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
paboyle
17c5b0f152
Patching comparison point
2017-09-16 18:18:07 +01:00
Peter Boyle
b331be9101
Better reporting
2017-08-31 11:32:57 +01:00
Peter Boyle
49c20a9fa8
Patch to reporting
2017-08-31 11:32:21 +01:00
paboyle
7359df3501
Full reporting for benchmark; save robustness factor
2017-08-31 10:42:35 +01:00
Christopher Kelly
d36d2fb40d
Added ability to override default Ls in Benchmark_dwf
2017-08-28 06:53:56 -07:00
Peter Boyle
5b9267e88d
Cleaner comms benchmark treatment for one node runs
2017-08-27 18:24:48 -04:00
paboyle
15fd4003ef
Improving presentation of results
2017-08-27 13:46:02 +01:00
paboyle
ad89abb018
Fix
2017-08-25 20:43:37 +01:00
paboyle
80c5bce5bb
Merge branch 'develop' into feature/multi-communicator
2017-08-25 20:21:26 +01:00
Peter Boyle
d0f3d525d5
Optimal block size for KNL
2017-08-25 19:33:54 +01:00
Peter Boyle
3a58217405
Updated
2017-08-25 14:29:53 +01:00
Peter Boyle
c289699d9a
updated from cambridge mpi3 shakeout
2017-08-25 11:41:01 +01:00
Peter Boyle
c3b1263e75
Benchmark prep
2017-08-25 09:25:54 +01:00
Christopher Kelly
edabb3577f
Imported Benchmark_gparity
2017-08-23 16:54:06 -04:00
paboyle
ae56e556c6
finalise issue on new OPA revert
2017-08-20 02:53:12 +01:00
paboyle
383ca7d392
Switch off comms for now until feature/multi-communicator is merged
2017-08-20 01:27:48 +01:00
paboyle
a446d95c33
Trying to pass TeamCity and Travis
2017-08-20 01:10:50 +01:00
paboyle
be66e7dd95
Merge branch 'develop' into feature/multi-communicator
2017-08-19 23:12:38 +01:00
paboyle
bfef525ed2
New benchmark prep
2017-08-19 23:10:12 +01:00
Peter Boyle
7d88198387
Merge branch 'develop' into feature/multi-communicator
2017-08-19 13:03:35 -04:00
Peter Boyle
9e658de238
Use Vector
2017-08-19 12:52:44 -04:00
Peter Boyle
14d53e1c9e
Threaded MPI calls patches
2017-07-29 13:08:10 -04:00
Peter Boyle
40e119c61c
NUMA improvements worth preserving from AMD EPYC tests
2017-07-08 22:27:11 -04:00
Peter Boyle
b73bd151bb
Switch off counters by default
2017-06-30 10:16:35 +01:00
Peter Boyle
694b305cab
Update to reporting
2017-06-30 10:16:13 +01:00
paboyle
6f5a5cd9b3
Improved threaded comms benchmark
2017-06-28 23:27:02 +01:00
Peter Boyle
08e04b9676
Better benchmarks
2017-06-28 15:30:06 +01:00
paboyle
54e94360ad
Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit
2017-06-24 23:10:24 +01:00
paboyle
6ebf9f15b7
Splitting communicators first cut
2017-06-22 08:14:34 +01:00
paboyle
3bfd1f13e6
I/O improvements
2017-06-11 23:14:10 +01:00
Peter Boyle
725c513d94
Better MPI3 benchmarking
2017-05-29 16:47:32 -04:00
Guido Cossu
0ffc235741
Adding more statistics to the Benchmark_comms. Min and max
2017-05-19 10:55:04 +01:00
Guido Cossu
8e19c99c7d
Adding more statistical info in the Benchmark_comms
2017-05-18 19:07:35 +01:00
Guido Cossu
a0bc0ad06f
Reverting change in Benchmark_comms. Keeping 300 iterations
2017-05-18 17:48:11 +01:00
Guido Cossu
bc862ce3ab
Fixing an allocation issue in Benchmark_comms
2017-05-18 14:44:56 +01:00
paboyle
751f2b9703
Better check and benchmark driving
2017-05-05 19:54:38 +01:00
Guido Cossu
20999c1370
Merge branch 'develop' into feature/hmc_generalise
2017-05-05 12:47:17 +01:00
Peter Boyle
945767c6d8
More info
2017-05-03 20:26:35 -04:00
Peter Boyle
92e364a35f
Better reporting in benchmark for MPI3
2017-05-03 15:43:36 -04:00
Guido Cossu
4063238943
Adding HMC test file example for Mobius + smearing
2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
paboyle
738c1a11c2
longer nloop
2017-04-26 08:43:20 +01:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
c429ace748
Cleaner OpenMP use
2017-04-22 20:28:42 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00