Guido Cossu
20999c1370
Merge branch 'develop' into feature/hmc_generalise
2017-05-05 12:47:17 +01:00
Peter Boyle
945767c6d8
More info
2017-05-03 20:26:35 -04:00
Peter Boyle
92e364a35f
Better reporting in benchmark for MPI3
2017-05-03 15:43:36 -04:00
Guido Cossu
4063238943
Adding HMC test file example for Mobius + smearing
2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
paboyle
738c1a11c2
longer nloop
2017-04-26 08:43:20 +01:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
c429ace748
Cleaner OpenMP use
2017-04-22 20:28:42 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
paboyle
f18f5ed926
Drop random device
2017-04-02 00:26:26 +09:00
paboyle
4b17e8eba8
Merge branch 'develop' into feature/bgq-asm
...
Conflicts:
lib/qcd/action/fermion/Fermion.h
lib/qcd/action/fermion/WilsonFermion.cc
lib/util/Init.cc
tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle
18bde08d1b
Merge branch 'feature/staggering' into develop
2017-03-28 15:25:55 +09:00
paboyle
e099dcdae7
Merge branch 'develop' into feature/bgq-asm
2017-02-23 00:25:29 +00:00
azusayamaguchi
1c30e9a961
Verified
2017-02-21 23:01:25 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
1a30455a10
1000 iters on bmark for more accurate timing
2017-02-20 17:47:01 -05:00
paboyle
aca7a3ef0a
Optimisation control improvements
2017-02-10 18:22:31 -05:00
Guido Cossu
8b6a6c8236
Resolving small merge conflict
2017-02-09 16:20:24 +00:00
Guido Cossu
e0571c872b
Merge branch 'develop' into feature/hmc_generalise
2017-02-09 16:12:00 +00:00
paboyle
2bf4688e83
Running on BNL KNL
2017-02-07 01:32:10 -05:00
paboyle
060da786e9
Comms benchmark improvements
2017-02-07 01:07:39 -05:00
Guido Cossu
17629b8d9e
Merge branch 'develop' into feature/hmc_generalise
2017-01-25 11:33:53 +00:00
a37e71f362
New automatic implementation of gamma matrices, Meson and SeqGamma are broken
2017-01-23 19:13:43 -08:00
azusayamaguchi
05c1924819
Timing loop change
2017-01-23 10:43:45 +00:00
Peter Boyle
55cb22ad67
Z mobius bmark
2016-12-18 00:55:37 +00:00
Guido Cossu
0bd296dda4
Adding check of the Dag part in the benchmark
2016-12-14 03:15:09 +00:00
Peter Boyle
ff71a8e847
Ready for sim
2016-12-08 17:00:32 +00:00
Peter Boyle
e27c6b217c
Updating
2016-12-01 12:42:53 +00:00
Peter Boyle
cd01c1dbe9
Ls 16 more relevant
2016-11-30 22:11:10 +00:00
paboyle
bd0430b34f
Serialisation in malloc fixed
2016-11-29 22:27:55 +00:00
Azusa Yamaguchi
c097fd041a
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering
2016-11-29 13:44:17 +00:00
Azusa Yamaguchi
389e0a77bd
Staggerd Fermion 5D
2016-11-29 13:13:56 +00:00
paboyle
2f92b4860b
Test the full Mooee sector
2016-11-29 00:15:08 +00:00
Azusa Yamaguchi
95f43d27ae
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering
2016-11-22 13:49:22 +00:00
Azusa Yamaguchi
668ca57702
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering
2016-11-22 13:49:11 +00:00
433afd36f5
Makefile rule for simple_* objects
2016-11-19 01:33:13 +01:00
042ae5b87c
generic 256bits SIMD
2016-11-15 12:16:15 +00:00
paboyle
33dc1f51b5
Final sign off commits from Cori-1
2016-11-09 04:11:03 -08:00
Azusa Yamaguchi
ee686a7d85
Compiles now
2016-11-03 16:58:23 +00:00
Azusa Yamaguchi
1c5b7a6be5
Staggered phases first cut, c1, c2, u0
2016-11-03 16:26:56 +00:00
paboyle
757a928f9a
Improvement to use own SHM_OPEN call to avoid openmpi bug.
2016-11-02 12:37:46 +00:00
paboyle
bb94ddd0eb
Tidy up of mpi3; also some cleaning of the dslash controls.
2016-11-02 08:07:09 +00:00
paboyle
791cb050c8
Comms improvements
2016-11-01 11:35:43 +00:00
azusayamaguchi
b6a65059a2
Update to use shared memory to contain the stencil comms buffers
...
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
azusayamaguchi
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
paboyle
a762b1fb71
MPI3 working with a bounce through shared memory on my laptop.
...
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
azusayamaguchi
81f2aeaece
KNL streaming stores, and KNL performance coutners
2016-10-12 11:45:22 +01:00
Guido Cossu
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
paboyle
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
Guido Cossu
0fd179fb33
Merge branch 'develop' into feature/hirep
2016-09-01 12:59:53 +01:00
Guido Cossu
fd5614738d
Merge branch 'develop' into feature/hirep
2016-08-30 18:21:36 +01:00
paboyle
5a68715be3
Richards sweep test
2016-08-05 10:51:57 +01:00
paboyle
32bc7a6ab8
MPI back out of change that hangs
...
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
b65e72e521
Merge pull request #43 from rprollins/bench/output-format
...
Benchmark_dwf_sweep and Benchmark_zmm output formats
2016-08-04 16:47:01 +01:00
629283726b
build system: local Grid link flag moved to configure.ac
2016-08-03 15:07:42 +01:00
9e5b934d21
improved LAPACK configuration
2016-08-02 17:26:54 +01:00
e9f30cab2c
first working version for the new build system
2016-07-30 17:53:18 +01:00
Richard Rollins
df6c9f55d1
Use common benchmark output format for dwf_sweep and zmm
2016-07-20 17:38:56 +01:00
paboyle
f4dd5062d7
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-07-15 19:26:06 +01:00
paboyle
9db2c6525d
updating benchmarks for red black 4d for Ls vectorised code
2016-07-14 23:44:02 +01:00
paboyle
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
Guido Cossu
5028969d4b
Added generators for the adjoint representation
2016-07-08 15:40:11 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
Guido Cossu
fdfbf11c6d
Merge branch 'develop' into temporary-smearing
2016-07-04 18:45:10 +01:00
Guido Cossu
9cb90f714e
Merge remote-tracking branch 'origin/develop' into temporary-smearing
2016-07-04 17:28:40 +01:00
paboyle
bfe14000a9
Double compile fix
2016-07-01 16:33:51 +01:00
paboyle
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
51cb2d4328
update file lists
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendly.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu
565e9329ba
Changed the colouring classes
2016-06-30 16:51:03 +01:00
Guido Cossu
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
paboyle
05acc22920
placeholder for non temporal loads optimisation
2016-06-07 13:18:21 -07:00
paboyle
8ac021de73
Added a test an fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
paboyle
786ca52c43
Problems remain in the red black preconditioning of the Ls vectorisation
2016-06-06 07:05:51 -07:00
paboyle
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
paboyle
f2ae9682ff
Remove some timing hacks
2016-04-19 15:14:32 -07:00
paboyle
528eb773ad
Merged.
...
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle
c323425496
Small change
2016-04-11 10:38:43 +01:00
paboyle
650e02b344
Smaller vols too
2016-04-06 06:52:09 -07:00
paboyle
a524ca2a4b
New benchmark update
2016-04-06 03:35:56 -07:00
paboyle
23a7176b71
Loop over volumes
2016-04-06 03:22:11 -07:00
paboyle
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
paboyle
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
paboyle
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
paboyle
e17c773a0b
Longer runs for vtune
2016-03-16 02:29:13 -07:00
Peter Boyle
f7be108e35
100 iters faster
2016-02-15 16:03:04 -06:00
paboyle
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
paboyle
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
paboyle
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
paboyle
3ce10aa975
Fix a regression failure on Mobius; chroma regression added
2015-12-10 22:55:00 +00:00
paboyle
1cc0d7b811
Bigger ncall as timing loops got small on cori
2015-11-07 00:04:40 -08:00
Peter Boyle
27813cf518
More timing detail reported
2015-11-06 05:27:13 -06:00
paboyle
16c7993434
Merge branch 'master' of github.com:paboyle/Grid
...
Conflicts:
lib/simd/Grid_avx512.h
lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
paboyle
32762346ad
Better run time on KNC
2015-11-04 03:25:34 -08:00