ae9688e343
Reporting also the total mflops
2016-11-28 11:37:02 +00:00
ca21003f01
Merge branch 'feature/fft-opt' into feature/feynman-rules
...
# Conflicts:
# lib/FFT.h
# lib/qcd/action/fermion/WilsonFermion5D.h
# tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
c190221fd3
Internal SHM comms in non-simd directions working
...
Need to fix simd directions
2016-10-22 18:14:27 +01:00
6a9eae6b6b
Reporting improvements
2016-10-21 13:36:18 +01:00
997fd882ff
Merge branch 'develop' into feature/feynman-rules
...
# Conflicts:
# lib/Threads.h
# lib/qcd/action/fermion/WilsonFermion.cc
# lib/qcd/action/fermion/WilsonFermion.h
# lib/qcd/utils/SUn.h
# lib/simd/Grid_avx.h
# lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
81f2aeaece
KNL streaming stores, and KNL performance counters
2016-10-12 11:45:22 +01:00
96f1d1b828
Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass).
2016-10-10 23:46:45 +01:00
2e453dfbf5
Added some instrumentation to benchmark the force computation
2016-10-06 17:52:45 +01:00
4089984431
Timing hooks
2016-10-06 09:25:12 +01:00
b6713ecb60
Momentum space rules for Overlap, DWF untested to date
2016-09-26 09:39:09 +01:00
48fb1cdc11
Update domain 5d vectorised impl type, move the type over to 4d redblack with
...
the dense OO inverse
2016-07-14 23:54:35 +01:00
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendliness.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
8ac021de73
Added a test and fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
1e554350ac
The threaded comms didn't agree with GCC. Surprised, and looks like a GCC bug.
2016-04-29 16:49:18 -07:00
9b6ab6db16
simd in 5th dimension support
2016-04-19 15:38:01 -07:00
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
e67fc2be18
Adding a trial for openmp overhead minimisation
2016-03-31 16:00:37 +01:00
165bffc2e7
Avx512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
090e7aa930
Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
...
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
61413565d0
Back off the inlined spin proj as not working
2016-03-02 07:03:09 -08:00
6aeaf6f568
Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
...
turned up problems on the BlueWaters Cray.
Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
22422a84d9
Small problem in compressor fix
2016-02-17 19:03:09 -06:00
c9fadf97a5
Simplify the compressor interface again.
2016-02-17 18:16:45 -06:00
81395e85d1
Regressing to not overlap comms and compute because bluewaters, edison, and cori are so rubbish at it.
2016-02-16 13:56:44 -06:00
a0fc47c6f9
Cheaper implementation
2016-02-15 16:02:36 -06:00
e2f73e3ead
Updates for shmem
2016-02-10 16:50:32 -08:00
411ac49dd7
GparityWilsonTM typedef added. Not yet tested
...
Conflicts:
configure
lib/qcd/action/fermion/WilsonKernels.h
2016-01-25 01:36:28 -05:00
5c57d4f403
Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
...
Conflicts:
lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
fc6ad65751
Pushed the overlap comms tweaks
2016-01-11 06:34:22 -08:00
dafc74020c
Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori
2016-01-10 16:54:27 -08:00
d19321dfde
Overlap comms compute changes
2016-01-10 19:20:16 +00:00
5924e5a562
Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
...
Conflicts:
configure
lib/qcd/action/Actions.h
lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
c99d748da6
Timing reports in benchmarks now reflect the asynch comms thread statistics
2016-01-04 14:42:16 +00:00
02452afd36
Optional overlap of comms with compute
2016-01-04 14:18:40 +00:00
331768dcff
Added overlap comms compute mode
2016-01-03 01:38:11 +00:00
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
34a0fde2ad
Fixes to fermion force terms after sign of gamma_mu (0...3) change.
...
Thought I had already committed these.
Believe I have got the Gparity fermion force working.
* tests/Test_gpdwf_force.cc -- correctly predicts dS for two flavour pseudofermion
based on a small dt update of U field.
* tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21.
Need to accumulate a full plaquette log to believe fully which will take some hours of run time.
2015-12-15 23:14:12 +00:00
f2b4edc090
Fixes for Gparity comparison with CPS (Instantiation, Gamma matrix convention)
2015-12-07 02:04:57 -05:00
b2c02a6106
Runs fastest on Cori
2015-11-28 16:58:16 -08:00
e9ff25b06b
Small threading change makes a difference on Cori.
2015-11-07 00:07:05 -08:00
899ca41cb8
Merge branch 'master' of github.com:paboyle/Grid
...
Conflicts:
lib/qcd/action/fermion/WilsonFermion5D.cc
2015-11-06 03:50:04 -08:00
a2ff068e29
Asm and threading for many core
2015-11-06 03:47:14 -08:00
28022755ae
Stencil class name global change to StencilImpl typedef
2015-11-06 05:30:17 -06:00
55cfc89459
* Finished the template/policy style introduction of gparity, except the gparity force terms.
...
So valence sector looks ok.
FermionOperatorImpl.h provides the policy classes.
Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.
* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).
There appears to be a bug in the multi-level integrator -- <exp(-dH)> passes with single level but
not with multi-level.
In any case there looks to be quite a bit to clean up.
This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is independent of the gauge fields. Easily fixed in future.
2015-08-15 23:25:49 +01:00
7d3512ab21
Gparity valence test now working.
...
Interface in FermionOperator will change a lot in future
2015-08-14 00:01:04 +01:00
84a66476ab
Rework/global edit to enforce type templating of fermion operators.
...
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
2157a6919a
Changes making force term test for DWF pass.
2015-08-01 22:06:07 +09:00