paboyle
f4dd5062d7
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2016-07-15 19:26:06 +01:00
paboyle
980ff18956
Solving the instantiation no compile issue
2016-07-15 17:19:44 +01:00
Guido Cossu
7edf4c6c04
Added HMC utitities for the higher representations
...
TODO: Inherit types for the pseudofermions, Debugging, testing
2016-07-15 13:39:47 +01:00
paboyle
1a6c7204ac
Disable instantiation; Use cache version instead
2016-07-15 00:34:39 +01:00
paboyle
49310fbab3
Done with red black change over
2016-07-15 00:08:43 +01:00
paboyle
dfd714e1ef
Multiple implementations for the 5d hopping terms, depending on cache friendly
...
ops and/or the 5th direction being vectorised
All use 4d redblack.
2016-07-15 00:00:09 +01:00
paboyle
79a8ca1a62
Rewrite for performance. Impl dependent instantiations give
...
4d linalg impls of the 5d hopping terms (and inverse)
Cache friendly loop orderings of the above
Dense matrix stored and apply to the above
-- Switch to Ls vectorised, and use dense matrix approach for the MooeeInv
and rotate/shift of the Mooee M5D routines.
2016-07-14 23:58:15 +01:00
paboyle
fb45eb2eb2
5d ls vec rename of impl class
2016-07-14 23:57:26 +01:00
paboyle
a307274c96
Fermion impl rename for ls vectorised 5d approaches
2016-07-14 23:56:13 +01:00
paboyle
3f2c44a5fe
Updating the class to 5d selection based on impl type
2016-07-14 23:55:26 +01:00
paboyle
48fb1cdc11
Update domain 5d vectorised impl type, move the type over to 4d redblack with
...
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle
8a79e93cc2
Rename the 5d domain wall fermion vectorised Ls impl class
2016-07-14 23:53:00 +01:00
paboyle
adbc7c1188
Adding files for multiple implementations (cache opt) and Ls vectorisation
...
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.
The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.
This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.
Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
Guido Cossu
9dc345e8e8
Debugged smearing and adding HMC functions for hirep
2016-07-13 17:51:18 +01:00
Guido Cossu
a9ae30f868
Added representations definitions for the HMC
2016-07-12 13:36:10 +01:00
paboyle
ef97e32152
Adding persistent communicators
2016-07-08 17:16:08 +01:00
Guido Cossu
daea5297ee
Wrote the projector in the adjoint representation algebra
2016-07-08 16:14:16 +01:00
Guido Cossu
5028969d4b
Added generators for the adjoint representation
2016-07-08 15:40:11 +01:00
paboyle
a0676beeb1
Open up dependency on Eigen and FFTW
2016-07-07 22:31:07 +01:00
Guido Cossu
fbf96b1bbb
]Merge branch 'develop' into feature/hirep
2016-07-07 14:20:10 +01:00
Guido Cossu
3c49ddfaa4
Merge branch 'temporary-smearing' into develop
2016-07-07 14:04:59 +01:00
Guido Cossu
ffb8b3116c
Tested smeared RHMC Wilson1p1, accepting
2016-07-07 11:49:36 +01:00
Christopher Kelly
4774a3bcd2
Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default.
2016-07-06 18:01:08 -04:00
Guido Cossu
e87182cf98
Debugged the copy constructor of the Lattice class
2016-07-06 15:31:00 +01:00
Guido Cossu
e3d5319470
Debugged the real() and imag() functions and added tests to Test_Simd
2016-07-06 14:16:03 +01:00
Guido Cossu
ffedeb1c58
Minor modifications
2016-07-06 11:41:27 +01:00
Guido Cossu
3e80947c2b
Cleaned up HMC output. Tested smeared HMCs for single precision (OK)
2016-07-05 12:03:54 +01:00
Guido Cossu
fdfbf11c6d
Merge branch 'develop' into temporary-smearing
2016-07-04 18:45:10 +01:00
Guido Cossu
9cb90f714e
Merge remote-tracking branch 'origin/develop' into temporary-smearing
2016-07-04 17:28:40 +01:00
Guido Cossu
2daffdf95d
Tested smeared WilsonRatio action, accepts
2016-07-04 16:17:28 +01:00
Guido Cossu
149f826601
Tested smearing for Nf2 WilsonFermionAction, non EO: accepts
2016-07-04 16:09:19 +01:00
Guido Cossu
cd8ee27080
Simple change in iGamma for smearing
2016-07-04 16:02:57 +01:00
Guido Cossu
0fa66e8f3c
Debugged smearing for EOWilson, accepts
2016-07-04 15:35:37 +01:00
Guido Cossu
8dd099267d
Corrected a bug in the Expression Templates (acso and asin were wrong)
2016-07-03 12:28:25 +01:00
Guido Cossu
1a6d65c6a4
Converted set_uw and set_fj to all complex functions
2016-07-03 10:27:43 +01:00
Guido Cossu
092fa0d8da
Debugged set_fj,
...
to be fixed: BUG in imag()
2016-07-01 16:06:20 +01:00
paboyle
680645f849
Merge branch 'release/v0.5.0'
2016-06-30 15:15:03 -07:00
paboyle
712b9a3489
Asm only for avx512
2016-06-30 14:35:02 -07:00
paboyle
bdaa5b1767
Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking.
2016-06-30 14:35:02 -07:00
paboyle
8fcefc021a
Improved the prefetching when using cache blocking codes
2016-06-30 14:35:02 -07:00
paboyle
05c884a62a
Prefetch change
2016-06-30 14:35:01 -07:00
paboyle
2d8bb4c594
Tweaks
2016-06-30 14:35:01 -07:00
paboyle
6d58cb2a68
Enable reordering of the loops in the assembler for cache friendly.
...
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
Guido Cossu
565e9329ba
Changed the colouring classes
2016-06-30 16:51:03 +01:00
Guido Cossu
5e02392f9c
Fixed compilation error for benchmark_dwf
...
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle
87418e7df1
Slightly faster prefetching perf.
2016-06-13 02:32:52 -07:00
paboyle
55f65b81b5
Improvements to the assembler interface that let us move chunks of the
...
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi
d9408893b3
Prefetching in the normal kernel implementation.
2016-06-08 05:43:48 -07:00
paboyle
8ac021de73
Added a test an fixed it for red black precon Ls innermost vectorised DWF
2016-06-07 13:16:56 -07:00
paboyle
e503ef5590
Cleaned up
2016-06-07 00:11:36 +01:00
paboyle
a7682b0060
Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS
2016-06-06 23:48:21 +01:00
paboyle
53d06046b0
Compiling updates for KNL
2016-06-03 03:47:54 -07:00
paboyle
139cc5f1ae
Large change with KNL preparation
2016-06-03 03:24:26 -07:00
c698b16d75
function to generate Chroma-style gamma matrix products
2016-05-01 18:30:35 -07:00
paboyle
5341977948
IMCI fixes. Thought I had committed these. The "real" disambiguation
...
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
f6c53e5039
Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2'
2016-04-30 00:17:52 -07:00
6aa000176f
Fermion <-> Propagator functions
2016-04-30 00:14:33 -07:00
paboyle
1e554350ac
The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug.
2016-04-29 16:49:18 -07:00
paboyle
c79ea0dcef
Fixingn IMCI
2016-04-22 21:52:54 -07:00
paboyle
8fd8bc25e9
simd 5th dim with rotation
2016-04-19 15:39:00 -07:00
paboyle
ba427abde9
simd 5d
2016-04-19 15:38:39 -07:00
paboyle
9b6ab6db16
simd in 5th dimension support
2016-04-19 15:38:01 -07:00
paboyle
806a83d38b
simd in fifth dim support for dwf
2016-04-19 15:36:19 -07:00
neo
339be37dba
Debugging smeared HMC
2016-04-13 17:00:14 +09:00
neo
a87b744621
HMC runs but does not accept with smearing on
2016-04-07 16:45:11 +09:00
paboyle
b1192a8908
Benchmark_zmm added
2016-04-06 03:00:07 -07:00
paboyle
e8dddb1596
Adding extra benchmark
2016-04-06 10:32:54 +01:00
97d0d56bcb
Debugging Smearing routines (set_fj)
2016-04-06 17:58:43 +09:00
7c7ea35ffb
Putting the Traceless Antihermitian part outside the deriv in pseudofermion actions
2016-04-05 16:28:09 +09:00
4b1cf580e0
Debugging the Smearing routines
2016-04-05 16:19:30 +09:00
paboyle
e67fc2be18
Adding a trial for openmp overhead minimisation
2016-03-31 16:00:37 +01:00
paboyle
8052556275
Cleaning up the single/double kernel implementation switch
2016-03-31 14:51:32 +01:00
paboyle
60d965f79e
AVX512 improvements; sigfpe trapping too
2016-03-30 08:42:34 +01:00
paboyle
1ecbf9794d
Merge branch 'master' of https://github.com/paboyle/Grid
2016-03-30 08:37:55 +01:00
paboyle
c77b7ee897
AddSub based alternate SU3 routine
2016-03-28 17:55:22 -06:00
paboyle
1e355a51e1
Interface change
2016-03-27 23:46:55 -07:00
paboyle
21abaf7e91
Gamma sign change
2016-03-28 00:35:45 -06:00
paboyle
165bffc2e7
Avx512 changes for assembler kernels
2016-03-26 22:25:45 -06:00
paboyle
644fd6d32e
Build avx512 clean
2016-03-25 09:35:33 -07:00
paboyle
60d4564151
ICC no compile fix
2016-03-16 02:30:40 -07:00
paboyle
090e7aa930
Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
...
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle
325e745daa
Merge branch 'master' of https://github.com/paboyle/Grid
2016-03-02 07:04:03 -08:00
paboyle
61413565d0
Back off the inlined spin proj as not working
2016-03-02 07:03:09 -08:00
2d8bb356e3
Smearing routines compile (still untested)
2016-02-25 02:43:59 +09:00
a7251f28c7
Stout smearing compiles (untested)
2016-02-24 03:16:50 +09:00
Antonin Portelli
497e7e4c53
BG/Q compatibility fix
2016-02-23 15:57:38 +00:00
Peter Boyle
6aeaf6f568
Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
...
turned up problems on the BlueWaters Cray.
Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle
40f2db9bc0
Disable metropolis step until 10 traj covered. Should move to exposing these
...
in XML input and start having "applications" directory.
2016-02-21 08:01:44 -06:00
Jung
9f0d9ade68
Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
...
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
neo
c1b1b89d17
More on smearing routines, writing APEsmear (dev)
2016-02-19 17:15:27 +09:00
neo
771235017d
Adding smearing routines (development)
2016-02-19 15:30:41 +09:00
paboyle
3425751cb8
Missing return value
2016-02-19 01:06:03 +00:00
Peter Boyle
22422a84d9
Small problem in compressor fix
2016-02-17 19:03:09 -06:00
Peter Boyle
c9fadf97a5
Simplify the compressor interface again.
2016-02-17 18:16:45 -06:00
Peter Boyle
81395e85d1
Regressing to not overlap comms and compute becasue bluewaters, edison, and cori are so rubbish at it.
2016-02-16 13:56:44 -06:00
Peter Boyle
a0fc47c6f9
Cheaper implementation
2016-02-15 16:02:36 -06:00
paboyle
e2f73e3ead
Updates for shmem
2016-02-10 16:50:32 -08:00
neo
6371676a75
Correcting some compilation errors for clang-sse
2016-02-10 11:37:03 +09:00
Jung
bd84c23298
definitions reconciled.
2016-01-25 16:30:59 -05:00
Jung
7aa8d5e8af
Faiing to compile, comparing with master
2016-01-25 16:03:02 -05:00