Guido Cossu
453cf2a1c6
Moving the topological charge outside the HMC related routines
2017-05-02 14:40:12 +01:00
Guido Cossu
de7bbfa5f9
Adding ParameterFile option for the HMC
2017-05-02 12:16:16 +01:00
Chulwoo Jung
867fe93018
First Rotate reorg done.
2017-05-02 01:26:22 -04:00
Chulwoo Jung
09651c3326
Checking in before rearranging Lanczos
2017-05-02 00:47:18 -04:00
Chulwoo Jung
f87f2a3f8b
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/Lanczos
2017-05-01 12:00:47 -04:00
Guido Cossu
74f451715f
Fix for Mac compilation on the size_t uint64_t types
2017-05-01 15:12:07 +01:00
Guido Cossu
4063238943
Adding HMC test file example for Mobius + smearing
2017-05-01 13:44:00 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
Peter Boyle
99220f6531
Fixes and better timing
2017-04-26 17:24:11 -04:00
Peter Boyle
f8797e1e3e
bug fix. works now and great face performance
2017-04-26 03:14:02 -04:00
Peter Boyle
fd1eb7de13
Clean implementation of the exterior faces listing only those points on the boundary
2017-04-26 02:34:52 -04:00
Peter Boyle
2ce898efa3
Pretty code
2017-04-26 02:34:25 -04:00
paboyle
ab66bac4e6
Think I'm getting on top of the reduced cost exterior precomputed list of links
2017-04-25 08:50:26 +01:00
paboyle
56277a11c8
Build a list of what's on the surface
2017-04-24 17:06:15 +01:00
Peter Boyle
5b55867a7a
Slightly cheaper Ext assembly
2017-04-24 05:36:11 -04:00
Peter Boyle
3accb1ef89
Debugged assembly split phase with interior suppression
2017-04-23 19:30:19 -04:00
Peter Boyle
e3d0e31525
Debugged assembly split phase with interior suppression
2017-04-23 19:29:27 -04:00
Peter Boyle
5812eb8a8c
Partially fixed. But the comms-overlap does not work yet.
2017-04-22 18:50:25 -04:00
paboyle
ac58565d0a
Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.
2017-04-22 19:31:04 +01:00
paboyle
3703b718aa
Mark up a table if a given site only receives from itself; including MPI3 splitting info.
2017-04-22 19:28:37 +01:00
paboyle
b722889234
Try a better load balancing loop
2017-04-22 19:27:41 +01:00
paboyle
abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00
paboyle
f301be94ce
Fixed
2017-04-22 17:42:31 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
Peter Boyle
53a785a3dd
Fixing the KNL compile
2017-04-22 08:11:51 -04:00
paboyle
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
paboyle
b9bbe5d188
L1p config bg/q
2017-04-22 11:33:09 +01:00
paboyle
3844bcf800
If no f16c instructions supported must use software half precision conversion.
This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet.
2017-04-20 15:30:52 +01:00
paboyle
e1a2319d01
Simple compressor moved out of cshift into stencil
2017-04-20 13:18:15 +01:00
paboyle
180c732b4c
Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle
d2312e9874
Drop compressor entirely from Cshift to only Stencil.
2017-04-20 13:16:55 +01:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
paboyle
3b7de792d5
Type comparison in the traits work
2017-04-18 13:28:04 +01:00
paboyle
557c3fa109
Pretty change
2017-04-18 13:27:38 +01:00
paboyle
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle
3141ebac10
MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.
2017-04-17 10:50:19 +01:00
paboyle
7ede696126
Non compile of tests fixed
2017-04-16 23:40:00 +01:00
Chulwoo Jung
a07556dd5f
Added back the convergence test from evecs of tridiagonal matrix. Bugfixes
2017-04-15 09:32:15 -04:00
paboyle
bf516c3b81
higher precision reduction variables in norm and inner product
2017-04-15 12:27:28 +01:00
paboyle
441a52ee5d
First cut at higher precision reduction
2017-04-15 10:57:21 +01:00
paboyle
a8db024c92
Cleaning up the dense matrix and lanczos sector
2017-04-15 08:54:11 +01:00
paboyle
3ca41458a3
Fix to no USE_FP16 case
2017-04-14 14:20:54 +01:00
Peter Boyle
951be75292
Half precision conversion working on AVX512 now too
2017-04-13 17:35:11 +01:00
Peter Boyle
b9113ed310
Patches for knl
2017-04-13 12:02:12 -04:00
paboyle
42fb49d3fd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2017-04-13 14:12:47 +01:00
paboyle
db5ea001a3
Update to use Xcode 8.3 since -mfp16 causes SIGILL
2017-04-13 12:22:40 +01:00
paboyle
1d502e4ed6
FP16 optional compile time
2017-04-13 11:55:24 +01:00
paboyle
73cdf0fffe
Drop f16c from SSE because of a macos compile error on travis
2017-04-13 11:23:41 +01:00
paboyle
1c25773319
Trap illegal instructions
2017-04-13 10:51:40 +01:00
paboyle
94eb829d08
Align cast fixed for __m128i; gcc complained
2017-04-13 08:40:44 +01:00
paboyle
68392ddb5b
Exchange in generic
Precision change in AVX, SSE, AVX512, Generic. QPX still to do.
2017-04-13 08:38:12 +01:00
paboyle
cb6b81ae82
Half precision conversion
2017-04-12 19:32:37 +01:00
53e76b41d2
Merge branch 'develop' into feature/hadrons
2017-04-10 17:00:53 +01:00
8ef4300412
spurious .dirstamp files removed
2017-04-10 17:00:22 +01:00
98a24ebf31
The macro “magic” is very intensive for the preprocessor in the measurement code, which has numerous serialisable classes. Reducing the number of serialisable fields to 64 (instead of 1024) helps a lot; this is enough for now and can be extended trivially if needed in the future.
2017-04-10 16:58:54 +01:00
paboyle
b12dc89d26
Commenting and clean up
2017-04-10 20:38:20 +09:00
paboyle
d80d802f9d
MultiRHS solver test
2017-04-10 00:12:12 +09:00
paboyle
3d99b09dba
Start of blockCG
2017-04-09 23:42:10 +09:00
paboyle
db5f6d3ae3
Verbose fix
2017-04-09 23:41:30 +09:00
paboyle
683550f116
Const args improvement
2017-04-09 23:41:04 +09:00
Chulwoo Jung
f80a847aef
Merge branch 'develop' into bugfix/dminus
2017-04-06 23:49:10 -04:00
Chulwoo Jung
93cb5d4e97
Working version of Lanczos without the extra copy.
2017-04-06 23:35:30 -04:00
Chulwoo Jung
9e48b7dfda
MEM_SAVE in Lanczos seems to be working, but not pretty
2017-04-06 22:21:56 -04:00
paboyle
86aaa35294
Christoph needs SchurDiagTwoKappa which is mobius specific.
2017-04-07 11:07:40 +09:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
Chulwoo Jung
d0c2c9c71f
Merge branch 'develop' of https://github.com/paboyle/Grid into bugfix/dminus
2017-04-04 15:20:17 -04:00
Chulwoo Jung
c8cafa77ca
Checking in the latest Lanczos
2017-04-04 15:18:12 -04:00
paboyle
5592f7b8c1
Creation mode better implementation
2017-04-05 02:35:34 +09:00
paboyle
35da4ece0b
UID fix
2017-04-05 02:18:15 +09:00
ff4e54ef80
Merge branch 'develop' into feature/hadrons
2017-04-03 18:56:21 +01:00
paboyle
83f6fab8fa
Big/Small crush test, and fast SITMO rng init, faster but not ideal
MT and Ranlux init.
2017-04-02 12:10:51 +09:00
paboyle
9dc7ca4c3b
Sitmo fast init
2017-04-02 00:28:22 +09:00
paboyle
935d82f5b1
sanity checks
2017-04-02 00:27:28 +09:00
paboyle
9cbcdd65d7
No random device seed
2017-04-02 00:26:57 +09:00
paboyle
7e5faa0f34
Multiple RNGs
2017-04-02 00:25:44 +09:00
paboyle
1c4bc7ed38
Debugged staggered conventions
2017-03-31 14:41:48 +09:00
Chulwoo Jung
a3bcad3804
Added preconditioned SYM2 solver (SchurRedBlackDiagTwoSolve)
2017-03-30 20:33:27 -04:00
Chulwoo Jung
5a5b66292b
Merge branch 'develop' of https://github.com/paboyle/Grid into bugfix/dminus
2017-03-30 10:44:02 -04:00
paboyle
93ea5d9468
Pretty code
2017-03-30 15:00:03 +09:00
paboyle
9fd23faadf
Pretty layout
2017-03-30 13:44:45 +09:00
paboyle
10e4fa0dc8
Template instantiation improvements
2017-03-30 13:44:25 +09:00
paboyle
c4aca1dde4
Conjugate coefficients on adjoint
2017-03-30 13:44:05 +09:00
paboyle
b9e8ea3aaa
conjugate coefficient on the dagger
2017-03-30 13:43:13 +09:00
paboyle
077aa728b9
Fix the ZMobius (I think)
2017-03-30 13:42:09 +09:00
paboyle
a8d83d886e
Macro controls
2017-03-30 13:31:34 +09:00
paboyle
7fd46eeec4
Trailing whitespace removal
2017-03-30 13:31:10 +09:00
paboyle
2b115929dc
Small AVX512 asm ifdef patch
2017-03-29 18:51:23 +09:00
paboyle
417ec56cca
Release candidate
2017-03-29 05:45:33 -04:00
paboyle
756bc25008
Verbose header print by default
2017-03-29 04:44:17 -04:00
paboyle
35695ba57a
Bug fix in MPI3
2017-03-29 04:43:55 -04:00
paboyle
d805867e02
Better init
2017-03-28 13:25:05 -04:00
paboyle
98f9318279
Build on AVX2 and MPI passing with clang++
2017-03-28 23:16:04 +09:00
paboyle
4b17e8eba8
Merge branch 'develop' into feature/bgq-asm
Conflicts:
lib/qcd/action/fermion/Fermion.h
lib/qcd/action/fermion/WilsonFermion.cc
lib/util/Init.cc
tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
Chulwoo Jung
e63be32ad2
zmobius Meooe5D fixed?
2017-03-28 03:48:50 -04:00
paboyle
75112a632a
IO improvements to fail on IO error
2017-03-28 02:28:04 -04:00
paboyle
18bde08d1b
Merge branch 'feature/staggering' into develop
2017-03-28 15:25:55 +09:00
Chulwoo Jung
33d59c8869
Adding Zmobius prec test
2017-03-27 21:40:27 -04:00
Chulwoo Jung
a833fd8dbf
Merge branch 'develop' of https://github.com/paboyle/Grid into bugfix/dminus
2017-03-27 21:37:26 -04:00
Guido Cossu
4c1ea8677e
Small cosmetic changes and vscode gitignore
2017-03-23 14:09:35 +09:00