Peter Boyle
|
99220f6531
|
Fixes and better timing
|
2017-04-26 17:24:11 -04:00 |
|
Peter Boyle
|
f8797e1e3e
|
Bug fix. Works now, with great face performance
|
2017-04-26 03:14:02 -04:00 |
|
Peter Boyle
|
fd1eb7de13
|
Clean implementation of the exterior faces listing only those points on the boundary
|
2017-04-26 02:34:52 -04:00 |
|
Peter Boyle
|
2ce898efa3
|
Pretty code
|
2017-04-26 02:34:25 -04:00 |
|
paboyle
|
ab66bac4e6
|
Think I'm getting on top of the reduced cost exterior precomputed list of links
|
2017-04-25 08:50:26 +01:00 |
|
paboyle
|
56277a11c8
|
Build a list of what's on the surface
|
2017-04-24 17:06:15 +01:00 |
|
Peter Boyle
|
5b55867a7a
|
Slightly cheaper Ext assembly
|
2017-04-24 05:36:11 -04:00 |
|
Peter Boyle
|
3accb1ef89
|
Debugged assembly split phase with interior suppression
|
2017-04-23 19:30:19 -04:00 |
|
Peter Boyle
|
e3d0e31525
|
Debugged assembly split phase with interior suppression
|
2017-04-23 19:29:27 -04:00 |
|
Peter Boyle
|
5812eb8a8c
|
Partially fixed. But the comms-overlap does not work yet.
|
2017-04-22 18:50:25 -04:00 |
|
paboyle
|
ac58565d0a
|
Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.
|
2017-04-22 19:31:04 +01:00 |
|
paboyle
|
b722889234
|
Try a better load balancing loop
|
2017-04-22 19:27:41 +01:00 |
|
paboyle
|
abba44a837
|
Hand unrolled for overlapped comms
|
2017-04-22 17:45:17 +01:00 |
|
paboyle
|
f301be94ce
|
Fixed
|
2017-04-22 17:42:31 +01:00 |
|
Peter Boyle
|
1d1b225497
|
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
|
2017-04-22 09:05:28 -04:00 |
|
Peter Boyle
|
53a785a3dd
|
Fixing the KNL compile
|
2017-04-22 08:11:51 -04:00 |
|
paboyle
|
736bf3c866
|
Major rework of stencil. Half precision and MPI3 now working.
|
2017-04-22 11:33:50 +01:00 |
|
paboyle
|
fc4ab9ccd5
|
Working half precision comms
|
2017-04-20 11:20:26 +01:00 |
|
paboyle
|
4a340aa5ca
|
Massive compressor rework to support reduced precision comms
|
2017-04-20 09:28:27 +01:00 |
|
paboyle
|
42fb49d3fd
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2017-04-13 14:12:47 +01:00 |
|
|
8ef4300412
|
spurious .dirstamp files removed
|
2017-04-10 17:00:22 +01:00 |
|
paboyle
|
db5f6d3ae3
|
Verbose fix
|
2017-04-09 23:41:30 +09:00 |
|
paboyle
|
86aaa35294
|
Christoph needs SchurDiagTwoKappa which is mobius specific.
|
2017-04-07 11:07:40 +09:00 |
|
paboyle
|
1c4bc7ed38
|
Debugged staggered conventions
|
2017-03-31 14:41:48 +09:00 |
|
paboyle
|
9fd23faadf
|
Pretty layout
|
2017-03-30 13:44:45 +09:00 |
|
paboyle
|
10e4fa0dc8
|
Template instantiation improvements
|
2017-03-30 13:44:25 +09:00 |
|
paboyle
|
c4aca1dde4
|
Conjugate coefficients on adjoint
|
2017-03-30 13:44:05 +09:00 |
|
paboyle
|
b9e8ea3aaa
|
conjugate coefficient on the dagger
|
2017-03-30 13:43:13 +09:00 |
|
paboyle
|
077aa728b9
|
Fix the ZMobius (I think)
|
2017-03-30 13:42:09 +09:00 |
|
paboyle
|
a8d83d886e
|
Macro controls
|
2017-03-30 13:31:34 +09:00 |
|
paboyle
|
7fd46eeec4
|
Trailing whitespace removal
|
2017-03-30 13:31:10 +09:00 |
|
paboyle
|
2b115929dc
|
Small AVX512 asm ifdef patch
|
2017-03-29 18:51:23 +09:00 |
|
paboyle
|
d805867e02
|
Better init
|
2017-03-28 13:25:05 -04:00 |
|
paboyle
|
98f9318279
|
Build on AVX2 and MPI passing with clang++
|
2017-03-28 23:16:04 +09:00 |
|
paboyle
|
4b17e8eba8
|
Merge branch 'develop' into feature/bgq-asm
Conflicts:
lib/qcd/action/fermion/Fermion.h
lib/qcd/action/fermion/WilsonFermion.cc
lib/util/Init.cc
tests/Test_cayley_even_odd_vec.cc
|
2017-03-28 04:49:30 -04:00 |
|
paboyle
|
18bde08d1b
|
Merge branch 'feature/staggering' into develop
|
2017-03-28 15:25:55 +09:00 |
|
paboyle
|
e7c36771ed
|
ZMobius prep for asm
|
2017-03-15 14:23:33 -04:00 |
|
paboyle
|
8dc57a1e25
|
Layout change
|
2017-03-13 11:11:46 +00:00 |
|
paboyle
|
f57bd770b0
|
Merge branch 'bugfix/dminus' into feature/bgq-asm
|
2017-03-13 11:11:03 +00:00 |
|
Chulwoo Jung
|
33edde245d
|
Changing Dminus(Dag) to use full vectors to work correctly
|
2017-03-12 23:02:42 -04:00 |
|
paboyle
|
447c5e6cd7
|
Z mobius hermiticity correction
|
2017-03-13 01:30:43 +00:00 |
|
paboyle
|
8b99d80d8c
|
Merge branch 'bgq-asm-shmemfixes' into feature/bgq-asm
|
2017-03-12 23:30:09 +00:00 |
|
paboyle
|
af230a1fb8
|
Average the time across the whole machine for outliers
|
2017-02-28 17:05:22 -05:00 |
|
Christopher Kelly
|
06a132e3f9
|
Fixes to SHMEM comms
|
2017-02-28 13:31:54 -08:00 |
|
paboyle
|
e099dcdae7
|
Merge branch 'develop' into feature/bgq-asm
|
2017-02-23 00:25:29 +00:00 |
|
paboyle
|
4e7ab3166f
|
Refactoring header layout
|
2017-02-22 18:09:33 +00:00 |
|
azusayamaguchi
|
1c30e9a961
|
Verified
|
2017-02-21 23:01:25 +00:00 |
|
azusayamaguchi
|
bf7e3f20d4
|
Staggered fermion optimised version
|
2017-02-21 14:35:42 +00:00 |
|
paboyle
|
3ae92fa2e6
|
Global changes to parallel_for structure.
Move the comms flags to more sensible names
|
2017-02-21 05:24:27 -05:00 |
|
paboyle
|
2c246551d0
|
Overlap comms and compute options in wilson kernels
|
2017-02-07 01:37:10 -05:00 |
|