paboyle
c429ace748
Cleaner OpenMP use
2017-04-22 20:28:42 +01:00
paboyle
ac58565d0a
Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.
2017-04-22 19:31:04 +01:00
paboyle
3703b718aa
Mark up a table if a given site only receives from itself; including MPI3 splitting info.
2017-04-22 19:28:37 +01:00
paboyle
b722889234
Try a better load balancing loop
2017-04-22 19:27:41 +01:00
paboyle
abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00
paboyle
f301be94ce
Fixed
2017-04-22 17:42:31 +01:00
Peter Boyle
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
Peter Boyle
53a785a3dd
Fixing the KNL compile
2017-04-22 08:11:51 -04:00
paboyle
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
paboyle
b9bbe5d188
L1p config bg/q
2017-04-22 11:33:09 +01:00
paboyle
3844bcf800
If no f16c instructions supported must use software half precision conversion.
...
This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet.
2017-04-20 15:30:52 +01:00
paboyle
e1a2319d01
Simple compressor moved out of cshift into stencil
2017-04-20 13:18:15 +01:00
paboyle
180c732b4c
Move compressors out of Cshift.
...
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle
957a706d0b
Useful script
2017-04-20 13:17:44 +01:00
paboyle
d2312e9874
Drop compressor entirely from Cshift to only Stencil.
2017-04-20 13:16:55 +01:00
paboyle
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
paboyle
3b7de792d5
Type comparison in the traits work
2017-04-18 13:28:04 +01:00
paboyle
557c3fa109
Pretty change
2017-04-18 13:27:38 +01:00
paboyle
ec18e9f7f6
Merge branch 'develop' into feature/half-prec-comms
2017-04-18 11:39:39 +01:00
paboyle
a839d5bc55
Updated todo list
2017-04-18 11:22:17 +01:00
paboyle
de41b84c5c
Merge branch 'feature/normHP' into develop
2017-04-18 10:57:21 +01:00
paboyle
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
...
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle
3141ebac10
MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.
2017-04-17 10:50:19 +01:00
paboyle
7ede696126
Non compile of tests fixed
2017-04-16 23:40:00 +01:00
paboyle
bf516c3b81
higher precision reduction variables in norm and inner product
2017-04-15 12:27:28 +01:00
paboyle
441a52ee5d
First cut at higher precision reduction
2017-04-15 10:57:21 +01:00
paboyle
a8db024c92
Cleaning up the dense matrix and lanczos sector
2017-04-15 08:54:11 +01:00
paboyle
a9c22d5f43
Verbose removal
2017-04-14 14:38:49 +01:00
paboyle
3ca41458a3
Fix to no USE_FP16 case
2017-04-14 14:20:54 +01:00
paboyle
9e2d29c644
USE_FP16 macro
2017-04-14 14:17:14 +01:00
Peter Boyle
951be75292
Half precision conversion working on AVX512 now too
2017-04-13 17:35:11 +01:00
Peter Boyle
b9113ed310
Patches for knl
2017-04-13 12:02:12 -04:00
paboyle
42fb49d3fd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2017-04-13 14:12:47 +01:00
paboyle
2a54c9aaab
Merge branch 'feature/block-cg' into develop
2017-04-13 14:12:24 +01:00
paboyle
0957378679
Fixing conditional ugly way
2017-04-13 13:47:56 +01:00
paboyle
2ed6c76fc5
Getting multiline if then fi working
2017-04-13 13:43:13 +01:00
paboyle
d3b9a7fa14
F16c apparently requires AVX, even if the 128 bit are used.
...
Seems odd.
2017-04-13 13:19:11 +01:00
paboyle
75ea306ce9
Another try at travis
2017-04-13 13:05:32 +01:00
paboyle
4226c633c4
Default to FP16 off again
2017-04-13 12:51:39 +01:00
paboyle
5a4eafbf7e
.travis
2017-04-13 12:50:43 +01:00
paboyle
eb8e26018b
Travis update for macos
2017-04-13 12:35:11 +01:00
paboyle
db5ea001a3
Update to use Xcode 8.3 since -mfp16 causes SIGILL
2017-04-13 12:22:40 +01:00
paboyle
2846f079e5
Predicate tests on fp16 being enabled
2017-04-13 12:08:05 +01:00
paboyle
1d502e4ed6
FP16 optional compile time
2017-04-13 11:55:24 +01:00
paboyle
73cdf0fffe
Drop f16c from SSE because of a macos compile error on travis
2017-04-13 11:23:41 +01:00
paboyle
1c25773319
Trap illegal instructions
2017-04-13 10:51:40 +01:00
paboyle
c38400b26f
Trap signals
2017-04-13 10:35:20 +01:00
paboyle
9c3065b860
Debug flags off again
2017-04-13 10:01:32 +01:00
paboyle
94eb829d08
Align cast fixed for __mm128i gcc complained
2017-04-13 08:40:44 +01:00