abba44a837
Hand unrolled for overlapped comms
2017-04-22 17:45:17 +01:00
f301be94ce
Fixed
2017-04-22 17:42:31 +01:00
1d1b225497
Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).
2017-04-22 09:05:28 -04:00
53a785a3dd
Fixing the KNL compile
2017-04-22 08:11:51 -04:00
736bf3c866
Major rework of stencil. Half precision and MPI3 now working.
2017-04-22 11:33:50 +01:00
b9bbe5d188
L1p config bg/q
2017-04-22 11:33:09 +01:00
3844bcf800
If f16c instructions are not supported, software half-precision conversion must be used.
...
This will also become useful on BG/Q, so it will move out of SSE4 into a general area.
Lifted the Eigen half-precision code from the web. It looks sensible, but has not yet been
extensively regression-tested against the intrinsics implementation.
2017-04-20 15:30:52 +01:00
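The software half-precision conversion described in commit 3844bcf800 can be illustrated with a minimal sketch. This is not the code from the repository or from Eigen; it is a generic bit-manipulation conversion between IEEE-754 binary32 and binary16 with round-to-nearest-even, and the names `float_to_half`, `half_to_float`, and `check_half_examples` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Hypothetical helper: binary32 -> binary16 bits, round-to-nearest-even.
std::uint16_t float_to_half(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);

  const std::uint32_t sign = (bits >> 16) & 0x8000u;
  bits &= 0x7FFFFFFFu;  // keep only the magnitude

  std::uint32_t half;
  if (bits >= 0x47800000u) {
    // Magnitude >= 2^16, infinity, or NaN: map to half infinity or a quiet NaN.
    half = (bits > 0x7F800000u) ? 0x7E00u : 0x7C00u;
  } else if (bits < 0x38800000u) {
    // Magnitude < 2^-14: result is a subnormal half (or zero).
    // Adding 0.5f makes the FPU round the value to a multiple of 2^-24,
    // which then sits in the low mantissa bits.
    float mag;
    std::memcpy(&mag, &bits, sizeof mag);
    mag += 0.5f;
    std::uint32_t rounded;
    std::memcpy(&rounded, &mag, sizeof rounded);
    half = rounded - 0x3F000000u;  // subtract the bit pattern of 0.5f
  } else {
    // Normal range: rebias the exponent (127 -> 15) and round the mantissa.
    const std::uint32_t odd = (bits >> 13) & 1u;
    bits += 0xC8000000u + 0x0FFFu;  // -112 << 23 (mod 2^32) plus rounding bias
    bits += odd;                    // ties go to the even mantissa
    half = bits >> 13;
  }
  return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: binary16 bits -> binary32 (always exact).
float half_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t man = h & 0x3FFu;

  std::uint32_t bits;
  if (exp == 0) {
    // Zero or subnormal: the value is man * 2^-24.
    const float mag = static_cast<float>(man) * (1.0f / 16777216.0f);
    std::memcpy(&bits, &mag, sizeof bits);
    bits |= sign;
  } else if (exp == 31) {
    bits = sign | 0x7F800000u | (man << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (man << 13);
  }
  float result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}

// A few spot checks against well-known half-precision bit patterns.
void check_half_examples() {
  assert(float_to_half(1.0f) == 0x3C00);
  assert(float_to_half(-2.0f) == 0xC000);
  assert(float_to_half(65504.0f) == 0x7BFF);  // largest finite half
  assert(float_to_half(1.0e6f) == 0x7C00);    // overflows to infinity
  assert(half_to_float(0x3C00) == 1.0f);
}
```

A routine of this kind would normally be regression-tested against the hardware intrinsics over all 65536 half bit patterns, which is the check the commit message says is still outstanding.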
e1a2319d01
Simple compressor moved out of cshift into stencil
2017-04-20 13:18:15 +01:00
180c732b4c
Move compressors out of Cshift.
...
Slice iterators would help
2017-04-20 13:17:55 +01:00
957a706d0b
Useful script
2017-04-20 13:17:44 +01:00
d2312e9874
Drop compressor entirely from Cshift to only Stencil.
2017-04-20 13:16:55 +01:00
fc4ab9ccd5
Working half precision comms
2017-04-20 11:20:26 +01:00
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
3b7de792d5
Type comparison in the traits work
2017-04-18 13:28:04 +01:00
557c3fa109
Pretty change
2017-04-18 13:27:38 +01:00
ec18e9f7f6
Merge branch 'develop' into feature/half-prec-comms
2017-04-18 11:39:39 +01:00
a839d5bc55
Updated todo list
2017-04-18 11:22:17 +01:00
de41b84c5c
Merge branch 'feature/normHP' into develop
2017-04-18 10:57:21 +01:00
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
...
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
3141ebac10
MultiRHS working, starting to optimise. Block doesn't work, though I thought it already did; puzzled.
2017-04-17 10:50:19 +01:00
7ede696126
Fixed tests that failed to compile
2017-04-16 23:40:00 +01:00
bf516c3b81
Higher-precision reduction variables in norm and inner product
2017-04-15 12:27:28 +01:00
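The idea behind commit bf516c3b81, accumulating norms and inner products in a wider type than the data, can be sketched as follows. This is an illustration only, assuming plain `std::vector<float>` data rather than the library's lattice types; the function names `norm2`, `inner_product`, and `norm2_single` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared 2-norm of single-precision data, accumulated in double.
double norm2(const std::vector<float>& v) {
  double acc = 0.0;
  for (float x : v) {
    acc += static_cast<double>(x) * static_cast<double>(x);
  }
  return acc;
}

// Real inner product of two single-precision vectors, accumulated in double.
double inner_product(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
  }
  return acc;
}

// Same norm with a single-precision accumulator, for comparison only.
float norm2_single(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) {
    acc += x * x;
  }
  return acc;
}
```

With IEEE single-precision arithmetic, a `float` accumulator stops growing once it reaches 2^24 and the terms being added are of order one, because each sum rounds back to the same value. The `double` accumulator has 53 mantissa bits and does not show this stall for realistic lattice volumes.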
441a52ee5d
First cut at higher precision reduction
2017-04-15 10:57:21 +01:00
a8db024c92
Cleaning up the dense matrix and lanczos sector
2017-04-15 08:54:11 +01:00
a9c22d5f43
Verbose removal
2017-04-14 14:38:49 +01:00
3ca41458a3
Fix to no USE_FP16 case
2017-04-14 14:20:54 +01:00
9e2d29c644
USE_FP16 macro
2017-04-14 14:17:14 +01:00
951be75292
Half precision conversion working on AVX512 now too
2017-04-13 17:35:11 +01:00
b9113ed310
Patches for knl
2017-04-13 12:02:12 -04:00
42fb49d3fd
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2017-04-13 14:12:47 +01:00
2a54c9aaab
Merge branch 'feature/block-cg' into develop
2017-04-13 14:12:24 +01:00
0957378679
Fixing the conditional in an ugly way
2017-04-13 13:47:56 +01:00
2ed6c76fc5
Getting multiline if then fi working
2017-04-13 13:43:13 +01:00
d3b9a7fa14
F16C apparently requires AVX, even if only the 128-bit forms are used.
...
Seems odd.
2017-04-13 13:19:11 +01:00
75ea306ce9
Another try at travis
2017-04-13 13:05:32 +01:00
4226c633c4
Default to FP16 off again
2017-04-13 12:51:39 +01:00
5a4eafbf7e
.travis
2017-04-13 12:50:43 +01:00
eb8e26018b
Travis update for macos
2017-04-13 12:35:11 +01:00
db5ea001a3
Update to use Xcode 8.3 since -mfp16 causes SIGILL
2017-04-13 12:22:40 +01:00
2846f079e5
Predicate tests on fp16 being enabled
2017-04-13 12:08:05 +01:00
1d502e4ed6
FP16 optional compile time
2017-04-13 11:55:24 +01:00
73cdf0fffe
Drop f16c from SSE because of a macos compile error on travis
2017-04-13 11:23:41 +01:00
1c25773319
Trap illegal instructions
2017-04-13 10:51:40 +01:00
c38400b26f
Trap signals
2017-04-13 10:35:20 +01:00
9c3065b860
Debug flags off again
2017-04-13 10:01:32 +01:00
94eb829d08
Fixed the alignment cast for __m128i that gcc complained about
2017-04-13 08:40:44 +01:00
68392ddb5b
Exchange in generic
...
Precision change in AVX, SSE, AVX512, Generic. QPX still to do.
2017-04-13 08:38:12 +01:00
cb6b81ae82
Half precision conversion
2017-04-12 19:32:37 +01:00
8ef4300412
spurious .dirstamp files removed
2017-04-10 17:00:22 +01:00
98a24ebf31
The macro “magic” is very heavy on the preprocessor in the measurement code, which has numerous serialisable classes. Reducing the maximum number of serialisable fields to 64 (instead of 1024) helps a lot; this is enough for now and can be extended trivially if needed in the future.
2017-04-10 16:58:54 +01:00