1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 02:05:37 +00:00
Commit Graph

1714 Commits

Author SHA1 Message Date
Chulwoo Jung
25d4c175c3 Cleaning up Lanczos 2017-05-18 18:33:47 -04:00
Chulwoo Jung
a8d7986e1c Temporary (hopefully) change to run with GCC for now. 2017-05-05 10:55:07 -04:00
Chulwoo Jung
92ec509bfa Commiting to move to Jlab 2017-05-04 19:32:00 -04:00
Chulwoo Jung
e80a87ff7f Checking in before modifying 2017-05-04 16:05:07 -04:00
Chulwoo Jung
867fe93018 First Rotate reorg done. 2017-05-02 01:26:22 -04:00
Chulwoo Jung
09651c3326 Checking in before rearranging Lanczos 2017-05-02 00:47:18 -04:00
Chulwoo Jung
f87f2a3f8b Merge branch 'develop' of https://github.com/paboyle/Grid into feature/Lanczos 2017-05-01 12:00:47 -04:00
Peter Boyle
99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
Peter Boyle
f8797e1e3e bug fix. works now and great face performance 2017-04-26 03:14:02 -04:00
Peter Boyle
fd1eb7de13 Clean implementation of the exterior faces listing only those points on the boudary 2017-04-26 02:34:52 -04:00
Peter Boyle
2ce898efa3 Pretty code 2017-04-26 02:34:25 -04:00
paboyle
ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle
56277a11c8 Build a list of whats on the surface 2017-04-24 17:06:15 +01:00
Peter Boyle
5b55867a7a Slightly cheaper Ext assembly 2017-04-24 05:36:11 -04:00
Peter Boyle
3accb1ef89 Debugged assemply split phase with interior suppression 2017-04-23 19:30:19 -04:00
Peter Boyle
e3d0e31525 Debugged assemply split phase with interior suppression 2017-04-23 19:29:27 -04:00
Peter Boyle
5812eb8a8c Partially fixed. But the comms-overlap does not work yet. 2017-04-22 18:50:25 -04:00
paboyle
ac58565d0a Dangerous rewrite of the assembly. If I make a mistake the debug will be painful. 2017-04-22 19:31:04 +01:00
paboyle
3703b718aa Mark up a table if a given site only receives from itself; including MPI3 splitting info. 2017-04-22 19:28:37 +01:00
paboyle
b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
paboyle
abba44a837 Hand unrolled for overlapped comms 2017-04-22 17:45:17 +01:00
paboyle
f301be94ce Fixed 2017-04-22 17:42:31 +01:00
Peter Boyle
1d1b225497 Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node). 2017-04-22 09:05:28 -04:00
Peter Boyle
53a785a3dd Fixing the KNL compile 2017-04-22 08:11:51 -04:00
paboyle
736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00
paboyle
b9bbe5d188 L1p config bg/q 2017-04-22 11:33:09 +01:00
paboyle
3844bcf800 If no f16c instructions supported must use software half precision conversion.
This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet.
2017-04-20 15:30:52 +01:00
paboyle
e1a2319d01 Simple compressor moved out of cshift into stencil 2017-04-20 13:18:15 +01:00
paboyle
180c732b4c Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle
d2312e9874 Drop compressor entirely from Cshift to only Stencil. 2017-04-20 13:16:55 +01:00
paboyle
fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
paboyle
4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
paboyle
3b7de792d5 Type comparison in the traits work 2017-04-18 13:28:04 +01:00
paboyle
557c3fa109 Pretty change 2017-04-18 13:27:38 +01:00
paboyle
8e161152e4 MultiRHS solver improvements with slice operations moved into lattice and sped up.
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle
3141ebac10 MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled. 2017-04-17 10:50:19 +01:00
paboyle
7ede696126 Non compile of tests fixed 2017-04-16 23:40:00 +01:00
Chulwoo Jung
a07556dd5f Added back the convergence test from evecs of tridiagonal matrix. Bugfixes 2017-04-15 09:32:15 -04:00
paboyle
bf516c3b81 higher precision reduction variables in norm and inner product 2017-04-15 12:27:28 +01:00
paboyle
441a52ee5d First cut at higher precision reduction 2017-04-15 10:57:21 +01:00
paboyle
a8db024c92 Cleaning up the dense matrix and lanczos sector 2017-04-15 08:54:11 +01:00
paboyle
3ca41458a3 Fix to no USE_FP16 case 2017-04-14 14:20:54 +01:00
Peter Boyle
951be75292 Half precision conversion working on AVX512 now too 2017-04-13 17:35:11 +01:00
Peter Boyle
b9113ed310 Patches for knl 2017-04-13 12:02:12 -04:00
paboyle
42fb49d3fd Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-04-13 14:12:47 +01:00
paboyle
db5ea001a3 Update to use Xcode 8.3 since -mfp16 causes SIGILL 2017-04-13 12:22:40 +01:00
paboyle
1d502e4ed6 FP16 optional compile time 2017-04-13 11:55:24 +01:00
paboyle
73cdf0fffe Drop f16c from SSE because of a macos compile error on travis 2017-04-13 11:23:41 +01:00
paboyle
1c25773319 Trap illegal instructions 2017-04-13 10:51:40 +01:00
paboyle
94eb829d08 Align cast fixed for __mm128i gcc complained 2017-04-13 08:40:44 +01:00