mirror of https://github.com/paboyle/Grid.git synced 2026-04-19 18:21:02 +01:00
Commit Graph

2067 Commits

Author SHA1 Message Date
Guido Cossu 3344788fa1 Merge branch 'develop' into feature/hmc_generalise 2017-05-01 12:13:56 +01:00
Peter Boyle 99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
Peter Boyle f8797e1e3e bug fix. works now and great face performance 2017-04-26 03:14:02 -04:00
Peter Boyle fd1eb7de13 Clean implementation of the exterior faces listing only those points on the boundary 2017-04-26 02:34:52 -04:00
Peter Boyle 2ce898efa3 Pretty code 2017-04-26 02:34:25 -04:00
paboyle ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle 56277a11c8 Build a list of what's on the surface 2017-04-24 17:06:15 +01:00
Peter Boyle 5b55867a7a Slightly cheaper Ext assembly 2017-04-24 05:36:11 -04:00
Peter Boyle 3accb1ef89 Debugged assembly split phase with interior suppression 2017-04-23 19:30:19 -04:00
Peter Boyle e3d0e31525 Debugged assembly split phase with interior suppression 2017-04-23 19:29:27 -04:00
Peter Boyle 5812eb8a8c Partially fixed. But the comms-overlap does not work yet. 2017-04-22 18:50:25 -04:00
paboyle ac58565d0a Dangerous rewrite of the assembly. If I make a mistake the debug will be painful. 2017-04-22 19:31:04 +01:00
paboyle 3703b718aa Mark up a table if a given site only receives from itself; including MPI3 splitting info. 2017-04-22 19:28:37 +01:00
paboyle b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
paboyle abba44a837 Hand unrolled for overlapped comms 2017-04-22 17:45:17 +01:00
paboyle f301be94ce Fixed 2017-04-22 17:42:31 +01:00
Peter Boyle 1d1b225497 Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node). 2017-04-22 09:05:28 -04:00
Peter Boyle 53a785a3dd Fixing the KNL compile 2017-04-22 08:11:51 -04:00
paboyle 736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00
paboyle b9bbe5d188 L1p config bg/q 2017-04-22 11:33:09 +01:00
paboyle 3844bcf800 If no f16c instructions supported must use software half precision conversion.
This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet.
2017-04-20 15:30:52 +01:00
paboyle e1a2319d01 Simple compressor moved out of cshift into stencil 2017-04-20 13:18:15 +01:00
paboyle 180c732b4c Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle d2312e9874 Drop compressor entirely from Cshift to only Stencil. 2017-04-20 13:16:55 +01:00
paboyle fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
paboyle 4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
paboyle 3b7de792d5 Type comparison in the traits work 2017-04-18 13:28:04 +01:00
paboyle 557c3fa109 Pretty change 2017-04-18 13:27:38 +01:00
paboyle 8e161152e4 MultiRHS solver improvements with slice operations moved into lattice and sped up.
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle 3141ebac10 MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled. 2017-04-17 10:50:19 +01:00
paboyle 7ede696126 Non compile of tests fixed 2017-04-16 23:40:00 +01:00
paboyle bf516c3b81 higher precision reduction variables in norm and inner product 2017-04-15 12:27:28 +01:00
paboyle 441a52ee5d First cut at higher precision reduction 2017-04-15 10:57:21 +01:00
paboyle a8db024c92 Cleaning up the dense matrix and lanczos sector 2017-04-15 08:54:11 +01:00
paboyle 3ca41458a3 Fix to no USE_FP16 case 2017-04-14 14:20:54 +01:00
Peter Boyle 951be75292 Half precision conversion working on AVX512 now too 2017-04-13 17:35:11 +01:00
Peter Boyle b9113ed310 Patches for knl 2017-04-13 12:02:12 -04:00
portelli a6a0da873f Merge branch 'feature/hadrons' into feature/qed-fvol 2017-04-13 15:31:06 +01:00
paboyle 42fb49d3fd Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-04-13 14:12:47 +01:00
paboyle db5ea001a3 Update to use Xcode 8.3 since -mfp16 causes SIGILL 2017-04-13 12:22:40 +01:00
paboyle 1d502e4ed6 FP16 optional compile time 2017-04-13 11:55:24 +01:00
paboyle 73cdf0fffe Drop f16c from SSE because of a macos compile error on travis 2017-04-13 11:23:41 +01:00
paboyle 1c25773319 Trap illegal instructions 2017-04-13 10:51:40 +01:00
paboyle 94eb829d08 Align cast fixed for __m128i; gcc complained 2017-04-13 08:40:44 +01:00
paboyle 68392ddb5b Exchange in generic
Precision change in AVX, SSE, AVX512, Generic. QPX still to do.
2017-04-13 08:38:12 +01:00
paboyle cb6b81ae82 Half precision conversion 2017-04-12 19:32:37 +01:00
portelli 53e76b41d2 Merge branch 'develop' into feature/hadrons 2017-04-10 17:00:53 +01:00
portelli 8ef4300412 spurious .dirstamp files removed 2017-04-10 17:00:22 +01:00
portelli 98a24ebf31 The macro “magics” is very intensive for the preprocessor in the measurement code which has numerous serialisable classes. Reducing the number of serialisable fields to 64 (instead of 1024) helps a lot, this is enough for now and can be extended trivially if needed in the future. 2017-04-10 16:58:54 +01:00
paboyle b12dc89d26 Commenting and clean up 2017-04-10 20:38:20 +09:00
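Commit 3844bcf800 describes falling back to a software half-precision conversion when F16C instructions are not available. As a rough illustration of what such a fallback does, here is a minimal sketch of an IEEE binary32 to binary16 conversion with round-to-nearest-even, plus the reverse direction. It is written in Python for readability and is not Grid's code, nor the Eigen routine the commit mentions; the function names `float_to_half_bits` and `half_bits_to_float` are hypothetical.

```python
import math
import struct

def float_to_half_bits(f):
    """Convert a value (first rounded to float32) to IEEE binary16 bits, round-to-nearest-even."""
    (x,) = struct.unpack("<I", struct.pack("<f", f))  # reinterpret float32 as uint32
    sign = (x >> 16) & 0x8000
    absx = x & 0x7FFFFFFF
    if absx >= 0x7F800000:                  # Inf or NaN
        return sign | (0x7E00 if absx > 0x7F800000 else 0x7C00)
    if absx >= 0x477FF000:                  # >= 65520 rounds past max half (65504): overflow to Inf
        return sign | 0x7C00
    if absx >= 0x38800000:                  # normal half range (>= 2**-14)
        m = absx - 0x38000000               # rebias exponent from 127 to 15
        m += 0xFFF + ((m >> 13) & 1)        # round to nearest even on the 13 dropped mantissa bits
        return sign | (m >> 13)             # a mantissa carry correctly bumps the exponent
    if absx <= 0x33000000:                  # <= 2**-25 rounds to signed zero (tie goes to even 0)
        return sign
    e = absx >> 23                          # biased float32 exponent, 102..112 here
    mant = (absx & 0x7FFFFF) | 0x800000     # restore the implicit leading 1
    shift = 126 - e                         # express the value in units of 2**-24 (half subnormal)
    r = mant >> shift
    rem = mant & ((1 << shift) - 1)
    halfway = 1 << (shift - 1)
    if rem > halfway or (rem == halfway and (r & 1)):
        r += 1                              # may carry into the smallest normal (0x0400), which is correct
    return sign | r

def half_bits_to_float(h):
    """Decode IEEE binary16 bits to a Python float (exact)."""
    sign = -1.0 if h & 0x8000 else 1.0
    exp = (h >> 10) & 0x1F
    mant = h & 0x3FF
    if exp == 0x1F:
        return sign * math.inf if mant == 0 else math.nan
    if exp == 0:
        return sign * math.ldexp(mant, -24)          # subnormal or zero
    return sign * math.ldexp(mant | 0x400, exp - 25)  # (1 + mant/1024) * 2**(exp - 15)
```

For example, `float_to_half_bits(1.0)` gives `0x3C00`, and `1.0 + 2**-11`, which lies exactly halfway between two half-precision values, rounds to the even neighbour `0x3C00`. Doing the rounding on integer bit patterns keeps the sketch independent of any hardware conversion instruction, which is the situation the commit is addressing.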
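Commits 441a52ee5d and bf516c3b81 mention using higher-precision reduction variables in the norm and inner product. A minimal sketch of why this matters, assuming single-precision data whose squares are summed either into a single-precision or a double-precision accumulator. Float32 arithmetic is emulated here by rounding through `struct`; the helper names are hypothetical and this is not Grid's implementation.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def norm2_f32_accum(v):
    """Sum of squares with a float32 accumulator (every add rounded to float32)."""
    acc = 0.0
    for x in v:
        acc = to_f32(acc + to_f32(x * x))
    return acc

def norm2_f64_accum(v):
    """Sum of squares: float32 products, but a double-precision accumulator."""
    acc = 0.0
    for x in v:
        acc += to_f32(x * x)
    return acc

# Once the float32 accumulator reaches 2**24, adding 1.0 no longer changes it.
data = [4096.0] + [1.0] * 1000   # 4096**2 == 2**24
print(norm2_f32_accum(data))      # 16777216.0: the 1000 small terms are lost
print(norm2_f64_accum(data))      # 16778216.0
```

The double accumulator costs little, because only the running sum is widened while the bulk data stays in single precision. That is the trade-off the commit message describes.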