portelli/Grid - Grid - DiRAC Tursa git server

mirror of https://github.com/paboyle/Grid.git synced 2026-07-04 17:33:29 +01:00

Author	SHA1	Message	Date
Peter Boyle	99220f6531	Fixes and better timing	2017-04-26 17:24:11 -04:00
Peter Boyle	f8797e1e3e	bug fix. works now and great face performance	2017-04-26 03:14:02 -04:00
Peter Boyle	fd1eb7de13	Clean implementation of the exterior faces listing only those points on the boundary	2017-04-26 02:34:52 -04:00
Peter Boyle	2ce898efa3	Pretty code	2017-04-26 02:34:25 -04:00
paboyle	ab66bac4e6	Think I'm getting on top of the reduced cost exterior precomputed list of links	2017-04-25 08:50:26 +01:00
paboyle	56277a11c8	Build a list of what's on the surface	2017-04-24 17:06:15 +01:00
Peter Boyle	5b55867a7a	Slightly cheaper Ext assembly	2017-04-24 05:36:11 -04:00
Peter Boyle	3accb1ef89	Debugged assembly split phase with interior suppression	2017-04-23 19:30:19 -04:00
Peter Boyle	e3d0e31525	Debugged assembly split phase with interior suppression	2017-04-23 19:29:27 -04:00
Peter Boyle	5812eb8a8c	Partially fixed. But the comms-overlap does not work yet.	2017-04-22 18:50:25 -04:00
paboyle	ac58565d0a	Dangerous rewrite of the assembly. If I make a mistake the debug will be painful.	2017-04-22 19:31:04 +01:00
paboyle	3703b718aa	Mark up a table if a given site only receives from itself; including MPI3 splitting info.	2017-04-22 19:28:37 +01:00
paboyle	b722889234	Try a better load balancing loop	2017-04-22 19:27:41 +01:00
paboyle	abba44a837	Hand unrolled for overlapped comms	2017-04-22 17:45:17 +01:00
paboyle	f301be94ce	Fixed	2017-04-22 17:42:31 +01:00
Peter Boyle	1d1b225497	Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node).	2017-04-22 09:05:28 -04:00
Peter Boyle	53a785a3dd	Fixing the KNL compile	2017-04-22 08:11:51 -04:00
paboyle	736bf3c866	Major rework of stencil. Half precision and MPI3 now working.	2017-04-22 11:33:50 +01:00
paboyle	b9bbe5d188	L1p config bg/q	2017-04-22 11:33:09 +01:00
paboyle	3844bcf800	If no f16c instructions supported must use software half precision conversion. This will also become useful on BG/Q, so will move out from SSE4 into a general area. Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed against the intrinsics implementation yet.	2017-04-20 15:30:52 +01:00
paboyle	e1a2319d01	Simple compressor moved out of cshift into stencil	2017-04-20 13:18:15 +01:00
paboyle	180c732b4c	Move compressors out of Cshift. Slice iterators would help	2017-04-20 13:17:55 +01:00
paboyle	d2312e9874	Drop compressor entirely from Cshift to only Stencil.	2017-04-20 13:16:55 +01:00
paboyle	fc4ab9ccd5	Working half precision comms	2017-04-20 11:20:26 +01:00
paboyle	4a340aa5ca	Massive compressor rework to support reduced precision comms	2017-04-20 09:28:27 +01:00
paboyle	3b7de792d5	Type comparison in the traits work	2017-04-18 13:28:04 +01:00
paboyle	557c3fa109	Pretty change	2017-04-18 13:27:38 +01:00
paboyle	8e161152e4	MultiRHS solver improvements with slice operations moved into lattice and sped up. Block solver requires a lot of performance work.	2017-04-18 10:51:55 +01:00
paboyle	3141ebac10	MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.	2017-04-17 10:50:19 +01:00
paboyle	7ede696126	Non compile of tests fixed	2017-04-16 23:40:00 +01:00
paboyle	bf516c3b81	higher precision reduction variables in norm and inner product	2017-04-15 12:27:28 +01:00
paboyle	441a52ee5d	First cut at higher precision reduction	2017-04-15 10:57:21 +01:00
paboyle	a8db024c92	Cleaning up the dense matrix and lanczos sector	2017-04-15 08:54:11 +01:00
paboyle	3ca41458a3	Fix to no USE_FP16 case	2017-04-14 14:20:54 +01:00
Peter Boyle	951be75292	Half precision conversion working on AVX512 now too	2017-04-13 17:35:11 +01:00
Peter Boyle	b9113ed310	Patches for knl	2017-04-13 12:02:12 -04:00
paboyle	42fb49d3fd	Merge branch 'develop' of https://github.com/paboyle/Grid into develop	2017-04-13 14:12:47 +01:00
paboyle	db5ea001a3	Update to use Xcode 8.3 since -mfp16 causes SIGILL	2017-04-13 12:22:40 +01:00
paboyle	1d502e4ed6	FP16 optional compile time	2017-04-13 11:55:24 +01:00
paboyle	73cdf0fffe	Drop f16c from SSE because of a macos compile error on travis	2017-04-13 11:23:41 +01:00
paboyle	1c25773319	Trap illegal instructions	2017-04-13 10:51:40 +01:00
paboyle	94eb829d08	Align cast fixed for __m128i; gcc complained	2017-04-13 08:40:44 +01:00
paboyle	68392ddb5b	Exchange in generic Precision change in AVX, SSE, AVX512, Generic. QPX still to do.	2017-04-13 08:38:12 +01:00
paboyle	cb6b81ae82	Half precision conversion	2017-04-12 19:32:37 +01:00
portelli	8ef4300412	spurious .dirstamp files removed	2017-04-10 17:00:22 +01:00
portelli	98a24ebf31	The macro “magics” is very intensive for the preprocessor in the measurement code which has numerous serialisable classes. Reducing the number of serialisable fields to 64 (instead of 1024) helps a lot, this is enough for now and can be extended trivially if needed in the future.	2017-04-10 16:58:54 +01:00
paboyle	b12dc89d26	Commenting and clean up	2017-04-10 20:38:20 +09:00
paboyle	d80d802f9d	MultiRHS solver test	2017-04-10 00:12:12 +09:00
paboyle	3d99b09dba	Start of blockCG	2017-04-09 23:42:10 +09:00
paboyle	db5f6d3ae3	Verbose fix	2017-04-09 23:41:30 +09:00

1696 Commits