mirror of https://github.com/paboyle/Grid.git synced 2024-11-09 23:45:36 +00:00
Commit Graph

6004 Commits

Author SHA1 Message Date
nmeyer-ur
67db4993c2 reset head, update SVE readme 2020-07-07 19:54:52 +02:00
nmeyer-ur
fd3c8b0e85 correct build instructions qp4 2020-07-01 09:00:38 +02:00
nmeyer-ur
1635c263ee disable TOFU by default 2020-06-30 19:27:08 +02:00
nmeyer-ur
a87e45ba25 SVE readme update 2020-06-18 11:23:08 +02:00
nmeyer-ur
465856331a switch back to serialized; wrong results on single too 2020-06-15 15:39:39 +02:00
nmeyer-ur
cc958aa9ed switch back to standard MPI_init due to wrong results in Benchmark_wilson using comms-overlap 2020-06-15 14:21:38 +02:00
nmeyer-ur
a25e4b3d0c pred 32/64 for float/double instead of 8 in VLA patch 2020-06-13 14:44:37 +02:00
nmeyer-ur
d1210ca12a switch to double/float instead of float64_t/float32_t in VLA patch 2020-06-13 13:59:32 +02:00
nmeyer-ur
36ea0e222a type traits for ComplexF/D in VLA patch; cosmetics in VLS intrinsics 2020-06-13 13:42:35 +02:00
nmeyer-ur
92281ec22d add 3 op Mult for VLA 2020-06-12 18:49:05 +02:00
nmeyer-ur
87266ce099 comment out fcmla in vector types: need also MultAddReal 2020-06-12 18:37:19 +02:00
nmeyer-ur
2a23f133e8 reenable fcmla for VLA 2020-06-12 17:30:38 +02:00
nmeyer-ur
8dbf790f62 correct tbl2 for sp 2020-06-12 17:12:34 +02:00
nmeyer-ur
2402b4940e vec_imm in float 2020-06-12 15:17:38 +02:00
nmeyer-ur
2111052fbe apply VLA patch for memcpy reduction suggested by Arm, CAS-162542-D6W7Z7 2020-06-12 14:49:19 +02:00
nmeyer-ur
433766ac62 revert Add/SubTimesI and prefetching in stencil (This reverts commit 9b2699226c.) 2020-06-08 12:02:53 +02:00
nmeyer-ur
93a37c8f68 test prefetch to L2 in stencil 2020-06-08 09:39:50 +02:00
nmeyer-ur
9872c76825 introduce AddTimesI and SubTimesI; slight benefit in operators, but < 1%; breaks all other impls 2020-06-03 15:20:13 +02:00
nmeyer-ur
5ee3ea2144 round-up after testing of prefetches in stencil close 2020-06-03 11:58:20 +02:00
nmeyer-ur
5050833b42 revert changes due to performance penalty in Wilson using MPI 2020-06-02 13:08:57 +02:00
nmeyer-ur
7bee4ebb54 correct predication for svcadd 2020-06-02 10:51:39 +02:00
nmeyer-ur
71cf9851e7 correct type for vecd in TimesI and TimesMinusI 2020-06-02 10:44:15 +02:00
nmeyer-ur
b4735c9904 correct zero in svcadd 2020-06-02 10:38:05 +02:00
nmeyer-ur
9b2699226c use fcadd in TimesI and TimesMinusI instead of tbl and neg 2020-06-02 10:32:44 +02:00
nmeyer-ur
5f52804907 update calculation of data 2020-05-30 10:55:17 +02:00
nmeyer-ur
936071773e correct throughput in wilson and dwf 2020-05-29 22:15:59 +02:00
nmeyer-ur
1732f9319e more mods; counters seem to work correctly 2020-05-29 18:44:00 +02:00
nmeyer-ur
91c81cab30 some corrections; compiles on my laptop; untested 2020-05-29 18:19:22 +02:00
nmeyer-ur
38164f8480 include counters in WilsonFermionImplementation.h 2020-05-29 17:59:26 +02:00
nmeyer-ur
f013979791 add counter support in WilsonFermion.h 2020-05-29 17:13:59 +02:00
nmeyer-ur
e947b563ea add space in stencil output 2020-05-29 17:11:17 +02:00
nmeyer-ur
5cb3530c34 enable counters in Benchmark_wilson 2020-05-29 15:44:52 +02:00
nmeyer-ur
250008372f update SVE readme 2020-05-29 15:44:25 +02:00
nmeyer-ur
4fedd8d29f switch to MPI_THREAD_SERIALIZED instead of SINGLE 2020-05-27 14:08:34 +02:00
nmeyer-ur
6ddcef1bca fix build error enabling fcmla/mac in vector types for VLA 2020-05-21 21:21:03 +02:00
nmeyer-ur
8c5a5fdfce disable fcmla in vector type building for VLA 2020-05-21 19:41:42 +02:00
nmeyer-ur
046b1cbbc0 enable fcmla in tensor arithmetics; fixed-size works, VLA does not compile 2020-05-21 19:39:07 +02:00
nmeyer-ur
a65ce237c1 clean up; Exch1 VLA sp+dp integrate, tested, working 2020-05-21 09:48:06 +02:00
nmeyer-ur
cd27f1005d clean up; Exch1 sp integrate, tested, working 2020-05-21 08:45:43 +02:00
nmeyer-ur
f8c0a59221 clean up; Exch1 dp integrate, tested, working 2020-05-21 02:48:14 +02:00
nmeyer-ur
832485699f save some cycles in HtoD and DtoH by direct instead of multi-pass conversion 2020-05-20 23:04:35 +02:00
nmeyer-ur
81484a4760 symmetrize Mult and MultAddComplex 2020-05-20 22:36:45 +02:00
nmeyer-ur
9a86059761 symmetrize VLA and fixed size build messages 2020-05-20 20:05:42 +02:00
nmeyer-ur
b780b7b7a0 guard prevents multiple TOFU messages 2020-05-20 19:20:59 +02:00
nmeyer-ur
9e085bd04e guard prevents multiple A64FX build messages 2020-05-20 19:16:30 +02:00
nmeyer-ur
6b6bf537d3 comment out mac in vector types 2020-05-18 20:36:16 +02:00
nmeyer-ur
323a651c71 correct typo 2020-05-18 19:58:27 +02:00
nmeyer-ur
9f212679f1 support fcmla in vector_types, untested 2020-05-18 19:55:18 +02:00
nmeyer-ur
032f7dde1a update SVE readme, asm generator 2020-05-18 19:10:36 +02:00
nmeyer-ur
50b1db1e8b implemented correct _m form (using 3 operands instead of 2) 2020-05-15 10:01:05 +02:00