Peter Boyle
|
b2b5137d28
|
Finally starting to get decent performance on Volta
|
2018-07-13 12:06:18 -04:00 |
|
Peter Boyle
|
2cc07450f4
|
Fastest option for the dslash
|
2018-07-05 09:57:55 -04:00 |
|
Peter Boyle
|
4730d4692a
|
Fast lane extract, saturates bandwidth on Volta for SU3 benchmarks
|
2018-07-05 07:03:33 -04:00 |
|
Peter Boyle
|
1bb456c0c5
|
Minor GPU vector width change
|
2018-07-05 07:02:04 -04:00 |
|
paboyle
|
847c761ccc
|
Move sfw IEEE fp16 into central location
|
2018-06-13 20:22:01 +01:00 |
|
paboyle
|
8287ed8383
|
New GPU vector targets
|
2018-06-13 20:21:35 +01:00 |
|
paboyle
|
066be31a3b
|
Optional GPU target SIMD types; work in progress and trying experiments
|
2018-06-13 20:07:55 +01:00 |
|
Peter Boyle
|
a8a0bb85cc
|
Control scalar execution or vector under generic. Disable Eigen vectorisation on powerpc / Summit
|
2018-04-12 12:32:57 -04:00 |
|
Peter Boyle
|
55be842d23
|
Don't force l1p.h so early
|
2018-03-22 18:01:43 -04:00 |
|
paboyle
|
ede0dff794
|
Mark up as an accelerator function
|
2018-02-02 11:36:44 +00:00 |
|
paboyle
|
c67c1544cd
|
abs no compile on travis fix attempt
|
2018-01-28 10:26:04 +00:00 |
|
paboyle
|
44ef5bc207
|
Zero changes (literally speaking).
|
2018-01-27 23:46:28 +00:00 |
|
paboyle
|
8b371ffa94
|
Hide internal data
|
2018-01-26 23:03:54 +00:00 |
|
paboyle
|
461df78a3f
|
Better to use Zero(), and not zero static data
|
2018-01-25 23:36:22 +00:00 |
|
paboyle
|
db9c9475d4
|
const
|
2018-01-25 23:36:06 +00:00 |
|
paboyle
|
421401af55
|
Remove IMCI as we don't really support it
|
2018-01-24 13:53:21 +00:00 |
|
paboyle
|
0626c1e39e
|
Accelerator flagging and thrust complex for NVCC
|
2018-01-24 13:50:41 +00:00 |
|
paboyle
|
725f03e2e2
|
Accelerator markup and thrust complex on nvcc
|
2018-01-24 13:50:10 +00:00 |
|
paboyle
|
408b868475
|
Generic for GPU needs accelerator markup of functions
|
2018-01-24 13:49:12 +00:00 |
|
paboyle
|
1c797deb04
|
Accelerator tweaks
|
2018-01-24 13:43:43 +00:00 |
|
paboyle
|
bd15c38ae8
|
Formatting emacs compliant
|
2018-01-12 23:25:02 +00:00 |
|
paboyle
|
b815f5f764
|
Formatting
|
2018-01-12 23:23:21 +00:00 |
|
paboyle
|
4da437431e
|
Reformat
|
2018-01-12 23:22:46 +00:00 |
|
paboyle
|
3c7bf211a9
|
Reformat
|
2018-01-12 23:22:18 +00:00 |
|
paboyle
|
347d5404dd
|
format
|
2018-01-12 23:21:25 +00:00 |
|
paboyle
|
5e2cd0d07c
|
Format
|
2018-01-12 23:18:22 +00:00 |
|
paboyle
|
62fcee72c5
|
Format, NAMESPACE
|
2018-01-12 23:16:37 +00:00 |
|
paboyle
|
0a6168eef0
|
Format emacs style
|
2018-01-12 23:11:22 +00:00 |
|
paboyle
|
63865e4232
|
format
|
2018-01-12 23:10:48 +00:00 |
|
paboyle
|
c64deedf74
|
Format
|
2018-01-12 23:09:35 +00:00 |
|
paboyle
|
3281559ec3
|
Format
|
2018-01-12 23:09:01 +00:00 |
|
paboyle
|
6a2eca2ec2
|
NAMESPACE
|
2018-01-12 23:00:03 +00:00 |
|
paboyle
|
d8ff895e74
|
NAMESPACE and format
|
2018-01-12 18:27:22 +00:00 |
|
paboyle
|
00c49d4c17
|
Format
|
2018-01-12 18:25:39 +00:00 |
|
paboyle
|
ec89714cce
|
NAMESPACE
|
2018-01-12 18:24:16 +00:00 |
|
paboyle
|
6ab744c720
|
NAMESPACE and formatting
|
2018-01-12 18:11:04 +00:00 |
|
paboyle
|
bbb657da5c
|
NAMESPACE and formatting
|
2018-01-12 18:10:11 +00:00 |
|
paboyle
|
fbc2380cb8
|
NAMESPACE & format
|
2018-01-12 18:05:36 +00:00 |
|
paboyle
|
08682c5461
|
NAMESPACE and format to my liking
|
2018-01-12 18:03:57 +00:00 |
|
paboyle
|
13bce2a6bf
|
NAMESPACE
|
2018-01-12 17:58:53 +00:00 |
|
paboyle
|
70e689900b
|
NAMESPACE
|
2018-01-12 17:58:13 +00:00 |
|
Peter Boyle
|
bfb68e6f02
|
Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
|
2017-09-21 10:11:00 +01:00 |
|
Nils Meyer
|
4e907fef2c
|
Merge remote-tracking branch 'grid/develop' into feature/arm-neon
|
2017-08-29 17:47:36 +02:00 |
|
Christopher Kelly
|
f365a83fae
|
In G-parity unrolled kernel, replaced calls to permute and exchange with run-time-evaluated permute type with explicit calls to appropriate underlying functions
|
2017-08-25 14:24:11 -04:00 |
|
Nils Meyer
|
7a53dc3715
|
Added integer reduce functionality
|
2017-07-24 11:12:59 +02:00 |
|
Guido Cossu
|
8859a151cc
|
Small corrections to the NEON port
|
2017-06-29 11:30:29 +01:00 |
|
Guido Cossu
|
688a39cfd9
|
Merge pull request #114 from nmeyer-ur/feature/arm-neon
ARM neon intrinsics support
Guido: checked and approved
|
2017-06-29 09:57:17 +01:00 |
|
Nils Meyer
|
0933aeefd4
|
corrected Grid_neon.h
|
2017-06-28 20:22:22 +02:00 |
|
Nils Meyer
|
a9c816a268
|
moved file to correct folder
|
2017-06-27 21:39:15 +02:00 |
|
Nils Meyer
|
bf729766dd
|
removed collision with QPX implementation
|
2017-06-27 20:32:24 +02:00 |
|