1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-15 02:05:37 +00:00
Commit Graph

21 Commits

Author SHA1 Message Date
Peter Boyle
19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
Peter Boyle
09cd46d337 Lane by Lane operation 2018-05-12 17:59:35 -04:00
Peter Boyle
90a2efb9b3 Hit an annoying strict alias optimisation in GCC 4.9 through 6.3
Chris K was correct. It appears that an additional memcpy (UGHHH) is enough
to suppress the compiler
2018-03-07 07:27:26 -08:00
paboyle
4d53703c67 Scalar type differeing allowed, eg. precisoin change 2018-03-05 11:39:52 +00:00
paboyle
e5ea04ee0c Need to support precision change, and real replication in multiple simd lanes 2018-03-04 15:53:04 +00:00
paboyle
7574c18cef Massive clean up extract merge.
Simpler and GPU friendly
2018-02-24 22:21:08 +00:00
paboyle
8e99264f40 Accelerator mark up of entire tensore space for offload 2018-01-24 13:27:30 +00:00
paboyle
c037244874 Tensor reformatted with NAMESPACE too 2018-01-13 00:31:02 +00:00
paboyle
3cbe974eb4 Layout 2016-10-20 16:55:21 +01:00
Christopher Kelly
85ed8175cb Implemented mixed precision CG. Fixed filelist to exclude lib/Old directory and include Config.h. 2016-07-06 15:57:04 -04:00
paboyle
8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
Peter Boyle
c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle
c650bb3f3d Very small merge speed up. 2016-02-16 18:41:53 -06:00
paboyle
fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle
145a295231 Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
Peter Boyle
955b482aaf Partial optimisation of the extraction/merger of simd vecs. 2015-11-06 05:26:20 -06:00
Peter Boyle
d1afebf71e Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
neo
6e5db0b1da Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
Azusa Yamaguchi
ef97692622 Handle case of simd_layout not filling whole vector.
Useful if real complex live on same grid
2015-06-14 00:55:21 +01:00
Peter Boyle
1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00