mirror of https://github.com/paboyle/Grid.git synced 2025-06-18 15:57:05 +01:00
Commit Graph

21 Commits

Author SHA1 Message Date
19b527e83f Better extract merge for GPU. Let the SIMD header files define the pointer type for
access. GPU redirects through builtin float2, double2 for complex
2018-07-05 07:05:13 -04:00
09cd46d337 Lane by Lane operation 2018-05-12 17:59:35 -04:00
90a2efb9b3 Hit an annoying strict alias optimisation in GCC 4.9 through 6.3
Chris K was correct. It appears that an additional memcpy (UGHHH) is enough
to suppress the compiler
2018-03-07 07:27:26 -08:00
4d53703c67 Scalar type differing allowed, e.g. precision change 2018-03-05 11:39:52 +00:00
e5ea04ee0c Need to support precision change, and real replication in multiple simd lanes 2018-03-04 15:53:04 +00:00
7574c18cef Massive clean up extract merge.
Simpler and GPU friendly
2018-02-24 22:21:08 +00:00
8e99264f40 Accelerator mark up of entire tensor space for offload 2018-01-24 13:27:30 +00:00
c037244874 Tensor reformatted with NAMESPACE too 2018-01-13 00:31:02 +00:00
3cbe974eb4 Layout 2016-10-20 16:55:21 +01:00
85ed8175cb Implemented mixed precision CG. Fixed filelist to exclude lib/Old directory and include Config.h. 2016-07-06 15:57:04 -04:00
8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
c650bb3f3d Very small merge speed up. 2016-02-16 18:41:53 -06:00
fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
145a295231 Bug fix for stencil with large shifts (3+); would be important to the Naik term, for example, but did not
impact Wilson-based nearest-neighbour stencils.
2015-12-30 19:29:48 +00:00
955b482aaf Partial optimisation of the extraction/merger of simd vecs. 2015-11-06 05:26:20 -06:00
d1afebf71e Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
neo
6e5db0b1da Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
ef97692622 Handle case of simd_layout not filling whole vector.
Useful if real and complex live on the same grid
2015-06-14 00:55:21 +01:00
1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00