Peter Boyle
8cfd5d2639
Need lattice view
2020-06-03 09:11:28 -04:00
Peter Boyle
1c9f20b15e
Views must be closed
2020-06-03 09:10:29 -04:00
Peter Boyle
32237895bd
Reorg memory manager for O(1) hash table
2020-06-03 09:09:52 -04:00
Peter Boyle
1d252d0922
Accelerator inline
2020-05-28 11:45:25 -04:00
Peter Boyle
006cc8a8f1
Staggered move to accelerator
2020-05-28 08:33:06 -04:00
Peter Boyle
ee63721bad
int unhappiness sycl fix
2020-05-25 08:36:24 -07:00
Peter Boyle
22c5168d70
Sycl happier
2020-05-25 08:35:56 -07:00
Peter Boyle
949ac3cd24
Must avoid non-trivial copy constructors
2020-05-25 08:35:28 -07:00
Peter Boyle
7bc0166c1c
Making SYCL happy - must avoid non-trivial copy constructors
2020-05-25 08:34:19 -07:00
Peter Boyle
cb0d1b3399
Hopefully fix build fail
2020-05-24 21:27:00 -04:00
Peter Boyle
d1f1ccc705
HIP changes
2020-05-24 21:18:49 -04:00
Peter Boyle
c7519a237a
Assertions fail on HIP for unknown reasons - debugging
2020-05-24 14:02:47 -04:00
Peter Boyle
32be2b13d3
Updates for HIP
2020-05-24 14:00:55 -04:00
Peter Boyle
92b342a477
HIP reduction too
2020-05-24 13:50:28 -04:00
Peter Boyle
556da86ac3
HIP fp16
2020-05-24 13:41:58 -04:00
Peter Boyle
8285e41574
View location / access mode
2020-05-21 16:14:41 -04:00
Peter Boyle
7860a50f70
Make view specify where and drive data motion - first cut.
...
This is a compile-time option --enable-unified=yes/no
2020-05-21 16:13:16 -04:00
Peter Boyle
ebb60330c9
Automatic data motion options beginning
2020-05-17 16:34:25 -04:00
Peter Boyle
a9847aa866
Dependence fix
2020-05-12 20:03:37 -04:00
Peter Boyle
d24d8e8398
Use X-direction as more bits meaningful on CUDA.
...
2^31-1 should always be enough for SIMD and thread reduced local volume
e.g. 32*2^31 = 2^36 = (2^9)^4 or 512^4 is big enough.
Where 32 is gpu_threads * Nsimd = 8*4
2020-05-12 10:35:49 -04:00
Peter Boyle
07c0c02f8c
Speed up Cshift
2020-05-11 17:02:01 -04:00
Peter Boyle
8c31c065b5
Keep the Vector fixed to protect it from realloc
2020-05-11 17:00:30 -04:00
Peter Boyle
bbbee5660d
First compile on HIP
2020-05-10 05:28:09 -04:00
Peter Boyle
52081acfa5
NVCC compile fixes
2020-05-08 13:14:12 -04:00
Peter Boyle
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
Peter Boyle
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
u37294
04927d2e40
SYCL prep - no sycl just make it compile through DPC++
2020-05-04 10:28:29 -07:00
u37294
7caed4edd9
dpc++ didn't like rdtsc()
2020-05-04 10:27:05 -07:00
Peter Boyle
9b2d2d0fc3
Basis rotate stack passing to GPU reduction
2020-04-30 12:31:07 -04:00
Peter Boyle
5011753f4f
Clean up warning
2020-04-30 10:23:48 -04:00
Peter Boyle
dd3ebc2ce4
Slow compile on NVCC switch off conserved current
2020-04-29 08:43:12 -04:00
Peter Boyle
c2c3cad20d
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
2020-04-23 04:35:42 -04:00
Peter Boyle
edec9ee2e2
Conserved current rewrite done. Zmobius working
2020-04-23 04:34:01 -04:00
Christopher Kelly
0896f2cead
Added missing include guards in bigfloat_double.h
2020-04-20 10:30:38 -04:00
Christopher Kelly
181709bba4
Merge branch 'develop' into feature/zmobius_paramcompute
2020-04-20 09:12:34 -04:00
Peter Boyle
90229cfb0f
Merge pull request #270 from milc-qcd/feature/CGinfo
...
feature/CGinfo
2020-04-16 11:46:08 -04:00
Peter Boyle
0475c46ecb
Merge pull request #256 from djm2131/feature/BiCGSTAB
...
Import BiCGSTAB solvers and tests
2020-04-16 11:45:15 -04:00
Peter Boyle
f3a8d039a2
Merge branch 'feature/hdcr' into develop
2020-04-10 22:01:52 -04:00
Peter Boyle
3b0e07882f
Adding another form of polynomial
2020-04-10 11:28:33 -04:00
Peter Boyle
8e81a811d0
Merge branch 'feature/hdcr' into develop
2020-04-10 11:14:49 -04:00
Peter Boyle
aa13118127
Missing conjugate already fixed in develop
2020-04-10 11:11:24 -04:00
Peter Boyle
6cdb09c884
Faster copy region
2020-04-10 11:10:52 -04:00
Peter Boyle
a65bc64f10
Accelerator peek poke
2020-04-10 11:09:59 -04:00
Peter Boyle
11dec4883c
Don't throw assert
2020-04-10 11:09:11 -04:00
Peter Boyle
afa458c812
Extra solvers
2020-04-10 11:08:19 -04:00
Peter Boyle
dc50190b8f
Faster GPU basis rotation
...
May need to later include Regensburg optimised CPU variant
2020-04-10 11:06:04 -04:00
Carleton DeTar
165c68e28e
Change TrueResiduals to TrueResidualShift and IterationsToComplete to IterationsToCompleteShift
2020-02-29 17:51:51 -06:00
Carleton DeTar
9479bc8486
Make IterationsToComplete and TrueResidual externally accessible
2020-02-19 17:43:57 -06:00
Peter Boyle
8a5c13d5fb
Still fast moving in changes
2020-02-06 17:57:26 -05:00
Peter Boyle
bdccb0c91f
Working 2 types of decomposition
2020-02-06 17:26:55 -05:00