a7635fd5ba
summit mem
2020-05-18 17:52:26 -04:00
ebb60330c9
Automatic data motion options beginning
2020-05-17 16:34:25 -04:00
32fbdf4fb1
Merge pull request #5 from paboyle/develop
...
Sync upstream
2020-05-13 09:02:56 +02:00
0e3c49f687
TransposeIndex was broken by Christoph
2020-05-12 17:57:01 -04:00
162e4bb567
no automatic prefetching for now
2020-05-12 07:01:23 -04:00
bbbee5660d
First compile on HIP
2020-05-10 05:28:09 -04:00
c83471bfd0
Fix missing checkerboards for adj and conjugate
2020-05-08 16:44:03 +02:00
f8b8e00090
Systematise the accelerator primitives and locate to Grid/threads/Accelerator.h / Accelerator.cc
...
Aim to reduce the amount of cuda and other code variations floating around all over the place.
Will move GpuInit into Accelerator.cc from Init.cc
Need to worry about SharedMemoryMPI.cc and the Peer2Peer windows
2020-05-08 06:23:55 -07:00
3c6ffcb48c
Merge branch 'develop' into feature/gpt
2020-05-06 15:03:35 +02:00
87984ece7d
add Lattice_basis.h
2020-05-06 08:47:18 -04:00
e9b295f967
Synchronize blocking infrastructure with GPT
2020-05-06 08:42:28 -04:00
28a1fcaaff
First compile against SYCL
2020-05-05 11:13:27 -07:00
6b64727161
disable comments
2020-05-05 05:05:36 -04:00
04863f8f38
debug new AcceleratorView
2020-05-04 16:07:03 -04:00
2a1387e992
rankInnerProduct
2020-05-03 17:27:11 -04:00
9bfa51bffb
cleanup comment
2020-05-03 09:12:52 -04:00
38532753f4
interface cleanup
2020-05-03 08:58:32 -04:00
949be9605c
fix pragmas
2020-05-02 16:20:03 -04:00
63cf201ee7
Add AdviseInfrequentUse
2020-05-02 11:38:42 -04:00
ddb192bac7
re-work double precision promotion for summit
2020-04-30 16:09:57 -04:00
5011753f4f
Clean up warning
2020-04-30 10:23:48 -04:00
091d5c605e
towards more precise blocking
2020-04-17 04:25:28 -04:00
327da332bb
Merge branch 'develop' of https://github.com/paboyle/Grid into feature/gpt
2020-04-16 11:30:17 -04:00
6cdb09c884
Faster copy region
2020-04-10 11:10:52 -04:00
a65bc64f10
Accelerator peek poke
2020-04-10 11:09:59 -04:00
5fc8a273e7
Fused innerProduct + norm2 on first argument operation
2020-04-06 11:52:29 +02:00
c9b737a4e7
make trace,adj,transpose unary operators
2020-03-16 17:58:30 -04:00
68b45f6444
Lower left/upper right region cut paste
2020-02-06 15:50:26 -05:00
ef9b3e658a
extra typedef
2020-02-06 15:47:14 -05:00
1bd87c35d7
Read coalescing on Nvidia
2020-01-27 12:29:56 -05:00
48008e4d8b
Thread coordinate creation loop
2020-01-27 12:28:16 -05:00
9aafd20468
Simple block project promote runs faster on GPU
2019-12-17 05:01:39 -05:00
9e15474999
Accelerator loop attempt at speed up
2019-12-14 05:28:16 -05:00
152b525a4d
Typo fix
2019-12-13 22:44:42 -05:00
d18994eddc
offload more of mgrid to GPU
2019-12-13 22:08:11 -05:00
845a045493
Merge pull request #233 from giltirn/lanczos_fix
...
A few run / compile / memory leak fixes
2019-10-30 10:21:59 -04:00
5de9547db5
Removing old debug code
2019-10-08 15:51:28 +01:00
114ebb7914
Fixed Lanczos calling aligned alloc in threaded region hitting up against pointer-cache no-threading restrictions
...
Fixed Lattice::reset not compiling with new Grid explicit memory region handling
Fixed memory leak in Lattice::resize that occurs when data region has been previously allocated
2019-08-26 16:47:44 -04:00
be37dfb6f8
Remove debug code
2019-08-15 01:31:40 +01:00
3e49dc8a67
Reduction finished and hopefully fixes CI regression fail on single precision and force
2019-08-14 15:18:34 +01:00
ce97638bac
Think the reduction is now sorted and cleaned up
2019-08-11 11:09:01 +01:00
9117f61109
GPU friendly
2019-07-31 01:22:54 +01:00
9dad7a0094
Reproducible reduction and axpy_norm offload from Gianluca.
...
Hopefully get CG running entirely on GPU
2019-07-30 00:14:12 +01:00
775eaee199
Fix for suspected Intel 2018.1 compiler bug under O3
2019-07-19 07:57:34 +01:00
d976e5c514
Pow is being awkward in thrust for reasons I don't understand. Possible thrust bug.
2019-06-16 12:05:11 +01:00
20359ca15f
Coalesced loops.
2019-06-15 08:03:57 +01:00
736358b0cb
Coalesced loops
2019-06-15 08:03:13 +01:00
6b692aa726
Thread loops
2019-06-15 08:02:26 +01:00
7f99e1cd3b
Coalesced loops
2019-06-15 08:01:39 +01:00
f3c89df948
Thread loop changes
2019-06-15 08:00:37 +01:00