paboyle
|
aead94e9a7
|
View introduced
|
2018-03-04 16:39:29 +00:00 |
|
paboyle
|
3277bda130
|
View introduction to prepare for accelerator offload.
Probably same problem exists for stencil object
|
2018-03-04 16:38:08 +00:00 |
|
paboyle
|
442b0b406c
|
View related changes
|
2018-03-04 16:34:14 +00:00 |
|
paboyle
|
8824a54269
|
View related changes
|
2018-03-04 16:33:33 +00:00 |
|
paboyle
|
c03423250f
|
Indexable changes
|
2018-03-04 16:31:35 +00:00 |
|
paboyle
|
317fd0da44
|
Views introduced. Need to accelerator offload these routines.
|
2018-03-04 16:30:45 +00:00 |
|
paboyle
|
783795a44a
|
Views introduced
|
2018-03-04 16:12:49 +00:00 |
|
paboyle
|
0e6197fbed
|
Introduce accelerator friendly expression template rewrite.
Must obtain and access lattice indexing through a view object that is safe
to copy construct in copy to GPU (without copying the lattice).
|
2018-03-04 16:03:19 +00:00 |
|
paboyle
|
dad7862f91
|
Go through a view object that can be copied to GPU
|
2018-03-04 16:02:02 +00:00 |
|
paboyle
|
c89a883448
|
`where` was deprecated and integrated into the ET engine a long time ago. Remove the dead original code
|
2018-03-04 15:58:02 +00:00 |
|
paboyle
|
c204288fbc
|
Remove a couple of print statements
|
2018-03-04 15:57:15 +00:00 |
|
paboyle
|
ad739f042a
|
Introduce views for passing lattice indexing to accelerators.
|
2018-03-04 15:56:14 +00:00 |
|
paboyle
|
db988301d0
|
Introduce view objects for indexing lattices. Used to pass the view to accelerators
|
2018-03-04 15:55:16 +00:00 |
|
paboyle
|
9b1f29c4c2
|
Support a view for passing to accelerator
|
2018-03-04 15:54:35 +00:00 |
|
paboyle
|
e5ea04ee0c
|
Need to support precision change, and real replication in multiple simd lanes
|
2018-03-04 15:53:04 +00:00 |
|
paboyle
|
c92a3c6068
|
Need to support any vector type template and run on accelerator
|
2018-03-04 15:52:14 +00:00 |
|
paboyle
|
03f8da8fbc
|
enable-debug option for debug flags in compile
|
2018-03-04 15:51:47 +00:00 |
|
paboyle
|
78a9e31ff0
|
options more obvious
|
2018-02-24 22:26:32 +00:00 |
|
paboyle
|
c1fc947bb8
|
Coordinate handling GPU friendly + some GPU merge/extract improvements
|
2018-02-24 22:26:10 +00:00 |
|
paboyle
|
ff7b19a71b
|
Coordinate handling GPU ready; avoid malloc
|
2018-02-24 22:25:39 +00:00 |
|
paboyle
|
1c16ffa1c1
|
Coordinate GPU ready. No malloc
|
2018-02-24 22:25:09 +00:00 |
|
paboyle
|
4962f59477
|
Eliminate both GPU issue and threading bottleneck by avoiding malloc in coordinate handling
|
2018-02-24 22:24:37 +00:00 |
|
paboyle
|
e158b60bce
|
GPU friendly coords
|
2018-02-24 22:23:47 +00:00 |
|
paboyle
|
34820bec27
|
Coordinate handling GPU ready. No malloc
|
2018-02-24 22:23:18 +00:00 |
|
paboyle
|
eed9aa9f0c
|
Extract merge gpu ready
|
2018-02-24 22:23:01 +00:00 |
|
paboyle
|
8792ff6439
|
Coordinate handling gpu ready
|
2018-02-24 22:22:43 +00:00 |
|
paboyle
|
078901278c
|
Coordinate handling gpu friendly
|
2018-02-24 22:22:02 +00:00 |
|
paboyle
|
bf5fb89aff
|
Coordinate handling GPU friendly
|
2018-02-24 22:21:36 +00:00 |
|
paboyle
|
7574c18cef
|
Massive clean up extract merge.
Simpler and GPU friendly
|
2018-02-24 22:21:08 +00:00 |
|
paboyle
|
36ea5f6b77
|
GPU friendly coordinates; no std::vector on GPU
|
2018-02-24 22:20:14 +00:00 |
|
paboyle
|
285deab432
|
Coordinate handling GPU friendly. Avoid std::vector
|
2018-02-24 22:19:28 +00:00 |
|
paboyle
|
bb7d87d0a0
|
Coordinate handling gpu friendly
|
2018-02-24 22:18:33 +00:00 |
|
paboyle
|
b9b5bdfc3a
|
Proper offload (accelerator access) will require a mutable copy lambda.
|
2018-02-02 11:38:19 +00:00 |
|
paboyle
|
51eb2c5dfc
|
Make referencing the stencil and all info required to evaluate the kernel
accelerator marked up
|
2018-02-02 11:37:13 +00:00 |
|
paboyle
|
ede0dff794
|
Mark up as an accelerator function
|
2018-02-02 11:36:44 +00:00 |
|
paboyle
|
aa6de818e2
|
Copy data needed by Kernels out of the grid object to avoid host reference
|
2018-02-02 11:36:11 +00:00 |
|
paboyle
|
dcf6517a93
|
Accelerator offload and copy Opt into the kernel for GPU host var safety
|
2018-02-02 11:35:35 +00:00 |
|
paboyle
|
a308dff410
|
accelerator loop, copy Opt into the GPU
|
2018-02-02 11:34:37 +00:00 |
|
paboyle
|
14ba20898a
|
Accelerator loop the key kernel call
|
2018-02-02 11:30:07 +00:00 |
|
paboyle
|
a53d3ee19a
|
Add Opt to the lambda capture to get it into the GPU
|
2018-02-02 11:28:39 +00:00 |
|
paboyle
|
5df435319d
|
Use constexpr
|
2018-02-02 11:27:56 +00:00 |
|
paboyle
|
0da2d3e222
|
Accelerator offload some more stuff
|
2018-02-02 11:27:35 +00:00 |
|
paboyle
|
9c9dfbfa78
|
Force accelerator
|
2018-02-02 11:25:09 +00:00 |
|
paboyle
|
e4df025d01
|
Accelerator related
|
2018-02-01 23:20:05 +00:00 |
|
paboyle
|
cfeda9d536
|
constexpr on const ints
|
2018-02-01 22:59:12 +00:00 |
|
paboyle
|
4450b1993a
|
Offload
|
2018-02-01 22:45:47 +00:00 |
|
paboyle
|
d03ce5c2a4
|
Provide a way to get around std::vector for a known type on device.
Use template specialisation to access a private member in the Clang++ STL implementation
|
2018-02-01 22:44:25 +00:00 |
|
paboyle
|
7d6522c1ef
|
Accelerator inline
|
2018-02-01 22:43:56 +00:00 |
|
paboyle
|
b96832a922
|
Accelerator inline
|
2018-02-01 22:43:26 +00:00 |
|
paboyle
|
5d7af47b05
|
accelerator_inline
|
2018-02-01 22:42:54 +00:00 |
|