paboyle
|
dd891f5e3b
|
Use NVCC to suppress device Eigen
|
2018-06-27 21:25:17 +01:00 |
|
paboyle
|
6c97a6a071
|
Coalescing version of the kernel
|
2018-06-13 20:52:29 +01:00 |
|
paboyle
|
73bb2d5128
|
Ugly hack to speed up compile on GPU; we don't use the hand kernels on GPU anyway so why compile
|
2018-06-13 20:35:28 +01:00 |
|
paboyle
|
b710fec6ea
|
GPU code: first version of specialised kernel
|
2018-06-13 20:34:39 +01:00 |
|
paboyle
|
b2a8cd60f5
|
Doubled gauge field is useful
|
2018-06-13 20:27:47 +01:00 |
|
paboyle
|
867ee364ab
|
Explicit instantiation hooks
|
2018-06-13 20:27:12 +01:00 |
|
paboyle
|
94d1ae4c82
|
Some prep work for GPU shared memory. Need to be careful, as will try GPU direct RDMA and inter-GPU memory sharing on Summit later
|
2018-06-13 20:24:06 +01:00 |
|
paboyle
|
2075b177ef
|
CUDA_ARCH more careful treatment
|
2018-06-13 20:22:34 +01:00 |
|
paboyle
|
847c761ccc
|
Move software IEEE fp16 into central location
|
2018-06-13 20:22:01 +01:00 |
|
paboyle
|
8287ed8383
|
New GPU vector targets
|
2018-06-13 20:21:35 +01:00 |
|
paboyle
|
e6be7416f4
|
Use managed memory
|
2018-06-13 20:14:00 +01:00 |
|
paboyle
|
26863b6d95
|
User Managed memory
|
2018-06-13 20:13:42 +01:00 |
|
paboyle
|
ebd730bd54
|
Adding 2D loops
|
2018-06-13 20:13:01 +01:00 |
|
paboyle
|
066be31a3b
|
Optional GPU target SIMD types; work in progress and trying experiments
|
2018-06-13 20:07:55 +01:00 |
|
Peter Boyle
|
eb7d34a4cc
|
GPU version
|
2018-05-14 19:41:47 -04:00 |
|
Peter Boyle
|
aab27a655a
|
Start of GPU kernels
|
2018-05-14 19:41:17 -04:00 |
|
Peter Boyle
|
93280bae85
|
GPU option
|
2018-05-14 19:40:58 -04:00 |
|
Peter Boyle
|
c5f93abcd7
|
GPU clean up
|
2018-05-14 19:40:33 -04:00 |
|
Peter Boyle
|
d5deef782d
|
Useful debug comments
|
2018-05-14 19:39:52 -04:00 |
|
Peter Boyle
|
5f50473c0d
|
Clean up
|
2018-05-14 19:39:11 -04:00 |
|
Peter Boyle
|
13f50406e3
|
Suppress print statement
|
2018-05-12 18:00:00 -04:00 |
|
Peter Boyle
|
09cd46d337
|
Lane by Lane operation
|
2018-05-12 17:59:35 -04:00 |
|
Peter Boyle
|
d3f51065c2
|
Give command line control of blocks/threads split
|
2018-05-12 17:58:56 -04:00 |
|
Peter Boyle
|
925ac4173d
|
Thread count control for warp scheduler thingy doodaa thing
|
2018-05-12 17:58:22 -04:00 |
|
Peter Boyle
|
a8a0bb85cc
|
Control scalar or vector execution under generic. Disable Eigen vectorisation on powerpc / Summit
|
2018-04-12 12:32:57 -04:00 |
|
Peter Boyle
|
6411caad67
|
work distribution
|
2018-04-12 11:41:41 -04:00 |
|
Peter Boyle
|
7533035a99
|
Control Eigen vectorisation
|
2018-04-12 11:40:56 -04:00 |
|
Peter Boyle
|
b15db11c60
|
Kernels -> pure static object to enable device execution
|
2018-03-24 19:35:20 -04:00 |
|
Peter Boyle
|
f6077f9d48
|
Kernels -> not instantiated otherwise object ref on GPU
|
2018-03-24 19:33:44 -04:00 |
|
Peter Boyle
|
572954ef12
|
Kernels not an instantiated object, just static
|
2018-03-24 19:33:13 -04:00 |
|
Peter Boyle
|
cedeaae7db
|
Lebesgue -> StencilView if necessary
|
2018-03-24 19:32:41 -04:00 |
|
Peter Boyle
|
e6cf0b1e17
|
View typedefs go to OperatorImpl
|
2018-03-24 19:32:11 -04:00 |
|
Peter Boyle
|
5412628ea6
|
begin end lambda
|
2018-03-24 19:31:45 -04:00 |
|
Peter Boyle
|
1f70cedbab
|
Have to make all kernel called routines static since object reference will be a host pointer on GPU
|
2018-03-24 19:29:26 -04:00 |
|
Peter Boyle
|
b50f37cfb4
|
Remove overlap comms flag
|
2018-03-24 19:28:53 -04:00 |
|
Peter Boyle
|
cb0d2a1b03
|
threaded rng init; I thought this was on
|
2018-03-24 19:28:17 -04:00 |
|
Peter Boyle
|
4e1272fabf
|
Kernels need to be static to work on GPU. No reference to host resident data
|
2018-03-22 18:44:53 -04:00 |
|
Peter Boyle
|
607dc2d3c6
|
Remove lebesgue order
|
2018-03-22 18:23:09 -04:00 |
|
Peter Boyle
|
23c880b009
|
Remove lebesgue order; stick in stencil if needed
|
2018-03-22 18:13:41 -04:00 |
|
Peter Boyle
|
334bb6792f
|
Lebesgue order removed. Stick in the stencil view
|
2018-03-22 18:12:12 -04:00 |
|
Peter Boyle
|
299d119013
|
GPU work allocation improved
|
2018-03-22 18:04:24 -04:00 |
|
Peter Boyle
|
55be842d23
|
Don't force l1p.h so early
|
2018-03-22 18:01:43 -04:00 |
|
Peter Boyle
|
9875c446c6
|
Clean up pragmas
|
2018-03-20 07:19:17 -04:00 |
|
Peter Boyle
|
5cc9aca85d
|
Use 64bit index for looping
|
2018-03-20 06:34:52 -04:00 |
|
Peter Boyle
|
ac29ebcb95
|
Clean up debug prints
|
2018-03-20 06:33:59 -04:00 |
|
Peter Boyle
|
f04a7251cc
|
GPU welcome message and device info
|
2018-03-19 07:12:12 -04:00 |
|
Peter Boyle
|
d4ce7d9905
|
GPU friendly Stencil needs a view
|
2018-03-19 07:11:21 -04:00 |
|
Peter Boyle
|
8a1d303ab9
|
GPU friendly stencil improvements
|
2018-03-19 07:11:03 -04:00 |
|
Peter Boyle
|
bf0a4de919
|
GPU friendly params object
|
2018-03-19 07:10:12 -04:00 |
|
Peter Boyle
|
6fe5885fe4
|
Warning suppress
|
2018-03-19 07:09:49 -04:00 |
|