paboyle
|
442b0b406c
|
View related changes
|
2018-03-04 16:34:14 +00:00 |
|
paboyle
|
8824a54269
|
View related changes
|
2018-03-04 16:33:33 +00:00 |
|
paboyle
|
c03423250f
|
Indexable changes
|
2018-03-04 16:31:35 +00:00 |
|
paboyle
|
317fd0da44
|
Views introduced. Need to offload these routines to the accelerator.
|
2018-03-04 16:30:45 +00:00 |
|
paboyle
|
783795a44a
|
Views introduced
|
2018-03-04 16:12:49 +00:00 |
|
paboyle
|
0e6197fbed
|
Introduce accelerator friendly expression template rewrite.
Must obtain and access lattice indexing through a view object that is safe
to copy construct in copy to GPU (without copying the lattice).
|
2018-03-04 16:03:19 +00:00 |
|
paboyle
|
dad7862f91
|
Go through a view object that can be copied to GPU
|
2018-03-04 16:02:02 +00:00 |
|
paboyle
|
c89a883448
|
where was deprecated and integrated into the ET engine a long time ago. Remove the dead original code
|
2018-03-04 15:58:02 +00:00 |
|
paboyle
|
c204288fbc
|
Remove a couple of print statements
|
2018-03-04 15:57:15 +00:00 |
|
paboyle
|
ad739f042a
|
Introduce views for passing lattice indexing to accelerators.
|
2018-03-04 15:56:14 +00:00 |
|
paboyle
|
db988301d0
|
Introduce view objects for indexing lattices. Used to pass the view to accelerators
|
2018-03-04 15:55:16 +00:00 |
|
paboyle
|
9b1f29c4c2
|
Support a view for passing to accelerator
|
2018-03-04 15:54:35 +00:00 |
|
paboyle
|
e5ea04ee0c
|
Need to support precision change, and real replication in multiple simd lanes
|
2018-03-04 15:53:04 +00:00 |
|
paboyle
|
c92a3c6068
|
Need to support any vector type template and run on accelerator
|
2018-03-04 15:52:14 +00:00 |
|
paboyle
|
c1fc947bb8
|
Coordinate handling GPU friendly + some GPU merge/extract improvements
|
2018-02-24 22:26:10 +00:00 |
|
paboyle
|
ff7b19a71b
|
Coordinate handling GPU ready avoid malloc
|
2018-02-24 22:25:39 +00:00 |
|
paboyle
|
1c16ffa1c1
|
Coordinate GPU ready. No malloc
|
2018-02-24 22:25:09 +00:00 |
|
paboyle
|
4962f59477
|
Eliminate both GPU issue and threading bottleneck by avoiding malloc in coordinate handling
|
2018-02-24 22:24:37 +00:00 |
|
paboyle
|
34820bec27
|
Coordinate handling GPU ready. No malloc
|
2018-02-24 22:23:18 +00:00 |
|
paboyle
|
eed9aa9f0c
|
Extract merge gpu ready
|
2018-02-24 22:23:01 +00:00 |
|
paboyle
|
8792ff6439
|
Coordinate handling gpu ready
|
2018-02-24 22:22:43 +00:00 |
|
paboyle
|
078901278c
|
Coordinate handling gpu friendly
|
2018-02-24 22:22:02 +00:00 |
|
paboyle
|
bf5fb89aff
|
Coordinate handling GPU friendly
|
2018-02-24 22:21:36 +00:00 |
|
paboyle
|
7574c18cef
|
Massive clean up extract merge.
Simpler and GPU friendly
|
2018-02-24 22:21:08 +00:00 |
|
paboyle
|
b9b5bdfc3a
|
Proper offload (accelerator access) will require a mutable copy lambda.
|
2018-02-02 11:38:19 +00:00 |
|
paboyle
|
51eb2c5dfc
|
Make referencing the stencil and all info required to evaluate the kernel
accelerator marked up
|
2018-02-02 11:37:13 +00:00 |
|
paboyle
|
ede0dff794
|
Mark up as an accelerator function
|
2018-02-02 11:36:44 +00:00 |
|
paboyle
|
aa6de818e2
|
Copy data needed by Kernels out of the grid object to avoid host reference
|
2018-02-02 11:36:11 +00:00 |
|
paboyle
|
dcf6517a93
|
Accelerator offload and copy Opt into the kernel for GPU host var safety
|
2018-02-02 11:35:35 +00:00 |
|
paboyle
|
a308dff410
|
accelerator loop, copy Opt into the GPU
|
2018-02-02 11:34:37 +00:00 |
|
paboyle
|
14ba20898a
|
Accelerator loop the key kernel call
|
2018-02-02 11:30:07 +00:00 |
|
paboyle
|
a53d3ee19a
|
Add Opt to the lambda capture to get it into the GPU
|
2018-02-02 11:28:39 +00:00 |
|
paboyle
|
5df435319d
|
Use constexpr
|
2018-02-02 11:27:56 +00:00 |
|
paboyle
|
0da2d3e222
|
accelerator offload some more stuff
|
2018-02-02 11:27:35 +00:00 |
|
paboyle
|
9c9dfbfa78
|
Force accelerator
|
2018-02-02 11:25:09 +00:00 |
|
paboyle
|
e4df025d01
|
Accelerator related
|
2018-02-01 23:20:05 +00:00 |
|
paboyle
|
cfeda9d536
|
constexpr on const ints
|
2018-02-01 22:59:12 +00:00 |
|
paboyle
|
4450b1993a
|
Offload
|
2018-02-01 22:45:47 +00:00 |
|
paboyle
|
d03ce5c2a4
|
Provide a way to get around std::vector for a known type on device.
Use template specialisation to access a private member in the Clang++ STL implementation
|
2018-02-01 22:44:25 +00:00 |
|
paboyle
|
7d6522c1ef
|
Accelerator inline
|
2018-02-01 22:43:56 +00:00 |
|
paboyle
|
b96832a922
|
Accelerator inline
|
2018-02-01 22:43:26 +00:00 |
|
paboyle
|
5d7af47b05
|
accelerator_inline
|
2018-02-01 22:42:54 +00:00 |
|
paboyle
|
053ef25c90
|
constexpr makes GPU happy
|
2018-02-01 22:42:29 +00:00 |
|
paboyle
|
8ae77d3706
|
Small simplification of FermionOperatorImpl towards GPU but not there yet
|
2018-02-01 22:41:54 +00:00 |
|
paboyle
|
79b50feacf
|
fixme updates
|
2018-01-29 16:00:40 +00:00 |
|
paboyle
|
c67c1544cd
|
abs no compile on travis fix attempt
|
2018-01-28 10:26:04 +00:00 |
|
paboyle
|
e657f9a344
|
OMP collapse changes to make NVCC happy
|
2018-01-28 01:21:53 +00:00 |
|
paboyle
|
b6ebf35af5
|
Intel compiler doesn't like Nvidia error disable pragmas
|
2018-01-28 01:03:10 +00:00 |
|
paboyle
|
70e276e1ab
|
parallel_for elimination -> thread_loop
|
2018-01-28 01:01:14 +00:00 |
|
paboyle
|
9597ab94eb
|
Zero changes, swap on lattice type.
|
2018-01-27 23:51:40 +00:00 |
|
paboyle
|
f574c20118
|
Zero changes, __VA_ARGS__ and swap
|
2018-01-27 23:50:17 +00:00 |
|
paboyle
|
f102897385
|
VA_ARGS to make comma safe automatic
|
2018-01-27 23:49:47 +00:00 |
|
paboyle
|
d6fce3e498
|
Zero changes, literally
|
2018-01-27 23:48:01 +00:00 |
|
paboyle
|
2d0bcc2606
|
Zero changes, accelerator on kernels and some thread loop changes
|
2018-01-27 23:47:38 +00:00 |
|
paboyle
|
45df59720e
|
Zero changes and VA_ARGS changes
|
2018-01-27 23:46:58 +00:00 |
|
paboyle
|
44ef5bc207
|
Zero changes (literally speaking).
|
2018-01-27 23:46:28 +00:00 |
|
paboyle
|
c4f82e072b
|
_grid becomes private; use Grid()
|
2018-01-27 00:04:12 +00:00 |
|
paboyle
|
912b50f6fa
|
Hiding lattice internals
|
2018-01-26 23:08:45 +00:00 |
|
paboyle
|
32523a229c
|
Hide internals
|
2018-01-26 23:08:02 +00:00 |
|
paboyle
|
1ebd56c3fb
|
Hide internal data
|
2018-01-26 23:07:34 +00:00 |
|
paboyle
|
43cea62855
|
Hide internal data
|
2018-01-26 23:06:03 +00:00 |
|
paboyle
|
2b4067bb71
|
Hide internal data
|
2018-01-26 23:05:32 +00:00 |
|
paboyle
|
85771e97e9
|
Hide internal data
|
2018-01-26 23:04:46 +00:00 |
|
paboyle
|
8b371ffa94
|
Hide internal data
|
2018-01-26 23:03:54 +00:00 |
|
paboyle
|
bf659dfd92
|
Hide the ._odata
|
2018-01-26 22:27:47 +00:00 |
|
paboyle
|
76a4dd36d9
|
Fix no compile of test serialisation
|
2018-01-26 00:13:21 +00:00 |
|
paboyle
|
f4010023ca
|
Warning fixes
|
2018-01-25 23:46:47 +00:00 |
|
paboyle
|
40ee1e1957
|
Zero()
|
2018-01-25 23:36:58 +00:00 |
|
paboyle
|
461df78a3f
|
Better to use Zero(), and not zero static data
|
2018-01-25 23:36:22 +00:00 |
|
paboyle
|
db9c9475d4
|
const
|
2018-01-25 23:36:06 +00:00 |
|
paboyle
|
214f7a6f13
|
Drop std::vector container for the lattice data
|
2018-01-25 23:35:04 +00:00 |
|
paboyle
|
c844cfcda8
|
Remove commAllocator; make more simple; option to switch off the pointer cache
|
2018-01-25 23:33:57 +00:00 |
|
paboyle
|
a3e3034e6f
|
Host compile
|
2018-01-25 23:33:00 +00:00 |
|
paboyle
|
99329197ee
|
Rename header to .h
|
2018-01-24 14:10:09 +00:00 |
|
paboyle
|
421401af55
|
Remove IMCI as really don't support
|
2018-01-24 13:53:21 +00:00 |
|
paboyle
|
0626c1e39e
|
Accelerator flagging and thrust complex for NVCC
|
2018-01-24 13:50:41 +00:00 |
|
paboyle
|
725f03e2e2
|
Accelerator markup and thrust complex on nvcc
|
2018-01-24 13:50:10 +00:00 |
|
paboyle
|
65f77112e0
|
Thread loops done properly
|
2018-01-24 13:49:39 +00:00 |
|
paboyle
|
408b868475
|
Generic for GPU needs accelerator markup of functions
|
2018-01-24 13:49:12 +00:00 |
|
paboyle
|
1c797deb04
|
Accelerator tweaks
|
2018-01-24 13:43:43 +00:00 |
|
paboyle
|
b9d5a42b57
|
Should be able to eliminate the COMMA_SAFE with VA_ARGS trick ; revisit this file
|
2018-01-24 13:42:06 +00:00 |
|
paboyle
|
e737591918
|
Accelerator loops
|
2018-01-24 13:41:12 +00:00 |
|
paboyle
|
ba5ea5830b
|
Accelerator loops
|
2018-01-24 13:40:56 +00:00 |
|
paboyle
|
43f244badf
|
Thread loops for now; figure out what can be GPU accelerated later here
|
2018-01-24 13:40:30 +00:00 |
|
paboyle
|
e9c8ba5ef7
|
Accelerator loops
|
2018-01-24 13:39:54 +00:00 |
|
paboyle
|
d70709a8e8
|
Thread construct changes
|
2018-01-24 13:39:06 +00:00 |
|
paboyle
|
733f8ff0b2
|
Still using parallel_for -- don't know how to implement reduction on GPU yet. Look at some sample code is best.
|
2018-01-24 13:38:13 +00:00 |
|
paboyle
|
0bfa5bb213
|
Accelerator loops
|
2018-01-24 13:37:26 +00:00 |
|
paboyle
|
1f26a234f9
|
CPU loops explicit for peek poke
|
2018-01-24 13:36:31 +00:00 |
|
paboyle
|
13f0116425
|
Accelerator loops
|
2018-01-24 13:35:55 +00:00 |
|
paboyle
|
25f589b064
|
Accelerator loops
|
2018-01-24 13:35:36 +00:00 |
|
paboyle
|
210c50a278
|
Accelerator prep work
|
2018-01-24 13:35:13 +00:00 |
|
paboyle
|
549a143e78
|
Accelerator related
|
2018-01-24 13:34:46 +00:00 |
|
paboyle
|
277301486d
|
Simple warning elimination
|
2018-01-24 13:34:15 +00:00 |
|
paboyle
|
c851b39a49
|
Nicer way of including aggregate
|
2018-01-24 13:33:34 +00:00 |
|
paboyle
|
15cc12eb6c
|
Delete the old non ET file
|
2018-01-24 13:33:07 +00:00 |
|
paboyle
|
ae4f1f8c12
|
New file, split out two from Lattice_reduction
|
2018-01-24 13:32:43 +00:00 |
|
paboyle
|
5609624b44
|
Threading constructs replaced
|
2018-01-24 13:32:24 +00:00 |
|
paboyle
|
b5a947dd79
|
Change to make NVCC happy
|
2018-01-24 13:32:02 +00:00 |
|
paboyle
|
ee16f62322
|
stray semicolon elimination. NVCC is picky, but eventually picked up these diags
with a pragma to suppress
|
2018-01-24 13:31:17 +00:00 |
|
paboyle
|
3318de27d6
|
Thread macro changes
|
2018-01-24 13:30:23 +00:00 |
|
paboyle
|
ac56965306
|
GPU changes and threading macros replaced
|
2018-01-24 13:28:30 +00:00 |
|
paboyle
|
8e99264f40
|
Accelerator mark up of entire tensor space for offload
|
2018-01-24 13:27:30 +00:00 |
|
paboyle
|
69327db9a9
|
Improvements for NVCC. Eigen is not compatible with CUDA 9 and must hack to disable device
compilation
|
2018-01-24 13:25:07 +00:00 |
|
paboyle
|
7331ee2d80
|
Warnings control to overpower the NVCC compiler
|
2018-01-24 13:24:36 +00:00 |
|
paboyle
|
4e1135b214
|
Updated pugixml to v1.8; still didn't fix no compile under nvcc.
Turns out nvcc was right; must do an explicit template instantiation that was missing
but left gcc, icpc and clang happy for some reason.
Fix this.
|
2018-01-24 13:17:10 +00:00 |
|
paboyle
|
acd4955a18
|
remove rdtsc on __NVCC__ as may be device called
|
2018-01-24 13:16:18 +00:00 |
|
paboyle
|
bd08dc4f45
|
Pragma use for nvcc, warning elimination.
|
2018-01-24 13:15:43 +00:00 |
|
paboyle
|
22d137d4e5
|
Namespace, nvcc warning elimination.
|
2018-01-24 13:14:43 +00:00 |
|
paboyle
|
87ee592176
|
Pragma changes and layout and warning elimination for nvcc
|
2018-01-24 13:14:09 +00:00 |
|
paboyle
|
063603b1ea
|
Warning elimination
|
2018-01-24 13:12:14 +00:00 |
|
paboyle
|
f292106db6
|
Split out pragmas from threads.h;
More work needed; rename threads directory to "parallelism" or something like that
|
2018-01-24 13:11:04 +00:00 |
|
paboyle
|
9d08aebea9
|
Compile through nvcc ; warning elimination fixes
|
2018-01-24 13:09:53 +00:00 |
|
paboyle
|
56999474e2
|
Indent
|
2018-01-15 11:44:45 +00:00 |
|
paboyle
|
d74c21a386
|
Global edit for QCD namespace removal & NAMESPACE macros
|
2018-01-15 09:37:58 +00:00 |
|
paboyle
|
6f20f1d224
|
Namespace
|
2018-01-15 00:24:20 +00:00 |
|
paboyle
|
d0e357ef89
|
Cleanup and no QCD namespace
|
2018-01-15 00:23:51 +00:00 |
|
paboyle
|
21251f2e1b
|
Namespace and formatting changes
|
2018-01-15 00:21:27 +00:00 |
|
paboyle
|
fcf1ccf669
|
Namespace, indent, badly formatted
|
2018-01-15 00:17:58 +00:00 |
|
paboyle
|
49cce514f1
|
Namespace
|
2018-01-15 00:17:11 +00:00 |
|
paboyle
|
695af98a1d
|
Namespace, indent, tidy
|
2018-01-15 00:16:13 +00:00 |
|
paboyle
|
f8cb46d360
|
Namespace, indent, badly formatted code fixed
|
2018-01-15 00:14:47 +00:00 |
|
paboyle
|
0da64dea90
|
Namespace, indent
|
2018-01-15 00:13:32 +00:00 |
|
paboyle
|
2cceebbf12
|
Namespace, indent
|
2018-01-15 00:12:20 +00:00 |
|
paboyle
|
40232dcefe
|
Namespace
|
2018-01-15 00:11:19 +00:00 |
|
paboyle
|
dbd86bb95b
|
Cleanup, namespace, indent
|
2018-01-15 00:10:11 +00:00 |
|
paboyle
|
b8fd2c161f
|
Indent, namespace
|
2018-01-15 00:09:33 +00:00 |
|
paboyle
|
df9b979583
|
Indent, namespace
|
2018-01-15 00:08:40 +00:00 |
|
paboyle
|
23ef0e3e19
|
Namespace and indentation
|
2018-01-15 00:07:46 +00:00 |
|
paboyle
|
ae9175735a
|
Indentation, Namespace
|
2018-01-15 00:07:10 +00:00 |
|
paboyle
|
2d13ea1a22
|
Namespace and indentation emacs choices
|
2018-01-15 00:05:55 +00:00 |
|
paboyle
|
8c675064bd
|
Namespace and indentation
|
2018-01-15 00:04:43 +00:00 |
|
paboyle
|
550b905bb8
|
Namespace and indentation
|
2018-01-15 00:03:49 +00:00 |
|
paboyle
|
edb79dc088
|
Namespace and indent
|
2018-01-15 00:02:33 +00:00 |
|
paboyle
|
88e635c5d1
|
Namespace, format
|
2018-01-15 00:02:01 +00:00 |
|
paboyle
|
ecb4a24de8
|
Namespace
|
2018-01-15 00:01:25 +00:00 |
|
paboyle
|
c8c1d36710
|
Namespace, indent
|
2018-01-15 00:00:52 +00:00 |
|
paboyle
|
b4bb428d9b
|
Namespace, indent
|
2018-01-14 23:59:57 +00:00 |
|
paboyle
|
e9ef7e3852
|
Namespace, indent
|
2018-01-14 23:59:23 +00:00 |
|
paboyle
|
31cbbfc07e
|
Namespace, indent
|
2018-01-14 23:58:44 +00:00 |
|
paboyle
|
4eb0552d1d
|
Namespace, indent
|
2018-01-14 23:58:03 +00:00 |
|
paboyle
|
08f2a4564f
|
Namespace, formatting
|
2018-01-14 23:56:33 +00:00 |
|
paboyle
|
7e00f643f8
|
Namespace indent
|
2018-01-14 23:55:44 +00:00 |
|
paboyle
|
c19ccdad7c
|
Namespace, indent
|
2018-01-14 23:55:07 +00:00 |
|
paboyle
|
8aed4181e1
|
Namespace, indent
|
2018-01-14 23:54:25 +00:00 |
|
paboyle
|
06ab7f5661
|
Namespace
|
2018-01-14 23:53:31 +00:00 |
|
paboyle
|
645ec8eba0
|
Namespace
|
2018-01-14 23:52:26 +00:00 |
|
paboyle
|
72ffa8a88e
|
Namespace
|
2018-01-14 23:51:38 +00:00 |
|
paboyle
|
4c829b410e
|
Namespace
|
2018-01-14 23:50:20 +00:00 |
|
paboyle
|
eda4fd9912
|
Namespace
|
2018-01-14 23:49:11 +00:00 |
|