Peter Boyle
c5f93abcd7
GPU clean up
2018-05-14 19:40:33 -04:00
paboyle
db988301d0
Introduce view objects for indexing lattices. Used to pass the view to acccelerators
2018-03-04 15:55:16 +00:00
paboyle
eed9aa9f0c
Extract merge gpu ready
2018-02-24 22:23:01 +00:00
paboyle
e657f9a344
OMP collapse changes to make NVCC happy
2018-01-28 01:21:53 +00:00
paboyle
70e276e1ab
parallel_for elimination -> thread_loop
2018-01-28 01:01:14 +00:00
paboyle
c4f82e072b
_grid becomes private ; use Grid()§
2018-01-27 00:04:12 +00:00
paboyle
32523a229c
Hide internals
2018-01-26 23:08:02 +00:00
paboyle
5609624b44
Threading constructs replaced
2018-01-24 13:32:24 +00:00
paboyle
6bf5fb1924
Clean up and format NAMESPACE
2018-01-13 00:08:25 +00:00
paboyle
54e94360ad
Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit
2017-06-24 23:10:24 +01:00
paboyle
180c732b4c
Move compressors out of Cshift.
...
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle
4a340aa5ca
Massive compressor rework to support reduced precision comms
2017-04-20 09:28:27 +01:00
paboyle
4e7ab3166f
Refactoring header layout
2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
41009cc142
Move excange into the stencil only; keep Cshift fully general
2017-02-20 17:48:04 -05:00
paboyle
8a29c16bde
Faster gather exchange
2017-02-16 23:52:22 +00:00
paboyle
bd600702cf
Vectorise the XYZT face gathering better.
...
Hard coded for simd_layout <= 2 in any given spread out direction; full generality is inconsistent
with efficiency.
2017-02-15 11:11:04 +00:00
paboyle
85c7bc4321
Bug fixes for cases that physics code couldn't hit but latent
...
and discovered on KNL (long vector, y SIMD dir) and checker dir set to y.
Remove the assertions on these code paths now they are tested.
2017-02-07 01:01:15 -05:00
paboyle
4f8e636a43
commVector
2016-10-20 16:59:16 +01:00
paboyle
9b39f35ae6
commVector different for SHMEM compat
2016-10-20 16:58:53 +01:00
paboyle
7240d73184
Parallelise the x faces; fix the segv on KNL with comms
2016-10-11 22:21:07 +01:00
paboyle
7223753355
Rotate in a direction > 2 for simd_layout
2016-04-19 15:35:15 -07:00
paboyle
db5e8050a8
Attempts at some optimisation
2016-02-18 22:33:58 +00:00
Peter Boyle
c9fadf97a5
Simplify the compressor interface again.
2016-02-17 18:16:45 -06:00
Peter Boyle
c650bb3f3d
Very small merge speed up.
2016-02-16 18:41:53 -06:00
Peter Boyle
41c2b09184
Shmem comms [NO MPI] target added. The dwf test runs and passes.
...
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current while I try a few experimentals.
2016-02-14 14:24:38 -06:00
paboyle
d19321dfde
Overlap comms compute changes
2016-01-10 19:20:16 +00:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
paboyle
145a295231
Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not
...
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
Peter Boyle
473fa28a6c
Partial optimisation; comms in x-dir for red black dslash will be slow as the checker skipping block strided
...
loops are non threadable. Will need to write a kernel for these instead and drive them with a lookup table
to make a look sufficiently simple to thread.
2015-11-06 05:23:23 -06:00
Peter Boyle
12c5ec813c
Useful debug messages (commented out) are included for preservation in case I need to revisit this
2015-11-04 09:59:27 +00:00
Peter Boyle
1271508ca2
Bug fix for spread out in x (EO) direction.
...
This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
2015-11-04 09:57:57 +00:00
Peter Boyle
0a9ebac514
Gparity modifications in the Gparity compressor variant.
2015-08-11 06:22:20 +01:00
Peter Boyle
1d0df449e8
Reorganise of file naming
2015-06-03 12:47:05 +01:00
Azusa Yamaguchi
b00a40dd65
Const safety
2015-06-01 12:25:59 +01:00
Azusa Yamaguchi
12c2562b96
No compile fix on mpi target
2015-05-31 22:50:03 +01:00
Peter Boyle
5644ab1e19
Large scale change to support 5d fermion formulations.
...
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle
67fa5691e5
Weak scale the benchmarks automatically.
2015-05-28 13:47:01 +01:00
neo
da46b56e85
Adding support for doxygen generation
2015-05-27 10:34:56 +09:00
neo
1a24801246
checked performance of new vector libaries.
...
Added check for c++11 support on the configure.ac
2015-05-26 12:02:54 +09:00
neo
9e29ac6549
Completed implementation of new Grid_simd classes
...
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
Peter Boyle
b00622302b
gcc doesn't like collapse(2) for some reason I can't figure
2015-05-15 11:36:22 +01:00
Peter Boyle
48f425d31c
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle
6103c29ee3
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle
5555a852be
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
Peter Boyle
25d523c0f4
Shaken out stencil to the point where I think wilson dslash is correct.
...
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle
f159495a9d
Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required
2015-04-27 13:45:07 +01:00
Peter Boyle
b32c14b433
Got the NERSC IO working and fixed a bug in cshift.
2015-04-22 22:46:48 +01:00
Peter Boyle
e5a25dfcb1
Build reorg with which I am a bit happier
2015-04-18 21:22:50 +01:00