mirror of https://github.com/paboyle/Grid.git synced 2025-10-25 10:09:34 +01:00
Commit Graph

49 Commits

Author SHA1 Message Date
Peter Boyle
c5f93abcd7 GPU clean up 2018-05-14 19:40:33 -04:00
paboyle
db988301d0 Introduce view objects for indexing lattices. Used to pass the view to accelerators 2018-03-04 15:55:16 +00:00
paboyle
eed9aa9f0c Extract merge gpu ready 2018-02-24 22:23:01 +00:00
paboyle
e657f9a344 OMP collapse changes to make NVCC happy 2018-01-28 01:21:53 +00:00
paboyle
70e276e1ab parallel_for elimination -> thread_loop 2018-01-28 01:01:14 +00:00
paboyle
c4f82e072b _grid becomes private; use Grid() 2018-01-27 00:04:12 +00:00
paboyle
32523a229c Hide internals 2018-01-26 23:08:02 +00:00
paboyle
5609624b44 Threading constructs replaced 2018-01-24 13:32:24 +00:00
paboyle
6bf5fb1924 Clean up and format NAMESPACE 2018-01-13 00:08:25 +00:00
paboyle
54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
paboyle
180c732b4c Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle
4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
paboyle
4e7ab3166f Refactoring header layout 2017-02-22 18:09:33 +00:00
paboyle
3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle
41009cc142 Move exchange into the stencil only; keep Cshift fully general 2017-02-20 17:48:04 -05:00
paboyle
8a29c16bde Faster gather exchange 2017-02-16 23:52:22 +00:00
paboyle
bd600702cf Vectorise the XYZT face gathering better.
Hard coded for simd_layout <= 2 in any given spread out direction; full generality is inconsistent
with efficiency.
2017-02-15 11:11:04 +00:00
paboyle
85c7bc4321 Bug fixes for cases that physics code couldn't hit but latent
and discovered on KNL (long vector, y SIMD dir) and checker dir set to y.
Remove the assertions on these code paths now they are tested.
2017-02-07 01:01:15 -05:00
paboyle
4f8e636a43 commVector 2016-10-20 16:59:16 +01:00
paboyle
9b39f35ae6 commVector different for SHMEM compat 2016-10-20 16:58:53 +01:00
paboyle
7240d73184 Parallelise the x faces; fix the segv on KNL with comms 2016-10-11 22:21:07 +01:00
paboyle
7223753355 Rotate in a direction > 2 for simd_layout 2016-04-19 15:35:15 -07:00
paboyle
db5e8050a8 Attempts at some optimisation 2016-02-18 22:33:58 +00:00
Peter Boyle
c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle
c650bb3f3d Very small merge speed up. 2016-02-16 18:41:53 -06:00
Peter Boyle
41c2b09184 Shmem comms [NO MPI] target added. The dwf test runs and passes.
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current while I try a few experimentals.
2016-02-14 14:24:38 -06:00
paboyle
d19321dfde Overlap comms compute changes 2016-01-10 19:20:16 +00:00
paboyle
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle
145a295231 Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
Peter Boyle
473fa28a6c Partial optimisation; comms in x-dir for red black dslash will be slow as the checker skipping block strided
loops are non threadable. Will need to write a kernel for these instead and drive them with a lookup table
to make a loop sufficiently simple to thread.
2015-11-06 05:23:23 -06:00
Peter Boyle
12c5ec813c Useful debug messages (commented out) are included for preservation in case I need to revisit this 2015-11-04 09:59:27 +00:00
Peter Boyle
1271508ca2 Bug fix for spread out in x (EO) direction.
This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
2015-11-04 09:57:57 +00:00
Peter Boyle
0a9ebac514 Gparity modifications in the Gparity compressor variant. 2015-08-11 06:22:20 +01:00
Peter Boyle
1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00
Azusa Yamaguchi
b00a40dd65 Const safety 2015-06-01 12:25:59 +01:00
Azusa Yamaguchi
12c2562b96 No compile fix on mpi target 2015-05-31 22:50:03 +01:00
Peter Boyle
5644ab1e19 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle
67fa5691e5 Weak scale the benchmarks automatically. 2015-05-28 13:47:01 +01:00
neo
da46b56e85 Adding support for doxygen generation 2015-05-27 10:34:56 +09:00
neo
1a24801246 checked performance of new vector libraries.
Added check for c++11 support on the configure.ac
2015-05-26 12:02:54 +09:00
neo
9e29ac6549 Completed implementation of new Grid_simd classes
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
Peter Boyle
b00622302b gcc doesn't like collapse(2) for some reason I can't figure 2015-05-15 11:36:22 +01:00
Peter Boyle
48f425d31c I have made the Cshift work successfully with open mp threading in
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle
6103c29ee3 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle
5555a852be Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle
25d523c0f4 Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle
f159495a9d Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required 2015-04-27 13:45:07 +01:00
Peter Boyle
b32c14b433 Got the NERSC IO working and fixed a bug in cshift. 2015-04-22 22:46:48 +01:00
Peter Boyle
e5a25dfcb1 Build reorg with which I am a bit happier 2015-04-18 21:22:50 +01:00