paboyle
0e6197fbed
Introduce accelerator friendly expression template rewrite.
...
Must obtain and access lattice indexing through a view object that is safe
to copy construct in copy to GPU (without copying the lattice).
2018-03-04 16:03:19 +00:00
paboyle
c1fc947bb8
Coordinate handling GPU friendly + some GPU merge/extract improvements
2018-02-24 22:26:10 +00:00
paboyle
e657f9a344
OMP collapse changes to make NVCC happy
2018-01-28 01:21:53 +00:00
paboyle
70e276e1ab
parallel_for elimination -> thread_loop
2018-01-28 01:01:14 +00:00
paboyle
f574c20118
Zero changes, __VA_ARGS__ and swap
2018-01-27 23:50:17 +00:00
paboyle
c4f82e072b
_grid becomes private ; use Grid()§
2018-01-27 00:04:12 +00:00
paboyle
43cea62855
Hide internal data
2018-01-26 23:06:03 +00:00
paboyle
733f8ff0b2
Still using parallel_for -- don't know how to implement reduction on GPU yet. Look at some sample code is best.
2018-01-24 13:38:13 +00:00
paboyle
08f2a4564f
Namespace, formatting
2018-01-14 23:56:33 +00:00
paboyle
4f8b6f26b4
Merge branch 'develop' into feature/dwf-multirhs
2017-10-02 11:41:49 +01:00
Azusa Yamaguchi
d9cd4f0273
Staggered multinode block cg debugged. Missing global sum.
...
Code stalls and resumes on KNL at cambridge. Curious.
CG iterations 23ms each, then 3200 ms pauses. Mean bandwidth reports
as 200MB/s. Comms dominant in the report. However, the time behaviour suggests it
is *bursty*.... Could be swap to disk?
2017-08-23 15:07:18 +01:00
azusayamaguchi
659d7d1a40
For test/solver
...
Fixed
2017-07-12 15:01:48 +01:00
paboyle
349d75e483
Precision fix
2017-06-23 02:57:59 -07:00
Azusa Yamaguchi
e9cc21900f
Block solver complete for staggered. Now stable on mass 0.003 and
...
gives 8x (!) speed up on Haswell laptop vs. standard CG for 8 RHS solves.
166 iterations vs. 537 iterations so algorithmic gain + 2x in flop rate gain.
Better than a slap in the face with a wet kipper.
2017-06-20 12:37:41 +01:00
Azusa Yamaguchi
cfe3cd76d1
Block solver improvements
2017-06-19 14:04:21 +01:00
Azusa Yamaguchi
f46a67ffb3
No compile issue on clang on mac fixed.
...
Compiler version was clang++-3.9 under mpicxx
2017-05-17 10:51:01 +01:00
paboyle
2439999ec8
Warning elimination; drop to -O2 on G++ bad versions
2017-05-06 14:44:49 +01:00
Guido Cossu
3344788fa1
Merge branch 'develop' into feature/hmc_generalise
2017-05-01 12:13:56 +01:00
paboyle
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
...
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle
3141ebac10
MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.
2017-04-17 10:50:19 +01:00
paboyle
7ede696126
Non compile of tests fixed
2017-04-16 23:40:00 +01:00
paboyle
bf516c3b81
higher precision reduction variables in norm and inner product
2017-04-15 12:27:28 +01:00
paboyle
441a52ee5d
First cut at higher precision reduction
2017-04-15 10:57:21 +01:00
Guido Cossu
8c540333d5
Merge branch 'develop' into feature/hmc_generalise
2017-04-05 14:41:04 +01:00
paboyle
3ae92fa2e6
Global changes to parallel_for structure.
...
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
Guido Cossu
26b9740d53
Some fix for the GenericHMCrunner
2016-10-10 09:43:05 +01:00
Guido Cossu
e3d5319470
Debugged the real() and imag() functions and added tests to Test_Simd
2016-07-06 14:16:03 +01:00
c4c89336fe
SliceSum: shutting down warning about non-threaded code for now
2016-05-01 18:29:57 -07:00
Peter Boyle
7f927a541c
Shmem related fixes for shmem compile
2016-02-11 07:37:39 -06:00
paboyle
aae8bf31a7
Global edit adding copyright and license info to every source file.
2016-01-02 14:51:32 +00:00
paboyle
09bfe52840
Remove extraneous variable
2015-12-21 15:30:28 +00:00
Peter Boyle
d1afebf71e
Sizable improvement in multigrid for unsquared.
...
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01
Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
Azusa Yamaguchi
611f7ec38c
trying to find a way to remove functions from the ET system using explicit
...
expression closure statements. Not sure if this works yet
2015-06-14 01:03:28 +01:00
Peter Boyle
1d0df449e8
Reorganise of file naming
2015-06-03 12:47:05 +01:00