paboyle
ac56965306
GPU changes and threading macros replaced
2018-01-24 13:28:30 +00:00
paboyle
fcf1ccf669
Namespace, indent, badly formatted
2018-01-15 00:17:58 +00:00
Azusa Yamaguchi
b83b2b1415
Stability improvement to BCG. Force m_rr hermitian beyond rounding.
2017-09-04 14:09:47 +01:00
Azusa Yamaguchi
d9cd4f0273
Staggered multinode block cg debugged. Missing global sum.
...
Code stalls and resumes on KNL at cambridge. Curious.
CG iterations 23ms each, then 3200 ms pauses. Mean bandwidth reports
as 200MB/s. Comms dominant in the report. However, the time behaviour suggests it
is *bursty*.... Could be swap to disk?
2017-08-23 15:07:18 +01:00
paboyle
7e35286860
Simplified lanczos, added Eigen diagonalisation.
...
Curious if we can deprecate dependencly on BLAS.
Will see when we get 48^3 running on our BG/Q port
2017-06-21 02:26:03 +01:00
Azusa Yamaguchi
e9cc21900f
Block solver complete for staggered. Now stable on mass 0.003 and
...
gives 8x (!) speed up on Haswell laptop vs. standard CG for 8 RHS solves.
166 iterations vs. 537 iterations so algorithmic gain + 2x in flop rate gain.
Better than a slap in the face with a wet kipper.
2017-06-20 12:37:41 +01:00
Azusa Yamaguchi
cfe3cd76d1
Block solver improvements
2017-06-19 14:04:21 +01:00
paboyle
8e161152e4
MultiRHS solver improvements with slice operations moved into lattice and sped up.
...
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle
3141ebac10
MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled.
2017-04-17 10:50:19 +01:00
paboyle
7ede696126
Non compile of tests fixed
2017-04-16 23:40:00 +01:00
paboyle
b12dc89d26
Commenting and clean up
2017-04-10 20:38:20 +09:00
paboyle
d80d802f9d
MultiRHS solver test
2017-04-10 00:12:12 +09:00
paboyle
3d99b09dba
Start of blockCG
2017-04-09 23:42:10 +09:00