Peter Boyle
840754dd42
Hand unrolled version of dslash in a separate class.
...
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
on ivybridge core. Raises Clang form 14.5 to 17.5
2015-05-26 19:54:03 +01:00
Peter Boyle
ea3240ad55
Better EO support letting Schur solver work
2015-05-25 13:46:28 +01:00
Peter Boyle
64fcbd0387
Improving even odd sector; lot of work and through required cleaning this
2015-05-23 09:34:16 +01:00
Peter Boyle
8220794c44
Optimisation...
2015-05-19 15:50:47 +01:00
Peter Boyle
4dba8522a1
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
...
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
Peter Boyle
11cb3e9a01
Getting closer to having a wilson solver... introducing a first and untested
...
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
Peter Boyle
bf7ab0da7a
Updating preparing for solvers etc..
2015-05-16 23:35:08 +01:00
Peter Boyle
e179828662
OMP dslash working
2015-05-13 10:59:22 +01:00
Peter Boyle
48f425d31c
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle
6103c29ee3
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
paboyle
379943abf5
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
Peter Boyle
5555a852be
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
Peter Boyle
48b9692845
Merge branch 'master' of https://github.com/paboyle/Grid
...
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle
02ae26d091
Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
...
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle
55ccb8ccf4
Wilson perf improvements with Gauge prefetching
2015-05-06 06:37:21 +01:00
Peter Boyle
35d949cc17
Cleaned up for Linux
2015-05-05 22:09:22 +01:00
Peter Boyle
a98c01c86a
Integrated Lebesgue code and been playing with alternate implementations of the wilson dop without
...
any particular success in increasing the performance.
2015-04-30 16:39:06 +01:00
Peter Boyle
c72db6c6f6
Fixed the stencil sector and Wilson now agrees between stencil based implementation
...
and the cshift based implementation. Managed to reduce the volume of code in this
sector a little, but consolidation would be good, perhaps taking common
logic out into simple helper functions
2015-04-29 06:23:56 +01:00
Peter Boyle
25d523c0f4
Shaken out stencil to the point where I think wilson dslash is correct.
...
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle
f159495a9d
Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required
2015-04-27 13:45:07 +01:00
Peter Boyle
94f728bee4
Big updates with progress towards wilson matrix
2015-04-26 15:51:09 +01:00
Peter Boyle
51f0da7b93
Starting the implementation of wilson; incomplete and committing non-functional code which
...
is not yet included from elsewhere or linked to the build system.
2015-04-25 14:33:02 +01:00