5644ab1e19
Large scale change to support 5d fermion formulations.
...
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
6ef0096dc9
Strip out the dslash kernel implementation
2015-05-26 19:55:18 +01:00
5e72e4c0d9
Strip out the dslash kernel implementation
2015-05-26 19:55:18 +01:00
bfb1cd36e2
Strip out the dslash kernel implementation
2015-05-26 19:55:18 +01:00
20100d0a40
Hand unrolled version of dslash in a separate class.
...
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
on ivybridge core. Raises Clang form 14.5 to 17.5
2015-05-26 19:54:03 +01:00
a32ac287bb
Hand unrolled version of dslash in a separate class.
...
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
on ivybridge core. Raises Clang form 14.5 to 17.5
2015-05-26 19:54:03 +01:00
840754dd42
Hand unrolled version of dslash in a separate class.
...
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
on ivybridge core. Raises Clang form 14.5 to 17.5
2015-05-26 19:54:03 +01:00
201a110c51
Better EO support letting Schur solver work
2015-05-25 13:46:28 +01:00
1a9841a0f1
Better EO support letting Schur solver work
2015-05-25 13:46:28 +01:00
ea3240ad55
Better EO support letting Schur solver work
2015-05-25 13:46:28 +01:00
2d30e82dcb
Improving even odd sector; lot of work and through required cleaning this
2015-05-23 09:34:16 +01:00
65f2e6b269
Improving even odd sector; lot of work and through required cleaning this
2015-05-23 09:34:16 +01:00
64fcbd0387
Improving even odd sector; lot of work and through required cleaning this
2015-05-23 09:34:16 +01:00
2d8b5a8191
Optimisation...
2015-05-19 15:50:47 +01:00
46ab8edf30
Optimisation...
2015-05-19 15:50:47 +01:00
8220794c44
Optimisation...
2015-05-19 15:50:47 +01:00
a6e1ea216d
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
...
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
ffc00caea3
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
...
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
4dba8522a1
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
...
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
6d2accba7b
Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass.
2015-05-18 16:48:14 +09:00
cee363e28c
Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass.
2015-05-18 16:48:14 +09:00
b4cd37276b
Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass.
2015-05-18 16:48:14 +09:00
1887c77498
Getting closer to having a wilson solver... introducing a first and untested
...
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
d0e4673a3f
Getting closer to having a wilson solver... introducing a first and untested
...
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
11cb3e9a01
Getting closer to having a wilson solver... introducing a first and untested
...
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
e841395dfd
Updating preparing for solvers etc..
2015-05-16 23:35:08 +01:00
dc6b6bdc96
Updating preparing for solvers etc..
2015-05-16 23:35:08 +01:00
bf7ab0da7a
Updating preparing for solvers etc..
2015-05-16 23:35:08 +01:00
e3b61bdfce
Forces inlining upon icpc
2015-05-15 11:43:49 +01:00
0e7945fe54
Forces inlining upon icpc
2015-05-15 11:43:49 +01:00
a0d041b522
Forces inlining upon icpc
2015-05-15 11:43:49 +01:00
7f3ae64a31
OMP dslash working
2015-05-13 10:59:22 +01:00
0097b81778
OMP dslash working
2015-05-13 10:59:22 +01:00
e179828662
OMP dslash working
2015-05-13 10:59:22 +01:00
b4a570477c
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
541d52ab97
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
48f425d31c
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
65c91eae64
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
c6baa3e657
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
6103c29ee3
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
fa5779537c
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
b42453d1fd
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
379943abf5
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
242e447bc5
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
2203c6e597
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
5555a852be
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
352bccf6ca
Merge branch 'master' of https://github.com/paboyle/Grid
...
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
4da2c2ea00
Merge branch 'master' of https://github.com/paboyle/Grid
...
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
48b9692845
Merge branch 'master' of https://github.com/paboyle/Grid
...
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
133493dc79
Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
...
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00