Peter Boyle
bfb1cd36e2
Strip out the dslash kernel implementation
2015-05-26 19:55:18 +01:00
Peter Boyle
840754dd42
Hand unrolled version of dslash in a separate class.
...
Useful to compare; raises the Intel compiler from 9 GFlop/s to 17.5 GFlop/s
on an Ivy Bridge core. Raises Clang from 14.5 to 17.5.
2015-05-26 19:54:03 +01:00
neo
500f6ed0c5
More cleanup of Grid_simd.h
2015-05-26 13:54:34 +09:00
neo
4dbaa389c8
Cleaning up simd files
2015-05-26 13:31:10 +09:00
neo
48cc816136
Merge remote-tracking branch 'upstream/master'
...
Conflicts:
lib/math/Grid_math_tensors.h
lib/simd/Grid_vector_types.h
2015-05-26 13:14:06 +09:00
neo
1a24801246
Checked performance of new vector libraries.
...
Added check for C++11 support in configure.ac
2015-05-26 12:02:54 +09:00
Peter Boyle
489b1b9633
Schur complement based red-black inversion working
2015-05-25 13:47:12 +01:00
Peter Boyle
ea3240ad55
Better EO support letting Schur solver work
2015-05-25 13:46:28 +01:00
Peter Boyle
956e728b40
Mostly cosmetic
2015-05-25 13:45:32 +01:00
Peter Boyle
94d679c4e6
Better checkerboard tracking.
2015-05-25 13:45:08 +01:00
Peter Boyle
616f871735
move constants into red black
2015-05-25 13:44:35 +01:00
Peter Boyle
624c0ac3ef
Updates now schur red black solver working
2015-05-25 13:43:58 +01:00
Peter Boyle
ac99832d21
Herm op
2015-05-25 13:42:36 +01:00
Peter Boyle
d30c013721
red black fix
2015-05-25 13:42:12 +01:00
Peter Boyle
5cf285bce9
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-23 09:36:08 +01:00
Peter Boyle
64fcbd0387
Improving even-odd sector; a lot of work, and thorough cleaning was required
2015-05-23 09:34:16 +01:00
Peter Boyle
bef9bf0d38
Rely on default constructors
2015-05-23 09:33:42 +01:00
Peter Boyle
eadfb5be67
Better pragma use
2015-05-23 09:32:37 +01:00
Peter Boyle
33737ef57a
Cosmetic
2015-05-23 09:31:15 +01:00
Peter Boyle
32c3f16f95
Iterator required
2015-05-23 09:30:28 +01:00
neo
9e29ac6549
Completed implementation of new Grid_simd classes
...
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
Peter Boyle
9601890549
Streaming store option ifdef
2015-05-21 06:47:05 +01:00
Peter Boyle
1559dd4adc
Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
...
and other compilers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle
d0d41b8bce
Didn't like a print statement
2015-05-21 06:36:15 +01:00
Peter Boyle
34960ca50c
Unroll pragma abstraction
2015-05-21 06:34:33 +01:00
neo
d03c4e5901
Merge remote-tracking branch 'upstream/master'
...
Conflicts:
lib/simd/Grid_vector_types.h
tests/Makefile.am
2015-05-20 17:32:46 +09:00
neo
cf7be0e461
Implemented all SSE4 functions.
...
A test code Grid_simd_new.cc has been created to test the new class.
Tests are all OK.
2015-05-20 17:22:40 +09:00
Peter Boyle
221902a882
Merging in
...
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-19 21:30:13 +01:00
Peter Boyle
a21036e69a
Reworking to keep intel compiler happy
2015-05-19 21:29:07 +01:00
Peter Boyle
8220794c44
Optimisation...
2015-05-19 15:50:47 +01:00
Peter Boyle
fde7f8d6b9
Merged
...
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
lib/simd/Grid_vector_types.h
2015-05-19 15:05:07 +01:00
azusayamaguchi
2d2da8364f
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-19 14:55:26 +01:00
azusayamaguchi
91f29d4a68
Add messages to get the number of threads for openmp
2015-05-19 14:54:42 +01:00
Peter Boyle
4dba8522a1
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
...
not even SU(3) for now) gauge field. Convergence history is correctly independent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixing, and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe, Mooee Schur-type stuff in the Wilson dop.
2015-05-19 13:57:35 +01:00
neo
74e91cd925
Partial implementation of the vector types SIMD
...
Implementing SSE4 now
A systematic series of tests must be written.
2015-05-19 17:21:17 +09:00
neo
baa382f055
Added check of mpfr and gmp at configure time
...
It generates automatically the linker flags or complains if not found.
2015-05-19 13:54:55 +09:00
neo
7ad705066d
Merging with upstream
2015-05-19 13:36:03 +09:00
Peter Boyle
05f1419df4
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
...
Conflicts:
lib/algorithms/approx/bigfloat.h
2015-05-18 16:34:21 +01:00
Peter Boyle
17835c6f42
Remez tested
2015-05-18 12:09:25 +01:00
neo
99aecf1f2e
Minor modification to the configure.ac
...
Enables silent rules (use make V=1 to override)
Prints a summary after configure is completed
2015-05-18 17:15:14 +09:00
neo
b4cd37276b
Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass.
2015-05-18 16:48:14 +09:00
Peter Boyle
11cb3e9a01
Getting closer to having a wilson solver... introducing a first and untested
...
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
Peter Boyle
7992346190
Working towards solvers
2015-05-17 00:19:03 +01:00
Peter Boyle
bf7ab0da7a
Updating preparing for solvers etc..
2015-05-16 23:35:08 +01:00
Peter Boyle
e9ed288b00
Typo fixed
2015-05-16 05:49:32 +01:00
Peter Boyle
dda3da45fb
Update Grid_lattice_trace.h
2015-05-16 04:40:28 +01:00
Peter Boyle
2e4ba02443
Pretty syntax
2015-05-16 04:37:26 +01:00
Peter Boyle
a19aa9627d
Optimisation and syntax pretty
2015-05-16 04:36:22 +01:00
Peter Boyle
9e29fb2c6a
strong inline
2015-05-16 04:33:10 +01:00
Peter Boyle
9386522543
Compile options tweak
2015-05-15 12:33:18 +01:00
Peter Boyle
331f832c34
Out of source compile now working
2015-05-15 12:21:40 +01:00
Peter Boyle
0b4d3544b9
clang++ 3.4/5/7 compile happy for AVX and SSE
...
icpc compiles happy on MacOSX both with -xCOMMON-AVX512 and native AVX
gcc-5 does not compile happy; can work around by renaming lattice peek/poke/transpose/trace templates
relative to tensor ones, but gcc goes into a recursive template instantiation due to
matching error. I think this is a gcc bug and have filed a report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:52:11 +01:00
Peter Boyle
882fa27ff5
GCC and ICPC complained on more careful typing
2015-05-15 11:50:44 +01:00
Peter Boyle
3346b68ccd
Move platform dependent out to Grid_simd.h
2015-05-15 11:50:00 +01:00
Peter Boyle
0afb64bf24
ngo store
2015-05-15 11:49:39 +01:00
Peter Boyle
537f47404b
Parallel for replace
2015-05-15 11:48:04 +01:00
Peter Boyle
a0d041b522
Forces inlining upon icpc
2015-05-15 11:43:49 +01:00
Peter Boyle
8c57bcaece
Force inlining upon icpc
2015-05-15 11:43:20 +01:00
Peter Boyle
519eab8ff0
More elegant enable_if
2015-05-15 11:42:51 +01:00
Peter Boyle
f986e123d2
More elegant to do boolean logic inside the enable_if construct
...
Should have done that from the beginning and should move this into
a global edit
2015-05-15 11:42:03 +01:00
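As an illustration of what "boolean logic inside the enable_if construct" can look like, here is a small sketch. It is an assumption about the general technique, not the code from this commit; the trait `is_complex` and the function `scalarKind` are hypothetical names.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical trait: true only for std::complex<T>.
template <typename T> struct is_complex : std::false_type {};
template <typename T> struct is_complex<std::complex<T> > : std::true_type {};

// A single enable_if guarded by a boolean expression, instead of
// stacking one enable_if per condition.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value &&
                                      !std::is_same<T, bool>::value,
                                  int>::type = 0>
int scalarKind(T) { return 1; }  // real (integer or floating-point) scalar

template <typename T,
          typename std::enable_if<is_complex<T>::value, int>::type = 0>
int scalarKind(T) { return 2; }  // complex scalar
```

Combining the conditions with `&&` and `!` keeps each overload's constraint in one place, which is presumably the elegance the message refers to.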
Peter Boyle
70638bf1f1
Force inlining on ICPC because inline apparently is not enough
2015-05-15 11:41:31 +01:00
Peter Boyle
54d8972753
strong_inline forces ICPC to do it.
2015-05-15 11:40:59 +01:00
Peter Boyle
5159b26261
Force strong_inline to force icpc's hand
2015-05-15 11:40:31 +01:00
Peter Boyle
c33ec96fc8
Switch to strong_inline macro to force icpc's hand
2015-05-15 11:40:00 +01:00
Peter Boyle
577325cb7a
Promote to strong inline to force ICPC's hand. Annoying.
2015-05-15 11:39:25 +01:00
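The run of commits above introduces a `strong_inline` macro. The actual definition is not shown in this log; the sketch below is one plausible form, assuming a GCC-compatible `always_inline` attribute is available (icpc on Linux and macOS accepts it). The function `axpy` is a hypothetical example of use.

```cpp
// Assumed definition, for illustration only: request forced inlining where
// the compiler supports the GCC-style attribute, else fall back to inline.
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define strong_inline __attribute__((always_inline)) inline
#else
#define strong_inline inline
#endif

// Hypothetical small kernel that should always be inlined into its caller.
strong_inline double axpy(double a, double x, double y) { return a * x + y; }
```

Plain `inline` is only a hint, so a macro like this is a common way to make the request binding for small hot-path functions.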
Peter Boyle
46c4379592
Formatting change
2015-05-15 11:38:54 +01:00
Peter Boyle
f761ab0f50
Filed bug report Bug 66153 on GCC-5.
...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle
2a28cfb3a3
Silly formatting change
2015-05-15 11:37:07 +01:00
Peter Boyle
b00622302b
gcc doesn't like collapse(2) for some reason I can't figure
2015-05-15 11:36:22 +01:00
Peter Boyle
3057b2762a
ICPC and GCC5 fixes
2015-05-15 11:35:02 +01:00
Peter Boyle
151a6f4e14
Using boolean logic inside enable_if is more elegant
2015-05-15 11:32:45 +01:00
Peter Boyle
a36c974f26
Key off mm_malloc.h
2015-05-15 11:32:11 +01:00
Peter Boyle
c0977dcfaa
strong inline required to force icpc
2015-05-15 11:31:41 +01:00
Peter Boyle
f1255197c2
Linear op added
2015-05-13 11:25:34 +01:00
Peter Boyle
e179828662
OMP dslash working
2015-05-13 10:59:22 +01:00
Peter Boyle
a108d5d3b0
cout IO for all types
2015-05-13 09:24:10 +01:00
Peter Boyle
48f425d31c
I have made the Cshift work successfully with open mp threading in
...
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle
6cec662ac5
Enhanced SIMD interfacing
2015-05-12 20:41:44 +01:00
Peter Boyle
6103c29ee3
Threading support rework.
...
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
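A deterministic thread reduction can be sketched as follows. This is not the BFM or Grid implementation; it only illustrates the idea under the assumption that each worker sums a fixed slice and the partial sums are then combined serially in a fixed order. The name `deterministicSum` and the `nchunks` parameter are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sum `data` in `nchunks` fixed slices. Slice boundaries depend only on the
// data size and nchunks, and the partial sums are added in index order, so
// the rounding is the same however the slices are scheduled onto threads.
double deterministicSum(const std::vector<double>& data, int nchunks) {
  std::vector<double> partial(nchunks, 0.0);
  const std::size_t n = data.size();
  // The pragma is ignored when OpenMP is not enabled; the result is unchanged.
#pragma omp parallel for
  for (int t = 0; t < nchunks; ++t) {
    const std::size_t lo = n * t / nchunks;
    const std::size_t hi = n * (t + 1) / nchunks;
    double s = 0.0;
    for (std::size_t i = lo; i < hi; ++i) s += data[i];
    partial[t] = s;
  }
  // Serial combine in a fixed order.
  double total = 0.0;
  for (int t = 0; t < nchunks; ++t) total += partial[t];
  return total;
}
```

By contrast, an OpenMP `reduction(+:...)` clause leaves the combination order unspecified, so floating-point results may differ from run to run.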
Peter Boyle
b1d2c60d07
Moving some things around for pretty
2015-05-11 19:09:49 +01:00
Peter Boyle
22d384b07d
Adding a better controlled threading class, preparing to
...
force in deterministic reduction.
2015-05-11 18:59:03 +01:00
Peter Boyle
f5dcca7b1b
Got command line args working
2015-05-11 14:36:48 +01:00
paboyle
379943abf5
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
Peter Boyle
5555a852be
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
Peter Boyle
48b9692845
Merge branch 'master' of https://github.com/paboyle/Grid
...
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle
b802abc83f
Expression template hack
2015-05-10 15:35:30 +01:00
Peter Boyle
14591c72d6
Expression template engine
2015-05-10 15:34:20 +01:00
Peter Boyle
02ae26d091
Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
...
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle
2ffd941d67
Assertion should never hit, but did due to a bug
2015-05-10 15:24:37 +01:00
Peter Boyle
ca554f661b
Moving operator stuff into separate file so that we can switch on/off replacement with
...
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle
29be76f958
Fixing breakage in the Comms non compile
2015-05-10 15:23:09 +01:00
Peter Boyle
e3acb36de6
Bringing expression templates for faster vector loops
2015-05-10 15:22:31 +01:00
Peter Boyle
55ccb8ccf4
Wilson perf improvements with Gauge prefetching
2015-05-06 06:37:21 +01:00
Peter Boyle
35d949cc17
Cleaned up for Linux
2015-05-05 22:09:22 +01:00
Peter Boyle
b9d16a7191
streaming store cases
2015-05-05 18:14:09 +01:00
Peter Boyle
07d57b6d55
Streaming store option
2015-05-05 18:13:06 +01:00
Peter Boyle
5ebc7a1756
Added streaming stores
2015-05-05 18:09:28 +01:00
Peter Boyle
aeda7b923d
Back to vector for now; cost of init loop is clear in the a*x + y
...
loop in memory benchmark and must move to better container class.
2015-05-03 09:48:13 +01:00
Peter Boyle
193860dbc8
Comms and memory benchmarks added
2015-05-03 09:44:47 +01:00
Peter Boyle
f663be2a6c
Added a comms benchmark
2015-05-02 23:42:30 +01:00