mirror of https://github.com/paboyle/Grid.git synced 2025-04-05 03:35:55 +01:00

35 Commits

Author SHA1 Message Date
Peter Boyle
5644ab1e19 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle
840754dd42 Hand unrolled version of dslash in a separate class.
Useful to compare; raises Intel compiler from 9 GFlop/s to 17.5 GFlop/s
on an Ivy Bridge core. Raises Clang from 14.5 to 17.5.
2015-05-26 19:54:03 +01:00
Peter Boyle
94d679c4e6 Better checkerboard tracking. 2015-05-25 13:45:08 +01:00
Peter Boyle
eadfb5be67 Better pragma use 2015-05-23 09:32:37 +01:00
Peter Boyle
9601890549 Streaming store option ifdef 2015-05-21 06:47:05 +01:00
Peter Boyle
1559dd4adc Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
and other compilers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle
4dba8522a1 Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly independent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixing and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe, Mooee Schur-type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
Peter Boyle
e9ed288b00 Typoo xifed 2015-05-16 05:49:32 +01:00
Peter Boyle
dda3da45fb Update Grid_lattice_trace.h 2015-05-16 04:40:28 +01:00
Peter Boyle
a19aa9627d Optimisation and syntax pretty 2015-05-16 04:36:22 +01:00
Peter Boyle
9e29fb2c6a strong inline 2015-05-16 04:33:10 +01:00
Peter Boyle
537f47404b Parallel for replace 2015-05-15 11:48:04 +01:00
Peter Boyle
46c4379592 Formatting change 2015-05-15 11:38:54 +01:00
Peter Boyle
f761ab0f50 Filed bug report Bug 66153 on GCC-5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle
2a28cfb3a3 Silly formatting change 2015-05-15 11:37:07 +01:00
Peter Boyle
a108d5d3b0 cout IO for all types 2015-05-13 09:24:10 +01:00
Peter Boyle
6cec662ac5 Enhanced SIMD interfacing 2015-05-12 20:41:44 +01:00
Peter Boyle
6103c29ee3 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle
b1d2c60d07 Moving some things around for pretty 2015-05-11 19:09:49 +01:00
Peter Boyle
5555a852be Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle
b802abc83f Expression template hack 2015-05-10 15:35:30 +01:00
Peter Boyle
14591c72d6 Expression template engine 2015-05-10 15:34:20 +01:00
Peter Boyle
2ffd941d67 Assertion should never hit, but did due to a bug 2015-05-10 15:24:37 +01:00
Peter Boyle
ca554f661b Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle
b9d16a7191 streaming store cases 2015-05-05 18:14:09 +01:00
Peter Boyle
193860dbc8 Comms and memory benchmarks added 2015-05-03 09:44:47 +01:00
Peter Boyle
25d523c0f4 Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle
94f728bee4 Big updates with progress towards wilson matrix 2015-04-26 15:51:09 +01:00
Peter Boyle
9ec3529864 Improved the gamma quite a bit.
Serial RNGs which are set on node zero and broadcast
2015-04-24 20:21:40 +01:00
Peter Boyle
128ad0999f Moved code from summation into transfer and reduction 2015-04-24 18:40:44 +01:00
Peter Boyle
52a6ba9767 Slice summation working. May move this into lattice/Grid_lattice_reduction however 2015-04-23 15:13:00 +01:00
Peter Boyle
b32c14b433 Got the NERSC IO working and fixed a bug in cshift. 2015-04-22 22:46:48 +01:00
Peter Boyle
42f167ea37 Rework of RNG to use C++11 random. Should work correctly maintaining parallel RNG across
a machine. If a "fixedSeed" is used, randoms should be reproducible across different machine
decomposition since the generators are physically indexed and assigned in lexico ordering.
2015-04-19 14:55:58 +01:00
Peter Boyle
5483ed641e Split all OMP directives into lattice subdir for easy maintenance of
parallelism and future OMP 4.0 offload.
2015-04-18 22:17:01 +01:00
Peter Boyle
e5a25dfcb1 Build reorg with which I am a bit happier 2015-04-18 21:22:50 +01:00