mirror of https://github.com/paboyle/Grid.git synced 2026-05-02 08:24:12 +01:00
Commit Graph

35 Commits

Author SHA1 Message Date
Peter Boyle a75b6f6e78 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle a32ac287bb Hand unrolled version of dslash in a separate class.
Useful to compare; raises Intel compiler from 9 GFlop/s to 17.5 GFlop/s
on an Ivy Bridge core. Raises Clang from 14.5 to 17.5
2015-05-26 19:54:03 +01:00
Peter Boyle 3358a77c7a Better checkerboard tracking. 2015-05-25 13:45:08 +01:00
Peter Boyle a2928321b6 Better pragma use 2015-05-23 09:32:37 +01:00
Peter Boyle d8061afe24 Streaming store option ifdef 2015-05-21 06:47:05 +01:00
Peter Boyle 874b2eb32d Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
and other compilers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle ffc00caea3 Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly independent of decomposition
on 1, 2, 4, 8, 16 MPI tasks.
Found a couple of SIMD bugs which required fixing, and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe, Mooee Schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
Peter Boyle 39e7ef1243 Typoo xifed 2015-05-16 05:49:32 +01:00
Peter Boyle 9c38a52bad Update Grid_lattice_trace.h 2015-05-16 04:40:28 +01:00
Peter Boyle 9f0e990b40 Optimisation and syntax pretty 2015-05-16 04:36:22 +01:00
Peter Boyle 49f56a25d1 strong inline 2015-05-16 04:33:10 +01:00
Peter Boyle 8d77d758c3 Parallel for replace 2015-05-15 11:48:04 +01:00
Peter Boyle 5b46992a15 Formatting change 2015-05-15 11:38:54 +01:00
Peter Boyle e7d25647e6 Filed bug report Bug 66153 on GCC-5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle c28551f40f Silly formatting change 2015-05-15 11:37:07 +01:00
Peter Boyle add4495a4a cout IO for all types 2015-05-13 09:24:10 +01:00
Peter Boyle 556befaaaa Enhanced SIMD interfacing 2015-05-12 20:41:44 +01:00
Peter Boyle c6baa3e657 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle 6e6843ac69 Moving some things around for pretty 2015-05-11 19:09:49 +01:00
Peter Boyle 2203c6e597 Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle 1ec1b4ee44 Expression template hack 2015-05-10 15:35:30 +01:00
Peter Boyle 1ab92563b9 Expression template engine 2015-05-10 15:34:20 +01:00
Peter Boyle 961fbb2718 Assertion should never hit, but did due to a bug 2015-05-10 15:24:37 +01:00
Peter Boyle 4a8fd55f52 Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle cb4b82b09f streaming store cases 2015-05-05 18:14:09 +01:00
Peter Boyle 9d93d1e6d4 Comms and memory benchmarks added 2015-05-03 09:44:47 +01:00
Peter Boyle b0485894b3 Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle 35cfef2129 Big updates with progress towards wilson matrix 2015-04-26 15:51:09 +01:00
Peter Boyle fc32450360 Improved the gamma quite a bit.
Serial RNGs which are set on node zero and broadcast
2015-04-24 20:21:40 +01:00
Peter Boyle 74432432b6 Moved code from summation into transfer and reduction 2015-04-24 18:40:44 +01:00
Peter Boyle 62e8d2d127 Slice summation working. May move this into lattice/Grid_lattice_reduction however 2015-04-23 15:13:00 +01:00
Peter Boyle 1851327d19 Got the NERSC IO working and fixed a bug in cshift. 2015-04-22 22:46:48 +01:00
Peter Boyle a5b0c492d7 Rework of RNG to use C++11 random. Should work correctly maintaining parallel RNG across
a machine. If a "fixedSeed" is used, randoms should be reproducible across different machine
decomposition since the generators are physically indexed and assigned in lexico ordering.
2015-04-19 14:55:58 +01:00
Peter Boyle f64d39ab57 Split all OMP directives into lattice subdir for easy maintenance of
parallelism and future OMP 4.0 offload.
2015-04-18 22:17:01 +01:00
Peter Boyle aee6669d0b Build reorg with which I am a bit happier 2015-04-18 21:22:50 +01:00