mirror of https://github.com/paboyle/Grid.git synced 2026-05-12 13:14:31 +01:00

4057 Commits

Author SHA1 Message Date
Peter Boyle ac1d655de8 Const correctness patch 2018-11-19 10:38:36 +00:00
portelli 3023287fd9 Hadrons: 3-index RO access to Eigen disk vector 2018-10-16 14:44:14 +01:00
portelli b3d6805638 Merge branch 'feature/contractor' into develop 2018-10-16 11:29:37 +01:00
portelli 291bc2a1f0 IO benchmark on a list of directories 2018-10-15 17:25:08 +01:00
portelli 2f368c33fc Hadrons: copyright update 2018-10-15 15:51:45 +01:00
portelli 9592115341 Hadrons: NPR and gauge fixing linking fix 2018-10-15 15:49:42 +01:00
Peter Boyle 24c07694bc Mixed precision now supported in MADWF 2018-10-14 00:22:52 +01:00
Peter Boyle f0229025e2 MADWF working across a range of actions 2018-10-13 19:55:03 +01:00
Peter Boyle 6de9a45a09 NPR first cut by Julia Kettle 2018-10-12 11:00:58 +01:00
Peter Boyle 03c3d495a2 First cut (non functional NPR code) developed by Julia Kettle 2018-10-12 10:59:33 +01:00
Peter Boyle 49f25e08e8 PauliVillars based 4D -> 5D reconstruction with Fourier Accelerated PV inverse
by Christoph. Differs from the one by Rudy in BFM since it vectorises the twisted
4D solves in pairs.
2018-10-11 12:35:32 +01:00
portelli efc0c65056 Hadrons: DiskVector Eigen specialisation with binary I/O and sha256 correctness check 2018-10-08 19:02:00 +01:00
portelli 936eaac8e1 function to get the sha256 string 2018-10-08 19:00:50 +01:00
portelli fe6a372f75 Hadrons: fixes and cleaning in the scalar SU(N) part 2018-10-08 15:14:08 +01:00
portelli 148fc052bd Hadrons: Aslash field, tested 2018-10-05 21:04:10 +01:00
portelli c073341a10 Hadrons: more cleaning 2018-10-05 19:50:41 +01:00
portelli 78299daaac Hadrons: code cleaning 2018-10-05 16:47:52 +01:00
portelli 866449c804 Hadrons: integration of Peter's A2Autils 2018-10-05 16:42:44 +01:00
portelli d69a52079f Merge remote-tracking branch 'gh/feature/a2a-integration' into feature/aslashfield 2018-10-05 15:39:09 +01:00
portelli 9f4f8a14a3 Hadrons: code cleaning 2018-10-05 15:38:01 +01:00
portelli f6593dc881 Hadrons: A2A block performance counter fix 2018-10-05 15:11:01 +01:00
Peter Boyle b46d31d4b6 MKL enable on Eigen if Grid is configured to use MKL 2018-10-05 11:29:40 +01:00
portelli 58567fc650 Hadrons: big update abstracting the block meson field routine, tested & working, performance counters broken and code dirty 2018-10-04 20:01:49 +01:00
Peter Boyle 7c57cac670 Adding A2A utils class for containing kernels. 2018-10-04 18:57:41 +01:00
portelli d0b21bf1ff Merge branch 'feature/eigenpack-convert' into develop 2018-10-04 18:26:45 +01:00
portelli a1825d1f59 Hadrons: final fix for multiprec eigenpacks 2018-10-04 18:25:26 +01:00
portelli 5a3e83ff7b Hadrons: new layer in eigenpacks class hierarchy 2018-10-03 14:45:01 +01:00
portelli 52569d98d8 Hadrons: multiprec eigenpack I/O fix 2018-10-03 14:24:43 +01:00
portelli b351103c29 Hadrons: eigenpack load module with 32bit I/O 2018-10-02 21:07:56 +01:00
portelli 118cca4681 Hadrons: linking fix 2018-10-02 20:08:49 +01:00
portelli 44de727cd2 Hadrons: eigenpack support for multiprecision I/O 2018-10-02 19:51:09 +01:00
portelli 888ebc3cf9 Hadrons: better name for the EP converter 2018-10-02 15:22:18 +01:00
portelli 6c031a1b81 Merge branch 'feature/eigenpack-convert' into develop 2018-10-02 14:57:30 +01:00
portelli 02aa4bd762 Hadrons: cleaner eigenpack convert log 2018-10-02 13:43:25 +01:00
portelli 9aafa8ee60 Hadrons: eigenpack converter generalised for RB/5d grids 2018-10-02 13:34:17 +01:00
portelli 430b98b354 fix previous commit 2018-10-02 13:12:46 +01:00
portelli 84189867ef Hadrons: eigenpack converter with RB grids (to be generalised) 2018-10-02 13:05:05 +01:00
portelli 4ab8cfbe2a Hadrons: more verbose eigenpack convert 2018-10-02 12:24:45 +01:00
portelli aadd9f4468 Eigenpack converter, to be tested, HadronsXmlRun moved to Utilities directory 2018-10-02 00:02:34 +01:00
portelli 8fbb27ce13 Hadrons: less code duplication in eigenpack IO 2018-10-01 20:15:21 +01:00
portelli 21bba95909 Hadrons: eigenpack metadata is no ignored anymore when reading 2018-10-01 19:33:45 +01:00
portelli 6448fe7121 More flexible XML control in Lime files 2018-10-01 19:32:50 +01:00
portelli 2458a11d1d Hadrons: precision cast module 2018-09-29 18:00:08 +01:00
portelli d0ca7c3fe6 Hadrons: big update for getGrid, grids are now created automatically 2018-09-29 17:55:19 +01:00
portelli 57f899d79c Merge branch 'develop' of github.com:paboyle/Grid into develop 2018-09-29 15:50:59 +01:00
portelli e881a0c157 Merge commit 'beed527ea37c90fd5e19b82d326eb8adc8eba5ff' into develop 2018-09-29 15:50:21 +01:00
portelli f411657118 JSON update 2018-09-29 15:48:05 +01:00
Peter Boyle 7458c6174b Use operator() for indexing internal indices 2018-09-27 06:42:02 +01:00
Peter Boyle 21b269d0f9 Move the Grid.pdf out of a deep directory 2018-09-27 06:36:25 +01:00
Peter Boyle 083af92ac2 Update from chulwoo ; high level link for Grid.pdf in documentation 2018-09-27 06:30:40 +01:00
Peter Boyle 2c162577b5 HMC documentation 2018-09-25 23:28:17 +01:00
Peter Boyle b1c4e96382 Updates to actions etc.. 2018-09-24 22:10:30 +01:00
Peter Boyle a55c6f34f3 Updated docs 2018-09-24 15:44:35 +01:00
Peter Boyle beed527ea3 Carletons chapter 2018-09-24 15:09:51 +01:00
portelli eaa633cf69 Merge branch 'develop' of github.com:paboyle/Grid into develop 2018-09-21 18:16:22 +01:00
portelli c632455129 Hadrons: meson field IO fix 2018-09-21 18:16:01 +01:00
portelli c012899ed5 Hadrons: big update after templating of get/createGrid 2018-09-21 18:15:33 +01:00
paboyle 8bab544c2f Updated manual pdf 2018-09-20 18:51:11 +01:00
paboyle 76fc06a5dc Updates with todo from Carleton 2018-09-20 18:50:11 +01:00
portelli 4af6c7e7aa Hadrons: copyright update 2018-09-14 12:51:48 +01:00
portelli f60fbcfc4d Hadrons: mixed precision CG, to be tested 2018-09-14 12:47:55 +01:00
portelli 464c81706e Hadrons: defaults Impls for different precisions 2018-09-14 12:46:43 +01:00
portelli 408130b808 Hadrons: header list fix 2018-09-10 17:38:54 +01:00
portelli 375edd1370 file forgotten in last commit 2018-09-10 17:37:29 +01:00
portelli 6d912f6c67 Hadrons: general guesser factory 2018-09-10 17:36:54 +01:00
portelli 6d1d28955e Guesser class is redundant, switching to LinearFunction 2018-09-10 17:35:54 +01:00
portelli 920b471761 Hadrons tests update 2018-09-10 15:32:13 +01:00
portelli 63c21767ba Hadrons: grids stored with hash of SIMD type (for mixed-precision setups) 2018-09-10 15:31:39 +01:00
portelli 7b6b712565 function to convert std::vector to string 2018-09-10 15:17:32 +01:00
portelli 35abd05ee9 mute Version.h cache creation 2018-09-10 15:16:59 +01:00
portelli dd36e60f6a compilation fix for hypercube optimal communicator 2018-09-08 18:07:29 +01:00
portelli cb6c548e21 Hadrons: code cleaning 2018-09-07 20:40:55 +01:00
portelli 02c4ccf621 Hadrons: diskvector debug message for writes 2018-09-07 20:33:49 +01:00
portelli fd24588212 Merge branch 'develop' of github.com:paboyle/Grid into develop 2018-09-07 20:25:11 +01:00
portelli b800bb3ecb Hadrons: disk vector cache policy to last touch 2018-09-07 20:24:48 +01:00
portelli f8abd0978b Hadrons copyright update 2018-09-07 20:10:07 +01:00
portelli 12c7c493bf Hadrons: disk-based container 2018-09-07 20:04:54 +01:00
paboyle c7c9072313 Documentation 2018-09-06 16:01:42 +01:00
portelli 2bf3be5fae Hadrons: copyright and code cleaning 2018-09-04 18:25:10 +01:00
portelli 3a40e4fc69 Hadrons: scalar SU(N) 2-pt guard against negative momenta components 2018-09-04 18:24:07 +01:00
portelli 2e69e03f6f Hadrons: CosmHol configs IO module 2018-09-04 18:23:28 +01:00
portelli a09f9bb528 Hadrons: code cleaning 2018-09-04 18:22:21 +01:00
portelli f0e341d726 Hadrons: module list generator fix 2018-09-04 18:22:04 +01:00
portelli 6f09df0daf Hadrons: A2A matrix IO fix 2018-09-02 01:46:22 +01:00
portelli 26cee605b8 Hadrons: copyright update 2018-09-01 21:30:30 +01:00
portelli b3fa18c229 copyright script never removes authorship 2018-09-01 21:29:58 +01:00
portelli 2940c9bcfd Hadrons: dedicated IO class for A2A matrices 2018-09-01 21:09:01 +01:00
portelli 0bb532f72b more explicit clean git tree message 2018-09-01 20:02:18 +01:00
portelli fada2aa0f7 Hadrons: precision fix 2018-09-01 20:00:12 +01:00
portelli c193e4e675 Aslash expression in Mathematica notebook 2018-09-01 19:59:58 +01:00
portelli 3ee682f676 more Version.h fine tuning 2018-09-01 19:58:16 +01:00
portelli d85ec3bac2 build system minor fix 2018-09-01 19:54:21 +01:00
portelli b52d8eb1e3 better Version.h implementation 2018-09-01 19:49:13 +01:00
portelli ee630d2e8b Hadrons: smearing plaquette output 2018-09-01 17:38:32 +01:00
portelli 2f0af79869 Hadrons: scalar SU(N) NPR update 2018-09-01 17:36:35 +01:00
portelli 1b7fb79ec0 CI fix 2018-08-28 18:26:37 +01:00
portelli 2db1a4628c build system minor fix 2018-08-28 18:26:30 +01:00
portelli 6aa047d842 Hadrons module template fix 2018-08-28 17:17:00 +01:00
portelli 8779c32ae1 Merge branch 'feature/hadrons' into develop 2018-08-28 17:10:33 +01:00
portelli c527dc3358 CI fix 2018-08-28 17:10:08 +01:00
portelli 6b42577b6b gitignore update 2018-08-28 16:58:37 +01:00
portelli fb3596f968 Hadrons: precision fixes 2018-08-28 16:58:23 +01:00
portelli f3a0158213 code cleaning 2018-08-28 16:56:07 +01:00
portelli 0250aa9347 file committed in error 2018-08-28 16:55:48 +01:00
portelli 3df6743396 more build system cleaning and patch for bad include in Eigen 2018-08-28 16:54:57 +01:00
portelli fb7d021b9d Hadrons: moving Hadrons to root directory, build system improvements 2018-08-28 15:00:40 +01:00
portelli 5f206df775 Hadrons: meson field cache friendly cache copy 2018-08-15 17:29:44 +01:00
portelli 7727e81113 Hadrons: slight improvement on previous commit 2018-08-14 20:18:47 +01:00
portelli c4115544a5 Hadrons: application option to save graph 2018-08-14 20:03:53 +01:00
portelli 08c47328ba Hadrons: meson field kernel performance for each block 2018-08-14 17:35:42 +01:00
portelli 09001aedca Hadrons: meson fields saved in single precision 2018-08-14 17:19:38 +01:00
portelli 2c67304716 Hadrons: meson field code cleaning 2018-08-14 17:00:05 +01:00
portelli dc6d8686de Hadrons: meson field chunked HDF5 IO 2018-08-14 16:40:29 +01:00
portelli cc2780bea3 Hadrons: meson field parallel IO 2018-08-14 14:55:13 +01:00
portelli 6e5a2b7922 fix previous commit 2018-08-14 14:07:54 +01:00
portelli f4878d3a13 Hadrons: meson field threaded cache copy 2018-08-14 14:02:37 +01:00
portelli 89d2fac92e Hadrons: copyright update 2018-08-14 12:19:14 +01:00
portelli f2d3e41cf2 Hadrons: meson field: HDF5 perf, gamma input and Eigen tensors allocated by Grid 2018-08-13 20:18:33 +01:00
portelli 3c27bb36d4 Hadrons: direct timer access 2018-08-13 20:17:45 +01:00
portelli 603d59f389 Hadrons: code cleaning 2018-08-13 20:17:24 +01:00
portelli 07a0ef3f95 Hadrons: global measurement time profile 2018-08-13 16:44:57 +01:00
portelli 503259f9c9 Hadrons: meson field HDF5 IO done and tested 2018-08-12 16:52:12 +01:00
portelli 5be6a51044 Hadrons: meson fields code cleaning and momentum phases 2018-08-11 15:13:43 +01:00
portelli ac69f042b1 Hadrons: module RNG uniquely seeded with <run id> + <module name> + <trajectory> 2018-08-10 18:27:00 +01:00
portelli 133d5c2e34 Merge branch 'develop' into feature/hadrons 2018-08-10 16:36:40 +01:00
portelli 2a94244890 configure: --with-openssl option and LIME is now mandatory 2018-08-10 16:36:11 +01:00
portelli a15a2dfd29 Merge branch 'develop' into feature/hadrons 2018-08-10 16:08:22 +01:00
portelli 093bb02633 Hadrons: execute message for time diluted noise 2018-08-10 16:07:48 +01:00
portelli 99a85116f8 Hadrons: module and VM instrumentation 2018-08-10 16:07:30 +01:00
paboyle 27cdb79063 Sha used to seed from a unique string 2018-08-10 15:11:01 +01:00
portelli f4cbfd63ff Hadrons: more meson field cleaning, needs IO now 2018-08-09 18:39:58 +01:00
portelli 2b794b6aa7 Hadrons: module generating random lattices for testing purposes 2018-08-09 17:16:42 +01:00
portelli d0244a059f Hadrons: cleaning cleaning... 2018-08-09 00:38:17 +01:00
portelli dcdd891d7d Hadrons: precision fix 2018-08-09 00:13:53 +01:00
portelli 6d2df9de79 Hadrons: even more cleaning 2018-08-08 23:15:55 +01:00
portelli 41d4e37bae Hadrons: more cleaning 2018-08-08 19:04:44 +01:00
portelli ee5c0cc9b6 Hadrons: code cleaning 2018-08-08 18:45:06 +01:00
portelli 0a4020eb4d Hadrons: copyright fix 2018-08-07 18:42:52 +01:00
portelli b2de26589b Hadrons: code cleaning and copyright update 2018-08-07 18:40:48 +01:00
portelli 0677adb4dd Hadrons: overhaul of A2A for production 2018-08-07 18:27:59 +01:00
portelli 231cc95be6 Hadrons: eigenvalues precision fix 2018-08-07 18:27:19 +01:00
portelli 639f9cab82 Hadrons: schedule loading fix 2018-08-07 18:26:49 +01:00
portelli 4eac4e575e Hadrons: meson fields indentation fix 2018-08-06 12:42:25 +01:00
portelli 3f0f92cda6 Hadrons: first cleaning/integration of A2A/meson fields 2018-08-06 12:11:52 +01:00
portelli d2650e89bd Hadrons: VM exception for object type (solves infinite loop in scheduler) 2018-08-06 12:11:00 +01:00
portelli 2962123cba Hadrons: diluted noise polish 2018-08-05 01:44:37 +01:00
portelli 830168ec37 Hadrons: first try at diluted noise class (tested) 2018-08-04 12:32:58 +01:00
portelli 584c921ca0 Eigen support fix (use of Grid as a library was broken) 2018-08-03 21:07:58 +01:00
portelli 81347b4d16 gitignore update 2018-08-03 19:58:52 +01:00
portelli 2cfa0b0e6b Merge pull request #174 from fionnoh/a2a_basics
A2A basics
2018-08-03 16:32:14 +01:00
fionnoh fa5dee76b1 Included Peter's A2AMeson field and Eigen changes 2018-08-03 15:15:54 +01:00
fionnoh 8d1679c6b8 Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into a2a_basics 2018-08-03 15:12:24 +01:00
Peter Boyle 3791a38f7c Optimised the MesonField a bit more 2018-08-01 08:27:27 +01:00
Peter Boyle 142f7b0c86 Updated the A2A Meson Field module 2018-07-31 15:58:02 +01:00
fionnoh 891ad66eab Included changes to Hadrons RBPrecCG solver needed for subtraction of guess 2018-07-31 11:26:07 +01:00
Peter Boyle 60c43151c5 Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a 2018-07-31 01:09:02 +01:00
paboyle e036800261 Eigen fix 2018-07-31 01:08:42 +01:00
Peter Boyle 62900def36 Merge branch 'feature/hadrons-a2a' of https://github.com/paboyle/Grid into feature/hadrons-a2a 2018-07-31 00:36:26 +01:00
paboyle e3a309a73f Eigen happiness 2018-07-31 00:35:17 +01:00
fionnoh ad6c1c0c4e The basics of what is needed in Grid and Hadrons for the A2A class and module, with none of the contraction or MF code. 2018-07-30 18:40:50 +01:00
Peter Boyle 00b92a91b5 Optimising 2018-07-28 23:46:22 +01:00
paboyle 65533741f7 7 moms 2018-07-28 16:17:47 +01:00
Peter Boyle dc0259fbda Merge pull request #173 from fionnoh/feature/hadrons-a2a
Changes to meson field benchmark. Now includes the gammas in the fina…
2018-07-27 23:03:56 +01:00
Peter Boyle 131a6785d4 Merge branch 'feature/hadrons-a2a' into feature/hadrons-a2a 2018-07-27 23:03:42 +01:00
paboyle 44f4f5c8e2 Momentum loop 2018-07-27 23:00:16 +01:00
fionnoh 2679df034f Changes to meson field benchmark. Now includes the gammas in the final part of the naive method, both methods compute
lhs^dag*Gamma*rhs (previously Gamma*lhs^dag*rhs), and checks results.
2018-07-27 18:31:10 +01:00
portelli bf71162b97 Hadrons: backtrace on abort 2018-07-26 19:20:12 +01:00
portelli 299e828d83 Merge branch 'develop' into feature/hadrons 2018-07-26 16:49:49 +01:00
portelli ef5452cddf Hadrons: smarter memory profiler 2018-07-26 16:47:45 +01:00
portelli 80de748737 Hadrons: new exceptions which can save a integer 2018-07-26 16:47:25 +01:00
paboyle 71e1006ba8 Updated meson field benchmark for dirac structures 2018-07-26 09:09:29 +01:00
portelli 00f31ae83f Merge pull request #163 from goracle/unstaged
Add printing of whether there are unstaged changes in the git hash print
2018-07-25 19:00:00 +00:00
portelli cce339deaf Merge pull request #172 from fionnoh/feature/hadrons
feature/hadrons -> feature/hadrons-a2a
2018-07-25 17:20:19 +00:00
fionnoh 24128ff109 Changes needed for MF benchmark to work with comms correctly 2018-07-23 15:51:37 +01:00
fionnoh 34e9d3f0ca Moved the creation and resizing of the v and w high modes from the A2A class to the A2A module and made them an output of the module. This means that they have to be inputs of the contration modules and they will freed from memory if they are no longer needed. 2018-07-22 14:40:31 +01:00
fionnoh c995788259 Added ImportUnphysicalFermion and included appropriate logic for 5d w vectors in A2A code 2018-07-21 00:08:11 +01:00
fionnoh 94c7198001 Added ZFIMPL to A2AMeson contraction 2018-07-20 23:08:22 +01:00
fionnoh 04d86fe9f3 Removed overly verbose print statement 2018-07-20 21:38:19 +01:00
fionnoh b78074b6a0 Removed a Dminus from high mode v and removed duplication pf D_oo code 2018-07-20 16:55:24 +01:00
fionnoh 7dfd3cdae8 Inclusion of ExportPhysicalFermionSource that fixes a bug in the low mode w vectors 2018-07-20 15:45:43 +01:00
fionnoh cecee1ef2c Merge branch 'develop' of github.com:paboyle/Grid into feature/hadrons 2018-07-20 13:37:50 +01:00
fionnoh 355d4b58be Merge branch 'feature/hadrons' of github.com:fionnoh/Grid into feature/hadrons 2018-07-19 16:07:54 +01:00
fionnoh 2c54a536f3 Moved the meson field inner product to its own header file 2018-07-19 15:56:52 +01:00
fionnoh d868a45120 Cleaned up some stuff that was erroneously included in a previous "trash" commit. Leaving in the mySliceInnerProdct function for now as it speeds up mesonfield creation quite a lot for 24^3 tests 2018-07-16 16:19:59 +01:00
fionnoh 9deae8c962 A2A meson field contraction code 2018-07-16 14:18:45 +01:00
fionnoh db86cdd7bd Possible trash commit 2018-07-10 13:30:45 +01:00
paboyle ec9939c1ba Test for faster implementation of meson field inner loop
This should be possible to cache block at outer levels, global sum across nodes not performed
and deferred to caller to block them all into a big all reduce.
Nc=3 and Fermion is hard coded in an ugly way. We might think about benchmarking whether
a product without the conjugate should be made available by Grid.

It is not clear whether the explicit unroll, or the performing of conjugate on left once
was the real source of the speed up.

Gives 70-80 GF/s on my laptop (single) half that double, and 70GB/s to cache.

This is competitive with dslash and a reasonable stopping point for the optimisation. If necessary we can revisit.
2018-07-10 12:38:51 +01:00
fionnoh f74617c124 Added ZFIMPL to meson field module 2018-07-03 14:04:53 +01:00
fionnoh 8c6a3921ed Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons 2018-07-03 11:35:14 +01:00
portelli a8a15dd9d0 Hadrons: code cleaning 2018-07-02 17:52:39 +01:00
portelli 3ce68a751a Hadrons: stout smearing module 2018-07-02 17:52:04 +01:00
fionnoh daa0977d01 Included a print statement that indicates that the guess is being subtracted from the solve. 2018-06-28 16:34:56 +01:00
fionnoh a2929f4384 Removed A2A contraction module and replaced it with the beginnings of a meson field module 2018-06-28 16:17:26 +01:00
fionnoh 7fe3974c0a Included eigenPacks and action as references, not inputs, of A2A module. They now now longer need to be parameters in the meson field modules. 2018-06-28 16:14:49 +01:00
fionnoh f7e86f81a0 Changes A2A class to make use of the new Solver class 2018-06-28 16:14:16 +01:00
fionnoh fecec803d9 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2018-06-28 16:13:43 +01:00
fionnoh 8fe9a13cdd Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2018-06-28 16:13:07 +01:00
portelli d2c42e6f42 Hadrons: scaled DWF action 2018-06-26 14:59:33 +01:00
portelli 049cc518f4 Hadrons: introduction message 2 2018-06-25 19:08:39 +01:00
portelli 2e1c66897f Hadrons: introduction message 2018-06-25 19:08:22 +01:00
portelli adcef36189 Hadrons: Möbius DWF action 2018-06-25 15:58:35 +01:00
fionnoh 2f121c41c9 Commiting reation of meson field code before a merge with the upstream branch feature/hadrons 2018-06-25 12:20:46 +01:00
portelli e0ed7e300f Hadrons: spurious Dminus removed 2018-06-22 16:33:43 +02:00
portelli 485207901b Merge branch 'develop' into feature/hadrons 2018-06-22 16:15:32 +02:00
portelli c760f0a4c3 Hadrons: remove make_5D/4D functions and FreeProp fix 2018-06-22 16:12:46 +02:00
portelli c84eeedec3 Hadrons: GaugeProp module for z-Wilson actions 2018-06-22 15:53:22 +02:00
fionnoh 1ac3526f33 Small changes to the A2A header and module 2018-06-22 12:29:42 +01:00
fionnoh 0de090ee74 Temporarily added in the contraction code that produced the working 2-pt function. This is commited for reference only and will be removed in the next push. 2018-06-22 12:28:41 +01:00
portelli 91405de3f7 Hadrons: new solver exposing fermion matrix and generic source/solve import/export 2018-06-22 12:14:37 +02:00
fionnoh 8fccda301a Fixed a bug where the guess was always subtracted after the solve and included appropriate weights for the sources in the one case we're looking at now. More work needs to be done to make the 5d/4d source logic less brittle. 2018-06-21 16:36:59 +01:00
fionnoh 7a0abfac89 Restructured the class that computes and returns the A2A vectors. 2018-06-21 16:36:06 +01:00
fionnoh ae37fda699 A more elegant way to subtract guesses from solve and a bool check before verifying residual 2018-06-20 16:07:40 +01:00
fionnoh b5fc5e2030 All to all module update that hit a promising milestone. Commiting for a reference for future changes. 2018-06-20 10:59:07 +01:00
portelli 8db0ef9736 Merge pull request #168 from jch1g10/feature/qed-fvol
Feature/qed fvol
2018-06-08 20:09:06 +02:00
Guido Cossu 95d4b46446 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-06-08 11:30:29 +01:00
paboyle 5dfd216a34 Better thread safety 2018-06-04 21:08:44 +01:00
paboyle c2e8d0aa88 Solve g++ problem on the lanczos test 2018-06-04 18:34:15 +01:00
James Harrison 0fe5aeffbb Merge branch 'feature/hadrons' into feature/qed-fvol 2018-06-04 16:59:43 +01:00
James Harrison 7fbc469046 Merge branch 'develop' into feature/hadrons 2018-06-04 16:58:30 +01:00
paboyle bf96a4bdbf Merge branch 'master' into develop 2018-06-04 14:03:11 +01:00
paboyle 84685c9bc3 Overflow fix 2018-06-04 13:42:07 +01:00
fionnoh a8d4156997 Added a Hadrons module that computes the all-to-all v and w vectors 2018-05-31 17:18:58 +01:00
fionnoh c18074869b Changes to Hadrons SchurRB solver to allow for a subtract_guess boolean to be passed 2018-05-31 17:17:16 +01:00
fionnoh f4c6d39238 CHanges made to SchurRB solvers to allow for the subtraction of a guess after solve 2018-05-31 17:16:20 +01:00
portelli 200d35b38a Merge branch 'develop' into feature/hadrons 2018-05-28 11:52:47 +02:00
portelli eb52e84d09 Merge branch 'feature/hadrons' of github.com:paboyle/Grid into feature/hadrons 2018-05-28 11:50:27 +02:00
portelli 72abc34764 Merge pull request #166 from guelpers/feature/hadrons
Feature/hadrons
2018-05-28 11:49:46 +02:00
portelli e3164d4c7b Hadrons: env function to get volume in double 2018-05-28 11:39:17 +02:00
James Harrison f5db386c55 Change MODULE_REGISTER_NS -> MODULE_REGISTER in UnitEM, ScalarVP and VPCounterTerms 2018-05-22 16:16:21 +01:00
James Harrison 294ee70a7a Merge branch 'feature/hadrons' into feature/qed-fvol
# Conflicts:
#	extras/Hadrons/modules.inc
#	lib/qcd/action/gauge/Photon.h
2018-05-21 18:02:41 +01:00
Azusa Yamaguchi 013ea4e8d1 Merge branch 'feature/staggered-comms-compute' into develop 2018-05-21 13:11:56 +01:00
Azusa Yamaguchi 7fbbb31a50 Merge branch 'develop' into feature/staggered-comms-compute
Conflicts:
	lib/qcd/action/fermion/ImprovedStaggeredFermion.cc
2018-05-21 13:07:29 +01:00
Azusa Yamaguchi 0e127b1fc7 New file single prec test 2018-05-21 12:57:13 +01:00
Azusa Yamaguchi 68c028b0a6 Comment 2018-05-21 12:54:25 +01:00
portelli 255d4992e1 Hadrons: stochastic scalar SU(N) free field fix 2018-05-18 20:49:55 +01:00
portelli a0d399e5ce Hadrons: yet other attempts at EMT NPR 2018-05-18 20:49:26 +01:00
portelli fd3b2e945a Hadrons: don't right result with empty stem 2018-05-18 20:48:24 +01:00
portelli b999984501 Merge branch 'develop' into feature/hadrons 2018-05-15 13:53:57 +01:00
Guido Cossu 7836cc2d74 No checksum output on log for scidac 2018-05-15 10:10:08 +01:00
portelli a61e0df54b Travis fix for Lime 2018-05-14 19:56:12 +01:00
portelli 9d835afa35 Attempt at solving the FP exception in the QED code 2018-05-14 19:05:54 +01:00
portelli 5e3be47117 Hadrons: scalar SU(N) various fixes 2018-05-14 18:58:39 +01:00
portelli 48de706dd5 Merge branch 'develop' into feature/hadrons 2018-05-11 18:06:40 +01:00
portelli f871fb0c6d check file is opened correctly in the Lime reader 2018-05-11 18:06:28 +01:00
portelli 93771f3099 Hadrons: scalar SU(N) stochastic free field 2018-05-10 22:29:48 +01:00
portelli 8cb205725b Merge branch 'develop' into feature/hadrons 2018-05-09 23:56:35 +01:00
portelli 9ad580d82f Hadrons: format fix 2018-05-07 21:38:15 +01:00
portelli 899f961d0d Hadrons: eigenvalue metadata saved with 16 significant digits 2018-05-07 21:37:03 +01:00
portelli 54d789204f more general implementation of the precision interface for serialisers 2018-05-07 21:17:46 +01:00
portelli 25828746f3 XML precision scientific with 16 digits by default 2018-05-07 21:04:31 +01:00
portelli f362c00739 Hadrons: better handling of automatic directory creation 2018-05-07 19:43:40 +01:00
Guido Cossu 25d1cadd3b Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-05-07 18:55:09 +01:00
Guido Cossu c24d53bbd1 Further debug of RNG I/O 2018-05-07 18:55:05 +01:00
portelli 2017e4e3b4 Hadrons: more verbose directory creation error 2018-05-07 18:12:22 +01:00
portelli 27a4d4c951 Hadrons: multi-file eigenpack in separate directory 2018-05-07 17:52:54 +01:00
portelli 2f92721249 Merge branch 'develop' into feature/hadrons 2018-05-07 17:26:47 +01:00
portelli 3c7a4106ed Trap for deadly empty comm thread option 2018-05-07 17:26:39 +01:00
portelli 3252059daf Hadrons: multi-file support for eigenpacks 2018-05-07 17:25:36 +01:00
paboyle 6eed167f0c Merge branch 'release/0.8.1' 2018-05-04 17:34:11 +01:00
paboyle 4ad0df6fde Bump volume for Gerardo 2018-05-04 17:33:23 +01:00
portelli 661381e881 Merge branch 'develop' into feature/hadrons 2018-05-04 14:52:17 +01:00
Peter Boyle 68a5079f33 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-05-04 14:13:54 +01:00
Peter Boyle 8634e19f1b Update 2018-05-04 14:13:35 +01:00
Azusa Yamaguchi 9ada378e38 Add timing 2018-05-04 10:58:01 +01:00
Vera Guelpers 9d9692d439 Fix double vs float in boundary phases 2018-05-03 16:40:16 +01:00
portelli 0659ae4014 Merge branch 'develop' into feature/hadrons 2018-05-03 16:20:22 +01:00
portelli bfbf2f1fa0 no threaded stencil benchmark if OpenMP is not supported 2018-05-03 16:20:01 +01:00
portelli dd6b796a01 Hadrons: scalar SU(N) volume factor fix 2018-05-03 16:19:17 +01:00
Vera Guelpers 52a856b4a8 FreeProp module for Hadrons 2018-05-03 12:33:20 +01:00
Vera Guelpers 04190ee7f3 5D free propagator for DWF and boundary conditions for free propagators 2018-05-03 12:31:36 +01:00
Azusa Yamaguchi 587bfcc0f4 Add Timing 2018-05-03 12:10:31 +01:00
Vera Guelpers 2700992ef5 Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons 2018-05-03 10:01:52 +01:00
Peter Boyle 8c658de179 Compressor speed up (a little); streaming stores 2018-05-02 17:52:16 +01:00
Guido Cossu ba37d51ee9 Debugging the RNG IO 2018-05-02 15:32:06 +01:00
Azusa Yamaguchi 4f4181c54a Merge branch 'feature/staggered-comms-compute' of https://github.com/paboyle/Grid into feature/staggered-comms-compute 2018-05-02 14:59:13 +01:00
Guido Cossu 4d4ac2517b Adding Scalar field theory example for Scidac format 2018-05-02 14:36:32 +01:00
Guido Cossu e568c24d1d Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-05-02 14:29:25 +01:00
Guido Cossu b458326744 Checkpointer module update 2018-05-02 14:29:22 +01:00
Guido Cossu 6e7d5e2243 HMC: added Scidac checkpointer and support for metadata 2018-05-02 14:28:59 +01:00
Azusa Yamaguchi b35169f1dd MultiShift for Staggered 2018-05-02 14:22:37 +01:00
Azusa Yamaguchi 441ad7498d add Iterative counter 2018-05-02 14:21:30 +01:00
Peter Boyle 6f6c5c549a Split off gparity 2018-05-02 14:11:23 +01:00
Peter Boyle 1584e17b54 Revert to fast versoin 2018-05-02 14:10:55 +01:00
Peter Boyle 12982a4455 Hypercube optimisation 2018-05-02 14:10:21 +01:00
Peter Boyle 172f412102 shmget reintroduce 2018-05-02 14:07:41 +01:00
Peter Boyle a64497265d TIming 2018-05-02 14:07:28 +01:00
portelli ca639c195f Merge branch 'develop' into feature/hadrons 2018-05-01 14:07:32 +01:00
portelli edc28dcfbf Hadrons: scalar SU(N) 2-pt fix 2018-05-01 14:02:31 +01:00
Peter Boyle c45f24a1b5 Improvements for tesseract 2018-04-30 21:50:00 +01:00
Dr Peter Boyle aaf37ee4d7 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-27 11:45:13 +01:00
Dr Peter Boyle 1dddd17e3c Benchmark improvements from tesseract 2018-04-27 11:44:46 +01:00
paboyle 661f1d3e8e Merge branch 'release/0.8.0' into develop 2018-04-27 11:22:33 +01:00
paboyle edcf9b9293 Merge branch 'release/0.8.0' 2018-04-27 11:13:19 +01:00
paboyle fe6860b4dd Update with LIME library guard 2018-04-27 08:57:34 +01:00
paboyle d6406b13e1 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-27 07:52:56 +01:00
paboyle e369d7306d Rename 2018-04-27 07:51:44 +01:00
paboyle 9f8d63e104 Roll over version 2018-04-27 07:51:12 +01:00
paboyle 9b0240d101 Hot start test 2018-04-27 07:50:51 +01:00
paboyle b27f0e5a53 Control over IO 2018-04-27 07:50:15 +01:00
paboyle 75e4483407 Stronger convergence test 2018-04-27 07:49:57 +01:00
Guido Cossu 0734e9ddd4 Debugging Scatter_plane_simple 2018-04-27 14:39:01 +09:00
paboyle 809b1cdd58 Bug fix for MPI running ; introduced last night 2018-04-27 05:19:10 +01:00
paboyle 1be8089604 Clean compile 2018-04-26 23:42:45 +01:00
paboyle 3e0eff6468 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-26 23:00:46 +01:00
paboyle 7ecc47ac89 Quenched test compile 2018-04-26 23:00:28 +01:00
paboyle e9f1ac09de static 2018-04-26 23:00:08 +01:00
Peter Boyle fa0d8feff4 Performance of CovariantCshift now non-embarrassing. 2018-04-26 17:56:27 +01:00
portelli 49b8501fd4 Merge branch 'develop' into feature/hadrons 2018-04-26 17:33:50 +01:00
portelli d47484717e Hadrons: scalar SU(N) result handling improvement 2018-04-26 17:32:37 +01:00
Peter Boyle 05b44aef6b Merge branch 'develop' of https://github.com/paboyle/Grid into develop
Conflicts:
	benchmarks/Benchmark_su3.cc
2018-04-26 15:38:49 +01:00
Peter Boyle 03e9832efa Use macros for bare openmp 2018-04-26 14:50:02 +01:00
Peter Boyle 28a375d35d Force static 2018-04-26 14:49:42 +01:00
Peter Boyle 3b06381745 Guard bare openmp statemetn with ifdef 2018-04-26 14:48:57 +01:00
Peter Boyle 91a0a3f820 Improvement 2018-04-26 14:48:35 +01:00
Peter Boyle 8f44c799a6 Saving the benchmarking tests for Cshift 2018-04-26 14:48:03 +01:00
Azusa Yamaguchi 96272f3841 Merge staggered fix linear operator and reduction 2018-04-26 10:33:19 +01:00
Azusa Yamaguchi 5c936d88a0 Merge branch 'feature/staggered-comms-compute' of https://github.com/paboyle/Grid into feature/staggered-comms-compute 2018-04-26 10:18:37 +01:00
Azusa Yamaguchi 1c64ee926e Faster staggered operator with m^2 term trivial used 2018-04-26 10:17:49 +01:00
Azusa Yamaguchi 2cbb72a81c Provide info if EE term is trivial (m^2 factor)
Better timing in staggered 4d case
2018-04-26 10:10:07 +01:00
Azusa Yamaguchi 31d83ee046 Enable special treatment of constEE cases 2018-04-26 10:08:46 +01:00
Azusa Yamaguchi a9e8758a01 Improvements to staggered tests timings 2018-04-26 10:08:05 +01:00
Azusa Yamaguchi 3e125c5b61 Faster linalg on CG optimised against staggered
Sum overhead is bigger for staggered
2018-04-26 10:07:19 +01:00
Azusa Yamaguchi eac6ec4b5e Faster reductions, important on single node staggered 2018-04-26 10:03:57 +01:00
Azusa Yamaguchi 213f8db6a2 Microsecond resultion 2018-04-26 10:01:39 +01:00
Guido Cossu 6358f35b7e Debug of previous commit 2018-04-26 14:18:11 +09:00
Guido Cossu 43f5a0df50 More timers in the integrator 2018-04-26 12:01:56 +09:00
Guido Cossu c897878776 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-26 11:31:57 +09:00
portelli cc6eb51e3e Hadrons: macro refactoring for library portability 2018-04-25 16:49:14 +01:00
Vera Guelpers 507009089b Merge remote-tracking branch 'upstream/feature/hadrons' into feature/hadrons 2018-04-25 09:36:39 +01:00
paboyle 2baf193031 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-25 00:14:03 +01:00
paboyle 362ba0443a Cshift updates 2018-04-25 00:12:11 +01:00
paboyle 276a2353df Move constructor 2018-04-25 00:11:07 +01:00
portelli b234784c8e Hadrons: scalar SU(N) takes operator pairs now 2018-04-24 19:52:12 +01:00
portelli 6ea2a8b7ca Hadrons: scheduler shows starting value 2018-04-24 19:51:47 +01:00
portelli c1d0359aaa Hadrons: scalar SU(N) kinetic term saves trace 2018-04-24 19:51:22 +01:00
portelli 047ee4ad0b Hadrons: scalar SU(N) cleanup 2018-04-24 19:50:58 +01:00
portelli a13106da0c Hadrons: scalar SU(N) gradient 2018-04-24 19:50:30 +01:00
portelli 75113e6523 Hadrons: Scalar SU(N) variable name update 2018-04-24 19:49:27 +01:00
portelli 325c73d051 Hadrons: module template update 2018-04-24 19:48:54 +01:00
portelli b25a59e95e Hadrons: mitigation of GCC/Intel compiler bug not generating defaulted destructors 2018-04-24 17:20:25 +01:00
Guido Cossu c5b9147b53 Correction of a minor bug in the su3 benchmark 2018-04-24 08:03:57 -07:00
Guido Cossu 64ac815fd9 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-24 17:27:38 +09:00
Guido Cossu a1be533329 Corrected Flop count in Benchmark su3 and expanded the Wilson flow output 2018-04-24 01:19:53 -07:00
portelli 7c4533797f Hadrons: scalar SU(N) EMT improvement term optional 2018-04-23 22:46:39 +01:00
portelli af84fd65bb Hadrons: missing dependency message improvement 2018-04-23 22:46:17 +01:00
Dan H 1a2613086a Fix print message. 2018-04-23 15:42:12 -04:00
Dan H 4f110c09a5 Add printing of whether there are unstaged changes in the git hash print. 2018-04-23 15:38:23 -04:00
portelli 6764362237 Hadrons: automatic directory creation fix 2018-04-23 18:45:39 +01:00
portelli 2fa2b0e0b1 Hadrons: Application header does not include all the modules 2018-04-23 17:57:17 +01:00
portelli b61292f735 Hadrons: recursive mkdir function 2018-04-23 17:36:43 +01:00
portelli ce7720e221 Hadrons: copyright update 2018-04-23 17:36:20 +01:00
portelli 853a5528dc Hadrons: template modules compilation optimisation 2018-04-23 17:35:01 +01:00
portelli 169f405c9c Hadrons: tests repaired 2018-04-23 12:48:34 +01:00
portelli c6125b01ce Hadrons: Error and Warning channels always on 2018-04-23 12:48:17 +01:00
portelli b0b5b34bff Hadrons: custom abort with module trace 2018-04-23 12:48:00 +01:00
portelli 1c9722357d Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/qcd/action/fermion/FermionOperator.h
2018-04-20 17:15:21 +01:00
portelli 141da3ae71 function to get tensor dimensions 2018-04-20 17:13:34 +01:00
portelli 94edf9cf8b HDF5: direct access to group for custom operations 2018-04-20 17:13:21 +01:00
portelli c11a3ca0a7 vectorise/unvectorise in reverse order 2018-04-20 17:13:04 +01:00
paboyle 870b1a85ae Think I have the physical prop interface to CF and PF overlap right, but need a strong check/regression.
Only support Hw overlap, not Ht for now. Ht needs a new Dminus implemented.
2018-04-18 14:17:49 +01:00
paboyle b5510427f9 physical fermion interface, cshift benchmark in SU3. 2018-04-18 01:43:29 +01:00
Guido Cossu 26ed65c8f8 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-17 12:03:32 +01:00
paboyle f7f043d8cf Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-04-17 10:57:18 +01:00
paboyle ddcaa6ad29 Master does header on Nersc 2018-04-17 10:48:33 +01:00
portelli 334da7f452 Hadrons: can trace which module is throwing an error 2018-04-13 18:45:31 +02:00
portelli 4669ecd4ba Hadrons: build improvement 2018-04-13 18:21:18 +02:00
portelli 4573b34cac Hadrons: scalar SU(N) 2-pt functions with momentum 2018-04-13 18:21:00 +02:00
portelli 17f57e85d1 Merge branch 'develop' into feature/hadrons 2018-04-06 22:53:11 +01:00
portelli c8d4d184ee XML push fragment fix 2018-04-06 22:53:01 +01:00
portelli 17f27b1ebd Hadrons: eigenpack writer fix 2018-04-06 22:52:11 +01:00
portelli a16bbecb8a Hadrons: more feedback 2018-04-06 19:38:20 +01:00
portelli 7c9b0dd842 Hadrons: top level name for eigenpack metadata 2018-04-06 19:32:22 +01:00
portelli 6b7228b3e6 Hadrons: better metadata for eigenpack 2018-04-06 19:29:53 +01:00
portelli f117552334 post-merge fix 2018-04-06 18:38:46 +01:00
portelli a21a160029 Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/serialisation/XmlIO.cc
2018-04-06 18:34:19 +01:00
portelli 1569a374a9 XML interface polish, XML fragments can be pushed into a writer 2018-04-06 18:32:14 +01:00
portelli eddf023b8a pugixml 1.9 update 2018-04-06 16:17:22 +01:00
portelli 6b8ffbe735 Hadrons: genetic minimum value type fix 2018-04-06 15:41:31 +01:00
portelli 81050535a5 Hadrons: truncate eigenvalues when loading partial eigenpack 2018-04-06 13:48:58 +01:00
portelli 7dcf5c90e3 Hadrons: eigenpack must be referred by solver when used 2018-04-06 13:16:28 +01:00
portelli 9ce00f26f9 not special characters in std::vector operator<< 2018-04-04 17:44:56 +01:00
portelli 85c253ed4a Test_serialisation MPI fix 2018-04-04 17:19:34 +01:00
portelli ccfc0a5a89 Hadrons: better string representation of module parameters 2018-04-04 17:19:22 +01:00
portelli d3f857b1c9 Hadrons: proper metadata for eigenpacks 2018-04-04 16:36:37 +01:00
portelli fb62035aa0 Hadrons: do not create RB coarse grids 2018-04-03 19:49:11 +01:00
portelli 0260bc7705 Hadrons: eigen pack writing only for boss node 2018-04-03 18:55:46 +01:00
portelli 68e6a58f12 Hadrons: several Lanczos fixes and improvements 2018-04-03 17:42:21 +01:00
portelli 640515e3d8 Merge branch 'develop' into feature/hadrons 2018-03-30 17:43:49 +01:00
paboyle f089bf5629 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-03-30 16:17:26 +01:00
paboyle 276f113f28 IO uses master boss node for metadata. 2018-03-30 16:17:05 +01:00
portelli 97c579f637 Merge branch 'develop' into feature/hadrons 2018-03-30 16:04:44 +01:00
portelli a13c109111 deterministic initialisation of field metadata 2018-03-30 16:03:01 +01:00
paboyle ab6afd18ac Still compile if no LIME 2018-03-30 13:39:20 +01:00
paboyle 5bde64d48b Barrier required in parallel when we use ftell 2018-03-30 12:41:30 +01:00
paboyle 2f5add4d5f Creation of file 2018-03-30 12:30:58 +01:00
portelli c5a885dcd6 I/O benchmark 2018-03-29 19:57:41 +01:00
portelli a4d8512fb8 Revert "Lattice serialisation, just HDF5 for the moment"
This reverts commit 8a0cf0194f.
2018-03-27 17:55:42 +01:00
portelli 5ec903044d Serial IO code cleaning for std:: convention 2018-03-27 17:11:50 +01:00
portelli 8a0cf0194f Lattice serialisation, just HDF5 for the moment 2018-03-26 19:16:16 +01:00
portelli 1c680d4b7a Merge branch 'develop' into feature/hadrons 2018-03-26 13:52:44 +01:00
Guido Cossu c9c073eee4 Changes in messages in test dwf mixedprec 2018-03-23 11:27:56 +00:00
Guido Cossu f290b2e908 Fix to pass CI tests 2018-03-23 11:14:23 +00:00
Guido Cossu 5f8225461b Fencing mixedcg test propagator write. LIME is still optional in Grid 2018-03-23 10:37:58 +00:00
portelli e9323460c7 Merge branch 'develop' into feature/hadrons 2018-03-22 10:48:37 +00:00
portelli 20e186a1e0 Merge pull request #158 from goracle/dev-pull
Make compilation faster by moving print of git hash.
2018-03-22 10:45:17 +00:00
Peter Boyle 6ef4af989b Merge pull request #159 from goracle/dev-precsafe
Add dimension check to precisionChange.
2018-03-22 10:41:53 +00:00
Dan H ccde8b817f Add dimension check to precisionChange. 2018-03-21 20:58:04 -04:00
Dan H 68168bf72d Revert "Add dimension match check to precisionChange."
This reverts commit 8f601d9b39.
2018-03-21 20:51:38 -04:00
Dan H e93d0feaa7 Merge branch 'dev-pull' of github.com:goracle/Grid into dev-pull 2018-03-21 20:39:30 -04:00
Dan H 8f601d9b39 Add dimension match check to precisionChange. 2018-03-21 20:38:19 -04:00
paboyle 5436308e4a Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-03-21 14:26:29 +00:00
paboyle 07fe7d0cbe Save file in current dir; print checksums 2018-03-21 14:26:04 +00:00
Guido Cossu 60b57706c4 Small bug fix in the shm file names 2018-03-21 13:57:30 +00:00
James Harrison 58c2f60b69 Merge branch 'feature/hadrons' into feature/qed-fvol 2018-03-20 20:19:18 +00:00
James Harrison bfa3a7b3b0 Merge branch 'feature/hadrons' into feature/qed-fvol
# Conflicts:
#	extras/Hadrons/Modules.hpp
#	extras/Hadrons/Modules/MGauge/StochEm.cc
#	extras/Hadrons/modules.inc
2018-03-20 20:17:59 +00:00
paboyle 954e38bebe Put a username in the path 2018-03-20 18:16:15 +00:00
paboyle b1a38bde7a Extra test for Gparity with plaquette action 2018-03-20 18:01:32 +00:00
Guido Cossu 2581875edc Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-03-19 18:00:08 +00:00
Guido Cossu f212b0a963 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2018-03-19 17:57:13 +00:00
Guido Cossu 62702dbcb8 Fixing bug in the Point sink causing NaNs 2018-03-19 17:56:53 +00:00
portelli 41d6cab033 Merge branch 'develop' into feature/hadrons 2018-03-19 13:30:21 +00:00
portelli 5a31e747c9 Merge commit 'd5ce66f6ab2c44a12def7b6d26df80d6e646b1fb' into feature/hadrons 2018-03-19 13:19:09 +00:00
portelli cbc73a3fd1 Hadrons: CG guesser fix 2018-03-19 13:11:38 +00:00
Peter Boyle 6c6d43eb4e Drop RB on coarse space ; that was a mistake 2018-03-17 09:35:01 +00:00
Peter Boyle e1dcfd3553 typo fix 2018-03-16 23:10:47 +00:00
Peter Boyle 888838473a 4GB clean the offsets in parallel IO for multifile records 2018-03-16 21:54:56 +00:00
Peter Boyle 01568b0e62 Add a new SHM option 2018-03-16 21:54:28 +00:00
Peter Boyle d5ce66f6ab Extra SHM option 2018-03-16 21:37:03 +00:00
Guido Cossu d86936a3de Eliminating deprecated lex_sites 2018-03-16 12:26:39 +00:00
portelli d516938707 Hadrons: eigen packs I/O and deflation interface 2018-03-14 14:55:47 +00:00
portelli 72344d1418 Hadrons: change default Schur convention to DiagTwo 2018-03-13 17:10:54 +00:00
portelli 7ecf6ab38b Merge branch 'develop' into feature/hadrons 2018-03-13 16:11:59 +00:00
portelli 2d4d70d3ec Hadrons: LCL fixes 2018-03-13 16:10:36 +00:00
portelli 78f8d47528 Hadrons: environment access to derived objects 2018-03-13 16:10:16 +00:00
portelli b85f987b0b Hadrons: error message channel verbose during profiling 2018-03-13 16:09:22 +00:00
portelli f57afe2079 Hadrons: much cleaner eigenpack implementation, to be tested 2018-03-13 13:51:09 +00:00
Dan H 0fb84fa34b Make compilation faster by moving print of git hash. 2018-03-12 17:03:48 -04:00
Vera Guelpers 8462bbfe63 Gamma input for meson contraction with round brackets 2018-03-12 18:02:12 +00:00
portelli 229977c955 Hadrons: minor memory fix for ShiftProbe module 2018-03-09 21:56:27 +00:00
portelli e485a07133 Hadrons: garbage collector debug output 2018-03-09 21:56:01 +00:00
paboyle 0880747edb Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-03-09 20:44:42 +00:00
paboyle b801e1fcd6 fclose should be called through a call to close() 2018-03-09 20:44:10 +00:00
portelli 70ec2faa98 Hadrons: maximum iteration specified for tests and error if 0 2018-03-09 19:53:55 +00:00
portelli 2f849ee252 declaration fix 2018-03-08 23:34:00 +00:00
portelli bb6ed44339 Merge branch 'develop' into feature/hadrons 2018-03-08 23:09:28 +00:00
portelli 360cface33 Grid tensor serialisation fully implemented and tested 2018-03-08 19:12:03 +00:00
Azusa Yamaguchi 80302e95a8 MILC Interface 2018-03-08 15:34:03 +00:00
portelli caf2f6b274 Merge branch 'develop' of github.com:paboyle/Grid into develop 2018-03-08 09:52:25 +00:00
portelli c49be8988b Grid tensor serialisation 2018-03-08 09:51:22 +00:00
portelli 971c2379bd std::vector to tensor conversion + test units 2018-03-08 09:50:39 +00:00
Guido Cossu 94b0d66e4c Merge pull request #157 from goracle/dev-pull
Add print of the current git hash on Grid init.
2018-03-08 16:09:28 +09:00
Dan H 5e8af396fd Add print of the current git hash on Grid init. 2018-03-07 13:11:51 -05:00
portelli 9942723189 Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/serialisation/BaseIO.h
2018-03-07 15:22:16 +00:00
portelli a7d19dbb64 Merge branch 'develop' of github.com:paboyle/Grid into develop
# Conflicts:
#	lib/serialisation/BaseIO.h
2018-03-07 15:13:54 +00:00
portelli 90dbe03e17 Conversion of Grid tensors to std::vector made more elegant, also pair syntax changed to (x y) to avoid issues with JSON/XML 2018-03-07 15:12:32 +00:00
portelli 8b14096990 Conversion of Grid tensors to std::vector made more elegant, also pair syntax changed to (x y) to avoid issues with JSON/XML 2018-03-07 15:12:18 +00:00
Azusa Yamaguchi b938202081 Overlapped Comm for Wilson DhopInternal 2018-03-07 14:08:43 +00:00
portelli e79ef469ac Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/serialisation/BaseIO.h
2018-03-06 19:25:51 +00:00
portelli 485c5db0fe conversion of Grid tensors to nested std::vector in preparation for tensor serialisation 2018-03-06 19:22:03 +00:00
James Harrison c793947209 Add overloaded Photon constructors, with default parameters for IR improvements and infinite-volume G(x=0). 2018-03-06 16:27:26 +00:00
portelli 3e9ee053a1 Merge branch 'develop' into feature/hadrons 2018-03-05 20:01:38 +00:00
portelli dda6c69d5b Hadrons: scalar SU(N) shift probes 2018-03-05 20:00:29 +00:00
portelli cd51b9af99 Torture yourself with namespace lookup 101 2018-03-05 19:58:13 +00:00
paboyle c399c2b44d Guido broke the charge conjugate plaquette action with premature optimisation.
This sector of the code does not matter for anything other than Guido's quenched HMC
studies, and any plaq specific optimisations should be retained in a private branch
instead of destroying the code simplicity.
2018-03-05 12:55:41 +00:00
paboyle af7de7a294 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-03-05 12:22:41 +00:00
paboyle 1dc86efd26 Finalize protection 2018-03-05 12:22:18 +00:00
portelli f32555dcc5 Merge branch 'develop' into feature/hadrons 2018-03-03 15:31:52 +00:00
portelli 30391cb2eb Merge pull request #155 from fionnoh/develop
Some changes needed for deflation interface
2018-03-03 13:43:59 +00:00
portelli e93c883470 Hadrons: basic GraphViz visualisation 2018-03-03 13:42:36 +00:00
Fionn O hOgain 2e88408f5c Some changes needed for deflation interface 2018-03-02 22:27:41 +00:00
portelli fcac5c0772 Hadrons: scalar SU(N) fixes 2018-03-02 19:20:23 +00:00
portelli 90f4000935 Hadrons: scheduler debug less verbose 2018-03-02 19:20:01 +00:00
portelli 480708b9a0 Hadrons: safer error handling for HadronsXmlRun 2018-03-02 19:19:37 +00:00
portelli c4baf876d4 Hadrons: graph consistency check 2018-03-02 18:40:18 +00:00
portelli 2f4dac3531 Hadrons: legal update 2018-03-02 18:10:58 +00:00
portelli 3ec6890850 Merge branch 'feature/hadrons' of github.com:paboyle/Grid into feature/hadrons 2018-03-02 17:56:08 +00:00
portelli 018801d973 Hadrons: legal update 2018-03-02 17:56:00 +00:00
portelli 1d83521daa Hadrons: scalar SU(N) EMT 2018-03-02 17:55:18 +00:00
portelli fc5670c6a4 Merge pull request #151 from guelpers/feature/hadrons
Feature/hadrons
2018-03-02 17:54:43 +00:00
portelli d9c435e282 Hadrons: Scalar SU(N) transverse projection module 2018-03-02 17:35:12 +00:00
portelli 614a0e8277 Hadrons: Scalar SU(N) utility functions 2018-03-02 17:34:23 +00:00
Vera Guelpers aaf39222c3 update my fork and fixed conflicts 2018-03-02 17:08:08 +00:00
portelli 550142bd6a Hadrons: more code cleaning 2018-03-02 14:30:45 +00:00
portelli c0a929aef7 Hadrons: code cleaning 2018-03-02 14:29:54 +00:00
portelli 37fe944224 Hadrons: scalar kinetic term 2018-03-02 14:14:11 +00:00
Vera Guelpers 315a42843f changes requested for the pull request 2018-03-02 11:47:38 +00:00
portelli 83a101db83 Hadrons: more LCL fixes 2018-03-02 11:05:02 +00:00
portelli c4274e1660 Hadrons: LCL cleaning 2018-03-02 10:18:33 +00:00
portelli ba6db55cb0 Hadrons: reverse last commit 2018-03-01 23:30:58 +00:00
portelli e5ea84d531 Hadrons: LCL: orthogonalise coarse evec 2018-03-01 19:33:11 +00:00
portelli 15767a1491 Hadrons: LCL fine convergence test 2018-03-01 18:04:08 +00:00
portelli 4d2a32ae7a Hadrons: z-Mobius message fix 2018-03-01 18:03:44 +00:00
portelli 5b937e3644 Hadrons: VM memory profiling fix 2018-03-01 17:28:38 +00:00
portelli e418b044f7 Hadrons: code cleaning 2018-03-01 12:57:28 +00:00
portelli b8b05f143f Hadrons: Lanczos more conservative type names 2018-03-01 12:53:16 +00:00
portelli 6ec42b4b82 LCL: external storage fix 2018-03-01 12:27:29 +00:00
portelli abb7d4d2f5 Hadrons: z-Mobius action 2018-02-27 19:32:19 +00:00
portelli 16ebbfff29 Hadrons: Schur convention globally defined through a macro 2018-02-27 18:45:23 +00:00
portelli 4828226095 Hadrons: prettier log 2018-02-27 14:43:51 +00:00
portelli 8a049f27b8 Hadrons: Lanczos code improvement 2018-02-27 13:46:59 +00:00
portelli 43578a3eb4 Hadrons: copyright update 2018-02-26 19:24:19 +00:00
portelli fdbd42e542 Hadrons: first implementation of local coherence Lanczos 2018-02-26 19:22:43 +00:00
portelli e7e4cee4f3 Merge branch 'develop' into feature/hadrons 2018-02-26 15:05:05 +00:00
James Harrison ec3954ff5f QedFVol: Add input parameter G(x=0) for infinite-volume photon 2018-02-23 14:53:05 +00:00
Azusa Yamaguchi 0f468e2179 OverlappedComm for Staggered 5D and 4D. 2018-02-22 12:50:09 +00:00
James Harrison 8e61286741 Merge branch 'develop' into feature/qed-fvol 2018-02-20 15:33:35 +00:00
paboyle 4790e99817 Extra communicator free that I had missed.
Hard to audit them all as this is complex
2018-02-20 15:12:31 +00:00
paboyle 2dd63aa7a4 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-02-20 14:29:26 +00:00
paboyle 559a501140 Deflation interface for solvers 2018-02-20 14:29:08 +00:00
paboyle 945684c470 updates for deflation in the RB solver 2018-02-20 14:28:38 +00:00
Christopher Kelly e30a80a234 Relaxed constraints on MPI thread mode when not using multiple comms threads 2018-02-15 17:13:36 +00:00
James Harrison 69e4ecc1d2 QedFVol: Fix single precision build error 2018-02-14 17:37:18 +00:00
James Harrison 5f483df16b Merge branch 'develop' into feature/qed-fvol 2018-02-14 16:35:04 +00:00
James Harrison 4680a977c3 QedFVol: set infinite-volume photon propagator to 1 at x=0,
so that momentum-spage photon propagator is non-negative.
Need to check whether this is sufficient for all volumes.
2018-02-14 16:30:09 +00:00
Vera Guelpers de42456171 updated my fork and conflicts fixed 2018-02-14 13:57:56 +00:00
Vera Guelpers d55212c998 restructure SeqConservedCurrent for DWF to need less memory 2018-02-14 10:45:18 +00:00
paboyle c96483e3bd Whitespace only change 2018-02-13 11:39:07 +00:00
Vera Guelpers c6e1f64573 Test for QED 2018-02-13 09:30:23 +00:00
paboyle ae31a6a760 Move deflate to right class 2018-02-13 02:11:37 +00:00
paboyle dd8f2a64fe INterface to suit hadrons on Lanczos 2018-02-13 02:08:49 +00:00
James Harrison 724cf02d4a QedFVol: Implement infinite-volume photon 2018-02-12 17:18:10 +00:00
paboyle 7b8b2731e7 Conj error for complex coeffs 2018-02-12 16:06:31 +00:00
paboyle 237a8ec918 Communicator leak fixed (I think) 2018-02-12 13:27:20 +00:00
Vera Guelpers 49a0ae73eb Insertion of photon field in seqential conserved current 2018-02-12 09:36:08 +00:00
James Harrison 315f1146cd QedFVol: Fix output of VPCounterTerms module. 2018-02-08 20:40:45 +00:00
James Harrison 9f202782c5 QedFVol: Change format of scalar VP output files, and save diagrams without charge factors for consistency with ChargedProp module. 2018-02-07 20:31:50 +00:00
James Harrison 594a262dcc QedFVol: Remove redundant file Communicator_mpi.cc 2018-02-07 11:37:01 +00:00
James Harrison 7f8ca54285 Merge branch 'develop' into feature/qed-fvol 2018-02-07 10:11:00 +00:00
James Harrison c5b23c367e QedFVol: Fix segmentation fault when multiple propagator modules are used. 2018-02-05 11:46:33 +00:00
Vera Guelpers b6fe03eb26 BugFix: Now the stochatic EM potential weight is generated when calling for the first time 2018-02-02 15:29:38 +00:00
James Harrison f37ed4958b Implement IR improvement, with coefficients set in input file. 2018-02-02 11:56:51 +00:00
Peter Boyle 896f3a8002 Fix to MPI for Hokusai system 2018-02-01 18:51:51 +00:00
James Harrison 5f85473d6b QedFVol: Move Projection class into Result class 2018-02-01 16:16:13 +00:00
James Harrison ac3b0ebc58 QedFVol: New structure for ChargedProp output files 2018-02-01 12:31:32 +00:00
Guido Cossu f0fcdf75b5 Update README.md 2018-01-30 12:44:20 +01:00
Guido Cossu 53bffb83d4 Updating README with new SKL target 2018-01-30 12:42:36 +01:00
Guido Cossu cd44e851f1 Fixing compilation error in FundtoHirep 2018-01-30 06:04:30 +01:00
Guido Cossu fb24e3a7d2 Adding utilities for perf profiling 2018-01-29 11:11:45 +01:00
Guido Cossu 655a69259a Added support for GCC compilation for Skylake AVX512 2018-01-28 17:02:46 +01:00
James Harrison 4e0cf0cc28 QedFVol: Fix bug in ScalarVP.cc due to double use of temporary object. Still getting mpi3 errors when configured with enable-comms=mpi[-auto]. 2018-01-27 15:15:25 +00:00
Guido Cossu 507c4e9efc Correcting an missing semicolumn in avx512 2018-01-27 10:59:55 +01:00
James Harrison cdf550845f QedFVol: Fix bugs in StochEm.cc and ChargedProp.cc (still only works without MPI). 2018-01-26 21:25:20 +00:00
James Harrison 3db7a5387b BROKEN: Adapted scalarVP, UnitEm and VPCounterTerms modules to new Hadrons. Currently getting an assertion error from Communicator_mpi3.cc when I try to run. 2018-01-26 16:33:48 +00:00
Guido Cossu f8a5194c70 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-01-25 13:46:37 +01:00
Guido Cossu cff3bae155 Adding support for general Nc in the benchmark outputs 2018-01-25 13:46:31 +01:00
James Harrison 90dffc73c8 Merge branch 'feature/hadrons' into feature/qed-fvol
# Conflicts:
#	extras/Hadrons/Modules.hpp
#	extras/Hadrons/Modules/MGauge/StochEm.cc
#	extras/Hadrons/Modules/MScalar/ChargedProp.cc
#	extras/Hadrons/Modules/MScalar/ChargedProp.hpp
#	extras/Hadrons/modules.inc
#	lib/communicator/Communicator_mpi.cc
2018-01-24 16:41:44 +00:00
portelli a1151fc734 Hadrons: MPI-safe serial IO 2018-01-23 17:26:50 +00:00
James Harrison ab3baeb38f Implement contractions and data output in functions; calculate diagrams S, X and 4C separately; output 2E and 2T instead of sunset_shifted, sunset_unshifted, tadpole_shifted, tadpole_unshifted; add comments. 2018-01-23 17:07:45 +00:00
Vera Guelpers 389731d373 changed SeqConservedSummed.hpp to work with new hadrons interface 2018-01-23 10:11:33 +00:00
portelli 6e3ce7423e Hadrons: don't display module list at startup (too long) 2018-01-22 20:04:05 +00:00
portelli 15f15a7cfd Merge branch 'develop' into feature/hadrons
# Conflicts:
#	extras/Hadrons/Modules.hpp
#	extras/Hadrons/modules.inc
2018-01-22 20:03:36 +00:00
portelli 0e5f626226 Hadrons: module for scalar operator divergence 2018-01-22 19:38:19 +00:00
Azusa Yamaguchi 97b9c6f03d No option for interior/exterior split of asm kernels since different directions get interleaved 2018-01-22 11:04:19 +00:00
Azusa Yamaguchi 63982819c6 No option to overlap comms and compute for asm implementation since different directions are interleaved
in the kernels, introducing if else structure would be too painful
2018-01-22 11:03:39 +00:00
Vera Guelpers 6fec507bef merged new hadrons interface 2018-01-22 10:09:20 +00:00
James Harrison 219b3bd34f Remove freeVpTensor object 2018-01-19 17:14:11 +00:00
Guido Cossu b00d2d2c39 Correction of Representations compilation and small compilation error for Intel 17 2018-01-17 13:46:12 +00:00
Guido Cossu f1b3e21830 Merge branch 'feature/clover' into develop 2018-01-17 10:07:42 +00:00
Guido Cossu b7f8c5b823 Modify test to merge with the new Lanczos interface 2018-01-12 14:38:27 +00:00
Guido Cossu 3923683e9b Updating the feature/clover branch with the newest Hadron package 2018-01-12 13:35:51 +00:00
Guido Cossu e199fda9dc Merge pull request #136 from pretidav/feature/clover
Feature/clover
2018-01-12 11:57:08 +00:00
portelli 7bb405e790 Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/communicator/Communicator_mpi3_leader.cc
#	lib/communicator/Communicator_shmem.cc
2018-01-11 18:50:15 +00:00
portelli ec16eacc6a Hadrons: scalar SU(N) 2-pt function 2018-01-10 22:12:21 +00:00
pretidav cf858deb16 Lanczos with 2 reps fixed (tobe tested) 2018-01-10 18:43:02 +01:00
David Preti a3affac963 SU3 restored + output filename for mesons and baryons fixed. 2018-01-10 14:56:54 +01:00
portelli d9d1f43ba2 Hadrons: code cleaning 2018-01-10 11:31:24 +00:00
portelli b7cd721308 Hadrons: scalar SU(N) tr(mag^n) 2018-01-10 11:25:59 +00:00
portelli 29f026c375 Hadrons: scalar SU(N) tr(phi^n) 1-pt function 2018-01-10 11:01:03 +00:00
portelli 58c7a13d54 Hadrons: result file macro with trajectory number 2018-01-10 10:59:58 +00:00
Azusa Yamaguchi 24162c9ead Staggered overlap comms comput 2018-01-09 13:02:52 +00:00
paboyle e564d11687 Allow resize of the shared memory buffers 2018-01-08 15:20:26 +00:00
paboyle 0b2162f375 Clean up 2018-01-08 14:06:53 +00:00
paboyle 5610570182 Synthetic test of lanczos 2018-01-08 11:36:39 +00:00
paboyle 44f65526e0 Simplify communicators 2018-01-08 11:35:43 +00:00
paboyle 43e48542ab Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2018-01-08 11:34:45 +00:00
paboyle 0b85f1bfc8 Simplify the communicator proliferation: mpi and none. 2018-01-08 11:33:47 +00:00
paboyle 9947cfbf14 Simplify number of communicator cases 2018-01-08 11:33:01 +00:00
paboyle 357badce5e Simplify communicator case proliferation 2018-01-08 11:32:16 +00:00
paboyle 0091eec23a Simplify communicator cases 2018-01-08 11:31:32 +00:00
paboyle 9e9c2962df Simplify comms layer proliferation 2018-01-08 11:30:22 +00:00
paboyle bda97212a9 Simplify proliferation of comms layers 2018-01-08 11:29:20 +00:00
paboyle b91282ad46 Simplify comms layer proliferation 2018-01-08 11:28:52 +00:00
paboyle 0a68470f9a Simplify comms layers 2018-01-08 11:28:30 +00:00
paboyle 6ecf280723 Simplify comms layer proliferation 2018-01-08 11:28:04 +00:00
paboyle 7eeab7f995 Simplify comms layers 2018-01-08 11:27:43 +00:00
paboyle 9b32d51cd1 Simplify comms layer proliferatoin 2018-01-08 11:27:14 +00:00
paboyle 7b3ed160aa Rationalise MPI options 2018-01-08 11:26:48 +00:00
paboyle 1a0163f45c Updated to do list 2018-01-08 11:26:11 +00:00
David Preti 9028e278e4 Trying to fix a bug with SU4 mesons (still under investigation) 2018-01-06 15:57:38 +01:00
portelli dd62f2f371 Hadrons: log message fix 2017-12-29 16:58:44 +01:00
portelli 0d612039ed Hadrons: prettier Grid logging (non-intrusive) 2017-12-29 16:58:23 +01:00
portelli e8ac75055c Hadrons: binary configuration loader 2017-12-27 14:24:29 +01:00
portelli 8b30c5956c Hadrons: copyright update 2017-12-26 14:16:47 +01:00
portelli 185da83454 Hadrons: new MIO module namespace, NERSC loader moved there 2017-12-26 14:05:17 +01:00
portelli 6718fa8c4f Merge branch 'feature/scalar_adjointFT' into feature/hadrons 2017-12-26 12:59:33 +01:00
pretidav 4ce63af7d5 Working on Hadrons with Hirep. (QCD is set for SU4) 2017-12-22 19:02:07 +01:00
Vera Guelpers 935cd1e173 conserved current insertion summed over Lorentzindex 2017-12-22 11:38:45 +00:00
Vera Guelpers 55e39df30f tadpole insertion for DWF 2017-12-22 11:36:31 +00:00
portelli 67c3fa0f5f Hadrons: all modules are now ported, more tests need to be done 2017-12-21 11:39:07 +00:00
portelli 65d4f17976 Hadrons: no errors when trying to recreate a cache 2017-12-19 20:28:32 +00:00
portelli e2fe97277b Hadrons: getReference use is rare, empty by default 2017-12-19 20:28:04 +00:00
Guido Cossu 84f9c37ed4 Merge branch 'feature/scalar_adjointFT' of https://github.com/paboyle/Grid into feature/scalar_adjointFT 2017-12-19 15:43:55 +00:00
portelli bcf6f3890c Hadrons: more fixes after test 2017-12-14 21:14:10 +00:00
portelli 591a38c487 Hadrons: VM fixes 2017-12-14 19:42:16 +00:00
James Harrison 581be32ed2 Implement infrared improvement for v=0 on-shell self-energy 2017-12-14 13:42:41 +00:00
portelli 842754bea9 Hadrons: most modules ported to the new interface, compiles but untested 2017-12-13 19:41:41 +00:00
James Harrison 6bc136b1d0 Add module for calculating diagrams required for HVP counter-terms 2017-12-13 17:31:01 +00:00
portelli 0887566134 Hadrons: scheduler back! 2017-12-13 16:36:15 +00:00
portelli 61fc50d616 Hadrons: better organisation of the VM 2017-12-13 13:44:23 +00:00
portelli a9c8d7dad0 Hadrons: code cleaning 2017-12-13 12:13:40 +00:00
portelli 259d504ef0 Hadrons: first full implementation of the module memory profiler 2017-12-12 19:32:58 +00:00
portelli f3a77f4b7f Merge branch 'feature/hadrons' into feature/hadrons-new-memory-model 2017-12-12 14:05:23 +00:00
portelli 26d7b829a0 Hadrons: error managed through expections 2017-12-12 14:04:28 +00:00
portelli 64161a8743 Hadrons: much simpler reference dependency 2017-12-12 13:08:01 +00:00
portelli 2401360784 Merge pull request #138 from guelpers/feature/hadrons
bug fix in sequential insertion of conserved vector current
2017-12-11 18:53:41 +01:00
Vera Guelpers 2cfb50cbe5 bug fix in sequential insertion of conserved vector current 2017-12-08 11:13:39 +00:00
portelli f9aa39e1c4 global memory debug through command line flag 2017-12-07 14:40:58 +01:00
portelli 0fbf445edd Hadrons: object creation that get properly captured by the memory profiler 2017-12-06 16:51:48 +01:00
portelli e78794688a memory profiler improvement 2017-12-06 16:50:25 +01:00
portelli 9e31307963 Merge branch 'feature/hadrons' into feature/hadrons-new-memory-model 2017-12-06 16:49:32 +01:00
portelli 29e2eddea8 Merge branch 'develop' into feature/hadrons-new-memory-model 2017-12-06 16:49:21 +01:00
portelli 0a038ea15a Merge branch 'develop' into feature/hadrons 2017-12-06 16:49:10 +01:00
portelli 62eb1f0e59 FermionOperator virtual destructor needed for polymorphism 2017-12-06 16:48:17 +01:00
portelli 5422251959 Hadrons: execution part moved in a new virtual machine class 2017-12-05 15:31:59 +01:00
paboyle 9579c9c327 Threading improvement 2017-12-05 14:12:22 +00:00
paboyle 3729c7a7a6 Clean up of test 2017-12-05 13:07:31 +00:00
paboyle c24d4c8d0e Improved parallel RNG init 2017-12-05 13:01:10 +00:00
paboyle a14038051f Improved AllToAll asserts 2017-12-05 11:43:25 +00:00
paboyle 3e560b9462 Faster RNG init 2017-12-05 11:42:05 +00:00
paboyle d93c6760ec Faster code for split unsplit 2017-12-05 11:39:26 +00:00
paboyle ae3b7713a9 Cold start doesnt need RNG 2017-12-05 11:36:31 +00:00
portelli cbd8fbe771 Merge branch 'feature/hadrons' into feature/hadrons-new-memory-model 2017-12-03 19:48:56 +01:00
portelli d391f05cb7 Merge branch 'develop' into feature/hadrons 2017-12-03 19:48:46 +01:00
portelli 3127b52c90 bootstrap script does not destroy Eigen is working offline 2017-12-03 19:48:34 +01:00
portelli 01f00385a4 Hadrons: genetic pair selection based on exponential probability 2017-12-03 19:47:40 +01:00
portelli 59aae5f5ec Hadrons: garbage collector clean temporaries 2017-12-03 19:47:11 +01:00
portelli 624246409c Hadrons: module setup/execute protected to forbid user to bypass execution control 2017-12-03 19:46:18 +01:00
portelli 2a9ebddad5 Hadrons: scheduler offline, minimal code working again 2017-12-03 19:45:15 +01:00
portelli ff7afe6e17 Merge branch 'feature/hadrons' into feature/hadrons-new-memory-model 2017-12-01 19:45:44 +00:00
portelli 33cb509d4b Merge branch 'develop' into feature/hadrons 2017-12-01 19:45:32 +00:00
portelli 456c78c233 Merge branch 'develop' into feature/hadrons-new-memory-model 2017-12-01 19:45:12 +00:00
portelli 2fd4989029 Merge branch 'develop' of github.com:paboyle/Grid into develop 2017-12-01 19:44:31 +00:00
portelli 2427a21428 minor serial IO fixes, XML now issues warning when trying to read absent nodes, these becomes 2017-12-01 19:44:07 +00:00
portelli 514993ed17 Hadrons: progress on the interface, genetic algorithm freezing 2017-12-01 19:38:23 +00:00
paboyle 28ceacec45 Split/Unsplit working 2017-11-27 15:13:29 +00:00
paboyle e6a3e375cf Debug 2017-11-27 15:10:22 +00:00
paboyle 4987edbd44 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-11-27 12:34:56 +00:00
paboyle ad140bb6e7 Clean on multinode target after split 1 1 2 4 -> 1 1 2 2 2017-11-27 12:34:25 +00:00
paboyle 1f04e56038 Believe split/unsplit works, but need to make pretty 2017-11-27 12:33:08 +00:00
paboyle 4bfc8c85c3 Clean up verbose communicator create 2017-11-27 12:32:37 +00:00
azusayamaguchi e55397bc13 Staggered CG 2017-11-24 14:18:30 +00:00
portelli a3fe874a5b Hadrons: everything is broken, repairing while implementing the new memory model 2017-11-22 23:27:19 +00:00
portelli f403ab0133 gitignore update 2017-11-22 17:13:09 +00:00
paboyle 94b8fb5686 Debug in progress 2017-11-19 01:39:04 +00:00
Guido Cossu 1f1d77b01a Performance metrics for the Scalar Action force term 2017-11-14 10:01:48 +00:00
pretidav 6a15e2e8ef Added WilsonTwoIndexAntiSymmImpl instantiation in WilsonKernelsHand.cc (should not be necessary) 2017-11-12 14:16:19 +01:00
portelli 074d17429f Merge branch 'develop' into feature/scalar_adjointFT
# Conflicts:
#	lib/communicator/Communicator_mpi3.cc
2017-11-11 18:09:55 +00:00
Peter Boyle 25f73018f4 Merge pull request #135 from fionnoh/develop
Declaring virtual functions as pure virtual functions.
2017-11-09 23:19:08 +00:00
fionnoh 1d7ccc6b2c Declaring virtual functions as pure virtual functions. 2017-11-09 19:46:57 +00:00
pretidav 59d9ccf70c restored WilsonKernelsHand.cc and added Qtop to production codes 2017-11-08 22:02:32 +01:00
Azusa Yamaguchi 1860b1698c Fixed the bug on MPI_T at Cam 2017-11-08 09:03:01 +00:00
Azusa Yamaguchi 9b8d1cc3da Staggered Schur decomposed matrix norm changed to not be the Schur anymore :(
Carleton wanted this for multimass / multishift
2017-11-07 14:48:45 +00:00
James Harrison 0c668bf46a QedFVol: Write to output files from one process only. 2017-11-07 14:46:39 +00:00
Guido Cossu 149c3f9e9c Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-11-07 14:01:13 +00:00
Guido Cossu c519aab19d Fixing the MPI memory leak in the communicators 2017-11-07 13:55:37 +00:00
paboyle 69929f20bb Destructor fix. Split Grid and MPI3 will not yet work without more effort from me. 2017-11-06 23:45:00 +00:00
James Harrison 840814c776 QedFVol: Patch to fix MPI communicators error 2017-11-06 16:34:55 +00:00
pretidav a493429218 added Production tests for MixedRep, Adj, 2S, 2AS. Still missing QObs. The HMC is not printing correctly all the actions and forces. 2017-11-04 18:16:54 +01:00
pretidav 915f610da0 clover 2indexSymm hmc production test created. clover 2indexAsymm and clover mixed to be filled. 2017-11-04 01:17:06 +01:00
pretidav c79606a5dc Test production code wilson clover. Still missing QObs measurement on-the-fly. 2017-11-03 22:46:32 +01:00
James Harrison 95af55128e QedFVol: Redo optimisation of scalar VP (extra memory requirements were not the problem), and undo optimisation of charged propagator (which seemed to be causing HDF5 errors, although I don’t know why). 2017-11-03 18:46:16 +00:00
James Harrison 9f2a57e334 QedFVol: Undo optimisation of scalar VP, to reduce memory requirements 2017-11-03 13:10:11 +00:00
James Harrison c645d33db5 QedFVol: Redo optimisation of charged propagator, and fix I/O bug 2017-11-03 10:59:26 +00:00
James Harrison e0f1349524 QedFVol: Undo optimisation of charged propagator 2017-11-03 09:22:41 +00:00
paboyle 360efd0088 Improved treatment of reverse asked for by chris.
Truncate the basis.
Power method renormalises
2017-11-02 22:05:31 +00:00
pretidav 7b42ac9982 added polyakov loop observable to the hmc 2017-11-02 21:58:16 +01:00
paboyle c5c647e35e Merge branch 'feature/lanczos-reorg' into develop 2017-11-02 15:23:11 +00:00
portelli a4e5fd1000 Merge branch 'feature/hadrons' into feature/hadrons-new-memory-model 2017-11-01 19:24:51 +00:00
portelli 682e7d7839 Merge branch 'develop' into feature/hadrons 2017-11-01 19:24:38 +00:00
Guido Cossu 8e057721a9 Anisotropic Clover term written and tested 2017-11-01 12:50:54 +00:00
Guido Cossu fa5e4add47 Added support for anisotropy to the WilsonFermion class 2017-10-31 18:20:38 +00:00
James Harrison 79b761f923 Merge branch 'develop' into feature/qed-fvol
# Conflicts:
#	lib/communicator/Communicator_base.cc
2017-10-30 15:53:18 +00:00
James Harrison 0d4e31ca58 QedFVol: Calculate phase factors for momentum projections once per configuration only. 2017-10-30 15:46:50 +00:00
James Harrison b07a354a33 QedFVol: output scalar propagator before FFT in spatial directions. 2017-10-30 14:20:44 +00:00
paboyle 27ea2afe86 No compile on comms == none fix 2017-10-30 01:14:11 +00:00
paboyle 78e8704eac Shaking out 2017-10-30 00:25:31 +00:00
paboyle 67131d82f2 Get subrank info from communicator constructor 2017-10-30 00:24:11 +00:00
paboyle 615a9448b9 Extended sub comm supported 2017-10-30 00:23:34 +00:00
paboyle 00164f5ce5 : 2017-10-30 00:22:52 +00:00
paboyle a7f72eb994 Shaking out 2017-10-30 00:22:06 +00:00
paboyle 501fa1614a Communicator updates for split grid 2017-10-30 00:16:12 +00:00
paboyle 5bf42e1e15 Update 2017-10-30 00:05:21 +00:00
paboyle fe4d9b003c More digits 2017-10-30 00:04:47 +00:00
paboyle 4a699b4da3 New rank can be found out 2017-10-30 00:04:14 +00:00
paboyle 689323f4ee Reverse dim ordering lexico support 2017-10-30 00:03:15 +00:00
Guido Cossu 749189fd72 Full clover force correct 2017-10-29 12:03:08 +00:00
Guido Cossu f941c4ee18 Clover term force ok 2017-10-29 11:43:33 +00:00
paboyle 84b441800f Merge branch 'develop' into feature/lanczos-reorg 2017-10-27 14:21:38 +01:00
paboyle 1ef424b139 Split grid Y2K bug fix attempt 2017-10-27 14:20:35 +01:00
paboyle aa66f41c69 Bug fix in the coarse restore...
Think this is nearly there
2017-10-27 10:29:34 +01:00
paboyle f96c800d25 Passes reload of coarse basis 2017-10-27 09:43:22 +01:00
paboyle 32a52d7583 Move the local coherence lanczos into algorithms.
Keep the I/O in the tester. Other people can copy this method to write other I/O formats.
2017-10-27 09:04:31 +01:00
paboyle fa04b6d3c2 Finished ? Verifying coarse evec restore 2017-10-27 08:18:29 +01:00
paboyle 7fab183c0e Better read test 2017-10-27 08:17:49 +01:00
paboyle 9ec9850bdb 64bit ftello update 2017-10-26 23:34:31 +01:00
paboyle 0c4ddaea0b Cleaning up 2017-10-26 23:31:46 +01:00
paboyle 00ebc150ad Mistake in string parse; interface is ambiguous and must fix. Is char * a file, or a XML buffer ? 2017-10-26 23:30:37 +01:00
paboyle 0f3e9ae57d Gsites error. Only appeared (so far) in I/O code for even odd fields 2017-10-26 23:29:59 +01:00
Azusa Yamaguchi 034de160bf Staggered updates : Schur fixed and added a unit test for Test_staggered_cg_schur.cc giving stronger check 2017-10-26 20:58:46 +01:00
Guido Cossu 76bcf6cd8c Deleting vscode settings file 2017-10-26 18:45:41 +01:00
Guido Cossu 91b8bf0613 Debugging force term 2017-10-26 18:23:55 +01:00
paboyle 14507fd6e4 Final? candidate for push back on the lanczos reorg feature 2017-10-26 16:25:01 +01:00
paboyle 2db05ac214 Test for split/unsplit in isolation 2017-10-26 07:48:03 +01:00
paboyle 31f99574fa Moving these out of algorithms 2017-10-26 07:47:42 +01:00
paboyle a34c8a2961 Update to IRL; getting close to the structure I would like. 2017-10-26 07:45:56 +01:00
paboyle ccd20df827 Better IRL interface 2017-10-26 01:59:59 +01:00
paboyle e9be293444 Better messaging 2017-10-26 01:59:30 +01:00
paboyle d577211cc3 Relax stopping condition 2017-10-25 23:57:54 +01:00
paboyle f4336e480a Faster converge time 2017-10-25 23:53:44 +01:00
paboyle e4d461cb03 Messaging 2017-10-25 23:53:19 +01:00
paboyle 3d63b4894e Use existing functionality where possible 2017-10-25 23:52:47 +01:00
paboyle 08583afaff Red black friendly coarsening 2017-10-25 23:51:18 +01:00
paboyle b395a312af Better error messaging 2017-10-25 23:50:37 +01:00
paboyle 66295b99aa Bit less verbose SciDAC IO 2017-10-25 23:50:05 +01:00
paboyle b8654be0ef 64 bit safe offsets 2017-10-25 23:49:23 +01:00
paboyle a479325349 Rewrite of local coherence lanczos 2017-10-25 23:48:47 +01:00
paboyle f6c3f6bf2d XML serialisation of parms and initialise from parms object 2017-10-25 23:47:59 +01:00
paboyle d83868fdbb Identity linear op added -- useful in circumstances where a linear op may or may not be needed.
Supply a trivial one if not needed
2017-10-25 23:47:10 +01:00
paboyle 303e0b927d Improvements for coarse grid compressed lanczos 2017-10-25 23:46:33 +01:00
paboyle 28ba8a0f48 Force spacing more nicely 2017-10-25 23:45:57 +01:00
Azusa Yamaguchi f9e28577f3 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-10-25 21:07:56 +01:00
Guido Cossu e0cae833da Merge branch 'develop' into feature/scalar_adjointFT 2017-10-25 10:49:50 +01:00
Guido Cossu 8a3aae98f6 Solving minor bug in compilation 2017-10-25 10:34:49 +01:00
Guido Cossu 8309f2364b Solving again the MPI comm bug with FFTs 2017-10-25 10:24:14 +01:00
Azusa Yamaguchi cac1750078 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-10-24 23:30:36 +01:00
Guido Cossu e17cd35151 Merge branch 'develop' into feature/scalar_adjointFT 2017-10-24 17:31:22 +01:00
Guido Cossu ccdec7a7ab Merge branch 'develop' into feature/clover 2017-10-24 16:51:14 +01:00
Guido Cossu 93642d813d Merging 2017-10-24 16:48:05 +01:00
Guido Cossu 0bc381f982 Merge pull request #133 from pretidav/feature/clover
Feature/clover
2017-10-24 15:15:21 +01:00
Guido Cossu 2986aa76f8 Restoring Perfcounts 2017-10-24 13:32:02 +01:00
Guido Cossu 657779374b Adding vscode to gitignore 2017-10-24 13:27:17 +01:00
Guido Cossu ec8cd11c1f Cleanup and prepare for pull request 2017-10-24 13:21:17 +01:00
Guido Cossu cbda4f66e0 Debug of the field strength 2017-10-24 10:20:13 +01:00
Guido Cossu 6579dd30ff More debug test 2017-10-23 18:47:00 +01:00
Guido Cossu 031c94e02e Debugging process for the clover term 2017-10-23 18:27:34 +01:00
Guido Cossu 6391b2a1d0 Added test for Wilson and Clover fermions 2017-10-23 14:42:35 +01:00
Guido Cossu 2e50b55ae4 Changes in the Makefile to compile against Chroma on Linux 2017-10-23 13:32:26 +01:00
James Harrison c433939795 QedFVol: Temporarily remove incomplete implementation of infinite-volume photon 2017-10-20 16:27:58 +01:00
James Harrison b6a4c31b48 Merge branch 'feature/qed-fvol' of https://github.com/jch1g10/Grid into feature/qed-fvol 2017-10-20 16:25:07 +01:00
James Harrison 98b1439ff9 QedFVol: pass arbitrary input values to photon constructor in UnitEm 2017-10-20 16:24:09 +01:00
Guido Cossu 27936900e6 Putting the FG verbosity in the Integrator level 2017-10-18 13:08:09 +01:00
James Harrison 564738b1ff Add module for unit EM field 2017-10-17 14:03:57 +01:00
Guido Cossu cd3e810d25 Merge branch 'develop' into feature/scalar_adjointFT 2017-10-17 11:31:14 +01:00
pretidav 317ddfedee updated test clover + first attempt derivative clove term (still missing spin part) 2017-10-16 02:47:33 +02:00
paboyle e325929851 All codes compile against the new Lanczos call signature 2017-10-13 14:02:43 +01:00
paboyle 47af3565f4 Logging improvement; reunified the Lanczos codes 2017-10-13 13:23:07 +01:00
paboyle 4b4d187935 Reunified the Lanczos implementations 2017-10-13 13:22:44 +01:00
paboyle 9aff354ab5 Final version prior to reunification 2017-10-13 13:22:26 +01:00
paboyle cb9ff20249 Approx tests and lanczos improvement 2017-10-13 11:30:50 +01:00
James Harrison a80e43dbcf Added infinite-volume photon in Photon.h (not checked yet) 2017-10-11 16:44:51 -04:00
paboyle 9fe6ac71ea Starting reorg of Blocked lanczos 2017-10-11 10:12:07 +01:00
portelli 5c392a6ecc Merge commit 'bf58557fb1ec710c766e19c9a8809b0a352de239' into feature/scalar_adjointFT 2017-10-10 17:14:56 +01:00
Azusa Yamaguchi f1fa00b71b Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-10-10 14:26:44 +01:00
paboyle bf58557fb1 Block compressed Lanczos 2017-10-10 14:15:11 +01:00
paboyle 10cb37f504 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-10-10 14:09:44 +01:00
Azusa Yamaguchi 1374c943d4 Correct Schur operator called 2017-10-10 13:59:50 +01:00
paboyle a1d80282ec cb factorise 2017-10-10 13:49:31 +01:00
paboyle 4eb8bbbebe Christop mods 2017-10-10 13:48:51 +01:00
paboyle d1c6288c5f Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-10-10 13:38:40 +01:00
Azusa Yamaguchi dd949bc428 Merge branch 'feature/staggering' into develop 2017-10-10 13:02:51 +01:00
Azusa Yamaguchi bb7378cfc3 Schur for staggered 2017-10-10 12:02:18 +01:00
Azusa Yamaguchi f0e084a88c Schur staggered 2017-10-10 10:00:43 +01:00
paboyle 153672d8ec Split CG testing 2017-10-09 23:20:58 +01:00
paboyle 08ca338875 Split grid communication 2017-10-09 23:19:45 +01:00
paboyle f7cbf82c04 Better stdout/err debug 2017-10-09 23:18:48 +01:00
paboyle 07009c569a Comms splitting improvements 2017-10-09 23:16:51 +01:00
Guido Cossu 15d690e9b9 Adding the cartesian communicator destructor 2017-10-09 09:59:58 +01:00
portelli 63b2bc1936 Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/qcd/action/fermion/FermionOperatorImpl.h
2017-10-05 14:16:23 +01:00
David Preti d810e8c8fb first attempt to write C terms in clover derivative. Some shifts to be fixed 2017-10-05 10:13:53 +02:00
Azusa Yamaguchi 09f4cdb11e Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2017-10-04 10:51:16 +01:00
Azusa Yamaguchi 1e54882f71 Stagger 2017-10-04 10:51:06 +01:00
Guido Cossu 27caff92c6 Merge branch 'feature/scalar_adjointFT' of https://github.com/paboyle/Grid into feature/scalar_adjointFT 2017-10-04 09:44:27 +01:00
portelli d38cee73bf Scalar: easier Fourier acceleration parametrisation through -D flags 2017-10-03 17:29:34 +01:00
portelli 8784f2a88d post-merge fix 2017-10-03 14:38:10 +01:00
portelli c497864b5d Merge commit 'd54807b8c0cd1a7658ff8563bb00d1137b987e3e' into feature/scalar_adjointFT
# Conflicts:
#	lib/communicator/Communicator_base.h
#	lib/communicator/Communicator_mpi.cc
#	lib/communicator/Communicator_mpit.cc
2017-10-03 14:27:54 +01:00
portelli 05c1c88440 Scalar: more action generalisation 2017-10-03 14:26:20 +01:00
paboyle d54807b8c0 MPIT works with split grid now 2017-10-02 23:14:56 +01:00
Guido Cossu f6ba2b95ce Merge branch 'develop' into feature/scalar_adjointFT 2017-10-02 15:19:20 +01:00
paboyle 5625b47c7d Merge branch 'feature/dwf-multirhs' into develop 2017-10-02 12:42:32 +01:00
paboyle 1edcf902b7 Macos ANON 2017-10-02 12:41:02 +01:00
paboyle e5c19e1fd7 RB constructor change 2017-10-02 12:25:52 +01:00
paboyle a11d0a33d1 Merge branch 'feature/dwf-multirhs' of https://github.com/paboyle/Grid into feature/dwf-multirhs 2017-10-02 11:42:07 +01:00
paboyle 4f8b6f26b4 Merge branch 'develop' into feature/dwf-multirhs 2017-10-02 11:41:49 +01:00
paboyle 073525c5b3 Small patch from cori 2017-10-02 03:38:21 -07:00
Azusa Yamaguchi eb6153080a Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2017-10-02 08:56:33 +01:00
Guido Cossu f7072d1ac2 Solving an annoying compilation error in json 2017-10-02 07:13:40 +01:00
portelli a021933002 Scalar: SU(N) action change to t'Hooft scaling 2017-09-29 16:09:34 +01:00
James Harrison b99622d9fb QedFVol: fix problem with JSON wanting gcc 4.9 2017-09-28 13:34:33 -04:00
portelli 937c77ead2 Merge branch 'develop' into feature/qed-fvol 2017-09-28 16:25:20 +01:00
portelli 95e5a2ade3 Merge pull request #116 from jch1g10/feature/qed-fvol
Feature/qed fvol
2017-09-25 15:08:33 +01:00
David Preti 56478d63a5 clover + test (valence) 2017-09-24 19:32:15 +02:00
portelli df21668f2c memory profiler update 2017-09-22 14:21:18 +01:00
Guido Cossu 482368e9de Merge branch 'develop' into feature/scalar_adjointFT 2017-09-21 13:44:08 +01:00
paboyle fddeb29d6b Bug fix with spreadout FFT 2017-09-21 11:10:08 +01:00
paboyle a9ec5cf564 Christoph bug report integrate 2017-09-21 10:32:41 +01:00
Peter Boyle 946a8671b9 Merge pull request #129 from djm2131/feature/eofa
Add support for DWF with the exact one flavor algorithm
2017-09-21 10:15:21 +01:00
Azusa Yamaguchi a6eeea777b Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-09-21 10:12:41 +01:00
Peter Boyle 771a1b8e79 Merge pull request #128 from paboyle/feature/CG-reliable-update
Feature/cg reliable update
2017-09-21 10:12:03 +01:00
Peter Boyle bfb68e6f02 Merge pull request #130 from giltirn/gparity-handunroll
Gparity handunroll
2017-09-21 10:11:00 +01:00
Azusa Yamaguchi 77f7737ccc Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-09-19 14:28:01 +01:00
Guido Cossu 9a827d0242 Fixing a compilation error 2017-09-18 14:55:51 +01:00
Guido Cossu 999c623590 Solving a memory leak in Communicator_mpi 2017-09-18 14:39:04 +01:00
paboyle 18c335198a Merge branch 'hotfix/dirac-ITT-fix1' into develop 2017-09-16 18:19:02 +01:00
paboyle f9df685cde Merge branch 'hotfix/dirac-ITT-fix1' 2017-09-16 18:18:48 +01:00
paboyle 17c5b0f152 Patching comparison point 2017-09-16 18:18:07 +01:00
paboyle 5918769f97 Subtle Naik term bug updated in Stencil; less on logical && with a function call on right 2017-09-16 12:51:26 +01:00
Guido Cossu b542d349b8 Minor cosmetic changes 2017-09-15 11:48:36 +01:00
Guido Cossu 91eaace19d Added support for FFT accelerated updates 2017-09-15 11:33:45 +01:00
Guido Cossu bbaf1ada91 Merge branch 'feature/json-fix' into develop 2017-09-08 16:02:08 +01:00
Guido Cossu 1950ac9294 Fixed the Intel compiler problem with the JSON classes 2017-09-08 15:18:59 +01:00
Guido Cossu 13fa70ac1a Merge branch 'develop' into feature/json-fix 2017-09-08 13:42:20 +01:00
Guido Cossu 7cb2b11f26 Fixing Intel compiler error for the JSON parser 2017-09-08 13:41:53 +01:00
Guido Cossu 1184ed29ae Merge pull request #124 from nmeyer-ur/feature/arm-neon
Added integer reduce functionality
2017-09-08 10:54:35 +02:00
paboyle 203c7bf6fa Merge branch 'hotfix/dirac-ITT-fix' into develop 2017-09-05 15:08:51 +01:00
paboyle c709883f3f Merge branch 'hotfix/dirac-ITT-fix' 2017-09-05 15:08:16 +01:00
paboyle aed5de4d50 Patching macos compile 2017-09-05 15:07:07 +01:00
paboyle ba27cc6571 Mac os happiness 2017-09-05 15:00:16 +01:00
paboyle d856327250 Merge branch 'release/dirac-ITT' into develop 2017-09-05 14:56:12 +01:00
paboyle d75369cb56 Merge branch 'release/dirac-ITT' 2017-09-05 14:55:54 +01:00
Peter Boyle bf973d0d56 SHM complete 2017-09-05 14:30:29 +01:00
Peter Boyle 837bf8a5be Updating to control the SHM allocation scheme under configure time options 2017-09-05 12:51:02 +01:00
Peter Boyle c05b2199f6 Improvements to huge memory 2017-09-04 10:41:21 -04:00
Azusa Yamaguchi a5fe07c077 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-09-04 14:10:15 +01:00
Azusa Yamaguchi b83b2b1415 Stability improvement to BCG. Force m_rr hermitian beyond rounding. 2017-09-04 14:09:47 +01:00
James Harrison 91676d1dda Fix “MAP_ANONYMOUS undefined” error on OSX. 2017-09-01 15:48:30 +01:00
Peter Boyle b331be9101 Better reporting 2017-08-31 11:32:57 +01:00
Peter Boyle 49c20a9fa8 Patch to reporting 2017-08-31 11:32:21 +01:00
paboyle 7359df3501 Full reporting for benchmark; save robustness factor 2017-08-31 10:42:35 +01:00
Christopher Kelly 59bd1fe21b Fix for 'perm' and 'local' not being set for hand-unrolled external-site Dslash, which caused incorrect behavior of G-parity kernel 2017-08-29 13:07:37 -07:00
portelli a56e3b40c4 Merge branch 'develop' into feature/hadrons 2017-08-29 11:03:53 -06:00
Nils Meyer 4e907fef2c Merge remote-tracking branch 'grid/develop' into feature/arm-neon 2017-08-29 17:47:36 +02:00
Christopher Kelly 67888b657f Merge branch 'gparity-handunroll' of https://github.com/giltirn/Grid into gparity-handunroll 2017-08-29 09:52:05 -04:00
Christopher Kelly 74af885d4e Removed some no-longer-needed associated with G-parity hand unrolled kernel 2017-08-29 09:50:37 -04:00
James Harrison ac3611bb19 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/qed-fvol 2017-08-29 11:53:37 +01:00
Christopher Kelly d36d2fb40d Added ability to override default Ls in Benchmark_dwf 2017-08-28 06:53:56 -07:00
Peter Boyle 5b9267e88d Cleaner comms benchmark treatment for one node runs 2017-08-27 18:24:48 -04:00
paboyle 15fd4003ef Improving presentation of results 2017-08-27 13:46:02 +01:00
paboyle 4b4c2a715b fcntl.h needed 2017-08-26 11:38:04 +01:00
paboyle 54a5e6c1d0 Check if we get huge pages on linux. Larry Meadows piece of magic. 2017-08-25 22:36:08 +01:00
paboyle 73aeca7dea Merge branch 'feature/multi-communicator' into develop 2017-08-25 21:55:09 +01:00
paboyle ad89abb018 Fix 2017-08-25 20:43:37 +01:00
paboyle 80c5bce5bb Merge branch 'develop' into feature/multi-communicator 2017-08-25 20:21:26 +01:00
paboyle f68b5de9c8 No compile fix on Clang 2017-08-25 19:35:21 +01:00
Peter Boyle d0f3d525d5 Optimal block size for KNL 2017-08-25 19:33:54 +01:00
Christopher Kelly f365a83fae In G-parity unrolled kernel, replaced calls to permute and exchange with run-time-evaluated permute type with explicit calls to appropriate underlying functions 2017-08-25 14:24:11 -04:00
Peter Boyle 3a58217405 Updated 2017-08-25 14:29:53 +01:00
Peter Boyle c289699d9a updated from cambridge mpi3 shakeout 2017-08-25 11:41:01 +01:00
Peter Boyle c3b1263e75 Benchmark prep 2017-08-25 09:25:54 +01:00
Christopher Kelly 34a9aeb331 Reduced number of if-statement evaluations in G-parity unrolled kernel 2017-08-24 13:53:50 -07:00
portelli 5846566728 Merge branch 'develop' into feature/hadrons 2017-08-24 18:20:52 +01:00
portelli 102ea9ae66 CI update 2017-08-24 18:17:09 +01:00
James Harrison cc4afb978d Fix bug in non-zero momentum projection 2017-08-24 17:31:44 +01:00
portelli 21b02760c3 Merge branch 'develop' into feature/hadrons 2017-08-24 17:05:45 +01:00
Peter Boyle 2bcb704af2 Merge pull request #121 from Lanny91/feature/hadrons
Feature/hadrons
2017-08-24 12:59:08 +01:00
paboyle 5fa386ddc9 FFT test compile fixed 2017-08-24 10:17:52 +01:00
Christopher Kelly edabb3577f Imported Benchmark_gparity 2017-08-23 16:54:06 -04:00
Christopher Kelly ce5df177ee Removed superfluous implementation of G-parity twist for hand-unrolled kernel from GparityWilsonImpl 2017-08-23 15:05:22 -04:00
Christopher Kelly a0bb8e5b46 Added hand-unrolled kernel implementations of all the other dslash precision / comms precision combinations with G-parity 2017-08-23 14:44:40 -04:00
Christopher Kelly 46f88e6d72 G-parity hand-unrolled intrinsics twist now uses one less permute and one less temporary 2017-08-23 13:21:10 -04:00
David Murphy dd8f1ea189 Vectorized Mobius EOFA Dperp + shift operation 2017-08-23 13:17:26 -04:00
Christopher Kelly b61835c1a5 Added inplace version of intrinsic G-parity twist to hand-unrolled kernel 2017-08-23 12:33:48 -04:00
Azusa Yamaguchi d9cd4f0273 Staggered multinode block cg debugged. Missing global sum.
Code stalls and resumes on KNL at Cambridge. Curious.

CG iterations 23ms each, then 3200 ms pauses. Mean bandwidth reports
as 200MB/s. Comms dominant in the report. However, the time behaviour suggests it
is *bursty*.... Could be swap to disk?
2017-08-23 15:07:18 +01:00
David Murphy 459f70e8d4 Check-in of working Mobius EOFA class and tests 2017-08-22 22:38:30 -04:00
Christopher Kelly 061e48fd73 Replaced slow unpack-repack in G-parity BC twist with intrinsics version 2017-08-22 18:12:12 -04:00
Christopher Kelly ab50145001 Implemented first, unoptimized version of hand-unrolled G-parity kernels
Improved Test_gparity
2017-08-22 17:12:25 -04:00
paboyle b49bec0cec MAP_HUGETLB portability fix 2017-08-20 03:08:54 +01:00
paboyle ae56e556c6 finalise issue on new OPA revert 2017-08-20 02:53:12 +01:00
paboyle 1cdf999668 Moving multicommunicator into mpi3 also for threading 2017-08-20 02:39:10 +01:00
paboyle 11062fb686 Comms none fail fix 2017-08-20 01:37:07 +01:00
paboyle 383ca7d392 Switch off comms for now until feature/multi-communicator is merged 2017-08-20 01:27:48 +01:00
paboyle a446d95c33 Trying to pass TeamCity and Travis 2017-08-20 01:10:50 +01:00
paboyle be66e7dd95 Merge branch 'develop' into feature/multi-communicator 2017-08-19 23:12:38 +01:00
paboyle 6d0d064a6c Update TODO 2017-08-19 23:11:30 +01:00
paboyle bfef525ed2 New benchmark prep 2017-08-19 23:10:12 +01:00
Peter Boyle 0b0cf62193 Fix mpi 3 interface change 2017-08-19 13:18:50 -04:00
Peter Boyle 7d88198387 Merge branch 'develop' into feature/multi-communicator 2017-08-19 13:03:35 -04:00
Peter Boyle 2f619482b8 Enable blocking stencil send 2017-08-19 12:53:59 -04:00
Peter Boyle d6472eda8d Use mmap 2017-08-19 12:53:18 -04:00
Peter Boyle 9e658de238 Use Vector 2017-08-19 12:52:44 -04:00
Peter Boyle bcefdd7c4e Align both allocator calls to 2MB 2017-08-19 12:49:02 -04:00
David Murphy 9d45fca8bc Implement MobiusEOFAFermioncache.cc 2017-08-17 23:45:36 -04:00
David Murphy ac9e6b63c0 More re-import of Mobius EOFA 2017-08-17 19:28:53 -04:00
David Murphy e140b3f802 Beginning to re-import Mobius EOFA 2017-08-16 23:36:23 -04:00
David Murphy d9d3d30cc7 Minor clean-up 2017-08-16 20:57:51 -04:00
David Murphy 47a12ec7b5 Implement EOFA pseudofermion force and Shamir tests for G-parity and non G-parity cases 2017-08-16 19:50:08 -04:00
David Murphy ec1e2f7a40 Add (mostly implemented) ExactOneFlavourRatio pseudofermion class and tests of Shamir heatbath and action 2017-08-16 12:38:59 -04:00
David Murphy 41f73ec083 Add ChronoForecast class for forecasting solutions across poles in the EOFA heatbath 2017-08-16 12:37:38 -04:00
Guido Cossu fd367d8bfd Debugging the PointerCache 2017-08-16 09:42:57 +01:00
David Murphy 6d0786ff9d Typo fixes and check-in of G-parity action test for DWF 2017-08-15 22:47:00 -04:00
David Murphy b7f93aeb4d Change CayleyFermion5D::SetCoefficientsInternal to virtual to allow overriding in derived EOFA classes 2017-08-15 14:18:51 -04:00
David Murphy 202a7fe900 Re-import DWF and abstract base EOFA fermion classes and tests 2017-08-15 13:36:08 -04:00
Guido Cossu 8d168ded4a Correction of the dagger version of the Clover 2017-08-15 10:50:44 +01:00
Guido Cossu 8a3fe60a27 Added more asserts at grid creation time 2017-08-08 11:36:20 +01:00
Guido Cossu 44051aecd1 Checking for integer divisions in cartesian full 2017-08-08 10:31:12 +01:00
Guido Cossu 06e6f8de00 Check that the reduced dim is an integer 2017-08-08 10:22:12 +01:00
Guido Cossu dbe4d7850c Make a test file compatible with all architectures 2017-08-06 10:49:45 +01:00
Guido Cossu 4fe182e5a7 Added high level HMC support for overriding default SIMD lane decomposition 2017-08-06 10:46:19 +01:00
Guido Cossu 75ee6cfc86 Debugging the Clover term 2017-08-04 16:08:07 +01:00
Guido Cossu fde71c3c52 Merge branch 'develop' into feature/clover 2017-08-04 12:19:57 +01:00
Guido Cossu 175f393f9d Binary IO error checking 2017-08-04 12:14:10 +01:00
Christopher Kelly 7d867a8134 Merge branch 'develop' into feature/CG-reliable-update 2017-08-02 09:48:04 -04:00
Christopher Kelly 9939b267d2 Added switching to fallback linear operator in reliable update CG, and added recalculation of b parameter on update. 2017-07-31 13:39:44 -04:00
Lanny91 323e9c439a Hadrons: Legal banner fixes 2017-07-31 12:26:34 +01:00
Lanny91 28396f1048 Merge branch 'feature/rare_kaon' of https://github.com/Lanny91/Grid into feature/hadrons 2017-07-31 12:19:54 +01:00
Lanny91 67b34e5789 Modified conserved current 5th dimension loop for compatibility with 5D vectorisation. 2017-07-31 11:35:01 +01:00
Peter Boyle 14d53e1c9e Threaded MPI calls patches 2017-07-29 13:08:10 -04:00
Guido Cossu 8bd869da37 Correcting a bug in the IO routines 2017-07-27 15:12:50 +01:00
Guido Cossu c7036f6717 Adding checks for libm and libstdc++ 2017-07-27 11:15:09 +01:00
Guido Cossu c0485d799d Explicit parameter declaration in the WilsonGauge test 2017-07-26 16:26:04 +01:00
Guido Cossu 7abc5613bd Added smearing to the topological charge observable 2017-07-26 16:21:17 +01:00
Guido Cossu 237cfd11ab Solving the spurious O2 flags 2017-07-26 12:08:51 +01:00
Guido Cossu a4b7dddb67 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-07-26 12:07:38 +01:00
Guido Cossu 5696781862 Debug error in Tensor mult 2017-07-26 12:07:34 +01:00
Christopher Kelly 8f4b3049cd Merge branch 'feature/CG-reliable-update' into ckelly_develop 2017-07-25 11:55:26 -04:00
Christopher Kelly 2a6e673a91 Merge branch 'develop' into feature/CG-reliable-update 2017-07-25 11:54:43 -04:00
Christopher Kelly 9b6cde173f Merge branch 'feature/CG-reliable-update' into ckelly_develop 2017-07-25 11:51:08 -04:00
Christopher Kelly 9f280b82c4 Added mixed-precision CG with reliable updates 2017-07-25 11:30:41 -04:00
portelli c3f0889eda Merge pull request #123 from giltirn/develop
Fix for 'using namespace' in lib/qcd/utils/GaugeFix.h
2017-07-25 11:32:02 -03:00
Nils Meyer 7a53dc3715 Added integer reduce functionality 2017-07-24 11:12:59 +02:00
Christopher Kelly 0f214ad427 Moved FourierAcceleratedGaugeFixer into Grid::QCD namespace and removed 'using namespace' directives from header 2017-07-21 11:13:51 -04:00
Peter Boyle fe4912880d Update README.md 2017-07-17 09:53:07 +01:00
Lanny91 875e1a841f Hadrons: updated Quark -> MFermion/GaugeProp module name in test. 2017-07-16 13:47:00 +01:00
Lanny91 0366288b1c Hadrons: added tests for 3pt contractions. 2017-07-16 13:45:55 +01:00
Lanny91 6293d438cd Hadrons: sink smearing compatibility for 3pt contraction modules. 2017-07-16 13:43:25 +01:00
Lanny91 852ade029a Hadrons: Added module to sink a propagator 2017-07-16 13:41:47 +01:00
Peter Boyle f038c6babe Update README.md 2017-07-14 22:59:16 +01:00
Peter Boyle 169f4b2711 Update README.md 2017-07-14 22:56:06 +01:00
Peter Boyle 2d8aff36fe Update README.md 2017-07-14 22:52:16 +01:00
Guido Cossu 9fa07eecde Merge branch 'develop' into feature/json-fix 2017-07-12 15:47:22 +01:00
azusayamaguchi 659d7d1a40 For test/solver
Fixed
2017-07-12 15:01:48 +01:00
Guido Cossu f64fb7bd77 Fix gcc error on JSON compilation 2017-07-12 14:55:42 +01:00
Guido Cossu 2a35449b91 Merge branch 'develop' into feature/json-fix 2017-07-12 14:47:00 +01:00
Guido Cossu 184af5bd05 Added support for std::pair in the JSON serialiser 2017-07-12 14:44:53 +01:00
Guido Cossu 097c9637ee Fixed the JSON parsing error 2017-07-11 14:31:57 +01:00
azusayamaguchi dc6f078246 fixed the header file for mpi3 2017-07-11 14:15:08 +01:00
Peter Boyle 8a4714a4a6 Update README.md 2017-07-09 00:11:54 +01:00
Peter Boyle 40e119c61c NUMA improvements worth preserving from AMD EPYC tests 2017-07-08 22:27:11 -04:00
Guido Cossu d9593c4b81 Merge branch 'develop' into feature/json-fix 2017-07-07 14:17:50 +01:00
paboyle ac740f73ce Works on Cori 2017-07-02 16:47:58 -07:00
paboyle 75dc7794b9 Working on Cori 2017-07-02 16:47:42 -07:00
paboyle dee68fc728 IO working multiple nodes again. Strategy of all nodes writing metadata is unsafe.
Only one rank should do this. must identify this rank. Means pass communicator to the
Objects.
2017-07-02 23:33:48 +01:00
paboyle a2d3643634 Merge branch 'feature/dwf-multirhs' of https://github.com/paboyle/Grid into feature/dwf-multirhs 2017-07-02 14:59:22 -07:00
paboyle 57002924bc NERSC shakeout of this 2017-07-02 14:58:30 -07:00
Peter Boyle 7b0237b081 Update README.md 2017-07-01 10:24:41 +01:00
Peter Boyle b68ad0cc0b Update README.md 2017-07-01 10:20:07 +01:00
Peter Boyle 37263fd9b1 Update README.md 2017-07-01 10:06:24 +01:00
Peter Boyle 3d09e3e9e0 Update README.md 2017-07-01 10:05:46 +01:00
Peter Boyle 1354b46338 Update README.md 2017-07-01 10:04:32 +01:00
Peter Boyle 251a97fe1b Update README.md 2017-07-01 09:55:36 +01:00
Peter Boyle e18929eaa0 Update README.md 2017-07-01 09:53:15 +01:00
Peter Boyle f3b0a92e71 Update README.md 2017-07-01 09:48:00 +01:00
Peter Boyle a0be3f7330 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-06-30 10:53:50 +01:00
Peter Boyle b5a6e4f1fd Best option for Xeon cache blocking set 2017-06-30 10:53:22 +01:00
Peter Boyle 7a788db3dc Guard first touch 2017-06-30 10:49:08 +01:00
Peter Boyle f20eceb6cd First touch once per page in a threaded loop 2017-06-30 10:48:27 +01:00
Peter Boyle 38325ebbc6 Interleave code path; not enabled 2017-06-30 10:23:51 +01:00
Peter Boyle b73bd151bb Switch off counters by default 2017-06-30 10:16:35 +01:00
Peter Boyle 694b305cab Update to reporting 2017-06-30 10:16:13 +01:00
Peter Boyle 2d3737a133 O3, KNL 2017-06-30 10:15:59 +01:00
Peter Boyle ac1f1838bc KNL only 2017-06-30 10:15:32 +01:00
Guido Cossu 09d09d0fe5 Update README.md 2017-06-29 11:48:11 +01:00
Guido Cossu bf630a6821 README file update 2017-06-29 11:42:25 +01:00
Guido Cossu 8859a151cc Small corrections to the NEON port 2017-06-29 11:30:29 +01:00
Guido Cossu 688a39cfd9 Merge pull request #114 from nmeyer-ur/feature/arm-neon
ARM neon intrinsics support
Guido: checked and approved
2017-06-29 09:57:17 +01:00
paboyle 6f5a5cd9b3 Improved threaded comms benchmark 2017-06-28 23:27:02 +01:00
Nils Meyer 0933aeefd4 corrected Grid_neon.h 2017-06-28 20:22:22 +02:00
Peter Boyle 322f61acee Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-06-28 15:30:35 +01:00
Peter Boyle 08e04b9676 Better benchmarks 2017-06-28 15:30:06 +01:00
portelli feaa2ac947 Merge branch 'feature/scalar-hmc-update' into develop 2017-06-28 12:46:18 +01:00
portelli 07de925127 minor scalar action fixes 2017-06-28 12:45:44 +01:00
Nils Meyer a9c816a268 moved file to correct folder 2017-06-27 21:39:15 +02:00
Nils Meyer e43a8b6b8a removed comments 2017-06-27 20:58:48 +02:00
Nils Meyer bf729766dd removed collision with QPX implementation 2017-06-27 20:32:24 +02:00
Guido Cossu dafb351d38 Merge pull request #120 from paboyle/feature/scalar-hmc-update
Scalar HMC update. 
I agree with the changes.
2017-06-27 16:23:14 +01:00
portelli 0b707b861c Merge branch 'develop' into feature/scalar-hmc-update 2017-06-27 14:40:05 +01:00
portelli 15e87a4607 HDF5 IO fix 2017-06-27 14:39:27 +01:00
portelli 7d7220cbd7 scalar: lambda/4! convention 2017-06-27 14:38:45 +01:00
Lanny91 7d2d5e8d3d Merge branch 'develop' of https://github.com/paboyle/Grid into feature/hadrons 2017-06-26 15:19:46 +01:00
paboyle 54e94360ad Experimental: Multiple communicators to see if we can avoid thread locks in --enable-comms=mpit 2017-06-24 23:10:24 +01:00
portelli 0af740dc15 minor scalar HMC code improvement 2017-06-24 23:04:05 +01:00
portelli d2e8372df3 SU(N) algebra fix (was not working) 2017-06-24 23:03:39 +01:00
paboyle 869b99ec1e Threaded calls to multiple communicators 2017-06-24 10:55:54 +01:00
paboyle 4a29ab0d0a Merge branch 'feature/dwf-multirhs' of https://github.com/paboyle/Grid into feature/dwf-multirhs 2017-06-23 23:10:43 +01:00
paboyle 0165bcb58e Added an update to TODO list 2017-06-23 23:10:24 +01:00
Lanny91 deca1ecc50 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/rare_kaon 2017-06-23 19:35:19 +02:00
portelli 4372d04ad4 Merge pull request #118 from Lanny91/hotfix/bgq
Hotfix/bgq
2017-06-23 16:59:27 +01:00
paboyle 349d75e483 Precision fix 2017-06-23 02:57:59 -07:00
Lanny91 56abbdf4c2 AVX512 integer reduce fix (for non-intel compiler) 2017-06-23 11:09:14 +02:00
Lanny91 af71c63f4c AVX2 fix 2017-06-23 11:03:12 +02:00
paboyle e51475703a Ticking off lots on the TODO list 2017-06-23 09:42:21 +01:00
paboyle 1feddf4ba6 const fixes 2017-06-22 19:32:41 +01:00
paboyle 600d7ddc2e Proof of concept : Multi RHS solver, running independent solves on different ranks 2017-06-22 18:54:34 +01:00
paboyle e504260f3d Able to run a test job splitting into multiple MPI subdomains. 2017-06-22 18:53:11 +01:00
Lanny91 0440d4ce66 Merge branch 'develop' of https://github.com/paboyle/Grid into hotfix/bgq 2017-06-22 17:09:42 +02:00
Lanny91 08b0e472aa Fixed hadrons tests after merge 2017-06-22 16:34:33 +02:00
Lanny91 c11d69787e Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon
# Conflicts:
#	extras/Hadrons/Modules.hpp
#	extras/Hadrons/Modules/MFermion/GaugeProp.hpp
#	extras/Hadrons/modules.inc
#	tests/hadrons/Test_hadrons.hpp
#	tests/hadrons/Test_hadrons_meson_3pt.cc
2017-06-22 16:26:31 +02:00
Lanny91 dc6b2d30d2 Documentation fix 2017-06-22 16:09:45 +02:00
Lanny91 7a3bd5c66c Hadrons: new conserved current contraction test (for regression testing) 2017-06-22 16:06:15 +02:00
Lanny91 18211eb5b1 Hadrons: Fixed test to use new implementation of meson module. 2017-06-22 16:03:59 +02:00
Lanny91 863bb2ad10 Moving overly-specialised code out of Grid 2017-06-22 16:02:15 +02:00
paboyle 5e4bea8f20 Benchmark DWF works 2017-06-22 08:38:54 +01:00
paboyle 6ebf9f15b7 Splitting communicators first cut 2017-06-22 08:14:34 +01:00
paboyle 1d7aa673a4 Include BlockCG by default 2017-06-21 21:08:53 +01:00
paboyle b9104f3072 Block CG 2017-06-21 21:08:03 +01:00
portelli b22eab8c8b Merge commit 'a7d56523abee6c9030fdd9303c79954897b1086f' into feature/hadrons 2017-06-21 18:32:48 +01:00
paboyle a7d56523ab Merge branch 'feature/lanczos-simplify' into develop 2017-06-21 14:03:20 +01:00
paboyle 9e56c65730 Updated TODO list 2017-06-21 14:02:58 +01:00
paboyle ef4f2b8c41 todo update 2017-06-21 09:22:20 +01:00
paboyle e8b95bd35b Clean up finished. Could shrink Lanczos to around 400 lines at a push 2017-06-21 02:50:09 +01:00
paboyle 7e35286860 Simplified lanczos, added Eigen diagonalisation.
Curious if we can deprecate the dependency on BLAS.
Will see when we get 48^3 running on our BG/Q port
2017-06-21 02:26:03 +01:00
paboyle 0486ff8e79 Improved the Lanczos 2017-06-20 18:46:01 +01:00
portelli 1e8a2e1621 various compatibility fixes after merge 2017-06-20 17:24:55 +01:00
portelli 7587df831a Merge branch 'develop' into feature/hadrons
# Conflicts:
#	lib/qcd/action/scalar/ScalarImpl.h
2017-06-20 15:50:39 +01:00
Azusa Yamaguchi e9cc21900f Block solver complete for staggered. Now stable on mass 0.003 and
gives 8x (!) speed up on Haswell laptop vs. standard CG for 8 RHS solves.

166 iterations vs. 537 iterations so algorithmic gain + 2x in flop rate gain.

Better than a slap in the face with a wet kipper.
2017-06-20 12:37:41 +01:00
Azusa Yamaguchi 0a8faac271 Fix make tests compile 2017-06-19 22:54:18 +01:00
Azusa Yamaguchi abc4de0fd2 No compile make tests fix 2017-06-19 22:03:03 +01:00
portelli b672717096 Test_serialisation update for JSON 2017-06-19 14:38:39 +01:00
portelli 284ee194b1 JSON update 2017-06-19 14:38:15 +01:00
Azusa Yamaguchi cfe3cd76d1 Block solver improvements 2017-06-19 14:04:21 +01:00
Azusa Yamaguchi 3fa5e3109f Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-06-19 14:01:44 +01:00
paboyle 8b7049f737 Improved detection of usqcdInfo for plaq/linktr 2017-06-19 08:46:07 +01:00
paboyle c85024683e Merge branch 'feature/parallelio' into develop 2017-06-19 01:39:48 +01:00
paboyle 1300b0b04b Update to enable multiple records per file more consistent with SciDAC.
open, close, write records...
2017-06-19 01:01:48 +01:00
paboyle e6d984b484 ILDG tests 2017-06-18 00:13:22 +01:00
paboyle 1d18d95d4f Class name return 2017-06-18 00:13:03 +01:00
paboyle ae39ec85a3 ComplexField defined 2017-06-18 00:12:48 +01:00
paboyle b96daf53a0 Query tensor structures 2017-06-18 00:12:15 +01:00
paboyle 46879e1658 Complex defined in Impl even for gauge. 2017-06-18 00:11:45 +01:00
paboyle ae4de94798 SciDAC I/O support 2017-06-18 00:11:23 +01:00
paboyle 0ab555b4f5 SciDAC I/O and ILDG improvements 2017-06-18 00:11:02 +01:00
paboyle 8e9be9f84f Updates for SciDAC IO 2017-06-18 00:10:42 +01:00
paboyle d572170170 Update for SciDAC 2017-06-18 00:10:20 +01:00
portelli 81b18f843a Merge branch 'feature/scalar_adjointFT' into feature/hadrons
# Conflicts:
#	lib/qcd/action/scalar/ScalarImpl.h
2017-06-16 17:59:55 +01:00
Lanny91 1bd311ba9c Faster sequential conserved current implementation, now compatible with 5D vectorisation & G-parity. 2017-06-16 16:43:15 +01:00
Lanny91 41af8c12d7 Code cleaning for conserved current contractions. Will now be easier to implement mobius conserved current. 2017-06-16 16:38:59 +01:00
Lanny91 a833f88c32 Added missing SIMD integer reduction implementation for AVX, AVX-512, SSE4, IMCI 2017-06-16 15:58:47 +01:00
Lanny91 07b2c1b253 Placeholder precision change functions to allow Grid to compile with QPX (warning: no actual functionality) 2017-06-16 15:04:26 +01:00
Lanny91 735cbdb983 QPX Integer reduction (+ integer reduction test) 2017-06-14 10:55:10 +01:00
Lanny91 2ad54c5a02 QPX exchange support 2017-06-14 10:53:39 +01:00
paboyle 12ccc73cf5 Serialisation no compile fix 2017-06-14 05:19:17 +01:00
Nils Meyer 3d04dc33c6 ARM neon intrinsics support 2017-06-13 13:26:59 +02:00
paboyle e7564f8330 Starting a test for reading an ILDG file. 2017-06-13 12:22:50 +01:00
paboyle 91199a8ea0 openmpi is not const safe 2017-06-13 12:21:29 +01:00
paboyle 0494feec98 Libz dependency 2017-06-13 12:00:23 +01:00
paboyle a16b1e134e gcc 4.9 fix 2017-06-13 10:48:43 +01:00
James Harrison 20e92a7009 QedFVol: Allow output of scalar propagator and vacuum polarisation projected to arbitrary lattice momentum, not just zero momentum. 2017-06-12 18:27:32 +01:00
Lanny91 5633a2db20 Faster implementation of conserved current site contraction. Added 5D vectorised support, but not G-parity. 2017-06-12 10:41:02 +01:00
Lanny91 2d433ba307 Changed header include guards to match new convention 2017-06-12 10:32:14 +01:00
paboyle 769ad578f5 Odd new error on G++ 4.9 on Travis 2017-06-12 00:41:21 +01:00
paboyle eaac0044b5 Compile fixes 2017-06-12 00:20:49 +01:00
paboyle 56042f002c New files 2017-06-11 23:19:20 +01:00
paboyle 3bfd1f13e6 I/O improvements 2017-06-11 23:14:10 +01:00
James Harrison 42f0afcbfa QedFVol: Output all scalar VP diagrams separately 2017-06-09 18:08:40 +01:00
Azusa Yamaguchi 70ab598c96 Move gfix into utils 2017-06-08 22:22:23 +01:00
Azusa Yamaguchi 1d0ca65e28 Move Gfix into utils 2017-06-08 22:21:50 +01:00
Azusa Yamaguchi 2bc4d0a20e Move code into utils 2017-06-08 22:21:25 +01:00
James Harrison 20ac13fdf3 QedFVol: add ChargedProp as an input to ScalarVP module, instead of calculating scalar propagator within ScalarVP. 2017-06-08 17:43:39 +01:00
portelli 2490816297 Hadrons: rare kaon program removed 2017-06-07 20:11:02 -05:00
portelli 5f55bca378 Hadrons: Quark module renamed MFermion::GaugeProp 2017-06-07 20:10:48 -05:00
James Harrison e38612e6fa QedFVol: Update ScalarVP module for compatibility with new scalar action 2017-06-07 17:42:00 +01:00
James Harrison c2b2b71c5d Merge branch 'feature/qed-fvol' of https://github.com/paboyle/Grid into feature/qed-fvol
# Conflicts:
#	extras/Hadrons/Modules.hpp
#	extras/Hadrons/modules.inc
2017-06-07 16:59:47 +01:00
James Harrison 009f48a904 QedFVol: Add missing factor of 2 in free vacuum polarisation 2017-06-07 16:34:09 +01:00
Lanny91 b8e45ae490 Fixed remaining fermion type aliases after merge. 2017-06-07 16:26:22 +01:00
Lanny91 b35fc4e7f9 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon
# Conflicts:
#	extras/Hadrons/Global.hpp
#	tests/hadrons/Test_hadrons_rarekaon.cc
2017-06-07 14:38:51 +01:00
Lanny91 60f11bfd72 Removed redundant test module 2017-06-07 12:34:47 +01:00
portelli f6aa82b7f2 Merge branch 'develop' into feature/hadrons 2017-06-06 11:46:33 -05:00
portelli 22749699a3 Fixes after merge and point sink module 2017-06-06 11:45:30 -05:00
Lanny91 8d442b502d Sequential current fix for spatial indices. 2017-06-06 17:06:40 +01:00
Lanny91 e5c8b7369e Boundary condition option in quark actions for hadrons tests. 2017-06-06 14:19:10 +01:00
portelli 0503c028be Merge branch 'feature/qed-fvol' into feature/hadrons (non-trivial conflicts on scalar Impl)
# Conflicts:
#	configure.ac
#	lib/qcd/action/scalar/Scalar.h
2017-06-05 16:37:47 -05:00
Lanny91 c504b4dbad Code cleaning 2017-06-05 15:56:43 +01:00
Lanny91 622a21bec6 Improvements to sequential conserved current test and small bugfix. 2017-06-05 15:55:32 +01:00
Lanny91 eec79e0a1e Ward Identity test improvements and conserved current bug fixes 2017-06-05 11:55:41 +01:00
paboyle 092dcd4e04 MPI I/O only if MPI compiled 2017-06-02 22:50:25 +01:00
Guido Cossu 4a8c4ccfba Test wilson flow, added maxTau for adaptive flow 2017-06-02 17:02:29 +01:00
Guido Cossu 9b44189d5a Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-06-02 16:56:00 +01:00
Guido Cossu 7da4856e8e Wilson flow with adaptive steps 2017-06-02 16:55:53 +01:00
Guido Cossu aaf1e33a77 Adding adaptive integration in the WilsonFlow 2017-06-02 16:32:35 +01:00
paboyle 094c3d091a Improved; RNGs now survive checkpoint 2017-06-02 00:38:58 +01:00
Peter Boyle 4b98e524a0 Roll over to MPI version of I/O 2017-06-01 17:38:18 -04:00
Peter Boyle 1a1f6d55f9 Roll over to MPI IO for parallel IO 2017-06-01 17:37:26 -04:00
Peter Boyle 21421656ab Big changes improving the code to use MPI IO 2017-06-01 17:36:53 -04:00
Peter Boyle 6f687a67cd As local vols increase, use 64 bits for safety 2017-06-01 17:36:18 -04:00
paboyle b30754e762 Merge branch 'feature/parallelio' of https://github.com/paboyle/Grid into feature/parallelio 2017-05-30 23:41:28 +01:00
paboyle 1e429a0d57 Added MPI version 2017-05-30 23:41:07 +01:00
paboyle d38a4de36c Beginning move to MPI IO 2017-05-30 23:40:39 +01:00
paboyle ef1b7db374 Diff comparison check 2017-05-30 23:40:11 +01:00
paboyle 53a9aeb965 Cosmetic only 2017-05-30 23:39:53 +01:00
paboyle e30fa9f4b8 RankCount; need to clean up ambiguous ProcessCount 2017-05-30 23:39:16 +01:00
paboyle 58e8d0a10d reverse direction lexico mapping 2017-05-30 23:38:30 +01:00
paboyle 62cf9cf638 Cleaner code 2017-05-30 23:38:02 +01:00
paboyle 0fb458879d Precision safe compile 2017-05-30 23:37:02 +01:00
Peter Boyle 725c513d94 Better MPI3 benchmarking 2017-05-29 16:47:32 -04:00
portelli d8648307ff Merge branch 'develop' into feature/hadrons 2017-05-29 12:58:08 +01:00
portelli 064315c00b Hadrons: mesons gamma list fix 2017-05-29 12:57:33 +01:00
Guido Cossu 7c6cc85df6 Updating WilsonFlow test 2017-05-27 18:03:49 +01:00
Guido Cossu a6691ef87c Merge pull request #110 from Lanny91/feature/hadrons
Hadrons: Fermion boundary conditions can now be set in measurement code.
2017-05-26 16:43:22 +01:00
Lanny91 23135aa58a Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon 2017-05-26 16:00:50 +01:00
Lanny91 8e0ced627a Hadrons: Fermion boundary conditions can now be set in measurement code. 2017-05-26 15:59:15 +01:00
Guido Cossu 0de314870d Faster derivative for WilsonGauge 2017-05-26 14:31:49 +01:00
Guido Cossu ffb91e53d2 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-26 12:46:02 +01:00
Guido Cossu f4e8bf2858 Fixing the topological charge. Wilson Flow tested, ok 2017-05-26 12:45:59 +01:00
portelli a74c34315c Bootstrap script fix 2017-05-25 14:27:49 +01:00
paboyle 69470ccc10 Update to do list 2017-05-25 13:41:26 +01:00
paboyle b8b5934193 Attempts to speed up the parallel IO 2017-05-25 13:32:24 +01:00
Guido Cossu 75856f2945 Compilation fix in the Tensor_exp 2017-05-25 12:44:56 +01:00
Guido Cossu 3c112a7a25 Small correction to the general exp definition 2017-05-25 12:09:00 +01:00
Guido Cossu ab3596d4d3 Using Cayley-Hamilton form for the exponential of SU(3) matrices 2017-05-25 12:07:47 +01:00
paboyle a8c10b1933 Use a global-X x Local-Y chunksize for parallel binary I/O.
Gives O(32 x 8 x 18*8*8) chunk size on configuration I/O.

At 150KB should be getting close to packet sizes and 4MB filesystem
block sizes that are reasonably (!?) performant. We shall see once I move
this off my laptop and over to BNL and time it.
2017-05-25 11:43:33 +01:00
Guido Cossu 15e801af3f Fixing a compilation error for generic SIMD 2017-05-19 16:39:36 +01:00
Guido Cossu 0ffc235741 Adding more statistics to the Benchmark_comms. Min and max 2017-05-19 10:55:04 +01:00
Guido Cossu 8e19c99c7d Adding more statistical info in the Benchmark_comms 2017-05-18 19:07:35 +01:00
Guido Cossu a0bc0ad06f Reverting change in Benchmark_comms. Keeping 300 iterations 2017-05-18 17:48:11 +01:00
Guido Cossu a8fb2835ca Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-18 14:45:00 +01:00
Guido Cossu bc862ce3ab Fixing an allocation issue in Benchmark_comms 2017-05-18 14:44:56 +01:00
Lanny91 08b314fd0f Hadrons: conserved current test fixes. Axial current tests now also optional. 2017-05-18 13:16:14 +01:00
portelli 22f4feee7b Merge branch 'develop' into feature/scalar_adjointFT 2017-05-17 13:27:13 +02:00
portelli 3f858d6755 Scalar: phi^2 observable 2017-05-17 13:25:14 +02:00
paboyle 3267683e22 Union workaround for g++ 2017-05-17 11:26:18 +01:00
Azusa Yamaguchi f46a67ffb3 No compile issue on clang on mac fixed.
Compiler version was clang++-3.9 under mpicxx
2017-05-17 10:51:01 +01:00
paboyle f7b8383ef5 Half precision comms mixed prec test 2017-05-16 14:52:51 +01:00
Guido Cossu 10f2872aae Faster exponentiation for lattice fields 2017-05-15 15:51:16 +01:00
Lanny91 34332fe393 Improvement to sequential conserved current insertion tests 2017-05-12 16:30:43 +01:00
Lanny91 c2010f21ab Added sequential propagator test for gamma matrix insertion 2017-05-12 16:23:01 +01:00
Lanny91 98f610ce53 Reduced code duplication in hadron tests 2017-05-12 16:15:26 +01:00
Lanny91 d44cc204d1 Added test module for sequential gamma matrix insertion 2017-05-12 14:58:17 +01:00
portelli 35fa3d1dfd Merge branch 'master' into feature/scalar_adjointFT 2017-05-12 10:41:39 +01:00
paboyle cd73897b8d Merge branch 'release/v0.7.0' into develop 2017-05-12 01:16:02 +01:00
paboyle c4435e6beb Merge branch 'release/v0.7.0' 2017-05-12 01:15:59 +01:00
paboyle 7a8f6af5f8 Drop verbose compiler predefine check 2017-05-11 12:48:40 +01:00
paboyle 49a5d9bac7 Clang major, minor trailing underscore 2017-05-11 12:25:02 +01:00
paboyle 2b3fdd4a58 Print CXX predefines 2017-05-11 12:05:50 +01:00
paboyle 34502ec471 4.8 dropped as buggy. 2017-05-11 11:43:39 +01:00
paboyle 8a43e88b4f Compiler check early in build 2017-05-11 11:43:06 +01:00
portelli d1ece74137 HMC scalar test: magnetisation measurement 2017-05-11 11:40:44 +01:00
paboyle 238df20370 Still working on the compiler compat checks 2017-05-11 11:30:14 +01:00
paboyle 97a32a6145 Add 4.8 test 2017-05-11 11:24:21 +01:00
paboyle 655492a443 Compiler detection 2017-05-11 11:21:11 +01:00
paboyle 1cab06f6bd Compat checks for compilers 2017-05-11 10:20:24 +01:00
portelli 43c817cc67 Scalar action: const fix 2017-05-11 00:07:17 +01:00
paboyle f8024c262b Update Eigen 2017-05-10 13:30:09 +01:00
Guido Cossu 4cc5f01f4a Small change in the readme about the intel compiler 2017-05-09 15:38:59 +01:00
James Harrison 5cfc0180aa QedFVol: Output free VP along with charged VP. 2017-05-09 12:46:57 +01:00
James Harrison 914f180fa3 QedFVol: Implement exact O(alpha) vacuum polarisation. 2017-05-09 11:46:25 +01:00
Guido Cossu 9c12c37aaf Confirming the fix on the complex boundary conditions 2017-05-09 08:41:29 +01:00
Guido Cossu 806eaa0530 Adding back the IO tests in the list 2017-05-08 22:26:44 +01:00
Guido Cossu 01d0e54594 Merge branch 'release/v0.7.0' into develop 2017-05-08 22:02:51 +01:00
Guido Cossu 5aafa335fe Fixing JSON error for complex numbers 2017-05-08 21:56:44 +01:00
Guido Cossu 8ba0494485 Fixing JSON for complex numbers 2017-05-08 21:41:39 +01:00
Peter Boyle d99d98d9fd Merge branch 'release/v0.7.0' of https://github.com/paboyle/Grid into release/v0.7.0 2017-05-08 15:08:20 -04:00
Peter Boyle 95a017a4ae Relax force constraints to pass in single precision. 2017-05-08 15:06:41 -04:00
paboyle 92f92379e6 Adding Oliver's test version 2017-05-08 18:42:19 +01:00
paboyle 529e78d43f Restart the v0.7.0 release 2017-05-08 18:20:04 +01:00
paboyle 4ec746d262 Merge branch 'release/v0.7.0' into develop 2017-05-06 18:43:03 +01:00
paboyle 51bf1501fc Merge branch 'release/v0.7.0' 2017-05-06 18:42:50 +01:00
paboyle 66d819c054 More info on gcc bug 2017-05-06 18:42:11 +01:00
paboyle 3f3686f869 formatting 2017-05-06 18:41:27 +01:00
paboyle 26bb829f8c Formatting 2017-05-06 18:40:55 +01:00
paboyle 67cb04fc66 README update 2017-05-06 18:39:54 +01:00
paboyle a40bd68aed Version update 2017-05-06 17:00:14 +01:00
paboyle 36495e0fd2 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-06 16:39:27 +01:00
paboyle 93f6c15772 Warning squash 2017-05-06 16:38:58 +01:00
Peter Boyle cb93eeff21 Update README 2017-05-06 16:28:12 +01:00
paboyle c7cc7e6101 Fix 2017-05-06 16:10:09 +01:00
paboyle c349aa6511 DEFINE warning elimination 2017-05-06 16:08:35 +01:00
paboyle 3bae0a2d5c Drop a gcc warning 2017-05-06 15:51:42 +01:00
paboyle c1c7566089 GCC bug workaround for 5.0 through 6.2 inclusive. 2017-05-06 15:20:25 +01:00
paboyle 2439999ec8 Warning elimination; drop to -O2 on G++ bad versions 2017-05-06 14:44:49 +01:00
paboyle 1d96f662e3 Fixed 4d fermion gparity force. Put strong tests on make check force tests 2017-05-06 00:46:31 +01:00
paboyle 41d1889941 trusty ubuntu 2017-05-05 21:25:35 +01:00
paboyle 0c3981e0c3 Trying to force recent automake 2017-05-05 21:15:22 +01:00
paboyle c727bd4609 Trying to work around automake version 2017-05-05 21:00:00 +01:00
paboyle db23749b67 Adding travis to make check 2017-05-05 20:42:08 +01:00
paboyle 751f2b9703 Better check and benchmark driving 2017-05-05 19:54:38 +01:00
Guido Cossu 741bc836f6 Exposing support for Ncolours and Ndimensions and JSON input file for the ScalarAction 2017-05-05 17:36:43 +01:00
James Harrison 6cb563a40c QedFVol: Access HVP tensor using a vector<vector<ScalarField>> instead of vector<vector<ScalarField*>> 2017-05-05 17:12:41 +01:00
paboyle 697c0603ce SITMO I/O for NERSC working now bit repro 2017-05-05 16:54:44 +01:00
paboyle 14bedebb11 Source pointed to 2017-05-05 16:17:27 +01:00
Guido Cossu 8546d01a4c Merge branch 'develop' into feature/scalar_adjointFT 2017-05-05 15:47:33 +01:00
paboyle 47b5c07ffb Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-05 14:27:02 +01:00
Guido Cossu da86a2bf54 Merge branch 'feature/hmc_generalise' into develop 2017-05-05 14:23:02 +01:00
paboyle c1cb60a0b3 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-05 14:22:37 +01:00
Guido Cossu 5ed5b4bfbf Merge branch 'develop' into feature/hmc_generalise 2017-05-05 14:22:33 +01:00
Guido Cossu de84aacdfd Fixing a configure error for the smearing tests 2017-05-05 13:59:10 +01:00
paboyle 2888003765 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-05 13:02:24 +01:00
paboyle da06bf5b95 Zmobius force test added 2017-05-05 12:52:45 +01:00
Guido Cossu 20999c1370 Merge branch 'develop' into feature/hmc_generalise 2017-05-05 12:47:17 +01:00
Lanny91 77e0af9c2e Compilation fix after merge - conserved current code not yet operational for vectorised 5D or Gparity Impl. 2017-05-05 12:27:50 +01:00
paboyle 33f0ed1a33 No compile fix 2017-05-05 11:04:30 +01:00
paboyle 50be56433b Delete old and defunct tests 2017-05-04 23:41:16 +01:00
paboyle 43924007db Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-04 19:53:41 +01:00
paboyle 78ef10e60f Mobius force improvement 2017-05-04 19:53:21 +01:00
Lanny91 ca1077c560 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/rare_kaon
# Conflicts:
#	lib/qcd/action/fermion/WilsonFermion5D.cc
#	tests/hadrons/Test_hadrons_rarekaon.cc
2017-05-04 16:22:33 +01:00
portelli 679ae98b14 Merge branch 'feature/better-external-library' into develop 2017-05-04 15:42:12 +01:00
paboyle 90f6bc16bb No compile clang fix 2017-05-04 12:15:06 +01:00
Peter Boyle 9b5b639546 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-05-03 20:51:40 -04:00
Peter Boyle 945767c6d8 More info 2017-05-03 20:26:35 -04:00
Peter Boyle 422cdf4979 Some checks 2017-05-03 18:37:38 -04:00
Peter Boyle 38db174f3b Print statement 2017-05-03 18:25:26 -04:00
Peter Boyle 92e364a35f Better reporting in benchmark for MPI3 2017-05-03 15:43:36 -04:00
James Harrison db3837be22 QedFVol: Change “double” to “Real” in ScalarVP.cc 2017-05-03 13:26:49 +01:00
James Harrison 2f0dd83016 Calculate HVP using a single contraction of O(alpha) charged propagators. 2017-05-03 12:53:41 +01:00
portelli 58299b8ba2 Git info separated from version in git-config 2017-05-02 20:04:41 +01:00
portelli 124bf4d829 git ref in config summary 2017-05-02 19:41:01 +01:00
portelli e8e56b3414 Config summary saved in git-config 2017-05-02 19:40:47 +01:00
portelli 89c430136d grid-config program 2017-05-02 19:13:13 +01:00
portelli ea9aef7baa New header for standard headers (was an issue with Remez.h and external compilation) 2017-05-02 18:26:11 +01:00
portelli c9e9e8061d Merge branch 'feature/hadrons' into develop 2017-05-02 18:23:47 +01:00
Guido Cossu 453cf2a1c6 Moving the topological charge outside the HMC related routines 2017-05-02 14:40:12 +01:00
Guido Cossu de7bbfa5f9 Adding ParameterFile option for the HMC 2017-05-02 12:16:16 +01:00
portelli dda8d77c87 Merge branch 'feature/hadrons' into feature/rare_kaon 2017-05-01 17:50:57 +01:00
portelli aa29f4346a Hadrons: weird bus error with recent macOS clang 2017-05-01 17:49:08 +01:00
Guido Cossu 86116dbed6 Adding boundary condition switch (compile time) for the Mobius HMC example 2017-05-01 16:33:11 +01:00
Guido Cossu 7bd31e3f7c Adding external file support in the Mobius example (JSON) 2017-05-01 16:30:24 +01:00
Guido Cossu 74f451715f Fix for Mac compilation on the size_t uint64_t types 2017-05-01 15:12:07 +01:00
Guido Cossu 655be8ed76 Adding tests for the mobius operator 2017-05-01 14:42:16 +01:00
Guido Cossu 4063238943 Adding HMC test file example for Mobius + smearing 2017-05-01 13:44:00 +01:00
Guido Cossu 3344788fa1 Merge branch 'develop' into feature/hmc_generalise 2017-05-01 12:13:56 +01:00
Guido Cossu 62a64d9108 EO support, wip 2017-05-01 11:06:21 +01:00
Lanny91 49331a3e72 Minor improvements to Ward Identity checks 2017-04-28 16:50:17 +01:00
Lanny91 51d84ec057 Bugfixes in Wilson 5D sequential conserved current insertion 2017-04-28 16:49:14 +01:00
Lanny91 db14fb30df Hadrons: overhaul of conserved current test 2017-04-28 16:48:00 +01:00
Lanny91 b9356d3866 Added more complete test of sequential insertion of conserved current. 2017-04-28 16:46:40 +01:00
Guido Cossu 99a73f4287 Correcting the M and Mdag in the clover term 2017-04-28 15:51:05 +01:00
Lanny91 f302eea91e SitePropagator redefined to be a scalar object in TYPE_ALIASES. 2017-04-28 15:27:49 +01:00
Guido Cossu 5553b8d2b8 Clover term compiles, not tested 2017-04-28 15:23:34 +01:00
Lanny91 a6ccbbe108 Conserved current sequential source now registered properly and fixed module inputs. 2017-04-28 10:43:47 +01:00
James Harrison 3ac27e5596 QedFVol: remove unnecessary copies of free propagator from shifted sources in ScalarVP 2017-04-27 14:17:50 +01:00
Peter Boyle 99220f6531 Fixes and better timing 2017-04-26 17:24:11 -04:00
Lanny91 d2003f24f4 Corrected incorrect usage of ExtractSlice for conserved current code. 2017-04-26 17:25:28 +01:00
Lanny91 6299dd35f5 Hadrons: Added test of conserved current code. Tests Ward identities for conserved vector and partially conserved axial currents. 2017-04-26 12:41:39 +01:00
Lanny91 a39daecb62 Removed make_5D const declaration to avoid compilation error 2017-04-26 12:39:07 +01:00
Lanny91 159770e21b Legal Banners added 2017-04-26 09:32:57 +01:00
paboyle 2a6d093749 move the sudo: required to match location on Guido's branch 2017-04-26 09:15:34 +01:00
paboyle c947947fad sudo required, suggested by Guido 2017-04-26 08:45:36 +01:00
paboyle f555b50547 Merge branch 'feature/half-prec-comms' into develop 2017-04-26 08:43:40 +01:00
paboyle 738c1a11c2 longer nloop 2017-04-26 08:43:20 +01:00
Peter Boyle f8797e1e3e bug fix. Works now, with great face performance 2017-04-26 03:14:02 -04:00
Peter Boyle fd1eb7de13 Clean implementation of the exterior faces, listing only those points on the boundary 2017-04-26 02:34:52 -04:00
Peter Boyle 2ce898efa3 Pretty code 2017-04-26 02:34:25 -04:00
Lanny91 dc5a6404ea Hadrons: modules for testing conserved current contractions and sequential insertion. 2017-04-25 22:08:33 +01:00
Lanny91 44260643f6 First conserved current implementation for Wilson fermions only. Not implemented for Gparity or 5D-vectorised Wilson fermions. 2017-04-25 18:00:24 +01:00
Lanny91 1425afc72f Rare Kaon test fix 2017-04-25 17:26:56 +01:00
James Harrison bd466a55a8 QedFVol: remove charge dependence in chargedProp function of ScalarVP 2017-04-25 10:04:03 +01:00
paboyle ab66bac4e6 Think I'm getting on top of the reduced cost exterior precomputed list of links 2017-04-25 08:50:26 +01:00
paboyle 56277a11c8 Build a list of what's on the surface 2017-04-24 17:06:15 +01:00
Guido Cossu 752048f410 Merge branch 'develop' into feature/clover 2017-04-24 14:41:20 +01:00
paboyle 916e9e1d3e Merge branch 'feature/half-prec-comms' of https://github.com/paboyle/Grid into feature/half-prec-comms 2017-04-24 10:39:19 +01:00
Peter Boyle 5b55867a7a Slightly cheaper Ext assembly 2017-04-24 05:36:11 -04:00
Peter Boyle 3accb1ef89 Debugged assembly split phase with interior suppression 2017-04-23 19:30:19 -04:00
Peter Boyle e3d0e31525 Debugged assembly split phase with interior suppression 2017-04-23 19:29:27 -04:00
Peter Boyle 5812eb8a8c Partially fixed. But the comms-overlap does not work yet. 2017-04-22 18:50:25 -04:00
paboyle 4dd3763294 Use OMP as much as possible 2017-04-22 20:35:20 +01:00
paboyle c429ace748 Cleaner OpenMP use 2017-04-22 20:28:42 +01:00
paboyle ac58565d0a Dangerous rewrite of the assembly. If I make a mistake the debug will be painful. 2017-04-22 19:31:04 +01:00
paboyle 3703b718aa Mark up a table if a given site only receives from itself; including MPI3 splitting info. 2017-04-22 19:28:37 +01:00
paboyle b722889234 Try a better load balancing loop 2017-04-22 19:27:41 +01:00
paboyle abba44a837 Hand unrolled for overlapped comms 2017-04-22 17:45:17 +01:00
paboyle f301be94ce Fixed 2017-04-22 17:42:31 +01:00
Peter Boyle 1d1b225497 Hand unrolled Nc=3 kernels support split phase compute (on-node, off-node). 2017-04-22 09:05:28 -04:00
Peter Boyle 53a785a3dd Fixing the KNL compile 2017-04-22 08:11:51 -04:00
paboyle 736bf3c866 Major rework of stencil. Half precision and MPI3 now working. 2017-04-22 11:33:50 +01:00
paboyle b9bbe5d188 L1p config bg/q 2017-04-22 11:33:09 +01:00
paboyle 3844bcf800 If no f16c instructions supported must use software half precision conversion.
This will also become useful on BG/Q, so will move out from SSE4 into a general area.
Lifted the Eigen half precision from web. Looks sensible, but not extensively regressed
against the intrinsics implementation yet.
2017-04-20 15:30:52 +01:00
paboyle e1a2319d01 Simple compressor moved out of cshift into stencil 2017-04-20 13:18:15 +01:00
paboyle 180c732b4c Move compressors out of Cshift.
Slice iterators would help
2017-04-20 13:17:55 +01:00
paboyle 957a706d0b Useful script 2017-04-20 13:17:44 +01:00
paboyle d2312e9874 Drop compressor entirely from Cshift to only Stencil. 2017-04-20 13:16:55 +01:00
paboyle fc4ab9ccd5 Working half precision comms 2017-04-20 11:20:26 +01:00
paboyle 4a340aa5ca Massive compressor rework to support reduced precision comms 2017-04-20 09:28:27 +01:00
paboyle 3b7de792d5 Type comparison in the traits work 2017-04-18 13:28:04 +01:00
paboyle 557c3fa109 Pretty change 2017-04-18 13:27:38 +01:00
paboyle ec18e9f7f6 Merge branch 'develop' into feature/half-prec-comms 2017-04-18 11:39:39 +01:00
paboyle a839d5bc55 Updated todo list 2017-04-18 11:22:17 +01:00
paboyle de41b84c5c Merge branch 'feature/normHP' into develop 2017-04-18 10:57:21 +01:00
paboyle 8e161152e4 MultiRHS solver improvements with slice operations moved into lattice and sped up.
Block solver requires a lot of performance work.
2017-04-18 10:51:55 +01:00
paboyle 3141ebac10 MultiRHS working, starting to optimise. Block doesn't and I thought it already was; puzzled. 2017-04-17 10:50:19 +01:00
paboyle 7ede696126 Non compile of tests fixed 2017-04-16 23:40:00 +01:00
paboyle bf516c3b81 higher precision reduction variables in norm and inner product 2017-04-15 12:27:28 +01:00
paboyle 441a52ee5d First cut at higher precision reduction 2017-04-15 10:57:21 +01:00
paboyle a8db024c92 Cleaning up the dense matrix and lanczos sector 2017-04-15 08:54:11 +01:00
paboyle a9c22d5f43 Verbose removal 2017-04-14 14:38:49 +01:00
paboyle 3ca41458a3 Fix to no USE_FP16 case 2017-04-14 14:20:54 +01:00
paboyle 9e2d29c644 USE_FP16 macro 2017-04-14 14:17:14 +01:00
Guido Cossu b694996302 adding comments 2017-04-14 13:30:14 +01:00
Peter Boyle 951be75292 Half precision conversion working on AVX512 now too 2017-04-13 17:35:11 +01:00
James Harrison c8e6f58e24 Fix typos in ScalarVP 2017-04-13 17:04:37 +01:00
Peter Boyle b9113ed310 Patches for knl 2017-04-13 12:02:12 -04:00
James Harrison 888988ad37 Merge branch 'feature/qed-fvol' of https://github.com/paboyle/Grid into feature/qed-fvol
# Conflicts:
#	lib/qcd/action/fermion/Fermion.h
2017-04-13 15:54:40 +01:00
portelli 1407418755 Old qed-fvol program build disabled 2017-04-13 15:32:30 +01:00
portelli a6a0da873f Merge branch 'feature/hadrons' into feature/qed-fvol 2017-04-13 15:31:06 +01:00
paboyle 42fb49d3fd Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-04-13 14:12:47 +01:00
paboyle 2a54c9aaab Merge branch 'feature/block-cg' into develop 2017-04-13 14:12:24 +01:00
paboyle 0957378679 Fixing conditional ugly way 2017-04-13 13:47:56 +01:00
paboyle 2ed6c76fc5 Getting multiline if then fi working 2017-04-13 13:43:13 +01:00
paboyle d3b9a7fa14 F16c apparently requires AVX, even if the 128 bit are used.
Seems odd.
2017-04-13 13:19:11 +01:00
paboyle 75ea306ce9 Another try at travis 2017-04-13 13:05:32 +01:00
paboyle 4226c633c4 Default to FP16 off again 2017-04-13 12:51:39 +01:00
paboyle 5a4eafbf7e .travis 2017-04-13 12:50:43 +01:00
paboyle eb8e26018b Travis update for macos 2017-04-13 12:35:11 +01:00
paboyle db5ea001a3 Update to use Xcode 8.3 since -mfp16 causes SIGILL 2017-04-13 12:22:40 +01:00
paboyle 2846f079e5 Predicate tests on fp16 being enabled 2017-04-13 12:08:05 +01:00
paboyle 1d502e4ed6 FP16 optional compile time 2017-04-13 11:55:24 +01:00
paboyle 73cdf0fffe Drop f16c from SSE because of a macos compile error on travis 2017-04-13 11:23:41 +01:00
paboyle 1c25773319 Trap illegal instructions 2017-04-13 10:51:40 +01:00
paboyle c38400b26f Trap signals 2017-04-13 10:35:20 +01:00
paboyle 9c3065b860 Debug flags off again 2017-04-13 10:01:32 +01:00
paboyle 94eb829d08 Align cast fixed for __mm128i gcc complained 2017-04-13 08:40:44 +01:00
paboyle 68392ddb5b Exchange in generic
Precision change in AVX, SSE, AVX512, Generic. QPX still to do.
2017-04-13 08:38:12 +01:00
paboyle cb6b81ae82 Half precision conversion 2017-04-12 19:32:37 +01:00
Lanny91 c382c351a5 Quark test output correction. 2017-04-12 14:36:15 +01:00
Lanny91 af2d6ce2e0 Encapsulated 4D->5D and 5D->4D conversions in separate functions & added corresponding tests. 2017-04-12 14:36:02 +01:00
portelli 90ec6eda0c Rare K test solver name fix 2017-04-10 17:48:58 +01:00
Lanny91 ac1253bb76 Corrected solver in rare kaon test 2017-04-10 17:42:55 +01:00
portelli fe8d625694 Merge commit '5e477ec553aa48d7d19b5a7c45d41acbb3392bcb' into feature/rare_kaon 2017-04-10 17:23:37 +01:00
portelli 53e76b41d2 Merge branch 'develop' into feature/hadrons 2017-04-10 17:00:53 +01:00
portelli 8ef4300412 spurious .dirstamp files removed 2017-04-10 17:00:22 +01:00
portelli 98a24ebf31 The macro “magics” is very intensive for the preprocessor in the measurement code which has numerous serialisable classes. Reducing the number of serialisable fields to 64 (instead of 1024) helps a lot, this is enough for now and can be extended trivially if needed in the future. 2017-04-10 16:58:54 +01:00
James Harrison e4a105a30b Merge branch 'feature/qed-fvol' of https://github.com/paboyle/Grid into feature/qed-fvol 2017-04-10 16:35:01 +01:00
James Harrison 26ebe41fef QedFVol: Implement charged propagator calculation within ScalarVP module 2017-04-10 16:33:54 +01:00
paboyle b12dc89d26 Commenting and clean up 2017-04-10 20:38:20 +09:00
paboyle d80d802f9d MultiRHS solver test 2017-04-10 00:12:12 +09:00
paboyle 3d99b09dba Start of blockCG 2017-04-09 23:42:10 +09:00
paboyle db5f6d3ae3 Verbose fix 2017-04-09 23:41:30 +09:00
paboyle 683550f116 Const args improvement 2017-04-09 23:41:04 +09:00
Lanny91 5e477ec553 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon 2017-04-07 11:51:09 +01:00
paboyle 55d0329624 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2017-04-07 11:08:14 +09:00
paboyle 86aaa35294 Christoph needs SchurDiagTwoKappa which is mobius specific. 2017-04-07 11:07:40 +09:00
Guido Cossu 363611ae21 Merge branch 'develop' into feature/clover 2017-04-05 16:26:04 +01:00
Guido Cossu 172d3dc93a Correcting names in tests 2017-04-05 16:24:04 +01:00
Guido Cossu 3b8a791e28 Merge branch 'develop' into feature/clover 2017-04-05 16:20:28 +01:00
Guido Cossu 7b03d8d087 Fixing the remaining merge conflicts 2017-04-05 16:17:46 +01:00
Guido Cossu 4b759b8f2a Merge branch 'feature/hmc_generalise' into feature/scalar_adjointFT 2017-04-05 14:50:28 +01:00
Guido Cossu 8c540333d5 Merge branch 'develop' into feature/hmc_generalise 2017-04-05 14:41:04 +01:00
Guido Cossu 6fd82228bf Working on the derivative 2017-04-05 10:51:44 +01:00
paboyle 5592f7b8c1 Creation mode better implementation 2017-04-05 02:35:34 +09:00
paboyle 35da4ece0b UID fix 2017-04-05 02:18:15 +09:00
paboyle 061b15b9e9 Merge branch 'feature/sitmo-skipahead' into develop 2017-04-05 01:24:49 +09:00
Guido Cossu ca6efc685e Merge branch 'develop' into feature/clover 2017-04-04 10:19:02 +01:00
portelli 1e496fee74 Merge branch 'develop' into feature/qed-fvol
# Conflicts:
#	lib/qcd/action/fermion/Fermion.h
2017-04-03 19:02:57 +01:00
portelli ff4e54ef80 Merge branch 'develop' into feature/hadrons 2017-04-03 18:56:21 +01:00
paboyle 561426f6eb Clean up 2017-04-02 23:13:48 +09:00
paboyle 83f6fab8fa Big/Small crush test, and fast SITMO rng init, faster but not ideal
MT and Ranlux init.
2017-04-02 12:10:51 +09:00
paboyle 0fade84ab2 No random device 2017-04-02 00:29:40 +09:00
paboyle 9dc7ca4c3b Sitmo fast init 2017-04-02 00:28:22 +09:00
paboyle 935d82f5b1 sanity checks 2017-04-02 00:27:28 +09:00
paboyle 9cbcdd65d7 No random device seed 2017-04-02 00:26:57 +09:00
paboyle f18f5ed926 Drop random device 2017-04-02 00:26:26 +09:00
paboyle d1d63a4f2d sitmo default 2017-04-02 00:26:05 +09:00
paboyle 7e5faa0f34 Multiple RNGs 2017-04-02 00:25:44 +09:00
paboyle 6af459cae4 Christoph's coefficients. 2017-03-31 17:07:43 +09:00
paboyle 1c4bc7ed38 Debugged staggered conventions 2017-03-31 14:41:48 +09:00
Lanny91 cd1bd921bd Reduced code duplication for Weak Hamiltonian contraction modules 2017-03-30 18:02:14 +01:00
Guido Cossu b8ae787b5e Correcting a simple typo 2017-03-30 11:33:15 +01:00
Guido Cossu fbe2c3b5f9 ]Merge branch 'develop' into feature/clover 2017-03-30 11:18:31 +01:00
Guido Cossu 1ed69816b9 First steps for the force term 2017-03-30 11:14:27 +01:00
Lanny91 fff5751b1a HADRONS: Updated rare kaon test program, including all contractions. Sink smearing still to be implemented. 2017-03-30 10:57:01 +01:00
Lanny91 2c81696fdd HADRONS: 4pt Weak + current disconnected topology (e.g. for rare neutral kaon decays) 2017-03-30 10:37:17 +01:00
Lanny91 c9dc22efa1 HADRONS: Standalone disconnected loop contraction. 2017-03-30 10:33:18 +01:00
Lanny91 0ab04a000f HADRONS: 3pt contraction with gamma insertion between two propagators. 2017-03-30 10:30:58 +01:00
paboyle 93ea5d9468 Pretty code 2017-03-30 15:00:03 +09:00
paboyle 1ec5d32369 Chulwoo's test to zmobius helped me shake out 2017-03-30 13:45:13 +09:00
paboyle 9fd23faadf Pretty layout 2017-03-30 13:44:45 +09:00
paboyle 10e4fa0dc8 Template instantiation improvements 2017-03-30 13:44:25 +09:00
paboyle c4aca1dde4 Conjugate coefficients on adjoint 2017-03-30 13:44:05 +09:00
paboyle b9e8ea3aaa conjugate coefficient on the dagger 2017-03-30 13:43:13 +09:00
paboyle 077aa728b9 Fix the ZMobius (I think) 2017-03-30 13:42:09 +09:00
paboyle a8d83d886e Macro controls 2017-03-30 13:31:34 +09:00
paboyle 7fd46eeec4 Trailing whitespace removal 2017-03-30 13:31:10 +09:00
paboyle e0c4eeb3ec Compiles again 2017-03-30 13:30:45 +09:00
paboyle cb9a297a0a Chulwoo's Zmobius test 2017-03-30 13:30:25 +09:00
paboyle 2b115929dc Small AVX512 asm ifdef patch 2017-03-29 18:51:23 +09:00
paboyle 5c6571dab1 Merge branch 'feature/bgq-asm' into develop 2017-03-29 18:48:55 +09:00
paboyle 417ec56cca Release candidate 2017-03-29 05:45:33 -04:00
paboyle 756bc25008 Verbose header print by default 2017-03-29 04:44:17 -04:00
paboyle 35695ba57a Bug fix in MPI3 2017-03-29 04:43:55 -04:00
paboyle 81ead48850 Log any errors to a file 2017-03-29 04:39:52 -04:00
paboyle d805867e02 Better init 2017-03-28 13:25:05 -04:00
paboyle e55a751e23 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2017-03-28 12:20:12 -04:00
paboyle 358eb75995 Shorten loop 2017-03-28 12:20:02 -04:00
paboyle 98f9318279 Build on AVX2 and MPI passing with clang++ 2017-03-28 23:16:04 +09:00
paboyle 4b17e8eba8 Merge branch 'develop' into feature/bgq-asm
Conflicts:
	lib/qcd/action/fermion/Fermion.h
	lib/qcd/action/fermion/WilsonFermion.cc
	lib/util/Init.cc
	tests/Test_cayley_even_odd_vec.cc
2017-03-28 04:49:30 -04:00
paboyle 75112a632a IO improvements to fail on IO error 2017-03-28 02:28:04 -04:00
paboyle 18bde08d1b Merge branch 'feature/staggering' into develop 2017-03-28 15:25:55 +09:00
James Harrison 9f755e0379 Add functions momD1 and momD2 to ScalarVP 2017-03-27 16:49:18 +01:00
James Harrison 4512dbdf58 Rename module ScalarFV to ScalarVP 2017-03-27 15:02:16 +01:00
James Harrison 483fd3cfa1 Add propagator expansion terms as inputs to ScalarFV 2017-03-27 13:24:51 +01:00
Guido Cossu 3750b9ffee Deleting MPI test for OSX in Travis 2017-03-27 16:53:32 +09:00
Guido Cossu 5e549ebd8b Adding force terms 2017-03-27 16:43:15 +09:00
Guido Cossu fff484eca5 Populating Clover fermions methods 2017-03-27 15:12:57 +09:00
Guido Cossu 5fdc05782b More in the clover fermion class 2017-03-27 10:54:16 +09:00
paboyle d45cd7e677 Adding a simple read of NERSC test 2017-03-26 09:24:26 -04:00
paboyle 4e96679797 Added a bnl log 2017-03-25 09:25:46 -04:00
James Harrison 85516e9c7c Output all terms of scalar propagator separately 2017-03-24 17:13:55 +00:00
James Harrison 0c006fbfaa Add ScalarFV inputs to ScalarFV.hpp 2017-03-24 11:59:09 +00:00
James Harrison 54c10a42cc Add source and emField inputs to ScalarFV module 2017-03-24 11:42:32 +00:00
Guido Cossu a04eb7df5d Starting Clover term 2017-03-24 12:43:28 +09:00
Guido Cossu 4c1ea8677e Small cosmetic changes and vscode gitignore 2017-03-23 14:09:35 +09:00
paboyle fc93f0b2ec Save some code for static huge tlb's. It is ifdef'ed out but an interesting root only experiment.
No gain from it.
2017-03-21 22:30:29 -04:00
paboyle 8c8473998d Average over whole cluster the comm time. 2017-03-21 22:29:51 -04:00
James Harrison ef0fe2bcc1 Added empty ScalarFV module 2017-03-21 11:39:46 +00:00
Guido Cossu 120fb59978 Adding tests for WilsonFlow classes 2017-03-21 16:11:35 +09:00
Guido Cossu fd56b3ff38 Merge branch 'develop' into feature/hmc_generalise 2017-03-21 13:33:41 +09:00
Guido Cossu 0ec6829edc Fixing compilation errors for the WilsonFlow 2017-03-21 13:06:32 +09:00
Guido Cossu 18b7845b7b Adding WilsonFlow smearing 2017-03-21 11:52:05 +09:00
Guido Cossu 3d0fe15374 Added topological charge measurement 2017-03-17 16:14:57 +09:00
Guido Cossu 91886068fe Fixed seg fault for observable modules 2017-03-17 13:59:31 +09:00
Guido Cossu 6d1e9e5f92 Small cleanup of the observables 2017-03-17 11:42:55 +09:00
Guido Cossu b640230b1e Moving hmc observables in a different directory 2017-03-17 11:40:17 +09:00
paboyle e7c36771ed ZMobius prep for asm 2017-03-15 14:23:33 -04:00
Guido Cossu 038b6ee9cd Fixing JSON compilation error 2017-03-16 01:09:24 +09:00
Guido Cossu 38806343a8 Improving efficiency of the force term 2017-03-15 15:16:16 +09:00
Guido Cossu 831ca4e3bf Added Scalar action for fields in the adjoint representation 2017-03-14 14:55:18 +09:00
paboyle 8dc57a1e25 Layout change 2017-03-13 11:11:46 +00:00
paboyle f57bd770b0 Merge branch 'bugfix/dminus' into feature/bgq-asm 2017-03-13 11:11:03 +00:00
paboyle 4ed10a3d06 Merge branch 'develop' into feature/bgq-asm 2017-03-13 11:10:10 +00:00
Peter Boyle dfefc70b57 Merge pull request #93 from Lanny91/hotfix/qpx
Some fixes for QPX and generic SIMD types.
2017-03-13 09:31:26 +00:00
Chulwoo Jung 0b61f75c9e Adding ZMobius CG test 2017-03-13 00:12:43 -04:00
Chulwoo Jung 33edde245d Changing Dminus(Dag) to use full vectors to work correctly 2017-03-12 23:02:42 -04:00
paboyle b64e004555 MPI run fail on macos 2017-03-13 01:59:01 +00:00
paboyle 447c5e6cd7 Z mobius hermiticity correction 2017-03-13 01:30:43 +00:00
paboyle 8b99d80d8c Merge branch 'bgq-asm-shmemfixes' into feature/bgq-asm 2017-03-12 23:30:09 +00:00
Guido Cossu b3dede4dd3 Merge branch 'develop' into feature/hmc_generalise 2017-03-10 23:57:37 +09:00
Guido Cossu 4e34132f4d Correcting modules use in test files 2017-03-10 23:54:53 +09:00
Guido Cossu c07cb10247 Merge branch 'feature/hmc_generalise' of https://github.com/paboyle/Grid into feature/hmc_generalise 2017-03-10 22:37:25 +09:00
Guido Cossu d7767a2a62 Few more tests 2017-03-10 22:33:48 +09:00
Guido Cossu ec035983fd Fixing the implicit integration 2017-03-01 11:56:35 +00:00
paboyle 3901b17ade timeings from BNL 2017-02-28 17:06:45 -05:00
paboyle af230a1fb8 Average the time across the whole machine for outliers 2017-02-28 17:05:22 -05:00
Christopher Kelly 06a132e3f9 Fixes to SHMEM comms 2017-02-28 13:31:54 -08:00
Guido Cossu 596dcd85b2 Auxiliary fields 2017-02-27 13:16:38 +00:00
paboyle 96d44d5c55 Header fix 2017-02-24 19:12:11 -05:00
Guido Cossu 7270c6a150 Integrator works now 2017-02-24 17:03:42 +00:00
Lanny91 7fe797daf8 SIMD vector length sanity checks 2017-02-23 16:49:44 +00:00
Lanny91 486a01294a Corrected QPX SIMD width 2017-02-23 16:47:56 +00:00
paboyle 586a7c90b7 Merge branch 'develop' into feature/bgq-asm 2017-02-23 00:26:59 +00:00
paboyle e099dcdae7 Merge branch 'develop' into feature/bgq-asm 2017-02-23 00:25:29 +00:00
paboyle 4e7ab3166f Refactoring header layout 2017-02-22 18:09:33 +00:00
paboyle aac80cbb44 Bug fix from Chris K 2017-02-22 12:19:09 -05:00
Lanny91 c80948411b Added tRotate function and MaddRealPart struct for generic SIMD, bugfix in MultRealPart and minor cosmetic changes. 2017-02-22 14:57:10 +00:00
Lanny91 95625a7bd1 Use Grid Integer type 2017-02-22 13:09:32 +00:00
Lanny91 0796696733 Emulated integer vector type for QPX and generic SIMD instruction sets. 2017-02-22 12:01:36 +00:00
Peter Boyle f8b9ad7d50 Merge pull request #91 from sunpho84/public_modules_memebers
making public same serializable parameters in HMC Module
2017-02-22 00:53:20 +00:00
Peter Boyle 04a1959895 Merge pull request #90 from sunpho84/liming
adding --with switch to pass lime path
2017-02-22 00:52:53 +00:00
Peter Boyle cc773ae70c Merge pull request #89 from sunpho84/prepend_package_with_grid
Prepending PACKAGE_ with GRID_ in Config.h
2017-02-22 00:52:10 +00:00
Peter Boyle d21c51b9be Merge pull request #88 from sunpho84/pickpoketting
now it is possible to pass {coords list} to a peek or poke
2017-02-22 00:51:33 +00:00
Peter Boyle 597a7b4b3a Merge pull request #81 from edbennett/develop
Fix misleading message: "doxygen-pdf requires doxygen-pdf"
2017-02-22 00:50:59 +00:00
azusayamaguchi 1c30e9a961 Verified 2017-02-21 23:01:25 +00:00
Francesco Sanfilippo 93cc270016 making public same serializable parameters in HMC Module
RNGModuleParameters
GridModuleParameters
2017-02-21 23:11:56 +01:00
Francesco Sanfilippo 29b60f7e1a adding --with switch to pass lime path 2017-02-21 23:09:39 +01:00
Francesco Sanfilippo 041884acf0 Prepending PACKAGE_ with GRID_ in Config.h
Avoid polluting linking progr
2017-02-21 22:51:36 +01:00
Francesco Sanfilippo 15e668eef1 now it is possible to pass {coords list} to a peek or poke 2017-02-21 22:48:38 +01:00
azusayamaguchi bf7e3f20d4 Staggaered fermion optimised version 2017-02-21 14:35:42 +00:00
Guido Cossu 902afcfbaf Adding metric and the implicit steps 2017-02-21 11:30:57 +00:00
paboyle 3ae92fa2e6 Global changes to parallel_for structure.
Move the comms flags to more sensible names
2017-02-21 05:24:27 -05:00
paboyle 3906cd2149 Stencil fix on BNL KNL system 2017-02-20 17:51:31 -05:00
paboyle 5a1fb29db7 Useful debug code info to preserve 2017-02-20 17:49:23 -05:00
paboyle 661fc4d3d1 Debug AVX512 exchange code paths 2017-02-20 17:48:36 -05:00
paboyle 41009cc142 Move excange into the stencil only; keep Cshift fully general 2017-02-20 17:48:04 -05:00
paboyle 37720c4db7 Count bytes off node only 2017-02-20 17:47:40 -05:00
paboyle 1a30455a10 1000 iters on bmark for more accurate timing 2017-02-20 17:47:01 -05:00
Guido Cossu 97a6b61551 Covariant laplacian and implicit integration 2017-02-20 11:17:27 +00:00
paboyle cd0da81196 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2017-02-16 18:52:30 -05:00
paboyle f246fe3304 Improvements to avx for invertible to avoid latent bug 2017-02-16 23:52:44 +00:00
paboyle 8a29c16bde Faster gather exchange 2017-02-16 23:52:22 +00:00
paboyle d68907fc3e Debug temp 2017-02-16 18:51:35 -05:00
paboyle 5c0adf7bf2 Make clang happy with parenthesis 2017-02-16 23:51:33 +00:00
paboyle be3a8249c6 Faster gather 2017-02-16 23:51:15 +00:00
paboyle bd600702cf Vectorise the XYZT face gathering better.
Hard coded for simd_layout <= 2 in any given spread out direction; full generality is inconsistent
with efficiency.
2017-02-15 11:11:04 +00:00
Lanny91 f011bdb869 Fixed overwrite of pminus projection in construction of 4d propagator from 5d. 2017-02-14 14:07:17 +00:00
Guido Cossu bafb101e4f Testing different versions of the Laplacian 2017-02-13 15:38:11 +00:00
Guido Cossu 08fdf05528 Added and tested the covariant laplacian + CG solver 2017-02-13 15:05:01 +00:00
paboyle aca7a3ef0a Optimisation control improvements 2017-02-10 18:22:31 -05:00
Guido Cossu 9e72a6b22e Reverting to Xcode 7.3 2017-02-10 12:57:03 +00:00
Guido Cossu 1c12c5612c Xcode 8.2 for travis 2017-02-10 12:12:01 +00:00
Guido Cossu a8193c4bcb Correcting travis compilation on gcc 2017-02-10 10:59:30 +00:00
Guido Cossu c3d7ec65fa All tests compile. 2017-02-10 10:27:51 +00:00
Guido Cossu 8b6a6c8236 Resolving small merge conflict 2017-02-09 16:20:24 +00:00
Guido Cossu e0571c872b Merge branch 'develop' into feature/hmc_generalise 2017-02-09 16:12:00 +00:00
Guido Cossu c67f41887b Reverting parameters to original 2017-02-09 15:59:56 +00:00
Guido Cossu 84687ccf1f Handling an Intel compiler warning for Json class 2017-02-09 15:33:33 +00:00
Guido Cossu 3274561cf8 Cleanup 2017-02-09 15:18:38 +00:00
portelli e08fbb3771 Merge pull request #84 from Lanny91/feature/rare_kaon
Rare Kaon decay contraction code
2017-02-08 08:23:34 -08:00
Lanny91 d7464aa0fe Switched from XmlWriter to CorrWriter in contraction code 2017-02-08 16:13:44 +00:00
Lanny91 00d29153f0 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/rare_kaon 2017-02-08 16:11:15 +00:00
portelli 2ce989f220 Hadrons: default I/O to HDF5 2017-02-08 07:50:05 -08:00
Lanny91 d7a1dc85be Revert "Hadrons: test for rare kaon contraction code."
This reverts commit 1e257a1251.
2017-02-08 13:23:05 +00:00
Lanny91 fc19503673 Removed MSink namespace. 2017-02-08 13:17:39 +00:00
Lanny91 beba824136 Make use of GammaL class in Weak Hamiltonian contractions 2017-02-08 12:45:39 +00:00
Lanny91 6ebf8b12b6 Removed unnecessary repeat of write in Weak Hamiltonian contractions 2017-02-08 12:43:13 +00:00
Lanny91 e5a7ed4362 Moved write outside of loop, some physics corrections 2017-02-08 12:29:33 +00:00
Lanny91 b9f7ea47c3 Access hasModule function directly from Environment instance. 2017-02-08 10:10:06 +00:00
Lanny91 06f7ee202e Revert "Add function to say whether or not a module exists in application class"
This reverts commit 522f6bf91a.
2017-02-08 10:08:18 +00:00
Lanny91 2b2fc6453f Fixed single precision compatibility issues 2017-02-07 13:59:29 +00:00
Lanny91 bdd2765461 Added missing allocation of Weak Hamiltonian result vector 2017-02-07 13:06:42 +00:00
paboyle 2c246551d0 Overlap comms and compute options in wilson kernels 2017-02-07 01:37:10 -05:00
paboyle 71ac2e7940 Faster RNG init 2017-02-07 01:33:23 -05:00
paboyle 2bf4688e83 Running on BNL KNL 2017-02-07 01:32:10 -05:00
paboyle a48ee6f0f2 Don't use MPI3_leader any more. No real gain and complex 2017-02-07 01:31:24 -05:00
paboyle 73547cca66 MPI3 working i think 2017-02-07 01:30:02 -05:00
paboyle 123c673db7 Policy to control async or sync SendRecv 2017-02-07 01:24:54 -05:00
paboyle 61f82216e2 Communicator Policy, NodeCount distinct from Rank count 2017-02-07 01:22:53 -05:00
paboyle 8e7ca92278 Debugged cshift case 2017-02-07 01:21:32 -05:00
paboyle 485ad6fde0 Stencil working in SHM MPI3 2017-02-07 01:20:39 -05:00
paboyle 6ea2184e18 OMP define change 2017-02-07 01:17:16 -05:00
paboyle fdc170b8a3 Parallel fors in lattice transfer 2017-02-07 01:16:39 -05:00
paboyle 060da786e9 Comms benchmark improvements 2017-02-07 01:07:39 -05:00
paboyle 85c7bc4321 Bug fixes for cases that physics code couldn't hit but latent
and discovered on KNL (long vector, y SIMD dir) and checker dir set to y.
Remove the assertions on these code paths now they are tested.
2017-02-07 01:01:15 -05:00
paboyle 0883d6a7ce Overlap comms compute support; make reg naming consistent with bgq aasm 2017-02-07 00:59:32 -05:00
paboyle 9ff97b4711 Improved stencil tests passing all on KNL multinode 2017-02-07 00:58:34 -05:00
paboyle b5e9c900a4 Better printing and signal handling options 2017-02-07 00:57:55 -05:00
paboyle 4bbdfb434c Overlap comms compute modifications 2017-02-07 00:57:01 -05:00
Lanny91 4a45c06dd7 Code cleaning and addition of Weak Hamiltonian contraction log message 2017-02-06 20:12:30 +00:00
Lanny91 d6a7d7d1e0 Hadrons: added missing momentum parameter in rare kaon contraction test 2017-02-06 18:15:49 +00:00
Lanny91 1a122a0dd8 Hadrons: corrected gamma matrix inputs in rare kaon test 2017-02-06 17:35:41 +00:00
Lanny91 20e20733e8 Merge branch 'feature/hadrons' into feature/rare_kaon 2017-02-06 14:12:21 +00:00
Lanny91 b7cd1a19e3 Utilities for reading and writing "pair" objects. 2017-02-06 14:08:59 +00:00
Lanny91 f510002a62 Merge remote-tracking branch 'paboyle/feature/hadrons' into feature/hadrons 2017-02-03 14:37:34 +00:00
Christopher Kelly c94133af49 Added iteration reporting to CG and mixed CG
Added ability to manually change the initial CG inner tolerance in mixed CG
Added .hpp files to filelist script
2017-02-02 17:04:42 -05:00
portelli eedcaf6470 Merge branch 'feature/hadrons' into feature/qed-fvol 2017-02-01 15:53:10 -08:00
portelli e7d8030a64 operator>> for serialisable enums 2017-02-01 15:51:08 -08:00
portelli d775fbb2f9 Gammas: code cleaning and gamma_L implementation & test 2017-02-01 15:45:05 -08:00
portelli 863855f46f header fix 2017-02-01 11:59:44 -08:00
portelli 419af7610d New gamma matrices tidying: generated code is confined to Gamma.* for readability 2017-02-01 11:23:12 -08:00
Lanny91 1e257a1251 Hadrons: test for rare kaon contraction code. 2017-02-01 16:36:40 +00:00
Lanny91 522f6bf91a Add function to say whether or not a module exists in application class 2017-02-01 16:36:08 +00:00
Lanny91 d35d87d2c2 Weak Hamiltonian Eye-type contraction execution 2017-02-01 16:33:24 +00:00
Lanny91 74a5cda84b Removed unnecessary "3pt" labels 2017-02-01 15:03:49 +00:00
Lanny91 5be05d85b8 Fixed collision of Wall source and sink header ifdefs 2017-02-01 13:56:22 +00:00
Lanny91 35ac85aea8 Updated Weak Hamiltonian contractions to use zero-flop gamma matrices 2017-02-01 12:57:34 +00:00
Lanny91 fa237401ff Consistent variable name in macro 2017-02-01 12:56:55 +00:00
Lanny91 97053adcb5 Merge branch 'feature/hadrons' into feature/rare_kaon 2017-02-01 10:13:29 +00:00
Lanny91 f8fbe4d7a3 Merge remote-tracking branch 'paboyle/feature/hadrons' into feature/hadrons
# Conflicts:
#	extras/Hadrons/Modules/MContraction/Meson.hpp
#	tests/hadrons/Test_hadrons_meson_3pt.cc

Updated Meson.hpp to utilise zero-flop gamma matrices.
2017-02-01 09:27:00 +00:00
Lanny91 ef31c012bf Merge remote-tracking branch 'paboyle/develop' into feature/hadrons 2017-01-31 17:36:10 +00:00
portelli 7da7d263c4 typo 2017-01-30 10:53:13 -08:00
portelli 1140573027 Gamma adj fix: now in Grid namespace to avoid collisions 2017-01-30 10:53:04 -08:00
Lanny91 9e9f621d5d Hadrons: added Weak Hamiltonian module dependencies, some reformatting. 2017-01-30 17:54:21 +00:00
Lanny91 651e1a7cbc Hadrons: Momentum inserted as multiples of 2*pi/L 2017-01-30 17:14:33 +00:00
portelli a0cfbb6e88 Merge branch 'feature/gammas' into feature/hadrons
# Conflicts:
#	.gitignore
#	lib/qcd/spin/Dirac.cc
#	scripts/filelist
2017-01-30 09:10:49 -08:00
Lanny91 c4d3672720 Hadrons: Momentum projection in meson module. 2017-01-30 17:09:04 +00:00
portelli 515a26b3c6 gammas: copyright update 2017-01-30 09:07:09 -08:00
Guido Cossu 16be6d378c Now action factory support different Fields (templated) 2017-01-30 14:22:41 +00:00
Guido Cossu f05d0565aa Adding ScalarField theory 2017-01-30 10:59:28 +00:00
portelli b39f0d1fb6 Hadrons: default I/O to HDF5 if possible, XML otherwise 2017-01-27 18:12:35 -08:00
portelli 9f1267dfe6 Merge branch 'feature/qed-fvol' of github.com:paboyle/Grid into feature/qed-fvol 2017-01-27 17:06:34 -08:00
portelli 2e90285232 Merge pull request #80 from jch1g10/feature/qed-fvol
ChargedProp: remove ScalarField fs
2017-01-27 17:06:13 -08:00
portelli e254de982e Merge branch 'feature/qed-fvol' of github.com:paboyle/Grid into feature/qed-fvol 2017-01-27 17:02:35 -08:00
portelli 28d99b5297 Merge branch 'develop' into feature/qed-fvol 2017-01-27 16:59:53 -08:00
edbennett c946d3bf3f Merge branch 'develop' of github.com:edbennett/Grid into develop 2017-01-27 22:12:11 +00:00
edbennett 1c68098780 fix misleading message: "doxygen-pdf requires doxygen-pdf" 2017-01-27 22:04:26 +00:00
Lanny91 9bf4108d1f Weak Hamiltonian contraction modules, for Eye and NonEye contraction topologies. Execution for NonEye type diagrams has been implemented, but not yet for Eye type. 2017-01-27 16:58:11 +00:00
Guido Cossu 899e685627 Merge branch 'feature/sitmo_rng' into develop 2017-01-27 14:15:56 +00:00
James Harrison ee93f0218b ChargedProp: remove ScalarField fs 2017-01-27 12:22:48 +00:00
Guido Cossu 6929a84c70 Reformatting files 2017-01-27 11:54:44 +00:00
Guido Cossu 5c779a789b Moving registrations in an independent file 2017-01-27 11:23:51 +00:00
portelli 161ed102a5 Merge pull request #79 from jch1g10/feature/qed-fvol
Fixed bug in ChargedProp
2017-01-26 19:49:14 -08:00
portelli 3bf993d81a gitignore update 2017-01-26 17:00:59 -08:00
portelli fad743fbb1 Build system sanity check: corrected several headers not in the <Grid/*> format 2017-01-26 17:00:41 -08:00
Guido Cossu e863a948e3 Cleaning up files and directories 2017-01-26 15:24:49 +00:00
James Harrison f65a585236 ChargedProp: Switch to HDF5 output 2017-01-26 15:02:30 +00:00
Lanny91 977f34dca6 Added missing typename 2017-01-26 13:18:33 +00:00
Lanny91 90ad956340 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/rare_kaon 2017-01-26 12:08:41 +00:00
Guido Cossu 7996f06335 Commented out registrations.
Move to an independent file that is linked only for the factory managed HMC
2017-01-25 18:27:45 +00:00
Guido Cossu ef8d3831eb Temporary patch the threading error in InsertSlice and ExtractSlice
Find source and fix the error
2017-01-25 18:12:04 +00:00
Guido Cossu 70ed9fc40c Updating the engine to the last version 2017-01-25 18:10:41 +00:00
Guido Cossu 7b40a3e3e5 Reorganizing files 2017-01-25 18:09:46 +00:00
portelli 4d3787db65 Hadrons fixed for new gammas, Meson only does one contraction but this’ll change in the future 2017-01-25 09:59:00 -08:00
Guido Cossu 677757cfeb Added and tested SITMO PRNG 2017-01-25 12:47:22 +00:00
Guido Cossu f7fbbaaca3 Compiles after merging 2017-01-25 12:11:58 +00:00
Guido Cossu 17629b8d9e Merge branch 'develop' into feature/hmc_generalise 2017-01-25 11:33:53 +00:00
Guido Cossu 0baa20d292 Againg fixing compilation on Travis, no LIME lib present 2017-01-25 11:18:44 +00:00
Guido Cossu 4571c918a4 Fixing compilation error when compiling without LIME 2017-01-25 11:14:43 +00:00
Guido Cossu 5251ea4d30 Adding more fermion action modules, generalised DWF 2017-01-25 11:10:44 +00:00
portelli 05cb6d318a gammas: adjoint implemented as a symbolic operation 2017-01-24 18:07:43 -08:00
portelli 0432e30256 Gamma right multiply code fix (now passes consistency check) 2017-01-24 17:36:23 -08:00
portelli 2c3ebc1e07 .gitignore update 2017-01-24 17:35:42 -08:00
portelli 068b28af2d Extensive gamma test program 2017-01-24 17:35:29 -08:00
portelli f7db342f49 Serialisable enums can be converted to int 2017-01-24 17:33:26 -08:00
portelli d65e81518f Merge branch 'feature/hadrons' into develop 2017-01-24 09:21:44 -08:00
Guido Cossu 7f456b4173 👷 Added all pseudofermion actions to the serialiser 2017-01-24 13:57:32 +00:00
portelli a37e71f362 New automatic implementation of gamma matrices, Meson and SeqGamma are broken 2017-01-23 19:13:43 -08:00
James Harrison ae99e99da2 Fixed bug in ChargedProp 2017-01-23 17:27:50 +00:00
Lanny91 c291ef77b5 Merge branch 'feature/hadrons' of https://github.com/paboyle/Grid into feature/hadrons 2017-01-23 15:24:47 +00:00
Lanny91 7dd2764bb2 Wall sink smearing 2017-01-23 15:17:54 +00:00
Guido Cossu 244f8fb6dc Added JSON parser (without NextElement) 2017-01-23 14:57:38 +00:00
azusayamaguchi 05c1924819 Timing loop change 2017-01-23 10:43:45 +00:00
portelli f3ca29af6c Merge branch 'feature/hadrons' into feature/qed-fvol 2017-01-21 13:41:05 -08:00
portelli b7da264b0a Hadrons: Application is not storing the environment ref but calling getInstance() each time, solving a very nasty set fault on Linux/KNL 2017-01-21 13:40:23 -08:00
portelli 37988221a8 Merge branch 'feature/serialisation-hdf5' into feature/qed-fvol 2017-01-20 14:04:20 -08:00
portelli 74ac2aa676 Merge branch 'feature/serialisation-hdf5' into feature/hadrons 2017-01-20 14:03:51 -08:00
portelli 4c75095c61 HDF5: header fix 2017-01-20 12:14:01 -08:00
portelli afa095d33d HDF5: better complex number support 2017-01-20 12:10:41 -08:00
portelli 6b5259cc10 HDF5 detects if a name is a dataset or not without using exception catching 2017-01-20 11:03:19 -08:00
Guido Cossu 27dfe816fa Added TwoFlavorsEO
Had to remove a conformability check in the Derivative of SchurDiff,
see the comments in the file
2017-01-20 16:59:31 +00:00
Lanny91 af29be2c90 Simplified operation of meson module. Result has been modified to output one contraction at a time for each pair of gamma insertions at source and sink. 2017-01-20 16:38:50 +00:00
Guido Cossu f96fac0aee All functionalities ready.
Todo: add all the fermion action modules
2017-01-20 12:56:20 +00:00
portelli 7423a352c5 HDF5: typos 2017-01-19 18:33:04 -08:00
portelli 81e66d6631 HDF5: revert back to native types 2017-01-19 18:24:53 -08:00
portelli ade1058e5f Hdf5Type does not need to be a pointer anymore 2017-01-19 18:23:55 -08:00
portelli 6eea9e4da7 HDF5 types static initialisation is mysteriously buggy on BG/Q, changing strategy 2017-01-19 18:02:53 -08:00
portelli 2c673666da Standardisation of HDF5 types 2017-01-19 17:19:12 -08:00
portelli 7a327a3f28 Merge branch 'develop' into feature/qed-fvol 2017-01-19 14:22:36 -08:00
Lanny91 07f2ebea1b Meson module now takes list of gamma matrices to insert at source and sink. 2017-01-19 22:18:42 +00:00
portelli d6401e6d2c Merge branch 'feature/hadrons' into develop 2017-01-19 14:10:01 -08:00
portelli 24d3d31b01 Genetic scheduler: uses insert instead of emplace for better compiler compatibility 2017-01-19 14:08:22 -08:00
Guido Cossu 851f2ad8ef Adding fermions actions support in the factories 2017-01-19 10:00:02 +00:00
portelli 5405526424 Code typo 2017-01-18 22:42:19 -08:00
portelli f3f0b6fef9 serious rewriting of Test_serialisation, now crashes if IO inconsistent 2017-01-18 17:41:05 -08:00
portelli 654e0b0fd0 Serialisable object are now comparable with == 2017-01-18 17:40:32 -08:00
portelli 4be08ebccc debug code cleaning 2017-01-18 17:39:59 -08:00
portelli f599cb5b17 HDF5 serial IO implemented and tested 2017-01-18 16:50:21 -08:00
Guido Cossu 23e0561dd6 Added all required functionalities, time for cleaning
All actions to be added
2017-01-18 16:31:51 +00:00
portelli a4a509497a Merge branch 'develop' of github.com:paboyle/Grid into develop 2017-01-17 16:22:22 -08:00
portelli 5803933aea First implementation of HDF5 serial IO writer, reader is still empty 2017-01-17 16:21:18 -08:00
Lanny91 8ae1a95ec6 Legal banners and module descriptions 2017-01-17 18:14:20 +00:00
Lanny91 82b7d4eaf0 Added noise loop dependencies 2017-01-17 15:58:32 +00:00
Lanny91 78774fbdc0 Construct loop propagator 2017-01-17 15:29:45 +00:00
Guido Cossu 924130833e Moved more parameters to serialization 2017-01-17 13:22:18 +00:00
Guido Cossu 7cf833dfe9 Fixed compilation error in tests hadrons (capital letter in dir name) 2017-01-17 11:00:54 +00:00
Guido Cossu 0157274762 HMC factories 2017-01-17 10:46:49 +00:00
Guido Cossu 87e8aad5a0 Added support for input file HMC modules (missing the actions yet) 2017-01-16 16:07:12 +00:00
Guido Cossu c6f59c2933 Adding factories 2017-01-16 10:18:09 +00:00
portelli 91a3534054 Lattice slice utilities now thread safe 2017-01-16 06:32:25 +00:00
portelli 16a8e3d0d4 gitignore update for ST3 2017-01-16 06:32:05 +00:00
Lanny91 b7f90aa011 Added momentum choice for wall source 2017-01-13 15:54:19 +00:00
portelli 92f8950a56 Charged scalar prop: cleaning and output 2017-01-13 13:30:56 +00:00
portelli 65987a8a58 First implementation of the scalar QED propagator, runs but absolutely not checked 2017-01-12 20:44:23 +00:00
portelli 889d828bc2 Code cleaning 2017-01-12 18:17:44 +00:00
Lanny91 f22b79da8f Added missing type aliases 2017-01-12 12:52:12 +00:00
Lanny91 3855673ebf Added header for wall source 2017-01-12 11:42:37 +00:00
Lanny91 4db82da0db Wall sources 2017-01-12 11:41:10 +00:00
Lanny91 0cdc3d2fa5 Merge remote-tracking branch 'refs/remotes/paboyle/feature/hadrons' into feature/hadrons 2017-01-12 11:26:55 +00:00
portelli ad98b6193d creating the necessary caches for the FFT EM scalar propagator 2017-01-11 18:40:43 +00:00
portelli fc760016b3 More uniform cache name for scalar momentum propagators 2017-01-11 18:39:58 +00:00
portelli 2da86f7dae Merge branch 'feature/hadrons' into feature/qed-fvol 2017-01-11 18:38:05 +00:00
portelli 41df1db811 Hadrons: number of dimensions entirely determined by the initial grid 2017-01-11 18:37:49 +00:00
Guido Cossu 0dfda4bb90 Working on the RNGModule 2017-01-09 11:06:18 +00:00
Guido Cossu 1189ebc8b5 Cleaning up the checkpointers interface 2017-01-05 15:52:52 +00:00
portelli 97843e2b58 Hadrons: free scalar buffer fix and output 2017-01-05 14:58:55 +00:00
portelli 82b3f54697 scalar free propagator fix 2017-01-05 14:58:07 +00:00
Guido Cossu 1bb8578173 Added module for checkpointers 2017-01-05 13:09:32 +00:00
Peter Boyle c3b6d573b9 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2016-12-30 22:42:17 +00:00
portelli 673994b281 Hadrons: modules for scalar propagators 2016-12-29 22:44:58 +01:00
portelli bbc0eff078 Hadrons: scalar sources 2016-12-29 22:44:22 +01:00
portelli 4c60e31070 Hadrons: code cleaning 2016-12-29 22:44:08 +01:00
portelli afbf7d4c37 QED Gimpl moved in Photon.h 2016-12-29 22:43:38 +01:00
portelli 8c3cc32364 Scalar action 2016-12-29 22:42:58 +01:00
Peter Boyle 1e179c903d Worried about integer; suspect where statements are broken 2016-12-27 17:46:38 +00:00
Peter Boyle 669cfca9b7 No inline 2016-12-27 17:45:40 +00:00
Peter Boyle ff2f559a57 Remove inline on gather optimised path 2016-12-27 17:45:19 +00:00
Peter Boyle 03c81bd902 Merge branch 'feature/bgq-asm' of https://github.com/paboyle/Grid into feature/bgq-asm 2016-12-27 11:25:35 +00:00
Peter Boyle a869addef1 Stats switch off 2016-12-27 11:25:22 +00:00
Peter Boyle 1caa3fbc2d LOCK UNLOCK only 2016-12-27 11:24:45 +00:00
Peter Boyle 3d21297bbb Call the fast path compressor for wilson kernels to avoid if else on projector 2016-12-27 11:23:13 +00:00
Peter Boyle 25efefc5b4 Back to original thread policy post test 2016-12-23 09:49:04 +00:00
Peter Boyle eabf316ed9 BGQ performance ASM 2016-12-22 21:56:08 +00:00
Peter Boyle 04ae7929a3 BGQ or KNL assembler now 2016-12-22 17:53:22 +00:00
Peter Boyle caba0d42a5 L1p controls 2016-12-22 17:52:55 +00:00
Peter Boyle 9ae81c06d2 L1p controls for BG/Q 2016-12-22 17:52:21 +00:00
Peter Boyle 0903c48caa Hot start SU3 2016-12-22 17:51:45 +00:00
Peter Boyle 7dc36628a1 QPX finishing 2016-12-22 17:50:48 +00:00
Peter Boyle b8cdb3e90a Debug hack; raises from 62GF/s to 72 GF/s per node on BG/Q 2016-12-22 17:50:14 +00:00
Peter Boyle 5241245534 Default to static scheduling 2016-12-22 17:49:21 +00:00
Dr Peter Boyle 960316e207 type conversion in printf 2016-12-22 17:27:01 +00:00
Guido Cossu 5214846341 Adding a resource manager 2016-12-22 12:41:56 +00:00
portelli 4c3fd9fa3f stochastic QED field module in Hadrons 2016-12-22 00:29:41 +01:00
portelli 17b3a10d46 stochastic QED: function to cache 1/sqrt(khat^2) 2016-12-22 00:29:19 +01:00
portelli 149a46b92c Merge branch 'feature/hadrons' into feature/qed-fvol 2016-12-22 00:26:43 +01:00
portelli 3215ae6b7e Hadrons: genetic scheduler crashes in multi-thread with 1 module, multi-threading deactivated for now 2016-12-22 00:26:30 +01:00
portelli 7a85fddc7e Hadrons: modification of registration mechanism to allow for persistent caches 2016-12-22 00:25:36 +01:00
Guido Cossu ce1a115e0b Removing redundant arguments for integrator functions, step 1 2016-12-20 17:51:30 +00:00
portelli db9c28a773 qed-fvol: Photon parameter name fix 2016-12-20 12:41:39 +01:00
portelli 9ac3ac41df serialisable Photon parameters 2016-12-20 12:41:01 +01:00
portelli 2af9ab9034 old Makefile cleaning 2016-12-20 12:40:26 +01:00
portelli 6f1ea96293 Merge branch 'develop' into feature/qed-fvol 2016-12-20 12:33:02 +01:00
portelli f8d11ff673 better serialisable enums (can be encapsulated into classes) 2016-12-20 12:31:49 +01:00
paboyle 3f2d53a994 BGQ assembler beginning 2016-12-20 10:21:26 +00:00
paboyle 8a337f3070 Move cayley into mainstream tests 2016-12-18 02:35:31 +00:00
paboyle a59f5374d7 Evade warning 2016-12-18 02:23:55 +00:00
paboyle 4b220972ac Warning fix 2016-12-18 02:14:17 +00:00
paboyle 629f43e36c Return statement needed 2016-12-18 02:09:37 +00:00
paboyle a3172b3455 Precision error 2016-12-18 02:07:45 +00:00
paboyle 3e6945cd65 Fixing AVX Z-mobius 2016-12-18 02:05:11 +00:00
paboyle 87be03006a AVX 512 code broke other compiles; fixing 2016-12-18 01:45:09 +00:00
paboyle f17436fec2 Bad commit fixed 2016-12-18 01:27:34 +00:00
Peter Boyle 4d8b01b7ed Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-12-18 00:56:57 +00:00
Peter Boyle fa6acccf55 Zmobius asm 2016-12-18 00:56:19 +00:00
Peter Boyle 55cb22ad67 Z mobius bmark 2016-12-18 00:55:37 +00:00
azusayamaguchi df9108154d Debugged 2 versions of assembler; ls vectorised, xyzt vectorised 2016-12-17 23:47:51 +00:00
azusayamaguchi b3e7f600da Partial implementation of 4d vectorisation assembler 2016-12-16 23:50:30 +00:00
azusayamaguchi d4071daf2a Template specialise 2016-12-16 22:28:29 +00:00
azusayamaguchi a2a6329094 AVX512 only for ASM compilation 2016-12-16 22:03:29 +00:00
azusayamaguchi eabc577940 Assembler possibly working 2016-12-16 16:55:36 +00:00
portelli 2e3c5890b6 qed-fvol: build fix 2016-12-15 20:06:46 +00:00
portelli bc6678732f Merge branch 'feature/hadrons' into feature/qed-fvol
# Conflicts:
#	Makefile.am
#	configure.ac
#	lib/qcd/action/gauge/Photon.h
2016-12-15 19:53:00 +00:00
portelli b10ae00c8a Merge commit '6ad73145bc9754a5f26093eee5a34473ba0cff82' into feature/qed-fvol 2016-12-15 19:48:58 +00:00
portelli 67d72000e7 Hadrons: more legal banner fixes 2016-12-15 18:26:39 +00:00
portelli 80cef1c78f Hadrons: legal banner fix 2016-12-15 18:21:52 +00:00
portelli 91e98b1dd5 Merge branch 'feature/hadrons' into develop 2016-12-15 18:15:56 +00:00
portelli b791c274b0 Revert "AVX: uninitialised variable fix"
This reverts commit c22c3db9ad.
2016-12-15 18:15:35 +00:00
portelli 596dd570c7 Linux linking fix 2016-12-15 12:26:53 +00:00
portelli cad158e42f Hadrons: tests improvement 2016-12-14 19:41:51 +00:00
portelli f63fac0c69 Hadrons: the XML runner can use a precomputed schedule 2016-12-14 19:41:30 +00:00
portelli ab92de89ab Hadrons: utility to schedule a run 2016-12-14 19:41:04 +00:00
portelli 846272b037 Hadrons: option to save and load a schedule 2016-12-14 19:40:36 +00:00
portelli f3e49e4b73 Hadrons: module templates update 2016-12-14 18:19:46 +00:00
portelli decbb61ec1 Hadrons: XML driven program is again a binary installed with Grid 2016-12-14 18:19:24 +00:00
portelli 7e2482aad1 Hadrons: cpde cleaning 2016-12-14 18:04:21 +00:00
portelli e1653a9f94 Hadrons: size fix in DWF module 2016-12-14 18:02:36 +00:00
portelli ea40854e0b Hadrons: type names are demangled 2016-12-14 18:02:18 +00:00
portelli 34df71e755 Hadrons: function to save an application as an XML file 2016-12-14 18:01:56 +00:00
portelli 3af663e17b Hadrons: modules remember their factory registration name 2016-12-14 17:59:45 +00:00
Azusa Yamaguchi 0cd6b1858c Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-12-14 09:23:22 +00:00
Guido Cossu 0bd296dda4 Adding check of the Dag part in the benchmark 2016-12-14 03:15:09 +00:00
Guido Cossu af0ccdd8e9 Moving output order 2016-12-14 02:02:42 +00:00
portelli c22c3db9ad AVX: uninitialised variable fix 2016-12-13 19:05:58 +00:00
portelli 013e710c7d Hadrons: 3pt function test improvement 2016-12-13 19:04:43 +00:00
portelli 16693bd69d Hadrons: scheduler heuristic benchmark 2016-12-13 19:02:32 +00:00
portelli de8f80cf94 Hadrons: genetic operators improvement 2016-12-13 19:02:05 +00:00
Guido Cossu 2fb92dbc6e Cleaning up previous debug lines 2016-12-13 07:53:43 +00:00
Guido Cossu 5c74b6028b Commit for debugging, lot of IO 2016-12-13 06:35:30 +00:00
Guido Cossu e0be2b6e6c Adding a new tests for the Ls vec CG 2016-12-13 04:59:18 +00:00
Guido Cossu ef72f322d2 consistency of tests 2016-12-13 02:24:20 +00:00
Azusa Yamaguchi 426197e446 Nc=3 2016-12-12 09:10:54 +00:00
Azusa Yamaguchi 99e2c1e666 Kernels options 2016-12-12 09:08:53 +00:00
Azusa Yamaguchi 1440565a10 Decrease verbosity 2016-12-12 09:08:04 +00:00
Azusa Yamaguchi e9f0c0ea39 Staggered kernels options 2016-12-12 09:07:38 +00:00
Guido Cossu 7bc2065113 Adding report at the end of the DWF HMC tests 2016-12-12 04:21:34 +00:00
portelli 4a87486365 Hadrons: a bit of cleaning in the scheduler 2016-12-10 21:14:13 +01:00
Peter Boyle fe187e9ed3 Compiles and passes under ZMobius with assembler 2016-12-10 00:47:48 +00:00
Peter Boyle 0091b50f49 Zmobius working -- not asm yet 2016-12-09 22:51:32 +00:00
Peter Boyle fb8d4b2357 Lots of debug on performance Mobius 2016-12-08 17:28:28 +00:00
Peter Boyle ff71a8e847 Ready for sim 2016-12-08 17:00:32 +00:00
Peter Boyle 83fa038bdf Streaming stores 2016-12-08 16:58:42 +00:00
Peter Boyle 7a61feb6d3 Allocator added with caching for Linux VM subsystem optimisation 2016-12-08 16:58:01 +00:00
Peter Boyle 69ae817d1c Updates for supporting Mobius better 2016-12-08 16:43:28 +00:00
Guido Cossu 2bd4233919 Completed testing of the HMC for Ls vectorised version (on AVX2) 2016-12-07 04:56:37 +00:00
Guido Cossu 143c70e29f Debugged the threaded version. Cleaning up 2016-12-07 04:40:25 +00:00
portelli 51322da6f8 Hadrons: genetic scheduler improvement 2016-12-07 09:00:45 +09:00
portelli 49c3eeb378 Hadrons: more verbose genetic parameters 2016-12-07 08:59:58 +09:00
portelli c56707e003 useless debug message removed 2016-12-07 08:59:20 +09:00
Guido Cossu b812d5e39c Added single threaded version of the derivative for the Ls vectorised DWF 2016-12-06 16:31:13 +00:00
portelli 5b3edf08a4 Hadrons: sequential gamma source 2016-12-06 12:13:19 +09:00
portelli bd1d1cca34 Hadrons: code cleaning 2016-12-06 12:12:59 +09:00
portelli 646b11f5c2 Hadrons: exposing scheduler settings 2016-12-06 12:12:05 +09:00
portelli a683a0f55a Hadrons: meson tests renamed spectrum 2016-12-06 12:11:44 +09:00
portelli e6effcfd95 Hadrons: more contractions in the spectrum test 2016-12-05 17:41:58 +09:00
portelli aa016f61b9 Hadrons: empty baryon contractions 2016-12-05 17:26:57 +09:00
portelli d42a1b73c4 Hadrons: code cleaning 2016-12-05 17:26:36 +09:00
portelli d292657ef7 Hadrons: more module templates 2016-12-05 17:26:17 +09:00
portelli d1f7c6b94e Hadrons: templatisation of the fermion implementation 2016-12-05 16:47:29 +09:00
portelli 7ae734103e Hadrons: namespace macro to tackle GCC 5 bug 2016-12-05 14:29:32 +09:00
Guido Cossu 01480da0a8 Merge branch 'develop' into feature/hmc_generalise 2016-12-05 05:10:27 +00:00
portelli 7a1ac45679 Hadrons: configure.ac Linux typo 2016-12-05 14:00:10 +09:00
portelli 320268fe93 Hadrons: code cleaning 2016-12-05 13:57:34 +09:00
portelli dd6fb140c5 Hadrons: big module reorganisation 2016-12-05 13:53:31 +09:00
portelli 0b4f680d28 Hadrons: meson run test 2016-12-05 11:44:58 +09:00
portelli a69086ba1f Hadrons: application run minor fixes 2016-12-05 11:44:36 +09:00
portelli 7433eed274 Hadrons: module creation fix 2016-12-05 11:44:16 +09:00
portelli ee5b1fe043 Hadrons: freeing object message fix 2016-12-05 09:08:45 +09:00
portelli 1540616b22 Hadrons: integer types cleanup 2016-12-05 08:53:48 +09:00
portelli 8190523e4c Hadrons: type fix in module creation 2016-12-02 11:04:34 +09:00
portelli b5555d85a7 Hadrons: generelalised FImpl for actions 2016-12-02 11:04:15 +09:00
Peter Boyle e27c6b217c Updating 2016-12-01 12:42:53 +00:00
portelli 9ad3d3453e Hadrons is now a library, the previous XML driven program is now a test 2016-12-01 21:36:29 +09:00
Peter Boyle f7a6b8e5ed Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-12-01 11:39:52 +00:00
paboyle 6adf35da54 Faster Mobius 2016-12-01 11:39:04 +00:00
portelli d8b716d2cd Hadrons: static initialisation fixed 2016-12-01 15:43:16 +09:00
Peter Boyle cd01c1dbe9 Ls 16 more relevant 2016-11-30 22:11:10 +00:00
James Harrison 6ad73145bc Calculate Wilson loop average over multiple configurations. 2016-11-30 15:17:22 +00:00
paboyle bd0430b34f Serialisation in malloc fixed 2016-11-29 22:27:55 +00:00
Azusa Yamaguchi c097fd041a Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-29 13:44:17 +00:00
Azusa Yamaguchi 77fb25fb29 Push 5d tests 2016-11-29 13:43:56 +00:00
Azusa Yamaguchi 389e0a77bd Staggerd Fermion 5D 2016-11-29 13:13:56 +00:00
paboyle 2f92b4860b Test the full Mooee sector 2016-11-29 00:15:08 +00:00
paboyle 4704f2d009 Actions updated 2016-11-29 00:14:36 +00:00
Guido Cossu ae9688e343 Reporting also the total mflops 2016-11-28 11:37:02 +00:00
portelli 43928846f2 first steps to make Hadrons a library 2016-11-28 16:02:15 +09:00
portelli fabcd4179d Hadrons: propagator type coming from the fermion implementation 2016-11-28 14:02:10 +09:00
portelli a8843c9af6 Code cleaning, the fermion implementation can be sepcified using the macro FIMPL 2016-11-27 16:47:22 +09:00
portelli 7a1a7a685e Merge branch 'feature/fft-opt' into feature/hadrons 2016-11-27 15:32:03 +09:00
Guido Cossu 1e44fd3094 Added some details on the mpi flags for Cray machines 2016-11-26 18:30:53 +00:00
Guido Cossu d8258f0758 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-11-26 18:25:32 +00:00
Guido Cossu 6c0cc5676b Adding Eigen.inc to the gitignore 2016-11-26 18:25:12 +00:00
portelli f7293f2ddb Merge pull request #69 from jch1g10/feature/qed-fvol 2016-11-26 07:04:04 +09:00
portelli 11dc0b398b Merge pull request #74 from Lanny91/develop 2016-11-26 07:01:51 +09:00
Lanny91 b18950f776 Added simd real divide test with QPX divide fixes 2016-11-25 13:21:33 +00:00
Lanny91 0acbf77bc6 Add QPX Div structure 2016-11-24 13:24:12 +00:00
portelli 3cdf945d84 Test_fftf fix 2016-11-24 09:10:03 +09:00
portelli 5833f247fa more FFt optimisations 2016-11-24 09:09:48 +09:00
Azusa Yamaguchi 95f43d27ae Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-22 13:49:22 +00:00
Azusa Yamaguchi 668ca57702 Merge branch 'develop' of https://github.com/paboyle/Grid into feature/staggering 2016-11-22 13:49:11 +00:00
portelli a2cffb0304 AVXFMA target fixed 2016-11-21 17:47:18 +01:00
portelli bafbac6ac4 Merge branch 'feature/gen-simd' into develop 2016-11-19 13:45:30 +01:00
portelli 595f1ce371 GEN SIMD build fix 2016-11-19 13:45:12 +01:00
portelli 6d7cde4eb4 README update 2016-11-19 13:17:35 +01:00
portelli 97cddda49e Merge branch 'feature/gen-simd' into feature/doxygen
# Conflicts:
#	Makefile.am
#	configure.ac
2016-11-19 13:11:13 +01:00
portelli 433afd36f5 Makefile rule for simple_* objects 2016-11-19 01:33:13 +01:00
portelli b873504b90 fully generic SIMD 2016-11-19 01:32:39 +01:00
Guido Cossu 62749d05a6 Naming the scalar action 2016-11-17 12:26:20 +00:00
Guido Cossu 3834feb4b7 Adding action names 2016-11-16 16:46:49 +00:00
James Harrison 6b8ee7bae0 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-11-15 13:08:08 +00:00
James Harrison 739c2308b5 Set imaginary part of stochastic QED field to zero using real() instead of conjugate(). 2016-11-15 13:07:52 +00:00
Guido Cossu 454302414d Small modif at the test hmc 2016-11-15 12:31:13 +00:00
portelli 042ae5b87c generic 256bits SIMD 2016-11-15 12:16:15 +00:00
James Harrison a71b69389b QedFVol: calculate square Wilson loops up to 10x10 2016-11-14 18:23:04 +00:00
James Harrison d49e502f53 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-11-14 18:00:33 +00:00
James Harrison 92ec3404f8 Set imaginary part of stochastic QED field to zero after FFT into position space 2016-11-14 17:59:02 +00:00
James Harrison f4ebea3381 QedFVol: add functions for computing spatial and timelike Wilson loops 2016-11-14 17:51:53 +00:00
James Harrison cf167d0cd1 QedFVol: implement exponentiation of photon field 2016-11-14 17:02:29 +00:00
Guido Cossu 6f8b771a37 Adding date of the last commit 2016-11-10 18:52:00 +00:00
Guido Cossu 4e1ffdd17c Adding git info to the configure output 2016-11-10 18:44:36 +00:00
portelli 1aa695cd78 Hadrons: merge typo 2016-11-10 18:38:30 +00:00
Guido Cossu a783282b8b Merge branch 'develop' into feature/hmc_generalise 2016-11-10 18:13:07 +00:00
Guido Cossu 19b85d8486 Some comments in the hmc files 2016-11-10 17:55:58 +00:00
paboyle 58f4950652 Merge branch 'release/v0.6.0' into develop 2016-11-09 12:44:00 +00:00
paboyle c363bdd784 Merge branch 'release/v0.6.0' 2016-11-09 12:43:14 +00:00
paboyle 604f0ea2f6 Merge branch 'develop' into release/v0.6.0 2016-11-09 04:13:01 -08:00
paboyle 42c912f608 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-11-09 04:12:15 -08:00
paboyle 33dc1f51b5 Final sign off commits from Cori-1 2016-11-09 04:11:03 -08:00
James Harrison c30d96ea50 QedFVol: x86intrin.h namespace fix 2016-11-09 11:06:20 +00:00
portelli 13a8997789 Merge branch 'release/v0.6.0' into feature/hadrons
# Conflicts:
#	Makefile.am
2016-11-08 20:43:39 +00:00
portelli 9576f0903d namespace fix 2016-11-08 19:07:47 +00:00
portelli 34cf702b24 README is now a symlink to README.md 2016-11-08 17:00:38 +00:00
portelli 8a5e3a917c Merge branch 'develop' into release/v0.6.0
# Conflicts:
#	tests/core/Test_fft_gfix.cc
2016-11-08 16:53:42 +00:00
portelli 65bcf281d0 Merge branch 'develop' of github.com:paboyle/Grid into develop 2016-11-08 16:51:19 +00:00
portelli cd0be8cb24 Test_fft_gfix.c precision fix 2016-11-08 15:32:05 +00:00
portelli 3d2a22a14d include fix for MKL 2016-11-08 15:31:47 +00:00
James Harrison 7ffe17ada1 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-11-08 14:52:43 +00:00
portelli a26adfb090 README: only markdown 2016-11-08 14:11:18 +00:00
portelli f6e1a5b348 building tests depends on building the library at the top level 2016-11-08 14:08:33 +00:00
portelli c5a025d421 README typo 2016-11-08 14:07:59 +00:00
azusayamaguchi 50d277d8d9 Merge branch 'develop' into release/v0.6.0 2016-11-08 13:44:14 +00:00
azusayamaguchi 343f3e829f Fixes prerelease to make all tests 2016-11-08 13:42:12 +00:00
azusayamaguchi f85b35314d Fix a routine for single node processor coor from rank 2016-11-08 11:49:13 +00:00
azusayamaguchi 3dc2e05d6e Time as well since MKL returns zero for Mflops 2016-11-08 11:36:18 +00:00
azusayamaguchi 0cff8754d1 Usecs 2016-11-08 11:35:41 +00:00
Guido Cossu afc8d3e524 Adding support for parallel recursive compilation for the tests 2016-11-07 11:13:43 +00:00
azusayamaguchi 692b44dac1 Merge branch 'develop' into release/v0.6.0 2016-11-04 22:48:11 +00:00
azusayamaguchi 96ba42a297 omm buf 2016-11-04 22:47:25 +00:00
portelli 7df940dc3e homemade test recusrive target for old autotools versions 2016-11-04 22:32:25 +00:00
azusayamaguchi f7b60004f3 Merge branch 'develop' into release/v0.6.0 2016-11-04 16:08:07 +00:00
portelli 8af8b047fd tests is now a recusrsive target 2016-11-04 13:44:21 +00:00
portelli 6592078fef Make.inc removed, once again don't commit it! 2016-11-04 13:43:40 +00:00
portelli ad971ca07b fftw3.h is now expected to be an external header 2016-11-04 13:12:35 +00:00
portelli f2f16eb972 fftw3.h removed, please don't commit this file back 2016-11-04 13:11:05 +00:00
azusayamaguchi b7d55f7dfb Fix a typo in reorg of the --dslash-asm 2016-11-04 11:35:08 +00:00
azusayamaguchi 6e548a8ad5 Linux compile needed 2016-11-04 11:34:16 +00:00
Azusa Yamaguchi ee686a7d85 Compiles now 2016-11-03 16:58:23 +00:00
Azusa Yamaguchi 1c5b7a6be5 Staggered phases first cut, c1, c2, u0 2016-11-03 16:26:56 +00:00
portelli a5dd4a9bab Merge branch 'feature/fft-opt' into develop 2016-11-03 14:34:46 +00:00
portelli ec232af851 Photon.h references removed 2016-11-03 14:34:16 +00:00
portelli 17e30281e9 Merge branch 'develop' into feature/fft-opt
# Conflicts:
#	lib/FFT.h
2016-11-03 14:14:03 +00:00
portelli 2854e601e6 FFT test typo 2016-11-03 14:09:47 +00:00
portelli aee44dc694 Photon.h removed from develop branch 2016-11-03 13:54:15 +00:00
portelli 75bbf6a0af Merge branch 'develop' into feature/feynman-rules 2016-11-03 13:52:11 +00:00
portelli c65d23935a README update 2016-11-03 13:48:20 +00:00
portelli 92cd797636 MPI auto configure fix 2016-11-03 13:48:07 +00:00
paboyle 111bfbc6bc notimestamp by default 2016-11-03 11:40:26 +00:00
paboyle f41a230b32 Decrease mpi3l verbose 2016-11-02 19:54:03 +00:00
paboyle c067051d5f Merge branch 'develop' into release/v0.6.0 2016-11-02 13:59:18 +00:00
paboyle afdeb2b13c Merge branch 'feature/mpi3-master-slave' into develop 2016-11-02 13:43:20 +00:00
paboyle 9e2ec2719b Merge branch 'develop' into feature/mpi3-master-slave 2016-11-02 13:02:56 +00:00
paboyle 757a928f9a Improvement to use own SHM_OPEN call to avoid openmpi bug. 2016-11-02 12:37:46 +00:00
Guido Cossu bc248b6948 Merge branch 'release/v0.6.0' into feature/KNL_double_prec
Conflicts:
	lib/simd/Grid_avx512.h
2016-11-02 10:40:49 +00:00
Guido Cossu ae8561892e Eliminating useless defines 2016-11-02 10:21:06 +00:00
paboyle 32375aca65 Semaphore sleep/wake up on remote processes. 2016-11-02 09:27:20 +00:00
paboyle bb94ddd0eb Tidy up of mpi3; also some cleaning of the dslash controls. 2016-11-02 08:07:09 +00:00
portelli 330a9b3f4c Merge pull request #65 from jch1g10/feature/qed-fvol 2016-11-01 19:53:25 +00:00
portelli c2d78493c8 Merge pull request #64 from jch1g10/feature/feynman-rules 2016-11-01 19:53:08 +00:00
James Harrison 28ff66a381 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-11-01 16:07:46 +00:00
James Harrison 78c7bcee36 QedFVol: Change variables of type "double" to type "Real". 2016-11-01 16:06:05 +00:00
James Harrison 7f0fc0eff5 Remove explicit use of double-precision types in photon.h 2016-11-01 16:02:35 +00:00
Azusa Yamaguchi 164d3691db Staggered 2016-11-01 14:24:22 +00:00
paboyle 791cb050c8 Comms improvements 2016-11-01 11:35:43 +00:00
portelli 00a7b95631 Merge remote-tracking branch 'gh-james/feature/qed-fvol' into feature/qed-fvol 2016-10-31 18:46:23 +00:00
portelli 94d8321d01 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-10-31 18:41:30 +00:00
portelli d5e95bc350 Merge branch 'release/v0.6.0' into feature/feynman-rules 2016-10-31 18:36:21 +00:00
portelli 6efac3a252 Merge pull request #61 from jch1g10/feature/feynman-rules
Add missing volume factor in stochastic QED field
2016-10-31 18:35:22 +00:00
portelli 7a84906b5f Merge branch 'release/v0.6.0' into feature/fft-opt 2016-10-31 18:31:49 +00:00
portelli 07416e4567 README update 2016-10-31 18:21:52 +00:00
portelli 66d832c733 FFTW header fix 2016-10-31 16:39:29 +00:00
portelli e74417ca12 big build system polish 2016-10-31 16:31:27 +00:00
portelli 7bd0084b5d Merge branch 'develop' into release/v0.6.0 2016-10-31 16:30:22 +00:00
Guido Cossu e8c3174ae2 Small change in the defines 2016-10-30 12:23:11 +00:00
Guido Cossu 9b066e94d0 Compilation with both single and double precision 2016-10-30 12:04:06 +00:00
James Harrison ac24cc9f99 Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-10-29 11:05:26 +01:00
James Harrison 618abdf302 Add missing volume factor in stochastic QED field 2016-10-29 11:04:02 +01:00
Guido Cossu e1042aef77 First version of the doube prec for testing purposes
It does not compile single and double version at the same time
2016-10-28 17:20:04 +01:00
paboyle aa6a839c60 avx512 build fix; detect clang/gcc intrinsics vs. ICPC 2016-10-28 09:13:09 +01:00
portelli ac99a56237 Merge branch 'develop' into release/v0.6.0 2016-10-27 11:53:24 +01:00
portelli b4d2af8c89 threaded FFT 2016-10-26 19:46:36 +01:00
portelli 434af6aeaa Merge branch 'develop' into feature/fft-opt 2016-10-26 18:50:38 +01:00
portelli e90f8ac841 Merge branch 'develop' into feature/feynman-rules 2016-10-26 18:50:21 +01:00
portelli a1705a8d53 debug message removed 2016-10-26 18:50:07 +01:00
portelli ca21003f01 Merge branch 'feature/fft-opt' into feature/feynman-rules
# Conflicts:
#	lib/FFT.h
#	lib/qcd/action/fermion/WilsonFermion5D.h
#	tests/core/Test_fft.cc
2016-10-26 18:44:47 +01:00
portelli 14ddf2c234 more FFT optimisations 2016-10-26 17:36:26 +01:00
Guido Cossu 1d666771f9 Debugging the RNG, eliminate the barrier after broadcast 2016-10-26 16:08:23 +01:00
Guido Cossu d50055cd96 Making the ILDG support optional 2016-10-26 09:48:01 +01:00
Azusa Yamaguchi bca861e112 Note:FFT shoud be GridFFT (Not change yet).
Gauge fix with FFt is added (tests/core)
2016-10-25 14:21:48 +01:00
James Harrison 3ab4c8c0bb QedFVol: calculate plaquette and 2x2 Wilson loop of stochastic QED field 2016-10-25 13:32:02 +01:00
portelli 33d199a0ad temporary thread safety in FFT 2016-10-25 12:56:40 +01:00
paboyle 93896ce59e Roll version number 2016-10-25 06:12:49 +01:00
paboyle b1508e4124 Merge branch 'feature/mpi3' into develop 2016-10-25 06:06:36 +01:00
paboyle b820076b91 Merge branch 'develop' into feature/mpi3 2016-10-25 06:02:33 +01:00
paboyle 09f66100d3 MPI 3 compile on non-linux 2016-10-25 06:01:12 +01:00
azusayamaguchi d7d92af09d Travis fail fix attempt 2016-10-25 01:45:53 +01:00
azusayamaguchi 460d0753a1 Merge branch 'develop' into feature/mpi3
Conflicts:
	lib/simd/Grid_avx512.h
2016-10-25 01:08:51 +01:00
azusayamaguchi 8f8058f8a5 More random bits on parallel seeding 2016-10-25 01:05:52 +01:00
azusayamaguchi d97a27f483 Verbose 2016-10-25 01:05:31 +01:00
azusayamaguchi 7c3363b91e Compiles all comms targets 2016-10-25 00:04:17 +01:00
azusayamaguchi b94478fa51 mpi, mpi3, shmem all compile.
mpi, mpi3 pass single node multi-rank
2016-10-24 23:45:31 +01:00
Guido Cossu 47c7159177 ILDG reader/writer works
Fill the xml header with the required information, todo.
2016-10-24 21:57:54 +01:00
portelli 13bf0482e3 FFT optimisation 2016-10-24 19:25:40 +01:00
portelli a795b5705e memory optimisation 2016-10-24 19:25:15 +01:00
portelli 392e064513 fast local peek-poke 2016-10-24 19:24:21 +01:00
azusayamaguchi b6a65059a2 Update to use shared memory to contain the stencil comms buffers
Tested on 2.1.1.1 1.2.1.1 4.1.1.1 1.4.1.1 2.2.1.1 subnode decompositions
2016-10-24 17:30:43 +01:00
Guido Cossu f415db583a Adding ILDG format 2016-10-24 15:48:22 +01:00
Guido Cossu f55c16f984 Adding a barrier in the RNG save 2016-10-24 11:02:14 +01:00
azusayamaguchi ea25a4d9ac Works 2016-10-23 06:10:05 +01:00
azusayamaguchi c190221fd3 Internal SHM comms in non-simd directions working
Need to fix simd directions
2016-10-22 18:14:27 +01:00
Guido Cossu df67e013ca More debug output for the RNG 2016-10-22 13:34:17 +01:00
Guido Cossu 3e990c9d0a Reverting the broadcast change 2016-10-22 13:26:43 +01:00
Guido Cossu 4b740fc8fd Debugging the RNG state save 2016-10-22 13:06:00 +01:00
azusayamaguchi 0fcd2e7188 Simplify the comms structure prior to implementing Shared memory direct bouncs 2016-10-21 22:44:10 +01:00
azusayamaguchi 910b8dd6a1 use simd type 2016-10-21 22:35:29 +01:00
azusayamaguchi 75ebd3a0d1 Typo fixes and rotate for CLANG 2016-10-21 22:34:29 +01:00
Guido Cossu cccd14b09e Small cleanup 2016-10-21 17:20:54 +01:00
Guido Cossu e6acffdfc2 Fixing the plaquette computation 2016-10-21 16:06:34 +01:00
portelli 26d124283e Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-10-21 15:23:31 +01:00
portelli 0d889b7041 QedFVol: first attempt at generating a QED field 2016-10-21 15:21:32 +01:00
portelli 7c8f79b147 more stochastic QED fixes 2016-10-21 15:20:12 +01:00
azusayamaguchi 09fd5c43a7 Reasonably fast version 2016-10-21 15:17:39 +01:00
portelli ab31ad006a Merge branch 'feature/feynman-rules' into feature/qed-fvol 2016-10-21 14:42:18 +01:00
portelli 462921e549 QED: fix stochastic field 2016-10-21 14:41:08 +01:00
Guido Cossu 392130a537 Working on the 5d 2016-10-21 14:22:25 +01:00
azusayamaguchi f22317748f Merge branch 'feature/mpi3' of https://github.com/paboyle/Grid into feature/mpi3 2016-10-21 13:36:35 +01:00
azusayamaguchi 6a9eae6b6b Reporting improvements 2016-10-21 13:36:18 +01:00
azusayamaguchi fad96cf250 StencilBufs 2016-10-21 13:36:00 +01:00
azusayamaguchi f331809c27 Use variable type for loop 2016-10-21 13:35:37 +01:00
portelli bd6a228af6 Merge commit '20a091c3eddfdb67a82ece6413740a93650a2f98' into feature/feynman-rules 2016-10-21 13:10:30 +01:00
portelli 63d219498b first (dirty) implementation of Feynman stoctachtic EM field 2016-10-21 13:10:13 +01:00
paboyle 2c54a53d0a Compile verbose reduce 2016-10-21 12:12:14 +01:00
paboyle 306160ad9a bcopy threaded 2016-10-21 12:07:28 +01:00
azusayamaguchi 20a091c3ed Intel vs. Clang intrinsics differences absorbed 2016-10-21 09:08:36 +01:00
azusayamaguchi 202078eb1b Cray / OpenSHMEM ordering differs 2016-10-21 09:07:20 +01:00
paboyle a762b1fb71 MPI3 working with a bounce through shared memory on my laptop.
Longer term plan: make the "u_comm_buf" in Stencil point to the shared region and avoid the
send between ranks on same node.
2016-10-21 09:03:26 +01:00
Guido Cossu deef2673b2 Separating the Lattice theories stub from the QCD.h file 2016-10-20 17:24:08 +01:00
paboyle 5b5925b8e5 Forgot to add 2016-10-20 17:09:40 +01:00
Guido Cossu 977b0a6dd9 Merge branch 'develop' into feature/hmc_generalise 2016-10-20 17:04:41 +01:00
Guido Cossu 977d844394 Few modifications on stdout messages 2016-10-20 17:01:59 +01:00
paboyle b58adc6a4b commVector 2016-10-20 17:00:15 +01:00
paboyle f9d5e95d72 allocator template typedefs moved to AlignedAllocator 2016-10-20 16:59:39 +01:00
paboyle 4f8e636a43 commVector 2016-10-20 16:59:16 +01:00
paboyle 9b39f35ae6 commVector different for SHMEM compat 2016-10-20 16:58:53 +01:00
paboyle 5fe2b85cbd MPI3 and shared memory support 2016-10-20 16:58:01 +01:00
paboyle c7cccaaa69 Comm vector for shmem 2016-10-20 16:57:31 +01:00
paboyle cbcfea466f MPI3 2016-10-20 16:57:14 +01:00
paboyle 4955672fc3 MPI3 2016-10-20 16:57:00 +01:00
paboyle 39f1c880b8 mpi3 2016-10-20 16:56:40 +01:00
paboyle 8c043da5b7 SHMEM and comms allocator made different 2016-10-20 16:56:05 +01:00
paboyle 3cbe974eb4 Layout 2016-10-20 16:55:21 +01:00
portelli 6e4a06e180 qed-fvol: initial commit 2016-10-20 15:04:00 +01:00
portelli 997fd882ff Merge branch 'develop' into feature/feynman-rules
# Conflicts:
#	lib/Threads.h
#	lib/qcd/action/fermion/WilsonFermion.cc
#	lib/qcd/action/fermion/WilsonFermion.h
#	lib/qcd/utils/SUn.h
#	lib/simd/Grid_avx.h
#	lib/simd/Intel512common.h
2016-10-19 18:35:18 +01:00
Guido Cossu 590675e2ca Csum in hex format 2016-10-19 17:26:25 +01:00
Guido Cossu 8c65bdf6d3 Printing checksum for the RNG file 2016-10-19 16:56:11 +01:00
Guido Cossu 74f1ed3bc5 Adding some documentation for HMC 2016-10-19 10:51:13 +01:00
paboyle 7af9b87318 Cache face tables to improve performance.
Extract merge now looking poor.
2016-10-18 09:51:37 +01:00
paboyle 811ca45473 GNU clang hack for AVX512 since there are missing reduce intrinsics in Clang 3.9 and GCC-6 AVX512 support 2016-10-17 16:23:21 +01:00
paboyle bc1a4d40ba Faster integer handling avoid push_back 2016-10-17 16:16:44 +01:00
Guido Cossu 79270ef510 Added a test for EODWF Scaled Shamir with general HMC 2016-10-14 17:34:26 +01:00
Guido Cossu e250e6b7bb Moving parameters outside of the HMCrunner 2016-10-14 17:22:32 +01:00
paboyle c8079e6621 Time the face gather in x-dir more carefully 2016-10-13 22:28:50 +01:00
Guido Cossu 261342c15f Adding gh-pages 2016-10-13 11:51:25 +01:00
azusayamaguchi 8b0d171c9a 32bit issue on the KNL code variant where byte offsets were stored 2016-10-12 17:49:32 +01:00
azusayamaguchi 1f293b76b4 Merge branch 'feature/knl-stats' into develop 2016-10-12 13:47:58 +01:00
azusayamaguchi 8bbd9ebc27 Reversing changes to Stencil class 2016-10-12 13:47:20 +01:00
azusayamaguchi 6472b431f0 __rdpmc needed for gcc, clang++ 2016-10-12 12:29:08 +01:00
azusayamaguchi bd205a3293 Fixing for non x86 and non KNL 2016-10-12 12:09:15 +01:00
azusayamaguchi 496beffa88 Fix non-KNL build 2016-10-12 12:06:08 +01:00
azusayamaguchi 9b63e97108 align not absolutely required and confuses clang++ 2016-10-12 11:51:21 +01:00
azusayamaguchi 81f2aeaece KNL streaming stores, and KNL performance counters 2016-10-12 11:45:22 +01:00
paboyle 2d4a45c758 Typecast pointer 2016-10-12 09:14:15 +01:00
paboyle a123dcd7e9 Static required for shmem. Reading same object twice requires csum reset 2016-10-12 00:29:57 +01:00
paboyle 6b27c42dfe Cosmetic 2016-10-12 00:29:39 +01:00
paboyle f7c2aa3ba5 runtime by default 2016-10-12 00:29:13 +01:00
paboyle 0f182f033b Drop macos with gcc 2016-10-11 22:29:06 +01:00
paboyle 7240d73184 Parallelise the x faces; fix the segv on KNL with comms 2016-10-11 22:21:07 +01:00
paboyle 42cd148f5e Base pointer for comms buffer under AVX512 assembly 2016-10-11 16:06:06 +01:00
Guido Cossu eda4dd622e Some more edit 2016-10-11 15:45:20 +01:00
paboyle 6e01264bb7 don't use static by default 2016-10-11 10:03:39 +01:00
paboyle 6f408256bc FMA4 option moved on the align 2016-10-11 10:03:01 +01:00
paboyle 8d11681aac verbose remove 2016-10-10 23:50:42 +01:00
paboyle 3d5c9a1ee9 No compile fix on clang++ 3.9 2016-10-10 23:50:13 +01:00
paboyle db749f103f Add Wilson, DWF, Overlap feynman rule tests 2016-10-10 23:48:35 +01:00
paboyle dc389e467c axpy_ssp for any coeff type via template 2016-10-10 23:48:05 +01:00
paboyle 3619167d62 Mass parameter 2016-10-10 23:47:33 +01:00
paboyle 96f1d1b828 Debugged Domain wall and Overlap feynman rules (infinite Ls, finite mass). 2016-10-10 23:46:45 +01:00
paboyle 657e0a8f4d Mass parameter 2016-10-10 23:46:10 +01:00
paboyle 616e7cd83e Mass parameter 2016-10-10 23:45:48 +01:00
paboyle 6f26d2e8d4 Overlap tree level feynman rule 2016-10-10 23:45:18 +01:00
paboyle c014574504 A "please implement me" feynman rule. If this were abstract virtual it would
require/force implementation
2016-10-10 23:44:00 +01:00
paboyle d7ce164e6e Feynman rule for DWF 2016-10-10 23:43:36 +01:00
paboyle c0d5b99016 Dminus 2016-10-10 23:43:19 +01:00
paboyle 09ca32d678 Dminus added for Cayley 2016-10-10 23:42:55 +01:00
paboyle 082ae350c6 static schedule by default 2016-10-10 23:42:30 +01:00
Guido Cossu 611b5d74ba Fix for AVX+FMA3 compilation 2016-10-10 15:26:17 +01:00
Guido Cossu b56c9ffa52 Fix for AVXFMA 2016-10-10 14:43:37 +01:00
Guido Cossu c68a2b9637 Minor fix 2016-10-10 11:54:58 +01:00
Guido Cossu 293df6cd20 Generalising the HMCRunner and moving parameters to the user level 2016-10-10 11:49:55 +01:00
Guido Cossu 65f61bb3bf Reset QCD colours to 3 2016-10-10 09:46:17 +01:00
Guido Cossu 26b9740d53 Some fix for the GenericHMCrunner 2016-10-10 09:43:05 +01:00
portelli cb02b7088f Merge branch 'develop' into feature/doxygen
# Conflicts:
#	configure.ac
2016-10-09 13:35:44 +01:00
portelli 70c32fa49b Merge branch 'develop' of github.com:paboyle/Grid into develop 2016-10-09 12:55:46 +01:00
portelli 77c8a94dae AVXFMA4 flag fix for Intel Compiler 2016-10-09 12:55:12 +01:00
Guido Cossu 6eb873dd96 Added scalar action phi^4
Check Norm2 output (Complex type assumption)
2016-10-07 17:28:46 +01:00
Guido Cossu 11b4c80b27 Added support for hmc and binary IO for a general field 2016-10-07 13:37:29 +01:00
Guido Cossu 2e453dfbf5 Added some instrumentation to benchmark the force computation 2016-10-06 17:52:45 +01:00
Guido Cossu c065e454c3 Adding Binary IO, untested 2016-10-06 10:12:11 +01:00
paboyle 4089984431 Timing hooks 2016-10-06 09:25:12 +01:00
portelli 98439847cf configure portability fix 2016-10-05 14:57:20 +01:00
Guido Cossu c78bbd0f8c Fix ASM compilation 2016-10-04 15:37:32 +01:00
Guido Cossu d9b5fbd374 In the middle of adding a general binary writer 2016-10-04 11:24:08 +01:00
Guido Cossu cfbc1a26b8 Now the gauge implementation has to take care of the Nexp 2016-10-03 16:20:06 +01:00
Guido Cossu 257f69f931 One more function to generalise the HMC integrator 2016-10-03 15:50:04 +01:00
Guido Cossu e415260961 First cut on generalised HMC
Backward compatibility OK
2016-10-03 15:28:00 +01:00
portelli 7ea4b959a4 hopefully more portable configure output 2016-09-27 11:54:37 +01:00
portelli 536e2ff073 *.inc removed: please don't commit these files either! 2016-09-27 11:54:03 +01:00
portelli 798ff34d7e configure removed: please don't commit configure! 2016-09-27 11:29:31 +01:00
paboyle 87acd06990 Use streaming stores 2016-09-26 10:11:34 +01:00
paboyle 9353b6edfe Fenv out of grid namespace 2016-09-26 10:09:13 +01:00
paboyle 167cc2650e GNU SOURCE problem on travis 2016-09-26 09:58:09 +01:00
paboyle 34f887ca1c Test_fft not complete; preparing for tests of momentum space DWF and Overlap feynman rules but not there yet. 2016-09-26 09:44:36 +01:00
paboyle 7089b6d5a5 Setting up but not implemented some QED rules 2016-09-26 09:43:40 +01:00
paboyle 2ba7d43ddd Divide handling 2016-09-26 09:43:14 +01:00
paboyle 836e929565 Divide handling improved 2016-09-26 09:42:22 +01:00
paboyle b6713ecb60 Momentum space rules for Overlap, DWF untested to date 2016-09-26 09:39:09 +01:00
paboyle 52a39f0fcd Divide in ET 2016-09-26 09:38:38 +01:00
paboyle 81a7a03076 Integer << 2016-09-26 09:38:17 +01:00
paboyle 16b37b956c divide goes to ET 2016-09-26 09:37:59 +01:00
paboyle 567b6cf23f demangle moves to logging 2016-09-26 09:36:51 +01:00
paboyle 296396646d FPE's on macos set up 2016-09-26 09:36:14 +01:00
Guido Cossu 04a437c92c Minor modification to the filelist script 2016-09-23 11:12:45 +01:00
Guido Cossu 5c190a1b8c Merge branch 'develop' into feature/hirep 2016-09-23 11:06:06 +01:00
Guido Cossu 15d8f5c88c Small change to the configure.ac to include the canonical names 2016-09-23 11:05:36 +01:00
Guido Cossu c4ac6e7e8f Consolidating HMC interface
Uniformed interface for standard action in fundamental rep and Hirep
2016-09-23 10:47:42 +01:00
Guido Cossu 510e340e16 Debugged last commit for the Two index representation 2016-09-22 22:16:21 +01:00
Guido Cossu 6ffadca153 Restored number of colours to 3 2016-09-22 14:22:54 +01:00
Guido Cossu b6597b74e7 Added support for the Two index Symmetric and Antisymmetric representations
Tested for HMC convergence: OK
Added also a test file showing an example for mixed representations
2016-09-22 14:17:37 +01:00
portelli a034e9901b Merge branch 'develop' into feature/hadrons 2016-09-20 13:49:33 +01:00
portelli d2573189d8 build system: FFTW fix 2016-09-20 12:30:24 +01:00
portelli 65ca174dbb gitignore update 2016-09-20 11:25:06 +01:00
Antonin Portelli 0724f7af75 QPX single precision implementation 2016-09-19 18:09:12 +01:00
portelli 2e74520821 removed libtool use (BG/Q compatibility) 2016-09-16 15:25:49 +01:00
Antonin Portelli 6dd75ad9e5 Merge branch 'develop' of github.com:paboyle/Grid into feature/bgq 2016-09-16 15:07:54 +01:00
Guido Cossu fda408ee6f Added first lines for supporting Two Index representations 2016-09-13 10:43:30 +01:00
Guido Cossu b9c80318a2 Merge branch 'develop' into feature/hirep 2016-09-13 10:01:51 +01:00
Guido Cossu 5df5d52d41 Fix for the Intel compiler 2016-09-12 17:17:20 +01:00
Guido Cossu f76f281e58 Cleaning files after fix 2016-09-09 11:34:25 +01:00
Guido Cossu aa20cc8b52 Fixing compilation error with AVX512 flag 2016-09-09 02:58:52 -07:00
Guido Cossu 0fd179fb33 Merge branch 'develop' into feature/hirep 2016-09-01 12:59:53 +01:00
Guido Cossu f45ef8d114 Minor modification in ActionBase.h 2016-09-01 11:46:46 +01:00
paboyle 7422953e36 Poisson solver example 2016-08-31 00:42:47 +01:00
paboyle 8535d433a7 Cold or hot must support any precision 2016-08-31 00:27:53 +01:00
paboyle b573d1f35a Wilson tree level added 2016-08-31 00:27:04 +01:00
paboyle 0c1d7e4daf Mom space prop for Wilson action 2016-08-31 00:26:36 +01:00
paboyle 02e983a0cd Momentum space prop and free prop convolution 2016-08-31 00:26:02 +01:00
paboyle d15ab66aae FFT moves higher in include order 2016-08-31 00:25:22 +01:00
paboyle 9005b82c6d Multi dim FFT, and normalisation fix 2016-08-31 00:24:52 +01:00
paboyle 3475f45ce7 Demangle support for typeid stuff 2016-08-31 00:23:48 +01:00
paboyle 0744f38866 Demangle support is useful 2016-08-31 00:23:28 +01:00
paboyle 62febd2823 Wilson prop test 2016-08-31 00:23:09 +01:00
Guido Cossu fd5614738d Merge branch 'develop' into feature/hirep 2016-08-30 18:21:36 +01:00
Guido Cossu 005dcc51aa Reset travis 2016-08-30 14:44:10 +01:00
Guido Cossu 655c893f86 Another test on travis 2016-08-30 14:38:42 +01:00
Guido Cossu 843f5783b4 Again travis test separating single and double 2016-08-30 14:29:09 +01:00
Guido Cossu 8986c9fedd Single and double precision travis matrix 2016-08-30 14:25:24 +01:00
Guido Cossu c80a1d427c Retest original version of travis yaml 2016-08-30 14:05:05 +01:00
Guido Cossu ae57032500 Separate single and double builds in travis 2016-08-30 14:00:34 +01:00
Guido Cossu f75468728f Another error on travis 2016-08-30 13:56:23 +01:00
Guido Cossu 5acd856663 Correction of error in travis 2016-08-30 13:49:49 +01:00
Guido Cossu b0d3e4bb2c Separating travis builds 2016-08-30 13:44:07 +01:00
Guido Cossu b512ccbee6 HMC for Adjoint fermions works
Accepts and reproduces known results

Check initial instability of inverters
when starting from hot configurations
2016-08-30 11:31:25 +01:00
paboyle 8c89391c02 FFTW unresolved fixed when no fftw3.h 2016-08-24 16:41:47 +01:00
paboyle bfac5195b8 tidy up 2016-08-24 16:38:36 +01:00
paboyle a782ca3238 Merge branch 'feature/fft-flop-count' into develop 2016-08-24 15:06:17 +01:00
paboyle 744691097f Printing 2016-08-24 15:05:56 +01:00
paboyle ff6da364e8 FFT double and single precision gives good performance now in multithreaded code. 2016-08-24 15:05:00 +01:00
portelli 4d11a6f5f2 first commit for QPX intrinsics 2016-08-23 14:41:44 +01:00
paboyle 88be3b39bb Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-08-22 18:29:36 +01:00
paboyle 8a02824e08 Merge branch 'feature/FFT' into develop 2016-08-22 16:25:04 +01:00
paboyle 356e7940fd fftw can be switched off 2016-08-22 16:24:49 +01:00
paboyle 73ce476890 Include fftw headers 2016-08-22 16:24:21 +01:00
paboyle 29c4ef41de Adding a test for libfftw3 2016-08-22 16:21:01 +01:00
paboyle e423a09974 FFT improved and test_FFT passing under MPI 8 processes, 8^4 for LatticeComplexD and LatticeSpinMatrixD 2016-08-18 02:23:21 +01:00
paboyle 17097a93ec FFTW test ran over 4 mpi processes. 2016-08-17 01:33:55 +01:00
paboyle 94a6373a7f Merge branch 'feature/eigen-cleanup' into develop 2016-08-15 23:58:34 +01:00
paboyle 4ab7dbfd57 Instantiate 2016-08-15 23:00:40 +01:00
paboyle 90e70790f3 Feature for z-Mobius prep 2016-08-15 22:31:29 +01:00
Guido Cossu 9c2e8d5e28 Nc=3 just to let all the test pass in Travis 2016-08-09 15:46:57 +01:00
Guido Cossu 147e2025b9 Added unit tests on the representation transformations
Status: Passing all tests
2016-08-08 16:54:22 +01:00
portelli 573b8c6020 build system: -O3 is not overridden by env CXXFLAGS 2016-08-06 01:26:24 +01:00
portelli 15218ec57f more Travis MPI fix 2016-08-06 00:49:14 +01:00
portelli ec68e08dd2 Travis MPI fix 2016-08-06 00:36:05 +01:00
paboyle fc25d2295c fftw download 2016-08-06 00:28:52 +01:00
paboyle 8dc2cfcedb Adding fftw header pulling 2016-08-06 00:28:28 +01:00
portelli 17c843700e missing doxygen.inc added 2016-08-05 15:38:21 +01:00
portelli 7b56f63a5c configure Doxygen output fix 2016-08-05 15:35:29 +01:00
portelli b1cfb4d661 first try at a nicer Doxygen implementation 2016-08-05 15:29:18 +01:00
portelli 836f93780c first try at including MPI tests in Travis 2016-08-05 13:41:52 +01:00
paboyle 5a68715be3 Richards sweep test 2016-08-05 10:51:57 +01:00
paboyle 32bc7a6ab8 MPI back out of change that hangs
AVX2 for clang, gcc needs the -mfma flag.
2016-08-05 10:36:00 +01:00
portelli b65e72e521 Merge pull request #43 from rprollins/bench/output-format
Benchmark_dwf_sweep and Benchmark_zmm output formats
2016-08-04 16:47:01 +01:00
portelli d1aaff65e8 README update 2016-08-04 16:27:02 +01:00
portelli 7ff7c7d90d Merge branch 'develop' into feature/hadrons 2016-08-04 16:22:10 +01:00
portelli 93d29bb699 build system improvements after discussion with Peter 2016-08-04 16:19:59 +01:00
portelli a2e9430abe Hadrons: fix after build system update 2016-08-03 17:14:32 +01:00
portelli 2485ef9c9c Merge branch 'feature/new-build' into feature/hadrons
# Conflicts:
#	Makefile.am
#	scripts/copyright
2016-08-03 16:49:16 +01:00
portelli 3b376ed54e build system: error if MPI not found 2016-08-03 15:23:38 +01:00
portelli d5c1f614ba gitignore update 2016-08-03 15:14:33 +01:00
portelli 2edc24225d untracking ltmain.sh 2016-08-03 15:12:44 +01:00
portelli 629283726b build system: local Grid link flag moved to configure.ac 2016-08-03 15:07:42 +01:00
portelli 6adb66dd08 build system: finer management of GMP/MPFR dependence 2016-08-03 15:06:45 +01:00
portelli 5be92bb708 link fix in README 2016-08-03 12:40:56 +01:00
portelli f4c049ea6d README update 2016-08-03 12:38:54 +01:00
portelli bc092ad30f build system fix 2016-08-03 11:47:38 +01:00
portelli dad642ed1b various build system fixes and improvements 2016-08-03 11:39:20 +01:00
portelli 63ae39abc7 proper propagation of OpenMP flags 2016-08-02 17:41:32 +01:00
portelli 9e5b934d21 improved LAPACK configuration 2016-08-02 17:26:54 +01:00
portelli a7b483d67a Tests in subdirectories are not built by default 2016-08-02 12:14:28 +01:00
portelli bb99ce0680 bootstrap script fix 2016-08-01 09:51:06 +01:00
portelli 83307df1af travis update for new build system 2016-08-01 09:38:40 +01:00
Guido Cossu 49b5c49851 Checked the hermiticity of the op in derivative, ok
Still CG fails to converge
2016-07-31 12:37:33 +01:00
portelli e9f30cab2c first working version for the new build system 2016-07-30 17:53:18 +01:00
Guido Cossu 089f0ab582 Debugged HMC for Creutz relation 2016-07-28 16:44:41 +01:00
Richard Rollins df6c9f55d1 Use common benchmark output format for dwf_sweep and zmm 2016-07-20 17:38:56 +01:00
Guido Cossu b93e18ed50 Modified the Dirac Kernel class to compile with different number of colours
Added the general push_back functionality to accommodate all defined representations

Compiles, not tested
2016-07-18 16:36:28 +01:00
Guido Cossu 9c77bb69a5 Added all elements for Hirep HMC
TODO: Test and debug
2016-07-18 12:05:23 +01:00
paboyle 27f3ecc833 Merge branch 'feature/bugfix-ck-cj' into develop 2016-07-16 01:59:52 +01:00
paboyle f9e90eeb1f Sign error on the force for 4d fields fixed 2016-07-16 01:52:44 +01:00
paboyle fad5c675eb sign error on the 4d gparity force 2016-07-16 01:51:56 +01:00
paboyle 4908b77d46 Fixed conflicts. PLEASE avoid making wholesale cosmetic only changes, this created
a HUGE amount of difficult to resolve and understand conflicts.

Wholesale formatting, reordering functions etc... in a central file like Tensor_class
or Grid_vector_types while others are also editing without making substantial functionality
changes creates pain.
2016-07-15 20:59:07 +01:00
paboyle f4dd5062d7 Merge branch 'develop' of https://github.com/paboyle/Grid into develop 2016-07-15 19:26:06 +01:00
paboyle da34d75841 Merge branch 'feature/Ls-vectorised-actions' into develop 2016-07-15 19:09:47 +01:00
paboyle 980ff18956 Solving the instantiation no compile issue 2016-07-15 17:19:44 +01:00
Guido Cossu 7edf4c6c04 Added HMC utilities for the higher representations
TODO: Inherit types for the pseudofermions, Debugging, testing
2016-07-15 13:39:47 +01:00
paboyle 1a6c7204ac Disable instantiation; Use cache version instead 2016-07-15 00:34:39 +01:00
paboyle 49310fbab3 Done with red black change over 2016-07-15 00:08:43 +01:00
paboyle 6049d5ac47 Update 2016-07-15 00:08:32 +01:00
paboyle 35d0d35238 Updated file list 2016-07-15 00:02:53 +01:00
paboyle c0e878705e Updated file list 2016-07-15 00:02:39 +01:00
paboyle 5c0c8efb9e Updated file list 2016-07-15 00:02:11 +01:00
paboyle dfd714e1ef Multiple implementations for the 5d hopping terms, depending on cache friendly
ops and/or the 5th direction being vectorised
All use 4d redblack.
2016-07-15 00:00:09 +01:00
paboyle 79a8ca1a62 Rewrite for performance. Impl dependent instantiations give
4d linalg impls of the 5d hopping terms (and inverse)
Cache friendly loop orderings of the above
Dense matrix stored and apply to the above

-- Switch to Ls vectorised, and use dense matrix approach for the MooeeInv
   and rotate/shift of the Mooee M5D routines.
2016-07-14 23:58:15 +01:00
paboyle fb45eb2eb2 5d ls vec rename of impl class 2016-07-14 23:57:26 +01:00
paboyle a307274c96 Fermion impl rename for ls vectorised 5d approaches 2016-07-14 23:56:13 +01:00
paboyle 3f2c44a5fe Updating the class to 5d selection based on impl type 2016-07-14 23:55:26 +01:00
paboyle 48fb1cdc11 Update domain 5d vectorised impl type, move the type over to 4d redblack with
the dense OO inverse
2016-07-14 23:54:35 +01:00
paboyle 8a79e93cc2 Rename the 5d domain wall fermion vectorised Ls impl class 2016-07-14 23:53:00 +01:00
paboyle 3493b51879 Modest updates 2016-07-14 23:52:13 +01:00
paboyle de3e79d300 red black for Ls vectorised is 4d red black. Update accordingly now I've made this choice 2016-07-14 23:49:42 +01:00
paboyle dd62a61c5c Added broadcast and rotation of simd vectors 2016-07-14 23:49:00 +01:00
paboyle 8f47d0b5ab Rotation needed for hopping term in fifth dim with Ls vectorised fields 2016-07-14 23:45:36 +01:00
paboyle 42af132dab Fix for chris kellys request to peek poke on checkerboarded fields 2016-07-14 23:44:48 +01:00
paboyle 9db2c6525d updating benchmarks for red black 4d for Ls vectorised code 2016-07-14 23:44:02 +01:00
paboyle adbc7c1188 Adding files for multiple implementations (cache opt) and Ls vectorisation
of the 5D cayley form chiral fermions for the 5d matrix. With Ls entirely
in the vector direction, s-hopping terms involve rotations.

The serial dependence of the LDU inversion for Mobius and 4d even odd
checkerboarding is removed by simply applying Ls^2 operations (vectorised
many ways) as a dense matrix operation.

This should give similar throughput but high flops (non-compulsory flops)
but enable use of the KNL cache friendly kernels throughout the code.

Ls is still constrained to be a multiple of Nsimd, which is as much as 8 for AVX512
with single precision.
2016-07-14 22:59:21 +01:00
Guido Cossu 9dc345e8e8 Debugged smearing and adding HMC functions for hirep 2016-07-13 17:51:18 +01:00
Christopher Kelly 8b9301a74c Merge branch 'feature/bugfixes' into develop 2016-07-13 12:31:34 -04:00
Christopher Kelly 6f47fbb1e2 Disabled parallel for loops in ExtractSlice and InsertSlice due to race conditions. Likely will need to do so for localConvert too. 2016-07-13 10:49:18 -04:00
Guido Cossu a9ae30f868 Added representations definitions for the HMC 2016-07-12 13:36:10 +01:00
Christopher Kelly a3c0fb79b6 Fix to iVector and iMatrix pokeIndex and checkerboard local site indexing. 2016-07-11 17:15:22 -04:00
paboyle 62601bb649 Bug fix 2016-07-08 20:46:29 +01:00
paboyle ef97e32152 Adding persistent communicators 2016-07-08 17:16:08 +01:00
Guido Cossu daea5297ee Wrote the projector in the adjoint representation algebra 2016-07-08 16:14:16 +01:00
Guido Cossu 5028969d4b Added generators for the adjoint representation 2016-07-08 15:40:11 +01:00
paboyle c667d9fdcc Trying to make compile clean on travis; seem to have a make -j 4 problem with fftw 2016-07-07 23:26:39 +01:00
paboyle 7dbb94bab2 Update 2016-07-07 22:51:37 +01:00
paboyle 236dcc820b typo fix 2016-07-07 22:46:11 +01:00
paboyle a42a441a6a Rename the reconfigure script to ./autogen.sh 2016-07-07 22:35:45 +01:00
paboyle a0676beeb1 Open up dependency on Eigen and FFTW 2016-07-07 22:31:07 +01:00
Christopher Kelly c5106d0c03 Bugfix 2016-07-07 16:06:30 -04:00
Guido Cossu fbf96b1bbb Merge branch 'develop' into feature/hirep 2016-07-07 14:20:10 +01:00
Guido Cossu 3c49ddfaa4 Merge branch 'temporary-smearing' into develop 2016-07-07 14:04:59 +01:00
Guido Cossu ffb8b3116c Tested smeared RHMC Wilson1p1, accepting 2016-07-07 11:49:36 +01:00
Christopher Kelly 290493e162 Merge branch 'feature/multi_prec' into develop 2016-07-06 19:29:57 -04:00
Christopher Kelly dd8cfff111 Another fix for pedantic compilers 2016-07-06 18:22:15 -04:00
Christopher Kelly 184642adb0 Fix for pedantic compilers 2016-07-06 18:15:15 -04:00
Christopher Kelly 4774a3bcd2 Generalized HotConfiguration and functions it calls to accept gauge fields with precision other than the default. 2016-07-06 18:01:08 -04:00
Christopher Kelly 25fafa9a89 Comment 2016-07-06 16:19:41 -04:00
Christopher Kelly 713520d3d2 Added tester for mixed CG 2016-07-06 16:18:19 -04:00
Christopher Kelly 85ed8175cb Implemented mixed precision CG. Fixed filelist to exclude lib/Old directory and include Config.h. 2016-07-06 15:57:04 -04:00
Christopher Kelly df5c788ef2 Merge branch 'develop' into feature/multi_prec 2016-07-06 14:52:28 -04:00
Christopher Kelly 15f22425c8 Added option to prevent CG from exiting when it fails to converge 2016-07-06 14:50:01 -04:00
Guido Cossu e87182cf98 Debugged the copy constructor of the Lattice class 2016-07-06 15:31:00 +01:00
Guido Cossu e3d5319470 Debugged the real() and imag() functions and added tests to Test_Simd 2016-07-06 14:16:03 +01:00
Guido Cossu ffedeb1c58 Minor modifications 2016-07-06 11:41:27 +01:00
Guido Cossu 3e3b367aa9 Small changes in the Log files 2016-07-05 15:05:28 +01:00
Guido Cossu 3e80947c2b Cleaned up HMC output. Tested smeared HMCs for single precision (OK) 2016-07-05 12:03:54 +01:00
Guido Cossu fdfbf11c6d Merge branch 'develop' into temporary-smearing 2016-07-04 18:45:10 +01:00
Guido Cossu 9cb90f714e Merge remote-tracking branch 'origin/develop' into temporary-smearing 2016-07-04 17:28:40 +01:00
Guido Cossu 6ce174cd60 Testing smearing for RHMC routines 2016-07-04 16:36:49 +01:00
Guido Cossu 17ca5240f7 Tested smeared EOWilsonRatio, accepts 2016-07-04 16:25:15 +01:00
Guido Cossu 2daffdf95d Tested smeared WilsonRatio action, accepts 2016-07-04 16:17:28 +01:00
Guido Cossu 149f826601 Tested smearing for Nf2 WilsonFermionAction, non EO: accepts 2016-07-04 16:09:19 +01:00
Guido Cossu cd8ee27080 Simple change in iGamma for smearing 2016-07-04 16:02:57 +01:00
Guido Cossu 0fa66e8f3c Debugged smearing for EOWilson, accepts 2016-07-04 15:35:37 +01:00
Guido Cossu 8dd099267d Corrected a bug in the Expression Templates (acos and asin were wrong) 2016-07-03 12:28:25 +01:00
Guido Cossu 1a6d65c6a4 Converted set_uw and set_fj to all complex functions 2016-07-03 10:27:43 +01:00
paboyle fc4a043663 Colors and banner clean up 2016-07-02 16:15:38 +01:00
paboyle 61ba50665e Merge branch 'hotfix/v0.5.1' into develop 2016-07-01 16:34:30 +01:00
paboyle 446c768cd3 Merge branch 'hotfix/v0.5.1'
Double precision compile fix
2016-07-01 16:33:59 +01:00
paboyle bfe14000a9 Double compile fix 2016-07-01 16:33:51 +01:00
Guido Cossu 092fa0d8da Debugged set_fj,
to be fixed: BUG in imag()
2016-07-01 16:06:20 +01:00
portelli e0b7004f96 Merge branch 'master' into feature/hadrons 2016-07-01 15:54:34 +01:00
paboyle 1ceff48133 Merge branch 'release/v0.5.0' into develop 2016-06-30 15:15:59 -07:00
paboyle 680645f849 Merge branch 'release/v0.5.0' 2016-06-30 15:15:03 -07:00
paboyle 3fc6e03ad1 Version file 2016-06-30 14:44:09 -07:00
paboyle 2d6614f3a1 Merge branch 'feature/knl-cache-opt' into develop 2016-06-30 14:36:20 -07:00
paboyle 4e041b5103 Merge branch 'feature/knl-cache-opt' of https://github.com/paboyle/Grid into feature/knl-cache-opt 2016-06-30 14:36:08 -07:00
paboyle 712b9a3489 Asm only for avx512 2016-06-30 14:35:02 -07:00
paboyle bdaa5b1767 Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 14:35:02 -07:00
paboyle 8fcefc021a Improved the prefetching when using cache blocking codes 2016-06-30 14:35:02 -07:00
paboyle 1445189361 Control the prefetch strategy 2016-06-30 14:35:02 -07:00
paboyle 05c884a62a Prefetch change 2016-06-30 14:35:01 -07:00
paboyle a25bec87d9 Prefetch during save 2016-06-30 14:35:01 -07:00
paboyle 2d8bb4c594 Tweaks 2016-06-30 14:35:01 -07:00
paboyle 51cb2d4328 update file lists 2016-06-30 14:35:01 -07:00
paboyle 6d58cb2a68 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-30 14:35:01 -07:00
paboyle c8b35d960c Merge branch 'develop' of https://github.com/paboyle/Grid into feature/knl-cache-opt 2016-06-30 14:30:49 -07:00
paboyle 532f41dd61 Asm only for avx512 2016-06-30 14:00:34 -07:00
paboyle 661b0ab45d Updated to have perfect prefetching for the s-vectorised kernel with any cache blocking. 2016-06-30 13:07:42 -07:00
Guido Cossu 565e9329ba Changed the colouring classes 2016-06-30 16:51:03 +01:00
paboyle 4bc08ed995 Improved the prefetching when using cache blocking codes 2016-06-26 12:54:14 -07:00
paboyle b2933a0557 Control the prefetch strategy 2016-06-25 12:55:25 -07:00
paboyle db057cc276 Prefetch change 2016-06-25 12:54:50 -07:00
paboyle 22e88eaf54 Prefetch during save 2016-06-25 12:54:14 -07:00
paboyle 09fe3caebd Tweaks 2016-06-25 11:08:05 -07:00
Guido Cossu 5e02392f9c Fixed compilation error for benchmark_dwf
Some parts were assuming floating point precision
2016-06-20 12:30:51 +01:00
paboyle 17a8f51a9b update file lists 2016-06-19 11:59:10 -07:00
paboyle 1b7f88dd00 Enable reordering of the loops in the assembler for cache friendly.
This gets in the way of L2 prefetching however. Do next next link in stencil
prefetching.
2016-06-19 11:45:58 -07:00
portelli d6737e4bd8 Travis fix for Linux clang builds 2016-06-14 19:15:08 +01:00
portelli 75fc295f6e Merge branch 'hadrons' into feature/hadrons 2016-06-14 17:51:15 +01:00
portelli d539888e57 Merge pull request #37 from rprollins/fix/mpi_communicator
Removed write to stdout in constructor for MPI CartesianCommunicator
2016-06-14 17:25:40 +01:00
Richard Rollins 86187d7cca Removed write to stdout in constructor for MPI CartesianCommunicator 2016-06-14 15:34:20 +01:00
paboyle 87418e7df1 Slightly faster prefetching perf. 2016-06-13 02:32:52 -07:00
paboyle 55f65b81b5 Improvements to the assembler interface that let us move chunks of the
site and s loop into the kernels. This will save on function call overhead and
guarantee L2 prefetching strategy is right since OMP can't distribute the
sub-chunks of work.
2016-06-09 01:12:36 -07:00
Azusa Yamaguchi d9408893b3 Prefetching in the normal kernel implementation. 2016-06-08 05:43:48 -07:00
paboyle 05acc22920 placeholder for non temporal loads optimisation 2016-06-07 13:18:21 -07:00
paboyle 8ac021de73 Added a test and fixed it for red black precon Ls innermost vectorised DWF 2016-06-07 13:16:56 -07:00
paboyle e503ef5590 Cleaned up 2016-06-07 00:11:36 +01:00
paboyle a7682b0060 Only instantiate the one routine to avoid duplicate symbol under g++5/MacOS 2016-06-06 23:48:21 +01:00
portelli 0b731b5d80 Hadrons: genetic scheduler parameter fix 2016-06-06 17:46:53 +01:00
portelli 8e2078be71 Hadrons: environment with fully generic object store 2016-06-06 17:45:37 +01:00
paboyle d4c9d71fc8 Merge branch 'master' of https://github.com/paboyle/Grid 2016-06-06 07:06:54 -07:00
paboyle 786ca52c43 Problems remain in the red black preconditioning of the Ls vectorisation 2016-06-06 07:05:51 -07:00
Peter Boyle 048ac04abc Update Benchmark_dwf.cc 2016-06-03 13:44:41 +01:00
Peter Boyle f78d89bcbe Update Lebesgue.cc
kill verbose
2016-06-03 13:33:42 +01:00
paboyle 53d06046b0 Compiling updates for KNL 2016-06-03 03:47:54 -07:00
paboyle 5d3a1a025d timers flag 2016-06-03 03:25:38 -07:00
paboyle 139cc5f1ae Large change with KNL preparation 2016-06-03 03:24:26 -07:00
portelli 1826ed06a3 Merge branch 'master' into hadrons 2016-05-27 16:50:31 +01:00
portelli 1c0e922585 Merge pull request #35 from aportelli/master
empty SIMD fix
2016-05-27 16:49:13 +01:00
portelli 9d5f693cbe empty SIMD fix 2016-05-24 10:56:27 +01:00
Peter Boyle 5c90c3b457 Merge pull request #34 from aportelli/master
Polymorphic lattices & various small updates
2016-05-24 10:50:04 +01:00
portelli 3ff96c502b Merge branch 'master' into hadrons 2016-05-12 19:24:18 +01:00
portelli 91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
portelli 15a0908bfc Merge branch 'master' into hadrons 2016-05-12 18:35:46 +01:00
portelli 3789e3f31c additional fixed in slice functions 2016-05-12 18:35:38 +01:00
portelli bb2125962b Hadrons: finished implementation of 5D quarks 2016-05-12 18:34:42 +01:00
portelli 232fda5fe1 Hadrons: DWF action 2016-05-12 18:34:10 +01:00
portelli 2b31bf61ff Hadrons: message fix 2016-05-12 18:33:49 +01:00
portelli afe5a94745 Hadrons: getModule with upcast 2016-05-12 18:33:36 +01:00
portelli 7ae667c767 Hadrons: module template update 2016-05-12 18:33:08 +01:00
portelli 07f0b69784 Merge branch 'master' into hadrons 2016-05-12 13:02:18 +01:00
portelli 0c66719210 const fix in slice functions 2016-05-12 13:01:35 +01:00
portelli 5c06e89d69 Hadrons: code cleaning 2016-05-12 12:49:49 +01:00
portelli 3d75e0f0d1 Hadrons: MQuark fix 2016-05-12 12:02:15 +01:00
portelli 362f255100 Hadrons: module parameters can now be accessed from outside 2016-05-12 11:59:28 +01:00
paboyle 3a5b5c8bec Save an old tar of tree 2016-05-12 03:20:17 -07:00
paboyle fdbe071213 space added 2016-05-12 02:59:51 -07:00
portelli 3d78ed03ef Merge branch 'master' into hadrons 2016-05-11 15:21:46 +01:00
portelli 4bc21ec7cb thread CL argument fix 2016-05-11 15:21:29 +01:00
portelli e3083b6dfc Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab' 2016-05-11 15:20:41 +01:00
paboyle ab89418658 Precision change going in; useful for mixed precision algorithms for example. 2016-05-11 15:18:47 +01:00
paboyle 28cd99882c Subslicing 2016-05-11 15:06:54 +01:00
portelli 835003b3c5 Hadrons: removed useless gauge global parameters 2016-05-11 15:01:52 +01:00
portelli 328d213c9e Hadrons: FS case sensitivity fix 2016-05-11 14:44:14 +01:00
paboyle aceaee774c ExtractSlice / InsertSlice for lower dimensional lattices where the lattice is not
distributed in the orthogonal direction.
Useful for fermion 4d/5d etc..
2016-05-11 14:12:02 +01:00
portelli 56a8d7a5bc Hadrons: build system fix 2016-05-11 10:27:14 +01:00
portelli 78198d1b04 Hadrons: size fix for module graph with one vertex 2016-05-10 20:13:28 +01:00
portelli 84fa2bdce6 Hadrons: modules moved in their own directory & utility script to add new modules 2016-05-10 20:12:48 +01:00
portelli 29dfe99e7c Hadrons: more scheduler optimizations 2016-05-10 19:19:38 +01:00
portelli d604580e5a Hadrons: all objects/modules mapped to an integer address system to remove string operations from scheduling 2016-05-10 19:07:41 +01:00
portelli 7dfdc9baa0 Hadrons: lattice dynamic cast fix 2016-05-10 10:41:20 +01:00
portelli 9e986654e6 Hadrons: first version of the genetic scheduler 2016-05-09 14:49:06 +01:00
portelli df3fbc477e Hadrons: code cleaning 2016-05-07 13:26:56 -07:00
portelli bb580ae077 Hadrons: significant overhaul of the object registration system, previous version didn't allow dry runs 2016-05-07 13:19:38 -07:00
portelli 2c226753ab Hadrons: comments on graph theory algorithm complexity 2016-05-06 06:35:11 -07:00
portelli ea0cea668e Hadrons: minor code cleaning 2016-05-05 16:13:14 -07:00
Peter Boyle f8f9fd6f22 Merge pull request #33 from aportelli/master
Travis for clang 3.8 + various updates/fixes
2016-05-05 22:57:13 +01:00
portelli 75cd72a421 Hadrons: memory management for fermion matrices, dynamic ownership in garbage collector 2016-05-04 19:11:03 -07:00
portelli cbe52b0659 Hadrons: debug message removed 2016-05-04 12:20:33 -07:00
portelli 3aa6463ede Hadrons: general lattice store & a lot of code cleaning 2016-05-04 12:17:27 -07:00
portelli 312637e5fb Merge branch 'master' into hadrons
# Conflicts:
#	lib/Log.h
2016-05-04 12:16:18 -07:00
portelli 101aa769eb LatticeBase contain the grid pointer and a virtual destructor to allow polymorphic lattice pointers 2016-05-04 12:15:31 -07:00
portelli 0bf99bfde5 log polish 2016-05-04 12:14:49 -07:00
portelli 64bf6fe54e macro to dump NERSC header to a stream 2016-05-04 12:14:38 -07:00
portelli 798d8f7340 Hadrons: Modules: better log messages 2016-05-03 18:17:58 -07:00
portelli ba878724ce Hadrons: sources are now independent modules 2016-05-03 18:17:28 -07:00
portelli b865dd9da8 Hadrons: solver renaming 2016-05-03 18:16:57 -07:00
portelli 8b313a35ac Hadrons: random and NERSC gauge configurations 2016-05-03 17:08:42 -07:00
portelli 02ec23cdad Hadrons: Fermion actions and gauge fields are modules now 2016-05-03 17:08:42 -07:00
portelli 1161d566b9 minor code cleaning 2016-05-02 19:32:11 -07:00
portelli 6e83b6a203 Hadrons: namespace reorganisation, now everything is in Grid::Hadrons, the 'using Grid::operator<<' statement is used to prevent a very nasty compilation error with GCC. 2016-05-02 19:31:21 -07:00
portelli 48fcc34d72 CMeson: first implementation, still need proper output 2016-05-01 18:31:40 -07:00
portelli d08d93c44c Merge branch 'master' into hadrons 2016-05-01 18:30:44 -07:00
portelli c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
portelli c4c89336fe SliceSum: shutting down warning about non-threaded code for now 2016-05-01 18:29:57 -07:00
portelli fa59789580 ConjugateGradient: cleaner output 2016-05-01 18:29:20 -07:00
portelli 0ab10cdedb Merge branch 'master' into hadrons 2016-05-01 16:08:05 -07:00
portelli 92c2c7d3b5 SchurRedBlackDiagMooeeSolve: fix: guess was not initialised from input 2016-05-01 16:07:55 -07:00
portelli e99ce0875f directly exit when using '--help' option 2016-05-01 16:05:16 -07:00
portelli 22653edf12 Merge branch 'master' into hadrons 2016-05-01 15:55:58 -07:00
portelli cc1d9eb05b Merge commit '999b3a2e26bdd8300d389699dd299e7e5d951af6' 2016-05-01 15:55:22 -07:00
portelli 12d2a95846 Merge branch 'master' into hadrons 2016-05-01 15:05:02 -07:00
portelli 57c027fea2 Travis update 2016-05-01 15:04:52 -07:00
portelli 207dc439a7 Travis debug 2016-05-01 15:00:35 -07:00
portelli 978cf52f6b Merge branch 'master' into hadrons 2016-05-01 14:53:38 -07:00
portelli 77ef0bba48 Travis update 2016-05-01 14:53:28 -07:00
portelli 63b730de80 Hadrons: for the moment, test with unit gauge 2016-05-01 14:50:57 -07:00
portelli 7905c5b8e5 Hadrons: Z2 source code fix 2016-05-01 14:49:45 -07:00
Peter Boyle 999b3a2e26 Merge pull request #32 from aportelli/master
Proposal for Travis update + minor build system fix
2016-05-01 22:05:02 +01:00
portelli 5e4b58ac40 Hadrons: Z2 source expression fix 2016-05-01 12:49:26 -07:00
portelli 468d8dc682 Merge branch 'master' into hadrons 2016-05-01 12:03:24 -07:00
portelli 7ee577eee6 Travis fix 2016-05-01 11:34:20 -07:00
portelli d27ceb75dd Travis fix 2016-05-01 11:32:28 -07:00
portelli 65c2b794b5 Travis update 2016-05-01 11:23:57 -07:00
portelli de82b08f70 Travis fix 2016-05-01 11:18:58 -07:00
portelli 1d03f515b9 Travis status in README 2016-05-01 11:18:47 -07:00
portelli 1c4c287925 Make.inc generation fix 2016-05-01 11:18:25 -07:00
portelli 10bbfdc3b2 Travis update 2016-05-01 10:58:03 -07:00
portelli e15f0b47c1 Travis fix 2016-05-01 10:54:43 -07:00
portelli 0fd0661be3 Travis fix 2016-05-01 10:49:36 -07:00
portelli 6628806142 Travis debug 2016-05-01 10:47:13 -07:00
portelli 17198a4abd Travis update 2016-05-01 10:43:14 -07:00
portelli beb11fd4ef Merge branch 'master' into hadrons 2016-05-01 10:32:24 -07:00
Peter Boyle 465e6f01b7 Update .travis.yml 2016-04-30 18:04:36 +01:00
Peter Boyle 0eec752216 Update .travis.yml 2016-04-30 17:46:44 +01:00
Peter Boyle 122195384e Update .travis.yml 2016-04-30 17:41:23 +01:00
Peter Boyle 2ae1c14c03 Update .travis.yml 2016-04-30 17:34:38 +01:00
Peter Boyle 0ddb7e707b Update .travis.yml 2016-04-30 17:22:22 +01:00
Peter Boyle e2d8f67f63 Update .travis.yml 2016-04-30 17:11:57 +01:00
Peter Boyle 0d99f62027 Update .travis.yml 2016-04-30 17:08:42 +01:00
paboyle c23375cd65 Testing travis CI integration 2016-04-30 06:30:56 -07:00
paboyle a762a0d9ff Attempt at CIT testing 2016-04-30 06:29:41 -07:00
paboyle f7ca6ca889 Bernoulli reenabled -- using integral type for the discrete_distribution, but
then casts in the fill
2016-04-30 03:48:28 -07:00
paboyle ec4a9b7f6c The Bernoulli gives a no compile due to a static assertion that the type be integral
in 4.7 random.h

Probably need to go through an Integer type, and then convert to real after the random draw
to make clean.
2016-04-30 03:42:24 -07:00
paboyle 5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
Peter Boyle f0aed4672e Merge pull request #31 from aportelli/master
Various fixes & updates
2016-04-30 11:31:00 +01:00
portelli d7662b5175 Merge branch 'master' into hadrons 2016-04-30 00:24:59 -07:00
portelli 344d251fc4 re-fix of test Make.inc 2016-04-30 00:24:50 -07:00
portelli dc5f32e5f0 Merge branch 'master' into hadrons 2016-04-30 00:18:31 -07:00
portelli f6c53e5039 Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2' 2016-04-30 00:17:52 -07:00
portelli 1869d28429 Hadrons: first prototype with working inversions 2016-04-30 00:17:04 -07:00
portelli 405b175665 Merge branch 'master' into hadrons 2016-04-30 00:16:06 -07:00
portelli ba09cbae3e function to read std::vector from a string (blank separated values) 2016-04-30 00:15:44 -07:00
portelli 6aa000176f Fermion <-> Propagator functions 2016-04-30 00:14:33 -07:00
portelli 23b6172c31 Bernoulli RNG 2016-04-30 00:14:13 -07:00
portelli ca5eebe10c gitignore update 2016-04-30 00:13:53 -07:00
portelli 3f128443ab OS X icpc fix 2016-04-30 00:13:33 -07:00
paboyle 1e554350ac The threaded comms didn't agree with GCC. Surprised; looks like a GCC bug. 2016-04-29 16:49:18 -07:00
paboyle c79ea0dcef Fixing IMCI 2016-04-22 21:52:54 -07:00
paboyle e3f141f82f Fixed SSE compile with typecasts 2016-04-22 10:30:30 -07:00
paboyle a6dfa2386b GCC choked on intrinsics calls that ICPC did not 2016-04-22 06:33:41 -07:00
Peter Boyle d9b5e66877 Update Make.inc 2016-04-20 18:25:48 +01:00
paboyle 8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
paboyle ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
paboyle 9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle 806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
paboyle 7223753355 Rotate in a direction > 2 for simd_layout 2016-04-19 15:35:15 -07:00
paboyle b27bac4669 Updates for simd in one dir 2016-04-19 15:34:10 -07:00
paboyle c8a93d6a93 Cartesian changes to allow all simd in one direction 2016-04-19 15:18:12 -07:00
paboyle 04072a5e1f Rotate is a temporary hack. Would like to merge ALL
permutes as rotates of length 2, and make any rotate active
over any subset of lane bits. This is hard, and requires general
permute; current intrinsics mean this is only really possible for specific
case by case encodings as presently performed. Intel could produce a general
permute.. would help. IBM did it in VMX.
2016-04-19 15:15:34 -07:00
paboyle 574ea4f843 const safety 2016-04-19 15:15:11 -07:00
paboyle f2ae9682ff Remove some timing hacks 2016-04-19 15:14:32 -07:00
paboyle 587f80cd93 Updated to compile and pass under intel SDE 2016-04-19 15:13:54 -07:00
paboyle 528eb773ad Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle e5657510b0 Rotate support for Ls simd-ized 2016-04-19 22:24:18 +01:00
paboyle f473919526 Rotate support 2016-04-19 22:23:51 +01:00
Peter Boyle 8f1b0afc2a Merge pull request #28 from aportelli/master
Build system fix
2016-04-16 09:55:45 +01:00
Peter Boyle 1494b0f397 Merge pull request #29 from giltirn/master
Grid_empty implementation and Lanczos checkerboard fix
2016-04-16 09:55:24 +01:00
portelli e33b0f6ff7 cleaner output 2016-04-16 08:41:53 +01:00
portelli 9ee54e0db7 debug output removed 2016-04-16 08:41:28 +01:00
portelli feae35d92c Hadrons: pass strings by value 2016-04-16 08:41:12 +01:00
Christopher Kelly ab56ccdd25 -Complete and working implementation of Grid_empty 2016-04-15 13:17:42 -04:00
portelli 3834d81181 Merge branch 'master' into hadrons 2016-04-14 15:15:45 +01:00
portelli cf2f69812b build system fix 2016-04-14 15:13:55 +01:00
neo 339be37dba Debugging smeared HMC 2016-04-13 17:00:14 +09:00
paboyle c323425496 Small change 2016-04-11 10:38:43 +01:00
neo a87b744621 HMC runs but does not accept with smearing on 2016-04-07 16:45:11 +09:00
Christopher Kelly a646260e82 Merge remote-tracking branch 'origin/master' into ckelly-dec12-2015 2016-04-06 13:57:28 -04:00
Christopher Kelly af9c8d1372 -Checkerboard fixes for Lanczos 2016-04-06 13:50:56 -04:00
paboyle 650e02b344 Smaller vols too 2016-04-06 06:52:09 -07:00
paboyle a524ca2a4b New benchmark update 2016-04-06 03:35:56 -07:00
paboyle 23a7176b71 Loop over volumes 2016-04-06 03:22:11 -07:00
paboyle b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
coppolachan 97d0d56bcb Debugging Smearing routines (set_fj) 2016-04-06 17:58:43 +09:00
paboyle c7ba47bdc7 Merge branch 'master' of https://github.com/paboyle/Grid 2016-04-06 02:56:28 +01:00
coppolachan 7c7ea35ffb Putting the Traceless Antihermitian part outside the deriv in pseudofermion actions 2016-04-05 16:28:09 +09:00
coppolachan 4b1cf580e0 Debugging the Smearing routines 2016-04-05 16:19:30 +09:00
paboyle e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle f473ef7591 Fixing the compile 2016-03-31 07:47:42 -07:00
paboyle f7b1060aed Use headers to clear macros and sub precision 2016-03-31 14:52:37 +01:00
paboyle 8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
paboyle 60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
paboyle 83b15bfcdd Better Avx512 assembly sequence for SU3 using fmaddsub to get the imag imag sign 2016-03-30 08:39:39 +01:00
paboyle 1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
paboyle 2ded354403 configure 2016-03-30 00:17:43 -07:00
paboyle 340428a1fe Eigen fixes and HDCR work 2016-03-30 00:16:02 -07:00
paboyle c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle b6c3bc574b Moving to a more coherent organisation of the inline assembly and arch dependencies. 2016-03-28 16:24:37 +01:00
paboyle 1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
paboyle ad80f61fba AVX512 shaken out 2016-03-28 00:38:05 -06:00
paboyle 61469252fe AVX512 shaken out under SDE 2016-03-28 00:37:12 -06:00
paboyle 02198ac5b5 Tolerance and more coverage 2016-03-28 00:36:17 -06:00
paboyle 21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
paboyle 165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle 644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
azusa f54e0ec9bd Try lanczos to set up hdcr subspace 2016-03-17 10:36:16 +00:00
paboyle a155a362da Update from HDCR tuning 2016-03-16 02:31:04 -07:00
paboyle 60d4564151 ICC no compile fix 2016-03-16 02:30:40 -07:00
paboyle d4e57f4bc6 IO Bandwidth reporting 2016-03-16 02:30:16 -07:00
paboyle 3920b2c0ab HDCR updates 2016-03-16 02:29:58 -07:00
paboyle 2733c4b93c hdcr updates 2016-03-16 02:29:37 -07:00
paboyle e17c773a0b Longer runs for vtune 2016-03-16 02:29:13 -07:00
paboyle 36a800f26c Microsecond granularity support 2016-03-16 02:28:51 -07:00
paboyle b75da563d9 Resurrect timestamp. Should make optional 2016-03-16 02:28:17 -07:00
paboyle f9faec38be Printing fix under comms none 2016-03-16 02:27:53 -07:00
paboyle d6b64f47d9 Uint64 sum for IO rates 2016-03-16 02:27:22 -07:00
paboyle a359f7a9f5 Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-11 16:07:07 -08:00
paboyle b606deb3f0 Uint64 gsum 2016-03-11 16:06:54 -08:00
portelli 179e82b5ca Merge branch 'master' into hadrons 2016-03-08 12:55:33 +00:00
paboyle 090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle 2dce9c3cff HDCR running on 16^3 with 2x-3x speed up. 2016-03-08 01:01:50 -08:00
Jung 1e72bd8b8c Saving Lanczos testing program 2016-03-08 01:49:16 -05:00
paboyle dc72293398 More timing info 2016-03-06 10:46:55 -08:00
paboyle e55c35734b Fix a nocompile 2016-03-03 20:33:28 +00:00
portelli f2c59c8730 Merge branch 'master' into hadrons 2016-03-02 17:15:05 +00:00
paboyle 325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
paboyle 61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
paboyle ff129d9ad9 Redundant operations removed 2016-03-02 07:02:37 -08:00
paboyle 03fcd3b33a Back out of the colour 2016-03-02 07:01:15 -08:00
paboyle 68b02da483 Backing off the colour 2016-03-02 07:00:43 -08:00
paboyle e051119769 extern "C" should have been in the header file, but Cray is apparently not C++ friendly. 2016-03-02 07:00:00 -08:00
portelli fdd0848593 Hadrons: license text update 2016-02-25 12:07:21 +00:00
portelli 92f666905f copyright script update to 80 column text 2016-02-25 12:06:24 +00:00
portelli 5980fa8640 test implementation of DWF inverter 2016-02-25 11:56:16 +00:00
coppolachan 2d8bb356e3 Smearing routines compile (still untested) 2016-02-25 02:43:59 +09:00
Guido Cossu f3661aac4f Merge pull request #24 from aportelli/master
BG/Q compatibility + XML fixes
2016-02-23 18:22:43 +00:00
coppolachan a7251f28c7 Stout smearing compiles (untested) 2016-02-24 03:16:50 +09:00
portelli 1eb169ac0b compatibility fix 2016-02-23 16:36:50 +00:00
portelli a0d8eb2c24 minor code cleaning 2016-02-23 16:33:00 +00:00
portelli 1e10b4571d fix after Grid update 2016-02-23 16:21:45 +00:00
portelli 02f8b84ac9 Merge branch 'master' into hadrons 2016-02-23 16:13:39 +00:00
portelli 5674c3e241 cycle count fix for x86 2016-02-23 16:08:18 +00:00
portelli 62c4ba0d1e gitignore update 2016-02-23 16:01:29 +00:00
Antonin Portelli 497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
portelli cfd368596d Merge branch 'master' into hadrons 2016-02-22 15:25:02 +00:00
portelli 19526d09c2 Merge commit '6aeaf6f568a391e34b913f08be6a11beb28d8842' 2016-02-22 15:23:26 +00:00
Peter Boyle 6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle 40f2db9bc0 Disable metropolis step until 10 traj covered. Should move to exposing these
in XML input and start having "applications" directory.
2016-02-21 08:01:44 -06:00
Peter Boyle 2cfa20cc4e Improving the logging, got fed up with color so optionally disable.
Backtrace macro used everywhere
2016-02-21 07:58:53 -06:00
Peter Boyle a5f683d124 Machine generated 2016-02-21 07:57:42 -06:00
Peter Boyle 02a57ffa6f machine generated. Should remove from git .. but annoys downloaders 2016-02-21 07:57:02 -06:00
Jung 9f0d9ade68 Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
neo c1b1b89d17 More on smearing routines, writing APEsmear (dev) 2016-02-19 17:15:27 +09:00
neo 771235017d Adding smearing routines (development) 2016-02-19 15:30:41 +09:00
paboyle 3425751cb8 Missing return value 2016-02-19 01:06:03 +00:00
paboyle db5e8050a8 Attempts at some optimisation 2016-02-18 22:33:58 +00:00
paboyle a3fbabf404 Bug fix 2016-02-18 18:08:24 +00:00
Peter Boyle 22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle b6f6da923e Change to the compressor & stencil interface a little. 2016-02-17 18:27:11 -06:00
Peter Boyle c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle c650bb3f3d Very small merge speed up. 2016-02-16 18:41:53 -06:00
Peter Boyle 81395e85d1 Regressing to not overlap comms and compute because BlueWaters, Edison, and Cori are so rubbish at it. 2016-02-16 13:56:44 -06:00
Peter Boyle 340a29b735 More careful sequencing of comms 2016-02-15 16:04:59 -06:00
Peter Boyle f7be108e35 100 iters faster 2016-02-15 16:03:04 -06:00
Peter Boyle a0fc47c6f9 Cheaper implementation 2016-02-15 16:02:36 -06:00
Peter Boyle 42a9ac71d2 Bug fix, wait till complete. 2016-02-14 16:21:21 -06:00
Peter Boyle 41c2b09184 Shmem comms [NO MPI] target added. The dwf test runs and passes.
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current state while I try a few experiments.
2016-02-14 14:24:38 -06:00
paboyle 294dbf1bf0 Compile on OpenMPI shmem 2016-02-11 23:45:51 +00:00
Peter Boyle 9548c8b91f Had to break this out for universal access through the code base. 2016-02-11 07:40:09 -06:00
Peter Boyle 7f927a541c Shmem related fixes for shmem compile 2016-02-11 07:37:39 -06:00
paboyle e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
neo 6371676a75 Correcting some compilation errors for clang-sse 2016-02-10 11:37:03 +09:00
Jung bd84c23298 definitions reconciled. 2016-01-25 16:30:59 -05:00
Jung 7aa8d5e8af Failing to compile, comparing with master 2016-01-25 16:03:02 -05:00
Jung 6012b0ec23 Checking in changes before changing to chulwoo-dec12-2015 2016-01-25 09:40:58 -05:00
Jung 411ac49dd7 GparityWilsonTM typedef added. Not yet tested
Conflicts:
	configure
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-25 01:36:28 -05:00
Jung b8fb05a422 Additional routines for Lanczos (SYM2, Chebyshev). 2016-01-25 01:26:25 -05:00
portelli ae682674e0 Hadrons: first full implementation of the scheduler 2016-01-13 20:23:51 -08:00
portelli 17c43f49ac Hadrons: application class now takes parameter file name as argument 2016-01-13 20:22:37 -08:00
portelli 30146e977c gitignore update 2016-01-13 20:20:43 -08:00
Jung 5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
paboyle fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle dafc74020c Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori 2016-01-10 16:54:27 -08:00
paboyle d19321dfde Overlap comms compute changes 2016-01-10 19:20:16 +00:00
Jung 5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
paboyle c99d748da6 Timing reports in benchmarks now reflect the asynch comms thread statistics 2016-01-04 14:42:16 +00:00
paboyle 02452afd36 Optional overlap of comms with compute 2016-01-04 14:18:40 +00:00
paboyle 331768dcff Added overlap comms compute mode 2016-01-03 01:38:11 +00:00
paboyle 4aac345bea Updated logging to colour code according to message type 2016-01-02 17:21:14 +00:00
paboyle 15c0022042 GPLv2 clarified, and copyright message and banner in Init function.
Color is just showing off....
2016-01-02 15:22:30 +00:00
paboyle aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle 1e68b1c1bd Create a benign default for gparity twists 2016-01-02 14:06:53 +00:00
paboyle 491a708225 Twist params set up correctly in gparity even odd 2f DWF + Wilson HMC test 2016-01-02 14:02:41 +00:00
paboyle 5a80930dd2 Charge conjugation boundary conditions for gauge fields implemented as a policy
class, changing the nature of covariant Cshifts used in
plaquettes, rectangles and staples.

As a result same code is used for the plaq and rect action independent of the BC type.

Should probably isolate the BC in a separate class that Gimpl takes as a template param.
Do the same with smearing policies.

This would then allow composition of BC with smearing etc....
2016-01-02 13:37:25 +00:00
paboyle 145a295231 Bug fix for stencil with large shifts (3+); would be important for the Naik term, for example, but did not
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
paboyle 841a37f941 Fix to WilsonCompressor that fixes a bug in comms phase due to the sign change on gamma
matrix in hopping term.
Add logging of time spent in CG.
2015-12-29 23:49:41 +00:00
Azusa Yamaguchi e6cad3821c Logging improvement 2015-12-29 19:51:18 +00:00
Azusa Yamaguchi 98de1cbb6a Optimised version of rectangle term staples.
~3.4x faster than the naive.
2015-12-29 19:22:59 +00:00
Azusa Yamaguchi f7d61b8b81 Plaq plus rectangle and Iwasaki, Symanzik DBW2.
http://arxiv.org/pdf/hep-lat/0610075.pdf plaq and rect regress plausibly over 100 trajectories
and under HMC with average plaq and rectangle coming out ok.
2015-12-28 16:39:26 +00:00
Azusa Yamaguchi 78c4e862ef Plaq, Rectangle, Iwasaki, Symanzik and DBW2 working and HMC regresses to http://arxiv.org/pdf/hep-lat/0610075.pdf 2015-12-28 16:38:31 +00:00
portelli 54eacec261 Hadrons: namespace std not used anymore in compiled sources 2015-12-23 14:30:33 +00:00
portelli 76c78f04e2 Hadrons: first complete prototype for run loop 2015-12-23 14:21:35 +00:00
portelli 379580cd89 Merge branch 'master' into hadrons 2015-12-23 14:20:22 +00:00
portelli 1e0be161e5 MacroMagic: inline functions to avoid double symbol issues 2015-12-23 14:20:05 +00:00
paboyle 0afcf1cf13 Moved all the HMC tests over to using a single HmcRunner class that manages checkpoint strategies and such like 2015-12-22 11:19:25 +00:00
paboyle 08edbb5cbe HMC bit repro across checkpoints. Fixed parallel RNG issue with threading.
Conclusion: C++11 distributions are not thread safe; must use a distinct dist as well as a distinct engine
per site. Makes sense when you think of box muller. Also added a reset of dist on fill to ensure
repro across checkpoints.
2015-12-22 08:54:40 +00:00
paboyle 0abfbcc8eb Naming of files improvement. 2015-12-21 15:37:26 +00:00
paboyle 1b94253ba4 Logging improvement 2015-12-21 15:36:28 +00:00
paboyle 36e6f9ac7b Bug fix. Guess not initialised in refresh step; didn't hit before due to luck in not having a vector
created with NAN data.
2015-12-21 15:34:35 +00:00
paboyle 2f41691c11 Bug fix. Guess was not zeroed prior to CG call. Was earlier accidentally benign just due to luck. 2015-12-21 15:33:36 +00:00
paboyle 09bfe52840 Remove extraneous variable 2015-12-21 15:30:28 +00:00
paboyle 8c9010d0f4 Isnan check on guess and convergence assert on result 2015-12-21 15:29:46 +00:00
paboyle 42c583265c Remove timestamp 2015-12-21 15:28:03 +00:00
paboyle 539d698492 Prototypes for CML routines 2015-12-21 15:26:42 +00:00
paboyle 31ca609d12 HMC checkpointing.
Need a general HMC framework to work in restart.
2015-12-20 02:29:51 +00:00
paboyle 5710966324 Options to use mersenne twister OR ranlux48 via --enable-rng flag at configure time.
Can save and restore RNG state via new (serial) I/O routines in a NERSC header style file.
Store a Parallel (one per site) and a single serial RNG file.
2015-12-19 18:32:25 +00:00
paboyle e108e708a3 Wilson TM tests and compiles in 2015-12-17 23:06:33 +00:00
paboyle 6f0198d4d9 Merge branch 'master' of https://github.com/paboyle/Grid 2015-12-17 22:34:54 +00:00
paboyle 67ccb043f1 Added TM fermions for DSDR etc.. 2015-12-17 22:34:28 +00:00
Azusa Yamaguchi 24a5a81c53 SSE compile fix 2015-12-16 09:09:37 +00:00
Jung eb1759d7ea Added Gparity instantiation to no HANDOPT case
deleted configure (as intended?)
2015-12-16 00:04:09 -05:00
paboyle 34a0fde2ad Fixes to fermion force terms after sign of gamma_mu (0...3) change.
Thought I had already committed these.

Believe I have got the Gparity fermion force working.

* tests/Test_gpdwf_force.cc     -- correctly predicts dS for two flavour pseudofermion
                                   based on a small dt update of U field.

* tests/Test_hmc_EODWFRatio_Gparity.cc -- ran 1 trajectory on 8^4 with dH=0.21.

Need to accumulate a full plaquette log to believe fully which will take some hours of run time.
2015-12-15 23:14:12 +00:00
Jung bc34b7e808 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
	tests/Make.inc
2015-12-15 11:11:59 -05:00
Jung 284453c5e9 Added gparity mobius defs, added params to ScaledShamir
checking in before pulling master
2015-12-14 12:15:06 -05:00
paboyle af855cc129 Updating to fix peek poke to checkerboarded arrays since Chulwoo needs this. 2015-12-12 07:11:46 +00:00
paboyle a5314eddbb Update todo list 2015-12-10 23:34:03 +00:00
paboyle 78ca15fdd8 Merge branch 'aportelli-master' 2015-12-10 23:15:16 +00:00
paboyle 47fe6b5a7c Merge branch 'master' of https://github.com/aportelli/Grid into aportelli-master 2015-12-10 23:14:52 +00:00
paboyle b3ef09a54d Merge branch 'master' of https://github.com/paboyle/Grid 2015-12-10 23:05:38 +00:00
paboyle 8ed3940048 New files for Chroma regression 2015-12-10 22:55:59 +00:00
paboyle 3ce10aa975 Fix a regression failure on Mobius; chroma regression added 2015-12-10 22:55:00 +00:00
Azusa Yamaguchi a32a59fc43 Merge branch 'master' of https://github.com/paboyle/Grid 2015-12-09 12:48:44 +00:00
portelli 14a80733f9 Merge branch 'master' into hadrons 2015-12-08 13:57:53 +00:00
portelli 200de272ed IO: serialisable enums 2015-12-08 13:54:00 +00:00
portelli d68a72e28b IO: code cleaning and string binary IO fix 2015-12-08 13:53:33 +00:00
portelli ab45f029f4 ignore Config.h.in 2015-12-08 13:52:44 +00:00
Jung 77054bd61c Added back Test_gparity 2015-12-08 01:41:32 -05:00
portelli 17f9268a55 XmlIO: minor code cleaning 2015-12-07 18:30:00 +00:00
portelli 78f0c2595d autotool file accidentally committed 2015-12-07 18:28:06 +00:00
portelli d4db009a58 Hadrons: starting scheduler implementation 2015-12-07 18:26:38 +00:00
portelli 20ce7e0270 Hadrons: algorithm to determine all possible topological ordering 2015-12-07 15:46:36 +00:00
Jung f2b4edc090 Fixes for Gparity comparison with CPS (Instantiation, Gamma matrix convention) 2015-12-07 02:04:57 -05:00
Jung fb81acca3c Merge branch 'master' of https://github.com/paboyle/Grid 2015-12-03 12:11:10 -05:00
portelli bb195607ab Hadrons: fix in topological sort algorithm name 2015-12-02 19:40:11 +00:00
portelli 6f090e22c0 Hadrons: graph topological sort 2015-12-02 19:33:34 +00:00
portelli 339e983172 Merge branch 'master' into hadrons 2015-12-02 14:38:04 +00:00
portelli 4a7f3d1b7b Merge branch 'master' into hadrons
# Conflicts:
#	configure
2015-12-02 10:57:51 +00:00
Peter Boyle 26161addd0 Warn fix clang 2015-11-29 11:19:12 +00:00
paboyle 93356fd246 No compile fixes on gcc/Cray 2015-11-29 03:14:44 -08:00
Peter Boyle f35fc4b76c No compile fixes 2015-11-29 10:59:11 +00:00
paboyle ca42fe6d32 Merge branch 'master' of github.com:paboyle/Grid
Merge done
Conflicts:
	lib/serialisation/XmlIO.h
	tests/Test_stencil.cc
2015-11-28 17:03:43 -08:00
paboyle b8a38f292d Domain decomposition SAP precon implemented and working but not as fast as I hoped. 2015-11-28 17:01:51 -08:00
paboyle 6b97b271ae Integer divide useful 2015-11-28 17:01:20 -08:00
paboyle fa01ae5980 integer divide 2015-11-28 17:00:34 -08:00
paboyle 113131b01c This failed for some reason. Suspect Antonin has made more progress. 2015-11-28 16:59:59 -08:00
paboyle b2c02a6106 Runs fastest on cori 2015-11-28 16:58:16 -08:00
paboyle 02d730513a Divide function 2015-11-28 16:54:43 -08:00
paboyle d875c2bd39 More verbose useful 2015-11-28 16:54:19 -08:00
paboyle cc32ba615a Verbose changes 2015-11-28 16:53:54 -08:00
paboyle 6684739452 Better to drop KMP_AFFINITY override 2015-11-28 16:52:44 -08:00
Peter Boyle bc4b252883 Merge branch 'master' of https://github.com/paboyle/Grid 2015-11-29 00:33:01 +00:00
Peter Boyle 11cf0f08f3 This file is not yet debugged. 2015-11-29 00:32:45 +00:00
Peter Boyle fff0f00552 Modest changes 2015-11-29 00:31:57 +00:00
Peter Boyle 42e6055746 Try 1/x for hermitian indef approx 2015-11-29 00:31:19 +00:00
Peter Boyle 01231ce824 Stencil fix 2015-11-29 00:31:02 +00:00
Peter Boyle ef84d54033 precision set 2015-11-29 00:30:44 +00:00
Peter Boyle 41e8038c56 Makefile update 2015-11-29 00:30:19 +00:00
Peter Boyle 8a33846095 No compile fix 2015-11-29 00:29:58 +00:00
Peter Boyle 54f04ee5c9 Perf event interface was Linux specific; use ifdef to protect 2015-11-29 00:24:48 +00:00
Peter Boyle 825875fd48 compile fixes 2015-11-29 00:24:25 +00:00
Peter Boyle f8290bfd58 Compile fixes 2015-11-29 00:24:04 +00:00
Azusa Yamaguchi 967be91692 update merge 2015-11-26 09:51:41 +00:00
azusayamaguchi d43034d3ac Merge pull request #21 from aportelli/master
Overhaul of I/O interface
2015-11-19 11:45:50 +00:00
portelli 06f8ecea04 Merge commit '899ca41cb8c8f47771bfd37cd895cbc2184e5560' 2015-11-16 18:16:25 +00:00
portelli af19118113 new I/O interface 2015-11-16 18:14:37 +00:00
paboyle e9ff25b06b Small threading change makes a difference on Cori. 2015-11-07 00:07:05 -08:00
paboyle 05a7029600 Stencil change 2015-11-07 00:06:31 -08:00
paboyle b04b8914fd EXECINFO change 2015-11-07 00:05:57 -08:00
paboyle 7522e3f0dd Stencil interface change fix no compile 2015-11-07 00:05:10 -08:00
paboyle 1cc0d7b811 Bigger ncall as timing loops got small on cori 2015-11-07 00:04:40 -08:00
paboyle 899ca41cb8 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/qcd/action/fermion/WilsonFermion5D.cc
2015-11-06 03:50:04 -08:00
paboyle d29b4c1dee Assembler files 2015-11-06 03:48:48 -08:00
paboyle a2ff068e29 Asm and threading for many core 2015-11-06 03:47:14 -08:00
paboyle b362f8d27b Threading for many core 2015-11-06 03:46:41 -08:00
paboyle 64770d9052 Threading changes for many core and asm calls 2015-11-06 03:46:21 -08:00
paboyle 17af18dcab Changes for AVX512 assembler 2015-11-06 03:45:51 -08:00
Peter Boyle 28022755ae Stencil class name global change to StencilImpl typedef 2015-11-06 05:30:17 -06:00
Peter Boyle 98d8ba6d14 Remove autogen files from CVS 2015-11-06 05:29:07 -06:00
Peter Boyle 27813cf518 More timing detail reported 2015-11-06 05:27:13 -06:00
Peter Boyle 955b482aaf Partial optimisation of the extraction/merger of simd vecs. 2015-11-06 05:26:20 -06:00
Peter Boyle f9b2fce93b Changing whole stencil class to be template and not just single functions 2015-11-06 05:25:10 -06:00
Peter Boyle 473fa28a6c Partial optimisation; comms in x-dir for red black dslash will be slow as the checker skipping block strided
loops are non threadable. Will need to write a kernel for these instead and drive them with a lookup table
to make a loop sufficiently simple to thread.
2015-11-06 05:23:23 -06:00
Peter Boyle 5d854c869c Stencil interface changes 2015-11-06 05:22:33 -06:00
Peter Boyle 880ff88362 Comms optimisation 2015-11-06 05:22:18 -06:00
Peter Boyle f85b9ddd97 Remove nonfunctioning lanczos 2015-11-06 05:21:21 -06:00
Azusa Yamaguchi 4690acc3c8 Don't know why peter committed these as they didn't compile 2015-11-06 10:31:48 +00:00
Azusa Yamaguchi 3281745fde Exec info and linux check to stop non-portable code breaking 2015-11-06 10:31:24 +00:00
Azusa Yamaguchi c2d96644a0 EXEC INFO check 2015-11-06 10:31:05 +00:00
paboyle 1159de165c Asm option for AVX512 2015-11-05 22:04:51 -08:00
portelli c4e2202550 First graph class implementation and test 2015-11-05 14:28:14 +00:00
paboyle 16c7993434 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/simd/Grid_avx512.h
	lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
paboyle 6be9716e6f New file 2015-11-04 03:26:28 -08:00
paboyle 32762346ad Better run time on KNC 2015-11-04 03:25:34 -08:00
paboyle 4a41c885ed Use Linux kernel interface to hardware performance counters. Dead useful. 2015-11-04 03:24:19 -08:00
paboyle 0f48658a27 Update minor 2015-11-04 03:23:46 -08:00
paboyle 757b31ed42 Threading for KNC mods. 2015-11-04 03:22:14 -08:00
paboyle 5aafdd7e1a Inline asm for KNL, KNC, Skylake Xeon 2015-11-04 03:21:15 -08:00
paboyle ac7d1f26ad Either blocking or lebesgue curve 2015-11-04 03:19:16 -08:00
paboyle 1a8bf938b3 Use either sub-blocking or lebesgue 2015-11-04 03:18:51 -08:00
paboyle 63a2993827 Exec info and cache blocking 2015-11-04 03:16:56 -08:00
paboyle 4e65ad21ac Adding a routine for AVX512 / IMCI with explicit assembly implementations 2015-11-04 03:15:08 -08:00
Peter Boyle dfc1de6f60 Merge branch 'master' of github.com:paboyle/Grid 2015-11-04 05:14:26 -06:00
Peter Boyle f87526a04f Make ICC happy 2015-11-04 05:14:03 -06:00
Peter Boyle 3b7576ad53 Switch off for now 2015-11-04 05:13:29 -06:00
paboyle 9b5d31ffc1 mac, mult routines
2015-11-04 03:10:34 -08:00
paboyle a38762159c Inline assembly hooks for AVX 512. Better way in some ways than BAGEL to generate assembly.
Updated Grid_avx512.h
2015-11-04 03:09:06 -08:00
Peter Boyle ffc5dab17f AMD FMA4 support added for Interlagos/BlueWaters 2015-11-04 04:29:58 -06:00
Peter Boyle 96608c70d1 chrono causing some problems on Cray systems. Suspend use for now 2015-11-04 04:28:31 -06:00
Peter Boyle d35d63b171 Algorithm in 2015-11-04 04:27:44 -06:00
Peter Boyle 9183920e8b Added an even odd stencil test, shook out a problem with spread out x-direction.
Generalise test to allow different types of "Field" to be used.
2015-11-04 10:03:04 +00:00
Peter Boyle 01f286c9fe Better testing for red black cshift which was sufficient to chase down a spread out x-direction problem. 2015-11-04 10:02:17 +00:00
Peter Boyle 24044dbc56 Debugged a problem with checkerboarded cshift in the checker dimension which arose
only when mpi spread out in the checker dimension. Added a test that trapped and helped debug this
2015-11-04 10:00:55 +00:00
Peter Boyle abb23df83f formatting only 2015-11-04 10:00:27 +00:00
Peter Boyle 12c5ec813c Useful debug messages (commented out) are included for preservation in case I need to revisit this 2015-11-04 09:59:27 +00:00
Peter Boyle 1271508ca2 Bug fix for spread out in x (EO) direction.
This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
2015-11-04 09:57:57 +00:00
Peter Boyle ec5af35166 EO bug fix when spread out in x-direction 2015-11-04 09:56:58 +00:00
Peter Boyle b3d70a3bb2 Ncall change 2015-11-04 09:55:21 +00:00
Peter Boyle c26220e9ab EO benchmark as well as non-eo 2015-11-04 09:54:48 +00:00
Peter Boyle 0f59356e86 Problem in comms fixed 2015-11-02 00:00:15 +00:00
portelli 538b16610b First commit for measurement software 'Hadrons' 2015-10-27 17:33:18 +00:00
portelli 8709117aea Log: generalised Logger class to allow separate logs in Grid-based applications 2015-10-27 17:31:13 +00:00
portelli 1b22ce5720 tests Make.inc fix 2015-10-27 10:47:52 +00:00
portelli e6b9aa9076 Config.h removed from repository 2015-10-27 10:47:07 +00:00
portelli d9f2e2e06a Merge pull request #2 from paboyle/master
Update from Peter
2015-10-19 14:52:52 +01:00
Peter Boyle 41299da406 files added 2015-10-09 01:01:46 +02:00
Peter Boyle 8889af45ca FMA4 added 2015-10-09 01:00:53 +02:00
Peter Boyle d4289a33b8 AMD FMA4 addition 2015-10-09 00:44:20 +02:00
Peter Boyle 83afb2e26a Poly support for lanczos 2015-10-09 00:43:21 +02:00
Peter Boyle 3726fe7481 Bigger vec length 2015-10-09 00:42:54 +02:00
Peter Boyle 6d06bd9493 Minor change in commented out code 2015-10-09 00:42:21 +02:00
Peter Boyle 6ee23f409e Lanczos addition 2015-10-09 00:41:00 +02:00
Peter Boyle 2d95dac6b6 Lanczos untested/partially tested additions. In middle of shake out but at least compiles 2015-10-09 00:40:25 +02:00
Peter Boyle 44fecd4d8d Lanczos test 2015-10-09 00:39:21 +02:00
Peter Boyle 814c79f38d SIMD improvements for mac and madd use in complex for avx, sse 2015-10-09 00:38:52 +02:00
paboyle 1878bf97d0 Babbage fix 2015-09-30 16:04:01 -07:00
paboyle 3a478e5f2a No compile babbage fix 2015-09-30 16:03:05 -07:00
paboyle a660ce716b No compile babbage fix 2015-09-30 16:02:44 -07:00
paboyle f4b6d1dfea NGO stores reenabled 2015-09-30 16:02:14 -07:00
paboyle 23813ac798 No compile on babbage fix 2015-09-30 16:01:28 -07:00
paboyle af89c40462 Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes. 2015-09-28 16:09:04 -07:00
Peter Boyle 9f4f65cb46 Added a decoupled memory system benchmark to remove thread synch overhead 2015-09-26 18:23:57 -07:00
Peter Boyle 64d64d1ab6 Updating to modify non-inlining permute routines and hopefully get better reg use and
enhance performance.
2015-09-25 08:55:04 -07:00
Peter Boyle 5ef42add2d Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
Peter Boyle 2f38ebc446 Reintroducing the hand unrolled loops 2015-09-08 17:45:30 +01:00
Peter Boyle 638d6675ee Tested rms dH is ~ dt^4 numerically, so believe the ForceGradient is correct now.
Paranoia makes me want to diddle with the FG step to ensure dt^2 reappears.
2015-08-31 16:33:20 +01:00
Peter Boyle 357c6ab46d Reunitarise. Complete the HMC and integrator changes. 2015-08-31 16:32:04 +01:00
Peter Boyle 755dca9533 Added ForceGradient integrator. dH dropped so seems to work. Will only
believe it is right once I have pulled a dt^4 error scaling plot out.
2015-08-31 06:23:02 +01:00
Peter Boyle 29fd004d54 Unified integrator and integrator algorithm into virtual class used as a policy for the
HMC.
2015-08-30 13:39:19 +01:00
Peter Boyle eed889ea05 Update on todo list 2015-08-30 12:23:08 +01:00
Peter Boyle aa52fdadcc Global edit on HMC sector -- making GaugeField a template parameter and
preparing to pass integrator, smearing, bc's as policy classes to hmc.

Propose to unify "integrator" and integrator algorithm in a base/derived
way to override step. Want to read through ForceGradient to ensure
that abstraction covers the force gradient case.
2015-08-30 12:18:34 +01:00
Peter Boyle 76d752585b Started a tidy up in the HMC sector. Now comfortable with the two level integrators;
took a little while to figure out what Guido had done & why -- but there is a neat saving of force
evaluations across the nesting time boundary making use of linearity of the leapP in dt.

I cleaned up the printing, reduced the volume of code, in the process sharing printing
between all integrators. Placed an assert that the total integration time for all integrators
must match at end of trajectory.

Have now verified e-dH = 1 for nested integrators in Wilson/Wilson runs with both
Omelyan and with Leapfrog so substantial confidence gained.
2015-08-29 17:18:43 +01:00
Peter Boyle dc814f30da Binary IO file for generic Grid array parallel I/O.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.

Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.

Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.

Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.

Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.

I suspect this is fine for system performance.
2015-08-26 13:40:29 +01:00
Peter Boyle 612957f057 pull in original license. 2015-08-21 10:19:08 +01:00
Peter Boyle cea8ac9a22 Credits to orig source where I found the macro tricks. 2015-08-21 10:14:53 +01:00
Peter Boyle 476da3ee62 Separated IO reader/writers into a proper abstract base,
derived relationship. Have Text/Binary/Xml versions of
Reader & Writer.

Any new Reader/Writer class inheriting the interface can give object serialisation
to any desired format now.

      new file:   lib/serialisation/BaseIO.h
      modified:   lib/serialisation/BinaryIO.h
      modified:   lib/serialisation/Serialisation.h
      modified:   lib/serialisation/TextIO.h
      modified:   lib/serialisation/XmlIO.h

The test uses the Xml, Binary and Text formats as well as cout << Object.
2015-08-21 10:06:33 +01:00
Peter Boyle 35818fdf6c Text and Binary readers 2015-08-20 23:04:38 +01:00
Peter Boyle 091785e5f5 Better list 2015-08-20 17:19:48 +01:00
Peter Boyle 77d299b414 Cosmetic 2015-08-20 16:30:52 +01:00
Peter Boyle ab81a25073 XMLReader implementation and a virtual Reader/Writer template framework.
Test_serialisation has an example of *code* *free* object serialisation
to both ostream and to XML using macro magic.

Implementing TextReader/TextWriter, YAML, JSON etc. should be trivial
and we can use configure time options to select the default "Reader" typedef.

Presently done with

"using XMLPolicy::Reader"

to pick up the default serialisation strategy.
2015-08-20 16:21:26 +01:00
portelli dd498f993e Merge pull request #1 from paboyle/master
Sync with Peter
2015-08-19 17:27:31 +02:00
Peter Boyle fdfe194c41 Threading bug in RNG fill fixed. 2015-08-19 14:41:05 +01:00
Peter Boyle 8b070ae54c Gparity now accepting twists through constructor 2015-08-19 11:26:01 +01:00
Peter Boyle 4e085dd0ed Domain wall even-odd 2f HMC with wilson gauge and PV 2f ratio now running and giving small dH.
Azusa is working hard on the rectangle term and we'll hopefully start reproducing plaquettes
from RBC-UKQCD parameters soon !

My new laptop is pretty warm and is starting to groan ;)
2015-08-19 10:26:07 +01:00
Peter Boyle e8d63c9178 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-19 05:49:00 +01:00
Peter Boyle c54c086f17 Even odd preconditioned one flavour ratio
(no support for non-const EE schur block)
2015-08-19 05:46:58 +01:00
Peter Boyle dd6bb73ee0 Added one flavour rational ratios (unprec) 2015-08-19 04:58:40 +01:00
Peter Boyle fc160eeccc Added one flavour rational ratios (unprec) 2015-08-19 04:58:40 +01:00
Peter Boyle 48db72259e EvenOdd schur decomposed mpcdagmpc version of rhmc determinant.
dH is also small and plaquette looks right.
2015-08-18 18:37:39 +01:00
Peter Boyle 570150f1d3 EvenOdd schur decomposed mpcdagmpc version of rhmc determinant.
dH is also small and plaquette looks right.
2015-08-18 18:37:39 +01:00
Peter Boyle 9c7840c3a7 rhmc for 1+1 wilson is conserving dH~0.
A good day's work ;)
2015-08-18 16:58:56 +01:00
Peter Boyle aef98b7226 rhmc for 1+1 wilson is conserving dH~0.
A good day's work ;)
2015-08-18 16:58:56 +01:00
Peter Boyle 5c364f8082 One flavour rational unprec added; untested but does compile.
Moving param structs into a single header for later connection to file I/O using
macromagic.h
2015-08-18 14:40:08 +01:00
Peter Boyle a842a6c94d One flavour rational unprec added; untested but does compile.
Moving param structs into a single header for later connection to file I/O using
macromagic.h
2015-08-18 14:40:08 +01:00
Peter Boyle 2dd9ad7b0f Update TODO list 2015-08-18 10:43:32 +01:00
Peter Boyle cd242a2637 Update TODO list 2015-08-18 10:43:32 +01:00
Peter Boyle bdcbfe9310 Even Odd two flavour ratio added and dH == small 2015-08-18 10:37:08 +01:00
Peter Boyle 9306921ded Even Odd two flavour ratio added and dH == small 2015-08-18 10:37:08 +01:00
Peter Boyle 76f3855629 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-18 09:23:58 +01:00
Peter Boyle 8621e2409f Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-18 09:23:58 +01:00
Peter Boyle 6212807a77 Small dh obtained in two flavour ratio so looks ok. 2015-08-18 09:21:29 +01:00
Peter Boyle 7622f0c441 Small dh obtained in two flavour ratio so looks ok. 2015-08-18 09:21:29 +01:00
Peter Boyle 0bc38a69ce Adding PV pseudofermion in prep for DWF HMC.
Not compiled this yet, but cloned in from BFM.
2015-08-18 09:19:42 +01:00
Peter Boyle 25d0eae50c Adding PV pseudofermion in prep for DWF HMC.
Not compiled this yet, but cloned in from BFM.
2015-08-18 09:19:42 +01:00
Peter Boyle 24382d77bb Adding PV pseudofermion in prep for DWF HMC.
Not compiled this yet, but cloned in from BFM.
2015-08-17 23:14:48 +01:00
Peter Boyle ef6a9e6b07 Adding PV pseudofermion in prep for DWF HMC.
Not compiled this yet, but cloned in from BFM.
2015-08-17 23:14:48 +01:00
Peter Boyle 353d66def1 Unused apparently 2015-08-16 01:41:05 +01:00
Peter Boyle b8166af92b Unused apparently 2015-08-16 01:41:05 +01:00
Peter Boyle afeabe0d23 Tidying 2015-08-16 00:14:10 +01:00
Peter Boyle 6180487517 Tidying 2015-08-16 00:14:10 +01:00
Peter Boyle 2d6b97be06 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-16 00:13:14 +01:00
Peter Boyle 0e088d2264 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-16 00:13:14 +01:00
Peter Boyle 53da927c3c Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-15 23:59:04 +01:00
Peter Boyle f0e32f12cf Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-15 23:59:04 +01:00
Peter Boyle c7b50d18e7 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-15 23:56:31 +01:00
Peter Boyle 155c164b0c * Finished the template/policy style introduction of gparity, except the gparity force terms.
So valence sector looks ok.

FermionOperatorImpl.h provides the policy classes.

Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.

* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).

There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but
not with multi-level.

In any case there looks to be quite a bit to clean up.

This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is independent of the gauge fields. Easily fixed in future.
2015-08-15 23:25:49 +01:00
Peter Boyle 55cfc89459 * Finished the template/policy style introduction of gparity, except the gparity force terms.
So valence sector looks ok.

FermionOperatorImpl.h provides the policy classes.

Expect HMC will introduce a smearing policy and a fermion representation change policy template
param. Will also probably need multi-precision work.

* HMC is running even-odd and non-checkerboarded (checked 4^4 wilson fermion/wilson gauge).

There appears to be a bug in the multi-level integrator -- <e-dH> passes with single level but
not with multi-level.

In any case there looks to be quite a bit to clean up.

This is the "const det" style implementation that is not appropriate yet for clover since
it assumes that Mee is independent of the gauge fields. Easily fixed in future.
2015-08-15 23:25:49 +01:00
Peter Boyle f40475f382 Reorganising the Fermion interface 2015-08-14 14:16:45 +01:00
Peter Boyle ba8c09a58e Reorganising the Fermion interface 2015-08-14 14:16:45 +01:00
Peter Boyle 045c85823b Extra test 2015-08-14 13:18:59 +01:00
Peter Boyle b3b46fd456 Extra test 2015-08-14 13:18:59 +01:00
Peter Boyle e8462790a9 Extra test 2015-08-14 13:18:59 +01:00
Peter Boyle cc63078de5 Gparity works now even if simd distributed in a Gparity twist direction.
Tested by doubling lattice in t-direction.
2015-08-14 12:57:42 +01:00
Peter Boyle 59d66eb17a Gparity works now even if simd distributed in a Gparity twist direction.
Tested by doubling lattice in t-direction.
2015-08-14 12:57:42 +01:00
Peter Boyle 4dc7c36aa8 Gparity works now even if simd distributed in a Gparity twist direction.
Tested by doubling lattice in t-direction.
2015-08-14 12:57:42 +01:00
Peter Boyle e6bed000c3 Gparity valence test now working.
Interface in FermionOperator will change a lot in future
2015-08-14 00:01:04 +01:00
Peter Boyle 028e2061e0 Gparity valence test now working.
Interface in FermionOperator will change a lot in future
2015-08-14 00:01:04 +01:00
Peter Boyle 7d3512ab21 Gparity valence test now working.
Interface in FermionOperator will change a lot in future
2015-08-14 00:01:04 +01:00
Peter Boyle fc9b36c769 Gamma5 mult direct 2015-08-13 10:51:29 +01:00
Peter Boyle 2c216a42f9 Gamma5 mult direct 2015-08-13 10:51:29 +01:00
Peter Boyle 45b01858a8 Gamma5 mult direct 2015-08-13 10:51:29 +01:00
Peter Boyle c39078162e Gparity improvements 2015-08-13 10:51:01 +01:00
Peter Boyle 145b807ba2 Gparity improvements 2015-08-13 10:51:01 +01:00
Peter Boyle 1c2d148bfa Gparity improvements 2015-08-13 10:51:01 +01:00
Peter Boyle 7e9203d8e0 Some bug fixes for more complicated types introduced with gparity 2015-08-13 10:50:34 +01:00
Peter Boyle 8d4c43327b Some bug fixes for more complicated types introduced with gparity 2015-08-13 10:50:34 +01:00
Peter Boyle 546513861f Some bug fixes for more complicated types introduced with gparity 2015-08-13 10:50:34 +01:00
Peter Boyle 6ab73c5512 Gparity test added; partial implementation -- this is Chris K's doubled lattice only
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
Peter Boyle 8a0be42080 Gparity test added; partial implementation -- this is Chris K's doubled lattice only
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
Peter Boyle 9183380946 Gparity test added; partial implementation -- this is Chris K's doubled lattice only
and have to regress this with the 2 flavour implementation.
2015-08-12 09:49:33 +01:00
Peter Boyle c8dca58e6d File list update. 2015-08-11 06:37:42 +01:00
Peter Boyle ded3945467 File list update. 2015-08-11 06:37:42 +01:00
Peter Boyle 04e0e9f5a0 File list update. 2015-08-11 06:37:42 +01:00
Peter Boyle 826fbb18c4 Preconditioned conjugate residual 2015-08-11 06:24:53 +01:00
Peter Boyle 9cd7f9ecad Preconditioned conjugate residual 2015-08-11 06:24:53 +01:00
Peter Boyle 69ce87fbe4 Preconditioned conjugate residual 2015-08-11 06:24:53 +01:00
Peter Boyle 07d672baeb Header 2015-08-11 06:23:38 +01:00
Peter Boyle 26f5ee0621 Header 2015-08-11 06:23:38 +01:00
Peter Boyle f165b1a120 Header 2015-08-11 06:23:38 +01:00
Peter Boyle 3903dfe6a5 Gparity modifications in the Gparity compressor variant. 2015-08-11 06:22:20 +01:00
Peter Boyle 881acaa065 Gparity modifications in the Gparity compressor variant. 2015-08-11 06:22:20 +01:00
Peter Boyle 0a9ebac514 Gparity modifications in the Gparity compressor variant. 2015-08-11 06:22:20 +01:00
Peter Boyle 1b3c93e22a Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
Peter Boyle aeb7442d8f Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
Peter Boyle 84a66476ab Rework/global edit to enforce type templating of fermion operators.
Allows multi-precision work and paves the way for alternate BC's and such like
allowing for example G-parity which is important for K pipi programme.
In particular, can drive an extra flavour index into the fermion fields
using template types.
2015-08-10 20:47:44 +01:00
Peter Boyle 2be8df93ad Adding components for even odd decomposed determinant in HMC.
dH not yet conserved, so something wrong in the eo force code still
2015-08-07 08:37:15 +01:00
Peter Boyle ce34856e32 Adding components for even odd decomposed determinant in HMC.
dH not yet conserved, so something wrong in the eo force code still
2015-08-07 08:37:15 +01:00
Peter Boyle a01aa156b9 Adding components for even odd decomposed determinant in HMC.
dH not yet conserved, so something wrong in the eo force code still
2015-08-07 08:37:15 +01:00
Peter Boyle b5a483ae60 Continued fraction overlap, partial fraction overlap force terms have a successful
test passing.
2015-08-01 22:48:21 +09:00
Peter Boyle d98e8366a0 Continued fraction overlap, partial fraction overlap force terms have a successful
test passing.
2015-08-01 22:48:21 +09:00
Peter Boyle 6ec087d43c Continued fraction overlap, partial fraction overlap force terms have a successful
test passing.
2015-08-01 22:48:21 +09:00
Peter Boyle bb372a6a8a Merge problem fixed 2015-08-01 22:30:00 +09:00
Peter Boyle 742db5d8b4 Merge problem fixed 2015-08-01 22:30:00 +09:00
Peter Boyle 772cd8199d Merge problem fixed 2015-08-01 22:30:00 +09:00
Peter Boyle 5e9bef8a1b Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	lib/qcd/hmc/HMC.h
	tests/Make.inc
	tests/Test_hmc_WilsonFermionGauge.cc
2015-08-01 22:24:54 +09:00
Peter Boyle a1d1dc96d6 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	lib/qcd/hmc/HMC.h
	tests/Make.inc
	tests/Test_hmc_WilsonFermionGauge.cc
2015-08-01 22:24:54 +09:00
Peter Boyle 35feb93f56 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	lib/qcd/hmc/HMC.h
	tests/Make.inc
	tests/Test_hmc_WilsonFermionGauge.cc
2015-08-01 22:24:54 +09:00
Peter Boyle bb7d8535aa Test for DWF force term passes 2015-08-01 22:07:05 +09:00
Peter Boyle 4866467e98 Test for DWF force term passes 2015-08-01 22:07:05 +09:00
Peter Boyle f37552a385 Test for DWF force term passes 2015-08-01 22:07:05 +09:00
Peter Boyle 848104b1a9 Changes making force term test for DWF pass. 2015-08-01 22:06:07 +09:00
Peter Boyle 2994274267 Changes making force term test for DWF pass. 2015-08-01 22:06:07 +09:00
Peter Boyle 2157a6919a Changes making force term test for DWF pass. 2015-08-01 22:06:07 +09:00
Peter Boyle 8627e237c8 Jackson smoothed chebyshev and (untested) completion of force terms
for Cayley, Partial and Cont fraction dwf and overlap.
have even odd and unprec forces.
2015-08-01 05:58:35 +09:00
Peter Boyle 1d0be956ae Jackson smoothed chebyshev and (untested) completion of force terms
for Cayley, Partial and Cont fraction dwf and overlap.
have even odd and unprec forces.
2015-08-01 05:58:35 +09:00
Peter Boyle 1d67d29183 Jackson smoothed chebyshev and (untested) completion of force terms
for Cayley, Partial and Cont fraction dwf and overlap.
have even odd and unprec forces.
2015-08-01 05:58:35 +09:00
neo 702ab15b6a Amending a merge mistake 2015-07-30 17:21:42 +09:00
neo f78cea58fa Amending a merge mistake 2015-07-30 17:21:42 +09:00
neo 3dd846c93c Amending a merge mistake 2015-07-30 17:21:42 +09:00
neo bcdc67b152 Small change in the HMC interface.
Example of multiple levels in the WilsonFermion hmc test.

Merge remote-tracking branch 'upstream/master'

Conflicts:
	lib/qcd/hmc/HMC.h
	lib/qcd/hmc/integrators/Integrator.h
	lib/qcd/hmc/integrators/Integrator_algorithm.h
	tests/Test_simd.cc
2015-07-30 17:16:57 +09:00
neo c2aff0ccd4 Small change in the HMC interface.
Example of multiple levels in the WilsonFermion hmc test.

Merge remote-tracking branch 'upstream/master'

Conflicts:
	lib/qcd/hmc/HMC.h
	lib/qcd/hmc/integrators/Integrator.h
	lib/qcd/hmc/integrators/Integrator_algorithm.h
	tests/Test_simd.cc
2015-07-30 17:16:57 +09:00
neo 490009745c Small change in the HMC interface.
Example of multiple levels in the WilsonFermion hmc test.

Merge remote-tracking branch 'upstream/master'

Conflicts:
	lib/qcd/hmc/HMC.h
	lib/qcd/hmc/integrators/Integrator.h
	lib/qcd/hmc/integrators/Integrator_algorithm.h
	tests/Test_simd.cc
2015-07-30 17:16:57 +09:00
Peter Boyle 68d9463be5 Bug in two flav pseudofermion corrected to reimport gauge field upon rejection.
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
2015-07-29 21:02:07 +09:00
Peter Boyle de153b70ce Bug in two flav pseudofermion corrected to reimport gauge field upon rejection.
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
2015-07-29 21:02:07 +09:00
Peter Boyle 9ff0b2987c Bug in two flav pseudofermion corrected to reimport gauge field upon rejection.
exp(-DeltaH) = 1 now, and plaquette is sensible. Will reproduce an old Wilson Gauge
Wilson Fermion SCRI plaquette with precision in mass matching shortly.
2015-07-29 21:02:07 +09:00
Peter Boyle 0b603225d1 Two flavour HMC for Wilson/Wilson is conserving energy.
Still to check plaq and <e(-dH)>, but nevertheless this is
progress
2015-07-29 17:53:39 +09:00
Peter Boyle cc4ca48d13 Two flavour HMC for Wilson/Wilson is conserving energy.
Still to check plaq and <e(-dH)>, but nevertheless this is
progress
2015-07-29 17:53:39 +09:00
Peter Boyle 4fe110bd07 Two flavour HMC for Wilson/Wilson is conserving energy.
Still to check plaq and <e(-dH)>, but nevertheless this is
progress
2015-07-29 17:53:39 +09:00
Peter Boyle f4c74e34d8 Committing incomplete work for parameter file I/O.
MacroMagic.h is central. Guido and I plan to move
over to generating virtual (XML, JSON, YAML, text, binary) encoding
from macro based system.
2015-07-27 18:32:28 +09:00
Peter Boyle bc09d7c3bd Committing incomplete work for parameter file I/O.
MacroMagic.h is central. Guido and I plan to move
over to generating virtual (XML, JSON, YAML, text, binary) encoding
from macro based system.
2015-07-27 18:32:28 +09:00
Peter Boyle 4cc2ef84d3 Committing incomplete work for parameter file I/O.
MacroMagic.h is central. Guido and I plan to move
over to generating virtual (XML, JSON, YAML, text, binary) encoding
from macro based system.
2015-07-27 18:32:28 +09:00
Peter Boyle 51031dd46c Files renamed 2015-07-27 18:30:19 +09:00
Peter Boyle 4c3f36b80c Files renamed 2015-07-27 18:30:19 +09:00
Peter Boyle 019f7a802e Files renamed 2015-07-27 18:30:19 +09:00
Peter Boyle 63015324c1 Two flavour pseudofermion action 2015-07-26 12:28:03 +09:00
Peter Boyle 9de40578d3 Two flavour pseudofermion action 2015-07-26 12:28:03 +09:00
Peter Boyle 97b41c41b0 Two flavour pseudofermion action 2015-07-26 12:28:03 +09:00
Peter Boyle 36b8f35eed Elemental force term for Wilson dslash added and tests thereof passing.
Now need to construct pseudofermion two flavour, ratio, one flavour, ratio
action fragments.
2015-07-26 10:54:38 +09:00
Peter Boyle d7e6b65a76 Elemental force term for Wilson dslash added and tests thereof passing.
Now need to construct pseudofermion two flavour, ratio, one flavour, ratio
action fragments.
2015-07-26 10:54:38 +09:00
Peter Boyle d9d4c5916a Elemental force term for Wilson dslash added and tests thereof passing.
Now need to construct pseudofermion two flavour, ratio, one flavour, ratio
action fragments.
2015-07-26 10:54:38 +09:00
Peter Boyle a4953610bb Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-24 01:33:19 +09:00
Peter Boyle ba4989dd45 Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-24 01:33:19 +09:00
Peter Boyle 1d70a45d84 Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-24 01:33:19 +09:00
Peter Boyle 5e370db6c5 Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
Peter Boyle 28bdc90908 Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
Peter Boyle d1afebf71e Sizable improvement in multigrid for unsquared.
6000 matmuls CG unprec
2000 matmuls CG prec (4000 eo muls)
1050 matmuls PGCR on 16^3 x 32 x 8 m=.01

Substantial effort on timing and logging infrastructure
2015-07-24 01:31:13 +09:00
paboyle e0ed320827 Bug work around 2015-07-21 22:49:36 -07:00
paboyle c67bc303db Bug work around 2015-07-21 22:49:36 -07:00
paboyle f62f1699cb Bug work around 2015-07-21 22:49:36 -07:00
paboyle ee37130b5e Removed troublesome macros 2015-07-21 22:41:01 -07:00
paboyle 91629a24fe Removed troublesome macros 2015-07-21 22:41:01 -07:00
paboyle 5a68a9bbd4 Removed troublesome macros 2015-07-21 22:41:01 -07:00
Peter Boyle 5b475d5e08 5x speed up now 2015-07-22 00:30:05 +09:00
Peter Boyle d6a2d734d3 5x speed up now 2015-07-22 00:30:05 +09:00
Peter Boyle 11c99d5e66 5x speed up now 2015-07-22 00:30:05 +09:00
neo 479912a5ed Merge remote-tracking branch 'upstream/master' 2015-07-21 17:17:50 +09:00
neo d01310383f Merge remote-tracking branch 'upstream/master' 2015-07-21 17:17:50 +09:00
neo 5fc6af1c77 Merge remote-tracking branch 'upstream/master' 2015-07-21 17:17:50 +09:00
Peter Boyle b382660425 INSTALL 2015-07-21 13:58:57 +09:00
Peter Boyle 1c3ab017e8 INSTALL 2015-07-21 13:58:57 +09:00
Peter Boyle 135998acf5 INSTALL 2015-07-21 13:58:57 +09:00
Peter Boyle 987801c86d Merge 2015-07-21 13:56:22 +09:00
Peter Boyle 8925845684 Merge 2015-07-21 13:56:22 +09:00
Peter Boyle 4e94ddad46 Merge 2015-07-21 13:56:22 +09:00
Peter Boyle f8be0aeed1 No changes shown on git diff 2015-07-21 13:54:09 +09:00
Peter Boyle e34f8adbf4 No changes shown on git diff 2015-07-21 13:54:09 +09:00
Peter Boyle 5ac625f716 No changes shown on git diff 2015-07-21 13:54:09 +09:00
Peter Boyle 9651ab661b Small pretty layout change 2015-07-21 13:53:23 +09:00
Peter Boyle c7925e5c9b Small pretty layout change 2015-07-21 13:53:23 +09:00
Peter Boyle 8d654a86de Small pretty layout change 2015-07-21 13:53:23 +09:00
Peter Boyle fb65953d82 This was needed to compile on gcc 2015-07-21 13:52:59 +09:00
Peter Boyle 9d18773fbc This was needed to compile on gcc 2015-07-21 13:52:59 +09:00
Peter Boyle df2aac01f4 This was needed to compile on gcc 2015-07-21 13:52:59 +09:00
Peter Boyle 98dfc9254f This file is being developed and will remain hacky until the new algorithm
is complete
2015-07-21 13:52:23 +09:00
Peter Boyle 81987b64a6 This file is being developed and will remain hacky until the new algorithm
is complete
2015-07-21 13:52:23 +09:00
Peter Boyle 487fde8496 This file is being developed and will remain hacky until the new algorithm
is complete
2015-07-21 13:52:23 +09:00
Peter Boyle 44cf212720 Printing change 2015-07-21 13:51:56 +09:00
Peter Boyle 821ac7b6f4 Printing change 2015-07-21 13:51:56 +09:00
Peter Boyle 0007669381 Printing change 2015-07-21 13:51:56 +09:00
Peter Boyle cbec16fedd More info 2015-07-21 13:48:57 +09:00
Peter Boyle 01f9b1f6a4 More info 2015-07-21 13:48:57 +09:00
Peter Boyle a700933611 More info 2015-07-21 13:48:57 +09:00
Peter Boyle 64703207d4 Tweaks to subspace set up to put in g5 r5 hermiticity 2015-07-21 12:13:03 +09:00
Peter Boyle 59baa15d9f Tweaks to subspace set up to put in g5 r5 hermiticity 2015-07-21 12:13:03 +09:00
Peter Boyle c515d069cd Tweaks to subspace set up to put in g5 r5 hermiticity 2015-07-21 12:13:03 +09:00
Peter Boyle 021478af3b verbose 2015-07-21 12:12:29 +09:00
Peter Boyle 8a4f9d2367 verbose 2015-07-21 12:12:29 +09:00
Peter Boyle 8a7b7f1e2b verbose 2015-07-21 12:12:29 +09:00
neo 0ea846dcdc Merge remote-tracking branch 'upstream/master'
Conflicts:
	configure
2015-07-21 11:57:34 +09:00
neo 52e9c6b8db Merge remote-tracking branch 'upstream/master'
Conflicts:
	configure
2015-07-21 11:57:34 +09:00
neo a9c15626e1 Merge remote-tracking branch 'upstream/master'
Conflicts:
	configure
2015-07-21 11:57:34 +09:00
Guido Cossu d6489c8bf5 Merge pull request #16 from aportelli/master
AX_GCC_X86_AVX_XGETBV macro fix
2015-07-21 11:55:40 +09:00
Guido Cossu 7fc26258c7 Merge pull request #16 from aportelli/master
AX_GCC_X86_AVX_XGETBV macro fix
2015-07-21 11:55:40 +09:00
Guido Cossu 5c721b0aa0 Merge pull request #16 from aportelli/master
AX_GCC_X86_AVX_XGETBV macro fix
2015-07-21 11:55:40 +09:00
neo ab916d80fd More NEON functionalities 2015-07-21 11:52:15 +09:00
neo 7343a95772 More NEON functionalities 2015-07-21 11:52:15 +09:00
neo 9adaeb061a More NEON functionalities 2015-07-21 11:52:15 +09:00
portelli b0eedfd7ba fix of AX_GCC_X86_AVX_XGETBV macro 2015-07-17 11:15:57 +09:00
portelli 3fc9c00465 fix of AX_GCC_X86_AVX_XGETBV macro 2015-07-17 11:15:57 +09:00
portelli 73c4a1dac9 fix of AX_GCC_X86_AVX_XGETBV macro 2015-07-17 11:15:57 +09:00
portelli 6b8190a032 gitignore update 2015-07-17 11:15:17 +09:00
portelli 807e329a18 gitignore update 2015-07-17 11:15:17 +09:00
portelli ce7c24989d gitignore update 2015-07-17 11:15:17 +09:00
Peter Boyle 2da20f1443 This file drives me crazy 2015-07-11 23:06:31 +09:00
Peter Boyle ab509d3f8e This file drives me crazy 2015-07-11 23:06:31 +09:00
Peter Boyle 7907e4ca03 This file drives me crazy 2015-07-11 23:06:31 +09:00
neo c431816393 Cleaning up files for HMC 2015-07-07 14:59:37 +09:00
neo 48ae886c32 Cleaning up files for HMC 2015-07-07 14:59:37 +09:00
neo 97afe4125f Cleaning up files for HMC 2015-07-07 14:59:37 +09:00
neo 1e9317e5cf Simplifying HMC syntax for the final user 2015-07-06 18:32:20 +09:00
neo 19a1ffedcc Simplifying HMC syntax for the final user 2015-07-06 18:32:20 +09:00
neo 0f21c38ff8 Simplifying HMC syntax for the final user 2015-07-06 18:32:20 +09:00
neo 510f55ba30 Rearranging files in hmc 2015-07-06 16:46:43 +09:00
neo 32e6887d5f Rearranging files in hmc 2015-07-06 16:46:43 +09:00
neo fa42b652e5 Rearranging files in hmc 2015-07-06 16:46:43 +09:00
neo f95db88d19 Added minimum norm integrator
Little rearrangement of HMC and integrator classes
2015-07-06 16:17:32 +09:00
neo 1991852025 Added minimum norm integrator
Little rearrangement of HMC and integrator classes
2015-07-06 16:17:32 +09:00
neo 68fe0769a1 Added minimum norm integrator
Little rearrangement of HMC and integrator classes
2015-07-06 16:17:32 +09:00
neo 12e1682a87 HMC for Wilson Gauge action works
Fixed bug in momenta generation
2015-07-06 12:58:49 +09:00
neo 2718038977 HMC for Wilson Gauge action works
Fixed bug in momenta generation
2015-07-06 12:58:49 +09:00
neo 808f5820fa HMC for Wilson Gauge action works
Fixed bug in momenta generation
2015-07-06 12:58:49 +09:00
neo 6261770f59 Debugged vector version of ProjectOnGroup 2015-07-06 02:24:58 +09:00
neo 62d8952c0a Debugged vector version of ProjectOnGroup 2015-07-06 02:24:58 +09:00
neo 0ffcdf6204 Debugged vector version of ProjectOnGroup 2015-07-06 02:24:58 +09:00
neo 7a4ed7a867 HMC ready but untested 2015-07-04 17:47:50 +09:00
neo b1f94fa292 HMC ready but untested 2015-07-04 17:47:50 +09:00
neo e6087e1820 HMC ready but untested 2015-07-04 17:47:50 +09:00
neo 250965c6ca More progress in the HMC construction 2015-07-04 02:43:14 +09:00
neo 30c9dc473d More progress in the HMC construction 2015-07-04 02:43:14 +09:00
neo 59be55c0ab More progress in the HMC construction 2015-07-04 02:43:14 +09:00
neo 55f05a778f Skeleton of HMC/Integrators 2015-07-03 16:51:41 +09:00
neo 9655d43017 Skeleton of HMC/Integrators 2015-07-03 16:51:41 +09:00
neo ab3ad78ece Skeleton of HMC/Integrators 2015-07-03 16:51:41 +09:00
Peter Boyle 2c9ceaef94 No compile fix 2015-07-02 02:03:09 +01:00
Peter Boyle a666e66e36 No compile fix 2015-07-02 02:03:09 +01:00
Peter Boyle 4deffd1ccb No compile fix 2015-07-02 02:03:09 +01:00
Peter Boyle 55e313bd08 Cleaning up the recursion for traceIndex<n> after the changes the enable G++ to
compile it again.
2015-07-01 23:43:57 +01:00
Peter Boyle 84ec7c40cd Cleaning up the recursion for traceIndex<n> after the changes the enable G++ to
compile it again.
2015-07-01 23:43:57 +01:00
Peter Boyle a5c3edaca9 Cleaning up the recursion for traceIndex<n> after the changes the enable G++ to
compile it again.
2015-07-01 23:43:57 +01:00
Peter Boyle ef0ec1d0b2 Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-01 22:51:04 +01:00
Peter Boyle 69a3d3203c Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-01 22:51:04 +01:00
Peter Boyle 31a0c8d783 Merge branch 'master' of https://github.com/paboyle/Grid 2015-07-01 22:51:04 +01:00
paboyle 1e29d9f778 Some useful XC30 commands 2015-07-01 22:50:13 +01:00
paboyle 4cbcc7fd23 Some useful XC30 commands 2015-07-01 22:50:13 +01:00
paboyle ea5f2fcac4 Some useful XC30 commands 2015-07-01 22:50:13 +01:00
paboyle 46cf661ecd More xc30 config commansd 2015-07-01 22:48:58 +01:00
paboyle d4c4ce49fc More xc30 config commansd 2015-07-01 22:48:58 +01:00
paboyle 993420633b More xc30 config commansd 2015-07-01 22:48:58 +01:00
paboyle cb9da7b371 Temporarily disable gmp dependency simply because Cray XC30's I'm benchmarking
have a downlevel gmp version that chokes on ::max_align_t where gmp had a
bug as far as I recall.
2015-07-01 22:47:33 +01:00
paboyle c0ea404a7c Temporarily disable gmp dependency simply because Cray XC30's I'm benchmarking
have a downlevel gmp version that chokes on ::max_align_t where gmp had a
bug as far as I recall.
2015-07-01 22:47:33 +01:00
paboyle e3456bf559 Temporarily disable gmp dependency simply because Cray XC30's I'm benchmarking
have a downlevel gmp version that chokes on ::max_align_t where gmp had a
bug as far as I recall.
2015-07-01 22:47:33 +01:00
paboyle 71e6733d4d Modified memory bw test to display word size 2015-07-01 22:46:53 +01:00
paboyle 0aec35bfc0 Modified memory bw test to display word size 2015-07-01 22:46:53 +01:00
paboyle 39271b02dd Modified memory bw test to display word size 2015-07-01 22:46:53 +01:00
Peter Boyle 98b84d230a Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precisoin to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
Peter Boyle dc66161f47 Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precisoin to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
Peter Boyle 638d2cda11 Change the SIMD command correctly with precision = double vs. single and
connect the "Real" default precisoin to a configure flag.
Have RealF, RealD and Real types, where Real is compile target dependent single/double,
RealF is single and RealD is double etc..
2015-07-01 22:45:15 +01:00
paboyle c4777879e6 Remove dependency on wrong file 2015-07-01 13:04:02 +01:00
paboyle b139941423 Remove dependency on wrong file 2015-07-01 13:04:02 +01:00
paboyle 61c3491b8b Remove dependency on wrong file 2015-07-01 13:04:02 +01:00
Peter Boyle e618c609fe Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-30 15:17:46 +01:00
Peter Boyle ef81d6bf06 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-30 15:17:46 +01:00
Peter Boyle 9143f071d7 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-30 15:17:46 +01:00
Peter Boyle e164ed6f12 Big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:17:27 +01:00
Peter Boyle f41c7dffef Big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:17:27 +01:00
Peter Boyle 03ca506a3d Big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:17:27 +01:00
Peter Boyle 95ecf81d42 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:03:11 +01:00
Peter Boyle 74e397b29c big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:03:11 +01:00
Peter Boyle 98c817df1b big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:03:11 +01:00
Peter Boyle f36dbfffe5 VPGCR updates 2015-06-30 15:02:27 +01:00
Peter Boyle 7cfe432ee2 VPGCR updates 2015-06-30 15:02:27 +01:00
Peter Boyle 8eaf657f95 VPGCR updates 2015-06-30 15:02:27 +01:00
Peter Boyle 8581c05ab3 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
Peter Boyle 490042f8e1 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
Peter Boyle 8ad81bed32 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:44 +01:00
Peter Boyle 59cd42c164 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:26 +01:00
Peter Boyle 0971522f43 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:26 +01:00
Peter Boyle cd2fb68905 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:01:26 +01:00
Peter Boyle a4369e1db6 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:00:19 +01:00
Peter Boyle 7de5ccb879 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:00:19 +01:00
Peter Boyle c20fdd45a5 big commit fixing nocompiles in defective C++11 compilers (gcc, icpc). stared getting to
near the bleeding edge I guess
2015-06-30 15:00:19 +01:00
Peter Boyle d6c79bbadb Update Benchmark_comms.cc 2015-06-25 10:59:53 +01:00
Peter Boyle a67b44ffa4 Update Benchmark_comms.cc 2015-06-25 10:59:53 +01:00
Peter Boyle 93916f400d Update Benchmark_comms.cc 2015-06-25 10:59:53 +01:00
Peter Boyle 5f8f0bc792 Some small steps towards a multigrid 2015-06-22 12:49:44 +01:00
Peter Boyle dec68e5c0e Some small steps towards a multigrid 2015-06-22 12:49:44 +01:00
Peter Boyle a17684ebe2 Some small steps towards a multigrid 2015-06-22 12:49:44 +01:00
Azusa Yamaguchi 95538bb8c6 Abstract preconditioner 2015-06-21 11:03:55 +01:00
Azusa Yamaguchi e415587e8f Abstract preconditioner 2015-06-21 11:03:55 +01:00
Azusa Yamaguchi fd1a8abcd1 Abstract preconditioner 2015-06-21 11:03:55 +01:00
Azusa Yamaguchi a265765319 Variable preconditioned GCR with restarting.
Orthogonalisation depth and restart frequency is controllable via constructor
2015-06-21 10:58:46 +01:00
Azusa Yamaguchi 945bb93e48 Variable preconditioned GCR with restarting.
Orthogonalisation depth and restart frequency is controllable via constructor
2015-06-21 10:58:46 +01:00
Azusa Yamaguchi 3b4118f33e Variable preconditioned GCR with restarting.
Orthogonalisation depth and restart frequency is controllable via constructor
2015-06-21 10:58:46 +01:00
Peter Boyle eace9051e8 Merge
Merge branch 'master' of https://github.com/paboyle/Grid
2015-06-20 22:25:31 +01:00
Peter Boyle bcf1d5160f Merge
Merge branch 'master' of https://github.com/paboyle/Grid
2015-06-20 22:25:31 +01:00
Peter Boyle c7d77dfa0f Merge
Merge branch 'master' of https://github.com/paboyle/Grid
2015-06-20 22:25:31 +01:00
Peter Boyle f1916a7515 Will start this as a two level algorithm 2015-06-20 22:24:21 +01:00
Peter Boyle eb5ad2884c Will start this as a two level algorithm 2015-06-20 22:24:21 +01:00
Peter Boyle 960f29c0b1 Will start this as a two level algorithm 2015-06-20 22:24:21 +01:00
Peter Boyle 5ccbac7db1 HDCG but this is not complete and placeholder for later completion 2015-06-20 22:23:57 +01:00
Peter Boyle bce59d9911 HDCG but this is not complete and placeholder for later completion 2015-06-20 22:23:57 +01:00
Peter Boyle fb07ee5781 HDCG but this is not complete and placeholder for later completion 2015-06-20 22:23:57 +01:00
Peter Boyle aba5c8595a Patches for beginnings of an overlap multigrid 2015-06-20 22:22:56 +01:00
Peter Boyle 6ad96f7383 Patches for beginnings of an overlap multigrid 2015-06-20 22:22:56 +01:00
Peter Boyle b4a6dbfa65 Patches for beginnings of an overlap multigrid 2015-06-20 22:22:56 +01:00
Azusa Yamaguchi cb92390825 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-20 14:22:29 +01:00
Azusa Yamaguchi 6cebd006d4 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-20 14:22:29 +01:00
Azusa Yamaguchi dc7c77e1d5 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-20 14:22:29 +01:00
Azusa Yamaguchi 9dbb326061 Add the test_quenched files 2015-06-20 14:09:26 +01:00
Azusa Yamaguchi fd208ca11c Add the test_quenched files 2015-06-20 14:09:26 +01:00
Azusa Yamaguchi 1e8217d880 Add the test_quenched files 2015-06-20 14:09:26 +01:00
Peter Boyle 2534199ee7 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-19 17:24:05 +01:00
Peter Boyle 177b5632fd Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-19 17:24:05 +01:00
Peter Boyle a0d4f832cf Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-19 17:24:05 +01:00
Jung ee9ecb6115 Fixing missing max_align_t error 2015-06-19 00:56:24 -04:00
neo faf8544233 Lattice matrix exponential ok 2015-06-17 20:41:07 +09:00
neo 09757cbf0c Lattice matrix exponential ok 2015-06-17 20:41:07 +09:00
neo 4eb71d2cd2 Lattice matrix exponential ok 2015-06-17 20:41:07 +09:00
Azusa Yamaguchi b78ecd6fb2 merge 2015-06-16 20:47:31 +01:00
Azusa Yamaguchi 92e870b256 merge 2015-06-16 20:47:31 +01:00
Azusa Yamaguchi fd72b64ca3 merge 2015-06-16 20:47:31 +01:00
Azusa Yamaguchi 700614123d add bug-fixed Test_nersc)_io. 2015-06-16 20:23:27 +01:00
Azusa Yamaguchi 06047b83c7 add bug-fixed Test_nersc)_io. 2015-06-16 20:23:27 +01:00
Azusa Yamaguchi 2faf7d95db add bug-fixed Test_nersc)_io. 2015-06-16 20:23:27 +01:00
neo 26ff0f3b50 Merge remote-tracking branch 'upstream/master' 2015-06-17 02:02:51 +09:00
neo 9c846bb0c7 Merge remote-tracking branch 'upstream/master' 2015-06-17 02:02:51 +09:00
neo e31dfa79d1 Merge remote-tracking branch 'upstream/master' 2015-06-17 02:02:51 +09:00
neo 296edfbd95 Check for SUN projection and Exponential 2015-06-17 02:02:06 +09:00
neo 318e244748 Check for SUN projection and Exponential 2015-06-17 02:02:06 +09:00
neo 47159797cb Check for SUN projection and Exponential 2015-06-17 02:02:06 +09:00
neo a7555b41df Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
neo c9018d74ac Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
neo 6e5db0b1da Corrected bug in integer multiplications for SSE4 and AVX2
Merge remote-tracking branch 'upstream/master'

Conflicts:
	tests/Make.inc
2015-06-16 23:34:45 +09:00
Azusa Yamaguchi 453b28bd81 Heatbath and config related removed 2015-06-16 14:18:48 +01:00
Azusa Yamaguchi ddbfb026d5 Heatbath and config related removed 2015-06-16 14:18:48 +01:00
Azusa Yamaguchi 1c3c795b84 Heatbath and config related removed 2015-06-16 14:18:48 +01:00
Azusa Yamaguchi 77058d9b9f Critical bug fix of sin/cos typo 2015-06-16 14:17:45 +01:00
Azusa Yamaguchi 9c16bccbf4 Critical bug fix of sin/cos typo 2015-06-16 14:17:45 +01:00
Azusa Yamaguchi 20fe866651 Critical bug fix of sin/cos typo 2015-06-16 14:17:45 +01:00
Azusa Yamaguchi 74845cb3dc Quenched works for wilson gauge 2015-06-16 14:17:11 +01:00
Azusa Yamaguchi 79a9f8b9c9 Quenched works for wilson gauge 2015-06-16 14:17:11 +01:00
Azusa Yamaguchi 18d0437f8d Quenched works for wilson gauge 2015-06-16 14:17:11 +01:00
Azusa Yamaguchi c945041067 uninitialised bug fix 2015-06-16 14:07:05 +01:00
Azusa Yamaguchi 173b31ce05 uninitialised bug fix 2015-06-16 14:07:05 +01:00
Azusa Yamaguchi 4e7300b68d uninitialised bug fix 2015-06-16 14:07:05 +01:00
Azusa Yamaguchi a7774e100f Typo fix 2015-06-16 14:06:31 +01:00
Azusa Yamaguchi 633ee06faf Typo fix 2015-06-16 14:06:31 +01:00
Azusa Yamaguchi 73494a4768 Typo fix 2015-06-16 14:06:31 +01:00
Azusa Yamaguchi b11f4a1473 Extra check that failed in quenched test 2015-06-16 14:04:56 +01:00
Azusa Yamaguchi cd7eac062a Extra check that failed in quenched test 2015-06-16 14:04:56 +01:00
Azusa Yamaguchi d7a8921de2 Extra check that failed in quenched test 2015-06-16 14:04:56 +01:00
Azusa Yamaguchi 212c13bb2c pointer cast 2015-06-16 14:04:33 +01:00
Azusa Yamaguchi 625bb1d7ee pointer cast 2015-06-16 14:04:33 +01:00
Azusa Yamaguchi d5edd09beb pointer cast 2015-06-16 14:04:33 +01:00
neo 0bff862876 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/Make.inc
	tests/Make.inc
2015-06-15 16:48:50 +09:00
neo 1f2cf5cff4 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/Make.inc
	tests/Make.inc
2015-06-15 16:48:50 +09:00
neo f7a1cef15b Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/Make.inc
	tests/Make.inc
2015-06-15 16:48:50 +09:00
Peter Boyle 8ebff0b7c0 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:29:41 +01:00
Peter Boyle 308d53858b Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:29:41 +01:00
Peter Boyle 392399e866 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:29:41 +01:00
Azusa Yamaguchi 54964dd4bb First cut at SUN support for quenched updates 2015-06-14 01:28:54 +01:00
Azusa Yamaguchi ae0873bc77 First cut at SUN support for quenched updates 2015-06-14 01:28:54 +01:00
Azusa Yamaguchi 55d7483608 First cut at SUN support for quenched updates 2015-06-14 01:28:54 +01:00
Peter Boyle 0f6ed6b633 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:27:07 +01:00
Peter Boyle 3f7a66328a Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:27:07 +01:00
Peter Boyle 337bccf27d Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-14 01:27:07 +01:00
Azusa Yamaguchi aa558ab03d Logical elemenet by element on tensors 2015-06-14 01:08:29 +01:00
Azusa Yamaguchi 353c9598c1 Logical elemenet by element on tensors 2015-06-14 01:08:29 +01:00
Azusa Yamaguchi 2b971c14c0 Logical elemenet by element on tensors 2015-06-14 01:08:29 +01:00
Azusa Yamaguchi 459ce22bb3 WHere should always have precisely matching types in ET system 2015-06-14 01:07:58 +01:00
Azusa Yamaguchi a5bb9472b6 WHere should always have precisely matching types in ET system 2015-06-14 01:07:58 +01:00
Azusa Yamaguchi 250cb00c42 WHere should always have precisely matching types in ET system 2015-06-14 01:07:58 +01:00
Azusa Yamaguchi 694dac7ff5 Where and many other functions (sin cos abs log exp) into ET system 2015-06-14 01:07:25 +01:00
Azusa Yamaguchi 84a207f2e6 Where and many other functions (sin cos abs log exp) into ET system 2015-06-14 01:07:25 +01:00
Azusa Yamaguchi f5bcca6cdf Where and many other functions (sin cos abs log exp) into ET system 2015-06-14 01:07:25 +01:00
Azusa Yamaguchi c54299107a Cosmetic 2015-06-14 01:06:56 +01:00
Azusa Yamaguchi 667abe888a Cosmetic 2015-06-14 01:06:56 +01:00
Azusa Yamaguchi be3f4ce201 Cosmetic 2015-06-14 01:06:56 +01:00
Azusa Yamaguchi fe1d8a9dd1 real comps and expression comps 2015-06-14 01:05:57 +01:00
Azusa Yamaguchi bc238177b7 real comps and expression comps 2015-06-14 01:05:57 +01:00
Azusa Yamaguchi 264d0d1735 real comps and expression comps 2015-06-14 01:05:57 +01:00
Azusa Yamaguchi c93c0edcd9 Allow real comparisons and expressions in comparisons 2015-06-14 01:05:39 +01:00
Azusa Yamaguchi 2303a2231b Allow real comparisons and expressions in comparisons 2015-06-14 01:05:39 +01:00
Azusa Yamaguchi 6ca940b5a6 Allow real comparisons and expressions in comparisons 2015-06-14 01:05:39 +01:00
Azusa Yamaguchi 9bdef0888e Allow sparse occupation of vectors in some cases 2015-06-14 01:05:06 +01:00
Azusa Yamaguchi 9bdb8ffb3f Allow sparse occupation of vectors in some cases 2015-06-14 01:05:06 +01:00
Azusa Yamaguchi b66bbed548 Allow sparse occupation of vectors in some cases 2015-06-14 01:05:06 +01:00
Azusa Yamaguchi f8f4b8a249 Moving more into the ET system 2015-06-14 01:04:32 +01:00
Azusa Yamaguchi 38e8a61035 Moving more into the ET system 2015-06-14 01:04:32 +01:00
Azusa Yamaguchi 463b9ca374 Moving more into the ET system 2015-06-14 01:04:32 +01:00
Azusa Yamaguchi 8bafe067eb trying to find a way to remove functions from the ET system using explicit
expression closure statements. Not sure if this works yet
2015-06-14 01:03:28 +01:00
Azusa Yamaguchi 02885ee583 trying to find a way to remove functions from the ET system using explicit
expression closure statements. Not sure if this works yet
2015-06-14 01:03:28 +01:00
Azusa Yamaguchi 611f7ec38c trying to find a way to remove functions from the ET system using explicit
expression closure statements. Not sure if this works yet
2015-06-14 01:03:28 +01:00
Azusa Yamaguchi 53ba8d4926 Transpose always returns self image 2015-06-14 01:02:31 +01:00
Azusa Yamaguchi 6a697be060 Transpose always returns self image 2015-06-14 01:02:31 +01:00
Azusa Yamaguchi f490f320ae Transpose always returns self image 2015-06-14 01:02:31 +01:00
Azusa Yamaguchi ddd93a0576 Extra lattice unaries 2015-06-14 01:01:55 +01:00
Azusa Yamaguchi 4d81d402b9 Extra lattice unaries 2015-06-14 01:01:55 +01:00
Azusa Yamaguchi 58f50b7520 Extra lattice unaries 2015-06-14 01:01:55 +01:00
Azusa Yamaguchi ae2dcfc173 Moving where in to the expression template system; deprecate 2015-06-14 01:01:21 +01:00
Azusa Yamaguchi 0f7cf40867 Moving where in to the expression template system; deprecate 2015-06-14 01:01:21 +01:00
Azusa Yamaguchi e5c980f169 Moving where in to the expression template system; deprecate 2015-06-14 01:01:21 +01:00
Azusa Yamaguchi 58a695b0a9 More TODO as ever 2015-06-14 01:00:46 +01:00
Azusa Yamaguchi 8b9a9aea81 More TODO as ever 2015-06-14 01:00:46 +01:00
Azusa Yamaguchi c185cbdc40 More TODO as ever 2015-06-14 01:00:46 +01:00
Azusa Yamaguchi e529dd9696 Peek poke colour/spin/complex and trace transpose support 2015-06-14 01:00:11 +01:00
Azusa Yamaguchi 87cc0d4ca3 Peek poke colour/spin/complex and trace transpose support 2015-06-14 01:00:11 +01:00
Azusa Yamaguchi 6b8bdf0c6b Peek poke colour/spin/complex and trace transpose support 2015-06-14 01:00:11 +01:00
Azusa Yamaguchi 610450bc0e const safety 2015-06-14 00:59:50 +01:00
Azusa Yamaguchi 66e5718610 const safety 2015-06-14 00:59:50 +01:00
Azusa Yamaguchi 68b82ddd99 const safety 2015-06-14 00:59:50 +01:00
Azusa Yamaguchi 2ba72a91de Binop assist and real/complex improvements 2015-06-14 00:59:07 +01:00
Azusa Yamaguchi 19e8d2809a Binop assist and real/complex improvements 2015-06-14 00:59:07 +01:00
Azusa Yamaguchi 22c8185caa Binop assist and real/complex improvements 2015-06-14 00:59:07 +01:00
Azusa Yamaguchi 4cf04c8583 More functions broken out into element by element 2015-06-14 00:58:14 +01:00
Azusa Yamaguchi 8245585dee More functions broken out into element by element 2015-06-14 00:58:14 +01:00
Azusa Yamaguchi 42f7e5b7f8 More functions broken out into element by element 2015-06-14 00:58:14 +01:00
Azusa Yamaguchi 56d120cb24 typo fix -- remove extra template arg 2015-06-14 00:57:23 +01:00
Azusa Yamaguchi 9006f7a34a typo fix -- remove extra template arg 2015-06-14 00:57:23 +01:00
Azusa Yamaguchi 84485afe6c typo fix -- remove extra template arg 2015-06-14 00:57:23 +01:00
Azusa Yamaguchi 1dc8dfd3ae Logical ops element by element 2015-06-14 00:56:40 +01:00
Azusa Yamaguchi d5810eaef1 Logical ops element by element 2015-06-14 00:56:40 +01:00
Azusa Yamaguchi 3dea03e72b Logical ops element by element 2015-06-14 00:56:40 +01:00
Azusa Yamaguchi f2dafbc4f7 Real complex improved 2015-06-14 00:56:08 +01:00
Azusa Yamaguchi b521155218 Real complex improved 2015-06-14 00:56:08 +01:00
Azusa Yamaguchi dbfb1a69c6 Real complex improved 2015-06-14 00:56:08 +01:00
Azusa Yamaguchi 9c810200e0 Handle case of simd_layout not filling whole vector.
Useful if real complex live on same grid
2015-06-14 00:55:21 +01:00
Azusa Yamaguchi 51ebc4f402 Handle case of simd_layout not filling whole vector.
Useful if real complex live on same grid
2015-06-14 00:55:21 +01:00
Azusa Yamaguchi ef97692622 Handle case of simd_layout not filling whole vector.
Useful if real complex live on same grid
2015-06-14 00:55:21 +01:00
Azusa Yamaguchi 6d6e9a5811 Real/Complex improvements 2015-06-14 00:54:18 +01:00
Azusa Yamaguchi 17de4bce20 Real/Complex improvements 2015-06-14 00:54:18 +01:00
Azusa Yamaguchi a81998f704 Real/Complex improvements 2015-06-14 00:54:18 +01:00
Azusa Yamaguchi 2949e5feca Real handling improvement 2015-06-14 00:53:52 +01:00
Azusa Yamaguchi d74aa0e1e6 Real handling improvement 2015-06-14 00:53:52 +01:00
Azusa Yamaguchi 8fa0c90062 Real handling improvement 2015-06-14 00:53:52 +01:00
Azusa Yamaguchi 96a55c17f1 Unary funcs update 2015-06-14 00:53:18 +01:00
Azusa Yamaguchi d66cab3f01 Unary funcs update 2015-06-14 00:53:18 +01:00
Azusa Yamaguchi c90cf08bae Unary funcs update 2015-06-14 00:53:18 +01:00
Azusa Yamaguchi d47f2fe1e0 File list 2015-06-14 00:52:39 +01:00
Azusa Yamaguchi 5c66b5c712 File list 2015-06-14 00:52:39 +01:00
Azusa Yamaguchi 0fa26e7d68 File list 2015-06-14 00:52:39 +01:00
Azusa Yamaguchi eff0cb3067 Minor 2015-06-14 00:52:26 +01:00
Azusa Yamaguchi 3e261b3d9e Minor 2015-06-14 00:52:26 +01:00
Azusa Yamaguchi 8bac3b57ad Minor 2015-06-14 00:52:26 +01:00
Azusa Yamaguchi ff8bde53df more accurate comment 2015-06-14 00:51:56 +01:00
Azusa Yamaguchi c79d85f763 more accurate comment 2015-06-14 00:51:56 +01:00
Azusa Yamaguchi 171c95d6d8 more accurate comment 2015-06-14 00:51:56 +01:00
Azusa Yamaguchi 58019383f3 Typing 2015-06-14 00:51:37 +01:00
Azusa Yamaguchi ff3db9ceee Typing 2015-06-14 00:51:37 +01:00
Azusa Yamaguchi 5484607ef2 Typing 2015-06-14 00:51:37 +01:00
Azusa Yamaguchi 21f214d6c9 Apply a heatbath sweep 2015-06-14 00:50:59 +01:00
Azusa Yamaguchi f3aebd4b33 Apply a heatbath sweep 2015-06-14 00:50:59 +01:00
Azusa Yamaguchi 31ab4c4c35 Apply a heatbath sweep 2015-06-14 00:50:59 +01:00
Azusa Yamaguchi e57bc34afe Minor change 2015-06-14 00:50:26 +01:00
Azusa Yamaguchi 0c8c44b3e3 Minor change 2015-06-14 00:50:26 +01:00
Azusa Yamaguchi a0dcbc0d16 Minor change 2015-06-14 00:50:26 +01:00
Azusa Yamaguchi 744879a3f5 be more precise on typing 2015-06-14 00:49:57 +01:00
Azusa Yamaguchi e4e91d3042 be more precise on typing 2015-06-14 00:49:57 +01:00
Azusa Yamaguchi fa6117f136 be more precise on typing 2015-06-14 00:49:57 +01:00
Azusa Yamaguchi 45c21a9c8b TensorRemove not needed now 2015-06-14 00:49:26 +01:00
Azusa Yamaguchi 9184cc68da TensorRemove not needed now 2015-06-14 00:49:26 +01:00
Azusa Yamaguchi 2c2112e152 TensorRemove not needed now 2015-06-14 00:49:26 +01:00
Azusa Yamaguchi 19ff065844 fix no compile 2015-06-14 00:48:41 +01:00
Azusa Yamaguchi 46eafa520d fix no compile 2015-06-14 00:48:41 +01:00
Azusa Yamaguchi dcfc189f11 fix no compile 2015-06-14 00:48:41 +01:00
Peter Boyle f0d190024a Updates to ldop tests 2015-06-10 12:26:25 +01:00
Peter Boyle 1cc25837b2 Updates to ldop tests 2015-06-10 12:26:25 +01:00
Peter Boyle 5cce44edb4 Updates to ldop tests 2015-06-10 12:26:25 +01:00
Peter Boyle 0ad72b38ab file list 2015-06-10 12:25:15 +01:00
Peter Boyle 50c4f416b6 file list 2015-06-10 12:25:15 +01:00
Peter Boyle 622261b1ea file list 2015-06-10 12:25:15 +01:00
Azusa Yamaguchi 12d7902ca0 Fix compile 2015-06-10 11:30:27 +01:00
Azusa Yamaguchi e5d30fe5e2 Fix compile 2015-06-10 11:30:27 +01:00
Azusa Yamaguchi f6667801e1 Fix compile 2015-06-10 11:30:27 +01:00
Azusa Yamaguchi e3b4dd32c0 commit file list 2015-06-10 11:26:46 +01:00
Azusa Yamaguchi 89f961faef commit file list 2015-06-10 11:26:46 +01:00
Azusa Yamaguchi 22752f6ff0 commit file list 2015-06-10 11:26:46 +01:00
Azusa Yamaguchi 562cdc805b Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-10 11:25:57 +01:00
Azusa Yamaguchi ca109497a1 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-10 11:25:57 +01:00
Azusa Yamaguchi 0457923179 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-10 11:25:57 +01:00
Azusa Yamaguchi 653a94e8ab Successful generation of general SU N generators
This class will move into a utils class and gain exponentiation and ta projection
to support HMC and heatbath
2015-06-10 11:24:14 +01:00
Azusa Yamaguchi 2f4a0c5de4 Successful generation of general SU N generators
This class will move into a utils class and gain exponentiation and ta projection
to support HMC and heatbath
2015-06-10 11:24:14 +01:00
Azusa Yamaguchi 8cf34ebbff Successful generation of general SU N generators
This class will move into a utils class and gain exponentiation and ta projection
to support HMC and heatbath
2015-06-10 11:24:14 +01:00
neo f3dd829459 Adding several iMatrix utilities 2015-06-10 14:16:33 +09:00
neo 965a92ce40 Adding several iMatrix utilities 2015-06-10 14:16:33 +09:00
neo bb4a916767 Adding several iMatrix utilities 2015-06-10 14:16:33 +09:00
Peter Boyle e79a57b423 Merge branch 'master' of https://github.com/paboyle/Grid
Not sure what changed in master
2015-06-09 22:51:10 +01:00
Peter Boyle 1118c42fcc Merge branch 'master' of https://github.com/paboyle/Grid
Not sure what changed in master
2015-06-09 22:51:10 +01:00
Peter Boyle 02144fb50b Merge branch 'master' of https://github.com/paboyle/Grid
Not sure what changed in master
2015-06-09 22:51:10 +01:00
Peter Boyle 5a53086e41 Solver converges 2015-06-09 22:51:02 +01:00
Peter Boyle 62cb914488 Solver converges 2015-06-09 22:51:02 +01:00
Peter Boyle 7d6be625fa Solver converges 2015-06-09 22:51:02 +01:00
Peter Boyle c4f204f440 Solver converges 2015-06-09 22:50:45 +01:00
Peter Boyle d50cf43e5e Solver converges 2015-06-09 22:50:45 +01:00
Peter Boyle 3f8c7be6f8 Solver converges 2015-06-09 22:50:45 +01:00
Peter Boyle 2022a7205c Remove extra layers of checks now it works 2015-06-09 22:43:41 +01:00
Peter Boyle 60a96e3d0d Remove extra layers of checks now it works 2015-06-09 22:43:41 +01:00
Peter Boyle 0784bbc4bf Remove extra layers of checks now it works 2015-06-09 22:43:41 +01:00
Peter Boyle 4357f5acdc Some extra tests 2015-06-09 22:43:10 +01:00
Peter Boyle 850054d162 Some extra tests 2015-06-09 22:43:10 +01:00
Peter Boyle 516f1a6316 Some extra tests 2015-06-09 22:43:10 +01:00
Peter Boyle 4963f7356a 5d OpDir direction interface refers to the 5d dims, not 4d to present a
sensible and consistent external interface.
2015-06-09 22:41:59 +01:00
Peter Boyle 6abbd35d81 5d OpDir direction interface refers to the 5d dims, not 4d to present a
sensible and consistent external interface.
2015-06-09 22:41:59 +01:00
Peter Boyle b92060f511 5d OpDir direction interface refers to the 5d dims, not 4d to present a
sensible and consistent external interface.
2015-06-09 22:41:59 +01:00
Peter Boyle 3a52bc4ce1 G5R5 update 2015-06-09 22:41:27 +01:00
Peter Boyle 971d224983 G5R5 update 2015-06-09 22:41:27 +01:00
Peter Boyle 3546dcac1a G5R5 update 2015-06-09 22:41:27 +01:00
Peter Boyle 708d4f7533 g5 and g5R5 hermitian are now differentiated 2015-06-09 22:40:58 +01:00
Peter Boyle c7152c520a g5 and g5R5 hermitian are now differentiated 2015-06-09 22:40:58 +01:00
Peter Boyle c133974d67 g5 and g5R5 hermitian are now differentiated 2015-06-09 22:40:58 +01:00
Peter Boyle 61770d4472 Files update 2015-06-09 22:40:12 +01:00
Peter Boyle 930463a226 Files update 2015-06-09 22:40:12 +01:00
Peter Boyle c86c15f95f Files update 2015-06-09 22:40:12 +01:00
Peter Boyle a57ca0bbe7 Prettier reporting 2015-06-09 22:39:37 +01:00
Peter Boyle 2e43d8fe92 Prettier reporting 2015-06-09 22:39:37 +01:00
Peter Boyle accd21dec7 Prettier reporting 2015-06-09 22:39:37 +01:00
Peter Boyle 1f2498918d Got this sorted with the promote working in a test 2015-06-09 22:39:13 +01:00
Peter Boyle 4a10540365 Got this sorted with the promote working in a test 2015-06-09 22:39:13 +01:00
Peter Boyle 7766cc96c1 Got this sorted with the promote working in a test 2015-06-09 22:39:13 +01:00
Peter Boyle 25bfff370a Starting to use 2015-06-09 22:38:13 +01:00
Peter Boyle e19391ef7a Starting to use 2015-06-09 22:38:13 +01:00
Peter Boyle 6fb36c8a51 Starting to use 2015-06-09 22:38:13 +01:00
Peter Boyle 6403e3021f Debugged finally. A silly mistake in permute cost me a day of debug. 2015-06-09 22:37:21 +01:00
Peter Boyle a5b75a095c Debugged finally. A silly mistake in permute cost me a day of debug. 2015-06-09 22:37:21 +01:00
Peter Boyle 2e6986892a Debugged finally. A silly mistake in permute cost me a day of debug. 2015-06-09 22:37:21 +01:00
Peter Boyle 4d11fc0330 silly change 2015-06-09 22:36:48 +01:00
Peter Boyle f0530e9f52 silly change 2015-06-09 22:36:48 +01:00
Peter Boyle eeeaac7147 silly change 2015-06-09 22:36:48 +01:00
neo f8eb862073 Merge remote-tracking branch 'upstream/master' 2015-06-09 19:01:07 +09:00
neo 61b85a0670 Merge remote-tracking branch 'upstream/master' 2015-06-09 19:01:07 +09:00
neo ecf3bae150 Merge remote-tracking branch 'upstream/master' 2015-06-09 19:01:07 +09:00
neo 99ace0a89c Adding support for iMatrix exponentiation 2015-06-09 18:59:45 +09:00
neo d60cfe31a7 Adding support for iMatrix exponentiation 2015-06-09 18:59:45 +09:00
neo e80012896a Adding support for iMatrix exponentiation 2015-06-09 18:59:45 +09:00
Peter Boyle 64e2f36d40 Merge pull request #14 from mspraggs/master
Removed std::string calls from NerscIO map indexing
2015-06-09 10:31:52 +01:00
Peter Boyle e87554296c Merge pull request #14 from mspraggs/master
Removed std::string calls from NerscIO map indexing
2015-06-09 10:31:52 +01:00
Peter Boyle 645120a400 Merge pull request #14 from mspraggs/master
Removed std::string calls from NerscIO map indexing
2015-06-09 10:31:52 +01:00
Peter Boyle e8b43944e7 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
2015-06-09 10:27:10 +01:00
Peter Boyle d8ddec86f7 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
2015-06-09 10:27:10 +01:00
Peter Boyle a73a1c1bc1 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
2015-06-09 10:27:10 +01:00
Peter Boyle 1048304f30 Some unary ops and coarse grid support 2015-06-09 10:26:19 +01:00
Peter Boyle 506dfd1517 Some unary ops and coarse grid support 2015-06-09 10:26:19 +01:00
Peter Boyle 1e5b015ee3 Some unary ops and coarse grid support 2015-06-09 10:26:19 +01:00
Peter Boyle f19518d564 Unary ops and coarse grid support 2015-06-09 10:25:29 +01:00
Peter Boyle 9269126fba Unary ops and coarse grid support 2015-06-09 10:25:29 +01:00
Peter Boyle 21e41638e5 Unary ops and coarse grid support 2015-06-09 10:25:29 +01:00
neo 744ac33e8b Experimental support for ARM 2015-06-09 15:46:21 +09:00
neo 6b8fe04054 Experimental support for ARM 2015-06-09 15:46:21 +09:00
neo 48bf4878c1 Experimental support for ARM 2015-06-09 15:46:21 +09:00
Peter Boyle a6ac2abb64 No compile fix after merge 2015-06-08 12:12:13 +01:00
Peter Boyle 78607950ac No compile fix after merge 2015-06-08 12:12:13 +01:00
Peter Boyle 4ae47a529e No compile fix after merge 2015-06-08 12:12:13 +01:00
Peter Boyle 98b10f587e Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	tests/Make.inc
	tests/Test_remez.cc
2015-06-08 12:08:09 +01:00
Peter Boyle 2f4a4489ce Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	tests/Make.inc
	tests/Test_remez.cc
2015-06-08 12:08:09 +01:00
Peter Boyle 3111f50f2f Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Make.inc
	tests/Make.inc
	tests/Test_remez.cc
2015-06-08 12:08:09 +01:00
Peter Boyle b0873e7ed2 Conjugate residual algorithm; some more unary functions 2015-06-08 12:04:59 +01:00
Peter Boyle 9e7035f5dc Conjugate residual algorithm; some more unary functions 2015-06-08 12:04:59 +01:00
Peter Boyle d6f1ddf99c Conjugate residual algorithm; some more unary functions 2015-06-08 12:04:59 +01:00
Peter Boyle 769ef7b0f5 sqrt 2015-06-08 12:03:36 +01:00
Peter Boyle 690397e7c6 sqrt 2015-06-08 12:03:36 +01:00
Peter Boyle 0cf2037ae1 sqrt 2015-06-08 12:03:36 +01:00
Peter Boyle 42c22d4cae Prep for multigrid 2015-06-08 12:02:53 +01:00
Peter Boyle 5bdf89e3f0 Prep for multigrid 2015-06-08 12:02:53 +01:00
Peter Boyle 5a3bc5250e Prep for multigrid 2015-06-08 12:02:53 +01:00
Peter Boyle aad51ffe3a Prep for mgrid 2015-06-08 12:02:26 +01:00
Peter Boyle ea583e2e53 Prep for mgrid 2015-06-08 12:02:26 +01:00
Peter Boyle 680abafe5d Prep for mgrid 2015-06-08 12:02:26 +01:00
Azusa Yamaguchi a8b9109cc8 multishift conjugate gradient added and a strong test: take a diagonal
but non-identity matrix
l1 0  0  0 ....
0  l2 0  0 ....
0  0  l3 0 ...
.  .   .
.  .   .
.  .   .

And apply the multishift CG to it. Sum the poles and residues.
Insist that this be the same as the exactly taken square root
where l1,l2,l3 >= 0.
2015-06-08 11:52:44 +01:00
Azusa Yamaguchi 54aec05989 multishift conjugate gradient added and a strong test: take a diagonal
but non-identity matrix
l1 0  0  0 ....
0  l2 0  0 ....
0  0  l3 0 ...
.  .   .
.  .   .
.  .   .

And apply the multishift CG to it. Sum the poles and residues.
Insist that this be the same as the exactly taken square root
where l1,l2,l3 >= 0.
2015-06-08 11:52:44 +01:00
Azusa Yamaguchi 8688ff8b3a multishift conjugate gradient added and a strong test: take a diagonal
but non-identity matrix
l1 0  0  0 ....
0  l2 0  0 ....
0  0  l3 0 ...
.  .   .
.  .   .
.  .   .

And apply the multishift CG to it. Sum the poles and residues.
Insist that this be the same as the exactly taken square root
where l1,l2,l3 >= 0.
2015-06-08 11:52:44 +01:00
Matt Spraggs e2e076d307 Removed std::string calls from NerscIO map indexing 2015-06-07 17:06:25 +01:00
Matt Spraggs 7537a2751d Removed std::string calls from NerscIO map indexing 2015-06-07 17:06:25 +01:00
Matt Spraggs cff84f09ba Removed std::string calls from NerscIO map indexing 2015-06-07 17:06:25 +01:00
Peter Boyle a263e78f8d Conjugate residual added 2015-06-05 18:16:25 +01:00
Peter Boyle 50e8b2160e Conjugate residual added 2015-06-05 18:16:25 +01:00
Peter Boyle 1a05882d7c Conjugate residual added 2015-06-05 18:16:25 +01:00
Azusa Yamaguchi 5f33cc3a95 Compile fix 2015-06-05 10:29:42 +01:00
Azusa Yamaguchi ad18df92d0 Compile fix 2015-06-05 10:29:42 +01:00
Azusa Yamaguchi 351c2905f5 Compile fix 2015-06-05 10:29:42 +01:00
Azusa Yamaguchi bb36139fc2 Fix 2015-06-05 10:21:28 +01:00
Azusa Yamaguchi d86c248d05 Fix 2015-06-05 10:21:28 +01:00
Azusa Yamaguchi 33803a3dee Fix 2015-06-05 10:21:28 +01:00
Azusa Yamaguchi cc5f518b21 Endif terminated 2015-06-05 10:19:42 +01:00
Azusa Yamaguchi 1d7f9567ee Endif terminated 2015-06-05 10:19:42 +01:00
Azusa Yamaguchi ee3031c914 Endif terminated 2015-06-05 10:19:42 +01:00
Peter Boyle 337b6e83af Rework the linop support to get different forms of red black schur solver
Moo on diag, or MooInv Moe MeeInv Meo
2015-06-05 10:17:10 +01:00
Peter Boyle f3e60a9feb Rework the linop support to get different forms of red black schur solver
Moo on diag, or MooInv Moe MeeInv Meo
2015-06-05 10:17:10 +01:00
Peter Boyle 7f6304fac3 Rework the linop support to get different forms of red black schur solver
Moo on diag, or MooInv Moe MeeInv Meo
2015-06-05 10:17:10 +01:00
Azusa Yamaguchi 8f9627520b merge to the head 2015-06-05 10:15:31 +01:00
Azusa Yamaguchi a8b86e747b merge to the head 2015-06-05 10:15:31 +01:00
Azusa Yamaguchi 58a4f32298 merge to the head 2015-06-05 10:15:31 +01:00
Azusa Yamaguchi db84b19443 Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-05 10:04:46 +01:00
Azusa Yamaguchi c05fe2706c Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-05 10:04:46 +01:00
Azusa Yamaguchi ac504bea6c Merge branch 'master' of https://github.com/paboyle/Grid 2015-06-05 10:04:46 +01:00
Azusa Yamaguchi 7d984b9547 Adding some wilson loop support 2015-06-05 10:02:36 +01:00
Azusa Yamaguchi 58cdcbb5e4 Adding some wilson loop support 2015-06-05 10:02:36 +01:00
Azusa Yamaguchi 94ea84d83f Adding some wilson loop support 2015-06-05 10:02:36 +01:00
Peter Boyle f22170ad49 comment improvement 2015-06-05 05:31:27 +01:00
Peter Boyle cadd4310f6 comment improvement 2015-06-05 05:31:27 +01:00
Peter Boyle b1b412f63c comment improvement 2015-06-05 05:31:27 +01:00
Peter Boyle 7678fbd30d PartialFraction Hw with Zolo and Tanh approx converged under CG and passed EO breakdown
and hermiticity tests.
2015-06-04 13:28:37 +01:00
Peter Boyle b9e9777912 PartialFraction Hw with Zolo and Tanh approx converged under CG and passed EO breakdown
and hermiticity tests.
2015-06-04 13:28:37 +01:00
Peter Boyle 63a61fcc2a PartialFraction Hw with Zolo and Tanh approx converged under CG and passed EO breakdown
and hermiticity tests.
2015-06-04 13:28:37 +01:00
Peter Boyle 1e4eca8321 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/tensors/Tensor_trace.h
2015-06-04 12:17:00 +01:00
Peter Boyle 5b1ba66604 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/tensors/Tensor_trace.h
2015-06-04 12:17:00 +01:00
Peter Boyle 21d193b1c8 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/tensors/Tensor_trace.h
2015-06-04 12:17:00 +01:00
Peter Boyle 5aa8bf77db Mistaken commit that prevented compile ; fixing 2015-06-04 12:01:51 +01:00
Peter Boyle 51ec6b722e Mistaken commit that prevented compile ; fixing 2015-06-04 12:01:51 +01:00
Peter Boyle f6dc74501b Mistaken commit that prevented compile ; fixing 2015-06-04 12:01:51 +01:00
neo 7e47b0c6eb Corrected small compilation bug in traceIndex for iVectors 2015-06-04 19:01:43 +09:00
neo 769b27b7f2 Corrected small compilation bug in traceIndex for iVectors 2015-06-04 19:01:43 +09:00
neo b306730a57 Corrected small compilation bug in traceIndex for iVectors 2015-06-04 19:01:43 +09:00
neo 41e88c232b Merge remote-tracking branch 'upstream/master' 2015-06-04 18:30:29 +09:00
neo 4dcc5dab93 Merge remote-tracking branch 'upstream/master' 2015-06-04 18:30:29 +09:00
neo b3f871717f Merge remote-tracking branch 'upstream/master' 2015-06-04 18:30:29 +09:00
neo 949b6a7afa Added support for Ta to Lattice types 2015-06-04 18:29:55 +09:00
neo 7fc54fc904 Added support for Ta to Lattice types 2015-06-04 18:29:55 +09:00
neo 4b114fce3d Added support for Ta to Lattice types 2015-06-04 18:29:55 +09:00
neo bb73569fd6 Added Ta functionality to the tensor types
Merge remote-tracking branch 'upstream/master'

Conflicts:
	configure
2015-06-04 18:11:32 +09:00
neo b9edadc53e Added Ta functionality to the tensor types
Merge remote-tracking branch 'upstream/master'

Conflicts:
	configure
2015-06-04 18:11:32 +09:00
neo 3055d2cf2c Added Ta functionality to the tensor types
Merge remote-tracking branch 'upstream/master'

Conflicts:
	configure
2015-06-04 18:11:32 +09:00
Peter Boyle 54c082dc35 Allow traceIndex on a different index to distribute replicated across a vector index 2015-06-04 09:41:16 +01:00
Peter Boyle 201d6d097d Allow traceIndex on a different index to distribute replicated across a vector index 2015-06-04 09:41:16 +01:00
Peter Boyle 4a03054ef4 Allow traceIndex on a different index to distribute replicated across a vector index 2015-06-04 09:41:16 +01:00
neo c6f2ee91f6 Small modification to the configure files 2015-06-04 14:17:58 +09:00
neo ff9340d4d5 Small modification to the configure files 2015-06-04 14:17:58 +09:00
neo 5a5ee83d28 Small modification to the configure files 2015-06-04 14:17:58 +09:00
Peter Boyle 9c1ab656d4 CG Tests work for wilson kernel cont frac zolo and tanh 2015-06-04 06:02:00 +01:00
Peter Boyle 37aa74dfd2 CG Tests work for wilson kernel cont frac zolo and tanh 2015-06-04 06:02:00 +01:00
Peter Boyle dd1f5dd966 CG Tests work for wilson kernel cont frac zolo and tanh 2015-06-04 06:02:00 +01:00
Peter Boyle 1ad689e4d5 Implementing the Hw kernel continued fraction 5d overlap cases 2015-06-04 00:23:16 +01:00
Peter Boyle c327019574 Implementing the Hw kernel continued fraction 5d overlap cases 2015-06-04 00:23:16 +01:00
Peter Boyle a088a65656 Implementing the Hw kernel continued fraction 5d overlap cases 2015-06-04 00:23:16 +01:00
Peter Boyle 802e94e9ca First pass at continued fraction; solver and even odd decomposition tests pass.
Have to make ContFrac class virtual and derive end non-abstract actions for the particular
cases.
2015-06-04 00:00:45 +01:00
Peter Boyle 50bd293527 First pass at continued fraction; solver and even odd decomposition tests pass.
Have to make ContFrac class virtual and derive end non-abstract actions for the particular
cases.
2015-06-04 00:00:45 +01:00
Peter Boyle 03f4fde468 First pass at continued fraction; solver and even odd decomposition tests pass.
Have to make ContFrac class virtual and derive end non-abstract actions for the particular
cases.
2015-06-04 00:00:45 +01:00
Peter Boyle e68d087010 Assist for generating file lists contained in Make.inc files for convenience when things are added 2015-06-03 13:07:00 +01:00
Peter Boyle eaa3e6aaf6 Assist for generating file lists contained in Make.inc files for convenience when things are added 2015-06-03 13:07:00 +01:00
Peter Boyle f07a17ba2c Assist for generating file lists contained in Make.inc files for convenience when things are added 2015-06-03 13:07:00 +01:00
Peter Boyle 3254bb2c8e Make.inc needed in repo 2015-06-03 12:49:36 +01:00
Peter Boyle 4ef11d96e9 Make.inc needed in repo 2015-06-03 12:49:36 +01:00
Peter Boyle 3cfea5a09f Make.inc needed in repo 2015-06-03 12:49:36 +01:00
Peter Boyle 54b56959f5 Convenience script to build the list of headers and .cc files in the library 2015-06-03 12:47:46 +01:00
Peter Boyle 98dcb6831b Convenience script to build the list of headers and .cc files in the library 2015-06-03 12:47:46 +01:00
Peter Boyle a97954230d Convenience script to build the list of headers and .cc files in the library 2015-06-03 12:47:46 +01:00
Peter Boyle f9b070d64d Reorganise of file naming 2015-06-03 12:47:05 +01:00
Peter Boyle 4bcc319e11 Reorganise of file naming 2015-06-03 12:47:05 +01:00
Peter Boyle 1d0df449e8 Reorganise of file naming 2015-06-03 12:47:05 +01:00
Peter Boyle 6cb38dc5dc Overlap Wilson Cayley tanh & zolo 2015-06-03 11:26:54 +01:00
Peter Boyle 8fe3d4f971 Overlap Wilson Cayley tanh & zolo 2015-06-03 11:26:54 +01:00
Peter Boyle a3b599ae30 Overlap Wilson Cayley tanh & zolo 2015-06-03 11:26:54 +01:00
Peter Boyle 2b083ca987 CG test written and passes i.e. converges with small true residual
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of:

DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00
Peter Boyle 26e9b04fab CG test written and passes i.e. converges with small true residual
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of:

DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00
Peter Boyle 84b5c7217d CG test written and passes i.e. converges with small true residual
in RedBlack MpcDagMpc, Unprec MdagM and Schur red black solver for
each of:

DomainWallFermion
MobiusFermion
MobiusZolotarevFermion
ScaledShamirFermion
ScaledShamirZolotarevFermion
2015-06-03 10:54:03 +01:00
Peter Boyle c659c76053 Scaled Shamir and Scaled Shamir Zolotarev aliases for special cases of Mobius. 2015-06-03 09:51:06 +01:00
Peter Boyle 343d039b37 Scaled Shamir and Scaled Shamir Zolotarev aliases for special cases of Mobius. 2015-06-03 09:51:06 +01:00
Peter Boyle 260011670e Scaled Shamir and Scaled Shamir Zolotarev aliases for special cases of Mobius. 2015-06-03 09:51:06 +01:00
Peter Boyle 68e26140ee Mobius Cayley form, Mobius Zolotarev operators. Pass Even Odd vs unprec test and hermiticity checks
in tests/Grid_any_evenodd.cc; will work on inversion tests shortly.
2015-06-03 09:36:26 +01:00
Peter Boyle 5916386242 Mobius Cayley form, Mobius Zolotarev operators. Pass Even Odd vs unprec test and hermiticity checks
in tests/Grid_any_evenodd.cc; will work on inversion tests shortly.
2015-06-03 09:36:26 +01:00
Peter Boyle 1fcacef239 Mobius Cayley form, Mobius Zolotarev operators. Pass Even Odd vs unprec test and hermiticity checks
in tests/Grid_any_evenodd.cc; will work on inversion tests shortly.
2015-06-03 09:36:26 +01:00
Peter Boyle 494d2b8b61 Reorg; moving prec/unprec/schur CG for Wilson and DWF into tests as these are really tests and not benchmarks
(no performance reports, only convergence test).
2015-06-02 17:25:26 +01:00
Peter Boyle 35fdba81dd Reorg; moving prec/unprec/schur CG for Wilson and DWF into tests as these are really tests and not benchmarks
(no performance reports, only convergence test).
2015-06-02 17:25:26 +01:00
Peter Boyle 69f4d58381 Reorg; moving prec/unprec/schur CG for Wilson and DWF into tests as these are really tests and not benchmarks
(no performance reports, only convergence test).
2015-06-02 17:25:26 +01:00
Peter Boyle 0bc004de7c Domain wall fermions now invert ; have the basis set up for
Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.

All are done with space-time taking part in checkerboarding, Ls uncheckerboarded

Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is hermitian
vi) Mdag is the adjoint of M between stochastic vectors.

That said, the RB schur solve, RB MpcDagMpc solve, Unprec solve
all converge and the true residual becomes small; so pretty good tests.
2015-06-02 16:57:12 +01:00
Peter Boyle 2583570e17 Domain wall fermions now invert ; have the basis set up for
Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.

All are done with space-time taking part in checkerboarding, Ls uncheckerboarded

Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is hermitian
vi) Mdag is the adjoint of M between stochastic vectors.

That said, the RB schur solve, RB MpcDagMpc solve, Unprec solve
all converge and the true residual becomes small; so pretty good tests.
2015-06-02 16:57:12 +01:00
Peter Boyle 3845f267cb Domain wall fermions now invert ; have the basis set up for
Tanh/Zolo * (Cayley/PartFrac/ContFrac) * (Mobius/Shamir/Wilson)
Approx        Representation               Kernel.

All are done with space-time taking part in checkerboarding, Ls uncheckerboarded

Have only so far tested the Domain Wall limit of mobius, and at that only checked
that it
i)  Inverts
ii) 5dim DW == Ls copies of 4dim D2
iii) MeeInv Mee == 1
iv) Meo+Mee+Moe+Moo == M unprec.
v) MpcDagMpc is hermitian
vi) Mdag is the adjoint of M between stochastic vectors.

That said, the RB schur solve, RB MpcDagMpc solve, Unprec solve
all converge and the true residual becomes small; so pretty good tests.
2015-06-02 16:57:12 +01:00
Azusa Yamaguchi 8f87950dc1 Fix mistake 2015-06-01 12:26:20 +01:00
Azusa Yamaguchi 8bd9fb4427 Fix mistake 2015-06-01 12:26:20 +01:00
Azusa Yamaguchi c851d0e705 Fix mistake 2015-06-01 12:26:20 +01:00
Azusa Yamaguchi 00bf7f4d42 Const safety 2015-06-01 12:25:59 +01:00
Azusa Yamaguchi 4c617c3643 Const safety 2015-06-01 12:25:59 +01:00
Azusa Yamaguchi b00a40dd65 Const safety 2015-06-01 12:25:59 +01:00
Azusa Yamaguchi eb28a64c3c No compile fix on mpi target 2015-05-31 22:50:03 +01:00
Azusa Yamaguchi 9ea64767b0 No compile fix on mpi target 2015-05-31 22:50:03 +01:00
Azusa Yamaguchi 12c2562b96 No compile fix on mpi target 2015-05-31 22:50:03 +01:00
azusayamaguchi 328aa9ae49 Bug in Makefile.am fixed 2015-05-31 18:50:08 +01:00
azusayamaguchi ce8c7a77b6 Bug in Makefile.am fixed 2015-05-31 18:50:08 +01:00
azusayamaguchi f2c70804ca Bug in Makefile.am fixed 2015-05-31 18:50:08 +01:00
Peter Boyle 6f725748ed Updated line counter 2015-05-31 15:11:09 +01:00
Peter Boyle 8272e15bd6 Updated line counter 2015-05-31 15:11:09 +01:00
Peter Boyle 46d1bae46a Updated line counter 2015-05-31 15:11:09 +01:00
Peter Boyle 66d997e031 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle a75b6f6e78 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle 5644ab1e19 Large scale change to support 5d fermion formulations.
Have 5d replicated wilson with 4d gauge working and matrix regressing
to Ls copies of wilson.
2015-05-31 15:09:02 +01:00
Peter Boyle 8c357dca8b Integer wrap problem fixed. 2015-05-29 14:11:34 +01:00
Peter Boyle 9cfc180334 Integer wrap problem fixed. 2015-05-29 14:11:34 +01:00
Peter Boyle 59db857ad1 Integer wrap problem fixed. 2015-05-29 14:11:34 +01:00
neo 661c7e3e37 Merge remote-tracking branch 'upstream/master' 2015-05-29 11:41:39 +09:00
neo 575e6001f3 Merge remote-tracking branch 'upstream/master' 2015-05-29 11:41:39 +09:00
neo 727bc32150 Merge remote-tracking branch 'upstream/master' 2015-05-29 11:41:39 +09:00
neo 96ad352741 Some modifications to the configure to check SIMD support 2015-05-29 11:41:02 +09:00
neo 4403e117a7 Some modifications to the configure to check SIMD support 2015-05-29 11:41:02 +09:00
neo f41e4e8b1b Some modifications to the configure to check SIMD support 2015-05-29 11:41:02 +09:00
Peter Boyle 62dccb3247 Weak scale the benchmarks automatically. 2015-05-28 13:47:01 +01:00
Peter Boyle 445e38acf6 Weak scale the benchmarks automatically. 2015-05-28 13:47:01 +01:00
Peter Boyle 67fa5691e5 Weak scale the benchmarks automatically. 2015-05-28 13:47:01 +01:00
Peter Boyle c0c1ebe757 Works now with Clang-avx, Clang-sse and ICPC-avx, ICPC-sse 2015-05-28 11:35:43 +01:00
Peter Boyle a5c3424cfb Works now with Clang-avx, Clang-sse and ICPC-avx, ICPC-sse 2015-05-28 11:35:43 +01:00
Peter Boyle 62a7ca462f Works now with Clang-avx, Clang-sse and ICPC-avx, ICPC-sse 2015-05-28 11:35:43 +01:00
Peter Boyle bd81ac0f17 Improving the reduction to go through our own permute.
Must also do this for avx512
2015-05-27 16:07:17 +01:00
Peter Boyle e8be96bfe7 Improving the reduction to go through our own permute.
Must also do this for avx512
2015-05-27 16:07:17 +01:00
Peter Boyle b72ca15bd2 Improving the reduction to go through our own permute.
Must also do this for avx512
2015-05-27 16:07:17 +01:00
neo d8b05e001d Check at configure time if CPU supports the requested SIMD optimization 2015-05-27 18:30:11 +09:00
neo be66fdcfab Check at configure time if CPU supports the requested SIMD optimization 2015-05-27 18:30:11 +09:00
neo 19bd6f103a Check at configure time if CPU supports the requested SIMD optimization 2015-05-27 18:30:11 +09:00
neo 9fd6506d1f Included Gpermute in the new Grid_simd.h file style.
Now tested for SSE4. OK
2015-05-27 12:11:44 +09:00
neo 28ac219d81 Included Gpermute in the new Grid_simd.h file style.
Now tested for SSE4. OK
2015-05-27 12:11:44 +09:00
neo 64753ea633 Included Gpermute in the new Grid_simd.h file style.
Now tested for SSE4. OK
2015-05-27 12:11:44 +09:00
neo 75442e48ce Added a .gitignore file to eliminate autoconf files from commits 2015-05-27 11:10:51 +09:00
neo 4e3f4104ab Added a .gitignore file to eliminate autoconf files from commits 2015-05-27 11:10:51 +09:00
neo 3cb34af82c Added a .gitignore file to eliminate autoconf files from commits 2015-05-27 11:10:51 +09:00
Guido Cossu 26ec41288d Corrected AVX regression error. Tested. 2015-05-27 10:49:33 +09:00
Guido Cossu 2ccbff6c6c Corrected AVX regression error. Tested. 2015-05-27 10:49:33 +09:00
Guido Cossu 8abf6403d5 Corrected AVX regression error. Tested. 2015-05-27 10:49:33 +09:00
neo 13707e0808 Merge remote-tracking branch 'upstream/master'
Conflicts:
	Makefile.in
2015-05-27 10:41:33 +09:00
neo 9344d41ac5 Merge remote-tracking branch 'upstream/master'
Conflicts:
	Makefile.in
2015-05-27 10:41:33 +09:00
neo b99f2279c3 Merge remote-tracking branch 'upstream/master'
Conflicts:
	Makefile.in
2015-05-27 10:41:33 +09:00
neo 12ae11ef62 Adding support for doxygen generation 2015-05-27 10:34:56 +09:00
neo 538bc41bbb Adding support for doxygen generation 2015-05-27 10:34:56 +09:00
neo da46b56e85 Adding support for doxygen generation 2015-05-27 10:34:56 +09:00
Peter Boyle e468d75286 Auto gen files should never have been committed, but making everyone run
aclocal, automake, autoconf is a pain in the ass.
2015-05-26 22:20:40 +01:00
Peter Boyle b6a28f1de7 Auto gen files should never have been committed, but making everyone run
aclocal, automake, autoconf is a pain in the ass.
2015-05-26 22:20:40 +01:00
Peter Boyle 74f138c5e8 Auto gen files should never have been committed, but making everyone run
aclocal, automake, autoconf is a pain in the ass.
2015-05-26 22:20:40 +01:00
Peter Boyle 22d073eb2b Simd revert to Guido's commit. I edited concurrently and things went bad. 2015-05-26 22:20:09 +01:00
Peter Boyle 6d2e056187 Simd revert to Guido's commit. I edited concurrently and things went bad. 2015-05-26 22:20:09 +01:00
Peter Boyle ccd47011b9 Simd revert to Guido's commit. I edited concurrently and things went bad. 2015-05-26 22:20:09 +01:00
Peter Boyle ccf10a973a Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Grid_simd.h
2015-05-26 20:04:08 +01:00
Peter Boyle fb37b57c2d Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Grid_simd.h
2015-05-26 20:04:08 +01:00
Peter Boyle 48bb3ab4e7 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/Grid_simd.h
2015-05-26 20:04:08 +01:00
Peter Boyle 6ef0096dc9 Strip out the dslash kernel implementation 2015-05-26 19:55:18 +01:00
Peter Boyle 5e72e4c0d9 Strip out the dslash kernel implementation 2015-05-26 19:55:18 +01:00
Peter Boyle bfb1cd36e2 Strip out the dslash kernel implementation 2015-05-26 19:55:18 +01:00
Peter Boyle 20100d0a40 Hand unrolled version of dslash in a separate class.
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
                   on ivybridge core. Raises Clang from 14.5 to 17.5
2015-05-26 19:54:03 +01:00
Peter Boyle a32ac287bb Hand unrolled version of dslash in a separate class.
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
                   on ivybridge core. Raises Clang from 14.5 to 17.5
2015-05-26 19:54:03 +01:00
Peter Boyle 840754dd42 Hand unrolled version of dslash in a separate class.
Useful to compare; raises Intel compiler from 9GFlop/s to 17.5 Gflops.
                   on ivybridge core. Raises Clang from 14.5 to 17.5
2015-05-26 19:54:03 +01:00
neo c04cad92ac More cleanup of Grid_simd.h 2015-05-26 13:54:34 +09:00
neo fb5d72973e More cleanup of Grid_simd.h 2015-05-26 13:54:34 +09:00
neo 500f6ed0c5 More cleanup of Grid_simd.h 2015-05-26 13:54:34 +09:00
neo aff978f60a Cleaning up simd files 2015-05-26 13:31:10 +09:00
neo 3f576830f9 Cleaning up simd files 2015-05-26 13:31:10 +09:00
neo 4dbaa389c8 Cleaning up simd files 2015-05-26 13:31:10 +09:00
neo 9ad6d0c65f Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/math/Grid_math_tensors.h
	lib/simd/Grid_vector_types.h
2015-05-26 13:14:06 +09:00
neo 257aa92421 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/math/Grid_math_tensors.h
	lib/simd/Grid_vector_types.h
2015-05-26 13:14:06 +09:00
neo 48cc816136 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/math/Grid_math_tensors.h
	lib/simd/Grid_vector_types.h
2015-05-26 13:14:06 +09:00
neo 377083e6ae checked performance of new vector libraries.
Added check for c++11 support on the configure.ac
2015-05-26 12:02:54 +09:00
neo ece86f717b checked performance of new vector libraries.
Added check for c++11 support on the configure.ac
2015-05-26 12:02:54 +09:00
neo 1a24801246 checked performance of new vector libraries.
Added check for c++11 support on the configure.ac
2015-05-26 12:02:54 +09:00
Peter Boyle c2ffb1a098 Makefile update 2015-05-25 14:43:08 +01:00
Peter Boyle 3a6ff2d7b8 Makefile update 2015-05-25 14:43:08 +01:00
Peter Boyle 37721572e7 Makefile update 2015-05-25 14:43:08 +01:00
Peter Boyle d7f5172860 Schur complement based red-black inversion working 2015-05-25 13:47:12 +01:00
Peter Boyle 2ae6214104 Schur complement based red-black inversion working 2015-05-25 13:47:12 +01:00
Peter Boyle 489b1b9633 Schur complement based red-black inversion working 2015-05-25 13:47:12 +01:00
Peter Boyle 201a110c51 Better EO support letting Schur solver work 2015-05-25 13:46:28 +01:00
Peter Boyle 1a9841a0f1 Better EO support letting Schur solver work 2015-05-25 13:46:28 +01:00
Peter Boyle ea3240ad55 Better EO support letting Schur solver work 2015-05-25 13:46:28 +01:00
Peter Boyle 1d4b1c48cc Mostly cosmetic 2015-05-25 13:45:32 +01:00
Peter Boyle 55685b7cf5 Mostly cosmetic 2015-05-25 13:45:32 +01:00
Peter Boyle 956e728b40 Mostly cosmetic 2015-05-25 13:45:32 +01:00
Peter Boyle f6cade41b4 Better checkerboard tracking. 2015-05-25 13:45:08 +01:00
Peter Boyle 3358a77c7a Better checkerboard tracking. 2015-05-25 13:45:08 +01:00
Peter Boyle 94d679c4e6 Better checkerboard tracking. 2015-05-25 13:45:08 +01:00
Peter Boyle 6e76f0c6cd move constants into red black 2015-05-25 13:44:35 +01:00
Peter Boyle bc947477f3 move constants into red black 2015-05-25 13:44:35 +01:00
Peter Boyle 616f871735 move constants into red black 2015-05-25 13:44:35 +01:00
Peter Boyle 55ad54e0ff Updates now schur red black solver working 2015-05-25 13:43:58 +01:00
Peter Boyle 29f72292ba Updates now schur red black solver working 2015-05-25 13:43:58 +01:00
Peter Boyle 624c0ac3ef Updates now schur red black solver working 2015-05-25 13:43:58 +01:00
Peter Boyle 00ee531005 Herm op 2015-05-25 13:42:36 +01:00
Peter Boyle 9b5633ff4f Herm op 2015-05-25 13:42:36 +01:00
Peter Boyle ac99832d21 Herm op 2015-05-25 13:42:36 +01:00
Peter Boyle ca30116144 red black fix 2015-05-25 13:42:12 +01:00
Peter Boyle 17a06af1ff red black fix 2015-05-25 13:42:12 +01:00
Peter Boyle d30c013721 red black fix 2015-05-25 13:42:12 +01:00
Peter Boyle 41ba13f951 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-23 09:36:08 +01:00
Peter Boyle c25016030c Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-23 09:36:08 +01:00
Peter Boyle 5cf285bce9 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-23 09:36:08 +01:00
Peter Boyle 31a40fa37f Added 2015-05-23 09:36:01 +01:00
Peter Boyle 2806273340 Added 2015-05-23 09:36:01 +01:00
Peter Boyle 613a73b1b6 Added 2015-05-23 09:36:01 +01:00
Peter Boyle 602248d5fe Extra targets 2015-05-23 09:35:37 +01:00
Peter Boyle 73ee36c48d Extra targets 2015-05-23 09:35:37 +01:00
Peter Boyle f681baa9cd Extra targets 2015-05-23 09:35:37 +01:00
Peter Boyle 2ba641b25e More targets 2015-05-23 09:34:50 +01:00
Peter Boyle b8fdb65fbf More targets 2015-05-23 09:34:50 +01:00
Peter Boyle d21411ead9 More targets 2015-05-23 09:34:50 +01:00
Peter Boyle 2d30e82dcb Improving even odd sector; a lot of work and thought required cleaning this up 2015-05-23 09:34:16 +01:00
Peter Boyle 65f2e6b269 Improving even odd sector; a lot of work and thought required cleaning this up 2015-05-23 09:34:16 +01:00
Peter Boyle 64fcbd0387 Improving even odd sector; a lot of work and thought required cleaning this up 2015-05-23 09:34:16 +01:00
Peter Boyle 0b165afd9e Rely on default constructors 2015-05-23 09:33:42 +01:00
Peter Boyle d07a5c084d Rely on default constructors 2015-05-23 09:33:42 +01:00
Peter Boyle bef9bf0d38 Rely on default constructors 2015-05-23 09:33:42 +01:00
Peter Boyle 3954792f37 Better pragma use 2015-05-23 09:32:37 +01:00
Peter Boyle a2928321b6 Better pragma use 2015-05-23 09:32:37 +01:00
Peter Boyle eadfb5be67 Better pragma use 2015-05-23 09:32:37 +01:00
Peter Boyle 8c7b5f5d3b Cosmetic 2015-05-23 09:31:15 +01:00
Peter Boyle 764732944f Cosmetic 2015-05-23 09:31:15 +01:00
Peter Boyle 33737ef57a Cosmetic 2015-05-23 09:31:15 +01:00
Peter Boyle be8b4f89d6 Iterator required 2015-05-23 09:30:28 +01:00
Peter Boyle ae58a9ada2 Iterator required 2015-05-23 09:30:28 +01:00
Peter Boyle 32c3f16f95 Iterator required 2015-05-23 09:30:28 +01:00
neo 57feda4328 Completed implementation of new Grid_simd classes
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
neo 1c862dc15b Completed implementation of new Grid_simd classes
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
neo 9e29ac6549 Completed implementation of new Grid_simd classes
Tested performance for SSE4, Ok.
AVX1/2, AVX512 yet untested
2015-05-22 17:33:15 +09:00
Peter Boyle a11850d2fb Merge pull request #7 from coppolachan/master
Added full support for SSE4
2015-05-22 05:58:59 +01:00
Peter Boyle 96e5c5c6ca Merge pull request #7 from coppolachan/master
Added full support for SSE4
2015-05-22 05:58:59 +01:00
Peter Boyle 24c68a697b Merge pull request #7 from coppolachan/master
Added full support for SSE4
2015-05-22 05:58:59 +01:00
Peter Boyle e0cc5ba920 Streaming store option ifdef 2015-05-21 06:47:05 +01:00
Peter Boyle d8061afe24 Streaming store option ifdef 2015-05-21 06:47:05 +01:00
Peter Boyle 9601890549 Streaming store option ifdef 2015-05-21 06:47:05 +01:00
Peter Boyle 1b9ecbac3b Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
and other compliers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle 874b2eb32d Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
and other compliers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle 1559dd4adc Compile time select if we do the streaming store copy. Relies on Clang++ eliminating object copies,
and other compliers do not necessarily cope.
2015-05-21 06:39:00 +01:00
Peter Boyle ac0941be9a adding two routines containing only a single operation so I can easily see the assembly dump 2015-05-21 06:37:46 +01:00
Peter Boyle f1fb92fd01 adding two routines containing only a single operation so I can easily see the assembly dump 2015-05-21 06:37:46 +01:00
Peter Boyle 22bfbd0f8d adding two routines containing only a single operation so I can easily see the assembly dump 2015-05-21 06:37:46 +01:00
Peter Boyle fb159e1cff Minor change 2015-05-21 06:37:20 +01:00
Peter Boyle 3e1d1aff18 Minor change 2015-05-21 06:37:20 +01:00
Peter Boyle 3a441c3e94 Minor change 2015-05-21 06:37:20 +01:00
Peter Boyle 8bc0033326 useful to dump assembler 2015-05-21 06:36:47 +01:00
Peter Boyle c96af471ee useful to dump assembler 2015-05-21 06:36:47 +01:00
Peter Boyle d4ca8647dc useful to dump assembler 2015-05-21 06:36:47 +01:00
Peter Boyle db786fac13 Didn't like a print statement 2015-05-21 06:36:15 +01:00
Peter Boyle 57a01e6bbb Didn't like a print statement 2015-05-21 06:36:15 +01:00
Peter Boyle d0d41b8bce Didn't like a print statement 2015-05-21 06:36:15 +01:00
Peter Boyle 046485a7bb better comms benchmarking 2015-05-21 06:35:46 +01:00
Peter Boyle d806581666 better comms benchmarking 2015-05-21 06:35:46 +01:00
Peter Boyle 341096dce8 better comms benchmarking 2015-05-21 06:35:46 +01:00
Peter Boyle 9058135da0 Unroll pragma abstraction 2015-05-21 06:34:33 +01:00
Peter Boyle 35055ed5c1 Unroll pragma abstraction 2015-05-21 06:34:33 +01:00
Peter Boyle 34960ca50c Unroll pragma abstraction 2015-05-21 06:34:33 +01:00
neo f8d8958884 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/simd/Grid_vector_types.h
	tests/Makefile.am
2015-05-20 17:32:46 +09:00
neo 9098d7d0a3 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/simd/Grid_vector_types.h
	tests/Makefile.am
2015-05-20 17:32:46 +09:00
neo d03c4e5901 Merge remote-tracking branch 'upstream/master'
Conflicts:
	lib/simd/Grid_vector_types.h
	tests/Makefile.am
2015-05-20 17:32:46 +09:00
neo e529210f43 Implemented all SSE4 functions.
A test code Grid_simd_new.cc has been created to test the new class.
Tests are all OK.
2015-05-20 17:22:40 +09:00
neo 3a3f54932a Implemented all SSE4 functions.
A test code Grid_simd_new.cc has been created to test the new class.
Tests are all OK.
2015-05-20 17:22:40 +09:00
neo cf7be0e461 Implemented all SSE4 functions.
A test code Grid_simd_new.cc has been created to test the new class.
Tests are all OK.
2015-05-20 17:22:40 +09:00
Peter Boyle 8fdff33b3a Merging in
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-19 21:30:13 +01:00
Peter Boyle dc4014668d Merging in
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-19 21:30:13 +01:00
Peter Boyle 221902a882 Merging in
Merge branch 'master' of https://github.com/paboyle/Grid
2015-05-19 21:30:13 +01:00
Peter Boyle 91ed085ca4 Build a simple kernel to compare intel compiler and clang in simple environment 2015-05-19 21:29:40 +01:00
Peter Boyle 3f57662cd0 Build a simple kernel to compare intel compiler and clang in simple environment 2015-05-19 21:29:40 +01:00
Peter Boyle d3931111fb Build a simple kernel to compare intel compiler and clang in simple environment 2015-05-19 21:29:40 +01:00
Peter Boyle efc0d1e0b9 Reworking to keep intel compiler happy 2015-05-19 21:29:07 +01:00
Peter Boyle b562b50196 Reworking to keep intel compiler happy 2015-05-19 21:29:07 +01:00
Peter Boyle a21036e69a Reworking to keep intel compiler happy 2015-05-19 21:29:07 +01:00
Peter Boyle 2d8b5a8191 Optimisation... 2015-05-19 15:50:47 +01:00
Peter Boyle 46ab8edf30 Optimisation... 2015-05-19 15:50:47 +01:00
Peter Boyle 8220794c44 Optimisation... 2015-05-19 15:50:47 +01:00
Peter Boyle b520694b00 Merge branch 'coppolachan-master' 2015-05-19 15:05:32 +01:00
Peter Boyle 5f0530b68a Merge branch 'coppolachan-master' 2015-05-19 15:05:32 +01:00
Peter Boyle 7571a6b021 Merge branch 'coppolachan-master' 2015-05-19 15:05:32 +01:00
Peter Boyle 3fe7275332 Merged
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master

Conflicts:
	lib/simd/Grid_vector_types.h
2015-05-19 15:05:07 +01:00
Peter Boyle 3d66d00313 Merged
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master

Conflicts:
	lib/simd/Grid_vector_types.h
2015-05-19 15:05:07 +01:00
Peter Boyle fde7f8d6b9 Merged
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master

Conflicts:
	lib/simd/Grid_vector_types.h
2015-05-19 15:05:07 +01:00
azusayamaguchi ee8cf77071 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-19 14:55:26 +01:00
azusayamaguchi a4b3bc7714 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-19 14:55:26 +01:00
azusayamaguchi 2d2da8364f Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-19 14:55:26 +01:00
azusayamaguchi c8c74e591f Add messages to get the number of threads for openmp 2015-05-19 14:54:42 +01:00
azusayamaguchi 592cec72e2 Add messages to get the number of threads for openmp 2015-05-19 14:54:42 +01:00
azusayamaguchi 91f29d4a68 Add messages to get the number of threads for openmp 2015-05-19 14:54:42 +01:00
Peter Boyle a6e1ea216d Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
Peter Boyle ffc00caea3 Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
Peter Boyle 4dba8522a1 Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly indepdendent of decomposition
on 1,2,4,8,16 mpi tasks.
Found a couple of simd bugs which required fixed and enhanced the Grid_simd.cc test suite.
Implemented the Mdag, M, MdagM, Meooe Mooee schur type stuff in the wilson dop.
2015-05-19 13:57:35 +01:00
neo 7fb3221d47 Partial implementation of the vector types SIMD
Implementing SSE4 now
A systematic series of tests must be written.
2015-05-19 17:21:17 +09:00
neo b29caead32 Partial implementation of the vector types SIMD
Implementing SSE4 now
A systematic series of tests must be written.
2015-05-19 17:21:17 +09:00
neo 74e91cd925 Partial implementation of the vector types SIMD
Implementing SSE4 now
A systematic series of tests must be written.
2015-05-19 17:21:17 +09:00
neo 639fd05239 Added check of mpfr and gmp at configure time
It generates automatically the linker flags or complains if not found.
2015-05-19 13:54:55 +09:00
neo 4cadf11d1d Added check of mpfr and gmp at configure time
It generates automatically the linker flags or complains if not found.
2015-05-19 13:54:55 +09:00
neo baa382f055 Added check of mpfr and gmp at configure time
It generates automatically the linker flags or complains if not found.
2015-05-19 13:54:55 +09:00
neo d6887beead Merging with upstream 2015-05-19 13:36:03 +09:00
neo b5af3fbe45 Merging with upstream 2015-05-19 13:36:03 +09:00
neo 7ad705066d Merging with upstream 2015-05-19 13:36:03 +09:00
Peter Boyle 6f387b4916 Merge branch 'coppolachan-master' 2015-05-18 16:36:58 +01:00
Peter Boyle c7314e526e Merge branch 'coppolachan-master' 2015-05-18 16:36:58 +01:00
Peter Boyle 05d862782f Merge branch 'coppolachan-master' 2015-05-18 16:36:58 +01:00
Peter Boyle 9bfe0e63f4 lib/algorithms/approx/bigfloat.h 2015-05-18 16:35:48 +01:00
Peter Boyle f9a8377fe6 lib/algorithms/approx/bigfloat.h 2015-05-18 16:35:48 +01:00
Peter Boyle 3f17423d36 lib/algorithms/approx/bigfloat.h 2015-05-18 16:35:48 +01:00
Peter Boyle 30494bd96d Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
	lib/algorithms/approx/bigfloat.h
2015-05-18 16:34:21 +01:00
Peter Boyle cf9bbee256 Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
	lib/algorithms/approx/bigfloat.h
2015-05-18 16:34:21 +01:00
Peter Boyle 05f1419df4 Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
	lib/algorithms/approx/bigfloat.h
2015-05-18 16:34:21 +01:00
Peter Boyle 6f038a7f6d Convience function 2015-05-18 16:28:29 +01:00
Peter Boyle a19deba26c Convience function 2015-05-18 16:28:29 +01:00
Peter Boyle 0b3721502e Convience function 2015-05-18 16:28:29 +01:00
Peter Boyle 193fd5532f Remez tested 2015-05-18 12:09:25 +01:00
Peter Boyle 2843264bd8 Remez tested 2015-05-18 12:09:25 +01:00
Peter Boyle 17835c6f42 Remez tested 2015-05-18 12:09:25 +01:00
neo fa1dc5e448 Minor modification to the configure.ac
Enables silent rules (use make V=1 to override)
Prints a summary after configure is completed
2015-05-18 17:15:14 +09:00
neo 17e4e478cd Minor modification to the configure.ac
Enables silent rules (use make V=1 to override)
Prints a summary after configure is completed
2015-05-18 17:15:14 +09:00
neo 99aecf1f2e Minor modification to the configure.ac
Enables silent rules (use make V=1 to override)
Prints a summary after configure is completed
2015-05-18 17:15:14 +09:00
neo 6d2accba7b Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass. 2015-05-18 16:48:14 +09:00
neo cee363e28c Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass. 2015-05-18 16:48:14 +09:00
neo b4cd37276b Corrected some compilation errors (zolotarev.h) and SSE4 vsplat and conj to make cshift test pass. 2015-05-18 16:48:14 +09:00
Peter Boyle 1887c77498 Getting closer to having a wilson solver... introducing a first and untested
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of

algorithms/approx
algorithms/iterative

etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
Peter Boyle d0e4673a3f Getting closer to having a wilson solver... introducing a first and untested
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of

algorithms/approx
algorithms/iterative

etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
Peter Boyle 11cb3e9a01 Getting closer to having a wilson solver... introducing a first and untested
cut at Conjugate gradient. Also copied in Remez, Zolotarev, Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively since we know we will
need these. I wanted the structure of

algorithms/approx
algorithms/iterative

etc.. to start taking shape.
2015-05-18 07:47:05 +01:00
Peter Boyle 6eb8dfd902 Working towards solvers 2015-05-17 00:19:03 +01:00
Peter Boyle 8e99e4671f Working towards solvers 2015-05-17 00:19:03 +01:00
Peter Boyle 7992346190 Working towards solvers 2015-05-17 00:19:03 +01:00
Peter Boyle e841395dfd Updating preparing for solvers etc.. 2015-05-16 23:35:08 +01:00
Peter Boyle dc6b6bdc96 Updating preparing for solvers etc.. 2015-05-16 23:35:08 +01:00
Peter Boyle bf7ab0da7a Updating preparing for solvers etc.. 2015-05-16 23:35:08 +01:00
Peter Boyle cf99a1f37d Better build automation 2015-05-16 07:16:45 +01:00
Peter Boyle 9b0aae665f Better build automation 2015-05-16 07:16:45 +01:00
Peter Boyle 1f4e7bbdce Better build automation 2015-05-16 07:16:45 +01:00
Peter Boyle deac65a92d Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-16 06:42:03 +01:00
Peter Boyle 4a0da933f0 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-16 06:42:03 +01:00
Peter Boyle 1f765e5b59 Merge branch 'master' of https://github.com/paboyle/Grid 2015-05-16 06:42:03 +01:00
Peter Boyle f6e54a7bd4 Moved things around 2015-05-16 06:40:10 +01:00
Peter Boyle e2f6745a0e Moved things around 2015-05-16 06:40:10 +01:00
Peter Boyle 462bafdd2c Moved things around 2015-05-16 06:40:10 +01:00
Peter Boyle 53260e7a39 Typoo xifed 2015-05-16 05:49:32 +01:00
Peter Boyle 39e7ef1243 Typoo xifed 2015-05-16 05:49:32 +01:00
Peter Boyle e9ed288b00 Typoo xifed 2015-05-16 05:49:32 +01:00
Peter Boyle a900790b44 Update Grid_lattice_trace.h 2015-05-16 04:40:28 +01:00
Peter Boyle 9c38a52bad Update Grid_lattice_trace.h 2015-05-16 04:40:28 +01:00
Peter Boyle dda3da45fb Update Grid_lattice_trace.h 2015-05-16 04:40:28 +01:00
Peter Boyle b731bf6976 Pretty syntax 2015-05-16 04:37:26 +01:00
Peter Boyle 1247d7aea8 Pretty syntax 2015-05-16 04:37:26 +01:00
Peter Boyle 2e4ba02443 Pretty syntax 2015-05-16 04:37:26 +01:00
Peter Boyle 5f8b82b90c Optimisation and syntax pretty 2015-05-16 04:36:22 +01:00
Peter Boyle 9f0e990b40 Optimisation and syntax pretty 2015-05-16 04:36:22 +01:00
Peter Boyle a19aa9627d Optimisation and syntax pretty 2015-05-16 04:36:22 +01:00
Peter Boyle 25bfa7e830 more digits 2015-05-16 04:33:40 +01:00
Peter Boyle 56667e9d32 more digits 2015-05-16 04:33:40 +01:00
Peter Boyle aff5254208 more digits 2015-05-16 04:33:40 +01:00
Peter Boyle afda459886 strong inline 2015-05-16 04:33:10 +01:00
Peter Boyle 49f56a25d1 strong inline 2015-05-16 04:33:10 +01:00
Peter Boyle 9e29fb2c6a strong inline 2015-05-16 04:33:10 +01:00
Peter Boyle c43869a83a Extra compile targs 2015-05-15 14:41:59 +01:00
Peter Boyle c2ca396353 Extra compile targs 2015-05-15 14:41:59 +01:00
Peter Boyle bc5ed9acaf Extra compile targs 2015-05-15 14:41:59 +01:00
Peter Boyle 87bc17831d Added su3 matrix benchmark. 2015-05-15 14:41:19 +01:00
Peter Boyle 7a63bdbd72 Added su3 matrix benchmark. 2015-05-15 14:41:19 +01:00
Peter Boyle b4b70702fd Added su3 matrix benchmark. 2015-05-15 14:41:19 +01:00
Peter Boyle 76cbcff2f1 Log the bug report code into the git repo. 2015-05-15 12:39:53 +01:00
Peter Boyle 516aac6518 Log the bug report code into the git repo. 2015-05-15 12:39:53 +01:00
Peter Boyle 8e1b5dda4b Log the bug report code into the git repo. 2015-05-15 12:39:53 +01:00
Peter Boyle f43589369a Compile options tweak 2015-05-15 12:33:18 +01:00
Peter Boyle 675fd1a065 Compile options tweak 2015-05-15 12:33:18 +01:00
Peter Boyle 9386522543 Compile options tweak 2015-05-15 12:33:18 +01:00
Peter Boyle c99922b591 Out of source compile now working 2015-05-15 12:21:40 +01:00
Peter Boyle a98f3e0f5e Out of source compile now working 2015-05-15 12:21:40 +01:00
Peter Boyle 331f832c34 Out of source compile now working 2015-05-15 12:21:40 +01:00
Peter Boyle de0c199604 Convenience multi-compiler build with out of source compile 2015-05-15 12:21:10 +01:00
Peter Boyle f92fda0cfd Convenience multi-compiler build with out of source compile 2015-05-15 12:21:10 +01:00
Peter Boyle 022c12b8e4 Convenience multi-compiler build with out of source compile 2015-05-15 12:21:10 +01:00
Peter Boyle 3ed30169ea clang++ 3.4/5/7 compile happy for AVX and SSE
icpc compiles happy on MacOSX both with -xCOMMON-AV512 and native AVX

gcc-5 does not compile happy; can work around by renaming lattice peek/poke/transpose/trace templates
relative to tensor ones, but gcc goes into a recursive template instantiation due to
matching error. I think this is a gcc bug and have filed a report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:52:11 +01:00
Peter Boyle 100323ab4d clang++ 3.4/5/7 compile happy for AVX and SSE
icpc compiles happy on MacOSX both with -xCOMMON-AV512 and native AVX

gcc-5 does not compile happy; can work around by renaming lattice peek/poke/transpose/trace templates
relative to tensor ones, but gcc goes into a recursive template instantiation due to
matching error. I think this is a gcc bug and have filed a report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:52:11 +01:00
Peter Boyle 0b4d3544b9 clang++ 3.4/5/7 compile happy for AVX and SSE
icpc compiles happy on MacOSX both with -xCOMMON-AV512 and native AVX

gcc-5 does not compile happy; can work around by renaming lattice peek/poke/transpose/trace templates
relative to tensor ones, but gcc goes into a recursive template instantiation due to
matching error. I think this is a gcc bug and have filed a report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:52:11 +01:00
Peter Boyle bc3889ffa1 Remove debug masking 2015-05-15 11:51:15 +01:00
Peter Boyle 6965a136a0 Remove debug masking 2015-05-15 11:51:15 +01:00
Peter Boyle ed8e3b676f Remove debug masking 2015-05-15 11:51:15 +01:00
Peter Boyle 180b06d7e3 GCC and ICPC complained on more careful typeing 2015-05-15 11:50:44 +01:00
Peter Boyle 254dee6ac7 GCC and ICPC complained on more careful typeing 2015-05-15 11:50:44 +01:00
Peter Boyle 882fa27ff5 GCC and ICPC complained on more careful typeing 2015-05-15 11:50:44 +01:00
Peter Boyle 3bd376853c Move platform dependent out to Grid_simd.h 2015-05-15 11:50:00 +01:00
Peter Boyle 264850bc16 Move platform dependent out to Grid_simd.h 2015-05-15 11:50:00 +01:00
Peter Boyle 3346b68ccd Move platform dependent out to Grid_simd.h 2015-05-15 11:50:00 +01:00
Peter Boyle 6bba16ccf7 ngo store 2015-05-15 11:49:39 +01:00
Peter Boyle 9a120cf5ec ngo store 2015-05-15 11:49:39 +01:00
Peter Boyle 0afb64bf24 ngo store 2015-05-15 11:49:39 +01:00
Peter Boyle e8efa6320e Parallel for replace 2015-05-15 11:48:04 +01:00
Peter Boyle 8d77d758c3 Parallel for replace 2015-05-15 11:48:04 +01:00
Peter Boyle 537f47404b Parallel for replace 2015-05-15 11:48:04 +01:00
Peter Boyle e3b61bdfce Forces inlining upon icpc 2015-05-15 11:43:49 +01:00
Peter Boyle 0e7945fe54 Forces inlining upon icpc 2015-05-15 11:43:49 +01:00
Peter Boyle a0d041b522 Forces inlining upon icpc 2015-05-15 11:43:49 +01:00
Peter Boyle 86b9d24b62 Force inlining upon icpc 2015-05-15 11:43:20 +01:00
Peter Boyle bd721ce1c8 Force inlining upon icpc 2015-05-15 11:43:20 +01:00
Peter Boyle 8c57bcaece Force inlining upon icpc 2015-05-15 11:43:20 +01:00
Peter Boyle 3e3a8dc0c0 More elegant enable_if 2015-05-15 11:42:51 +01:00
Peter Boyle a852d13f03 More elegant enable_if 2015-05-15 11:42:51 +01:00
Peter Boyle 519eab8ff0 More elegant enable_if 2015-05-15 11:42:51 +01:00
Peter Boyle 4350c1e0f7 More elegant to do boolean logic inside the enable_if construct
Should have done that from the beginning and should move this into
a global edit
2015-05-15 11:42:03 +01:00
Peter Boyle a26fdab719 More elegant to do boolean logic inside the enable_if construct
Should have done that from the beginning and should move this into
a global edit
2015-05-15 11:42:03 +01:00
Peter Boyle f986e123d2 More elegant to do boolean logic inside the enable_if construct
Should have done that from the beginning and should move this into
a global edit
2015-05-15 11:42:03 +01:00
Peter Boyle 8c59605e05 Force inlining on ICPC because inline apparently is not enoguh 2015-05-15 11:41:31 +01:00
Peter Boyle af6e8f7829 Force inlining on ICPC because inline apparently is not enoguh 2015-05-15 11:41:31 +01:00
Peter Boyle 70638bf1f1 Force inlining on ICPC because inline apparently is not enoguh 2015-05-15 11:41:31 +01:00
Peter Boyle e59b6a805c strong_inline forces ICPC to do it. 2015-05-15 11:40:59 +01:00
Peter Boyle cbfa4097b4 strong_inline forces ICPC to do it. 2015-05-15 11:40:59 +01:00
Peter Boyle 54d8972753 strong_inline forces ICPC to do it. 2015-05-15 11:40:59 +01:00
Peter Boyle 5d8303e94d Force strong_inline to force ipcc's hand 2015-05-15 11:40:31 +01:00
Peter Boyle 8c40dd9c4f Force strong_inline to force ipcc's hand 2015-05-15 11:40:31 +01:00
Peter Boyle 5159b26261 Force strong_inline to force ipcc's hand 2015-05-15 11:40:31 +01:00
Peter Boyle 1339a7f8b0 Switch to strong_inline macro to force icpc's hand 2015-05-15 11:40:00 +01:00
Peter Boyle b38bf82d48 Switch to strong_inline macro to force icpc's hand 2015-05-15 11:40:00 +01:00
Peter Boyle c33ec96fc8 Switch to strong_inline macro to force icpc's hand 2015-05-15 11:40:00 +01:00
Peter Boyle e58cc72fe5 Promote to strong inline to force ICPC's hand. Annoying. 2015-05-15 11:39:25 +01:00
Peter Boyle adc4f86020 Promote to strong inline to force ICPC's hand. Annoying. 2015-05-15 11:39:25 +01:00
Peter Boyle 577325cb7a Promote to strong inline to force ICPC's hand. Annoying. 2015-05-15 11:39:25 +01:00
Peter Boyle 074430af0d Formatting change 2015-05-15 11:38:54 +01:00
Peter Boyle 5b46992a15 Formatting change 2015-05-15 11:38:54 +01:00
Peter Boyle 46c4379592 Formatting change 2015-05-15 11:38:54 +01:00
Peter Boyle 873110d482 Filed bug report Bug 66153 on GCC-5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle e7d25647e6 Filed bug report Bug 66153 on GCC-5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle f761ab0f50 Filed bug report Bug 66153 on GCC-5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66153
2015-05-15 11:38:04 +01:00
Peter Boyle 4cee0e8653 Silly formatting change 2015-05-15 11:37:07 +01:00
Peter Boyle c28551f40f Silly formatting change 2015-05-15 11:37:07 +01:00
Peter Boyle 2a28cfb3a3 Silly formatting change 2015-05-15 11:37:07 +01:00
Peter Boyle 6b2a786779 gcc doesn't like collapse(2) for some reason I can't figure 2015-05-15 11:36:22 +01:00
Peter Boyle 6c7eb60d6f gcc doesn't like collapse(2) for some reason I can't figure 2015-05-15 11:36:22 +01:00
Peter Boyle b00622302b gcc doesn't like collapse(2) for some reason I can't figure 2015-05-15 11:36:22 +01:00
Peter Boyle cf27f22dc0 ICPC and GCC5 fixes 2015-05-15 11:35:02 +01:00
Peter Boyle 051b23fe10 ICPC and GCC5 fixes 2015-05-15 11:35:02 +01:00
Peter Boyle 3057b2762a ICPC and GCC5 fixes 2015-05-15 11:35:02 +01:00
Peter Boyle 40192841a4 Using boolean logic inside enable_if is more elegant 2015-05-15 11:32:45 +01:00
Peter Boyle 4e462209c7 Using boolean logic inside enable_if is more elegant 2015-05-15 11:32:45 +01:00
Peter Boyle 151a6f4e14 Using boolean logic inside enable_if is more elegant 2015-05-15 11:32:45 +01:00
Peter Boyle 1771f97551 Key of mm_malloc.h 2015-05-15 11:32:11 +01:00
Peter Boyle 8d1b26dd4b Key of mm_malloc.h 2015-05-15 11:32:11 +01:00
Peter Boyle a36c974f26 Key of mm_malloc.h 2015-05-15 11:32:11 +01:00
Peter Boyle 2eaf73e8b3 strong inline required to force icpc 2015-05-15 11:31:41 +01:00
Peter Boyle cc6218a692 strong inline required to force icpc 2015-05-15 11:31:41 +01:00
Peter Boyle c0977dcfaa strong inline required to force icpc 2015-05-15 11:31:41 +01:00
Peter Boyle 43bdbb5080 Linear op added 2015-05-13 11:25:34 +01:00
Peter Boyle 5166888c0a Linear op added 2015-05-13 11:25:34 +01:00
Peter Boyle f1255197c2 Linear op added 2015-05-13 11:25:34 +01:00
Peter Boyle 7f3ae64a31 OMP dslash working 2015-05-13 10:59:22 +01:00
Peter Boyle 0097b81778 OMP dslash working 2015-05-13 10:59:22 +01:00
Peter Boyle e179828662 OMP dslash working 2015-05-13 10:59:22 +01:00
Peter Boyle 457cc0d5a3 RNG test 2015-05-13 09:24:30 +01:00
Peter Boyle e6e72d23df RNG test 2015-05-13 09:24:30 +01:00
Peter Boyle 680f4e3636 RNG test 2015-05-13 09:24:30 +01:00
Peter Boyle d388b831b4 cout IO for all types 2015-05-13 09:24:10 +01:00
Peter Boyle add4495a4a cout IO for all types 2015-05-13 09:24:10 +01:00
Peter Boyle a108d5d3b0 cout IO for all types 2015-05-13 09:24:10 +01:00
Peter Boyle b4a570477c I have made the Cshift work successfully with open mp threading in
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle 541d52ab97 I have made the Cshift work successfully with open mp threading in
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle 48f425d31c I have made the Cshift work successfully with open mp threading in
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle 52174da232 Enhanced SIMD interfacing 2015-05-12 20:41:44 +01:00
Peter Boyle 556befaaaa Enhanced SIMD interfacing 2015-05-12 20:41:44 +01:00
Peter Boyle 6cec662ac5 Enhanced SIMD interfacing 2015-05-12 20:41:44 +01:00
Peter Boyle 65c91eae64 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle c6baa3e657 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle 6103c29ee3 Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in style of
BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle 8b765be2b1 Moving some things around for pretty 2015-05-11 19:09:49 +01:00
Peter Boyle 6e6843ac69 Moving some things around for pretty 2015-05-11 19:09:49 +01:00
Peter Boyle b1d2c60d07 Moving some things around for pretty 2015-05-11 19:09:49 +01:00
Peter Boyle a411b48a91 Adding a better controlled threading class, preparing to
force in deterministic reduction.
2015-05-11 18:59:03 +01:00
Peter Boyle c8dc8ff891 Adding a better controlled threading class, preparing to
force in deterministic reduction.
2015-05-11 18:59:03 +01:00
Peter Boyle 22d384b07d Adding a better controlled threading class, preparing to
force in deterministic reduction.
2015-05-11 18:59:03 +01:00
Peter Boyle ebcb87abe1 Got command line args working 2015-05-11 14:36:48 +01:00
Peter Boyle b613ed0bb8 Got command line args working 2015-05-11 14:36:48 +01:00
Peter Boyle f5dcca7b1b Got command line args working 2015-05-11 14:36:48 +01:00
paboyle 1576b7837a CML parse 2015-05-11 12:56:27 +01:00
paboyle 4eb08ac9de CML parse 2015-05-11 12:56:27 +01:00
paboyle 43e71ff28c CML parse 2015-05-11 12:56:27 +01:00
paboyle fa5779537c Command line args and a general clean up 2015-05-11 12:43:10 +01:00
paboyle b42453d1fd Command line args and a general clean up 2015-05-11 12:43:10 +01:00
paboyle 379943abf5 Command line args and a general clean up 2015-05-11 12:43:10 +01:00
paboyle 5548fd6928 Updated to do list 2015-05-11 09:44:50 +01:00
paboyle 06dcbed6b1 Updated to do list 2015-05-11 09:44:50 +01:00
paboyle 9f9796b888 Updated to do list 2015-05-11 09:44:50 +01:00
Peter Boyle 242e447bc5 Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle 2203c6e597 Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle 5555a852be Lots of changes required to compile for MIC under ICPC 2015-05-10 23:29:21 +01:00
Peter Boyle 352bccf6ca Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle 4da2c2ea00 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle 48b9692845 Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
	lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle c946e77143 Expression template hack 2015-05-10 15:35:30 +01:00
Peter Boyle 1ec1b4ee44 Expression template hack 2015-05-10 15:35:30 +01:00
Peter Boyle b802abc83f Expression template hack 2015-05-10 15:35:30 +01:00
Peter Boyle 015fbee772 Expression template engin 2015-05-10 15:34:20 +01:00
Peter Boyle 1ab92563b9 Expression template engin 2015-05-10 15:34:20 +01:00
Peter Boyle 14591c72d6 Expression template engin 2015-05-10 15:34:20 +01:00
Peter Boyle 8215893152 Updated TODO list 2015-05-10 15:32:56 +01:00
Peter Boyle 5a7751d9df Updated TODO list 2015-05-10 15:32:56 +01:00
Peter Boyle 933ccfdc4f Updated TODO list 2015-05-10 15:32:56 +01:00
Peter Boyle 5fcf42cb30 Hack; must bring norm2 into the unary operator list.
ET's are still incomplete.
2015-05-10 15:30:29 +01:00
Peter Boyle 79c51ac51f Hack; must bring norm2 into the unary operator list.
ET's are still incomplete.
2015-05-10 15:30:29 +01:00
Peter Boyle 4e596da589 Hack; must bring norm2 into the unary operator list.
ET's are still incomplete.
2015-05-10 15:30:29 +01:00
Peter Boyle e647cf0459 Default to single node. Move to command line args. 2015-05-10 15:27:38 +01:00
Peter Boyle 7119bce9f3 Default to single node. Move to command line args. 2015-05-10 15:27:38 +01:00
Peter Boyle 41c9785f3b Default to single node. Move to command line args. 2015-05-10 15:27:38 +01:00
Peter Boyle 8919bf9e0a Single node default. Should expose this as command line args, but haven't sorted out
Grid_initialize to handle this. Should put this on the TODO list.
2015-05-10 15:26:06 +01:00
Peter Boyle cd90f55536 Single node default. Should expose this as command line args, but haven't sorted out
Grid_initialize to handle this. Should put this on the TODO list.
2015-05-10 15:26:06 +01:00
Peter Boyle 443efd875e Single node default. Should expose this as command line args, but haven't sorted out
Grid_initialize to handle this. Should put this on the TODO list.
2015-05-10 15:26:06 +01:00
Peter Boyle 133493dc79 Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle dc7132af71 Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle 02ae26d091 Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle 58d32a4d0e Assertion should never hit, but did due to a bug 2015-05-10 15:24:37 +01:00
Peter Boyle 961fbb2718 Assertion should never hit, but did due to a bug 2015-05-10 15:24:37 +01:00
Peter Boyle 2ffd941d67 Assertion should never hit, but did due to a bug 2015-05-10 15:24:37 +01:00
Peter Boyle 6bb17502f9 Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle 4a8fd55f52 Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle ca554f661b Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle 8299bc39ea Fixing breakage in the Comms non compile 2015-05-10 15:23:09 +01:00
Peter Boyle e02cbaa016 Fixing breakage in the Comms non compile 2015-05-10 15:23:09 +01:00
Peter Boyle 29be76f958 Fixing breakage in the Comms non compile 2015-05-10 15:23:09 +01:00
Peter Boyle 7f04b85368 Bringing expression templates for faster vector loops 2015-05-10 15:22:31 +01:00
Peter Boyle 463c31ae09 Bringing expression templates for faster vector loops 2015-05-10 15:22:31 +01:00
Peter Boyle e3acb36de6 Bringing expression templates for faster vector loops 2015-05-10 15:22:31 +01:00
Peter Boyle a115f3b086 ET ready benchmark with bytes counted assuming loop interchange 2015-05-10 15:18:04 +01:00
Peter Boyle 3657f2303d ET ready benchmark with bytes counted assuming loop interchange 2015-05-10 15:18:04 +01:00
Peter Boyle b2e0f72a7e ET ready benchmark with bytes counted assuming loop interchange 2015-05-10 15:18:04 +01:00
Peter Boyle 27c2d13968 Updated todo list 2015-05-10 15:13:50 +01:00
Peter Boyle 9ed1fb45e1 Updated todo list 2015-05-10 15:13:50 +01:00
Peter Boyle ebed239e49 Updated todo list 2015-05-10 15:13:50 +01:00
Peter Boyle 5415180676 Wilson perf improvements with Gauge prefetching 2015-05-06 06:37:21 +01:00
Peter Boyle 52403d587c Wilson perf improvements with Gauge prefetching 2015-05-06 06:37:21 +01:00
Peter Boyle 55ccb8ccf4 Wilson perf improvements with Gauge prefetching 2015-05-06 06:37:21 +01:00
Peter Boyle 7b0dd6c5d6 Cleaned up for Linux 2015-05-05 22:09:22 +01:00
Peter Boyle cdd5cdeda2 Cleaned up for Linux 2015-05-05 22:09:22 +01:00
Peter Boyle 35d949cc17 Cleaned up for Linux 2015-05-05 22:09:22 +01:00
Peter Boyle cb4b82b09f streaming store cases 2015-05-05 18:14:09 +01:00
Peter Boyle b9d16a7191 streaming store cases 2015-05-05 18:14:09 +01:00
Peter Boyle cd990ba13d Streaming store option 2015-05-05 18:13:06 +01:00
Peter Boyle 07d57b6d55 Streaming store option 2015-05-05 18:13:06 +01:00
Peter Boyle 249165d1b2 Added streaming stores 2015-05-05 18:09:28 +01:00
Peter Boyle 5ebc7a1756 Added streaming stores 2015-05-05 18:09:28 +01:00
Peter Boyle b720222d98 Updated bandwidth test 2015-05-05 18:08:53 +01:00
Peter Boyle bf60764e4b Updated bandwidth test 2015-05-05 18:08:53 +01:00
Peter Boyle 0e8415de1b Added a makefile 2015-05-05 17:56:42 +01:00
Peter Boyle 890b13dd5b Added a makefile 2015-05-05 17:56:42 +01:00
Peter Boyle 2b46ad38e2 Back to vector for now; cost of init loop is clear in the a*x + y
loop in memory benchmark and must move to better container class.
2015-05-03 09:48:13 +01:00
Peter Boyle aeda7b923d Back to vector for now; cost of init loop is clear in the a*x + y
loop in memory benchmark and must move to better container class.
2015-05-03 09:48:13 +01:00
Peter Boyle 9d93d1e6d4 Comms and memory benchmarks added 2015-05-03 09:44:47 +01:00
Peter Boyle 193860dbc8 Comms and memory benchmarks added 2015-05-03 09:44:47 +01:00
Peter Boyle 253362f978 Added a comms benchmark 2015-05-02 23:51:43 +01:00
Peter Boyle 99a1ff423d Added a comms benchmark 2015-05-02 23:51:43 +01:00
Peter Boyle ea52562527 Added a comms benchmark 2015-05-02 23:42:30 +01:00
Peter Boyle f663be2a6c Added a comms benchmark 2015-05-02 23:42:30 +01:00
Peter Boyle 6a39089a43 Starting a benchmarking sub dir 2015-05-02 17:52:36 +01:00
Peter Boyle 4a1d4f1b3c Starting a benchmarking sub dir 2015-05-02 17:52:36 +01:00
Peter Boyle bdf18941a2 Improving the byte swap support for portability 2015-05-01 10:57:33 +01:00
Peter Boyle 31fd146cc0 Improving the byte swap support for portability 2015-05-01 10:57:33 +01:00
Peter Boyle d904e2b9ac Merge branch 'master' of https://github.com/paboyle/Grid 2015-04-30 16:40:13 +01:00
Peter Boyle c770f96be7 Merge branch 'master' of https://github.com/paboyle/Grid 2015-04-30 16:40:13 +01:00
Peter Boyle c0ead94791 Integrated Lebesgue code and been playing with alternate implementations of the wilson dop without
any particular success in increasing the performance.
2015-04-30 16:39:06 +01:00
Peter Boyle a98c01c86a Integrated Lebesgue code and been playing with alternate implementations of the wilson dop without
any particular success in increasing the performance.
2015-04-30 16:39:06 +01:00
Peter Boyle 7ac997bd58 Merge pull request #1 from mspraggs/patch-1
Added <map> include to GridNerscIO.h
2015-04-30 09:46:48 +01:00
Peter Boyle d5b1bfb4bb Merge pull request #1 from mspraggs/patch-1
Added <map> include to GridNerscIO.h
2015-04-30 09:46:48 +01:00
mspraggs 24fc71b2e9 Added <map> include to GridNerscIO.h
Adding this allows clang to compile Grid to completion.
2015-04-29 23:44:03 +01:00
mspraggs 6f05404cb8 Added <map> include to GridNerscIO.h
Adding this allows clang to compile Grid to completion.
2015-04-29 23:44:03 +01:00
Peter Boyle d8ffa09e3b Benchmark wilson dhop now; 14.6GF on one core, not as fast as SU(3)xSU(3) [23GF] but still not too shabby.
Disassembling output shows ugly sequences in the permute sector. Could comparatively benchmark with and without
the if-else structure to see how much I'm losing.

Drops to 9GF as it falls out of cache. Moving to Lebesgue ordering should help there. Substantive progress.
2015-04-29 06:50:18 +01:00
Peter Boyle b7090ebba4 Benchmark wilson dhop now; 14.6GF on one core, not as fast as SU(3)xSU(3) [23GF] but still not too shabby.
Disassembling output shows ugly sequences in the permute sector. Could comparatively benchmark with and without
the if-else structure to see how much I'm losing.

Drops to 9GF as it falls out of cache. Moving to Lebesgue ordering should help there. Substantive progress.
2015-04-29 06:50:18 +01:00
Peter Boyle dcc23faa4a Fixed the stencil sector and Wilson now agrees between stencil based implementation
and the cshift based implementation. Managed to reduce the volume of code in this
sector a little, but consolidation would be good, perhaps taking common
logic out into simple helper functions
2015-04-29 06:23:56 +01:00
Peter Boyle c72db6c6f6 Fixed the stencil sector and Wilson now agrees between stencil based implementation
and the cshift based implementation. Managed to reduce the volume of code in this
sector a little, but consolidation would be good, perhaps taking common
logic out into simple helper functions
2015-04-29 06:23:56 +01:00
Peter Boyle b0485894b3 Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle 25d523c0f4 Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
Peter Boyle 0b7d389258 Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required 2015-04-27 13:45:07 +01:00
Peter Boyle f159495a9d Reworking CSHIFT and Stencil. Implementing Wilson and discovered rework is required 2015-04-27 13:45:07 +01:00
Peter Boyle 35cfef2129 Big updates with progress towards wilson matrix 2015-04-26 15:51:09 +01:00
Peter Boyle 94f728bee4 Big updates with progress towards wilson matrix 2015-04-26 15:51:09 +01:00
Peter Boyle c678f2d255 Starting the implementation of wilson; incomplete and committing non-functional code which
is not yet included from elsewhere or linked to the build system.
2015-04-25 14:33:02 +01:00
Peter Boyle 51f0da7b93 Starting the implementation of wilson; incomplete and committing non-functional code which
is not yet included from elsewhere or linked to the build system.
2015-04-25 14:33:02 +01:00
Peter Boyle d5fd34b6e8 Update to TODO list 2015-04-25 13:04:26 +01:00
Peter Boyle 9dacdc947d Update to TODO list 2015-04-25 13:04:26 +01:00
Peter Boyle 2d8cf9e456 Added two spinor functionality required to support the Wilson hopping term. 2015-04-25 12:54:06 +01:00
Peter Boyle c5fa18eb20 Added two spinor functionality required to support the Wilson hopping term. 2015-04-25 12:54:06 +01:00
Peter Boyle dc970c6442 Dirac done ; remove from TODO 2015-04-24 22:56:37 +01:00
Peter Boyle 8b4073d84c Dirac done ; remove from TODO 2015-04-24 22:56:37 +01:00
Peter Boyle fc32450360 Improved the gamma quite a bit.
Serial RNGs which are set on node zero and broadcast
2015-04-24 20:21:40 +01:00
Peter Boyle 9ec3529864 Improved the gamma quite a bit.
Serial RNGs which are set on node zero and broadcast
2015-04-24 20:21:40 +01:00
Peter Boyle 2a67214f9d static names and enum list 2015-04-24 19:12:14 +01:00
Peter Boyle 42eac283e2 static names and enum list 2015-04-24 19:12:14 +01:00
Peter Boyle 71d5927a66 Vectors now too and right multiple of matrix with gamma 2015-04-24 19:08:29 +01:00
Peter Boyle 38598190c3 Vectors now too and right multiple of matrix with gamma 2015-04-24 19:08:29 +01:00
Peter Boyle f2ac20e7ab Removed summation 2015-04-24 18:42:44 +01:00
Peter Boyle 2e275e1e65 Removed summation 2015-04-24 18:42:44 +01:00
Peter Boyle 750dd5f5fd Cleared the code out from Grid_summation to lattice/Grid_lattice_transfer.h 2015-04-24 18:41:34 +01:00
Peter Boyle 80463ecaea Cleared the code out from Grid_summation to lattice/Grid_lattice_transfer.h 2015-04-24 18:41:34 +01:00
Peter Boyle 74432432b6 Moved code from summation into transfer and reduction 2015-04-24 18:40:44 +01:00
Peter Boyle 128ad0999f Moved code from summation into transfer and reduction 2015-04-24 18:40:44 +01:00
Peter Boyle b8eef54fa7 First implementation of Dirac matrices as a Gamma class. 2015-04-24 18:20:03 +01:00
Peter Boyle d707c4e0a3 First implementation of Dirac matrices as a Gamma class. 2015-04-24 18:20:03 +01:00
Peter Boyle e2e3ea5742 Reorganised the TODO. Really getting somewhere 2015-04-23 20:42:30 +01:00
Peter Boyle b9939e3974 Reorganised the TODO. Really getting somewhere 2015-04-23 20:42:30 +01:00
Peter Boyle 4b4dcc4c13 Rename Grid_QCD 2015-04-23 20:42:09 +01:00
Peter Boyle 3083d2e908 Rename Grid_QCD 2015-04-23 20:42:09 +01:00
Peter Boyle afe6c4f64f move 2015-04-23 20:41:22 +01:00
Peter Boyle 898f64cdd7 move 2015-04-23 20:41:22 +01:00
Peter Boyle 62e8d2d127 Slice summation working. May move this into lattice/Grid_lattice_reduction however 2015-04-23 15:13:00 +01:00
Peter Boyle 52a6ba9767 Slice summation working. May move this into lattice/Grid_lattice_reduction however 2015-04-23 15:13:00 +01:00
Peter Boyle b7416d79e3 Beginnings of slice summation and subblocking 2015-04-23 11:04:59 +01:00
Peter Boyle 4d2198ea56 Beginnings of slice summation and subblocking 2015-04-23 11:04:59 +01:00
Peter Boyle 2f8431ab03 Consolidate index to coor in a single routine 2015-04-23 11:04:19 +01:00
Peter Boyle 7007d6a176 Consolidate index to coor in a single routine 2015-04-23 11:04:19 +01:00
Peter Boyle a9e574dd27 Snippets from Guido to optimise Reduce 2015-04-23 08:31:40 +01:00
Peter Boyle a37a9789c9 Snippets from Guido to optimise Reduce 2015-04-23 08:31:40 +01:00
Peter Boyle 73c0db82d5 Better description of Intel's many ISA targets 2015-04-23 08:02:51 +01:00
Peter Boyle 5c8858f31b Better description of Intel's many ISA targets 2015-04-23 08:02:51 +01:00
Peter Boyle eb58297a43 Fixing endian on linux I hope 2015-04-23 07:51:15 +01:00
Peter Boyle 47292de769 Fixing endian on linux I hope 2015-04-23 07:51:15 +01:00
Peter Boyle 1851327d19 Got the NERSC IO working and fixed a bug in cshift. 2015-04-22 22:46:48 +01:00
Peter Boyle b32c14b433 Got the NERSC IO working and fixed a bug in cshift. 2015-04-22 22:46:48 +01:00
Peter Boyle a5b0c492d7 Rework of RNG to use C++11 random. Should work correctly maintaining parallel RNG across
a machine. If a "fixedSeed" is used, randoms should be reproducible across different machine
decomposition since the generators are physically indexed and assigned in lexico ordering.
2015-04-19 14:55:58 +01:00
Peter Boyle 42f167ea37 Rework of RNG to use C++11 random. Should work correctly maintaining parallel RNG across
a machine. If a "fixedSeed" is used, randoms should be reproducible across different machine
decomposition since the generators are physically indexed and assigned in lexico ordering.
2015-04-19 14:55:58 +01:00
Peter Boyle 650410cb2f Update to task list 2015-04-19 14:55:16 +01:00
Peter Boyle f6ab726cef Update to task list 2015-04-19 14:55:16 +01:00
Peter Boyle f64d39ab57 Split all OMP directives into lattice subdir for easy maintenance of
parallelism and future OMP 4.0 offload.
2015-04-18 22:17:01 +01:00
Peter Boyle 5483ed641e Split all OMP directives into lattice subdir for easy maintenance of
parallelism and future OMP 4.0 offload.
2015-04-18 22:17:01 +01:00
Peter Boyle 4e1a3aee82 Update 2015-04-18 22:16:31 +01:00
Peter Boyle d929f88421 Update 2015-04-18 22:16:31 +01:00
Peter Boyle 1556c2ba3f Finishing the reorg 2015-04-18 21:24:10 +01:00
Peter Boyle 6bd11d920a Finishing the reorg 2015-04-18 21:24:10 +01:00
Peter Boyle 62fec04419 Reorganisation 2015-04-18 21:23:32 +01:00
Peter Boyle 8ddfa7e6b0 Reorganisation 2015-04-18 21:23:32 +01:00
Peter Boyle aee6669d0b Build reorg with which I am a bit happier 2015-04-18 21:22:50 +01:00
Peter Boyle e5a25dfcb1 Build reorg with which I am a bit happier 2015-04-18 21:22:50 +01:00
Peter Boyle a17ce0695b Clean up 2015-04-18 20:52:40 +01:00
Peter Boyle c94b7cc43c Clean up 2015-04-18 20:52:40 +01:00
Peter Boyle e6ec92d0e4 More files, shorter each. 2015-04-18 20:45:00 +01:00
Peter Boyle 25a8266638 More files, shorter each. 2015-04-18 20:45:00 +01:00
Peter Boyle d964d01d6a Shrinking and organising the files 2015-04-18 20:44:19 +01:00
Peter Boyle 6eae2c1083 Shrinking and organising the files 2015-04-18 20:44:19 +01:00
Peter Boyle 0fce523792 Split up into multiple files 2015-04-18 18:54:30 +01:00
Peter Boyle 354347ce91 Split up into multiple files 2015-04-18 18:54:30 +01:00
Peter Boyle 520af214af splitting into smaller, multiple files for readability and easy find. 2015-04-18 18:47:43 +01:00
Peter Boyle 2eb5ab26bf splitting into smaller, multiple files for readability and easy find. 2015-04-18 18:47:43 +01:00
Peter Boyle 62ee8e1cb3 Cleanup 2015-04-18 18:37:56 +01:00
Peter Boyle af72ade26a Cleanup 2015-04-18 18:37:56 +01:00
Peter Boyle 3931ad65c8 Reorg 2015-04-18 18:37:22 +01:00
Peter Boyle e7661d3b12 Reorg 2015-04-18 18:37:22 +01:00
Peter Boyle 8195d302dc Reorganise to keep files smaller 2015-04-18 18:36:48 +01:00
Peter Boyle cffad66894 Reorganise to keep files smaller 2015-04-18 18:36:48 +01:00
Peter Boyle f7d80aac7f Rename 2015-04-18 17:10:45 +01:00
Peter Boyle df9056eb4b Rename 2015-04-18 17:10:45 +01:00
Peter Boyle 08f20da103 Clean up caps. 2015-04-18 17:09:48 +01:00
Peter Boyle 2ee9322a8f Clean up caps. 2015-04-18 17:09:48 +01:00
Peter Boyle 2c9e5aa054 Clean up capitalisation 2015-04-18 17:09:24 +01:00
Peter Boyle b0ce9e3934 Clean up capitalisation 2015-04-18 17:09:24 +01:00
Peter Boyle ac181abc95 Rename 2015-04-18 17:07:37 +01:00
Peter Boyle 5e9a82b72b Rename 2015-04-18 17:07:37 +01:00
Peter Boyle 18a885d195 Renaming 2015-04-18 17:07:09 +01:00
Peter Boyle b5356935e9 Renaming 2015-04-18 17:07:09 +01:00
Peter Boyle 1674f899e0 Cleaning up 2015-04-18 16:42:47 +01:00
Peter Boyle eb0925d702 Cleaning up 2015-04-18 16:42:47 +01:00
Peter Boyle f678be5f94 Shaken out the peekIndex support.
Hardwire constants "SpinIndex, ColourIndex" and LorentzIndex in Grid_QCD.h
2015-04-18 16:17:41 +01:00
Peter Boyle b47d33c4f1 Shaken out the peekIndex support.
Hardwire constants "SpinIndex, ColourIndex" and LorentzIndex in Grid_QCD.h
2015-04-18 16:17:41 +01:00
Peter Boyle 388b735fd0 Build reorg 2015-04-18 14:56:05 +01:00
Peter Boyle 26148c3323 Build reorg 2015-04-18 14:56:05 +01:00
Peter Boyle 3e3df092bb Reorg of build structure 2015-04-18 14:55:00 +01:00
Peter Boyle c656164015 Reorg of build structure 2015-04-18 14:55:00 +01:00
Peter Boyle e25f10566c peekIndex update 2015-04-18 14:36:01 +01:00
Peter Boyle 57586c8e05 peekIndex update 2015-04-18 14:36:01 +01:00
Peter Boyle 5d1b866e7a typo 2015-04-18 12:40:55 +01:00
Peter Boyle d6c02e72d6 typo 2015-04-18 12:40:55 +01:00
paboyle 56b3631187 Update README.md 2015-04-18 12:21:37 +01:00
paboyle d4aa37112d Update README.md 2015-04-18 12:21:37 +01:00
Peter Boyle 1408a3c0f9 Got traceIndex, transposeIndex fully working.
Need to think about peekIndex interface and () based indexing.
2015-04-18 12:17:13 +01:00
Peter Boyle 81367eaa12 Got traceIndex, transposeIndex fully working.
Need to think about peekIndex interface and () based indexing.
2015-04-18 12:17:13 +01:00
Peter Boyle 3b9110b5db SSE flag changed 2015-04-16 17:22:52 +01:00
Peter Boyle 23df6cf18e SSE flag changed 2015-04-16 17:22:52 +01:00
Peter Boyle 6b04dd4a5d Better code 2015-04-16 15:20:19 +01:00
Peter Boyle 3e41cfecf1 Better code 2015-04-16 15:20:19 +01:00
Peter Boyle 1972eea128 spin trace type work 2015-04-16 14:48:21 +01:00
Peter Boyle 5aac6dc85b spin trace type work 2015-04-16 14:48:21 +01:00
Peter Boyle 933c54d9c4 Improving the trace support to support any index tracing and simplifying
implementation in some ways
2015-04-16 14:47:28 +01:00
Peter Boyle 6d71ff98e5 Improving the trace support to support any index tracing and simplifying
implementation in some ways
2015-04-16 14:47:28 +01:00
Peter Boyle fddb904b4c Typo in capital 2015-04-15 12:03:38 +01:00
Peter Boyle 3cb04f555d Typo in capital 2015-04-15 12:03:38 +01:00
Peter Boyle cab7ef9bc2 Some bug fixes 2015-04-14 23:20:16 +01:00
Peter Boyle b59553bd65 Some bug fixes 2015-04-14 23:20:16 +01:00
Peter Boyle ab9a764bb1 Reduce now going through MPI. 2015-04-14 22:40:40 +01:00
Peter Boyle 94f9e781f4 Reduce now going through MPI. 2015-04-14 22:40:40 +01:00
Peter Boyle f1876b7e95 Modified 2015-04-14 20:25:51 +01:00
Peter Boyle 2ae42c40a8 Modified 2015-04-14 20:25:51 +01:00
Peter Boyle 2d54ef2a52 Stencil code pretty much shaken out.
Beginning of inner product and norm2.
2015-04-14 20:22:04 +01:00
Peter Boyle 1eee664092 Stencil code pretty much shaken out.
Beginning of inner product and norm2.
2015-04-14 20:22:04 +01:00
Peter Boyle 977c7721d5 where switched back on 2015-04-10 05:54:02 +02:00
Peter Boyle eb2dd37e3c where switched back on 2015-04-10 05:54:02 +02:00
Peter Boyle 5267658748 Fixing the comms=none compile 2015-04-10 05:53:09 +02:00
Peter Boyle 69d578478a Fixing the comms=none compile 2015-04-10 05:53:09 +02:00
Peter Boyle 6e90038bf6 Fixing nocompile 2015-04-10 05:24:01 +02:00
Peter Boyle d9a454bc9f Fixing nocompile 2015-04-10 05:24:01 +02:00
Peter Boyle 993419d9fb MPI exposed incorrectly in main 2015-04-10 05:22:36 +02:00
Peter Boyle 516bc0c666 MPI exposed incorrectly in main 2015-04-10 05:22:36 +02:00
Peter Boyle 927c62d8a3 Patch for comms none nocompile 2015-04-10 05:21:48 +02:00
Peter Boyle f373517d2e Patch for comms none nocompile 2015-04-10 05:21:48 +02:00
Peter Boyle 31f4f4f1e1 "where" and integer comparisons logic implemented for conditional
assignment. LatticeCoordinate helper to get global (reduced) coordinate.

Some more work of similar type perhaps needed, but the bulk of the required
structure for masked array assignment is now in place.
2015-04-09 08:06:03 +02:00
Peter Boyle 8f5281563e "where" and integer comparisons logic implemented for conditional
assignment. LatticeCoordinate helper to get global (reduced) coordinate.

Some more work of similar type perhaps needed, but the bulk of the required
structure for masked array assignment is now in place.
2015-04-09 08:06:03 +02:00
Peter Boyle 81d5eabf6c Remove stub files 2015-04-06 11:29:55 +01:00
Peter Boyle 4666acfbb0 Remove stub files 2015-04-06 11:29:55 +01:00
Peter Boyle ce6a3a8ed4 Some popular configure commands 2015-04-06 11:28:00 +01:00
Peter Boyle 4cd678ddb4 Some popular configure commands 2015-04-06 11:28:00 +01:00
Peter Boyle 982274e5a0 Major rework of extract/merge/permute processing debugged and working. 2015-04-06 11:26:24 +01:00
Peter Boyle 48a38ef4fd Major rework of extract/merge/permute processing debugged and working. 2015-04-06 11:26:24 +01:00
Peter Boyle 9e597ac50a Removing older file 2015-04-06 09:27:17 +01:00
Peter Boyle 57cd8d87f5 Removing older file 2015-04-06 09:27:17 +01:00
Peter Boyle 02262b0019 Bringing in LatticeInteger with the idea of implementing predicated
assignment, subsets etc.
cf. the QDP++ "where" syntax
2015-04-06 06:30:48 +01:00
Peter Boyle e06a11ee5e Bringing in LatticeInteger with the idea of implementing predicated
assignment, subsets etc.
cf. the QDP++ "where" syntax
2015-04-06 06:30:48 +01:00
Peter Boyle ad31cd0c23 Clean up but no major changes 2015-04-03 22:54:13 +01:00
Peter Boyle d5eee231e0 Clean up but no major changes 2015-04-03 22:54:13 +01:00
Peter Boyle 15dda435e6 TODO list for preparing this for real use and QDP++ replacement. 2015-04-03 09:28:58 +01:00
Peter Boyle d081715504 TODO list for preparing this for real use and QDP++ replacement. 2015-04-03 09:28:58 +01:00
Peter Boyle 0c1f8b70e9 Reorg 2015-04-03 05:51:05 +01:00
Peter Boyle d066450a24 Reorg 2015-04-03 05:51:05 +01:00
Peter Boyle 18b26a7a70 MPI added 2015-04-03 05:34:51 +01:00
Peter Boyle b47314f726 MPI added 2015-04-03 05:34:51 +01:00
Peter Boyle d29733069f Patch 2015-04-03 05:33:13 +01:00
Peter Boyle 2de009d994 Patch 2015-04-03 05:33:13 +01:00
Peter Boyle 48581e8e8b Merge branch 'master' of https://github.com/paboyle/Grid 2015-04-03 05:32:12 +01:00
Peter Boyle cebf90195c Merge branch 'master' of https://github.com/paboyle/Grid 2015-04-03 05:32:12 +01:00
Peter Boyle 154a8700f4 Removing the Xcode project 2015-04-03 05:30:58 +01:00
Peter Boyle cc21f0a709 Removing the Xcode project 2015-04-03 05:30:58 +01:00
Peter Boyle 0c5c974e6d Renamed the namespace to Grid 2015-04-03 05:29:54 +01:00
Peter Boyle d198fcbc1c Renamed the namespace to Grid 2015-04-03 05:29:54 +01:00
Peter Boyle 06843d4574 Rename some files to make naming consistent 2015-04-03 04:58:03 +01:00
Peter Boyle c86f79c4be Rename some files to make naming consistent 2015-04-03 04:58:03 +01:00
Peter Boyle 7b97e50b7b MPI is now working and passing basic tests. Will start to construct a more sensible test suite shortly
since testing requirements now go beyond what a single Grid_main.cc can do.

Will need a more organised src tree for this and will require substantial reorg of build system.
2015-04-03 04:52:53 +01:00
Peter Boyle 9ba89e64dc MPI is now working and passing basic tests. Will start to construct a more sensible test suite shortly
since testing requirements now go beyond what a single Grid_main.cc can do.

Will need a more organised src tree for this and will require substantial reorg of build system.
2015-04-03 04:52:53 +01:00
azusayamaguchi 19372bf5ca Config file 2015-03-29 22:16:13 +01:00
azusayamaguchi 98be168396 Config file 2015-03-29 22:16:13 +01:00
Peter Boyle 19b9069453 Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-29 22:05:16 +01:00
Peter Boyle 1a9d4d3655 Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-29 22:05:16 +01:00
Peter Boyle e0af0e658d Commit 2015-03-29 22:04:49 +01:00
Peter Boyle 73aaec2df8 Commit 2015-03-29 22:04:49 +01:00
azusayamaguchi 4f57ccc66c No compile fix 2015-03-29 21:50:20 +01:00
azusayamaguchi d46e6adb8f No compile fix 2015-03-29 21:50:20 +01:00
Peter Boyle 9bea4e25ee Make file and configure 2015-03-29 21:44:22 +01:00
Peter Boyle 7f1af07fb3 Make file and configure 2015-03-29 21:44:22 +01:00
Peter Boyle 5560a84705 Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-29 21:38:53 +01:00
Peter Boyle 8801c59cce Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-29 21:38:53 +01:00
Peter Boyle 196fd203e2 Fixing the Checkerboarding cshift.
Implemented "fake" communications in preparation for the leap to MPI.
2015-03-29 20:35:37 +01:00
Peter Boyle 98f14f1030 Fixing the Checkerboarding cshift.
Implemented "fake" communications in preparation for the leap to MPI.
2015-03-29 20:35:37 +01:00
paboyle 886df6ed1e Update README.md 2015-03-07 07:20:12 +00:00
paboyle af0220a046 Update README.md 2015-03-07 07:20:12 +00:00
paboyle f04b6312c8 Update README 2015-03-07 07:19:01 +00:00
paboyle aef11025aa Update README 2015-03-07 07:19:01 +00:00
paboyle 1e08841fa5 Update INSTALL 2015-03-07 07:09:09 +00:00
paboyle 37a3b39a61 Update INSTALL 2015-03-07 07:09:09 +00:00
paboyle 483cab34ab Update AUTHORS 2015-03-07 07:00:39 +00:00
paboyle dd36baf3ce Update AUTHORS 2015-03-07 07:00:39 +00:00
Peter Boyle 82eeb9d07e Improving 2015-03-04 13:44:33 +00:00
Peter Boyle 613706dca2 Improving 2015-03-04 13:44:33 +00:00
Peter Boyle d9054f67fd remove 2015-03-04 13:43:48 +00:00
Peter Boyle e6e396b3f7 remove 2015-03-04 13:43:48 +00:00
Peter Boyle a8238d0b25 Move to better name 2015-03-04 13:43:19 +00:00
Peter Boyle b234865d77 Move to better name 2015-03-04 13:43:19 +00:00
Azusa Yamaguchi 4e8b9c6928 Changes for MIC 2015-03-04 13:25:23 +00:00
Azusa Yamaguchi 83f0dc19ff Changes for MIC 2015-03-04 13:25:23 +00:00
Peter Boyle 59eff71fc5 Extra files 2015-03-04 12:03:07 +00:00
Peter Boyle 2734107d5c Extra files 2015-03-04 12:03:07 +00:00
Peter Boyle d4a93ec7b4 missing 2015-03-04 11:58:45 +00:00
Peter Boyle b16e70dabc missing 2015-03-04 11:58:45 +00:00
Peter Boyle 96260b60b0 files 2015-03-04 11:57:14 +00:00
Peter Boyle a0b1e3afc9 files 2015-03-04 11:57:14 +00:00
Peter Boyle 63c7eb262e file 2015-03-04 11:55:44 +00:00
Peter Boyle 3ffaedce8e file 2015-03-04 11:55:44 +00:00
Peter Boyle 110ddd1900 install-sh distro 2015-03-04 11:54:46 +00:00
Peter Boyle e6df221165 install-sh distro 2015-03-04 11:54:46 +00:00
Peter Boyle 523abad40f Place them in to avoid forced autoreconf on user 2015-03-04 11:53:59 +00:00
Peter Boyle 3901b7d77a Place them in to avoid forced autoreconf on user 2015-03-04 11:53:59 +00:00
Peter Boyle d9cff588c5 Better openMP for Cshift 2015-03-04 11:50:59 +00:00
Peter Boyle 808066fdf1 Better openMP for Cshift 2015-03-04 11:50:59 +00:00
Peter Boyle 9e200b9d7a Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-04 11:44:18 +00:00
Peter Boyle 3530000ea6 Merge branch 'master' of https://github.com/paboyle/Grid 2015-03-04 11:44:18 +00:00
Peter Boyle 09582509e5 Improving benchmark to include Cshift 2015-03-04 11:44:05 +00:00
Peter Boyle b01fa51db0 Improving benchmark to include Cshift 2015-03-04 11:44:05 +00:00
Azusa Yamaguchi eb15a8dacb AVX2 fix 2015-03-04 11:38:10 +00:00
Azusa Yamaguchi d7f1aa522e AVX2 fix 2015-03-04 11:38:10 +00:00
Peter Boyle 5aefa56e5c Better organisation 2015-03-04 05:34:15 +00:00
Peter Boyle 4ba6d36ad6 Better organisation 2015-03-04 05:34:15 +00:00
Peter Boyle b1cb3f255d Update organisation 2015-03-04 05:33:26 +00:00
Peter Boyle 0021b72da9 Update organisation 2015-03-04 05:33:26 +00:00
Peter Boyle 1a1474b323 Better organisation 2015-03-04 05:31:44 +00:00
Peter Boyle fdb886fe1a Better organisation 2015-03-04 05:31:44 +00:00
Peter Boyle ad1b9b6ccf Cleaning
Merge branch 'master' of https://github.com/paboyle/Grid
2015-03-04 05:12:51 +00:00
Peter Boyle 3916caf7c4 Cleaning
Merge branch 'master' of https://github.com/paboyle/Grid
2015-03-04 05:12:51 +00:00
Peter Boyle b7e2881014 Better organisation 2015-03-04 05:12:19 +00:00
paboyle b0c9282fe9 Delete COPYING 2015-03-04 04:55:04 +00:00
paboyle 3cb295ece1 Delete COPYING 2015-03-04 04:55:04 +00:00
paboyle c802a03e07 Delete INSTALL 2015-03-04 04:54:50 +00:00
paboyle 46b66c933e Delete INSTALL 2015-03-04 04:54:50 +00:00
Peter Boyle c80c881db0 Updating build system 2015-03-04 04:53:40 +00:00
Peter Boyle 74f28edbe5 Build progressing 2015-03-04 04:34:51 +00:00
Peter Boyle 3c5f08a1d6 Build system progressing 2015-03-04 04:13:07 +00:00
Peter Boyle 523e3bd8d5 Autoconf starting 2015-03-04 03:42:59 +00:00
Peter Boyle 8b17cbf9d7 Initial commit of Grid to GitHub 2015-03-04 03:12:19 +00:00
paboyle 8144438023 Initial commit 2015-03-04 02:30:11 +00:00
1213 changed files with 204238 additions and 24879 deletions
+114 -6
@@ -1,8 +1,116 @@
-# Exclude directories
-_site
-.sass-cache
-.jekyll-metadata
-pdf
-# Exclude backup files
# Compiled Object files #
#########################
*.slo
*.lo
*.o
*.obj
# Editor files #
################
*~
*#
*.sublime-*
# Precompiled Headers #
#######################
*.gch
*.pch
# Compiled Dynamic libraries #
##############################
*.so
*.dylib
*.dll
# Fortran module files #
########################
*.mod
# Compiled Static libraries #
#############################
*.lai
*.la
*.a
*.lib
# Executables #
###############
*.exe
*.out
*.app
# http://www.gnu.org/software/automake #
########################################
Makefile.in
Makefile
Config.h
Config.h.in
config.log
config.status
.deps
Make.inc
eigen.inc
Eigen.inc
# http://www.gnu.org/software/autoconf #
########################################
autom4te.cache
aclocal.m4
compile
configure
depcomp
install-sh
missing
stamp-h1
config.sub
config.guess
INSTALL
.dirstamp
ltmain.sh
# Logs and databases #
######################
*.log
*.sql
*.sqlite
# OS generated files #
######################
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
.dirstamp
# build directory #
###################
build*/*
# IDE related files #
#####################
*.xcodeproj/*
build.sh
.vscode
*.code-workspace
# Eigen source #
################
Grid/Eigen
Eigen/*
# libtool macros #
##################
m4/lt*
m4/libtool.m4
# github pages #
################
gh-pages/
# generated sources #
#####################
Grid/qcd/spin/gamma-gen/*.h
Grid/qcd/spin/gamma-gen/*.cc
+61
@@ -0,0 +1,61 @@
language: cpp

cache:
  directories:
    - clang

matrix:
  include:
    - os: osx
      osx_image: xcode8.3
      compiler: clang
      env: PREC=single
    - os: osx
      osx_image: xcode8.3
      compiler: clang
      env: PREC=double

before_install:
  - export GRIDDIR=`pwd`
  - if [[ "$TRAVIS_OS_NAME" == "linux" ]] && [[ "$CC" == "clang" ]] && [ ! -e clang/bin ]; then wget $CLANG_LINK; tar -xf `basename $CLANG_LINK`; mkdir clang; mv clang+*/* clang/; fi
  - if [[ "$TRAVIS_OS_NAME" == "linux" ]] && [[ "$CC" == "clang" ]]; then export PATH="${GRIDDIR}/clang/bin:${PATH}"; fi
  - if [[ "$TRAVIS_OS_NAME" == "linux" ]] && [[ "$CC" == "clang" ]]; then export LD_LIBRARY_PATH="${GRIDDIR}/clang/lib:${LD_LIBRARY_PATH}"; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew update; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install libmpc openssl; fi

install:
  - export CWD=`pwd`
  - echo $CWD
  - export CC=$CC$VERSION
  - export CXX=$CXX$VERSION
  - echo $PATH
  - which autoconf
  - autoconf --version
  - which automake
  - automake --version
  - which $CC
  - $CC --version
  - which $CXX
  - $CXX --version
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then export LDFLAGS='-L/usr/local/lib'; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then export EXTRACONF='--with-openssl=/usr/local/opt/openssl'; fi

script:
  - ./bootstrap.sh
  - mkdir build
  - cd build
  - mkdir lime
  - cd lime
  - mkdir build
  - cd build
  - wget http://usqcd-software.github.io/downloads/c-lime/lime-1.3.2.tar.gz
  - tar xf lime-1.3.2.tar.gz
  - cd lime-1.3.2
  - ./configure --prefix=$CWD/build/lime/install
  - make -j4
  - make install
  - cd $CWD/build
  - ../configure --enable-precision=$PREC --enable-simd=SSE4 --enable-comms=none --with-lime=$CWD/build/lime/install ${EXTRACONF}
  - make -j4
  - ./benchmarks/Benchmark_dwf --threads 1 --debug-signals
  - make check
+4
@@ -0,0 +1,4 @@
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Peter Boyle <peterboyle@MacBook-Pro.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
-571
@@ -1,571 +0,0 @@
## [3.4.8](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.8)
### Enhancements
- Improve type readability for larger viewports by bumping up base `font-size`. [#533](https://github.com/mmistakes/minimal-mistakes/issues/533)
- Update Portuguese localized UI text. [#541](https://github.com/mmistakes/minimal-mistakes/pull/541)
- Add `page.title` and via parameter to Twitter share link. [#538](https://github.com/mmistakes/minimal-mistakes/pull/538)
### Bug Fixes
- Fix Last.fm author profile URL. [#540](https://github.com/mmistakes/minimal-mistakes/pull/540)
### Maintenance
- Move Brazilian Portuguese localized text under `pt-BR` key.
## [3.4.7](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.7)
### Enhancements
- Add `layout` based and user-defined class names to `<body>` element for added CSS hooks. [#526](https://github.com/mmistakes/minimal-mistakes/pull/526)
- Add simplified Chinese localized UI text. [#532](https://github.com/mmistakes/minimal-mistakes/pull/532)
### Bug Fixes
- Remove duplicate include of `base_path` in category-list.html [#522](https://github.com/mmistakes/minimal-mistakes/pull/522)
## [3.4.6](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.6)
### Enhancements
- Add Italian "comments" related localized UI text. [#514](https://github.com/mmistakes/minimal-mistakes/pull/514)
### Bug Fixes
- Disable `compress` HTML layout by default. To enable add `layout: compress` to `_layouts/default.html`.
## [3.4.5](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.5)
### Enhancements
- Improve line numbered code block styling when using `{% highlight linenos %}` tag. [#513](https://github.com/mmistakes/minimal-mistakes/issues/513)
- Add English fallback to "Follow" button label. [#496](https://github.com/mmistakes/minimal-mistakes/pull/496)
### Bug Fixes
- Fix Firefox alignment issues with code blocks generated with the `{% highlight %}` tag. [#512](https://github.com/mmistakes/minimal-mistakes/issues/512)
### Maintenance
- Clarified comment for `author.stackoverflow` value used in author sidebar links. [#487](https://github.com/mmistakes/minimal-mistakes/pull/487)
- Add list of localized text strings. [#488](https://github.com/mmistakes/minimal-mistakes/pull/488)
- Add `{% highlight %}` code block examples to demo site.
- Add documentation for using custom sidebar navigation menus. [#476](https://github.com/mmistakes/minimal-mistakes/issues/476)
## [3.4.4](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.4)
### Enhancements
- Add French "comments" related localized UI text. [#472](https://github.com/mmistakes/minimal-mistakes/pull/472)
### Bug Fixes
- Exclude `vendor` in Jekyll config file.
- Fix Liquid syntax error for offending parenthesis. [#479](https://github.com/mmistakes/minimal-mistakes/issues/479)
### Maintenance
- Update gems: `colorator` (1.1.0), `forwardable-extended` (2.6.0), `github-pages` (93), `jekyll` (= 3.2.1), `minima` (= 1.0.1).
## [3.4.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.3)
### Enhancements
- Make ["honeypot" `input`](https://github.com/mmistakes/minimal-mistakes/commit/06a8249a69a37dddda7e2a5bfbe32056c1a9a607) in Staticman comment form less obvious to spam bots
- Add padding to `.highlight` code blocks to better [align `overflow` scrollbar](https://github.com/mmistakes/minimal-mistakes/commit/e4abec0a6f7f8cff72505ca0754615df294fd5b3) to the bottom.
- Add additional image options for Twitter card social sharing meta tags. [#466](https://github.com/mmistakes/minimal-mistakes/pull/466)
- Add structured data markup for Staticman comments. [#458](https://github.com/mmistakes/minimal-mistakes/issues/458)
### Bug Fixes
- Format `og:locale` tag with `_` instead of `-`. [#462](https://github.com/mmistakes/minimal-mistakes/issues/462)
### Maintenance
- Add note to docs about using `url: http://localhost:4000` when working locally.
## [3.4.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.2)
### Enhancements
- Improve UX of static comment forms. [#448](https://github.com/mmistakes/minimal-mistakes/issues/448)
## [3.4.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.1)
### Enhancements
- Add `staticman.filename` configuration with a UNIX timestamp for sorting data files, e.g. `comment-1470943149`.
### Bug Fixes
- Don't add `<a>` to author name if URL is blank.
## [3.4.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.4.0)
### Enhancements
- Support static-based commenting via [Staticman](https://staticman.net/) for sites hosted with GitHub Pages. [#424](https://github.com/mmistakes/minimal-mistakes/issues/424)
## [3.3.7](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.7)
### Bug Fixes
- Re-enabled Jekyll plugins in `_config.yml` in case they aren't autoloaded in `Gemfile`. [#417](https://github.com/mmistakes/minimal-mistakes/issues/417)
### Enhancements
- Fall back to `site.github.url` for use in `{{ base_path }}` when `site.url` is `nil`.
- Replace Sass and Autoprefixer `npm` build scripts with [Jekyll's built-in asset support](https://jekyllrb.com/docs/assets/). [#333](https://github.com/mmistakes/minimal-mistakes/issues/333)
### Maintenance
- Document `site.repository` and its role with [`github-metadata`](https://github.com/jekyll/github-metadata) gem.
- Add sample [archive page with content](https://mmistakes.github.io/minimal-mistakes/archive-layout-with-content/) for testing styles on demo site.
## [3.3.6](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.6)
### Bug Fixes
- Fix blank `site.teaser` bug. [#412](https://github.com/mmistakes/minimal-mistakes/issues/412)
## [3.3.5](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.5)
### Enhancements
- Add English default text `site.locale` strings. [#407](https://github.com/mmistakes/minimal-mistakes/issues/407)
- Add Portuguese localized UI text. [#411](https://github.com/mmistakes/minimal-mistakes/pull/411)
- Add Italian localized UI text. [#409](https://github.com/mmistakes/minimal-mistakes/pull/409)
### Maintenance
- Remove unused Google AdSense variables in `_config.yml`. [#404](https://github.com/mmistakes/minimal-mistakes/issues/404)
- Update `Gemfile` instructions for using `github-pages` vs. native `jekyll` gems.
- Disable `gems:` in `_config.yml` and enable plugins with Bundler instead.
- Add `repository` to `_config.yml` to suppress GitHub Pages error `Liquid Exception: No repo name found.`
## [3.3.4](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.4)
### Enhancements
- Add support for configurable feed URL to use a service like FeedBurner instead of linking directly to `feed.xml` in `<head>` and the site footer. [#378](https://github.com/mmistakes/minimal-mistakes/issues/378), [#379](https://github.com/mmistakes/minimal-mistakes/pull/379), [#406](https://github.com/mmistakes/minimal-mistakes/pull/406)
- Add Turkish localized UI text. [#403](https://github.com/mmistakes/minimal-mistakes/pull/403)
### Maintenance
- Update gems: `activesupport` (4.2.7), `ffi` (1.9.14), `github-pages` (88), `jekyll-redirect-from` (0.11.0), `jekyll-watch` (1.5.0).
## [3.3.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.3)
### Enhancements
- Make footer stick to the bottom of the page.
### Bug Fixes
- Fix `gallery` size bug [#402](https://github.com/mmistakes/minimal-mistakes/issues/402)
### Maintenance
- Set default `lang` to `en`.
## [3.3.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.2)
### Bug Fixes
- Fix JavaScript that triggers "sticky" sidebar to avoid layout issues on screen sizes < `1024px`. [#396](https://github.com/mmistakes/minimal-mistakes/issues/396)
## [3.3.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.3.1)
### Enhancements
- Enable image popup on < 500px wide screens. [#385](https://github.com/mmistakes/minimal-mistakes/issues/385)
- Indicate the relationship between component URLs in a paginated series by applying `rel="prev"` and `rel="next"` to pages that use `site.paginator`. [#253](https://github.com/mmistakes/minimal-mistakes/issues/253)
- Improve link posts in archive listings. [#276](https://github.com/mmistakes/minimal-mistakes/issues/276)
### Maintenance
- Update gems: `github-pages` (86), `ffi` 1.9.13, `jekyll-mentions` 1.1.3, and `rouge` 1.11.1
- Fix note about custom sidebar content appearing below author profile. [#388](https://github.com/mmistakes/minimal-mistakes/issues/388)
## [3.2.13](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.13)
### Enhancements
- Add English default UI text for Canada, Great Britain, and Australia. [#377](https://github.com/mmistakes/minimal-mistakes/issues/377)
- Switch default locale from `en-US` to `en`.
## [3.2.12](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.12)
### Enhancements
- Remove window width "magic number" from sticky sidebar check in `main.js` for improved flexibility. [#375](https://github.com/mmistakes/minimal-mistakes/pull/375)
### Bug Fixes
- Fix author override conditional where a missing `authors.yml` would show broken sidebar content. Defaults to `site.author`. [#376](https://github.com/mmistakes/minimal-mistakes/pull/376)
## [3.2.11](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.11)
### Bug Fixes
- Fix disappearing author sidebar links [#372](https://github.com/mmistakes/minimal-mistakes/issues/372)
### Maintenance
- Update gems: `github-pages` (84), `jekyll-github-metadata` 2.0.2, and `kramdown` 1.11.1
- Update vendor JavaScript: jQuery 1.12.4, Stickyfill.js 1.1.4
- Update Font Awesome 4.6.3
## [3.2.10](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.10)
### Maintenance
- Add `CONTRIBUTING.md`
## [3.2.9](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.9)
### Enhancements
- Add support for [header overlay images](https://mmistakes.github.io/minimal-mistakes/docs/layouts/#header-overlay) for Open Graph images. [#358](https://github.com/mmistakes/minimal-mistakes/pull/358)
### Bug Fixes
- Fix typo in `Person` Schema.org type [#358](https://github.com/mmistakes/minimal-mistakes/pull/358)
### Maintenance
- Update `github-pages` gem and dependencies.
- Remove `minutes_read` to avoid awkward reading time wording [#356](https://github.com/mmistakes/minimal-mistakes/issues/356)
## [3.2.8](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.8)
### Bug Fixes
- Remove `cursor: pointer` that appears on white-space surrounding author side list items and links. [#354](https://github.com/mmistakes/minimal-mistakes/pull/354)
### Maintenance
- Add contributing information to `README.md`. [#357](https://github.com/mmistakes/minimal-mistakes/issues/357)
## [3.2.7](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.7)
### Enhancements
- Add French localized UI text. [#346](https://github.com/mmistakes/minimal-mistakes/pull/346)
### Bug Fixes
- Fix branch logic for Yandex and Alexa in `seo.html`. [#348](https://github.com/mmistakes/minimal-mistakes/pull/348)
## [3.2.6](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.6)
### Bug Fixes
- Fix error `Liquid Exception: divided by 0 in _includes/archive-single.html, included in _layouts/single.html` caused by null `words_per_minute` in `_config.yml`. [#345](https://github.com/mmistakes/minimal-mistakes/pull/345)
## [3.2.5](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.5)
### Bug Fixes
- Fix link color in hero overlay to be white.
- Remove underlines from archive item titles.
## [3.2.4](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.4)
### Enhancements
- Improve text alignment of masthead, hero overlay, page footer to be flush left and remove awkward white-space gaps. [#342](https://github.com/mmistakes/minimal-mistakes/issues/342)
- Add Spanish localized UI text. [#338](https://github.com/mmistakes/minimal-mistakes/pull/338)
### Bug Fixes
- Fix alignment of icons in author sidebar [#341](https://github.com/mmistakes/minimal-mistakes/issues/341)
### Maintenance
- Add background color to page footer to set it apart from main content. [#342](https://github.com/mmistakes/minimal-mistakes/issues/342)
- Add terms and privacy policy to theme's demo site. [#343](https://github.com/mmistakes/minimal-mistakes/issues/343)
- Update screenshots found in theme documentation.
## [3.2.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.3)
### Enhancements
- Add [Discourse](https://www.discourse.org/) as a commenting provider. [#335](https://github.com/mmistakes/minimal-mistakes/pull/335)
## [3.2.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.2)
### Enhancements
- Add support for image captions in Magnific Popup overlays via the [`gallery`](https://mmistakes.github.io/minimal-mistakes/docs/helpers/#gallery) helper. [#334](https://github.com/mmistakes/minimal-mistakes/issues/334)
## [3.2.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.1)
### Bug Fixes
- Remove need for "double tapping" masthead menu links on iOS devices. [#315](https://github.com/mmistakes/minimal-mistakes/issues/315)
### Maintenance
- Add `ISSUE_TEMPLATE.md` to improve the issue submission process.
## [3.2.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.2.0)
### Bug Fixes
- Fix missing category/tag links in post footer due to possible conflict with `site.tags` and `site.categories`. [#329](https://github.com/mmistakes/minimal-mistakes/issues/329#issuecomment-222375568)
## [3.1.8](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.8)
### Bug Fixes
- Fix `Liquid Exception: undefined method 'gsub' for nil:NilClass in _layouts/single.html` error when `page.title` is null. The `<h1>` element is now omitted if `title:` is not set for a `page` or collection item. [#312](https://github.com/mmistakes/minimal-mistakes/issues/312)
### Maintenance
- Remove duplicate `fa-twitter` and `fa-twitter-square` classes from `_utilities.scss`. [#302](https://github.com/mmistakes/minimal-mistakes/issues/302)
- Document installing additional Jekyll gem dependencies when using `gem "jekyll"` instead of `gem "github-pages"` to avoid any errors on run. [#305](https://github.com/mmistakes/minimal-mistakes/issues/305)
## [3.1.7](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.7)
### Enhancements
- Add translation key for "Recent Posts" used in home page `index.html`. [#316](https://github.com/mmistakes/minimal-mistakes/pull/316)
### Maintenance
- Small fix to avoid underlining the whitespace between icons and related text when hovering. [#303](https://github.com/mmistakes/minimal-mistakes/pull/303)
## [3.1.6](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.6)
### Maintenance
- Update gem dependencies. Run `bundle` to update `Gemfile.lock`.
## [3.1.5](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.5)
### Maintenance
- Fix `www` and `https` links in author profile include [#293](https://github.com/mmistakes/minimal-mistakes/pull/293)
## [3.1.4](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.4)
### Enhancements
- Add `overlay_filter` param to hero headers [#298](https://github.com/mmistakes/minimal-mistakes/pull/298)
## [3.1.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.3)
### Enhancements
- Improve `site.locale` documentation [#284](https://github.com/mmistakes/minimal-mistakes/issues/284)
- Remove ProTip note about protocol-less `site.url` as it is an anti-pattern [#288](https://github.com/mmistakes/minimal-mistakes/issues/288)
### Bug Fixes
- Fix `og_image` URL in seo.html [#277](https://github.com/mmistakes/minimal-mistakes/issues/277)
- Fix `author_profile` toggle when assigned in a `_layout` [#285](https://github.com/mmistakes/minimal-mistakes/issues/285)
- Fix typo in `build:all` npm script [#283](https://github.com/mmistakes/minimal-mistakes/pull/283)
- Fix URL typo in documentation [#287](https://github.com/mmistakes/minimal-mistakes/issues/287)
- Fix SEO author bug: if `twitter.username` is set and `author.twitter` is `nil`, bad things happen. [#289](https://github.com/mmistakes/minimal-mistakes/issues/289)
## [3.1.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.2)
### Enhancements
- Explain how to use `nav_list` helper in [documentation](https://mmistakes.github.io/minimal-mistakes/docs/helpers/#navigation-list).
- Reduce left/right padding on smaller screens to increase width of main content column.
### Bug Fixes
- Fix alignment issues with related posts [#273](https://github.com/mmistakes/minimal-mistakes/issues/273) and "Follow" button in author profile [#274](https://github.com/mmistakes/minimal-mistakes/issues/274).
## [3.1.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.1)
### Bug Fixes
- Fixed reading time bug when `words_per_minute` wasn't set in `_config.yml` [#271](https://github.com/mmistakes/minimal-mistakes/issues/271)
## [3.1.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.1.0)
### Enhancements
- Updated [Font Awesome](https://fortawesome.github.io/Font-Awesome/whats-new/) to version 4.6.1
- Added optional GitHub and Bitbucket links to footer if set on `site.author` in `_config.yml`.
### Bug Fixes
- Fixed Bitbucket URL typo in author sidebar.
## [3.0.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/3.0.3)
### Enhancements
- Rebuilt the entire theme: layouts, includes, stylesheets, scripts, you name it.
- Refreshed the look and feel while staying true to the original design of the theme (author sidebar/main content).
- Replaced grid system with [Susy](http://susy.oddbird.net/).
- Replaced Grunt tasks with `npm` scripts.
- Removed Google Fonts and replaced with system fonts to improve performance (they can be [added back](https://mmistakes.github.io/minimal-mistakes/docs/stylesheets/) if desired)
- Greatly improved [theme documentation](https://mmistakes.github.io/minimal-mistakes/docs/quick-start-guide/).
- Increased the number of sample posts, sample pages, and sample collections to thoroughly test the theme and edge cases.
- Moved all sample content and assets out of `master` to keep it as clean as possible for forking.
- Added new layouts for `splash` pages, archives for [`jekyll-archives`](https://github.com/jekyll/jekyll-archives) if enabled, and [`compress.html`](https://github.com/penibelst/jekyll-compress-html) to improve performance.
- Added taxonomy links to posts (tags and categories).
- Added optional "reading time" meta data.
- Improved Liquid used for Twitter Cards and Open Graph data in `<head>`.
- Improved `gallery` include helper and added `feature_row` for use with splash page layout.
- Added Keybase.io, author web URI, and Bitbucket optional links to sidebar.
- Add `feed.xml` link to footer.
- Added a [UI text data file](https://mmistakes.github.io/minimal-mistakes/docs/ui-text/) to easily change all text found in the theme.
- Added LinkedIn to optional social share buttons.
- Added Facebook, Google+, and custom commenting options in addition to Disqus.
- Added optional breadcrumb links.
## [2.2.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.2.1)
## [2.2.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.2.0)
### Enhancements
- Add support for Jekyll 3.0
- Minor updates to syntax highlighting CSS and theme documentation
## [2.1.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.1.3)
### Enhancements
- Cleaner print styles that remove the top navigation, social sharing buttons, and other elements not needed when printed.
## [2.1.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.1.2)
### Enhancements
- Add optional CodePen icon/url to author side bar [#156](https://github.com/mmistakes/minimal-mistakes/pull/156)
- Documented Stackoverflow username explanation in `_config.yml` [#157](https://github.com/mmistakes/minimal-mistakes/pull/157)
- Simplified Liquid in `post-index.html` to better handle year listings [#166](https://github.com/mmistakes/minimal-mistakes/pull/166)
### Bug Fixes
- Clean up Facebook-related Open Graph meta tags [#149](https://github.com/mmistakes/minimal-mistakes/issues/149)
- Corrected minor typos [#158](https://github.com/mmistakes/minimal-mistakes/pull/158) [#175](https://github.com/mmistakes/minimal-mistakes/issues/175)
## [2.1.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.1.1)
### Enhancements
- Add optional XING profile link to author sidebar
- Include Open Graph meta tags for feature image (if assigned) [#149](https://github.com/mmistakes/minimal-mistakes/issues/149)
- Create an include for feed footer
### Bug Fixes
- Remove http protocol from Google search form on sample 404 page
- Only show related posts if there are one or more available
- Fix alignment of email address link in author sidebar
## [2.1.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/2.1.0)
### Enhancements
- Add optional social sharing buttons ([#42](https://github.com/mmistakes/minimal-mistakes/issues/42))
![social sharing buttons](https://cloud.githubusercontent.com/assets/1376749/5860522/d9f28a96-a22f-11e4-9b83-940a3a9a766a.png)
- Add Soundcloud, YouTube ([#95](https://github.com/mmistakes/minimal-mistakes/pull/95)), Flickr ([#119](https://github.com/mmistakes/minimal-mistakes/pull/119)), and Weibo ([#116](https://github.com/mmistakes/minimal-mistakes/pull/116)) icons for use in author sidebar.
- Fix typos in posts and documentation and remove references to Less
- Include note about Octopress gem being optional
- Post author override support extended to the Atom feed ([#71](https://github.com/mmistakes/minimal-mistakes/pull/71))
- Only include email address in feed if specified in `_config.yml` or author `_data`
- Wrap all page content in `#main` to harmonize article and post index styles ([#86](https://github.com/mmistakes/minimal-mistakes/issues/86))
- Include new sample feature images for posts and pages
- Table of contents improvements: fix collapse toggle, indent nested elements, show on small screens, and create an `_include` for reusing in posts and pages.
- Include note about running Jekyll with `bundle exec` when using Bundler
- Fix home page path in top navigation
- Remove Google Authorship ([#120](https://github.com/mmistakes/minimal-mistakes/issues/120))
- Remove duplicate author content that displayed in `div.article-author-bottom`
- Removed unused `_sass/print.scss` styles
- Improve comments in `.scss` files
## [2.0.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/v2.0)
## [1.3.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.3.3)
### Enhancements
- Added new icons and profile links for Stackoverflow, Dribbble, Pinterest, Foursquare, and Steam to the author bio sidebar.
- Cleaned up the Kramdown auto table of contents styling to be more readable
- Removed page-width-specific `.less` stylesheets and created mixins for easier updating
- Removed Modernizr since it wasn't being used
- Added pages to sitemap.xml
- Added `category:` to the `rake new_post` task
- Minor typographic changes
### Bug Fixes
- Corrected various broken links in README and Theme Setup.
## [1.3.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.3.1)
### Enhancements
- Cleaned up table of contents styling
- Reworked top navigation to be a better experience on small screens. Nav items now display vertically when the menu button is tapped, revealing links with larger touch targets.
![menu animation](https://camo.githubusercontent.com/3fbd8c1326485f4b1ab32c0005c0fca7660b5d31/68747470733a2f2f662e636c6f75642e6769746875622e636f6d2f6173736574732f313337363734392f323136343037352f31653366303663322d393465372d313165332d383961612d6436623636376562306564662e676966)
## [1.2.0](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.2.0)
### Bug Fixes
- Tables weren't filling the entire width of the content container. They now scale to 100%. Thanks [@dhruvbhatia](https://github.com/dhruvbhatia)
### Enhancements
- Decreased spacing between Markdown footnotes
- Removed dark background on footer
- Removed UPPERCASE styling on post titles in the index listing
## [1.1.4](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.1.4)
### Bug Fixes
- Fix top navigation bug ([#10](https://github.com/mmistakes/minimal-mistakes/issues/10)) for real this time. Remember to clear your floats, kids.
## [1.1.3](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.1.3)
### Bug Fixes
- Fix top navigation links that weren't clickable on small viewports (Issue [#10](https://github.com/mmistakes/minimal-mistakes/issues/10)).
- Remove line wrap from top navigation links that may span multiple lines.
## [1.1.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.1.2)
### Enhancements
- Added Grunt build script for compiling Less/JavaScript and optimizing image assets.
- Added support for large image summary Twitter card.
- Stylesheet adjustments
## [1.1.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/1.1.1)
### Bug Fixes
- Removed [Typeplate](http://typeplate.com/) styles. Typeplate was [causing issues with newer versions of Less](https://github.com/typeplate/typeplate.github.io/issues/108) and is no longer maintained.
### Enhancements
- Added [image attribution](http://mmistakes.github.io/minimal-mistakes/theme-setup/#feature-images) for post and page feature images.
- Added [404 page](http://mmistakes.github.io/minimal-mistakes/404.html).
- Cleaned up various Less variables to better align with naming conventions used in other MM Jekyll themes.
- Removed Chrome Frame references.
- Added global CSS3 transitions to text and block elements.
- Improved typography in a few places.
## [1.0.2](https://github.com/mmistakes/minimal-mistakes/releases/tag/v1.0.2)
### Enhancements
- Google Analytics, Google Authorship, webmaster verification, and Twitter card meta are now optional.
## [1.0.1](https://github.com/mmistakes/minimal-mistakes/releases/tag/v1.0.1)
+80 -101
@@ -1,30 +1,14 @@
----
-layout: splash
-title : "GRID license"
-author_profile: false
-excerpt: "Grid is licensed under GPL 2.0"
-permalink: /license/
-header:
-overlay_color: "#333"
-cta_label: "GPL licenses FAQs"
-cta_url: "https://www.gnu.org/licenses/gpl-faq.html"
----
-{% include base_path %}
-GNU General Public License
-==========================
-_Version 2, June 1991_
-_Copyright © 1989, 1991 Free Software Foundation, Inc.,_
-_51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA_
-Everyone is permitted to copy and distribute verbatim copies
-of this license document, but changing it is not allowed.
-### Preamble
-The licenses for most software are designed to take away your
+GNU GENERAL PUBLIC LICENSE
+Version 2, June 1991
+Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+Preamble
+The licenses for most software are designed to take away your
 freedom to share and change it. By contrast, the GNU General Public
 License is intended to guarantee your freedom to share and change free
 software--to make sure the software is free for all its users. This
@@ -34,55 +18,56 @@ using it. (Some other Free Software Foundation software is covered by
 the GNU Lesser General Public License instead.) You can apply it to
 your programs, too.
 When we speak of free software, we are referring to freedom, not
 price. Our General Public Licenses are designed to make sure that you
 have the freedom to distribute copies of free software (and charge for
 this service if you wish), that you receive source code or can get it
 if you want it, that you can change the software or use pieces of it
 in new free programs; and that you know you can do these things.
 To protect your rights, we need to make restrictions that forbid
 anyone to deny you these rights or to ask you to surrender the rights.
 These restrictions translate to certain responsibilities for you if you
 distribute copies of the software, or if you modify it.
 For example, if you distribute copies of such a program, whether
 gratis or for a fee, you must give the recipients all the rights that
 you have. You must make sure that they, too, receive or can get the
 source code. And you must show them these terms so they know their
 rights.
-We protect your rights with two steps: **(1)** copyright the software, and
-**(2)** offer you this license which gives you legal permission to copy,
+We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
 distribute and/or modify the software.
 Also, for each author's protection and ours, we want to make certain
 that everyone understands that there is no warranty for this free
 software. If the software is modified by someone else and passed on, we
 want its recipients to know that what they have is not the original, so
 that any problems introduced by others will not reflect on the original
 authors' reputations.
 Finally, any free program is threatened constantly by software
 patents. We wish to avoid the danger that redistributors of a free
 program will individually obtain patent licenses, in effect making the
 program proprietary. To prevent this, we have made it clear that any
 patent must be licensed for everyone's free use or not licensed at all.
 The precise terms and conditions for copying, distribution and
 modification follow.
-### TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+GNU GENERAL PUBLIC LICENSE
+TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-**0.** This License applies to any program or other work which contains
+0. This License applies to any program or other work which contains
 a notice placed by the copyright holder saying it may be distributed
-under the terms of this General Public License. The Program, below,
-refers to any such program or work, and a work based on the Program
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
 means either the Program or any derivative work under copyright law:
 that is to say, a work containing the Program or a portion of it,
 either verbatim or with modifications and/or translated into another
 language. (Hereinafter, translation is included without limitation in
-the term modification.) Each licensee is addressed as you.
+the term "modification".) Each licensee is addressed as "you".
 Activities other than copying, distribution and modification are not
 covered by this License; they are outside its scope. The act of
@@ -91,7 +76,7 @@ is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
@@ -102,27 +87,29 @@ along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
@@ -144,24 +131,26 @@ with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
@@ -180,7 +169,7 @@ access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
@@ -188,7 +177,7 @@ However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
@@ -197,7 +186,7 @@ Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
@@ -205,7 +194,7 @@ restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
@@ -237,7 +226,7 @@ impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
@@ -245,20 +234,20 @@ those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
@@ -266,19 +255,19 @@ make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
@@ -288,18 +277,18 @@ YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
@@ -328,13 +317,13 @@ when it starts in an interactive mode:
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
@@ -349,13 +338,3 @@ consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.
{% for post in paginator.posts %}
{% include archive-single.html %}
{% endfor %}
{% include paginator.html %}
@@ -1,27 +0,0 @@
source "https://rubygems.org"
# Hello! This is where you manage which Jekyll version is used to run.
# When you want to use a different version, change it below, save the
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so:
#
# bundle exec jekyll serve
#
# This will help ensure the proper Jekyll version is running.
# Happy Jekylling!
gem "github-pages", group: :jekyll_plugins
# If you want to use Jekyll native, uncomment the line below.
# To upgrade, run `bundle update`.
# gem "jekyll"
gem "wdm", "~> 0.1.0" if Gem.win_platform?
# If you have any plugins, put them here!
group :jekyll_plugins do
# gem "jekyll-archives"
gem 'jekyll-octicons'
end
@@ -1,155 +0,0 @@
GEM
remote: https://rubygems.org/
specs:
activesupport (4.2.7)
i18n (~> 0.7)
json (~> 1.7, >= 1.7.7)
minitest (~> 5.1)
thread_safe (~> 0.3, >= 0.3.4)
tzinfo (~> 1.1)
addressable (2.4.0)
coffee-script (2.4.1)
coffee-script-source
execjs
coffee-script-source (1.10.0)
colorator (1.1.0)
ethon (0.9.1)
ffi (>= 1.3.0)
execjs (2.7.0)
faraday (0.9.2)
multipart-post (>= 1.2, < 3)
ffi (1.9.14)
ffi (1.9.14-x64-mingw32)
forwardable-extended (2.6.0)
gemoji (2.1.0)
github-pages (104)
activesupport (= 4.2.7)
github-pages-health-check (= 1.2.0)
jekyll (= 3.3.0)
jekyll-avatar (= 0.4.2)
jekyll-coffeescript (= 1.0.1)
jekyll-feed (= 0.8.0)
jekyll-gist (= 1.4.0)
jekyll-github-metadata (= 2.2.0)
jekyll-mentions (= 1.2.0)
jekyll-paginate (= 1.1.0)
jekyll-redirect-from (= 0.11.0)
jekyll-sass-converter (= 1.3.0)
jekyll-seo-tag (= 2.1.0)
jekyll-sitemap (= 0.12.0)
jekyll-swiss (= 0.4.0)
jemoji (= 0.7.0)
kramdown (= 1.11.1)
liquid (= 3.0.6)
listen (= 3.0.6)
mercenary (~> 0.3)
minima (= 2.0.0)
rouge (= 1.11.1)
terminal-table (~> 1.4)
github-pages-health-check (1.2.0)
addressable (~> 2.3)
net-dns (~> 0.8)
octokit (~> 4.0)
public_suffix (~> 1.4)
typhoeus (~> 0.7)
html-pipeline (2.4.2)
activesupport (>= 2)
nokogiri (>= 1.4)
i18n (0.7.0)
jekyll (3.3.0)
addressable (~> 2.4)
colorator (~> 1.0)
jekyll-sass-converter (~> 1.0)
jekyll-watch (~> 1.1)
kramdown (~> 1.3)
liquid (~> 3.0)
mercenary (~> 0.3.3)
pathutil (~> 0.9)
rouge (~> 1.7)
safe_yaml (~> 1.0)
jekyll-avatar (0.4.2)
jekyll (~> 3.0)
jekyll-coffeescript (1.0.1)
coffee-script (~> 2.2)
jekyll-feed (0.8.0)
jekyll (~> 3.3)
jekyll-gist (1.4.0)
octokit (~> 4.2)
jekyll-github-metadata (2.2.0)
jekyll (~> 3.1)
octokit (~> 4.0, != 4.4.0)
jekyll-mentions (1.2.0)
activesupport (~> 4.0)
html-pipeline (~> 2.3)
jekyll (~> 3.0)
jekyll-octicons (3.0.1)
jekyll (~> 3.1)
octicons (~> 3.0)
jekyll-paginate (1.1.0)
jekyll-redirect-from (0.11.0)
jekyll (>= 2.0)
jekyll-sass-converter (1.3.0)
sass (~> 3.2)
jekyll-seo-tag (2.1.0)
jekyll (~> 3.3)
jekyll-sitemap (0.12.0)
jekyll (~> 3.3)
jekyll-swiss (0.4.0)
jekyll-watch (1.5.0)
listen (~> 3.0, < 3.1)
jemoji (0.7.0)
activesupport (~> 4.0)
gemoji (~> 2.0)
html-pipeline (~> 2.2)
jekyll (>= 3.0)
json (1.8.3)
kramdown (1.11.1)
liquid (3.0.6)
listen (3.0.6)
rb-fsevent (>= 0.9.3)
rb-inotify (>= 0.9.7)
mercenary (0.3.6)
mini_portile2 (2.1.0)
minima (2.0.0)
minitest (5.9.1)
multipart-post (2.0.0)
net-dns (0.8.0)
nokogiri (1.6.8.1)
mini_portile2 (~> 2.1.0)
nokogiri (1.6.8.1-x64-mingw32)
mini_portile2 (~> 2.1.0)
octicons (3.0.1)
nokogiri (>= 1.6.3.1)
octokit (4.4.1)
sawyer (~> 0.7.0, >= 0.5.3)
pathutil (0.14.0)
forwardable-extended (~> 2.6)
public_suffix (1.5.3)
rb-fsevent (0.9.8)
rb-inotify (0.9.7)
ffi (>= 0.5.0)
rouge (1.11.1)
safe_yaml (1.0.4)
sass (3.4.22)
sawyer (0.7.0)
addressable (>= 2.3.5, < 2.5)
faraday (~> 0.8, < 0.10)
terminal-table (1.7.3)
unicode-display_width (~> 1.1.1)
thread_safe (0.3.5)
typhoeus (0.8.0)
ethon (>= 0.8.0)
tzinfo (1.2.2)
thread_safe (~> 0.1)
unicode-display_width (1.1.1)
PLATFORMS
ruby
x64-mingw32
DEPENDENCIES
github-pages
jekyll-octicons
BUNDLED WITH
1.13.3
@@ -0,0 +1,37 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/DisableWarnings.h
Copyright (C) 2016
Author: Guido Cossu <guido.cossu@ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef DISABLE_WARNINGS_H
#define DISABLE_WARNINGS_H
// Disables an Intel-compiler-specific warning (in json.hpp)
#pragma warning disable 488
#endif
@@ -0,0 +1,49 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Grid.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: azusayamaguchi <ayamaguc@YAMAKAZE.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
//
// Grid.h
// simd
//
// Created by Peter Boyle on 09/05/2014.
// Copyright (c) 2014 University of Edinburgh. All rights reserved.
//
#ifndef GRID_H
#define GRID_H
#include <Grid/GridCore.h>
#include <Grid/GridQCDcore.h>
#include <Grid/qcd/action/Action.h>
#include <Grid/qcd/utils/GaugeFix.h>
#include <Grid/qcd/smearing/Smearing.h>
#include <Grid/parallelIO/MetaData.h>
#include <Grid/qcd/hmc/HMC_aggregate.h>
#endif
@@ -0,0 +1,61 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Grid.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: azusayamaguchi <ayamaguc@YAMAKAZE.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
//
// Grid.h
// simd
//
// Created by Peter Boyle on 09/05/2014.
// Copyright (c) 2014 University of Edinburgh. All rights reserved.
//
#ifndef GRID_BASE_H
#define GRID_BASE_H
#include <Grid/GridStd.h>
#include <Grid/perfmon/Timer.h>
#include <Grid/perfmon/PerfCount.h>
#include <Grid/log/Log.h>
#include <Grid/allocator/AlignedAllocator.h>
#include <Grid/simd/Simd.h>
#include <Grid/serialisation/Serialisation.h>
#include <Grid/threads/Threads.h>
#include <Grid/util/Util.h>
#include <Grid/util/Sha.h>
#include <Grid/communicator/Communicator.h>
#include <Grid/cartesian/Cartesian.h>
#include <Grid/tensors/Tensors.h>
#include <Grid/lattice/Lattice.h>
#include <Grid/cshift/Cshift.h>
#include <Grid/stencil/Stencil.h>
#include <Grid/parallelIO/BinaryIO.h>
#include <Grid/algorithms/Algorithms.h>
#endif
@@ -0,0 +1,42 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Grid.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: azusayamaguchi <ayamaguc@YAMAKAZE.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_QCD_CORE_H
#define GRID_QCD_CORE_H
/////////////////////////
// Core Grid QCD headers
/////////////////////////
#include <Grid/GridCore.h>
#include <Grid/qcd/QCD.h>
#include <Grid/qcd/spin/Spin.h>
#include <Grid/qcd/utils/Utils.h>
#include <Grid/qcd/representations/Representations.h>
#endif
@@ -0,0 +1,29 @@
#ifndef GRID_STD_H
#define GRID_STD_H
///////////////////
// Std C++ dependencies
///////////////////
#include <cassert>
#include <complex>
#include <vector>
#include <string>
#include <iostream>
#include <iomanip>
#include <random>
#include <functional>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <ctime>
#include <sys/time.h>
#include <chrono>
#include <zlib.h>
///////////////////
// Grid config
///////////////////
#include "Config.h"
#endif /* GRID_STD_H */
@@ -0,0 +1,14 @@
#pragma once
// Force Eigen to use MKL if Grid has been configured with --enable-mkl
#ifdef USE_MKL
#define EIGEN_USE_MKL_ALL
#endif
#if defined __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif
#include <Grid/Eigen/Dense>
#if defined __GNUC__
#pragma GCC diagnostic pop
#endif
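The Eigen wrapper above silences `-Wdeprecated-declarations` only around the Eigen include. It does this by saving the diagnostic state with `push`, ignoring the warning, and restoring it with `pop`. Below is a minimal sketch of the same push/ignore/pop pattern. It assumes a hypothetical deprecated function `oldApi()` standing in for Eigen's headers. It is an illustration, not Grid code.

```cpp
// Sketch of scoped warning suppression; GCC and Clang both define __GNUC__.
// oldApi() is a hypothetical stand-in for a deprecated third-party call.
[[deprecated("hypothetical: use a newer API")]] inline int oldApi() { return 41; }

#if defined __GNUC__
#pragma GCC diagnostic push                                // save current warning state
#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // silence only this warning
#endif

// Calling the deprecated function inside the region produces no warning.
inline int wrappedCall() { return oldApi() + 1; }

#if defined __GNUC__
#pragma GCC diagnostic pop                                 // restore the saved state
#endif

// After the pop, a direct call to oldApi() would be diagnosed again.
```

Because the state is restored by `pop`, code that includes the wrapper still gets deprecation warnings for its own calls. Only the bracketed region is exempt.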
@@ -0,0 +1,63 @@
extra_sources=
extra_headers=
if BUILD_COMMS_MPI3
extra_sources+=communicator/Communicator_mpi3.cc
extra_sources+=communicator/Communicator_base.cc
extra_sources+=communicator/SharedMemoryMPI.cc
extra_sources+=communicator/SharedMemory.cc
endif
if BUILD_COMMS_NONE
extra_sources+=communicator/Communicator_none.cc
extra_sources+=communicator/Communicator_base.cc
extra_sources+=communicator/SharedMemoryNone.cc
extra_sources+=communicator/SharedMemory.cc
endif
if BUILD_HDF5
extra_sources+=serialisation/Hdf5IO.cc
extra_headers+=serialisation/Hdf5IO.h
extra_headers+=serialisation/Hdf5Type.h
endif
all: version-cache
version-cache:
@if [ `git status --porcelain | grep -v '??' | wc -l` -gt 0 ]; then\
a="uncommitted changes";\
else\
a="clean";\
fi;\
echo "`git log -n 1 --format=format:"#define GITHASH \\"%H:%d $$a\\"%n" HEAD`" > vertmp;\
if [ -e version-cache ]; then\
d=`diff vertmp version-cache`;\
if [ "$${d}" != "" ]; then\
mv vertmp version-cache;\
rm -f Version.h;\
fi;\
else\
mv vertmp version-cache;\
rm -f Version.h;\
fi;\
rm -f vertmp
Version.h:
cp version-cache Version.h
.PHONY: version-cache
#
# Libraries
#
include Make.inc
include Eigen.inc
lib_LIBRARIES = libGrid.a
CCFILES += $(extra_sources)
HFILES += $(extra_headers) Config.h Version.h
libGrid_a_SOURCES = $(CCFILES)
libGrid_adir = $(includedir)/Grid
nobase_dist_pkginclude_HEADERS = $(HFILES) $(eigen_files) $(eigen_unsupp_files)
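The `version-cache` rule above writes the current `GITHASH` line to a temporary file. It replaces `version-cache` and deletes `Version.h` only when that line differs, so `Version.h` is presumably regenerated (and its includers rebuilt) only when the git state changes. A minimal C++17 sketch of the same "write only if changed" idea follows. `writeIfChanged` is a hypothetical helper, not part of Grid.

```cpp
// Sketch (C++17): rewrite a generated file only when its content changes,
// mirroring the version-cache idea. writeIfChanged is hypothetical.
#include <filesystem>
#include <fstream>
#include <ios>
#include <sstream>
#include <string>

// Returns true if the file was (re)written, false if it already matched.
inline bool writeIfChanged(const std::filesystem::path& path,
                           const std::string& content) {
  if (std::filesystem::exists(path)) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream current;
    current << in.rdbuf();                      // slurp the existing file
    if (current.str() == content) return false; // identical: leave it untouched
  }
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << content;
  return static_cast<bool>(out);                // false if the write failed
}
```

Leaving an unchanged file untouched preserves its modification time. That is what keeps `make` from treating everything that includes it as out of date.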
@@ -0,0 +1,61 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Algorithms.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALGORITHMS_H
#define GRID_ALGORITHMS_H
#include <Grid/algorithms/SparseMatrix.h>
#include <Grid/algorithms/LinearOperator.h>
#include <Grid/algorithms/Preconditioner.h>
#include <Grid/algorithms/approx/Zolotarev.h>
#include <Grid/algorithms/approx/Chebyshev.h>
#include <Grid/algorithms/approx/Remez.h>
#include <Grid/algorithms/approx/MultiShiftFunction.h>
#include <Grid/algorithms/approx/Forecast.h>
#include <Grid/algorithms/iterative/Deflation.h>
#include <Grid/algorithms/iterative/ConjugateGradient.h>
#include <Grid/algorithms/iterative/ConjugateResidual.h>
#include <Grid/algorithms/iterative/NormalEquations.h>
#include <Grid/algorithms/iterative/SchurRedBlack.h>
#include <Grid/algorithms/iterative/ConjugateGradientMultiShift.h>
#include <Grid/algorithms/iterative/ConjugateGradientMixedPrec.h>
#include <Grid/algorithms/iterative/BlockConjugateGradient.h>
#include <Grid/algorithms/iterative/ConjugateGradientReliableUpdate.h>
#include <Grid/algorithms/iterative/ImplicitlyRestartedLanczos.h>
#include <Grid/algorithms/CoarsenedMatrix.h>
#include <Grid/algorithms/FFT.h>
// EigCg
// Pcg
// Hdcg
// GCR
// etc..
#endif
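Algorithms.h is the umbrella header that pulls in LinearOperator.h and the iterative solvers. Krylov solvers such as `ConjugateGradient` see the operator only through a `LinearOperatorBase`, whose comment block lists the two properties they rely on (linearity, and `AdjOp` being the Hermitian conjugate of `Op`) and notes that a test for them would be useful. A minimal sketch of such a test follows, assuming a plain `std::vector<std::complex<double>>` field and a dense stand-in operator. `DenseOp`, `Inner` and `CheckLinearAndAdjoint` are hypothetical names, not Grid API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

using Complex = std::complex<double>;
using Vec = std::vector<Complex>;

// Hypothetical stand-in for LinearOperatorBase<Field>: a dense n x n matrix
// (row-major) with Op: out = M in and AdjOp: out = M^dagger in.
struct DenseOp {
  std::size_t n;
  std::vector<Complex> m;
  void Op(const Vec &in, Vec &out) const {
    out.assign(n, Complex(0.0));
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) out[i] += m[i * n + j] * in[j];
  }
  void AdjOp(const Vec &in, Vec &out) const {
    out.assign(n, Complex(0.0));
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) out[i] += std::conj(m[j * n + i]) * in[j];
  }
};

// <a|b> = sum_i conj(a_i) b_i
Complex Inner(const Vec &a, const Vec &b) {
  Complex s(0.0);
  for (std::size_t i = 0; i < a.size(); ++i) s += std::conj(a[i]) * b[i];
  return s;
}

// Largest relative violation of  i) F(a x + b y) = a F(x) + b F(y)
// and  ii) <x|Op|y> = <y|AdjOp|x>^*, measured on random complex vectors.
template <class Operator>
double CheckLinearAndAdjoint(const Operator &op, std::size_t n, unsigned seed = 1) {
  assert(n > 0);
  std::mt19937 rng(seed);
  std::normal_distribution<double> gauss(0.0, 1.0);
  auto random_vec = [&] {
    Vec v(n);
    for (auto &z : v) z = Complex(gauss(rng), gauss(rng));
    return v;
  };
  const Vec x = random_vec(), y = random_vec();
  const Complex a(0.3, -1.2), b(2.0, 0.5);
  Vec comb(n), Fx, Fy, Fcomb, Adx;
  for (std::size_t i = 0; i < n; ++i) comb[i] = a * x[i] + b * y[i];
  op.Op(x, Fx);
  op.Op(y, Fy);
  op.Op(comb, Fcomb);
  double diff = 0.0, ref = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    diff += std::norm(Fcomb[i] - (a * Fx[i] + b * Fy[i]));
    ref += std::norm(Fcomb[i]);
  }
  const double errLinear = std::sqrt(diff / (ref > 0.0 ? ref : 1.0));
  op.AdjOp(x, Adx);
  const Complex lhs = Inner(x, Fy);              // <x|Op|y>
  const Complex rhs = std::conj(Inner(y, Adx));  // <y|AdjOp|x>^*
  const double errAdjoint = std::abs(lhs - rhs) / std::max(std::abs(lhs), 1.0);
  return std::max(errLinear, errAdjoint);
}
```

For a correct operator both errors sit at rounding level. A Grid version would presumably use `gaussian`, `innerProduct` and a `LinearOperatorBase` in place of the dense matrix, which is here only to keep the sketch self-contained.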
@@ -0,0 +1,480 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/CoarsenedMatrix.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Peter Boyle <peterboyle@Peters-MacBook-Pro-2.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALGORITHM_COARSENED_MATRIX_H
#define GRID_ALGORITHM_COARSENED_MATRIX_H
namespace Grid {
class Geometry {
// int dimension;
public:
int npoint;
std::vector<int> directions ;
std::vector<int> displacements;
Geometry(int _d) {
int base = (_d==5) ? 1:0;
// make the coarse-grid stencil 4d, not 5d
if ( _d==5 ) _d=4;
npoint = 2*_d+1;
directions.resize(npoint);
displacements.resize(npoint);
for(int d=0;d<_d;d++){
directions[2*d ] = d+base;
directions[2*d+1] = d+base;
displacements[2*d ] = +1;
displacements[2*d+1] = -1;
}
directions [2*_d]=0;
displacements[2*_d]=0;
//// report back
std::cout<<GridLogMessage<<"directions :";
for(int d=0;d<npoint;d++) std::cout<< directions[d]<< " ";
std::cout <<std::endl;
std::cout<<GridLogMessage<<"displacements :";
for(int d=0;d<npoint;d++) std::cout<< displacements[d]<< " ";
std::cout<<std::endl;
}
/*
// Original cleaner code
Geometry(int _d) : dimension(_d), npoint(2*_d+1), directions(npoint), displacements(npoint) {
for(int d=0;d<dimension;d++){
directions[2*d ] = d;
directions[2*d+1] = d;
displacements[2*d ] = +1;
displacements[2*d+1] = -1;
}
directions [2*dimension]=0;
displacements[2*dimension]=0;
}
std::vector<int> GetDelta(int point) {
std::vector<int> delta(dimension,0);
delta[directions[point]] = displacements[point];
return delta;
};
*/
};
template<class Fobj,class CComplex,int nbasis>
class Aggregation {
public:
typedef iVector<CComplex,nbasis > siteVector;
typedef Lattice<siteVector> CoarseVector;
typedef Lattice<iMatrix<CComplex,nbasis > > CoarseMatrix;
typedef Lattice< CComplex > CoarseScalar; // used for inner products on fine field
typedef Lattice<Fobj > FineField;
GridBase *CoarseGrid;
GridBase *FineGrid;
std::vector<Lattice<Fobj> > subspace;
int checkerboard;
Aggregation(GridBase *_CoarseGrid,GridBase *_FineGrid,int _checkerboard) :
CoarseGrid(_CoarseGrid),
FineGrid(_FineGrid),
subspace(nbasis,_FineGrid),
checkerboard(_checkerboard)
{
};
void Orthogonalise(void){
CoarseScalar InnerProd(CoarseGrid);
std::cout << GridLogMessage <<" Gram-Schmidt pass 1"<<std::endl;
blockOrthogonalise(InnerProd,subspace);
std::cout << GridLogMessage <<" Gram-Schmidt pass 2"<<std::endl;
blockOrthogonalise(InnerProd,subspace);
// std::cout << GridLogMessage <<" Gram-Schmidt checking orthogonality"<<std::endl;
// CheckOrthogonal();
}
void CheckOrthogonal(void){
CoarseVector iProj(CoarseGrid);
CoarseVector eProj(CoarseGrid);
for(int i=0;i<nbasis;i++){
blockProject(iProj,subspace[i],subspace);
eProj=zero;
parallel_for(int ss=0;ss<CoarseGrid->oSites();ss++){
eProj._odata[ss](i)=CComplex(1.0);
}
eProj=eProj - iProj;
std::cout<<GridLogMessage<<"Orthog check error "<<i<<" " << norm2(eProj)<<std::endl;
}
std::cout<<GridLogMessage <<"CheckOrthog done"<<std::endl;
}
void ProjectToSubspace(CoarseVector &CoarseVec,const FineField &FineVec){
blockProject(CoarseVec,FineVec,subspace);
}
void PromoteFromSubspace(const CoarseVector &CoarseVec,FineField &FineVec){
FineVec.checkerboard = subspace[0].checkerboard;
blockPromote(CoarseVec,FineVec,subspace);
}
void CreateSubspaceRandom(GridParallelRNG &RNG){
for(int i=0;i<nbasis;i++){
random(RNG,subspace[i]);
std::cout<<GridLogMessage<<" norm subspace["<<i<<"] "<<norm2(subspace[i])<<std::endl;
}
Orthogonalise();
}
/*
virtual void CreateSubspaceLanczos(GridParallelRNG &RNG,LinearOperatorBase<FineField> &hermop,int nn=nbasis)
{
// Run a Lanczos with sloppy convergence
const int Nstop = nn;
const int Nk = nn+20;
const int Np = nn+20;
const int Nm = Nk+Np;
const int MaxIt= 10000;
RealD resid = 1.0e-3;
Chebyshev<FineField> Cheb(0.5,64.0,21);
ImplicitlyRestartedLanczos<FineField> IRL(hermop,Cheb,Nstop,Nk,Nm,resid,MaxIt);
// IRL.lock = 1;
FineField noise(FineGrid); gaussian(RNG,noise);
FineField tmp(FineGrid);
std::vector<RealD> eval(Nm);
std::vector<FineField> evec(Nm,FineGrid);
int Nconv;
IRL.calc(eval,evec,
noise,
Nconv);
// pull back nn vectors
for(int b=0;b<nn;b++){
subspace[b] = evec[b];
std::cout << GridLogMessage <<"subspace["<<b<<"] = "<<norm2(subspace[b])<<std::endl;
hermop.Op(subspace[b],tmp);
std::cout<<GridLogMessage << "filtered["<<b<<"] <f|MdagM|f> "<<norm2(tmp)<<std::endl;
noise = tmp - sqrt(eval[b])*subspace[b] ;
std::cout<<GridLogMessage << " lambda_"<<b<<" = "<< eval[b] <<" ; [ M - Lambda ]_"<<b<<" vec_"<<b<<" = " <<norm2(noise)<<std::endl;
noise = tmp + eval[b]*subspace[b] ;
std::cout<<GridLogMessage << " lambda_"<<b<<" = "<< eval[b] <<" ; [ M - Lambda ]_"<<b<<" vec_"<<b<<" = " <<norm2(noise)<<std::endl;
}
Orthogonalise();
for(int b=0;b<nn;b++){
std::cout << GridLogMessage <<"subspace["<<b<<"] = "<<norm2(subspace[b])<<std::endl;
}
}
*/
virtual void CreateSubspace(GridParallelRNG &RNG,LinearOperatorBase<FineField> &hermop,int nn=nbasis) {
RealD scale;
ConjugateGradient<FineField> CG(1.0e-2,10000);
FineField noise(FineGrid);
FineField Mn(FineGrid);
for(int b=0;b<nn;b++){
gaussian(RNG,noise);
scale = std::pow(norm2(noise),-0.5);
noise=noise*scale;
hermop.Op(noise,Mn); std::cout<<GridLogMessage << "noise ["<<b<<"] <n|MdagM|n> "<<norm2(Mn)<<std::endl;
for(int i=0;i<1;i++){
CG(hermop,noise,subspace[b]);
noise = subspace[b];
scale = std::pow(norm2(noise),-0.5);
noise=noise*scale;
}
hermop.Op(noise,Mn); std::cout<<GridLogMessage << "filtered["<<b<<"] <f|MdagM|f> "<<norm2(Mn)<<std::endl;
subspace[b] = noise;
}
Orthogonalise();
}
};
// Fine Object == (per site) type of fine field
// nbasis == number of deflation vectors
template<class Fobj,class CComplex,int nbasis>
class CoarsenedMatrix : public SparseMatrixBase<Lattice<iVector<CComplex,nbasis > > > {
public:
typedef iVector<CComplex,nbasis > siteVector;
typedef Lattice<siteVector> CoarseVector;
typedef Lattice<iMatrix<CComplex,nbasis > > CoarseMatrix;
typedef Lattice< CComplex > CoarseScalar; // used for inner products on fine field
typedef Lattice<Fobj > FineField;
////////////////////
// Data members
////////////////////
Geometry geom;
GridBase * _grid;
CartesianStencil<siteVector,siteVector> Stencil;
std::vector<CoarseMatrix> A;
///////////////////////
// Interface
///////////////////////
GridBase * Grid(void) { return _grid; }; // this is all the linalg routines need to know
RealD M (const CoarseVector &in, CoarseVector &out){
conformable(_grid,in._grid);
conformable(in._grid,out._grid);
SimpleCompressor<siteVector> compressor;
Stencil.HaloExchange(in,compressor);
parallel_for(int ss=0;ss<Grid()->oSites();ss++){
siteVector res = zero;
siteVector nbr;
int ptype;
StencilEntry *SE;
for(int point=0;point<geom.npoint;point++){
SE=Stencil.GetEntry(ptype,point,ss);
if(SE->_is_local&&SE->_permute) {
permute(nbr,in._odata[SE->_offset],ptype);
} else if(SE->_is_local) {
nbr = in._odata[SE->_offset];
} else {
nbr = Stencil.CommBuf()[SE->_offset];
}
res = res + A[point]._odata[ss]*nbr;
}
vstream(out._odata[ss],res);
}
return norm2(out);
};
RealD Mdag (const CoarseVector &in, CoarseVector &out){
return M(in,out);
};
// Defer support for further coarsening for now
void Mdiag (const CoarseVector &in, CoarseVector &out){};
void Mdir (const CoarseVector &in, CoarseVector &out,int dir, int disp){};
CoarsenedMatrix(GridCartesian &CoarseGrid) :
_grid(&CoarseGrid),
geom(CoarseGrid._ndimension),
Stencil(&CoarseGrid,geom.npoint,Even,geom.directions,geom.displacements),
A(geom.npoint,&CoarseGrid)
{
};
void CoarsenOperator(GridBase *FineGrid,LinearOperatorBase<Lattice<Fobj> > &linop,
Aggregation<Fobj,CComplex,nbasis> & Subspace){
FineField iblock(FineGrid); // contributions from within this block
FineField oblock(FineGrid); // contributions from outwith this block
FineField phi(FineGrid);
FineField tmp(FineGrid);
FineField zz(FineGrid); zz=zero;
FineField Mphi(FineGrid);
Lattice<iScalar<vInteger> > coor(FineGrid);
CoarseVector iProj(Grid());
CoarseVector oProj(Grid());
CoarseScalar InnerProd(Grid());
// Orthogonalise the subblocks over the basis
blockOrthogonalise(InnerProd,Subspace.subspace);
// Compute the matrix elements of linop between this orthonormal
// set of vectors.
int self_stencil=-1;
for(int p=0;p<geom.npoint;p++){
A[p]=zero;
if( geom.displacements[p]==0){
self_stencil=p;
}
}
assert(self_stencil!=-1);
for(int i=0;i<nbasis;i++){
phi=Subspace.subspace[i];
std::cout<<GridLogMessage<<"("<<i<<").."<<std::endl;
for(int p=0;p<geom.npoint;p++){
int dir = geom.directions[p];
int disp = geom.displacements[p];
Integer block=(FineGrid->_rdimensions[dir])/(Grid()->_rdimensions[dir]);
LatticeCoordinate(coor,dir);
if ( disp==0 ){
linop.OpDiag(phi,Mphi);
}
else {
linop.OpDir(phi,Mphi,dir,disp);
}
////////////////////////////////////////////////////////////////////////
// Pick out contributions coming from this cell and neighbour cell
////////////////////////////////////////////////////////////////////////
if ( disp==0 ) {
iblock = Mphi;
oblock = zero;
} else if ( disp==1 ) {
oblock = where(mod(coor,block)==(block-1),Mphi,zz);
iblock = where(mod(coor,block)!=(block-1),Mphi,zz);
} else if ( disp==-1 ) {
oblock = where(mod(coor,block)==(Integer)0,Mphi,zz);
iblock = where(mod(coor,block)!=(Integer)0,Mphi,zz);
} else {
assert(0);
}
Subspace.ProjectToSubspace(iProj,iblock);
Subspace.ProjectToSubspace(oProj,oblock);
// blockProject(iProj,iblock,Subspace.subspace);
// blockProject(oProj,oblock,Subspace.subspace);
parallel_for(int ss=0;ss<Grid()->oSites();ss++){
for(int j=0;j<nbasis;j++){
if( disp!= 0 ) {
A[p]._odata[ss](j,i) = oProj._odata[ss](j);
}
A[self_stencil]._odata[ss](j,i) = A[self_stencil]._odata[ss](j,i) + iProj._odata[ss](j);
}
}
}
}
#if 0
///////////////////////////
// test code worth preserving in if block
///////////////////////////
std::cout<<GridLogMessage<< " Computed matrix elements "<< self_stencil <<std::endl;
for(int p=0;p<geom.npoint;p++){
std::cout<<GridLogMessage<< "A["<<p<<"]" << std::endl;
std::cout<<GridLogMessage<< A[p] << std::endl;
}
std::cout<<GridLogMessage<< " picking by block0 "<< self_stencil <<std::endl;
phi=Subspace.subspace[0];
std::vector<int> bc(FineGrid->_ndimension,0);
blockPick(Grid(),phi,tmp,bc); // Pick out a block
linop.Op(tmp,Mphi); // Apply big dop
blockProject(iProj,Mphi,Subspace.subspace); // project it and print it
std::cout<<GridLogMessage<< " Computed matrix elements from block zero only "<<std::endl;
std::cout<<GridLogMessage<< iProj <<std::endl;
std::cout<<GridLogMessage<<"Computed Coarse Operator"<<std::endl;
#endif
// ForceHermitian();
AssertHermitian();
// ForceDiagonal();
}
void ForceDiagonal(void) {
std::cout<<GridLogMessage<<"**************************************************"<<std::endl;
std::cout<<GridLogMessage<<"**** Forcing coarse operator to be diagonal ****"<<std::endl;
std::cout<<GridLogMessage<<"**************************************************"<<std::endl;
for(int p=0;p<8;p++){
A[p]=zero;
}
GridParallelRNG RNG(Grid()); RNG.SeedFixedIntegers(std::vector<int>({55,72,19,17,34}));
Lattice<iScalar<CComplex> > val(Grid()); random(RNG,val);
Complex one(1.0);
iMatrix<CComplex,nbasis> ident; ident=one;
val = val*adj(val);
val = val + 1.0;
A[8] = val*ident;
// for(int s=0;s<Grid()->oSites();s++) {
// A[8]._odata[s]=val._odata[s];
// }
}
void ForceHermitian(void) {
for(int d=0;d<4;d++){
int dd=d+1;
A[2*d] = adj(Cshift(A[2*d+1],dd,1));
}
// A[8] = 0.5*(A[8] + adj(A[8]));
}
void AssertHermitian(void) {
CoarseMatrix AA (Grid());
CoarseMatrix AAc (Grid());
CoarseMatrix Diff (Grid());
for(int d=0;d<4;d++){
int dd=d+1;
AAc = Cshift(A[2*d+1],dd,1);
AA = A[2*d];
Diff = AA - adj(AAc);
std::cout<<GridLogMessage<<"Norm diff dim "<<d<<" "<< norm2(Diff)<<std::endl;
std::cout<<GridLogMessage<<"Norm dim "<<d<<" "<< norm2(AA)<<std::endl;
}
Diff = A[8] - adj(A[8]);
std::cout<<GridLogMessage<<"Norm diff local "<< norm2(Diff)<<std::endl;
std::cout<<GridLogMessage<<"Norm local "<< norm2(A[8])<<std::endl;
}
};
}
#endif
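`CoarsenOperator` builds the coarse stencil by applying `OpDiag`/`OpDir` to each basis vector and then using the `where(mod(coor,block)...)` masks to split the result. Sites whose hop stays inside the block feed the self term `A[self_stencil]`, while sites on the block face feed the neighbour term `A[p]`. A minimal 1D sketch of that split follows, assuming a single block-constant basis vector (`nbasis = 1`) and a toy nearest-neighbour operator. `ToyFineOp`, `CoarseStencil1D` and `Coarsen` are hypothetical names, not Grid API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 1D periodic fine operator (not Grid API):
//   (M phi)(x) = diag*phi(x) + hop*(phi(x+1) + phi(x-1)).
// OpDiag / OpDir mimic the split used by LinearOperatorBase.
struct ToyFineOp {
  int N;
  double diag, hop;
  std::vector<double> OpDiag(const std::vector<double> &phi) const {
    std::vector<double> out(N);
    for (int x = 0; x < N; ++x) out[x] = diag * phi[x];
    return out;
  }
  // Hopping term with displacement disp = +1 or -1 (periodic boundary).
  std::vector<double> OpDir(const std::vector<double> &phi, int disp) const {
    std::vector<double> out(N);
    for (int x = 0; x < N; ++x) out[x] = hop * phi[((x + disp) % N + N) % N];
    return out;
  }
};

// Coarse stencil for nbasis = 1: self term A0, neighbour terms Ap (+1) and Am (-1).
struct CoarseStencil1D {
  std::vector<double> A0, Ap, Am;
};

CoarseStencil1D Coarsen(const ToyFineOp &op, int block) {
  assert(op.N % block == 0);
  const int Nc = op.N / block;
  // Block-constant basis vector, normalised to 1 on every block.
  const std::vector<double> phi(op.N, 1.0 / std::sqrt(double(block)));
  CoarseStencil1D A{std::vector<double>(Nc, 0.0), std::vector<double>(Nc, 0.0),
                    std::vector<double>(Nc, 0.0)};
  for (int disp : {0, +1, -1}) {
    const std::vector<double> Mphi = (disp == 0) ? op.OpDiag(phi) : op.OpDir(phi, disp);
    for (int x = 0; x < op.N; ++x) {
      const int c = x / block;  // coarse site owning fine site x
      const int r = x % block;  // position inside the block, i.e. mod(coor,block)
      // Site x reads phi(x+disp) from the neighbouring block only on the block face.
      const bool fromNeighbour = (disp == +1 && r == block - 1) || (disp == -1 && r == 0);
      const double proj = phi[x] * Mphi[x];  // contribution to <phi_c | M phi>
      if (!fromNeighbour) A.A0[c] += proj;   // iblock -> self stencil
      else if (disp == +1) A.Ap[c] += proj;  // oblock -> A[+1]
      else A.Am[c] += proj;                  // oblock -> A[-1]
    }
  }
  return A;
}
```

With `N = 8`, `block = 4`, `diag = 4` and `hop = -1`, each block gets `A0 = 2.5` and `Ap = Am = -0.25`. The coarse operator then acts as out(c) = A0(c) in(c) + Ap(c) in(c+1) + Am(c) in(c-1), the 1D analogue of the stencil sum in `CoarsenedMatrix::M`. The relation `Ap(c) = conj(Am(c+1))` is the 1D form of what `AssertHermitian` checks.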
@@ -0,0 +1,306 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/FFT.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef _GRID_FFT_H_
#define _GRID_FFT_H_
#ifdef HAVE_FFTW
#ifdef USE_MKL
#include <fftw/fftw3.h>
#else
#include <fftw3.h>
#endif
#endif
namespace Grid {
template<class scalar> struct FFTW { };
#ifdef HAVE_FFTW
template<> struct FFTW<ComplexD> {
public:
typedef fftw_complex FFTW_scalar;
typedef fftw_plan FFTW_plan;
static FFTW_plan fftw_plan_many_dft(int rank, const int *n,int howmany,
FFTW_scalar *in, const int *inembed,
int istride, int idist,
FFTW_scalar *out, const int *onembed,
int ostride, int odist,
int sign, unsigned flags) {
return ::fftw_plan_many_dft(rank,n,howmany,in,inembed,istride,idist,out,onembed,ostride,odist,sign,flags);
}
static void fftw_flops(const FFTW_plan p,double *add, double *mul, double *fmas){
::fftw_flops(p,add,mul,fmas);
}
inline static void fftw_execute_dft(const FFTW_plan p,FFTW_scalar *in,FFTW_scalar *out) {
::fftw_execute_dft(p,in,out);
}
inline static void fftw_destroy_plan(const FFTW_plan p) {
::fftw_destroy_plan(p);
}
};
template<> struct FFTW<ComplexF> {
public:
typedef fftwf_complex FFTW_scalar;
typedef fftwf_plan FFTW_plan;
static FFTW_plan fftw_plan_many_dft(int rank, const int *n,int howmany,
FFTW_scalar *in, const int *inembed,
int istride, int idist,
FFTW_scalar *out, const int *onembed,
int ostride, int odist,
int sign, unsigned flags) {
return ::fftwf_plan_many_dft(rank,n,howmany,in,inembed,istride,idist,out,onembed,ostride,odist,sign,flags);
}
static void fftw_flops(const FFTW_plan p,double *add, double *mul, double *fmas){
::fftwf_flops(p,add,mul,fmas);
}
inline static void fftw_execute_dft(const FFTW_plan p,FFTW_scalar *in,FFTW_scalar *out) {
::fftwf_execute_dft(p,in,out);
}
inline static void fftw_destroy_plan(const FFTW_plan p) {
::fftwf_destroy_plan(p);
}
};
#endif
#ifndef FFTW_FORWARD
#define FFTW_FORWARD (-1)
#define FFTW_BACKWARD (+1)
#endif
class FFT {
private:
GridCartesian *vgrid;
GridCartesian *sgrid;
int Nd;
double flops;
double flops_call;
uint64_t usec;
std::vector<int> dimensions;
std::vector<int> processors;
std::vector<int> processor_coor;
public:
static const int forward=FFTW_FORWARD;
static const int backward=FFTW_BACKWARD;
double Flops(void) {return flops;}
double MFlops(void) {return flops/usec;}
double USec(void) {return (double)usec;}
FFT ( GridCartesian * grid ) :
vgrid(grid),
Nd(grid->_ndimension),
dimensions(grid->_fdimensions),
processors(grid->_processors),
processor_coor(grid->_processor_coor)
{
flops=0;
usec =0;
std::vector<int> layout(Nd,1);
sgrid = new GridCartesian(dimensions,layout,processors);
};
~FFT ( void) {
delete sgrid;
}
template<class vobj>
void FFT_dim_mask(Lattice<vobj> &result,const Lattice<vobj> &source,std::vector<int> mask,int sign){
conformable(result._grid,vgrid);
conformable(source._grid,vgrid);
Lattice<vobj> tmp(vgrid);
tmp = source;
for(int d=0;d<Nd;d++){
if( mask[d] ) {
FFT_dim(result,tmp,d,sign);
tmp=result;
}
}
}
template<class vobj>
void FFT_all_dim(Lattice<vobj> &result,const Lattice<vobj> &source,int sign){
std::vector<int> mask(Nd,1);
FFT_dim_mask(result,source,mask,sign);
}
template<class vobj>
void FFT_dim(Lattice<vobj> &result,const Lattice<vobj> &source,int dim, int sign){
#ifndef HAVE_FFTW
assert(0);
#else
conformable(result._grid,vgrid);
conformable(source._grid,vgrid);
int L = vgrid->_ldimensions[dim];
int G = vgrid->_fdimensions[dim];
std::vector<int> layout(Nd,1);
std::vector<int> pencil_gd(vgrid->_fdimensions);
pencil_gd[dim] = G*processors[dim];
// Pencil global vol LxLxGxLxL per node
GridCartesian pencil_g(pencil_gd,layout,processors);
// Construct pencils
typedef typename vobj::scalar_object sobj;
typedef typename sobj::scalar_type scalar;
Lattice<sobj> pgbuf(&pencil_g);
typedef typename FFTW<scalar>::FFTW_scalar FFTW_scalar;
typedef typename FFTW<scalar>::FFTW_plan FFTW_plan;
int Ncomp = sizeof(sobj)/sizeof(scalar);
int Nlow = 1;
for(int d=0;d<dim;d++){
Nlow*=vgrid->_ldimensions[d];
}
int rank = 1; /* 1d transforms */
int n[] = {G}; /* 1d transforms of length G */
int howmany = Ncomp;
int odist,idist,istride,ostride;
idist = odist = 1; /* distance between consecutive FTs */
istride = ostride = Ncomp*Nlow; /* distance between two elements in the same FT */
int *inembed = n, *onembed = n;
scalar div;
if ( sign == backward ) div = 1.0/G;
else if ( sign == forward ) div = 1.0;
else assert(0);
FFTW_plan p;
{
FFTW_scalar *in = (FFTW_scalar *)&pgbuf._odata[0];
FFTW_scalar *out= (FFTW_scalar *)&pgbuf._odata[0];
p = FFTW<scalar>::fftw_plan_many_dft(rank,n,howmany,
in,inembed,
istride,idist,
out,onembed,
ostride, odist,
sign,FFTW_ESTIMATE);
}
// Barrel shift and collect global pencil
std::vector<int> lcoor(Nd), gcoor(Nd);
result = source;
int pc = processor_coor[dim];
for(int p=0;p<processors[dim];p++) {
PARALLEL_REGION
{
std::vector<int> cbuf(Nd);
sobj s;
PARALLEL_FOR_LOOP_INTERN
for(int idx=0;idx<sgrid->lSites();idx++) {
sgrid->LocalIndexToLocalCoor(idx,cbuf);
peekLocalSite(s,result,cbuf);
cbuf[dim]+=((pc+p) % processors[dim])*L;
// cbuf[dim]+=p*L;
pokeLocalSite(s,pgbuf,cbuf);
}
}
if (p != processors[dim] - 1)
{
result = Cshift(result,dim,L);
}
}
// Loop over orthog coords
int NN=pencil_g.lSites();
GridStopWatch timer;
timer.Start();
PARALLEL_REGION
{
std::vector<int> cbuf(Nd);
PARALLEL_FOR_LOOP_INTERN
for(int idx=0;idx<NN;idx++) {
pencil_g.LocalIndexToLocalCoor(idx, cbuf);
if ( cbuf[dim] == 0 ) { // restricts loop to plane at lcoor[dim]==0
FFTW_scalar *in = (FFTW_scalar *)&pgbuf._odata[idx];
FFTW_scalar *out= (FFTW_scalar *)&pgbuf._odata[idx];
FFTW<scalar>::fftw_execute_dft(p,in,out);
}
}
}
timer.Stop();
// performance counting
double add,mul,fma;
FFTW<scalar>::fftw_flops(p,&add,&mul,&fma);
flops_call = add+mul+2.0*fma;
usec += timer.useconds();
flops+= flops_call*(NN/G); // the plan runs once per pencil (Ncomp transforms); NN/G pencils per node
// writing out result
PARALLEL_REGION
{
std::vector<int> clbuf(Nd), cgbuf(Nd);
sobj s;
PARALLEL_FOR_LOOP_INTERN
for(int idx=0;idx<sgrid->lSites();idx++) {
sgrid->LocalIndexToLocalCoor(idx,clbuf);
cgbuf = clbuf;
cgbuf[dim] = clbuf[dim]+L*pc;
peekLocalSite(s,pgbuf,cgbuf);
pokeLocalSite(s,result,clbuf);
}
}
result = result*div;
// destroying plan
FFTW<scalar>::fftw_destroy_plan(p);
#endif
}
};
}
#endif
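`FFT_dim` collects each length-`G` pencil along `dim` on one node and transforms it with a single `fftw_plan_many_dft` plan. The plan holds `howmany = Ncomp` transforms, with consecutive transforms one scalar apart (`idist = 1`) and consecutive elements of one transform `Ncomp*Nlow` scalars apart (`istride`). Only the backward transform is scaled, by `1/G`, so a forward transform followed by a backward one returns the input. The sketch below reproduces that addressing and normalisation with a naive O(G^2) DFT instead of FFTW. `ManyDFT`, `kForward` and `kBackward` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

constexpr int kForward = -1;   // same sign as FFTW_FORWARD
constexpr int kBackward = +1;  // same sign as FFTW_BACKWARD

// Naive in-place batch of `howmany` 1D DFTs of length G, addressed like
// fftw_plan_many_dft: element j of transform k is data[offset + k*idist + j*istride].
// As in FFT::FFT_dim, only the backward transform is divided by G.
void ManyDFT(std::vector<cplx> &data, std::size_t offset, int G, int howmany,
             int istride, int idist, int sign) {
  assert(sign == kForward || sign == kBackward);
  const double pi = std::acos(-1.0);
  const double div = (sign == kBackward) ? 1.0 / G : 1.0;
  std::vector<cplx> tmp(G);
  for (int k = 0; k < howmany; ++k) {
    const std::size_t base = offset + std::size_t(k) * idist;
    for (int m = 0; m < G; ++m) {
      cplx sum(0.0, 0.0);
      for (int j = 0; j < G; ++j) {
        const double phase = sign * 2.0 * pi * m * j / G;  // exp(sign * 2 pi i m j / G)
        sum += data[base + std::size_t(j) * istride] * std::polar(1.0, phase);
      }
      tmp[m] = sum * div;
    }
    for (int m = 0; m < G; ++m) data[base + std::size_t(m) * istride] = tmp[m];
  }
}
```

For a site layout with `Ncomp` components and `Nlow` sites in the lower dimensions, the pencil starting at lower-dimension site `l` uses `offset = l*Ncomp`, `howmany = Ncomp`, `idist = 1` and `istride = Ncomp*Nlow`. FFTW performs the same transforms in O(G log G).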
@@ -0,0 +1,481 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/LinearOperator.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALGORITHM_LINEAR_OP_H
#define GRID_ALGORITHM_LINEAR_OP_H
namespace Grid {
/////////////////////////////////////////////////////////////////////////////////////////////
// Linear operators take a field and return a field.
/////////////////////////////////////////////////////////////////////////////////////////////
//
// Hopefully linearity is satisfied and AdjOp is indeed the Hermitian conjugate (transpose if real):
//
// i) F(a x + b y) = a F(x) + b F(y).
// ii) <x|Op|y> = <y|AdjOp|x>^\ast
//
// Would be fun to have a function testing linearity & Hermitian conjugacy!
/////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class LinearOperatorBase {
public:
// Support for coarsening to a multigrid
virtual void OpDiag (const Field &in, Field &out) = 0; // Abstract base
virtual void OpDir (const Field &in, Field &out,int dir,int disp) = 0; // Abstract base
virtual void Op (const Field &in, Field &out) = 0; // Abstract base
virtual void AdjOp (const Field &in, Field &out) = 0; // Abstract base
virtual void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2) = 0;
virtual void HermOp(const Field &in, Field &out)=0;
};
/////////////////////////////////////////////////////////////////////////////////////////////
// By sharing the class for Sparse Matrix across multiple operator wrappers, we can share code
// between RB and non-RB variants. Sparse matrix is like the fermion action def, and then
// the wrappers implement the specialisation of "Op" and "AdjOp" for each case, minimising
// replication of code.
//
// I'm not entirely happy with the implementation; to share the Schur code between herm and non-herm
// while still having an "OpAndNorm" in the abstract base, I had to implement it in both cases
// with an assert trap in the non-herm case. This isn't right; there must be a better C++ way to
// do it, but I fear it would require multiple inheritance and mixed-in abstract base classes.
/////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////
// Construct herm op from non-herm matrix
////////////////////////////////////////////////////////////////////
template<class Matrix,class Field>
class MdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
public:
MdagMLinearOperator(Matrix &Mat): _Mat(Mat){};
// Support for coarsening to a multigrid
void OpDiag (const Field &in, Field &out) {
_Mat.Mdiag(in,out);
}
void OpDir (const Field &in, Field &out,int dir,int disp) {
_Mat.Mdir(in,out,dir,disp);
}
void Op (const Field &in, Field &out){
_Mat.M(in,out);
}
void AdjOp (const Field &in, Field &out){
_Mat.Mdag(in,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
_Mat.MdagM(in,out,n1,n2);
}
void HermOp(const Field &in, Field &out){
RealD n1,n2;
HermOpAndNorm(in,out,n1,n2);
}
};
////////////////////////////////////////////////////////////////////
// Construct herm op and shift it for mgrid smoother
////////////////////////////////////////////////////////////////////
template<class Matrix,class Field>
class ShiftedMdagMLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
RealD _shift;
public:
ShiftedMdagMLinearOperator(Matrix &Mat,RealD shift): _Mat(Mat), _shift(shift){};
// Support for coarsening to a multigrid
void OpDiag (const Field &in, Field &out) {
_Mat.Mdiag(in,out);
assert(0);
}
void OpDir (const Field &in, Field &out,int dir,int disp) {
_Mat.Mdir(in,out,dir,disp);
assert(0);
}
void Op (const Field &in, Field &out){
_Mat.M(in,out);
assert(0);
}
void AdjOp (const Field &in, Field &out){
_Mat.Mdag(in,out);
assert(0);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
_Mat.MdagM(in,out,n1,n2);
out = out + _shift*in;
ComplexD dot;
dot= innerProduct(in,out);
n1=real(dot);
n2=norm2(out);
}
void HermOp(const Field &in, Field &out){
RealD n1,n2;
HermOpAndNorm(in,out,n1,n2);
}
};
////////////////////////////////////////////////////////////////////
// Wrap an already herm matrix
////////////////////////////////////////////////////////////////////
template<class Matrix,class Field>
class HermitianLinearOperator : public LinearOperatorBase<Field> {
Matrix &_Mat;
public:
HermitianLinearOperator(Matrix &Mat): _Mat(Mat){};
// Support for coarsening to a multigrid
void OpDiag (const Field &in, Field &out) {
_Mat.Mdiag(in,out);
}
void OpDir (const Field &in, Field &out,int dir,int disp) {
_Mat.Mdir(in,out,dir,disp);
}
void Op (const Field &in, Field &out){
_Mat.M(in,out);
}
void AdjOp (const Field &in, Field &out){
_Mat.M(in,out);
}
void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
_Mat.M(in,out);
ComplexD dot= innerProduct(in,out); n1=real(dot);
n2=norm2(out);
}
void HermOp(const Field &in, Field &out){
_Mat.M(in,out);
}
};
//////////////////////////////////////////////////////////
// Even Odd Schur decomp operators; there are several
// ways to introduce the even odd checkerboarding
//////////////////////////////////////////////////////////
template<class Field>
class SchurOperatorBase : public LinearOperatorBase<Field> {
public:
virtual RealD Mpc (const Field &in, Field &out) =0;
virtual RealD MpcDag (const Field &in, Field &out) =0;
virtual void MpcDagMpc(const Field &in, Field &out,RealD &ni,RealD &no) {
Field tmp(in._grid);
tmp.checkerboard = in.checkerboard;
ni=Mpc(in,tmp);
no=MpcDag(tmp,out);
}
virtual void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
out.checkerboard = in.checkerboard;
MpcDagMpc(in,out,n1,n2);
}
virtual void HermOp(const Field &in, Field &out){
RealD n1,n2;
HermOpAndNorm(in,out,n1,n2);
}
void Op (const Field &in, Field &out){
Mpc(in,out);
}
void AdjOp (const Field &in, Field &out){
MpcDag(in,out);
}
// Support for coarsening to a multigrid
void OpDiag (const Field &in, Field &out) {
assert(0); // must coarsen the unpreconditioned system
}
void OpDir (const Field &in, Field &out,int dir,int disp) {
assert(0);
}
};
template<class Matrix,class Field>
class SchurDiagMooeeOperator : public SchurOperatorBase<Field> {
protected:
Matrix &_Mat;
public:
SchurDiagMooeeOperator (Matrix &Mat): _Mat(Mat){};
virtual RealD Mpc (const Field &in, Field &out) {
Field tmp(in._grid);
tmp.checkerboard = !in.checkerboard;
//std::cout <<"grid pointers: in._grid="<< in._grid << " out._grid=" << out._grid << " _Mat.Grid=" << _Mat.Grid() << " _Mat.RedBlackGrid=" << _Mat.RedBlackGrid() << std::endl;
_Mat.Meooe(in,tmp);
_Mat.MooeeInv(tmp,out);
_Mat.Meooe(out,tmp);
//std::cout << "cb in " << in.checkerboard << " cb out " << out.checkerboard << std::endl;
_Mat.Mooee(in,out);
return axpy_norm(out,-1.0,tmp,out);
}
virtual RealD MpcDag (const Field &in, Field &out){
Field tmp(in._grid);
_Mat.MeooeDag(in,tmp);
_Mat.MooeeInvDag(tmp,out);
_Mat.MeooeDag(out,tmp);
_Mat.MooeeDag(in,out);
return axpy_norm(out,-1.0,tmp,out);
}
};
template<class Matrix,class Field>
class SchurDiagOneOperator : public SchurOperatorBase<Field> {
protected:
Matrix &_Mat;
public:
SchurDiagOneOperator (Matrix &Mat): _Mat(Mat){};
virtual RealD Mpc (const Field &in, Field &out) {
Field tmp(in._grid);
_Mat.Meooe(in,out);
_Mat.MooeeInv(out,tmp);
_Mat.Meooe(tmp,out);
_Mat.MooeeInv(out,tmp);
return axpy_norm(out,-1.0,tmp,in);
}
virtual RealD MpcDag (const Field &in, Field &out){
Field tmp(in._grid);
_Mat.MooeeInvDag(in,out);
_Mat.MeooeDag(out,tmp);
_Mat.MooeeInvDag(tmp,out);
_Mat.MeooeDag(out,tmp);
return axpy_norm(out,-1.0,tmp,in);
}
};
template<class Matrix,class Field>
class SchurDiagTwoOperator : public SchurOperatorBase<Field> {
protected:
Matrix &_Mat;
public:
SchurDiagTwoOperator (Matrix &Mat): _Mat(Mat){};
virtual RealD Mpc (const Field &in, Field &out) {
Field tmp(in._grid);
_Mat.MooeeInv(in,out);
_Mat.Meooe(out,tmp);
_Mat.MooeeInv(tmp,out);
_Mat.Meooe(out,tmp);
return axpy_norm(out,-1.0,tmp,in);
}
virtual RealD MpcDag (const Field &in, Field &out){
Field tmp(in._grid);
_Mat.MeooeDag(in,out);
_Mat.MooeeInvDag(out,tmp);
_Mat.MeooeDag(tmp,out);
_Mat.MooeeInvDag(out,tmp);
return axpy_norm(out,-1.0,tmp,in);
}
};
///////////////////////////////////////////////////////////////////////////////////////////////////
// Left handed Moo^-1 ; (Moo - Moe Mee^-1 Meo) psi = eta --> ( 1 - Moo^-1 Moe Mee^-1 Meo ) psi = Moo^-1 eta
// Right handed Moo^-1 ; (Moo - Moe Mee^-1 Meo) Moo^-1 Moo psi = eta --> ( 1 - Moe Mee^-1 Meo Moo^-1 ) phi = eta ; psi = Moo^-1 phi
///////////////////////////////////////////////////////////////////////////////////////////////////
template<class Matrix,class Field> using SchurDiagOneRH = SchurDiagTwoOperator<Matrix,Field> ;
template<class Matrix,class Field> using SchurDiagOneLH = SchurDiagOneOperator<Matrix,Field> ;
///////////////////////////////////////////////////////////////////////////////////////////////////
// Staggered use
///////////////////////////////////////////////////////////////////////////////////////////////////
template<class Matrix,class Field>
class SchurStaggeredOperator : public SchurOperatorBase<Field> {
protected:
Matrix &_Mat;
Field tmp;
RealD mass;
double tMpc;
double tIP;
double tMeo;
double taxpby_norm;
uint64_t ncall;
public:
void Report(void)
{
std::cout << GridLogMessage << " HermOpAndNorm.Mpc "<< tMpc/ncall<<" usec "<<std::endl;
std::cout << GridLogMessage << " HermOpAndNorm.IP "<< tIP /ncall<<" usec "<<std::endl;
std::cout << GridLogMessage << " Mpc.MeoMoe "<< tMeo/ncall<<" usec "<<std::endl;
std::cout << GridLogMessage << " Mpc.axpby_norm "<< taxpby_norm/ncall<<" usec "<<std::endl;
}
SchurStaggeredOperator (Matrix &Mat): _Mat(Mat), tmp(_Mat.RedBlackGrid())
{
assert( _Mat.isTrivialEE() );
mass = _Mat.Mass();
tMpc=0;
tIP =0;
tMeo=0;
taxpby_norm=0;
ncall=0;
}
virtual void HermOpAndNorm(const Field &in, Field &out,RealD &n1,RealD &n2){
ncall++;
tMpc-=usecond();
n2 = Mpc(in,out);
tMpc+=usecond();
tIP-=usecond();
ComplexD dot= innerProduct(in,out);
tIP+=usecond();
n1 = real(dot);
}
virtual void HermOp(const Field &in, Field &out){
ncall++;
tMpc-=usecond();
_Mat.Meooe(in,out);
_Mat.Meooe(out,tmp);
tMpc+=usecond();
taxpby_norm-=usecond();
axpby(out,-1.0,mass*mass,tmp,in);
taxpby_norm+=usecond();
}
virtual RealD Mpc (const Field &in, Field &out) {
tMeo-=usecond();
_Mat.Meooe(in,out);
_Mat.Meooe(out,tmp);
tMeo+=usecond();
taxpby_norm-=usecond();
RealD nn=axpby_norm(out,-1.0,mass*mass,tmp,in);
taxpby_norm+=usecond();
return nn;
}
virtual RealD MpcDag (const Field &in, Field &out){
return Mpc(in,out);
}
virtual void MpcDagMpc(const Field &in, Field &out,RealD &ni,RealD &no) {
assert(0);// Never need with staggered
}
};
template<class Matrix,class Field> using SchurStagOperator = SchurStaggeredOperator<Matrix,Field>;
/////////////////////////////////////////////////////////////
// Base classes for functions of operators
/////////////////////////////////////////////////////////////
template<class Field> class OperatorFunction {
public:
virtual void operator() (LinearOperatorBase<Field> &Linop, const Field &in, Field &out) = 0;
};
template<class Field> class LinearFunction {
public:
virtual void operator() (const Field &in, Field &out) = 0;
};
template<class Field> class IdentityLinearFunction : public LinearFunction<Field> {
public:
void operator() (const Field &in, Field &out){
out = in;
};
};
/////////////////////////////////////////////////////////////
// Base classes for Multishift solvers for operators
/////////////////////////////////////////////////////////////
template<class Field> class OperatorMultiFunction {
public:
virtual void operator() (LinearOperatorBase<Field> &Linop, const Field &in, std::vector<Field> &out) = 0;
};
// FIXME : To think about
// Chroma functionality list defining LinearOperator
/*
virtual void operator() (T& chi, const T& psi, enum PlusMinus isign) const = 0;
virtual void operator() (T& chi, const T& psi, enum PlusMinus isign, Real epsilon) const
virtual const Subset& subset() const = 0;
virtual unsigned long nFlops() const { return 0; }
virtual void deriv(P& ds_u, const T& chi, const T& psi, enum PlusMinus isign) const
class UnprecLinearOperator : public DiffLinearOperator<T,P,Q>
const Subset& subset() const {return all;}
};
*/
////////////////////////////////////////////////////////////////////////////////////////////
// Hermitian operator Linear function and operator function
////////////////////////////////////////////////////////////////////////////////////////////
template<class Field>
class HermOpOperatorFunction : public OperatorFunction<Field> {
public:
void operator() (LinearOperatorBase<Field> &Linop, const Field &in, Field &out) {
Linop.HermOp(in,out);
};
};
template<typename Field>
class PlainHermOp : public LinearFunction<Field> {
public:
LinearOperatorBase<Field> &_Linop;
PlainHermOp(LinearOperatorBase<Field>& linop) : _Linop(linop)
{}
void operator()(const Field& in, Field& out) {
_Linop.HermOp(in,out);
}
};
template<typename Field>
class FunctionHermOp : public LinearFunction<Field> {
public:
OperatorFunction<Field> & _poly;
LinearOperatorBase<Field> &_Linop;
FunctionHermOp(OperatorFunction<Field> & poly,LinearOperatorBase<Field>& linop)
: _poly(poly), _Linop(linop) {};
void operator()(const Field& in, Field& out) {
_poly(_Linop,in,out);
}
};
template<class Field>
class Polynomial : public OperatorFunction<Field> {
private:
std::vector<RealD> Coeffs;
public:
Polynomial(std::vector<RealD> &_Coeffs) : Coeffs(_Coeffs) { };
// Implement the required interface
void operator() (LinearOperatorBase<Field> &Linop, const Field &in, Field &out) {
Field AtoN(in._grid);
Field Mtmp(in._grid);
AtoN = in;
out = AtoN*Coeffs[0];
for(int n=1;n<Coeffs.size();n++){
Mtmp = AtoN;
Linop.HermOp(Mtmp,AtoN);
out=out+AtoN*Coeffs[n];
}
};
};
}
#endif
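The `Polynomial` operator function above accumulates out = sum_n Coeffs[n] A^n in, using one `HermOp` per order. A minimal stand-alone sketch of the same accumulation follows. It assumes a dense real symmetric matrix in place of a Grid `LinearOperatorBase` and `std::vector<double>` in place of a lattice field; `applyHerm` and `polynomialOf` are hypothetical names, not Grid API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy stand-ins (assumption, not Grid types): a "field" is a dense real vector
// and the Hermitian operator is a dense real symmetric matrix.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// y = A x, playing the role of Linop.HermOp(x, y).
Vec applyHerm(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) {
    assert(A[i].size() == x.size());
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  }
  return y;
}

// out = sum_n c[n] A^n in, accumulated like Polynomial::operator():
// AtoN holds A^n in, and each extra order costs one operator application.
Vec polynomialOf(const Mat &A, const std::vector<double> &c, const Vec &in) {
  assert(!c.empty());
  Vec AtoN = in;
  Vec out(in.size());
  for (std::size_t k = 0; k < in.size(); ++k) out[k] = c[0] * AtoN[k];
  for (std::size_t n = 1; n < c.size(); ++n) {
    AtoN = applyHerm(A, AtoN);
    for (std::size_t k = 0; k < in.size(); ++k) out[k] += c[n] * AtoN[k];
  }
  return out;
}
```

Like the Grid version, this builds the monomials explicitly, so it needs only two work vectors. For high orders the monomial basis becomes badly conditioned, which is one reason the Chebyshev basis is usually preferred.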
@@ -0,0 +1,46 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/Preconditioner.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_PRECONDITIONER_H
#define GRID_PRECONDITIONER_H
namespace Grid {
template<class Field> class Preconditioner : public LinearFunction<Field> {
public:
virtual void operator()(const Field &src, Field & psi)=0;
};
template<class Field> class TrivialPrecon : public Preconditioner<Field> {
public:
void operator()(const Field &src, Field & psi){
psi = src;
}
TrivialPrecon(void){};
};
}
#endif
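`Preconditioner` only fixes an interface: a map from `src` to `psi` that approximates M^-1 src, with `TrivialPrecon` as the identity. The sketch below mirrors that shape using toy vector types and adds a hypothetical Jacobi (diagonal) preconditioner purely for illustration; none of these classes exist in Grid. Calling through a base-class reference, as the usage does, requires `operator()` to be public in the base class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// Same shape as Grid's LinearFunction/Preconditioner, with a toy field type.
struct ToyPreconditioner {
  virtual ~ToyPreconditioner() = default;
  virtual void operator()(const Vec &src, Vec &psi) = 0;
};

// Identity map, as TrivialPrecon does: psi = src.
struct ToyTrivialPrecon : ToyPreconditioner {
  void operator()(const Vec &src, Vec &psi) override { psi = src; }
};

// Hypothetical Jacobi preconditioner: psi_i = src_i / diag_i.
struct ToyJacobiPrecon : ToyPreconditioner {
  Vec diag;
  explicit ToyJacobiPrecon(Vec d) : diag(std::move(d)) {}
  void operator()(const Vec &src, Vec &psi) override {
    assert(src.size() == diag.size());
    psi.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
      assert(diag[i] != 0.0);  // diagonal must be invertible
      psi[i] = src[i] / diag[i];
    }
  }
};
```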
@@ -0,0 +1,71 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/SparseMatrix.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALGORITHM_SPARSE_MATRIX_H
#define GRID_ALGORITHM_SPARSE_MATRIX_H
namespace Grid {
/////////////////////////////////////////////////////////////////////////////////////////////
// Interface defining what I expect of a general sparse matrix, such as a Fermion action
/////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class SparseMatrixBase {
public:
virtual GridBase *Grid(void) =0;
// Full checkerboard operations
virtual RealD M (const Field &in, Field &out)=0;
virtual RealD Mdag (const Field &in, Field &out)=0;
virtual void MdagM(const Field &in, Field &out,RealD &ni,RealD &no) {
Field tmp (in._grid);
ni=M(in,tmp);
no=Mdag(tmp,out);
}
virtual void Mdiag (const Field &in, Field &out)=0;
virtual void Mdir (const Field &in, Field &out,int dir, int disp)=0;
};
/////////////////////////////////////////////////////////////////////////////////////////////
// Interface augmented by a red black sparse matrix, such as a Fermion action
/////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class CheckerBoardedSparseMatrixBase : public SparseMatrixBase<Field> {
public:
virtual GridBase *RedBlackGrid(void)=0;
// Half checkerboard operations
virtual void Meooe (const Field &in, Field &out)=0;
virtual void Mooee (const Field &in, Field &out)=0;
virtual void MooeeInv (const Field &in, Field &out)=0;
virtual void MeooeDag (const Field &in, Field &out)=0;
virtual void MooeeDag (const Field &in, Field &out)=0;
virtual void MooeeInvDag (const Field &in, Field &out)=0;
};
}
#endif
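`CheckerBoardedSparseMatrixBase` exposes M in an even/odd block split (`Mooee`, `MooeeInv`, `Meooe`) so that solvers can work on the Schur complement Moo - Moe Mee^-1 Meo described in the Schur operator comments in LinearOperator.h. The sketch below assumes dense 2x2 real blocks instead of half-checkerboard fields and solves the full system by eliminating the even sites. `schurSolve` and its helpers are hypothetical, and it inverts the 2x2 Schur complement directly where Grid would run an iterative solver on the odd sites.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Toy blocks (assumption): each parity holds two real unknowns.
using V2 = std::array<double, 2>;
using M2 = std::array<V2, 2>;

V2 mul(const M2 &A, const V2 &x) {
  return {A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]};
}
M2 mul(const M2 &A, const M2 &B) {
  M2 C{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k) C[i][j] += A[i][k] * B[k][j];
  return C;
}
V2 sub(const V2 &a, const V2 &b) { return {a[0] - b[0], a[1] - b[1]}; }
M2 inv(const M2 &A) {
  const double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
  assert(std::abs(det) > 1e-14);
  return {{{A[1][1] / det, -A[0][1] / det}, {-A[1][0] / det, A[0][0] / det}}};
}

// Solve [[Mee, Meo], [Moe, Moo]] [pe; po] = [ee; eo]:
//   S = Moo - Moe Mee^-1 Meo,  S po = eo - Moe Mee^-1 ee,  pe = Mee^-1 (ee - Meo po).
void schurSolve(const M2 &Mee, const M2 &Meo, const M2 &Moe, const M2 &Moo,
                const V2 &ee, const V2 &eo, V2 &pe, V2 &po) {
  const M2 MeeInv = inv(Mee);
  const M2 MoeMeeInv = mul(Moe, MeeInv);
  const M2 corr = mul(MoeMeeInv, Meo);
  M2 S = Moo;
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j) S[i][j] -= corr[i][j];
  po = mul(inv(S), sub(eo, mul(MoeMeeInv, ee)));  // odd-site (Schur) solve
  pe = mul(MeeInv, sub(ee, mul(Meo, po)));        // reconstruct even sites
}
```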
@@ -0,0 +1,377 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/approx/Chebyshev.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
Author: Christoph Lehner <clehner@bnl.gov>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CHEBYSHEV_H
#define GRID_CHEBYSHEV_H
#include <Grid/algorithms/LinearOperator.h>
namespace Grid {
struct ChebyParams : Serializable {
GRID_SERIALIZABLE_CLASS_MEMBERS(ChebyParams,
RealD, alpha,
RealD, beta,
int, Npoly);
};
////////////////////////////////////////////////////////////////////////////////////////////
// Generic Chebyshev approximations
////////////////////////////////////////////////////////////////////////////////////////////
template<class Field>
class Chebyshev : public OperatorFunction<Field> {
private:
std::vector<RealD> Coeffs;
int order;
RealD hi;
RealD lo;
public:
void csv(std::ostream &out){
RealD diff = hi-lo;
RealD delta = (hi-lo)*1.0e-9;
for (RealD x=lo; x<hi; x+=delta) {
delta*=1.1;
RealD f = approx(x);
out<< x<<" "<<f<<std::endl;
}
return;
}
// Convenience for plotting the approximation
void PlotApprox(std::ostream &out) {
out<<"Polynomial approx ["<<lo<<","<<hi<<"]"<<std::endl;
for(RealD x=lo;x<hi;x+=(hi-lo)/50.0){
out <<x<<"\t"<<approx(x)<<std::endl;
}
};
Chebyshev(){};
Chebyshev(ChebyParams p){ Init(p.alpha,p.beta,p.Npoly);};
Chebyshev(RealD _lo,RealD _hi,int _order, RealD (* func)(RealD) ) {Init(_lo,_hi,_order,func);};
Chebyshev(RealD _lo,RealD _hi,int _order) {Init(_lo,_hi,_order);};
////////////////////////////////////////////////////////////////////////////////////////////////////
// cf. Numerical Recipes "chebft"/"chebev", sec. 5.8 "Chebyshev approximation".
////////////////////////////////////////////////////////////////////////////////////////////////////
// CJ: the one we need for Lanczos
void Init(RealD _lo,RealD _hi,int _order)
{
lo=_lo;
hi=_hi;
order=_order;
if(order < 2) exit(-1);
Coeffs.assign(order,0.); // assign(count, value): order zeros
Coeffs[order-1] = 1.;
};
void Init(RealD _lo,RealD _hi,int _order, RealD (* func)(RealD))
{
lo=_lo;
hi=_hi;
order=_order;
if(order < 2) exit(-1);
Coeffs.resize(order);
for(int j=0;j<order;j++){
RealD s=0;
for(int k=0;k<order;k++){
RealD y=std::cos(M_PI*(k+0.5)/order);
RealD x=0.5*(y*(hi-lo)+(hi+lo));
RealD f=func(x);
s=s+f*std::cos( j*M_PI*(k+0.5)/order );
}
Coeffs[j] = s * 2.0/order;
}
};
void JacksonSmooth(void){
RealD M=order;
RealD alpha = M_PI/(M+2);
RealD lmax = std::cos(alpha);
RealD sumUsq =0;
// The loops below index 0..M inclusive, so allocate M+1 entries
std::vector<RealD> U(order+1);
std::vector<RealD> a(order+1);
std::vector<RealD> g(order+1);
for(int n=0;n<=M;n++){
U[n] = std::sin((n+1)*std::acos(lmax))/std::sin(std::acos(lmax));
sumUsq += U[n]*U[n];
}
sumUsq = std::sqrt(sumUsq);
for(int i=1;i<=M;i++){
a[i] = U[i]/sumUsq;
}
g[0] = 1.0;
for(int m=1;m<=M;m++){
g[m] = 0;
for(int i=0;i<=M-m;i++){
g[m]+= a[i]*a[m+i];
}
}
// Coeffs holds only order entries, so stop at order-1
for(int m=1;m<order;m++){
Coeffs[m]*=g[m];
}
}
RealD approx(RealD x) // Convenience for plotting the approximation
{
RealD Tn;
RealD Tnm;
RealD Tnp;
RealD y=( x-0.5*(hi+lo))/(0.5*(hi-lo));
RealD T0=1;
RealD T1=y;
RealD sum;
sum = 0.5*Coeffs[0]*T0;
sum+= Coeffs[1]*T1;
Tn =T1;
Tnm=T0;
for(int i=2;i<order;i++){
Tnp=2*y*Tn-Tnm;
Tnm=Tn;
Tn =Tnp;
sum+= Tn*Coeffs[i];
}
return sum;
};
RealD approxD(RealD x)
{
RealD Un;
RealD Unm;
RealD Unp;
RealD y=( x-0.5*(hi+lo))/(0.5*(hi-lo));
RealD U0=1;
RealD U1=2*y;
RealD sum;
sum = Coeffs[1]*U0;
sum+= Coeffs[2]*U1*2.0;
Un =U1;
Unm=U0;
for(int i=2;i<order-1;i++){
Unp=2*y*Un-Unm;
Unm=Un;
Un =Unp;
sum+= Un*Coeffs[i+1]*(i+1.0);
}
return sum/(0.5*(hi-lo));
};
RealD approxInv(RealD z, RealD x0, int maxiter, RealD resid) {
RealD x = x0;
RealD eps;
int i;
for (i=0;i<maxiter;i++) {
eps = approx(x) - z;
if (fabs(eps / z) < resid)
return x;
x = x - eps / approxD(x);
}
return std::numeric_limits<double>::quiet_NaN();
}
// Implement the required interface
void operator() (LinearOperatorBase<Field> &Linop, const Field &in, Field &out) {
GridBase *grid=in._grid;
// std::cout << "Chevyshef(): in._grid="<<in._grid<<std::endl;
//std::cout <<" Linop.Grid()="<<Linop.Grid()<<"Linop.RedBlackGrid()="<<Linop.RedBlackGrid()<<std::endl;
int vol=grid->gSites();
Field T0(grid); T0 = in;
Field T1(grid);
Field T2(grid);
Field y(grid);
Field *Tnm = &T0;
Field *Tn = &T1;
Field *Tnp = &T2;
// Tn=T1 = (xscale M + mscale)in
RealD xscale = 2.0/(hi-lo);
RealD mscale = -(hi+lo)/(hi-lo);
Linop.HermOp(T0,y);
T1=y*xscale+in*mscale;
// sum = .5 c[0] T0 + c[1] T1
out = (0.5*Coeffs[0])*T0 + Coeffs[1]*T1;
for(int n=2;n<order;n++){
Linop.HermOp(*Tn,y);
y=xscale*y+mscale*(*Tn);
*Tnp=2.0*y-(*Tnm);
out=out+Coeffs[n]* (*Tnp);
// Cycle pointers to avoid copies
Field *swizzle = Tnm;
Tnm =Tn;
Tn =Tnp;
Tnp =swizzle;
}
}
};
template<class Field>
class ChebyshevLanczos : public Chebyshev<Field> {
private:
std::vector<RealD> Coeffs;
int order;
RealD alpha;
RealD beta;
RealD mu;
public:
ChebyshevLanczos(RealD _alpha,RealD _beta,RealD _mu,int _order) :
alpha(_alpha),
beta(_beta),
mu(_mu)
{
order=_order;
Coeffs.resize(order);
for(int i=0;i<_order;i++){
Coeffs[i] = 0.0;
}
Coeffs[order-1]=1.0;
};
void csv(std::ostream &out){
for (RealD x=-1.2*alpha; x<1.2*alpha; x+=(2.0*alpha)/10000) {
RealD f = approx(x);
out<< x<<" "<<f<<std::endl;
}
return;
}
RealD approx(RealD xx) // Convenience for plotting the approximation
{
RealD Tn;
RealD Tnm;
RealD Tnp;
Real aa = alpha * alpha;
Real bb = beta * beta;
RealD x = ( 2.0 * (xx-mu)*(xx-mu) - (aa+bb) ) / (aa-bb);
RealD y= x;
RealD T0=1;
RealD T1=y;
RealD sum;
sum = 0.5*Coeffs[0]*T0;
sum+= Coeffs[1]*T1;
Tn =T1;
Tnm=T0;
for(int i=2;i<order;i++){
Tnp=2*y*Tn-Tnm;
Tnm=Tn;
Tn =Tnp;
sum+= Tn*Coeffs[i];
}
return sum;
};
// shift_Multiply in Rudy's code
void AminusMuSq(LinearOperatorBase<Field> &Linop, const Field &in, Field &out)
{
GridBase *grid=in._grid;
Field tmp(grid);
RealD aa= alpha*alpha;
RealD bb= beta * beta;
Linop.HermOp(in,out);
out = out - mu*in;
Linop.HermOp(out,tmp);
tmp = tmp - mu * out;
out = (2.0/ (aa-bb) ) * tmp - ((aa+bb)/(aa-bb))*in;
};
// Implement the required interface
void operator() (LinearOperatorBase<Field> &Linop, const Field &in, Field &out) {
GridBase *grid=in._grid;
int vol=grid->gSites();
Field T0(grid); T0 = in;
Field T1(grid);
Field T2(grid);
Field y(grid);
Field *Tnm = &T0;
Field *Tn = &T1;
Field *Tnp = &T2;
// T1 = AminusMuSq(T0) = [ 2 (A-mu)^2 - (aa+bb) ] / (aa-bb) * in
AminusMuSq(Linop,T0,T1);
// sum = .5 c[0] T0 + c[1] T1
out = (0.5*Coeffs[0])*T0 + Coeffs[1]*T1;
for(int n=2;n<order;n++){
AminusMuSq(Linop,*Tn,y);
*Tnp=2.0*y-(*Tnm);
out=out+Coeffs[n]* (*Tnp);
// Cycle pointers to avoid copies
Field *swizzle = Tnm;
Tnm =Tn;
Tn =Tnp;
Tnp =swizzle;
}
}
};
}
#endif
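`Chebyshev::Init` computes the coefficients with the Numerical Recipes `chebft` formula, and `approx` sums the series with the three-term recurrence T_{n+1}(y) = 2y T_n(y) - T_{n-1}(y). The scalar sketch below shows both steps. It assumes a plain function pointer for the target function, and `chebFit` and `chebEval` are hypothetical names. The operator version in the class replaces y with the rescaled operator (xscale A + mscale) and cycles three field buffers instead of three doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// c_j = (2/N) sum_k f(x_k) cos(j pi (k+1/2)/N), where x_k is the Chebyshev node
// y_k = cos(pi (k+1/2)/N) mapped from [-1,1] onto [lo,hi] (same as Init).
std::vector<double> chebFit(double lo, double hi, int N, double (*func)(double)) {
  assert(N >= 2 && hi > lo);
  const double pi = std::acos(-1.0);
  std::vector<double> c(N, 0.0);
  for (int j = 0; j < N; ++j) {
    double s = 0.0;
    for (int k = 0; k < N; ++k) {
      const double y = std::cos(pi * (k + 0.5) / N);
      const double x = 0.5 * (y * (hi - lo) + (hi + lo));
      s += func(x) * std::cos(j * pi * (k + 0.5) / N);
    }
    c[j] = s * 2.0 / N;
  }
  return c;
}

// Evaluates 0.5 c_0 T_0(y) + sum_{n>=1} c_n T_n(y) with y = (x - mid) / halfwidth.
double chebEval(const std::vector<double> &c, double lo, double hi, double x) {
  assert(c.size() >= 2);
  const double y = (x - 0.5 * (hi + lo)) / (0.5 * (hi - lo));
  double Tm = 1.0, T = y;  // T_0, T_1
  double sum = 0.5 * c[0] + c[1] * T;
  for (std::size_t n = 2; n < c.size(); ++n) {
    const double Tp = 2.0 * y * T - Tm;  // three-term recurrence
    Tm = T;
    T = Tp;
    sum += c[n] * T;
  }
  return sum;
}
```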
@@ -0,0 +1,152 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/approx/Forecast.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
Author: David Murphy <dmurphy@phys.columbia.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef INCLUDED_FORECAST_H
#define INCLUDED_FORECAST_H
namespace Grid {
// Abstract base class.
// Takes a matrix (Mat), a source (phi) and a vector of previous solutions (chi),
// and returns a forecast (initial guess) for the solution psi of D*psi = phi.
template<class Matrix, class Field>
class Forecast
{
public:
virtual Field operator()(Matrix &Mat, const Field& phi, const std::vector<Field>& chi) = 0;
};
// Implementation of Brower et al.'s chronological inverter (arXiv:hep-lat/9509012),
// used to forecast solutions across poles of the EOFA heatbath.
//
// Modified from CPS (cps_pp/src/util/dirac_op/d_op_base/comsrc/minresext.C)
template<class Matrix, class Field>
class ChronoForecast : public Forecast<Matrix,Field>
{
public:
Field operator()(Matrix &Mat, const Field& phi, const std::vector<Field>& prev_solns)
{
int degree = prev_solns.size();
Field chi(phi); // forecasted solution
// Trivial cases
if(degree == 0){ chi = zero; return chi; }
else if(degree == 1){ return prev_solns[0]; }
RealD dot;
ComplexD xp;
Field r(phi); // residual
Field Mv(phi);
std::vector<Field> v(prev_solns); // orthonormalized previous solutions
std::vector<Field> MdagMv(degree,phi);
// Array to hold the matrix elements
std::vector<std::vector<ComplexD>> G(degree, std::vector<ComplexD>(degree));
// Solution and source vectors
std::vector<ComplexD> a(degree);
std::vector<ComplexD> b(degree);
// Orthonormalize the vector basis
for(int i=0; i<degree; i++){
v[i] *= 1.0/std::sqrt(norm2(v[i]));
for(int j=i+1; j<degree; j++){ v[j] -= innerProduct(v[i],v[j]) * v[i]; }
}
// Perform sparse matrix multiplication and construct rhs
for(int i=0; i<degree; i++){
b[i] = innerProduct(v[i],phi);
Mat.M(v[i],Mv);
Mat.Mdag(Mv,MdagMv[i]);
G[i][i] = innerProduct(v[i],MdagMv[i]);
}
// Construct the matrix
for(int j=0; j<degree; j++){
for(int k=j+1; k<degree; k++){
G[j][k] = innerProduct(v[j],MdagMv[k]);
G[k][j] = std::conj(G[j][k]);
}}
// Gaussian elimination with partial pivoting
for(int i=0; i<degree; i++){
// Partial pivoting: choose the row with the largest |G[j][i]| in column i
int k = i;
for(int j=i+1; j<degree; j++){ if(std::abs(G[j][i]) > std::abs(G[k][i])){ k = j; } }
if(k != i){
xp = b[k];
b[k] = b[i];
b[i] = xp;
for(int j=0; j<degree; j++){
xp = G[k][j];
G[k][j] = G[i][j];
G[i][j] = xp;
}
}
// Convert matrix to upper triangular form
for(int j=i+1; j<degree; j++){
xp = G[j][i]/G[i][i];
b[j] -= xp * b[i];
for(int k=0; k<degree; k++){ G[j][k] -= xp*G[i][k]; }
}
}
// Back substitution for the coefficients a[i]; accumulate the initial guess and residual
chi = zero;
r = phi;
for(int i=degree-1; i>=0; i--){
a[i] = 0.0;
for(int j=i+1; j<degree; j++){ a[i] += G[i][j] * a[j]; }
a[i] = (b[i]-a[i])/G[i][i];
chi += a[i]*v[i];
r -= a[i]*MdagMv[i];
}
RealD true_r(0.0);
ComplexD tmp;
for(int i=0; i<degree; i++){
tmp = -b[i];
for(int j=0; j<degree; j++){ tmp += G[i][j]*a[j]; }
tmp = std::conj(tmp)*tmp;
true_r += std::sqrt(tmp.real());
}
RealD error = std::sqrt(norm2(r)/norm2(phi));
std::cout << GridLogMessage << "ChronoForecast: |res|/|src| = " << error << std::endl;
return chi;
};
};
}
#endif
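`ChronoForecast` orthonormalises the previous solutions, projects Mdag M onto their span, and solves the resulting small dense system for the initial guess. The sketch below shows that projection. It assumes a dense symmetric positive-definite matrix `A` standing in for Mdag M and real vectors in place of fields; `chronoForecast`, `dot` and `apply` are hypothetical names. Unlike the Grid code, it also projects when only one previous solution is given, and it skips pivoting because the projection of an SPD operator onto an orthonormal basis is itself SPD.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

double dot(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

Vec apply(const Mat &A, const Vec &x) {
  Vec y(A.size());
  for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
  return y;
}

// chi = sum_i a_i v_i with G a = b, G_ij = <v_i, A v_j>, b_i = <v_i, phi>,
// where v holds the previous solutions (taken by value and orthonormalised).
Vec chronoForecast(const Mat &A, const Vec &phi, std::vector<Vec> v) {
  const std::size_t m = v.size();
  if (m == 0) return Vec(phi.size(), 0.0);  // no history: zero guess
  for (std::size_t i = 0; i < m; ++i) {     // modified Gram-Schmidt
    const double nrm = std::sqrt(dot(v[i], v[i]));
    assert(nrm > 0.0);                      // a zero (dependent) vector cannot be normalised
    for (double &x : v[i]) x /= nrm;
    for (std::size_t j = i + 1; j < m; ++j) {
      const double p = dot(v[i], v[j]);
      for (std::size_t k = 0; k < v[j].size(); ++k) v[j][k] -= p * v[i][k];
    }
  }
  Mat G(m, Vec(m));
  Vec b(m);
  for (std::size_t j = 0; j < m; ++j) {
    const Vec Av = apply(A, v[j]);
    b[j] = dot(v[j], phi);
    for (std::size_t i = 0; i < m; ++i) G[i][j] = dot(v[i], Av);
  }
  for (std::size_t i = 0; i < m; ++i)       // forward elimination (G is SPD)
    for (std::size_t j = i + 1; j < m; ++j) {
      const double f = G[j][i] / G[i][i];
      for (std::size_t k = i; k < m; ++k) G[j][k] -= f * G[i][k];
      b[j] -= f * b[i];
    }
  Vec a(m);
  for (std::size_t i = m; i-- > 0;) {       // back substitution
    double s = b[i];
    for (std::size_t j = i + 1; j < m; ++j) s -= G[i][j] * a[j];
    a[i] = s / G[i][i];
  }
  Vec chi(phi.size(), 0.0);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t k = 0; k < chi.size(); ++k) chi[k] += a[i] * v[i][k];
  return chi;
}
```

If the exact solution already lies in the span of the previous solutions, this projection recovers it exactly (up to rounding).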
@@ -1,6 +1,5 @@
MIT License
-Copyright (c) 2012-2016 GitHub, Inc.
+Copyright (c) 2011 Michael Clark
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -9,13 +8,14 @@ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
@@ -0,0 +1,56 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/approx/MultiShiftFunction.cc
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
namespace Grid {
double MultiShiftFunction::approx(double x)
{
double a = norm;
for(int n=0;n<poles.size();n++){
a = a + residues[n]/(x+poles[n]);
}
return a;
}
void MultiShiftFunction::gnuplot(std::ostream &out)
{
out<<"f(x) = "<<norm<<"";
for(int n=0;n<poles.size();n++){
out<<"+("<<residues[n]<<"/(x+"<<poles[n]<<"))";
}
out<<";"<<std::endl;
}
void MultiShiftFunction::csv(std::ostream &out)
{
for (double x=lo; x<hi; x*=1.05) {
double f = approx(x);
double r = sqrt(x);
out<< x<<","<<r<<","<<f<<","<<r-f<<std::endl;
}
return;
}
}
@@ -0,0 +1,67 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/approx/MultiShiftFunction.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef MULTI_SHIFT_FUNCTION
#define MULTI_SHIFT_FUNCTION
namespace Grid {
class MultiShiftFunction {
public:
int order;
std::vector<RealD> poles;
std::vector<RealD> residues;
std::vector<RealD> tolerances;
RealD norm;
RealD lo,hi;
MultiShiftFunction(int n,RealD _lo,RealD _hi): poles(n), residues(n), lo(_lo), hi(_hi) {;};
RealD approx(RealD x);
void csv(std::ostream &out);
void gnuplot(std::ostream &out);
void Init(AlgRemez & remez,double tol,bool inverse)
{
order=remez.getDegree();
tolerances.resize(remez.getDegree(),tol);
poles.resize(remez.getDegree());
residues.resize(remez.getDegree());
remez.getBounds(lo,hi);
if ( inverse ) remez.getIPFE (&residues[0],&poles[0],&norm);
else remez.getPFE (&residues[0],&poles[0],&norm);
}
// Allow deferred initialisation
MultiShiftFunction(void){};
MultiShiftFunction(AlgRemez & remez,double tol,bool inverse)
{
Init(remez,tol,inverse);
}
};
}
#endif
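`MultiShiftFunction` stores a rational approximation in partial-fraction form, f(x) ~ norm + sum_k residues[k]/(x + poles[k]). This form makes it cheap to apply to an operator: each term needs one solve with the shifted operator A + poles[k]. The sketch below assumes a diagonal A, so each shifted solve reduces to a division; `ToyPFE`, `evalPFE` and `applyPFE` are hypothetical names. For a general Hermitian A, a multishift Krylov solver obtains all the shifted solutions from a single Krylov space.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MultiShiftFunction's data members.
struct ToyPFE {
  double norm;
  std::vector<double> poles;
  std::vector<double> residues;
};

// Same sum as MultiShiftFunction::approx: norm + sum_k r_k / (x + p_k).
double evalPFE(const ToyPFE &f, double x) {
  assert(f.poles.size() == f.residues.size());
  double s = f.norm;
  for (std::size_t k = 0; k < f.poles.size(); ++k) s += f.residues[k] / (x + f.poles[k]);
  return s;
}

// f(A) b for a diagonal A = diag(d): term k solves (A + p_k) x_k = b,
// i.e. x_k[i] = b[i] / (d[i] + p_k), and adds r_k x_k to the result.
std::vector<double> applyPFE(const ToyPFE &f, const std::vector<double> &d,
                             const std::vector<double> &b) {
  assert(d.size() == b.size() && f.poles.size() == f.residues.size());
  std::vector<double> out(b.size());
  for (std::size_t i = 0; i < b.size(); ++i) out[i] = f.norm * b[i];
  for (std::size_t k = 0; k < f.poles.size(); ++k)
    for (std::size_t i = 0; i < b.size(); ++i)
      out[i] += f.residues[k] * (b[i] / (d[i] + f.poles[k]));
  return out;
}
```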
@@ -0,0 +1,80 @@
-----------------------------------------------------------------------------------
PAB. Took Mike Clark's AlgRemez from GitHub and included it (lightly modified).
It is open source; the license, README and comments are preserved, consistent
with the license. Mike, thank you!
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
AlgRemez
The archive downloadable here contains an implementation of the Remez
algorithm which calculates optimal rational (and polynomial)
approximations to the nth root over a given spectral range. The Remez
algorithm, although in principle extremely straightforward to
program, is quite difficult to get completely correct; e.g., the Maple
implementation of the algorithm does not always converge to the
correct answer.
To use this algorithm you need to install GMP, the GNU Multiple
Precision Library, and when configuring the install, you must include
the --enable-mpfr option (see the GMP manual for more details). You
also have to edit the Makefile for AlgRemez appropriately for your
system, namely to point correctly to the location of the GMP library.
The simple main program included with this archive invokes the
AlgRemez class to calculate an approximation given by command line
arguments. It is invoked by the following
./test y z n d lambda_low lambda_high precision,
where the function to be approximated is f(x) = x^(y/z), with degree
(n,d) over the spectral range [lambda_low, lambda_high], using
precision digits of precision in the arithmetic. So an example would
be
./test 1 2 5 5 0.0004 64 40
which corresponds to constructing a rational approximation to the
square root function, with degree (5,5) over the range [0.0004,64]
with 40 digits of precision used for the arithmetic. The parameters y
and z must be positive; the approximation to f(x) = x^(-y/z) is simply
the inverse of the approximation to f(x) = x^(y/z). After the
approximation has been constructed, the roots and poles of the
rational function are found, and then the partial fraction expansion
of both the rational function and its inverse are found, the results
of which are output to a file called "approx.dat". In addition, the
error function of the approximation is output to "error.dat", where it
can be checked that the resultant approximation satisfies Chebyshev's
criterion, namely all error maxima are equal in magnitude, and
adjacent maxima are opposite in sign. There are some caveats, however:
the optimal polynomial approximation has complex roots, and
the root finding implemented here cannot (yet) handle complex roots.
In addition, the partial fraction expansion of rational approximations
is only found for the case n = d, i.e., the degree of numerator
polynomial equals that of the denominator polynomial. The convention
for the partial fraction expansion is that pole shifts are always
added to x, i.e. terms take the form r/(x + p), never r/(x - p).
To do list
1. Include an exponential damping factor in the function to be
approximated. This may sound trivial to implement, but for some
parameters the algorithm seems to break down. Also, the roots in the
rational approximation sometimes become complex, which currently
breaks the stupidly simple root-finding code.
2. Make the algorithm faster - it's too slow when running on qcdoc.
3. Add complex root finding.
4. Add more options for error minimisation - currently the code
minimises the relative error, should add options for absolute error,
and other norms.
There will be a forthcoming publication concerning the results
generated by this software, but in the meantime, if you use this
software, please cite it as
"M.A. Clark and A.D. Kennedy, https://github.com/mikeaclark/AlgRemez, 2005".
If you have any problems using the software, then please email scientist.mike@gmail.com.
@@ -0,0 +1,760 @@
/*
Mike Clark - 25th May 2005
alg_remez.C
AlgRemez is an implementation of the Remez algorithm, which in this
case is used for generating the optimal nth root rational
approximation.
Note: this class requires the GNU Multiple Precision (GMP) library.
*/
#include<math.h>
#include<stdio.h>
#include<stdlib.h>
#include<string>
#include<iostream>
#include<iomanip>
#include<cassert>
#include<Grid/algorithms/approx/Remez.h>
// Constructor
AlgRemez::AlgRemez(double lower, double upper, long precision)
{
prec = precision;
bigfloat::setDefaultPrecision(prec);
apstrt = lower;
apend = upper;
apwidt = apend - apstrt;
std::cout<<"Approximation bounds are ["<<apstrt<<","<<apend<<"]\n";
std::cout<<"Precision of arithmetic is "<<precision<<std::endl;
alloc = 0;
n = 0;
d = 0;
foundRoots = 0;
// Only require the approximation spread to be less than 1 ulp
tolerance = 1e-15;
}
// Destructor
AlgRemez::~AlgRemez()
{
if (alloc) {
delete [] param;
delete [] roots;
delete [] poles;
delete [] xx;
delete [] mm;
delete [] a_power;
delete [] a;
}
}
// Free memory and reallocate as necessary
void AlgRemez::allocate(int num_degree, int den_degree)
{
// Arrays have previously been allocated, deallocate first, then allocate
if (alloc) {
delete [] param;
delete [] roots;
delete [] poles;
delete [] xx;
delete [] mm;
}
// Note use of new and delete in memory allocation - cannot run on qcdsp
param = new bigfloat[num_degree+den_degree+1];
roots = new bigfloat[num_degree];
poles = new bigfloat[den_degree];
xx = new bigfloat[num_degree+den_degree+3];
mm = new bigfloat[num_degree+den_degree+2];
if (!alloc) {
// The coefficients of the sum in the exponential
a = new bigfloat[SUM_MAX];
a_power = new int[SUM_MAX];
}
alloc = 1;
}
// Reset the bounds of the approximation
void AlgRemez::setBounds(double lower, double upper)
{
apstrt = lower;
apend = upper;
apwidt = apend - apstrt;
}
// Generate the rational approximation x^(pnum/pden)
double AlgRemez::generateApprox(int degree, unsigned long pnum,
unsigned long pden)
{
return generateApprox(degree, degree, pnum, pden);
}
double AlgRemez::generateApprox(int num_degree, int den_degree,
unsigned long pnum, unsigned long pden)
{
double *a_param = 0;
int *a_pow = 0;
return generateApprox(num_degree, den_degree, pnum, pden, 0, a_param, a_pow);
}
// Generate the rational approximation x^(pnum/pden)
double AlgRemez::generateApprox(int num_degree, int den_degree,
unsigned long pnum, unsigned long pden,
int a_len, double *a_param, int *a_pow)
{
std::cout<<"Degree of the approximation is ("<<num_degree<<","<<den_degree<<")\n";
std::cout<<"Approximating the function x^("<<pnum<<"/"<<pden<<")\n";
// Reallocate arrays, since degree has changed
if (num_degree != n || den_degree != d) allocate(num_degree,den_degree);
assert(a_len<=SUM_MAX);
step = new bigfloat[num_degree+den_degree+2];
a_length = a_len;
for (int j=0; j<a_len; j++) {
a[j]= a_param[j];
a_power[j] = a_pow[j];
}
power_num = pnum;
power_den = pden;
spread = 1.0e37;
iter = 0;
n = num_degree;
d = den_degree;
neq = n + d + 1;
initialGuess();
stpini(step);
while (spread > tolerance) { // iterate until convergence
if (iter++%100==0)
std::cout<<"Iteration " <<iter-1<<" spread "<<(double)spread<<" delta "<<(double)delta<<std::endl;
equations();
if (delta < tolerance) {
std::cout<<"Delta too small, try increasing precision\n";
assert(0);
}
search(step);
}
int sign;
double error = (double)getErr(mm[0],&sign);
std::cout<<"Converged at "<<iter<<" iterations; error = "<<error<<std::endl;
// Once the approximation has been generated, calculate the roots
if(!root()) {
std::cout<<"Root finding failed\n";
} else {
foundRoots = 1;
}
delete [] step;
// Return the maximum error in the approximation
return error;
}
// Return the partial fraction expansion of the approximation x^(pnum/pden)
int AlgRemez::getPFE(double *Res, double *Pole, double *Norm) {
if (n!=d) {
std::cout<<"Cannot handle case: Numerator degree neq Denominator degree\n";
return 0;
}
if (!alloc) {
std::cout<<"Approximation not yet generated\n";
return 0;
}
if (!foundRoots) {
std::cout<<"Roots not found, so PFE cannot be taken\n";
return 0;
}
bigfloat *r = new bigfloat[n];
bigfloat *p = new bigfloat[d];
for (int i=0; i<n; i++) r[i] = roots[i];
for (int i=0; i<d; i++) p[i] = poles[i];
// Perform a partial fraction expansion
pfe(r, p, norm);
// Convert to double and return
*Norm = (double)norm;
for (int i=0; i<n; i++) Res[i] = (double)r[i];
for (int i=0; i<d; i++) Pole[i] = (double)p[i];
delete [] r;
delete [] p;
// Note: 0 is returned here as well as on the failure paths above
return 0;
}
// Return the partial fraction expansion of the approximation x^(-pnum/pden)
int AlgRemez::getIPFE(double *Res, double *Pole, double *Norm) {
if (n!=d) {
std::cout<<"Cannot handle case: Numerator degree neq Denominator degree\n";
return 0;
}
if (!alloc) {
std::cout<<"Approximation not yet generated\n";
return 0;
}
if (!foundRoots) {
std::cout<<"Roots not found, so PFE cannot be taken\n";
return 0;
}
bigfloat *r = new bigfloat[d];
bigfloat *p = new bigfloat[n];
// Want the inverse function
for (int i=0; i<n; i++) {
r[i] = poles[i];
p[i] = roots[i];
}
// Perform a partial fraction expansion
pfe(r, p, (bigfloat)1l/norm);
// Convert to double and return
*Norm = (double)((bigfloat)1l/(norm));
for (int i=0; i<n; i++) {
Res[i] = (double)r[i];
Pole[i] = (double)p[i];
}
delete [] r;
delete [] p;
// Note: 0 is returned here as well as on the failure paths above
return 0;
}
// Initial values of maximal and minimal errors
void AlgRemez::initialGuess() {
// Supply initial guesses for solution points
long ncheb = neq; // Degree of Chebyshev error estimate
bigfloat a, r;
// Find ncheb+1 extrema of Chebyshev polynomial
a = ncheb;
mm[0] = apstrt;
for (long i = 1; i < ncheb; i++) {
r = 0.5 * (1 - cos((M_PI * i)/(double) a));
//r *= sqrt_bf(r);
r = (exp((double)r)-1.0)/(exp(1.0)-1.0);
mm[i] = apstrt + r * apwidt;
}
mm[ncheb] = apend;
a = 2.0 * ncheb;
for (long i = 0; i <= ncheb; i++) {
r = 0.5 * (1 - cos(M_PI * (2*i+1)/(double) a));
//r *= sqrt_bf(r); // Squeeze to low end of interval
r = (exp((double)r)-1.0)/(exp(1.0)-1.0);
xx[i] = apstrt + r * apwidt;
}
}
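The start-up placement in initialGuess() can be sketched in plain `double` arithmetic. This is a minimal sketch that assumes double precision is enough for illustration, since the class itself works in `bigfloat`. The names `remezInitialPoints` and `squeezeLow` are hypothetical. The sketch fills only the `neq` zero guesses `xx[0..neq-1]` that equations() reads, not the extra entries the class stores. Each Chebyshev node t in [0,1] is pushed toward the low end of the interval by (e^t - 1)/(e - 1).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Map t in [0,1] to (e^t - 1)/(e - 1): monotone, fixes 0 and 1, and pushes
// points toward the low end of the interval (as initialGuess() does).
static double squeezeLow(double t) {
  return (std::exp(t) - 1.0) / (std::exp(1.0) - 1.0);
}

// Hypothetical double-precision version of AlgRemez::initialGuess().
// mm: neq+1 guesses for the error extrema, including both endpoints.
// xx: neq guesses for the error zeros (the interpolation points).
void remezInitialPoints(double lo, double hi, int neq,
                        std::vector<double>& mm, std::vector<double>& xx) {
  assert(neq >= 1 && lo < hi);
  const double pi = std::acos(-1.0);
  const double width = hi - lo;
  mm.assign(neq + 1, 0.0);
  mm[0] = lo;
  for (int i = 1; i < neq; ++i)  // extrema of the Chebyshev polynomial T_neq
    mm[i] = lo + squeezeLow(0.5 * (1.0 - std::cos(pi * i / neq))) * width;
  mm[neq] = hi;
  xx.assign(neq, 0.0);
  for (int i = 0; i < neq; ++i)  // zeros of T_neq
    xx[i] = lo + squeezeLow(0.5 * (1.0 - std::cos(pi * (2 * i + 1) / (2.0 * neq)))) * width;
}
```

With these formulas the guesses interleave as mm[0] < xx[0] < mm[1] < ... < xx[neq-1] < mm[neq]. This is consistent with search(), which looks for the i-th extremum between the zero guesses on either side of it.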
// Initialise step sizes
void AlgRemez::stpini(bigfloat *step) {
xx[neq+1] = apend;
delta = 0.25;
step[0] = xx[0] - apstrt;
for (int i = 1; i < neq; i++) step[i] = xx[i] - xx[i-1];
step[neq] = step[neq-1];
}
// Search for error maxima and minima
void AlgRemez::search(bigfloat *step) {
bigfloat a, q, xm, ym, xn, yn, xx0, xx1;
int i, j, meq, emsign, ensign, steps;
meq = neq + 1;
bigfloat *yy = new bigfloat[meq];
bigfloat eclose = 1.0e30;
bigfloat farther = 0l;
j = 1;
xx0 = apstrt;
for (i = 0; i < meq; i++) {
steps = 0;
xx1 = xx[i]; // Next zero
if (i == meq-1) xx1 = apend;
xm = mm[i];
ym = getErr(xm,&emsign);
q = step[i];
xn = xm + q;
if (xn < xx0 || xn >= xx1) { // Cannot skip over adjacent boundaries
q = -q;
xn = xm;
yn = ym;
ensign = emsign;
} else {
yn = getErr(xn,&ensign);
if (yn < ym) {
q = -q;
xn = xm;
yn = ym;
ensign = emsign;
}
}
while(yn >= ym) { // March until error becomes smaller.
if (++steps > 10) break;
ym = yn;
xm = xn;
emsign = ensign;
a = xm + q;
if (a == xm || a <= xx0 || a >= xx1) break;// Must not skip over the zeros either side.
xn = a;
yn = getErr(xn,&ensign);
}
mm[i] = xm; // Position of maximum
yy[i] = ym; // Value of maximum
if (eclose > ym) eclose = ym;
if (farther < ym) farther = ym;
xx0 = xx1; // Walk to next zero.
} // end of search loop
q = (farther - eclose); // Decrease step size if error spread increased
if (eclose != 0.0) q /= eclose; // Relative error spread
if (q >= spread) delta *= 0.5; // Spread is increasing; decrease step size
spread = q;
for (i = 0; i < neq; i++) {
q = yy[i+1];
if (q != 0.0) q = yy[i] / q - (bigfloat)1l;
else q = 0.0625;
if (q > (bigfloat)0.25) q = 0.25;
q *= mm[i+1] - mm[i];
step[i] = q * delta;
}
step[neq] = step[neq-1];
for (i = 0; i < neq; i++) { // Insert new locations for the zeros.
xm = xx[i] - step[i];
if (xm <= apstrt) continue;
if (xm >= apend) continue;
if (xm <= mm[i]) xm = (bigfloat)0.5 * (mm[i] + xx[i]);
if (xm >= mm[i+1]) xm = (bigfloat)0.5 * (mm[i+1] + xx[i]);
xx[i] = xm;
}
delete [] yy;
}
// Solve the equations
void AlgRemez::equations(void) {
bigfloat x, y, z;
int i, j, ip;
bigfloat *aa;
bigfloat *AA = new bigfloat[(neq)*(neq)];
bigfloat *BB = new bigfloat[neq];
for (i = 0; i < neq; i++) { // set up the equations for solution by simq()
ip = neq * i; // offset to 1st element of this row of matrix
x = xx[i]; // the guess for this row
y = func(x); // right-hand-side vector
z = (bigfloat)1l;
aa = AA+ip;
for (j = 0; j <= n; j++) {
*aa++ = z;
z *= x;
}
z = (bigfloat)1l;
for (j = 0; j < d; j++) {
*aa++ = -y * z;
z *= x;
}
BB[i] = y * z; // Right hand side vector
}
// Solve the simultaneous linear equations.
if (simq(AA, BB, param, neq)) {
std::cout<<"simq failed\n";
exit(1); // non-zero status: the linear solve failed
}
delete [] AA;
delete [] BB;
}
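The rows built by equations() are interpolation conditions. At each zero guess x = xx[i], with y = f(x), the unknowns (p_0..p_n, q_0..q_{d-1}) must satisfy P(x) = y Q(x). Q is normalised to be monic, so the n+d+1 unknowns match the neq equations. Below is a minimal double-precision sketch of one row, using the hypothetical names `RemezRow` and `remezRow`:

```cpp
#include <cassert>
#include <vector>

struct RemezRow {
  std::vector<double> coeffs;  // multiplies (p_0..p_n, q_0..q_{d-1})
  double rhs;                  // right-hand side y * x^d
};

// Hypothetical double-precision version of one row of AlgRemez::equations():
//   sum_j p_j x^j - y * sum_{j<d} q_j x^j = y * x^d,
// i.e. P(x) = y Q(x) with the leading denominator coefficient q_d fixed to 1.
RemezRow remezRow(double x, double y, int n, int d) {
  assert(n >= 0 && d >= 0);
  RemezRow row;
  row.coeffs.reserve(n + d + 1);
  double z = 1.0;
  for (int j = 0; j <= n; ++j) { row.coeffs.push_back(z); z *= x; }
  z = 1.0;
  for (int j = 0; j < d; ++j) { row.coeffs.push_back(-y * z); z *= x; }
  row.rhs = y * z;  // z == x^d after the loop
  return row;
}
```

For example, `remezRow(2.0, 3.0, 1, 2)` gives coefficients {1, 2, -3, -6} and right-hand side 3 * 2^2 = 12.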
// Evaluate the rational form P(x)/Q(x) using coefficients
// from the solution vector param
bigfloat AlgRemez::approx(const bigfloat x) {
bigfloat yn, yd;
int i;
// Work backwards toward the constant term.
yn = param[n]; // Highest order numerator coefficient
for (i = n-1; i >= 0; i--) yn = x * yn + param[i];
yd = x + param[n+d]; // Highest degree coefficient = 1.0
for (i = n+d-1; i > n; i--) yd = x * yd + param[i];
return(yn/yd);
}
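approx() evaluates the numerator and denominator by Horner's rule, with the denominator's leading coefficient fixed to 1. The sketch below does the same in `double` and uses the same `param` layout; the function name `rationalEval` is hypothetical. Its assert makes explicit that this layout needs d >= 1: for d = 0 the line `yd = x + param[n+d]` would read a numerator coefficient.

```cpp
#include <cassert>
#include <vector>

// Evaluate R(x) = P(x)/Q(x) with the same coefficient layout as AlgRemez::param:
//   param[0..n]     numerator coefficients p_0..p_n
//   param[n+1..n+d] denominator coefficients q_0..q_{d-1}; q_d = 1 is implicit.
double rationalEval(const std::vector<double>& param, int n, int d, double x) {
  assert(n >= 0 && d >= 1 && (int)param.size() == n + d + 1);
  double yn = param[n];                    // highest numerator coefficient
  for (int i = n - 1; i >= 0; --i) yn = x * yn + param[i];
  double yd = x + param[n + d];            // monic leading term x^d
  for (int i = n + d - 1; i > n; --i) yd = x * yd + param[i];
  return yn / yd;
}
```

For n = d = 1 and param = {1, 2, 3}, R(x) = (1 + 2x)/(x + 3), so `rationalEval({1, 2, 3}, 1, 1, 2.0)` returns 1.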
// Compute size and sign of the approximation error at x
bigfloat AlgRemez::getErr(bigfloat x, int *sign) {
bigfloat e, f;
f = func(x);
e = approx(x) - f;
if (f != 0) e /= f;
if (e < (bigfloat)0.0) {
*sign = -1;
e = -e;
}
else *sign = 1;
return(e);
}
// Calculate function required for the approximation.
bigfloat AlgRemez::func(const bigfloat x) {
bigfloat z = (bigfloat)power_num / (bigfloat)power_den;
bigfloat y;
if (x == (bigfloat)1.0) y = (bigfloat)1.0;
else y = pow_bf(x,z);
if (a_length > 0) {
bigfloat sum = 0l;
for (int j=0; j<a_length; j++) sum += a[j]*pow_bf(x,a_power[j]);
return y * exp_bf(sum);
} else {
return y;
}
}
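func() is the target being approximated. It computes x^(pnum/pden), multiplied by exp(sum_j a[j] x^a_power[j]) when generateApprox() is called with extra exponential terms. The following is a double-precision sketch under that reading, using the hypothetical name `remezTarget`:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical double-precision version of AlgRemez::func():
//   f(x) = x^(pnum/pden) * exp( sum_j a[j] * x^apow[j] ).
double remezTarget(double x, unsigned long pnum, unsigned long pden,
                   const std::vector<double>& a, const std::vector<int>& apow) {
  assert(pden != 0 && a.size() == apow.size());
  // Special-case x == 1 as func() does, so the power term is exactly 1 there.
  double y = (x == 1.0) ? 1.0 : std::pow(x, double(pnum) / double(pden));
  double sum = 0.0;
  for (std::size_t j = 0; j < a.size(); ++j) sum += a[j] * std::pow(x, apow[j]);
  return y * std::exp(sum);  // exp(0) == 1 when there are no extra terms
}
```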
// Solve the system AX=B
int AlgRemez::simq(bigfloat A[], bigfloat B[], bigfloat X[], int n) {
int i, j, ij, ip, ipj, ipk, ipn;
int idxpiv, iback;
int k, kp, kp1, kpk, kpn;
int nip, nkp, nm1;
bigfloat em, q, rownrm, big, size, pivot, sum;
bigfloat *aa;
// simq() work vector
int *IPS = new int[n];
nm1 = n - 1;
// Initialize IPS and X
ij = 0;
for (i = 0; i < n; i++) {
IPS[i] = i;
rownrm = 0.0;
for(j = 0; j < n; j++) {
q = abs_bf(A[ij]);
if(rownrm < q) rownrm = q;
++ij;
}
if (rownrm == (bigfloat)0l) {
std::cout<<"simq rownrm=0\n";
delete [] IPS;
return(1);
}
X[i] = (bigfloat)1.0 / rownrm;
}
for (k = 0; k < nm1; k++) {
big = 0.0;
idxpiv = 0;
for (i = k; i < n; i++) {
ip = IPS[i];
ipk = n*ip + k;
size = abs_bf(A[ipk]) * X[ip];
if (size > big) {
big = size;
idxpiv = i;
}
}
if (big == (bigfloat)0l) {
std::cout<<"simq big=0\n";
delete [] IPS;
return(2);
}
if (idxpiv != k) {
j = IPS[k];
IPS[k] = IPS[idxpiv];
IPS[idxpiv] = j;
}
kp = IPS[k];
kpk = n*kp + k;
pivot = A[kpk];
kp1 = k+1;
for (i = kp1; i < n; i++) {
ip = IPS[i];
ipk = n*ip + k;
em = -A[ipk] / pivot;
A[ipk] = -em;
nip = n*ip;
nkp = n*kp;
aa = A+nkp+kp1;
for (j = kp1; j < n; j++) {
ipj = nip + j;
A[ipj] = A[ipj] + em * *aa++;
}
}
}
kpn = n * IPS[n-1] + n - 1; // last element of row IPS[n-1]
if (A[kpn] == (bigfloat)0l) {
std::cout<<"simq A[kpn]=0\n";
delete [] IPS;
return(3);
}
ip = IPS[0];
X[0] = B[ip];
for (i = 1; i < n; i++) {
ip = IPS[i];
ipj = n * ip;
sum = 0.0;
for (j = 0; j < i; j++) {
sum += A[ipj] * X[j];
++ipj;
}
X[i] = B[ip] - sum;
}
ipn = n * IPS[n-1] + n - 1;
X[n-1] = X[n-1] / A[ipn];
for (iback = 1; iback < n; iback++) {
// i goes (n-2),...,0
i = nm1 - iback;
ip = IPS[i];
nip = n*ip;
sum = 0.0;
aa = A+nip+i+1;
for (j= i + 1; j < n; j++)
sum += *aa++ * X[j];
X[i] = (X[i] - sum) / A[nip+i];
}
delete [] IPS;
return(0);
}
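simq() is Gaussian elimination with scaled partial pivoting. In each column it pivots on the entry that is largest relative to its row's maximum, and it tracks the row order in the IPS index vector rather than moving data. The sketch below applies the same pivot rule in `double` but swaps rows explicitly, which is easier to follow. `solveScaledPivot` is a hypothetical name. Like simq(), it reports a zero row or a zero pivot as failure.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Solve A x = b (A is n x n, row-major) by Gaussian elimination with scaled
// partial pivoting. Returns false if A is singular to working precision.
bool solveScaledPivot(std::vector<double> A, std::vector<double> b, int n,
                      std::vector<double>& x) {
  // Row scale factors: the largest absolute entry of each row.
  std::vector<double> scale(n);
  for (int i = 0; i < n; ++i) {
    double m = 0.0;
    for (int j = 0; j < n; ++j) m = std::max(m, std::fabs(A[i * n + j]));
    if (m == 0.0) return false;  // zero row, like simq's rownrm == 0
    scale[i] = m;
  }
  for (int k = 0; k < n; ++k) {
    // Pick the row whose entry in column k is largest relative to its scale.
    int piv = k;
    double big = 0.0;
    for (int i = k; i < n; ++i) {
      double s = std::fabs(A[i * n + k]) / scale[i];
      if (s > big) { big = s; piv = i; }
    }
    if (big == 0.0) return false;  // no usable pivot
    if (piv != k) {
      for (int j = 0; j < n; ++j) std::swap(A[k * n + j], A[piv * n + j]);
      std::swap(b[k], b[piv]);
      std::swap(scale[k], scale[piv]);
    }
    // Eliminate column k below the pivot.
    for (int i = k + 1; i < n; ++i) {
      double m = A[i * n + k] / A[k * n + k];
      for (int j = k; j < n; ++j) A[i * n + j] -= m * A[k * n + j];
      b[i] -= m * b[k];
    }
  }
  // Back substitution on the upper-triangular system.
  x.assign(n, 0.0);
  for (int i = n - 1; i >= 0; --i) {
    double sum = b[i];
    for (int j = i + 1; j < n; ++j) sum -= A[i * n + j] * x[j];
    x[i] = sum / A[i * n + i];
  }
  return true;
}
```

For example, take the system with rows (0,1,1), (1,0,1), (1,1,0) and right-hand side (2,2,2). It has a zero in the first pivot position, so it is solved only because a row swap brings (1,0,1) up first. The solution is (1,1,1).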
// Calculate the roots of the approximation
int AlgRemez::root() {
long i,j;
bigfloat x,dx=0.05;
bigfloat upper=1, lower=-100000;
bigfloat tol = 1e-20;
bigfloat *poly = new bigfloat[neq+1];
// First find the numerator roots
for (i=0; i<=n; i++) poly[i] = param[i];
for (i=n-1; i>=0; i--) {
roots[i] = rtnewt(poly,i+1,lower,upper,tol);
if (roots[i] == 0.0) {
std::cout<<"Failure to converge on root "<<i+1<<"/"<<n<<"\n";
delete [] poly;
return 0;
}
poly[0] = -poly[0]/roots[i];
for (j=1; j<=i; j++) poly[j] = (poly[j-1] - poly[j])/roots[i];
}
// Now find the denominator roots
poly[d] = 1l;
for (i=0; i<d; i++) poly[i] = param[n+1+i];
for (i=d-1; i>=0; i--) {
poles[i]=rtnewt(poly,i+1,lower,upper,tol);
if (poles[i] == 0.0) {
std::cout<<"Failure to converge on pole "<<i+1<<"/"<<d<<"\n";
delete [] poly;
return 0;
}
poly[0] = -poly[0]/poles[i];
for (j=1; j<=i; j++) poly[j] = (poly[j-1] - poly[j])/poles[i];
}
norm = param[n];
delete [] poly;
return 1;
}
// Evaluate the polynomial
bigfloat AlgRemez::polyEval(bigfloat x, bigfloat *poly, long size) {
bigfloat f = poly[size];
for (int i=size-1; i>=0; i--) f = f*x + poly[i];
return f;
}
// Evaluate the differential of the polynomial
bigfloat AlgRemez::polyDiff(bigfloat x, bigfloat *poly, long size) {
bigfloat df = (bigfloat)size*poly[size];
for (int i=size-1; i>0; i--) df = df*x + (bigfloat)i*poly[i];
return df;
}
// Newton's method to calculate roots
bigfloat AlgRemez::rtnewt(bigfloat *poly, long i, bigfloat x1,
bigfloat x2, bigfloat xacc) {
int j;
bigfloat df, dx, f, rtn;
rtn=(bigfloat)0.5*(x1+x2);
for (j=1; j<=JMAX;j++) {
f = polyEval(rtn, poly, i);
df = polyDiff(rtn, poly, i);
dx = f/df;
rtn -= dx;
if (abs_bf(dx) < xacc) return rtn;
}
std::cout<<"Maximum number of iterations exceeded in rtnewt\n";
return 0.0;
}
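root() finds all roots of the numerator and denominator polynomials with Newton's method followed by deflation. Newton's method started from a real point can only reach real roots. This approach therefore assumes, as root() does with its real bracket [-100000, 1], that all roots and poles are real. The following double-precision sketch uses hypothetical names. Unlike root(), it deflates with standard synthetic division from the leading coefficient rather than from the constant term, and it returns the roots in ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Evaluate p(x) = sum_k p[k] x^k and p'(x) together by Horner's rule.
static void hornerWithDerivative(const std::vector<double>& p, double x,
                                 double& f, double& df) {
  f = 0.0;
  df = 0.0;
  for (int k = (int)p.size() - 1; k >= 0; --k) {
    df = df * x + f;
    f = f * x + p[k];
  }
}

// Newton iteration from x0; stops when the step is below tol.
static bool newtonRoot(const std::vector<double>& p, double x0, double tol,
                       int maxIter, double& root) {
  double x = x0;
  for (int it = 0; it < maxIter; ++it) {
    double f, df;
    hornerWithDerivative(p, x, f, df);
    if (df == 0.0) return false;
    double dx = f / df;
    x -= dx;
    if (std::fabs(dx) < tol) { root = x; return true; }
  }
  return false;  // like rtnewt(), give up after maxIter steps
}

// Divide p by (x - r) with synthetic division, discarding the remainder.
static std::vector<double> deflate(const std::vector<double>& p, double r) {
  int m = (int)p.size() - 1;   // degree of p
  std::vector<double> q(m);    // quotient has degree m-1
  q[m - 1] = p[m];
  for (int k = m - 1; k >= 1; --k) q[k - 1] = p[k] + r * q[k];
  return q;
}

// Find all roots (assumed real), starting each search at the bracket midpoint.
std::vector<double> allRealRoots(std::vector<double> p, double lower,
                                 double upper, double tol) {
  std::vector<double> roots;
  while (p.size() > 1) {
    double r;
    if (!newtonRoot(p, 0.5 * (lower + upper), tol, 10000, r)) return {};
    roots.push_back(r);
    p = deflate(p, r);
  }
  std::sort(roots.begin(), roots.end());
  return roots;
}
```

For x^3 + 6x^2 + 11x + 6 = (x+1)(x+2)(x+3), given as coefficients {6, 11, 6, 1}, the result is approximately {-3, -2, -1}.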
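// Evaluate the partial fraction expansion of the rational function
// with roots res and poles poles. The result overwrites the input
// arrays: res receives the residues and poles the negated poles
// (shifts), sorted from smallest to largest.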
void AlgRemez::pfe(bigfloat *res, bigfloat *poles, bigfloat norm) {
int i,j,small;
bigfloat temp;
bigfloat *numerator = new bigfloat[n];
bigfloat *denominator = new bigfloat[d];
// Construct the polynomials explicitly
for (i=1; i<n; i++) {
numerator[i] = 0l;
denominator[i] = 0l;
}
numerator[0]=1l;
denominator[0]=1l;
for (j=0; j<n; j++) {
for (i=n-1; i>=0; i--) {
numerator[i] *= -res[j];
denominator[i] *= -poles[j];
if (i>0) {
numerator[i] += numerator[i-1];
denominator[i] += denominator[i-1];
}
}
}
// Convert to proper fraction form.
// Fraction is now in the form 1 + n/d, where deg(n) + 1 = deg(d)
for (i=0; i<n; i++) numerator[i] -= denominator[i];
// Find the residues of the partial fraction expansion and absorb the
// coefficients.
for (i=0; i<n; i++) {
res[i] = 0l;
for (j=n-1; j>=0; j--) {
res[i] = poles[i]*res[i]+numerator[j];
}
for (j=n-1; j>=0; j--) {
if (i!=j) res[i] /= poles[i]-poles[j];
}
res[i] *= norm;
}
// res now holds the residues
j = 0;
for (i=0; i<n; i++) poles[i] = -poles[i];
// Sort the shifts, together with their residues, from smallest to largest
for (j=0; j<n; j++) {
small = j;
for (i=j+1; i<n; i++) {
if (poles[i] < poles[small]) small = i;
}
if (small != j) {
temp = poles[small];
poles[small] = poles[j];
poles[j] = temp;
temp = res[small];
res[small] = res[j];
res[j] = temp;
}
}
delete [] numerator;
delete [] denominator;
}
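pfe() rewrites norm * N(x)/D(x) as norm + sum_i res_i/(x + shift_i). Here N and D are monic of the same degree, D has simple poles b_i, and shift_i = -b_i. The final loop in pfe() produces these shifts by negating the poles. The sketch below uses the hypothetical name `partialFractions`. It computes the same residues directly from the product formula res_i = norm * prod_j (b_i - a_j) / prod_{j != i} (b_i - b_j) instead of expanding dense polynomials, and it does not sort the output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// R(x) = norm * prod_j (x - zeros[j]) / prod_i (x - poles[i]), with simple
// poles and equal degrees, rewritten as R(x) = norm + sum_i res[i] / (x + shift[i]).
void partialFractions(const std::vector<double>& zeros,
                      const std::vector<double>& poles, double norm,
                      std::vector<double>& res, std::vector<double>& shift) {
  const std::size_t n = poles.size();
  assert(zeros.size() == n);
  res.assign(n, 0.0);
  shift.assign(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    double num = 1.0, den = 1.0;
    for (std::size_t j = 0; j < n; ++j) {
      num *= poles[i] - zeros[j];            // N(b_i)
      if (j != i) den *= poles[i] - poles[j];  // D'(b_i)
    }
    res[i] = norm * num / den;  // residue of norm * N/D at b_i
    shift[i] = -poles[i];       // x - b_i == x + shift_i
  }
}
```

For example, 1.5 (x+1)(x+3)/((x+2)(x+4)) becomes 1.5 - 0.75/(x+2) - 2.25/(x+4).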
double AlgRemez::evaluateApprox(double x) {
return (double)approx((bigfloat)x);
}
double AlgRemez::evaluateInverseApprox(double x) {
return 1.0/(double)approx((bigfloat)x);
}
double AlgRemez::evaluateFunc(double x) {
return (double)func((bigfloat)x);
}
double AlgRemez::evaluateInverseFunc(double x) {
return 1.0/(double)func((bigfloat)x);
}
void AlgRemez::csv(std::ostream & os)
{
double lambda_low = apstrt;
double lambda_high= apend;
for (double x=lambda_low; x<lambda_high; x*=1.05) {
double f = evaluateFunc(x);
double r = evaluateApprox(x);
os<< x<<","<<r<<","<<f<<","<<r-f<<std::endl;
}
return;
}
/*
Mike Clark - 25th May 2005
alg_remez.h
AlgRemez is an implementation of the Remez algorithm, which in this
case is used for generating the optimal nth root rational
approximation.
Note this class uses the GNU multiprecision (GNU MP) library when HAVE_LIBGMP is defined, and falls back to bigfloat_double.h otherwise.
*/
#ifndef INCLUDED_ALG_REMEZ_H
#define INCLUDED_ALG_REMEZ_H
#include <stddef.h>
#include <Grid/GridStd.h>
#ifdef HAVE_LIBGMP
#include "bigfloat.h"
#else
#include "bigfloat_double.h"
#endif
#define JMAX 10000 //Maximum number of iterations of Newton's approximation
#define SUM_MAX 10 // Maximum number of terms in exponential
/*
*Usage examples
AlgRemez remez(lambda_low,lambda_high,precision);
error = remez.generateApprox(n,d,y,z);
remez.getPFE(res,pole,&norm);
remez.getIPFE(res,pole,&norm);
remez.csv(ostream &os);
*/
class AlgRemez
{
private:
char *cname;
// The approximation parameters
bigfloat *param, *roots, *poles;
bigfloat norm;
// The numerator and denominator degree (n=d)
int n, d;
// The bounds of the approximation
bigfloat apstrt, apwidt, apend;
// the numerator and denominator of the power we are approximating
unsigned long power_num;
unsigned long power_den;
// Flag to determine whether the arrays have been allocated
int alloc;
// Flag to determine whether the roots have been found
int foundRoots;
// Variables used to calculate the approximation
int nd1, iter;
bigfloat *xx, *mm, *step;
bigfloat delta, spread, tolerance;
// The exponential summation coefficients
bigfloat *a;
int *a_power;
int a_length;
// The number of equations we must solve at each iteration (n+d+1)
int neq;
// The precision of the GNU MP library
long prec;
// Initial values of maximal and minimal errors
void initialGuess();
// Solve the equations
void equations();
// Search for error maxima and minima
void search(bigfloat *step);
// Initialise step sizes
void stpini(bigfloat *step);
// Calculate the roots of the approximation
int root();
// Evaluate the polynomial
bigfloat polyEval(bigfloat x, bigfloat *poly, long size);
//complex_bf polyEval(complex_bf x, complex_bf *poly, long size);
// Evaluate the differential of the polynomial
bigfloat polyDiff(bigfloat x, bigfloat *poly, long size);
//complex_bf polyDiff(complex_bf x, complex_bf *poly, long size);
// Newton's method to calculate roots
bigfloat rtnewt(bigfloat *poly, long i, bigfloat x1, bigfloat x2, bigfloat xacc);
//complex_bf rtnewt(complex_bf *poly, long i, bigfloat x1, bigfloat x2, bigfloat xacc);
// Evaluate the partial fraction expansion of the rational function
// with roots res and poles poles. Result is overwritten on input
// arrays.
void pfe(bigfloat *res, bigfloat* poles, bigfloat norm);
// Calculate function required for the approximation
bigfloat func(bigfloat x);
// Compute size and sign of the approximation error at x
bigfloat getErr(bigfloat x, int *sign);
// Solve the system AX=B
int simq(bigfloat *A, bigfloat *B, bigfloat *X, int n);
// Free memory and reallocate as necessary
void allocate(int num_degree, int den_degree);
// Evaluate the rational form P(x)/Q(x) using coefficients from the
// solution vector param
bigfloat approx(bigfloat x);
public:
// Constructor
AlgRemez(double lower, double upper, long prec);
// Destructor
virtual ~AlgRemez();
int getDegree(void){
assert(n==d);
return n;
}
// Reset the bounds of the approximation
void setBounds(double lower, double upper);
// Get the bounds of the approximation
void getBounds(double &lower, double &upper) {
lower=(double)apstrt;
upper=(double)apend;
}
// Generate the rational approximation x^(pnum/pden)
double generateApprox(int num_degree, int den_degree,
unsigned long power_num, unsigned long power_den,
int a_len, double* a_param, int* a_pow);
double generateApprox(int num_degree, int den_degree,
unsigned long power_num, unsigned long power_den);
double generateApprox(int degree, unsigned long power_num,
unsigned long power_den);
// Return the partial fraction expansion of the approximation x^(pnum/pden)
int getPFE(double *res, double *pole, double *norm);
// Return the partial fraction expansion of the approximation x^(-pnum/pden)
int getIPFE(double *res, double *pole, double *norm);
// Evaluate the rational form P(x)/Q(x) using coefficients from the
// solution vector param
double evaluateApprox(double x);
// Evaluate the rational form Q(x)/P(x) using coefficients from the
// solution vector param
double evaluateInverseApprox(double x);
// Calculate function required for the approximation
double evaluateFunc(double x);
// Calculate inverse function required for the approximation
double evaluateInverseFunc(double x);
// Dump csv of function, approx and error
void csv(std::ostream &os);
};
#endif // Include guard
/* -*- Mode: C; comment-column: 22; fill-column: 79; compile-command: "gcc -o zolotarev zolotarev.c -ansi -pedantic -lm -DTEST"; -*- */
#define VERSION Source Time-stamp: <2015-05-18 16:32:08 neo>
/* These C routines evaluate the optimal rational approximation to the signum
* function for epsilon < |x| < 1 using Zolotarev's theorem.
*
* To obtain reliable results for high degree approximations (large n) it is
* necessary to compute using sufficiently high precision arithmetic. To this
* end the code has been parameterised to work with the preprocessor names
* INTERNAL_PRECISION and PRECISION set to float, double, or long double as
* appropriate. INTERNAL_PRECISION is used in computing the Zolotarev
* coefficients, which are converted to PRECISION before being returned to the
* caller. Presumably even higher precision could be obtained using GMP or
* similar package, but bear in mind that rounding errors might also be
* significant in evaluating the resulting polynomial. The convergence criteria
* have been written in a precision-independent form. */
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#define MAX(a,b) ((a) > (b) ? (a) : (b))
#define MIN(a,b) ((a) < (b) ? (a) : (b))
#ifndef INTERNAL_PRECISION
#define INTERNAL_PRECISION double
#endif
#include "Zolotarev.h"
#define ZOLOTAREV_INTERNAL
#undef ZOLOTAREV_DATA
#define ZOLOTAREV_DATA izd
#undef ZPRECISION
#define ZPRECISION INTERNAL_PRECISION
#include "Zolotarev.h"
#undef ZOLOTAREV_INTERNAL
/* The ANSI standard appears not to know what pi is */
#ifndef M_PI
#define M_PI ((INTERNAL_PRECISION) 3.141592653589793238462643383279502884197\
169399375105820974944592307816406286208998628034825342117068)
#endif
#define ZERO ((INTERNAL_PRECISION) 0)
#define ONE ((INTERNAL_PRECISION) 1)
#define TWO ((INTERNAL_PRECISION) 2)
#define THREE ((INTERNAL_PRECISION) 3)
#define FOUR ((INTERNAL_PRECISION) 4)
#define HALF (ONE/TWO)
/* The following obscenity seems to be the simplest (?) way to coerce the C
* preprocessor to convert the value of a preprocessor token into a string. */
#define PP2(x) #x
#define PP1(a,b,c) a ## b(c)
#define STRINGIFY(name) PP1(PP,2,name)
/* Compute the partial fraction expansion coefficients (alpha) from the
* factored form */
namespace Grid {
namespace Approx {
static void construct_partfrac(izd *z) {
int dn = z -> dn, dd = z -> dd, type = z -> type;
int j, k, da = dd + 1 + type;
INTERNAL_PRECISION A = z -> A, *a = z -> a, *ap = z -> ap, *alpha;
alpha = (INTERNAL_PRECISION*) malloc(da * sizeof(INTERNAL_PRECISION));
for (j = 0; j < dd; j++)
for (k = 0, alpha[j] = A; k < dd; k++)
alpha[j] *=
(k < dn ? ap[j] - a[k] : ONE) / (k == j ? ONE : ap[j] - ap[k]);
if(type == 1) /* implicit pole at zero? */
for (k = 0, alpha[dd] = A * (dn > dd ? - a[dd] : ONE); k < dd; k++) {
alpha[dd] *= a[k] / ap[k];
alpha[k] *= (dn > dd ? ap[k] - a[dd] : ONE) / ap[k];
}
alpha[da-1] = dn == da - 1 ? A : ZERO;
z -> alpha = alpha;
z -> da = da;
return;
}
/* Convert factored polynomial into dense polynomial. The input is the overall
* factor A and the roots a[i], such that p = A product(x - a[i], i = 1..d) */
static INTERNAL_PRECISION *poly_factored_to_dense(INTERNAL_PRECISION A,
INTERNAL_PRECISION *a,
int d) {
INTERNAL_PRECISION *p;
int i, j;
p = (INTERNAL_PRECISION *) malloc((d + 2) * sizeof(INTERNAL_PRECISION));
p[0] = A;
for (i = 0; i < d; i++) {
p[i+1] = p[i];
for (j = i; j > 0; j--) p[j] = p[j-1] - a[i]*p[j];
p[0] *= - a[i];
}
return p;
}
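poly_factored_to_dense() multiplies in one root at a time. Multiplying by (x - a[i]) shifts every coefficient up one power and subtracts a[i] times the old coefficient, so after step i the array holds A(x - a[0])...(x - a[i]). The same recurrence with `std::vector` looks like this; `factoredToDense` is a hypothetical name, and the vector is sized d+1 rather than the d+2 the C routine allocates.

```cpp
#include <cassert>
#include <vector>

// Expand A * prod_i (x - a[i]) into coefficients p[0..d] (p[k] multiplies x^k),
// using the same in-place recurrence as poly_factored_to_dense().
std::vector<double> factoredToDense(double A, const std::vector<double>& a) {
  const int d = (int)a.size();
  std::vector<double> p(d + 1, 0.0);
  p[0] = A;
  for (int i = 0; i < d; ++i) {
    p[i + 1] = p[i];  // new leading coefficient
    for (int j = i; j > 0; --j) p[j] = p[j - 1] - a[i] * p[j];
    p[0] *= -a[i];    // constant term picks up the factor -a[i]
  }
  return p;
}
```

For example, 2(x - 1)(x - 2) expands to {4, -6, 2}, that is 4 - 6x + 2x^2.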
/* Convert a rational function of the form R0(x) = x p(x^2)/q(x^2) (type 0) or
* R1(x) = p(x^2)/[x q(x^2)] (type 1) into its continued fraction
* representation. We assume that 0 <= deg(q) - deg(p) <= 1 for type 0 and 0 <=
* deg(p) - deg(q) <= 1 for type 1. On input p and q are in factored form, and
* deg(q) = dq, deg(p) = dp. The output is the continued fraction coefficients
* beta, where R(x) = beta[0] x + 1/(beta[1] x + 1/(...)).
*
* The method used is as follows. There are four cases to consider:
*
* 0.i. Type 0, deg p = deg q
*
* 0.ii. Type 0, deg p = deg q - 1
*
* 1.i. Type 1, deg p = deg q
*
* 1.ii. Type 1, deg p = deg q + 1
*
* and these are connected by two transformations:
*
* A. To obtain a continued fraction expansion of type 1 we use a single-step
* polynomial division: we find beta and r(x) such that p(x) = beta x q(x) +
* r(x), with deg(r) = deg(q). This implies that p(x^2) = beta x^2 q(x^2) +
* r(x^2), and thus R1(x) = x beta + r(x^2)/(x q(x^2)) = x beta + 1/R0(x)
* with R0(x) = x q(x^2)/r(x^2).
*
* B. A continued fraction expansion of type 0 is obtained in a similar, but
* not identical, manner. We use the polynomial division algorithm to compute
* the quotient beta and the remainder r that satisfy p(x) = beta q(x) + r(x)
* with deg(r) = deg(q) - 1. We thus have x p(x^2) = x beta q(x^2) + x r(x^2),
* so R0(x) = x beta + x r(x^2)/q(x^2) = x beta + 1/R1(x) with R1(x) = q(x^2) /
* (x r(x^2)).
*
* Note that deg(r) must be exactly deg(q) for (A) and deg(q) - 1 for (B)
* because p and q have disjoint roots all of multiplicity 1. This means that
* the division algorithm requires only a single polynomial subtraction step.
*
* The transformations between the cases form the following finite state
* automaton:
*
* +------+ +------+ +------+ +------+
* | | | | ---(A)---> | | | |
* | 0.ii | ---(B)---> | 1.ii | | 0.i | <---(A)--- | 1.i |
* | | | | <---(B)--- | | | |
* +------+ +------+ +------+ +------+
*/
static INTERNAL_PRECISION *contfrac_A(INTERNAL_PRECISION *,
INTERNAL_PRECISION *,
INTERNAL_PRECISION *,
INTERNAL_PRECISION *, int, int);
static INTERNAL_PRECISION *contfrac_B(INTERNAL_PRECISION *,
INTERNAL_PRECISION *,
INTERNAL_PRECISION *,
INTERNAL_PRECISION *, int, int);
static void construct_contfrac(izd *z){
INTERNAL_PRECISION *r, A = z -> A, *p = z -> a, *q = z -> ap;
int dp = z -> dn, dq = z -> dd, type = z -> type;
z -> db = 2 * dq + 1 + type;
z -> beta = (INTERNAL_PRECISION *)
malloc(z -> db * sizeof(INTERNAL_PRECISION));
p = poly_factored_to_dense(A, p, dp);
q = poly_factored_to_dense(ONE, q, dq);
r = (INTERNAL_PRECISION *) malloc((MAX(dp,dq) + 1) *
sizeof(INTERNAL_PRECISION));
if (type == 0) (void) contfrac_B(z -> beta, p, q, r, dp, dq);
else (void) contfrac_A(z -> beta, p, q, r, dp, dq);
free(p); free(q); free(r);
return;
}
static INTERNAL_PRECISION *contfrac_A(INTERNAL_PRECISION *beta,
INTERNAL_PRECISION *p,
INTERNAL_PRECISION *q,
INTERNAL_PRECISION *r, int dp, int dq) {
INTERNAL_PRECISION quot, *rb;
int j;
/* p(x) = x beta q(x) + r(x); dp = dq or dp = dq + 1 */
quot = dp == dq ? ZERO : p[dp] / q[dq];
r[0] = p[0];
for (j = 1; j <= dp; j++) r[j] = p[j] - quot * q[j-1];
#ifdef DEBUG
printf("%s: Continued Fraction form: deg p = %2d, deg q = %2d, beta = %g\n",
__FUNCTION__, dp, dq, (float) quot);
for (j = 0; j <= dq + 1; j++)
printf("\tp[%2d] = %14.6g\tq[%2d] = %14.6g\tr[%2d] = %14.6g\n",
j, (float) (j > dp ? ZERO : p[j]),
j, (float) (j == 0 ? ZERO : q[j-1]),
j, (float) (j == dp ? ZERO : r[j]));
#endif /* DEBUG */
*(rb = contfrac_B(beta, q, r, p, dq, dq)) = quot;
return rb + 1;
}
static INTERNAL_PRECISION *contfrac_B(INTERNAL_PRECISION *beta,
INTERNAL_PRECISION *p,
INTERNAL_PRECISION *q,
INTERNAL_PRECISION *r, int dp, int dq) {
INTERNAL_PRECISION quot, *rb;
int j;
/* p(x) = beta q(x) + r(x); dp = dq or dp = dq - 1 */
quot = dp == dq ? p[dp] / q[dq] : ZERO;
for (j = 0; j < dq; j++) r[j] = p[j] - quot * q[j];
#ifdef DEBUG
printf("%s: Continued Fraction form: deg p = %2d, deg q = %2d, beta = %g\n",
__FUNCTION__, dp, dq, (float) quot);
for (j = 0; j <= dq; j++)
printf("\tp[%2d] = %14.6g\tq[%2d] = %14.6g\tr[%2d] = %14.6g\n",
j, (float) (j > dp ? ZERO : p[j]),
j, (float) q[j],
j, (float) (j == dq ? ZERO : r[j]));
#endif /* DEBUG */
*(rb = dq > 0 ? contfrac_A(beta, q, r, p, dq, dq-1) : beta) = quot;
return rb + 1;
}
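The following is a generic evaluator for the continued fraction R(x) = beta[0] x + 1/(beta[1] x + 1/(...)). It works from the innermost term outwards. The name `evalContFrac` is hypothetical, and it takes beta[0] as the outermost coefficient, as in the formula above. contfrac_A() and contfrac_B() store each quotient only after their recursive call returns, so check the index order of z -> beta before passing it to this function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Evaluate R(x) = beta[0] x + 1/(beta[1] x + 1/(... + 1/(beta[m-1] x))),
// starting from the innermost level. Assumes no intermediate value is zero.
double evalContFrac(const std::vector<double>& beta, double x) {
  assert(!beta.empty());
  double v = beta.back() * x;  // innermost level
  for (int k = (int)beta.size() - 2; k >= 0; --k) v = beta[k] * x + 1.0 / v;
  return v;
}
```

For example, beta = {2, 3} at x = 1 gives 2 + 1/3 = 7/3.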
/* The global variable U is used to hold the argument u throughout the AGM
* recursion. The global variables F and K are set in the innermost
* instantiation of the recursive function AGM to the values of the elliptic
* integrals F(u,k) and K(k) respectively. They must be made thread local to
* make this code thread-safe in a multithreaded environment. */
static INTERNAL_PRECISION U, F, K; /* THREAD LOCAL */
/* Recursive implementation of Gauss' arithmetico-geometric mean, which is the
* kernel of the method used to compute the Jacobian elliptic functions
* sn(u,k), cn(u,k), and dn(u,k) with parameter k (where 0 < k < 1), as well
* as the elliptic integral F(s,k) satisfying F(sn(u,k)) = u and the complete
* elliptic integral K(k).
*
* The algorithm used is a recursive implementation of the Gauss (Landen)
* transformation.
*
* The function returns the value of sn(u,k'), where k' is the dual parameter,
* and also sets the values of the global variables F and K. The latter is
* used to determine the sign of cn(u,k').
*
* The algorithm is deemed to have converged when b ceases to increase. This
* works whatever INTERNAL_PRECISION is specified. */
static INTERNAL_PRECISION AGM(INTERNAL_PRECISION a,
INTERNAL_PRECISION b,
INTERNAL_PRECISION s) {
static INTERNAL_PRECISION pb = -ONE;
INTERNAL_PRECISION c, d, xi;
if (b <= pb) {
pb = -ONE;
F = asin(s) / a; /* Here, a is the AGM */
K = M_PI / (TWO * a);
return sin(U * a);
}
pb = b;
c = a - b;
d = a + b;
xi = AGM(HALF*d, sqrt(a*b), ONE + c*c == ONE ?
HALF*s*d/a : (a - sqrt(a*a - s*s*c*d))/(c*s));
return 2*a*xi / (d + c*xi*xi);
}
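AGM() leaves the complete integral in the global K, and that value follows from the arithmetic-geometric mean alone: K(k) = pi / (2 AGM(1, sqrt(1 - k^2))). This minimal sketch uses hypothetical names and computes only that part, iteratively. It does not reproduce the descending Landen recursion that AGM() uses for sn and F.

```cpp
#include <cassert>
#include <cmath>

// Arithmetic-geometric mean of a, b > 0: iterate a' = (a+b)/2, b' = sqrt(ab).
double agm(double a, double b) {
  assert(a > 0.0 && b > 0.0);
  for (int it = 0; it < 64 && std::fabs(a - b) > 1e-15 * a; ++it) {
    const double an = 0.5 * (a + b);
    b = std::sqrt(a * b);
    a = an;
  }
  return a;
}

// Complete elliptic integral of the first kind,
// K(k) = pi / (2 AGM(1, sqrt(1 - k^2))), matching K = M_PI / (TWO * a) in AGM().
double completeEllipticK(double k) {
  assert(0.0 <= k && k < 1.0);
  return std::acos(-1.0) / (2.0 * agm(1.0, std::sqrt(1.0 - k * k)));
}
```

As checks, K(0) = pi/2, and K(1/sqrt(2)) is approximately 1.8540746773.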
/* Computes sn(u,k), cn(u,k), dn(u,k), F(u,k), and K(k). It is essentially a
* wrapper for the routine AGM. The sign of cn(u,k) is defined to be -1 if
* K(k) < u < 3*K(k) and +1 otherwise, and thus sign is computed by some quite
* unnecessarily obfuscated bit manipulations. */
static void sncndnFK(INTERNAL_PRECISION u, INTERNAL_PRECISION k,
INTERNAL_PRECISION* sn, INTERNAL_PRECISION* cn,
INTERNAL_PRECISION* dn, INTERNAL_PRECISION* elF,
INTERNAL_PRECISION* elK) {
int sgn;
U = u;
*sn = AGM(ONE, sqrt(ONE - k*k), u);
sgn = ((int) (fabs(u) / K)) % 4; /* sgn = 0, 1, 2, 3 */
sgn ^= sgn >> 1; /* (sgn & 1) = 0, 1, 1, 0 */
sgn = 1 - ((sgn & 1) << 1); /* sgn = 1, -1, -1, 1 */
*cn = ((INTERNAL_PRECISION) sgn) * sqrt(ONE - *sn * *sn);
*dn = sqrt(ONE - k*k* *sn * *sn);
*elF = F;
*elK = K;
}
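The bit manipulation in sncndnFK() compactly encodes a lookup on the quarter-period index floor(|u|/K) mod 4. The sketch below checks it against the plain form. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Sign of cn(u,k) via the bit trick in sncndnFK(): q = floor(|u|/K) mod 4.
int cnSignBits(double u, double K) {
  int q = ((int)(std::fabs(u) / K)) % 4;  // 0, 1, 2, 3
  q ^= q >> 1;                            // low bit: 0, 1, 1, 0
  return 1 - ((q & 1) << 1);              // 1, -1, -1, 1
}

// The same rule written as an explicit test: negative for K < |u| mod 4K < 3K.
int cnSignPlain(double u, double K) {
  int q = ((int)(std::fabs(u) / K)) % 4;
  return (q == 1 || q == 2) ? -1 : 1;
}
```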
/* Compute the coefficients for the optimal rational approximation R(x) to
* sgn(x) of degree n over the interval epsilon < |x| < 1 using Zolotarev's
* formula.
*
* Set type = 0 for the Zolotarev approximation, which is zero at x = 0, and
* type = 1 for the approximation which is infinite at x = 0. */
zolotarev_data* zolotarev(PRECISION epsilon, int n, int type) {
INTERNAL_PRECISION A, c, cp, kp, ksq, sn, cn, dn, Kp, Kj, z, z0, t, M, F,
l, invlambda, xi, xisq, *tv, s, opl;
int m, czero, ts;
zolotarev_data *zd;
izd *d = (izd*) malloc(sizeof(izd));
d -> type = type;
d -> epsilon = (INTERNAL_PRECISION) epsilon;
d -> n = n;
d -> dd = n / 2;
d -> dn = d -> dd - 1 + n % 2; /* n even: dn = dd - 1, n odd: dn = dd */
d -> deg_denom = 2 * d -> dd;
d -> deg_num = 2 * d -> dn + 1;
d -> a = (INTERNAL_PRECISION*) malloc((1 + d -> dn) *
sizeof(INTERNAL_PRECISION));
d -> ap = (INTERNAL_PRECISION*) malloc(d -> dd *
sizeof(INTERNAL_PRECISION));
ksq = d -> epsilon * d -> epsilon;
kp = sqrt(ONE - ksq);
sncndnFK(ZERO, kp, &sn, &cn, &dn, &F, &Kp); /* compute Kp = K(kp) */
z0 = TWO * Kp / (INTERNAL_PRECISION) n;
M = ONE;
A = ONE / d -> epsilon;
sncndnFK(HALF * z0, kp, &sn, &cn, &dn, &F, &Kj); /* compute xi */
xi = ONE / dn;
xisq = xi * xi;
invlambda = xi;
for (m = 0; m < d -> dd; m++) {
czero = 2 * (m + 1) == n; /* n even and m = dd -1 */
z = z0 * ((INTERNAL_PRECISION) m + ONE);
sncndnFK(z, kp, &sn, &cn, &dn, &F, &Kj);
t = cn / sn;
c = - t*t;
if (!czero) (d -> a)[d -> dn - 1 - m] = ksq / c;
z = z0 * ((INTERNAL_PRECISION) m + HALF);
sncndnFK(z, kp, &sn, &cn, &dn, &F, &Kj);
t = cn / sn;
cp = - t*t;
(d -> ap)[d -> dd - 1 - m] = ksq / cp;
M *= (ONE - c) / (ONE - cp);
A *= (czero ? -ksq : c) * (ONE - cp) / (cp * (ONE - c));
invlambda *= (ONE - c*xisq) / (ONE - cp*xisq);
}
invlambda /= M;
d -> A = TWO / (ONE + invlambda) * A;
d -> Delta = (invlambda - ONE) / (invlambda + ONE);
d -> gamma = (INTERNAL_PRECISION*) malloc((1 + d -> n) *
sizeof(INTERNAL_PRECISION));
l = ONE / invlambda;
opl = ONE + l;
sncndnFK(sqrt( d -> type == 1
? (THREE + l) / (FOUR * opl)
: (ONE + THREE*l) / (opl*opl*opl)
), sqrt(ONE - l*l), &sn, &cn, &dn, &F, &Kj);
s = M * F;
for (m = 0; m < d -> n; m++) {
sncndnFK(s + TWO*Kp*m/n, kp, &sn, &cn, &dn, &F, &Kj);
d -> gamma[m] = d -> epsilon / dn;
}
/* If R(x) is a Zolotarev rational approximation of degree (n,m) with maximum
* error Delta, then (1 - Delta^2) / R(x) is also an optimal Chebyshev
* approximation of degree (m,n) */
if (d -> type == 1) {
d -> A = (ONE - d -> Delta * d -> Delta) / d -> A;
tv = d -> a; d -> a = d -> ap; d -> ap = tv;
ts = d -> dn; d -> dn = d -> dd; d -> dd = ts;
ts = d -> deg_num; d -> deg_num = d -> deg_denom; d -> deg_denom = ts;
}
construct_partfrac(d);
construct_contfrac(d);
/* Converting everything to PRECISION for external use only */
zd = (zolotarev_data*) malloc(sizeof(zolotarev_data));
zd -> A = (PRECISION) d -> A;
zd -> Delta = (PRECISION) d -> Delta;
zd -> epsilon = (PRECISION) d -> epsilon;
zd -> n = d -> n;
zd -> type = d -> type;
zd -> dn = d -> dn;
zd -> dd = d -> dd;
zd -> da = d -> da;
zd -> db = d -> db;
zd -> deg_num = d -> deg_num;
zd -> deg_denom = d -> deg_denom;
zd -> a = (PRECISION*) malloc(zd -> dn * sizeof(PRECISION));
for (m = 0; m < zd -> dn; m++) zd -> a[m] = (PRECISION) d -> a[m];
free(d -> a);
zd -> ap = (PRECISION*) malloc(zd -> dd * sizeof(PRECISION));
for (m = 0; m < zd -> dd; m++) zd -> ap[m] = (PRECISION) d -> ap[m];
free(d -> ap);
zd -> alpha = (PRECISION*) malloc(zd -> da * sizeof(PRECISION));
for (m = 0; m < zd -> da; m++) zd -> alpha[m] = (PRECISION) d -> alpha[m];
free(d -> alpha);
zd -> beta = (PRECISION*) malloc(zd -> db * sizeof(PRECISION));
for (m = 0; m < zd -> db; m++) zd -> beta[m] = (PRECISION) d -> beta[m];
free(d -> beta);
zd -> gamma = (PRECISION*) malloc(zd -> n * sizeof(PRECISION));
for (m = 0; m < zd -> n; m++) zd -> gamma[m] = (PRECISION) d -> gamma[m];
free(d -> gamma);
free(d);
return zd;
}
void zolotarev_free(zolotarev_data *zdata)
{
free(zdata -> a);
free(zdata -> ap);
free(zdata -> alpha);
free(zdata -> beta);
free(zdata -> gamma);
free(zdata);
}
zolotarev_data* higham(PRECISION epsilon, int n) {
INTERNAL_PRECISION A, M, c, cp, z, z0, t, epssq;
int m, czero;
zolotarev_data *zd;
izd *d = (izd*) malloc(sizeof(izd));
d -> type = 0;
d -> epsilon = (INTERNAL_PRECISION) epsilon;
d -> n = n;
d -> dd = n / 2;
d -> dn = d -> dd - 1 + n % 2; /* n even: dn = dd - 1, n odd: dn = dd */
d -> deg_denom = 2 * d -> dd;
d -> deg_num = 2 * d -> dn + 1;
d -> a = (INTERNAL_PRECISION*) malloc((1 + d -> dn) *
sizeof(INTERNAL_PRECISION));
d -> ap = (INTERNAL_PRECISION*) malloc(d -> dd *
sizeof(INTERNAL_PRECISION));
A = (INTERNAL_PRECISION) n;
z0 = M_PI / A;
A = n % 2 == 0 ? A : ONE / A;
M = d -> epsilon * A;
epssq = d -> epsilon * d -> epsilon;
for (m = 0; m < d -> dd; m++) {
czero = 2 * (m + 1) == n; /* n even and m = dd - 1 */
if (!czero) {
z = z0 * ((INTERNAL_PRECISION) m + ONE);
t = tan(z);
c = - t*t;
(d -> a)[d -> dn - 1 - m] = c;
M *= epssq - c;
}
z = z0 * ((INTERNAL_PRECISION) m + HALF);
t = tan(z);
cp = - t*t;
(d -> ap)[d -> dd - 1 - m] = cp;
M /= epssq - cp;
}
d -> gamma = (INTERNAL_PRECISION*) malloc((1 + d -> n) *
sizeof(INTERNAL_PRECISION));
for (m = 0; m < d -> n; m++) d -> gamma[m] = ONE;
d -> A = A;
d -> Delta = ONE - M;
construct_partfrac(d);
construct_contfrac(d);
/* Converting everything to PRECISION for external use only */
zd = (zolotarev_data*) malloc(sizeof(zolotarev_data));
zd -> A = (PRECISION) d -> A;
zd -> Delta = (PRECISION) d -> Delta;
zd -> epsilon = (PRECISION) d -> epsilon;
zd -> n = d -> n;
zd -> type = d -> type;
zd -> dn = d -> dn;
zd -> dd = d -> dd;
zd -> da = d -> da;
zd -> db = d -> db;
zd -> deg_num = d -> deg_num;
zd -> deg_denom = d -> deg_denom;
zd -> a = (PRECISION*) malloc(zd -> dn * sizeof(PRECISION));
for (m = 0; m < zd -> dn; m++) zd -> a[m] = (PRECISION) d -> a[m];
free(d -> a);
zd -> ap = (PRECISION*) malloc(zd -> dd * sizeof(PRECISION));
for (m = 0; m < zd -> dd; m++) zd -> ap[m] = (PRECISION) d -> ap[m];
free(d -> ap);
zd -> alpha = (PRECISION*) malloc(zd -> da * sizeof(PRECISION));
for (m = 0; m < zd -> da; m++) zd -> alpha[m] = (PRECISION) d -> alpha[m];
free(d -> alpha);
zd -> beta = (PRECISION*) malloc(zd -> db * sizeof(PRECISION));
for (m = 0; m < zd -> db; m++) zd -> beta[m] = (PRECISION) d -> beta[m];
free(d -> beta);
zd -> gamma = (PRECISION*) malloc(zd -> n * sizeof(PRECISION));
for (m = 0; m < zd -> n; m++) zd -> gamma[m] = (PRECISION) d -> gamma[m];
free(d -> gamma);
free(d);
return zd;
}
}}
#ifdef TEST
#undef ZERO
#define ZERO ((PRECISION) 0)
#undef ONE
#define ONE ((PRECISION) 1)
#undef TWO
#define TWO ((PRECISION) 2)
/* Evaluate the rational approximation R(x) using the factored form */
static PRECISION zolotarev_eval(PRECISION x, zolotarev_data* rdata) {
int m;
PRECISION R;
if (rdata -> type == 0) {
R = rdata -> A * x;
for (m = 0; m < rdata -> deg_denom/2; m++)
R *= (2*(m+1) > rdata -> deg_num ? ONE : x*x - rdata -> a[m]) /
(x*x - rdata -> ap[m]);
} else {
R = rdata -> A / x;
for (m = 0; m < rdata -> deg_num/2; m++)
R *= (x*x - rdata -> a[m]) /
(2*(m+1) > rdata -> deg_denom ? ONE : x*x - rdata -> ap[m]);
}
return R;
}
/* Evaluate the rational approximation R(x) using the partial fraction form */
static PRECISION zolotarev_partfrac_eval(PRECISION x, zolotarev_data* rdata) {
int m;
PRECISION R = rdata -> alpha[rdata -> da - 1];
for (m = 0; m < rdata -> dd; m++)
R += rdata -> alpha[m] / (x * x - rdata -> ap[m]);
if (rdata -> type == 1) R += rdata -> alpha[rdata -> dd] / (x * x);
return R * x;
}
/* Evaluate the rational approximation R(x) using continued fraction form.
*
* If x = 0 and type = 1 then the result should be INF, whereas if x = 0 and
* type = 0 then the result should be 0, but division by zero will occur at
* intermediate stages of the evaluation. For IEEE implementations in which
* division by zero does not trap this works correctly, since
* 1/(1/0) = 1/INF = 0, but if floating-point exceptions trap you will get an
* error instead. */
static PRECISION zolotarev_contfrac_eval(PRECISION x, zolotarev_data* rdata) {
int m;
PRECISION R = rdata -> beta[0] * x;
for (m = 1; m < rdata -> db; m++) R = rdata -> beta[m] * x + ONE / R;
return R;
}
/* Evaluate the rational approximation R(x) using Cayley form */
static PRECISION zolotarev_cayley_eval(PRECISION x, zolotarev_data* rdata) {
int m;
PRECISION T;
T = rdata -> type == 0 ? ONE : -ONE;
for (m = 0; m < rdata -> n; m++)
T *= (rdata -> gamma[m] - x) / (rdata -> gamma[m] + x);
return (ONE - T) / (ONE + T);
}
/* Test program. Apart from printing out the parameters for R(x) it produces
* the following data files for plotting (unless NPLOT is defined):
*
* zolotarev-fn is a plot of R(x) for |x| < 1.2. This should look like sgn(x).
*
* zolotarev-err is a plot of the signed error (R(x) - sgn(x)) scaled by
* 1/Delta. This should oscillate deg_num + deg_denom + 2 times between +1 and
* -1 over the domain epsilon <= |x| <= 1.
*
* If ALLPLOTS is defined then zolotarev-partfrac, zolotarev-contfrac and
* zolotarev-cayley are plots of the difference between the values of R(x)
* computed using the factored form and, respectively, the partial fraction,
* continued fraction and Cayley forms, scaled by 1/Delta. They should be zero
* everywhere. */
int main(int argc, char** argv) {
int m, n, plotpts = 5000, type = 0;
float eps, x, ypferr, ycferr, ycaylerr, maxypferr, maxycferr, maxycaylerr;
zolotarev_data *rdata;
PRECISION y;
FILE *plot_function, *plot_error,
*plot_partfrac, *plot_contfrac, *plot_cayley;
if (argc < 3 || argc > 4) {
fprintf(stderr, "Usage: %s epsilon n [type]\n", *argv);
exit(EXIT_FAILURE);
}
sscanf(argv[1], "%g", &eps); /* First argument is epsilon */
sscanf(argv[2], "%d", &n); /* Second argument is n */
if (argc == 4) sscanf(argv[3], "%d", &type); /* Third argument is type */
if (type < 0 || type > 2) {
fprintf(stderr, "%s: type must be 0 (Zolotarev R(0) = 0),\n"
"\t\t1 (Zolotarev R(0) = Inf), or 2 (Higham)\n", *argv);
exit(EXIT_FAILURE);
}
rdata = type == 2
? higham((PRECISION) eps, n)
: zolotarev((PRECISION) eps, n, type);
printf("Zolotarev Test: R(epsilon = %g, n = %d, type = %d)\n\t"
STRINGIFY(VERSION) "\n\t" STRINGIFY(HVERSION)
"\n\tINTERNAL_PRECISION = " STRINGIFY(INTERNAL_PRECISION)
"\tPRECISION = " STRINGIFY(PRECISION)
"\n\n\tRational approximation of degree (%d,%d), %s at x = 0\n"
"\tDelta = %g (maximum error)\n\n"
"\tA = %g (overall factor)\n",
(float) rdata -> epsilon, rdata -> n, type,
rdata -> deg_num, rdata -> deg_denom,
rdata -> type == 1 ? "infinite" : "zero",
(float) rdata -> Delta, (float) rdata -> A);
for (m = 0; m < MIN(rdata -> dd, rdata -> dn); m++)
printf("\ta[%2d] = %14.8g\t\ta'[%2d] = %14.8g\n",
m + 1, (float) rdata -> a[m], m + 1, (float) rdata -> ap[m]);
if (rdata -> dd > rdata -> dn)
printf("\t\t\t\t\ta'[%2d] = %14.8g\n",
rdata -> dn + 1, (float) rdata -> ap[rdata -> dn]);
if (rdata -> dd < rdata -> dn)
printf("\ta[%2d] = %14.8g\n",
rdata -> dd + 1, (float) rdata -> a[rdata -> dd]);
printf("\n\tPartial fraction coefficients\n");
printf("\talpha[ 0] = %14.8g\n",
(float) rdata -> alpha[rdata -> da - 1]);
for (m = 0; m < rdata -> dd; m++)
printf("\talpha[%2d] = %14.8g\ta'[%2d] = %14.8g\n",
m + 1, (float) rdata -> alpha[m], m + 1, (float) rdata -> ap[m]);
if (rdata -> type == 1)
printf("\talpha[%2d] = %14.8g\ta'[%2d] = %14.8g\n",
rdata -> dd + 1, (float) rdata -> alpha[rdata -> dd],
rdata -> dd + 1, (float) ZERO);
printf("\n\tContinued fraction coefficients\n");
for (m = 0; m < rdata -> db; m++)
printf("\tbeta[%2d] = %14.8g\n", m, (float) rdata -> beta[m]);
printf("\n\tCayley transform coefficients\n");
for (m = 0; m < rdata -> n; m++)
printf("\tgamma[%2d] = %14.8g\n", m, (float) rdata -> gamma[m]);
#ifndef NPLOT
plot_function = fopen("zolotarev-fn.dat", "w");
plot_error = fopen("zolotarev-err.dat", "w");
#ifdef ALLPLOTS
plot_partfrac = fopen("zolotarev-partfrac.dat", "w");
plot_contfrac = fopen("zolotarev-contfrac.dat", "w");
plot_cayley = fopen("zolotarev-cayley.dat", "w");
#endif /* ALLPLOTS */
for (m = 0, maxypferr = maxycferr = maxycaylerr = 0.0; m <= plotpts; m++) {
x = 2.4 * (float) m / plotpts - 1.2;
if (rdata -> type == 0 || fabs(x) * (float) plotpts > 1.0) {
/* skip x = 0 for type 1, as R(0) is singular */
y = zolotarev_eval((PRECISION) x, rdata);
fprintf(plot_function, "%g %g\n", x, (float) y);
fprintf(plot_error, "%g %g\n",
x, (float)((y - ((x > 0.0 ? ONE : -ONE))) / rdata -> Delta));
ypferr = (float)((zolotarev_partfrac_eval((PRECISION) x, rdata) - y)
/ rdata -> Delta);
ycferr = (float)((zolotarev_contfrac_eval((PRECISION) x, rdata) - y)
/ rdata -> Delta);
ycaylerr = (float)((zolotarev_cayley_eval((PRECISION) x, rdata) - y)
/ rdata -> Delta);
if (fabs(x) < 1.0 && fabs(x) > rdata -> epsilon) {
maxypferr = MAX(maxypferr, fabs(ypferr));
maxycferr = MAX(maxycferr, fabs(ycferr));
maxycaylerr = MAX(maxycaylerr, fabs(ycaylerr));
}
#ifdef ALLPLOTS
fprintf(plot_partfrac, "%g %g\n", x, ypferr);
fprintf(plot_contfrac, "%g %g\n", x, ycferr);
fprintf(plot_cayley, "%g %g\n", x, ycaylerr);
#endif /* ALLPLOTS */
}
}
#ifdef ALLPLOTS
fclose(plot_cayley);
fclose(plot_contfrac);
fclose(plot_partfrac);
#endif /* ALLPLOTS */
fclose(plot_error);
fclose(plot_function);
printf("\n\tMaximum PF error = %g (relative to Delta)\n", maxypferr);
printf("\tMaximum CF error = %g (relative to Delta)\n", maxycferr);
printf("\tMaximum Cayley error = %g (relative to Delta)\n", maxycaylerr);
#endif /* NPLOT */
zolotarev_free(rdata);
return EXIT_SUCCESS;
}
#endif /* TEST */
@@ -0,0 +1,87 @@
/* -*- Mode: C; comment-column: 22; fill-column: 79; -*- */
#ifdef __cplusplus
namespace Grid {
namespace Approx {
#endif
#define HVERSION Header Time-stamp: <14-OCT-2004 09:26:51.00 adk@MISSCONTRARY>
#ifndef ZOLOTAREV_INTERNAL
#ifndef PRECISION
#define PRECISION double
#endif
#define ZPRECISION PRECISION
#define ZOLOTAREV_DATA zolotarev_data
#endif
/* This struct contains the coefficients which parameterise an optimal rational
* approximation to the signum function.
*
* The parameterisations used are:
*
* Factored form for type 0 (R0(0) = 0)
*
* R0(x) = A * x * prod(x^2 - a[j], j = 0 .. dn-1) / prod(x^2 - ap[j], j = 0
* .. dd-1),
*
* where deg_num = 2*dn + 1 and deg_denom = 2*dd.
*
* Factored form for type 1 (R1(0) = infinity)
*
* R1(x) = (A / x) * prod(x^2 - a[j], j = 0 .. dn-1) / prod(x^2 - ap[j], j = 0
* .. dd-1),
*
* where deg_num = 2*dn and deg_denom = 2*dd + 1.
*
* Partial fraction form
*
* R(x) = alpha[da-1] * x + sum(alpha[j] * x / (x^2 - ap[j]), j = 0 .. da-2)
*
* where da = dd + 1 for type 0 and da = dd + 2 with ap[dd] = 0 for type 1.
*
* Continued fraction form
*
* R(x) = beta[db-1] * x + 1 / (beta[db-2] * x + 1 / (beta[db-3] * x + ...))
*
* with the final coefficient being beta[0], with db = 2 * dd + 1 for type 0
* and db = 2 * dd + 2 for type 1.
*
* Cayley form (Chiu's domain wall formulation)
*
* R(x) = (1 - T(x)) / (1 + T(x))
*
* where T(x) = prod((x - gamma[j]) / (x + gamma[j]), j = 0 .. n-1)
*/
typedef struct {
ZPRECISION *a, /* zeros of numerator, a[0 .. dn-1] */
*ap, /* poles (zeros of denominator), ap[0 .. dd-1] */
A, /* overall factor */
*alpha, /* coefficients of partial fraction, alpha[0 .. da-1] */
*beta, /* coefficients of continued fraction, beta[0 .. db-1] */
*gamma, /* zeros of numerator of T in Cayley form */
Delta, /* maximum error, |R(x) - sgn(x)| <= Delta */
epsilon; /* minimum x value, epsilon < |x| < 1 */
int n, /* approximation degree */
type, /* 0: R(0) = 0, 1: R(0) = infinity */
dn, dd, da, db, /* number of elements of a, ap, alpha, and beta */
deg_num, /* degree of numerator = deg_denom +/- 1 */
deg_denom; /* degree of denominator */
} ZOLOTAREV_DATA;
#ifndef ZOLOTAREV_INTERNAL
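/* zolotarev(epsilon, n, type) returns a pointer to an initialised
* zolotarev_data structure. The arguments must satisfy the constraints that
* epsilon > 0, n > 0, and type = 0 or 1. higham(epsilon, n) returns the
* corresponding type 0 structure for Higham's approximation. Structures
* returned by either function should be released with zolotarev_free(). */
ZOLOTAREV_DATA* higham(PRECISION epsilon, int n);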
ZOLOTAREV_DATA* zolotarev(PRECISION epsilon, int n, int type);
void zolotarev_free(zolotarev_data *zdata);
#endif
#ifdef __cplusplus
}}
#endif
@@ -0,0 +1,206 @@
/*
Mike Clark - 25th May 2005
bigfloat.h
Simple C++ wrapper for multiprecision datatype used by AlgRemez
algorithm
*/
#ifndef INCLUDED_BIGFLOAT_H
#define INCLUDED_BIGFLOAT_H
#include <gmp.h>
#include <mpfr.h>
#include <mpf2mpfr.h>
class bigfloat {
private:
mpf_t x;
public:
bigfloat() { mpf_init(x); }
bigfloat(const bigfloat& y) { mpf_init_set(x, y.x); }
bigfloat(const unsigned long u) { mpf_init_set_ui(x, u); }
bigfloat(const long i) { mpf_init_set_si(x, i); }
bigfloat(const int i) {mpf_init_set_si(x,(long)i);}
bigfloat(const float d) { mpf_init_set_d(x, (double)d); }
bigfloat(const double d) { mpf_init_set_d(x, d); }
bigfloat(const char *str) { mpf_init_set_str(x, (char*)str, 10); }
~bigfloat(void) { mpf_clear(x); }
operator double (void) const { return (double)mpf_get_d(x); }
/* dprec is in decimal digits; 3.321928094 ~ log2(10) converts it to bits */
static void setDefaultPrecision(unsigned long dprec) {
unsigned long bprec = (unsigned long)(3.321928094 * (double)dprec);
mpf_set_default_prec(bprec);
}
void setPrecision(unsigned long dprec) {
unsigned long bprec = (unsigned long)(3.321928094 * (double)dprec);
mpf_set_prec(x,bprec);
}
unsigned long getPrecision(void) const { return mpf_get_prec(x); }
unsigned long getDefaultPrecision(void) const { return mpf_get_default_prec(); }
bigfloat& operator=(const bigfloat& y) {
mpf_set(x, y.x);
return *this;
}
bigfloat& operator=(const unsigned long y) {
mpf_set_ui(x, y);
return *this;
}
bigfloat& operator=(const signed long y) {
mpf_set_si(x, y);
return *this;
}
bigfloat& operator=(const float y) {
mpf_set_d(x, (double)y);
return *this;
}
bigfloat& operator=(const double y) {
mpf_set_d(x, y);
return *this;
}
size_t write(void);
size_t read(void);
/* Arithmetic Functions */
bigfloat& operator+=(const bigfloat& y) { return *this = *this + y; }
bigfloat& operator-=(const bigfloat& y) { return *this = *this - y; }
bigfloat& operator*=(const bigfloat& y) { return *this = *this * y; }
bigfloat& operator/=(const bigfloat& y) { return *this = *this / y; }
friend bigfloat operator+(const bigfloat& x, const bigfloat& y) {
bigfloat a;
mpf_add(a.x,x.x,y.x);
return a;
}
friend bigfloat operator+(const bigfloat& x, const unsigned long y) {
bigfloat a;
mpf_add_ui(a.x,x.x,y);
return a;
}
friend bigfloat operator-(const bigfloat& x, const bigfloat& y) {
bigfloat a;
mpf_sub(a.x,x.x,y.x);
return a;
}
friend bigfloat operator-(const unsigned long x, const bigfloat& y) {
bigfloat a;
mpf_ui_sub(a.x,x,y.x);
return a;
}
friend bigfloat operator-(const bigfloat& x, const unsigned long y) {
bigfloat a;
mpf_sub_ui(a.x,x.x,y);
return a;
}
friend bigfloat operator-(const bigfloat& x) {
bigfloat a;
mpf_neg(a.x,x.x);
return a;
}
friend bigfloat operator*(const bigfloat& x, const bigfloat& y) {
bigfloat a;
mpf_mul(a.x,x.x,y.x);
return a;
}
friend bigfloat operator*(const bigfloat& x, const unsigned long y) {
bigfloat a;
mpf_mul_ui(a.x,x.x,y);
return a;
}
friend bigfloat operator/(const bigfloat& x, const bigfloat& y){
bigfloat a;
mpf_div(a.x,x.x,y.x);
return a;
}
friend bigfloat operator/(const unsigned long x, const bigfloat& y){
bigfloat a;
mpf_ui_div(a.x,x,y.x);
return a;
}
friend bigfloat operator/(const bigfloat& x, const unsigned long y){
bigfloat a;
mpf_div_ui(a.x,x.x,y);
return a;
}
friend bigfloat sqrt_bf(const bigfloat& x){
bigfloat a;
mpf_sqrt(a.x,x.x);
return a;
}
friend bigfloat sqrt_bf(const unsigned long x){
bigfloat a;
mpf_sqrt_ui(a.x,x);
return a;
}
friend bigfloat abs_bf(const bigfloat& x){
bigfloat a;
mpf_abs(a.x,x.x);
return a;
}
friend bigfloat pow_bf(const bigfloat& a, long power) {
bigfloat b;
mpf_pow_ui(b.x,a.x,power);
return b;
}
friend bigfloat pow_bf(const bigfloat& a, bigfloat &power) {
bigfloat b;
mpfr_pow(b.x,a.x,power.x,GMP_RNDN);
return b;
}
friend bigfloat exp_bf(const bigfloat& a) {
bigfloat b;
mpfr_exp(b.x,a.x,GMP_RNDN);
return b;
}
/* Comparison Functions */
friend int operator>(const bigfloat& x, const bigfloat& y) {
int test;
test = mpf_cmp(x.x,y.x);
if (test > 0) return 1;
else return 0;
}
friend int operator<(const bigfloat& x, const bigfloat& y) {
int test;
test = mpf_cmp(x.x,y.x);
if (test < 0) return 1;
else return 0;
}
friend int sgn(const bigfloat&);
};
#endif
@@ -0,0 +1,189 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/approx/bigfloat_double.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <math.h>
#include <string> // std::stod and std::string, used by bigfloat(const char*)
typedef double mfloat;
class bigfloat {
private:
mfloat x;
public:
bigfloat() : x(0) { } // zero-initialise to avoid reading an indeterminate value
bigfloat(const bigfloat& y) { x=y.x; }
bigfloat(const unsigned long u) { x=u; }
bigfloat(const long i) { x=i; }
bigfloat(const int i) { x=i;}
bigfloat(const float d) { x=d;}
bigfloat(const double d) { x=d;}
bigfloat(const char *str) { x=std::stod(std::string(str));}
~bigfloat(void) { }
operator double (void) const { return (double)x; }
static void setDefaultPrecision(unsigned long dprec) {
}
void setPrecision(unsigned long dprec) {
}
unsigned long getPrecision(void) const { return 64; }
unsigned long getDefaultPrecision(void) const { return 64; }
bigfloat& operator=(const bigfloat& y) { x=y.x; return *this; }
bigfloat& operator=(const unsigned long y) { x=y; return *this; }
bigfloat& operator=(const signed long y) { x=y; return *this; }
bigfloat& operator=(const float y) { x=y; return *this; }
bigfloat& operator=(const double y) { x=y; return *this; }
size_t write(void);
size_t read(void);
/* Arithmetic Functions */
bigfloat& operator+=(const bigfloat& y) { return *this = *this + y; }
bigfloat& operator-=(const bigfloat& y) { return *this = *this - y; }
bigfloat& operator*=(const bigfloat& y) { return *this = *this * y; }
bigfloat& operator/=(const bigfloat& y) { return *this = *this / y; }
friend bigfloat operator+(const bigfloat& x, const bigfloat& y) {
bigfloat a;
a.x=x.x+y.x;
return a;
}
friend bigfloat operator+(const bigfloat& x, const unsigned long y) {
bigfloat a;
a.x=x.x+y;
return a;
}
friend bigfloat operator-(const bigfloat& x, const bigfloat& y) {
bigfloat a;
a.x=x.x-y.x;
return a;
}
friend bigfloat operator-(const unsigned long x, const bigfloat& y) {
bigfloat bx(x);
return bx-y;
}
friend bigfloat operator-(const bigfloat& x, const unsigned long y) {
bigfloat by(y);
return x-by;
}
friend bigfloat operator-(const bigfloat& x) {
bigfloat a;
a.x=-x.x;
return a;
}
friend bigfloat operator*(const bigfloat& x, const bigfloat& y) {
bigfloat a;
a.x=x.x*y.x;
return a;
}
friend bigfloat operator*(const bigfloat& x, const unsigned long y) {
bigfloat a;
a.x=x.x*y;
return a;
}
friend bigfloat operator/(const bigfloat& x, const bigfloat& y){
bigfloat a;
a.x=x.x/y.x;
return a;
}
friend bigfloat operator/(const unsigned long x, const bigfloat& y){
bigfloat bx(x);
return bx/y;
}
friend bigfloat operator/(const bigfloat& x, const unsigned long y){
bigfloat by(y);
return x/by;
}
friend bigfloat sqrt_bf(const bigfloat& x){
bigfloat a;
a.x= sqrt(x.x);
return a;
}
friend bigfloat sqrt_bf(const unsigned long x){
bigfloat a(x);
return sqrt_bf(a);
}
friend bigfloat abs_bf(const bigfloat& x){
bigfloat a;
a.x=fabs(x.x);
return a;
}
friend bigfloat pow_bf(const bigfloat& a, long power) {
bigfloat b;
b.x=pow(a.x,power);
return b;
}
friend bigfloat pow_bf(const bigfloat& a, bigfloat &power) {
bigfloat b;
b.x=pow(a.x,power.x);
return b;
}
friend bigfloat exp_bf(const bigfloat& a) {
bigfloat b;
b.x=exp(a.x);
return b;
}
/* Comparison Functions */
friend int operator>(const bigfloat& x, const bigfloat& y) {
return x.x>y.x;
}
friend int operator<(const bigfloat& x, const bigfloat& y) {
return x.x<y.x;
}
friend int sgn(const bigfloat& x) {
if ( x.x>=0 ) return 1;
else return 0;
}
/* Miscellaneous Functions */
// friend bigfloat& random(void);
};
@@ -0,0 +1,397 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/AdefGeneric.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALGORITHMS_ITERATIVE_GENERIC_PCG
#define GRID_ALGORITHMS_ITERATIVE_GENERIC_PCG
/*
* Compared to Tang-2009: P=Pleft. P^T = PRight Q=MssInv.
* Script A = SolverMatrix
* Script P = Preconditioner
*
* Deflation methods considered
* -- Solve P A x = P b [ like Luscher ]
* DEF-1 M P A x = M P b [i.e. left precon]
* DEF-2 P^T M A x = P^T M b
* ADEF-1 Preconditioner = M P + Q [ Q + M + M A Q]
* ADEF-2 Preconditioner = P^T M + Q
* BNN Preconditioner = P^T M P + Q
* BNN2 Preconditioner = M P + P^TM +Q - M P A M
*
* Implement ADEF-2
*
* Vstart = P^Tx + Qb
* M1 = P^TM + Q
* M2=M3=1
* Vout = x
*/
// abstract base
template<class Field, class CoarseField>
class TwoLevelFlexiblePcg : public LinearFunction<Field>
{
public:
int verbose;
RealD Tolerance;
Integer MaxIterations;
const int mmax = 5;
GridBase *grid;
GridBase *coarsegrid;
LinearOperatorBase<Field> *_Linop;
LinearOperatorBase<Field> *_SmootherLinop;
OperatorFunction<Field> *_Smoother;
LinearFunction<CoarseField> *_CoarseSolver;
// Need something that knows how to get from coarse to fine and back again
// for most operator functions
TwoLevelFlexiblePcg(RealD tol,
Integer maxit,
LinearOperatorBase<Field> *Linop,
LinearOperatorBase<Field> *SmootherLinop,
OperatorFunction<Field> *Smoother,
LinearFunction<CoarseField> *CoarseSolver
) :
Tolerance(tol),
MaxIterations(maxit),
_Linop(Linop),
_SmootherLinop(SmootherLinop),
_Smoother(Smoother),
_CoarseSolver(CoarseSolver)
{
verbose=0;
};
// The Pcg routine is common to all, but the various matrices differ from derived
// implementation to derived implementation
void operator() (const Field &src, Field &psi){
psi.checkerboard = src.checkerboard;
grid = src._grid;
RealD f;
RealD rtzp,rtz,a,d,b;
RealD rptzp;
RealD tn;
RealD guess = norm2(psi);
RealD ssq = norm2(src);
RealD rsq = ssq*Tolerance*Tolerance;
/////////////////////////////
// Set up history vectors
/////////////////////////////
std::vector<Field> p (mmax,grid);
std::vector<Field> mmp(mmax,grid);
std::vector<RealD> pAp(mmax);
Field x (grid); x = psi;
Field z (grid);
Field tmp(grid);
Field r (grid);
Field mu (grid);
//////////////////////////
// x0 = Vstart -- possibly modify guess
//////////////////////////
x=src;
Vstart(x,src);
// r0 = b -A x0
_Linop->HermOp(x,mmp[0]); // Shouldn't this be something else?
axpy (r, -1.0,mmp[0], src); // Recomputes r=src-Ax0
//////////////////////////////////
// Compute z = M1 r
//////////////////////////////////
M1(r,z);
rtzp =real(innerProduct(r,z));
///////////////////////////////////////
// Solve for Mss mu = P A z and set p = z-mu
// Def2: p = 1 - Q Az = Pright z
// Other algos M2 is trivial
///////////////////////////////////////
M2(z,p[0]);
for (int k=0;k<=MaxIterations;k++){
int peri_k = k % mmax;
int peri_kp = (k+1) % mmax;
rtz=rtzp;
d = M3(p[peri_k],mmp[peri_k]);
a = rtz/d;
// Memorise this
pAp[peri_k] = d;
axpy(x,a,p[peri_k],x);
RealD rn = axpy_norm(r,-a,mmp[peri_k],r);
// Compute z = M1 r
M1(r,z);
rtzp =real(innerProduct(r,z));
M2(z,mu); // ADEF-2 this is identity. Axpy possible to eliminate
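p[peri_kp]=mu; // new direction from the preconditioned residual, A-orthogonalised below
// Standard CG would use p -> z + b p with b = rtzp/rtz; the flexible variant
// below A-orthogonalises against earlier directions instead
b = (rtzp)/rtz;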
int northog;
// northog = (peri_kp==0)?1:peri_kp; // This is the fCG(mmax) algorithm
northog = (k>mmax-1)?(mmax-1):k; // This is the fCG-Tr(mmax-1) algorithm
for(int back=0; back < northog; back++){
int peri_back = (k-back)%mmax;
RealD pbApk= real(innerProduct(mmp[peri_back],p[peri_kp]));
RealD beta = -pbApk/pAp[peri_back];
axpy(p[peri_kp],beta,p[peri_back],p[peri_kp]);
}
RealD rrn=sqrt(rn/ssq);
std::cout<<GridLogMessage<<"TwoLevelfPcg: k= "<<k<<" residual = "<<rrn<<std::endl;
// Stopping condition
if ( rn <= rsq ) {
_Linop->HermOp(x,mmp[0]); // Shouldn't this be something else?
axpy(tmp,-1.0,src,mmp[0]);
RealD psinorm = sqrt(norm2(x));
RealD srcnorm = sqrt(norm2(src));
RealD tmpnorm = sqrt(norm2(tmp));
RealD true_residual = tmpnorm/srcnorm;
std::cout<<GridLogMessage<<"TwoLevelfPcg: true residual is "<<true_residual<<std::endl;
std::cout<<GridLogMessage<<"TwoLevelfPcg: target residual was "<<Tolerance<<std::endl;
return;
}
}
// Non-convergence
assert(0);
}
public:
virtual void M(Field & in,Field & out,Field & tmp) {
}
virtual void M1(Field & in, Field & out) {// the smoother
// [PTM+Q] in = [1 - Q A] M in + Q in = Min + Q [ in -A Min]
Field tmp(grid);
Field Min(grid);
PcgM(in,Min); // Smoother call
HermOp(Min,out);
axpy(tmp,-1.0,out,in); // tmp = in - A Min
ProjectToSubspace(tmp,PleftProj);
ApplyInverse(PleftProj,PleftMss_proj); // Ass^{-1} [in - A Min]_s
PromoteFromSubspace(PleftMss_proj,tmp);// tmp = Q[in - A Min]
axpy(out,1.0,Min,tmp); // Min+tmp
}
virtual void M2(const Field & in, Field & out) {
out=in;
// Must override for Def2 only
// case PcgDef2:
// Pright(in,out);
// break;
}
virtual RealD M3(const Field & p, Field & mmp){
RealD d,dd;
_Linop->HermOpAndNorm(p,mmp,d,dd); // d = <p, A p>, dd = |A p|^2
return d;
// Must override for Def1 only
// case PcgDef1:
// d=linop_d->Mprec(p,mmp,tmp,0,1);// Dag no
// linop_d->Mprec(mmp,mp,tmp,1);// Dag yes
// Pleft(mp,mmp);
// d=real(linop_d->inner(p,mmp));
}
virtual void VstartDef2(Field & x, const Field & src){
//case PcgDef2:
//case PcgAdef2:
//case PcgAdef2f:
//case PcgV11f:
///////////////////////////////////
// Choose x_0 such that
// x_0 = guess + (A_ss^inv) r_s = guess + Ass_inv [src -Aguess]
// = [1 - Ass_inv A] Guess + Assinv src
// = P^T guess + Assinv src
// = Vstart [Tang notation]
// This gives:
// W^T (src - A x_0) = src_s - A guess_s - r_s
// = src_s - (A guess)_s - src_s + (A guess)_s
// = 0
///////////////////////////////////
Field r(grid);
Field mmp(grid);
HermOp(x,mmp);
axpy (r, -1.0, mmp, src); // r_{-1} = src - A x
ProjectToSubspace(r,PleftProj);
ApplyInverseCG(PleftProj,PleftMss_proj); // Ass^{-1} r_s
PromoteFromSubspace(PleftMss_proj,mmp);
x=x+mmp;
}
virtual void Vstart(Field & x,const Field & src){
return;
}
/////////////////////////////////////////////////////////////////////
// Only Def1 has non-trivial Vout. Override in Def1
/////////////////////////////////////////////////////////////////////
virtual void Vout (Field & in, Field & out,Field & src){
out = in;
//case PcgDef1:
// //Qb + PT x
// ProjectToSubspace(src,PleftProj);
// ApplyInverse(PleftProj,PleftMss_proj); // Ass^{-1} r_s
// PromoteFromSubspace(PleftMss_proj,tmp);
//
// Pright(in,out);
//
// linop_d->axpy(out,tmp,out,1.0);
// break;
}
////////////////////////////////////////////////////////////////////////////////////////////////
// Pright and Pleft are common to all implementations
////////////////////////////////////////////////////////////////////////////////////////////////
virtual void Pright(Field & in,Field & out){
// P_R = [ 1 0 ]
// [ -Mss^-1 Msb 0 ]
Field in_sbar(grid);
ProjectToSubspace(in,PleftProj);
PromoteFromSubspace(PleftProj,out);
axpy(in_sbar,-1.0,out,in); // in_sbar = in - in_s
HermOp(in_sbar,out);
ProjectToSubspace(out,PleftProj); // Mssbar in_sbar (project)
ApplyInverse (PleftProj,PleftMss_proj); // Mss^{-1} Mssbar
PromoteFromSubspace(PleftMss_proj,out); //
axpy(out,-1.0,out,in_sbar); // in_sbar - Mss^{-1} Mssbar in_sbar
}
virtual void Pleft (Field & in,Field & out){
// P_L = [ 1 -Mbs Mss^-1]
// [ 0 0 ]
Field in_sbar(grid);
Field tmp2(grid);
Field Mtmp(grid);
ProjectToSubspace(in,PleftProj);
PromoteFromSubspace(PleftProj,out);
axpy(in_sbar,-1.0,out,in); // in_sbar = in - in_s
ApplyInverse(PleftProj,PleftMss_proj); // Mss^{-1} in_s
PromoteFromSubspace(PleftMss_proj,out);
HermOp(out,Mtmp);
ProjectToSubspace(Mtmp,PleftProj); // Msbar s Mss^{-1}
PromoteFromSubspace(PleftProj,tmp2);
axpy(out,-1.0,tmp2,Mtmp);
axpy(out,-1.0,out,in_sbar); // in_sbar - Msbars Mss^{-1} in_s
}
};
template<class Field, class CoarseField>
class TwoLevelFlexiblePcgADef2 : public TwoLevelFlexiblePcg<Field,CoarseField> {
public:
virtual void M(Field & in,Field & out,Field & tmp){
}
virtual void M1(Field & in, Field & out,Field & tmp,Field & mp){
}
virtual void M2(Field & in, Field & out){
}
virtual RealD M3(Field & p, Field & mp,Field & mmp, Field & tmp){
return 0.0; // stub: not yet implemented
}
virtual void Vstart(Field & in, Field & src, Field & r, Field & mp, Field & mmp, Field & tmp){
}
};
/*
template<class Field>
class TwoLevelFlexiblePcgAD : public TwoLevelFlexiblePcg<Field> {
public:
virtual void M(Field & in,Field & out,Field & tmp);
virtual void M1(Field & in, Field & out,Field & tmp,Field & mp);
virtual void M2(Field & in, Field & out);
virtual RealD M3(Field & p, Field & mp,Field & mmp, Field & tmp);
virtual void Vstart(Field & in, Field & src, Field & r, Field & mp, Field & mmp, Field & tmp);
}
template<class Field>
class TwoLevelFlexiblePcgDef1 : public TwoLevelFlexiblePcg<Field> {
public:
virtual void M(Field & in,Field & out,Field & tmp);
virtual void M1(Field & in, Field & out,Field & tmp,Field & mp);
virtual void M2(Field & in, Field & out);
virtual RealD M3(Field & p, Field & mp,Field & mmp, Field & tmp);
virtual void Vstart(Field & in, Field & src, Field & r, Field & mp, Field & mmp, Field & tmp);
virtual void Vout (Field & in, Field & out,Field & src,Field & tmp);
}
template<class Field>
class TwoLevelFlexiblePcgDef2 : public TwoLevelFlexiblePcg<Field> {
public:
virtual void M(Field & in,Field & out,Field & tmp);
virtual void M1(Field & in, Field & out,Field & tmp,Field & mp);
virtual void M2(Field & in, Field & out);
virtual RealD M3(Field & p, Field & mp,Field & mmp, Field & tmp);
virtual void Vstart(Field & in, Field & src, Field & r, Field & mp, Field & mmp, Field & tmp);
}
template<class Field>
class TwoLevelFlexiblePcgV11: public TwoLevelFlexiblePcg<Field> {
public:
virtual void M(Field & in,Field & out,Field & tmp);
virtual void M1(Field & in, Field & out,Field & tmp,Field & mp);
virtual void M2(Field & in, Field & out);
virtual RealD M3(Field & p, Field & mp,Field & mmp, Field & tmp);
virtual void Vstart(Field & in, Field & src, Field & r, Field & mp, Field & mmp, Field & tmp);
}
*/
#endif
@@ -0,0 +1,606 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/BlockConjugateGradient.h
Copyright (C) 2017
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_BLOCK_CONJUGATE_GRADIENT_H
#define GRID_BLOCK_CONJUGATE_GRADIENT_H
namespace Grid {
enum BlockCGtype { BlockCG, BlockCGrQ, CGmultiRHS };
//////////////////////////////////////////////////////////////////////////
// Block conjugate gradient. The block (right-hand-side) index runs along lattice dimension blockDim, conventionally dimension zero.
//////////////////////////////////////////////////////////////////////////
template <class Field>
class BlockConjugateGradient : public OperatorFunction<Field> {
public:
typedef typename Field::scalar_type scomplex;
int blockDim ;
int Nblock;
BlockCGtype CGtype;
bool ErrorOnNoConverge; // abort via assert when the CG fails to converge.
// Defaults to true.
RealD Tolerance;
Integer MaxIterations;
Integer IterationsToComplete; //Number of iterations the CG took to finish. Filled in upon completion
BlockConjugateGradient(BlockCGtype cgtype,int _Orthog,RealD tol, Integer maxit, bool err_on_no_conv = true)
: Tolerance(tol), CGtype(cgtype), blockDim(_Orthog), MaxIterations(maxit), ErrorOnNoConverge(err_on_no_conv)
{};
////////////////////////////////////////////////////////////////////////////////////////////////////
// Thin QR factorisation of R via the Cholesky decomposition of R^dag R
////////////////////////////////////////////////////////////////////////////////////////////////////
void ThinQRfact (Eigen::MatrixXcd &m_rr,
Eigen::MatrixXcd &C,
Eigen::MatrixXcd &Cinv,
Field & Q,
const Field & R)
{
int Orthog = blockDim; // lattice dimension carrying the block (RHS) index
////////////////////////////////////////////////////////////////////////////////////////////////////
//Dimensions
// R_{ferm x Nblock} = Q_{ferm x Nblock} x C_{Nblock x Nblock} -> ferm x Nblock
//
// Rdag R = m_rr = Herm = L L^dag <-- Cholesky decomposition (LLT routine in Eigen)
//
// Q C = R => Q = R C^{-1}
//
// Want Ident = Q^dag Q = C^{-dag} R^dag R C^{-1} = C^{-dag} L L^dag C^{-1} = 1_{Nblock x Nblock}
//
// Set C = L^{dag}, and then Q^dag Q = ident
//
// Checks:
// Cdag C = Rdag R ; passes.
// QdagQ = 1 ; passes
////////////////////////////////////////////////////////////////////////////////////////////////////
sliceInnerProductMatrix(m_rr,R,R,Orthog);
// Symmetrise m_rr so it is manifestly Hermitian despite rounding errors, as the Cholesky factorisation assumes
m_rr = 0.5*(m_rr+m_rr.adjoint());
#if 0
std::cout << " Calling Cholesky ldlt on m_rr " << m_rr <<std::endl;
Eigen::MatrixXcd L_ldlt = m_rr.ldlt().matrixL();
std::cout << " Called Cholesky ldlt on m_rr " << L_ldlt <<std::endl;
auto D_ldlt = m_rr.ldlt().vectorD();
std::cout << " Called Cholesky ldlt on m_rr " << D_ldlt <<std::endl;
#endif
// std::cout << " Calling Cholesky llt on m_rr " <<std::endl;
Eigen::MatrixXcd L = m_rr.llt().matrixL();
// std::cout << " Called Cholesky llt on m_rr " << L <<std::endl;
C = L.adjoint();
Cinv = C.inverse();
////////////////////////////////////////////////////////////////////////////////////////////////////
// Q = R C^{-1}
//
// Q_j = R_i Cinv(i,j)
//
// NB maddMatrix conventions are Right multiplication X[j] a[j,i] already
////////////////////////////////////////////////////////////////////////////////////////////////////
sliceMulMatrix(Q,Cinv,R,Orthog);
}
////////////////////////////////////////////////////////////////////////////////////////////////////
// Dispatch to the solver selected by CGtype
////////////////////////////////////////////////////////////////////////////////////////////////////
void operator()(LinearOperatorBase<Field> &Linop, const Field &Src, Field &Psi)
{
if ( CGtype == BlockCGrQ ) {
BlockCGrQsolve(Linop,Src,Psi);
} else if (CGtype == BlockCG ) {
BlockCGsolve(Linop,Src,Psi);
} else if (CGtype == CGmultiRHS ) {
CGmultiRHSsolve(Linop,Src,Psi);
} else {
assert(0);
}
}
////////////////////////////////////////////////////////////////////////////
// BlockCGrQ implementation:
//--------------------------
// X is guess/Solution
// B is RHS
// Solve A X_i = B_i ; i refers to Nblock index
////////////////////////////////////////////////////////////////////////////
void BlockCGrQsolve(LinearOperatorBase<Field> &Linop, const Field &B, Field &X)
{
int Orthog = blockDim; // lattice dimension carrying the block (RHS) index
Nblock = B._grid->_fdimensions[Orthog];
std::cout<<GridLogMessage<<" Block Conjugate Gradient : Orthog "<<Orthog<<" Nblock "<<Nblock<<std::endl;
X.checkerboard = B.checkerboard;
conformable(X, B);
Field tmp(B);
Field Q(B);
Field D(B);
Field Z(B);
Field AD(B);
Eigen::MatrixXcd m_DZ = Eigen::MatrixXcd::Identity(Nblock,Nblock);
Eigen::MatrixXcd m_M = Eigen::MatrixXcd::Identity(Nblock,Nblock);
Eigen::MatrixXcd m_rr = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_C = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_Cinv = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_S = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_Sinv = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_tmp = Eigen::MatrixXcd::Identity(Nblock,Nblock);
Eigen::MatrixXcd m_tmp1 = Eigen::MatrixXcd::Identity(Nblock,Nblock);
// Initial residual computation & set up
std::vector<RealD> residuals(Nblock);
std::vector<RealD> ssq(Nblock);
sliceNorm(ssq,B,Orthog);
RealD sssum=0;
for(int b=0;b<Nblock;b++) sssum+=ssq[b];
sliceNorm(residuals,B,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
sliceNorm(residuals,X,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
/************************************************************************
* Block conjugate gradient rQ (Sebastien Birk Thesis, after Dubrulle 2001)
************************************************************************
* Dimensions:
*
* X,B==(Nferm x Nblock)
* A==(Nferm x Nferm)
*
* Nferm = Nspin x Ncolour x Ncomplex x Nlattice_site
*
* QC = R = B-AX, D = Q ; QC is a thin QR factorisation of R
* for k:
* Z = AD
* M = [D^dag Z]^{-1}
* X = X + D MC
* QS = Q - ZM
* D = Q + D S^dag
* C = S C
*/
///////////////////////////////////////
// Initial block: QR-factorise the initial residual; the first search direction is D = Q
///////////////////////////////////////
std::cout << GridLogMessage<<"BlockCGrQ algorithm initialisation " <<std::endl;
//1. QC = R = B-AX, D = Q ; QC is a thin QR factorisation of R
Linop.HermOp(X, AD);
tmp = B - AD;
//std::cout << GridLogMessage << " initial tmp " << norm2(tmp)<< std::endl;
ThinQRfact (m_rr, m_C, m_Cinv, Q, tmp);
//std::cout << GridLogMessage << " initial Q " << norm2(Q)<< std::endl;
//std::cout << GridLogMessage << " m_rr " << m_rr<<std::endl;
//std::cout << GridLogMessage << " m_C " << m_C<<std::endl;
//std::cout << GridLogMessage << " m_Cinv " << m_Cinv<<std::endl;
D=Q;
std::cout << GridLogMessage<<"BlockCGrQ computed initial residual and QR fact " <<std::endl;
///////////////////////////////////////
// Timers
///////////////////////////////////////
GridStopWatch sliceInnerTimer;
GridStopWatch sliceMaddTimer;
GridStopWatch QRTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
int k;
for (k = 1; k <= MaxIterations; k++) {
//3. Z = AD
MatrixTimer.Start();
Linop.HermOp(D, Z);
MatrixTimer.Stop();
//std::cout << GridLogMessage << " norm2 Z " <<norm2(Z)<<std::endl;
//4. M = [D^dag Z]^{-1}
sliceInnerTimer.Start();
sliceInnerProductMatrix(m_DZ,D,Z,Orthog);
sliceInnerTimer.Stop();
m_M = m_DZ.inverse();
//std::cout << GridLogMessage << " m_DZ " <<m_DZ<<std::endl;
//5. X = X + D MC
m_tmp = m_M * m_C;
sliceMaddTimer.Start();
sliceMaddMatrix(X,m_tmp, D,X,Orthog);
sliceMaddTimer.Stop();
//6. QS = Q - ZM
sliceMaddTimer.Start();
sliceMaddMatrix(tmp,m_M,Z,Q,Orthog,-1.0);
sliceMaddTimer.Stop();
QRTimer.Start();
ThinQRfact (m_rr, m_S, m_Sinv, Q, tmp);
QRTimer.Stop();
//7. D = Q + D S^dag
m_tmp = m_S.adjoint();
sliceMaddTimer.Start();
sliceMaddMatrix(D,m_tmp,D,Q,Orthog);
sliceMaddTimer.Stop();
//8. C = S C
m_C = m_S*m_C;
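/*********************
* convergence monitor
* R = Q C with Q orthonormal, so R^dag R = C^dag C;
* its diagonal gives the squared residual of each block
*********************
*/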
m_rr = m_C.adjoint() * m_C;
RealD max_resid=0;
RealD rrsum=0;
RealD rr;
for(int b=0;b<Nblock;b++) {
rrsum+=real(m_rr(b,b));
rr = real(m_rr(b,b))/ssq[b];
if ( rr > max_resid ) max_resid = rr;
}
std::cout << GridLogIterative << "\titeration "<<k<<" rr_sum "<<rrsum<<" ssq_sum "<< sssum
<<" ave "<<std::sqrt(rrsum/sssum) << " max "<< max_resid <<std::endl;
if ( max_resid < Tolerance*Tolerance ) {
SolverTimer.Stop();
std::cout << GridLogMessage<<"BlockCGrQ converged in "<<k<<" iterations"<<std::endl;
for(int b=0;b<Nblock;b++){
std::cout << GridLogMessage<< "\t\tblock "<<b<<" computed resid "
<< std::sqrt(real(m_rr(b,b))/ssq[b])<<std::endl;
}
std::cout << GridLogMessage<<"\tMax residual is "<<std::sqrt(max_resid)<<std::endl;
Linop.HermOp(X, AD);
AD = AD-B;
std::cout << GridLogMessage <<"\t True residual is " << std::sqrt(norm2(AD)/norm2(B)) <<std::endl;
std::cout << GridLogMessage << "Time Breakdown "<<std::endl;
std::cout << GridLogMessage << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tInnerProd " << sliceInnerTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMaddMatrix " << sliceMaddTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tThinQRfact " << QRTimer.Elapsed() <<std::endl;
IterationsToComplete = k;
return;
}
}
std::cout << GridLogMessage << "BlockConjugateGradient(rQ) did NOT converge" << std::endl;
if (ErrorOnNoConverge) assert(0);
IterationsToComplete = k;
}
//////////////////////////////////////////////////////////////////////////
// Block conjugate gradient: original O'Leary algorithm. The block index runs along dimension blockDim.
//////////////////////////////////////////////////////////////////////////
void BlockCGsolve(LinearOperatorBase<Field> &Linop, const Field &Src, Field &Psi)
{
int Orthog = blockDim; // lattice dimension carrying the block (RHS) index
Nblock = Src._grid->_fdimensions[Orthog];
std::cout<<GridLogMessage<<" Block Conjugate Gradient : Orthog "<<Orthog<<" Nblock "<<Nblock<<std::endl;
Psi.checkerboard = Src.checkerboard;
conformable(Psi, Src);
Field P(Src);
Field AP(Src);
Field R(Src);
Eigen::MatrixXcd m_pAp = Eigen::MatrixXcd::Identity(Nblock,Nblock);
Eigen::MatrixXcd m_pAp_inv= Eigen::MatrixXcd::Identity(Nblock,Nblock);
Eigen::MatrixXcd m_rr = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_rr_inv = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_alpha = Eigen::MatrixXcd::Zero(Nblock,Nblock);
Eigen::MatrixXcd m_beta = Eigen::MatrixXcd::Zero(Nblock,Nblock);
// Initial residual computation & set up
std::vector<RealD> residuals(Nblock);
std::vector<RealD> ssq(Nblock);
sliceNorm(ssq,Src,Orthog);
RealD sssum=0;
for(int b=0;b<Nblock;b++) sssum+=ssq[b];
sliceNorm(residuals,Src,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
sliceNorm(residuals,Psi,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
// Initial residual R = Src - A Psi from the guess; the first search direction is P = R
Linop.HermOp(Psi, AP);
/************************************************************************
* Block conjugate gradient (Stephen Pickles, thesis 1995, pp 71, O Leary 1980)
************************************************************************
* O'Leary : R = B - A X
* O'Leary : P = M R ; preconditioner M = 1
* O'Leary : alpha = PAP^{-1} RMR
* O'Leary : beta = RMR^{-1}_old RMR_new
* O'Leary : X=X+Palpha
* O'Leary : R_new=R_old-AP alpha
* O'Leary : P=MR_new+P beta
*/
R = Src - AP;
P = R;
sliceInnerProductMatrix(m_rr,R,R,Orthog);
GridStopWatch sliceInnerTimer;
GridStopWatch sliceMaddTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
int k;
for (k = 1; k <= MaxIterations; k++) {
RealD rrsum=0;
for(int b=0;b<Nblock;b++) rrsum+=real(m_rr(b,b));
std::cout << GridLogIterative << "\titeration "<<k<<" rr_sum "<<rrsum<<" ssq_sum "<< sssum
<<" / "<<std::sqrt(rrsum/sssum) <<std::endl;
MatrixTimer.Start();
Linop.HermOp(P, AP);
MatrixTimer.Stop();
// Alpha
sliceInnerTimer.Start();
sliceInnerProductMatrix(m_pAp,P,AP,Orthog);
sliceInnerTimer.Stop();
m_pAp_inv = m_pAp.inverse();
m_alpha = m_pAp_inv * m_rr ;
// Psi, R update
sliceMaddTimer.Start();
sliceMaddMatrix(Psi,m_alpha, P,Psi,Orthog); // Psi += P alpha
sliceMaddMatrix(R ,m_alpha,AP, R,Orthog,-1.0);// R -= AP alpha
sliceMaddTimer.Stop();
// Beta
m_rr_inv = m_rr.inverse();
sliceInnerTimer.Start();
sliceInnerProductMatrix(m_rr,R,R,Orthog);
sliceInnerTimer.Stop();
m_beta = m_rr_inv *m_rr;
// Search update
sliceMaddTimer.Start();
sliceMaddMatrix(AP,m_beta,P,R,Orthog);
sliceMaddTimer.Stop();
P= AP;
/*********************
* convergence monitor
*********************
*/
RealD max_resid=0;
RealD rr;
for(int b=0;b<Nblock;b++){
rr = real(m_rr(b,b))/ssq[b];
if ( rr > max_resid ) max_resid = rr;
}
if ( max_resid < Tolerance*Tolerance ) {
SolverTimer.Stop();
std::cout << GridLogMessage<<"BlockCG converged in "<<k<<" iterations"<<std::endl;
for(int b=0;b<Nblock;b++){
std::cout << GridLogMessage<< "\t\tblock "<<b<<" computed resid "
<< std::sqrt(real(m_rr(b,b))/ssq[b])<<std::endl;
}
std::cout << GridLogMessage<<"\tMax residual is "<<std::sqrt(max_resid)<<std::endl;
Linop.HermOp(Psi, AP);
AP = AP-Src;
std::cout << GridLogMessage <<"\t True residual is " << std::sqrt(norm2(AP)/norm2(Src)) <<std::endl;
std::cout << GridLogMessage << "Time Breakdown "<<std::endl;
std::cout << GridLogMessage << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tInnerProd " << sliceInnerTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMaddMatrix " << sliceMaddTimer.Elapsed() <<std::endl;
IterationsToComplete = k;
return;
}
}
std::cout << GridLogMessage << "BlockConjugateGradient did NOT converge" << std::endl;
if (ErrorOnNoConverge) assert(0);
IterationsToComplete = k;
}
//////////////////////////////////////////////////////////////////////////
// multiRHS conjugate gradient: an independent CG per block, sharing each operator application.
// The block index runs along dimension blockDim. Use this when the blocks are spread out across nodes.
//////////////////////////////////////////////////////////////////////////
void CGmultiRHSsolve(LinearOperatorBase<Field> &Linop, const Field &Src, Field &Psi)
{
int Orthog = blockDim; // lattice dimension carrying the block (RHS) index
Nblock = Src._grid->_fdimensions[Orthog];
std::cout<<GridLogMessage<<"MultiRHS Conjugate Gradient : Orthog "<<Orthog<<" Nblock "<<Nblock<<std::endl;
Psi.checkerboard = Src.checkerboard;
conformable(Psi, Src);
Field P(Src);
Field AP(Src);
Field R(Src);
std::vector<ComplexD> v_pAp(Nblock);
std::vector<RealD> v_rr (Nblock);
std::vector<RealD> v_rr_inv(Nblock);
std::vector<RealD> v_alpha(Nblock);
std::vector<RealD> v_beta(Nblock);
// Initial residual computation & set up
std::vector<RealD> residuals(Nblock);
std::vector<RealD> ssq(Nblock);
sliceNorm(ssq,Src,Orthog);
RealD sssum=0;
for(int b=0;b<Nblock;b++) sssum+=ssq[b];
sliceNorm(residuals,Src,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
sliceNorm(residuals,Psi,Orthog);
for(int b=0;b<Nblock;b++){ assert(std::isnan(residuals[b])==0); }
// Initial residual R = Src - A Psi from the guess; the first search direction is P = R
Linop.HermOp(Psi, AP);
R = Src - AP;
P = R;
sliceNorm(v_rr,R,Orthog);
GridStopWatch sliceInnerTimer;
GridStopWatch sliceMaddTimer;
GridStopWatch sliceNormTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
int k;
for (k = 1; k <= MaxIterations; k++) {
RealD rrsum=0;
for(int b=0;b<Nblock;b++) rrsum+=real(v_rr[b]);
std::cout << GridLogIterative << "\titeration "<<k<<" rr_sum "<<rrsum<<" ssq_sum "<< sssum
<<" / "<<std::sqrt(rrsum/sssum) <<std::endl;
MatrixTimer.Start();
Linop.HermOp(P, AP);
MatrixTimer.Stop();
// Alpha
sliceInnerTimer.Start();
sliceInnerProductVector(v_pAp,P,AP,Orthog);
sliceInnerTimer.Stop();
for(int b=0;b<Nblock;b++){
v_alpha[b] = v_rr[b]/real(v_pAp[b]);
}
// Psi, R update
sliceMaddTimer.Start();
sliceMaddVector(Psi,v_alpha, P,Psi,Orthog); // add alpha * P to psi
sliceMaddVector(R ,v_alpha,AP, R,Orthog,-1.0);// sub alpha * AP to resid
sliceMaddTimer.Stop();
// Beta
for(int b=0;b<Nblock;b++){
v_rr_inv[b] = 1.0/v_rr[b];
}
sliceNormTimer.Start();
sliceNorm(v_rr,R,Orthog);
sliceNormTimer.Stop();
for(int b=0;b<Nblock;b++){
v_beta[b] = v_rr_inv[b] *v_rr[b];
}
// Search update
sliceMaddTimer.Start();
sliceMaddVector(P,v_beta,P,R,Orthog);
sliceMaddTimer.Stop();
/*********************
* convergence monitor
*********************
*/
RealD max_resid=0;
for(int b=0;b<Nblock;b++){
RealD rr = v_rr[b]/ssq[b];
if ( rr > max_resid ) max_resid = rr;
}
if ( max_resid < Tolerance*Tolerance ) {
SolverTimer.Stop();
std::cout << GridLogMessage<<"MultiRHS solver converged in " <<k<<" iterations"<<std::endl;
for(int b=0;b<Nblock;b++){
std::cout << GridLogMessage<< "\t\tBlock "<<b<<" computed resid "<< std::sqrt(v_rr[b]/ssq[b])<<std::endl;
}
std::cout << GridLogMessage<<"\tMax residual is "<<std::sqrt(max_resid)<<std::endl;
Linop.HermOp(Psi, AP);
AP = AP-Src;
std::cout <<GridLogMessage << "\tTrue residual is " << std::sqrt(norm2(AP)/norm2(Src)) <<std::endl;
std::cout << GridLogMessage << "Time Breakdown "<<std::endl;
std::cout << GridLogMessage << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tInnerProd " << sliceInnerTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tNorm " << sliceNormTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMaddVector " << sliceMaddTimer.Elapsed() <<std::endl;
IterationsToComplete = k;
return;
}
}
std::cout << GridLogMessage << "MultiRHSConjugateGradient did NOT converge" << std::endl;
if (ErrorOnNoConverge) assert(0);
IterationsToComplete = k;
}
};
}
#endif
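ThinQRfact builds a thin QR factorisation of the residual block from the Cholesky factor of the Gram matrix R^dag R, a technique often called CholeskyQR. The sketch below shows the same steps on a small real, dense matrix. It is illustrative only and assumes real arithmetic and row-major `std::vector` storage; the helpers `gram`, `cholesky` and `thinQR` are hypothetical and not part of Grid, which works on complex lattice slices with `Eigen::MatrixXcd`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense real matrix, row-major: M[row][col]. Illustrative layout only.
using Mat = std::vector<std::vector<double>>;

// Gram matrix G = R^T R (Nblock x Nblock); the real analogue of m_rr.
Mat gram(const Mat &R) {
  const std::size_t nb = R[0].size();
  Mat G(nb, std::vector<double>(nb, 0.0));
  for (const auto &row : R)
    for (std::size_t a = 0; a < nb; ++a)
      for (std::size_t b = 0; b < nb; ++b) G[a][b] += row[a] * row[b];
  return G;
}

// Lower-triangular Cholesky factor L with G = L L^T.
Mat cholesky(const Mat &G) {
  const std::size_t nb = G.size();
  Mat L(nb, std::vector<double>(nb, 0.0));
  for (std::size_t j = 0; j < nb; ++j) {
    double s = G[j][j];
    for (std::size_t k = 0; k < j; ++k) s -= L[j][k] * L[j][k];
    assert(s > 0.0 && "Gram matrix not positive definite: columns of R are dependent");
    L[j][j] = std::sqrt(s);
    for (std::size_t i = j + 1; i < nb; ++i) {
      double t = G[i][j];
      for (std::size_t k = 0; k < j; ++k) t -= L[i][k] * L[j][k];
      L[i][j] = t / L[j][j];
    }
  }
  return L;
}

// Thin QR: C = L^T (upper triangular) and Q = R C^{-1}, so Q^T Q = 1.
// Instead of forming C^{-1}, each row q of Q solves q C = r by
// forward substitution, since C is upper triangular.
void thinQR(const Mat &R, Mat &Q, Mat &C) {
  const Mat L = cholesky(gram(R));
  const std::size_t nb = L.size();
  C.assign(nb, std::vector<double>(nb, 0.0));
  for (std::size_t a = 0; a < nb; ++a)
    for (std::size_t b = 0; b < nb; ++b) C[a][b] = L[b][a];
  Q.assign(R.size(), std::vector<double>(nb, 0.0));
  for (std::size_t i = 0; i < R.size(); ++i)
    for (std::size_t b = 0; b < nb; ++b) {
      double t = R[i][b];  // r_b = sum_{a<=b} q_a C[a][b]
      for (std::size_t a = 0; a < b; ++a) t -= Q[i][a] * C[a][b];
      Q[i][b] = t / C[b][b];
    }
}
```

Grid forms `Cinv = C.inverse()` explicitly and multiplies; the sketch solves `q C = r` row by row instead, which is equivalent in exact arithmetic because C is upper triangular. The assert in `cholesky` fires when the block columns of R are linearly dependent, in which case no Cholesky factor exists.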
@@ -0,0 +1,177 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ConjugateGradient.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CONJUGATE_GRADIENT_H
#define GRID_CONJUGATE_GRADIENT_H
namespace Grid {
/////////////////////////////////////////////////////////////
// Conjugate gradient solver for a Hermitian positive-definite
// operator: single input vec, single output vec.
/////////////////////////////////////////////////////////////
template <class Field>
class ConjugateGradient : public OperatorFunction<Field> {
public:
bool ErrorOnNoConverge; // abort via assert when the CG fails to converge.
// Defaults to true.
RealD Tolerance;
Integer MaxIterations;
Integer IterationsToComplete; //Number of iterations the CG took to finish. Filled in upon completion
ConjugateGradient(RealD tol, Integer maxit, bool err_on_no_conv = true)
: Tolerance(tol),
MaxIterations(maxit),
ErrorOnNoConverge(err_on_no_conv){};
void operator()(LinearOperatorBase<Field> &Linop, const Field &src, Field &psi) {
psi.checkerboard = src.checkerboard;
conformable(psi, src);
RealD cp, c, a, d, b, ssq, qq, b_pred;
Field p(src);
Field mmp(src);
Field r(src);
// Initial residual computation & set up
RealD guess = norm2(psi);
assert(std::isnan(guess) == 0);
Linop.HermOpAndNorm(psi, mmp, d, b);
r = src - mmp;
p = r;
a = norm2(p);
cp = a;
ssq = norm2(src);
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: guess " << guess << std::endl;
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: src " << ssq << std::endl;
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: mp " << d << std::endl;
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: mmp " << b << std::endl;
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: cp,r " << cp << std::endl;
std::cout << GridLogIterative << std::setprecision(8) << "ConjugateGradient: p " << a << std::endl;
RealD rsq = Tolerance * Tolerance * ssq;
// Return immediately if the initial guess already satisfies the stopping condition
if (cp <= rsq) {
IterationsToComplete = 0;
return;
}
std::cout << GridLogIterative << std::setprecision(8)
<< "ConjugateGradient: k=0 residual " << cp << " target " << rsq << std::endl;
GridStopWatch LinalgTimer;
GridStopWatch InnerTimer;
GridStopWatch AxpyNormTimer;
GridStopWatch LinearCombTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
int k;
for (k = 1; k <= MaxIterations; k++) {
c = cp;
MatrixTimer.Start();
Linop.HermOp(p, mmp);
MatrixTimer.Stop();
LinalgTimer.Start();
InnerTimer.Start();
ComplexD dc = innerProduct(p,mmp);
InnerTimer.Stop();
d = dc.real();
a = c / d;
AxpyNormTimer.Start();
cp = axpy_norm(r, -a, mmp, r);
AxpyNormTimer.Stop();
b = cp / c;
LinearCombTimer.Start();
parallel_for(int ss=0;ss<src._grid->oSites();ss++){
vstream(psi[ss], a * p[ss] + psi[ss]);
vstream(p [ss], b * p[ss] + r[ss]);
}
LinearCombTimer.Stop();
LinalgTimer.Stop();
std::cout << GridLogIterative << "ConjugateGradient: Iteration " << k
<< " residual " << cp << " target " << rsq << std::endl;
// Stopping condition
if (cp <= rsq) {
SolverTimer.Stop();
Linop.HermOpAndNorm(psi, mmp, d, qq);
p = mmp - src;
RealD srcnorm = sqrt(norm2(src));
RealD resnorm = sqrt(norm2(p));
RealD true_residual = resnorm / srcnorm;
std::cout << GridLogMessage << "ConjugateGradient Converged on iteration " << k << std::endl;
std::cout << GridLogMessage << "\tComputed residual " << sqrt(cp / ssq)<<std::endl;
std::cout << GridLogMessage << "\tTrue residual " << true_residual<<std::endl;
std::cout << GridLogMessage << "\tTarget " << Tolerance << std::endl;
std::cout << GridLogPerformance << "Time breakdown "<<std::endl;
std::cout << GridLogPerformance << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogPerformance << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogPerformance << "\tLinalg " << LinalgTimer.Elapsed() <<std::endl;
std::cout << GridLogPerformance << "\tInner " << InnerTimer.Elapsed() <<std::endl;
std::cout << GridLogPerformance << "\tAxpyNorm " << AxpyNormTimer.Elapsed() <<std::endl;
std::cout << GridLogPerformance << "\tLinearComb " << LinearCombTimer.Elapsed() <<std::endl;
if (ErrorOnNoConverge) assert(true_residual / Tolerance < 10000.0);
IterationsToComplete = k;
return;
}
}
std::cout << GridLogMessage << "ConjugateGradient did NOT converge"
<< std::endl;
if (ErrorOnNoConverge) assert(0);
IterationsToComplete = k;
}
};
}
#endif
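The iteration above is the standard conjugate gradient recurrence, with Grid fusing the residual update and its norm into `axpy_norm` and the solution and search-direction updates into one `vstream` loop. A minimal sketch of the same sequence of updates follows, assuming a small dense symmetric positive-definite matrix stored in `std::vector`; `matvec`, `dot` and `cg` are hypothetical stand-ins for `Linop.HermOp`, `innerProduct` and the solver's `operator()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// y = A x for a dense matrix (stand-in for Linop.HermOp).
Vec matvec(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// Real inner product (stand-in for innerProduct / norm2).
double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Solves A psi = src for symmetric positive-definite A, starting from the
// guess already stored in psi. Stops when |r|^2 <= tol^2 |src|^2.
// Returns the iteration count (0 if the guess already converged) or -1.
int cg(const Mat &A, const Vec &src, Vec &psi, double tol, int maxit) {
  Vec mmp = matvec(A, psi);
  Vec r(src.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = src[i] - mmp[i];
  Vec p = r;
  double cp = dot(r, r);
  const double rsq = tol * tol * dot(src, src);
  if (cp <= rsq) return 0;
  for (int k = 1; k <= maxit; ++k) {
    const double c = cp;
    mmp = matvec(A, p);
    const double d = dot(p, mmp);
    assert(d > 0.0 && "operator must be positive definite");
    const double a = c / d;                    // step length alpha
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= a * mmp[i];
    cp = dot(r, r);                            // Grid fuses this with the update above (axpy_norm)
    const double b = cp / c;                   // beta
    for (std::size_t i = 0; i < r.size(); ++i) {
      psi[i] += a * p[i];                      // uses the old search direction
      p[i] = b * p[i] + r[i];
    }
    if (cp <= rsq) return k;
  }
  return -1;
}
```

As in the Grid loop, `psi` is advanced with the old search direction before `p` is overwritten, and the stopping test compares the squared residual with `tol^2 |src|^2`. A return value of 0 means the initial guess already met the target.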
@@ -0,0 +1,154 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ConjugateGradientMixedPrec.h
Copyright (C) 2015
Author: Christopher Kelly <ckelly@phys.columbia.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CONJUGATE_GRADIENT_MIXED_PREC_H
#define GRID_CONJUGATE_GRADIENT_MIXED_PREC_H
namespace Grid {
//Mixed precision restarted defect correction CG
template<class FieldD,class FieldF, typename std::enable_if< getPrecision<FieldD>::value == 2, int>::type = 0,typename std::enable_if< getPrecision<FieldF>::value == 1, int>::type = 0>
class MixedPrecisionConjugateGradient : public LinearFunction<FieldD> {
public:
RealD Tolerance;
RealD InnerTolerance; //Initial tolerance for inner CG. Defaults to Tolerance but can be changed
Integer MaxInnerIterations;
Integer MaxOuterIterations;
GridBase* SinglePrecGrid; //Grid for single-precision fields
RealD OuterLoopNormMult; //Stop the outer loop and move to a final double prec solve when the squared residual falls below OuterLoopNormMult * Tolerance^2 * |src|^2
LinearOperatorBase<FieldF> &Linop_f;
LinearOperatorBase<FieldD> &Linop_d;
Integer TotalInnerIterations; //Number of inner CG iterations
Integer TotalOuterIterations; //Number of restarts
Integer TotalFinalStepIterations; //Number of CG iterations in final patch-up step
//Option to speed up *inner single precision* solves using a LinearFunction that produces a guess
LinearFunction<FieldF> *guesser;
MixedPrecisionConjugateGradient(RealD tol, Integer maxinnerit, Integer maxouterit, GridBase* _sp_grid, LinearOperatorBase<FieldF> &_Linop_f, LinearOperatorBase<FieldD> &_Linop_d) :
Linop_f(_Linop_f), Linop_d(_Linop_d),
Tolerance(tol), InnerTolerance(tol), MaxInnerIterations(maxinnerit), MaxOuterIterations(maxouterit), SinglePrecGrid(_sp_grid),
OuterLoopNormMult(100.), guesser(NULL){ };
void useGuesser(LinearFunction<FieldF> &g){
guesser = &g;
}
void operator() (const FieldD &src_d_in, FieldD &sol_d){
TotalInnerIterations = 0;
GridStopWatch TotalTimer;
TotalTimer.Start();
int cb = src_d_in.checkerboard;
sol_d.checkerboard = cb;
RealD src_norm = norm2(src_d_in);
RealD stop = src_norm * Tolerance*Tolerance;
GridBase* DoublePrecGrid = src_d_in._grid;
FieldD tmp_d(DoublePrecGrid);
tmp_d.checkerboard = cb;
FieldD tmp2_d(DoublePrecGrid);
tmp2_d.checkerboard = cb;
FieldD src_d(DoublePrecGrid);
src_d = src_d_in; //source for next inner iteration, computed from residual during operation
RealD inner_tol = InnerTolerance;
FieldF src_f(SinglePrecGrid);
src_f.checkerboard = cb;
FieldF sol_f(SinglePrecGrid);
sol_f.checkerboard = cb;
ConjugateGradient<FieldF> CG_f(inner_tol, MaxInnerIterations);
CG_f.ErrorOnNoConverge = false;
GridStopWatch InnerCGtimer;
GridStopWatch PrecChangeTimer;
Integer &outer_iter = TotalOuterIterations; //so it will be equal to the final iteration count
for(outer_iter = 0; outer_iter < MaxOuterIterations; outer_iter++){
//Compute double precision rsd and also new RHS vector.
Linop_d.HermOp(sol_d, tmp_d);
RealD norm = axpy_norm(src_d, -1., tmp_d, src_d_in); //src_d is residual vector
std::cout<<GridLogMessage<<"MixedPrecisionConjugateGradient: Outer iteration " <<outer_iter<<" residual "<< norm<< " target "<< stop<<std::endl;
if(norm < OuterLoopNormMult * stop){
std::cout<<GridLogMessage<<"MixedPrecisionConjugateGradient: Outer iteration converged on iteration " <<outer_iter <<std::endl;
break;
}
while(norm * inner_tol * inner_tol < stop) inner_tol *= 2; // loosen by doubling until inner_tol >= sqrt(stop/norm), so the inner solve does not aim beyond the overall target
PrecChangeTimer.Start();
precisionChange(src_f, src_d);
PrecChangeTimer.Stop();
zeroit(sol_f);
//Optionally improve inner solver guess (eg using known eigenvectors)
if(guesser != NULL)
(*guesser)(src_f, sol_f);
//Inner CG
CG_f.Tolerance = inner_tol;
InnerCGtimer.Start();
CG_f(Linop_f, src_f, sol_f);
InnerCGtimer.Stop();
TotalInnerIterations += CG_f.IterationsToComplete;
//Convert sol back to double and add to double prec solution
PrecChangeTimer.Start();
precisionChange(tmp_d, sol_f);
PrecChangeTimer.Stop();
axpy(sol_d, 1.0, tmp_d, sol_d);
}
//Final double-precision CG, warm-started from the mixed-precision solution
std::cout<<GridLogMessage<<"MixedPrecisionConjugateGradient: Starting final patch-up double-precision solve"<<std::endl;
ConjugateGradient<FieldD> CG_d(Tolerance, MaxInnerIterations);
CG_d(Linop_d, src_d_in, sol_d);
TotalFinalStepIterations = CG_d.IterationsToComplete;
TotalTimer.Stop();
std::cout<<GridLogMessage<<"MixedPrecisionConjugateGradient: Inner CG iterations " << TotalInnerIterations << " Restarts " << TotalOuterIterations << " Final CG iterations " << TotalFinalStepIterations << std::endl;
std::cout<<GridLogMessage<<"MixedPrecisionConjugateGradient: Total time " << TotalTimer.Elapsed() << " Precision change " << PrecChangeTimer.Elapsed() << " Inner CG total " << InnerCGtimer.Elapsed() << std::endl;
}
};
}
#endif
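MixedPrecisionConjugateGradient is a restarted defect-correction scheme: the residual is formed in double precision, a correction is solved for in single precision, and a final double-precision CG, warm-started from the accumulated solution, cleans up. The sketch below reproduces that control flow on a small dense matrix. It assumes `std::vector` storage, uses hypothetical names (`matvec`, `dot`, `cg`, `mixedPrecisionCG`), and omits Grid's guesser hook, checkerboarding and timers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class T> using Vec = std::vector<T>;
template <class T> using Mat = std::vector<std::vector<T>>;

template <class T> Vec<T> matvec(const Mat<T> &A, const Vec<T> &x) {
  Vec<T> y(A.size(), T(0));
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

template <class T> T dot(const Vec<T> &a, const Vec<T> &b) {
  T s(0);
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// CG in precision T, refining the guess x in place.
// Stops when |r|^2 <= tol^2 |src|^2 or after maxit iterations; returns the count.
template <class T>
int cg(const Mat<T> &A, const Vec<T> &src, Vec<T> &x, T tol, int maxit) {
  Vec<T> ax = matvec(A, x), r(src.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = src[i] - ax[i];
  Vec<T> p = r;
  T cp = dot(r, r);
  const T rsq = tol * tol * dot(src, src);
  int k = 0;
  while (k < maxit && cp > rsq) {
    ++k;
    const Vec<T> ap = matvec(A, p);
    const T d = dot(p, ap);
    assert(d > T(0) && "operator must be positive definite");
    const T a = cp / d, c = cp;
    for (std::size_t i = 0; i < r.size(); ++i) { x[i] += a * p[i]; r[i] -= a * ap[i]; }
    cp = dot(r, r);
    const T b = cp / c;
    for (std::size_t i = 0; i < r.size(); ++i) p[i] = r[i] + b * p[i];
  }
  return k;
}

// Restarted defect-correction CG: residual in double, correction in float,
// then a warm-started double-precision patch-up solve. Returns the total
// number of single-precision iterations; sol is updated in place.
int mixedPrecisionCG(const Mat<double> &A, const Vec<double> &src, Vec<double> &sol,
                     double tol, double innerTol, int maxOuter, int maxInner,
                     double outerNormMult = 100.0) {
  Mat<float> Af;
  for (const auto &row : A) Af.emplace_back(row.begin(), row.end());  // cf. precisionChange
  const double stop = dot(src, src) * tol * tol;
  int innerIters = 0;
  for (int outer = 0; outer < maxOuter; ++outer) {
    const Vec<double> ax = matvec(A, sol);
    Vec<double> res(src.size());
    for (std::size_t i = 0; i < res.size(); ++i) res[i] = src[i] - ax[i];
    const double norm = dot(res, res);          // squared double-precision residual
    if (norm < outerNormMult * stop) break;     // close enough: hand over to double CG
    // Loosen the inner target until norm * innerTol^2 >= stop, so the float
    // solve is not asked for more accuracy than the overall goal requires.
    while (norm * innerTol * innerTol < stop) innerTol *= 2;
    Vec<float> resf(res.begin(), res.end()), corr(res.size(), 0.0f);
    innerIters += cg(Af, resf, corr, static_cast<float>(innerTol), maxInner);
    for (std::size_t i = 0; i < sol.size(); ++i) sol[i] += corr[i];
  }
  cg(A, src, sol, tol, maxInner);  // final double-precision patch-up, warm-started
  return innerIters;
}
```

The default `outerNormMult = 100.0` mirrors the `OuterLoopNormMult(100.)` default in the constructor above. Because the outer loop hands over once the squared residual is within that factor of the target, the doubling rule keeps `innerTol` below `2/sqrt(outerNormMult)` provided it starts small, so the single-precision solve is never given a vacuous target.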
@@ -0,0 +1,322 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ConjugateGradientMultiShift.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CONJUGATE_MULTI_SHIFT_GRADIENT_H
#define GRID_CONJUGATE_MULTI_SHIFT_GRADIENT_H
namespace Grid {
/////////////////////////////////////////////////////////////
// Multi-shift conjugate gradient: one Krylov iteration yields
// solutions for every pole (shift) in a MultiShiftFunction.
/////////////////////////////////////////////////////////////
template<class Field>
class ConjugateGradientMultiShift : public OperatorMultiFunction<Field>,
public OperatorFunction<Field>
{
public:
RealD Tolerance;
Integer MaxIterations;
Integer IterationsToComplete; //Number of iterations the CG took to finish. Filled in upon completion
int verbose;
MultiShiftFunction shifts;
ConjugateGradientMultiShift(Integer maxit,MultiShiftFunction &_shifts) :
MaxIterations(maxit),
shifts(_shifts)
{
verbose=1;
}
void operator() (LinearOperatorBase<Field> &Linop, const Field &src, Field &psi)
{
GridBase *grid = src._grid;
int nshift = shifts.order;
std::vector<Field> results(nshift,grid);
(*this)(Linop,src,results,psi);
}
void operator() (LinearOperatorBase<Field> &Linop, const Field &src, std::vector<Field> &results, Field &psi)
{
int nshift = shifts.order;
(*this)(Linop,src,results);
psi = shifts.norm*src;
for(int i=0;i<nshift;i++){
psi = psi + shifts.residues[i]*results[i];
}
return;
}
void operator() (LinearOperatorBase<Field> &Linop, const Field &src, std::vector<Field> &psi)
{
GridBase *grid = src._grid;
////////////////////////////////////////////////////////////////////////
// Convenience references to the info stored in "MultiShiftFunction"
////////////////////////////////////////////////////////////////////////
int nshift = shifts.order;
std::vector<RealD> &mass(shifts.poles); // Make references to array in "shifts"
std::vector<RealD> &mresidual(shifts.tolerances);
std::vector<RealD> alpha(nshift,1.0);
std::vector<Field> ps(nshift,grid);// Search directions
assert(psi.size()==nshift);
assert(mass.size()==nshift);
assert(mresidual.size()==nshift);
// dynamic sized arrays on stack; 2d is a pain with vector
RealD bs[nshift];
RealD rsq[nshift];
RealD z[nshift][2];
int converged[nshift];
const int primary =0;
//Primary shift fields CG iteration
RealD a,b,c,d;
RealD cp,bp,qq; //prev
// Matrix mult fields
Field r(grid);
Field p(grid);
Field tmp(grid);
Field mmp(grid);
// Check lightest mass
for(int s=0;s<nshift;s++){
assert( mass[s]>= mass[primary] );
converged[s]=0;
}
// Initial guess is hard-wired to zero, so the
// residual "r" is src and the first search
// direction "p" is also src
cp = norm2(src);
for(int s=0;s<nshift;s++){
rsq[s] = cp * mresidual[s] * mresidual[s];
std::cout<<GridLogMessage<<"ConjugateGradientMultiShift: shift "<<s
<<" target resid "<<rsq[s]<<std::endl;
ps[s] = src;
}
// r and p for primary
r=src;
p=src;
//MdagM+m[0]
Linop.HermOpAndNorm(p,mmp,d,qq);
axpy(mmp,mass[0],p,mmp);
RealD rn = norm2(p);
d += rn*mass[0];
// have verified that inner product of
// p and mmp is equal to d after this since
// the d computation is tricky
// qq = real(innerProduct(p,mmp));
// std::cout<<GridLogMessage << "debug equal ? qq "<<qq<<" d "<< d<<std::endl;
b = -cp /d;
// Set up the various shift variables
int iz=0;
z[0][1-iz] = 1.0;
z[0][iz] = 1.0;
bs[0] = b;
for(int s=1;s<nshift;s++){
z[s][1-iz] = 1.0;
z[s][iz] = 1.0/( 1.0 - b*(mass[s]-mass[0]));
bs[s] = b*z[s][iz];
}
// r += b[0] A.p[0]
// c= norm(r)
c=axpy_norm(r,b,mmp,r);
for(int s=0;s<nshift;s++) {
axpby(psi[s],0.,-bs[s]*alpha[s],src,src);
}
///////////////////////////////////////
// Timers
///////////////////////////////////////
GridStopWatch AXPYTimer;
GridStopWatch ShiftTimer;
GridStopWatch QRTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
// Iteration loop
int k;
for (k=1;k<=MaxIterations;k++){
a = c /cp;
AXPYTimer.Start();
axpy(p,a,p,r);
AXPYTimer.Stop();
// Note to self - direction ps is iterated separately
// for each shift. Does not appear to have any scope
// for avoiding linear algebra in "single" case.
//
// However the SAME r is used. Could load "r" once and update
// ALL ps[s], for a 2/3 bandwidth saving.
// New Kernel: Load r, vector of coeffs, vector of pointers ps
AXPYTimer.Start();
for(int s=0;s<nshift;s++){
if ( ! converged[s] ) {
if (s==0){
axpy(ps[s],a,ps[s],r);
} else{
RealD as =a *z[s][iz]*bs[s] /(z[s][1-iz]*b);
axpby(ps[s],z[s][iz],as,r,ps[s]);
}
}
}
AXPYTimer.Stop();
cp=c;
MatrixTimer.Start();
//Linop.HermOpAndNorm(p,mmp,d,qq); // d is used
// The below is faster on KNL
Linop.HermOp(p,mmp);
d=real(innerProduct(p,mmp));
MatrixTimer.Stop();
AXPYTimer.Start();
axpy(mmp,mass[0],p,mmp);
AXPYTimer.Stop();
RealD rn = norm2(p);
d += rn*mass[0];
bp=b;
b=-cp/d;
AXPYTimer.Start();
c=axpy_norm(r,b,mmp,r);
AXPYTimer.Stop();
// Toggle the recurrence history
bs[0] = b;
iz = 1-iz;
ShiftTimer.Start();
for(int s=1;s<nshift;s++){
if((!converged[s])){
RealD z0 = z[s][1-iz];
RealD z1 = z[s][iz];
z[s][iz] = z0*z1*bp
/ (b*a*(z1-z0) + z1*bp*(1- (mass[s]-mass[0])*b));
bs[s] = b*z[s][iz]/z0; // NB sign rel to Mike
}
}
ShiftTimer.Stop();
for(int s=0;s<nshift;s++){
int ss = s;
// Scope for optimisation here in case of "single".
// Could load psi[0] and pull all ps[s] in.
// if ( single ) ss=primary;
// Bandwidth saving in single case is Ls * 3 -> 2+Ls, so ~ 3x saving
// Pipelined CG gain:
//
// New Kernel: Load r, vector of coeffs, vector of pointers ps
// New Kernel: Load psi[0], vector of coeffs, vector of pointers ps
// If we can predict the coefficient bs, we can fuse these and avoid the write/reread cycle
// on ps[s].
// Before: 3 x npole + 3 x npole
// After : 2 x npole (ps[s]) => 3x speed up of multishift CG.
if( (!converged[s]) ) {
axpy(psi[ss],-bs[s]*alpha[s],ps[s],psi[ss]);
}
}
// Convergence checks
int all_converged = 1;
for(int s=0;s<nshift;s++){
if ( (!converged[s]) ){
RealD css = c * z[s][iz]* z[s][iz];
if(css<rsq[s]){
if ( ! converged[s] )
std::cout<<GridLogMessage<<"ConjugateGradientMultiShift k="<<k<<" Shift "<<s<<" has converged"<<std::endl;
converged[s]=1;
} else {
all_converged=0;
}
}
}
if ( all_converged ){
SolverTimer.Stop();
std::cout<<GridLogMessage<< "CGMultiShift: All shifts have converged iteration "<<k<<std::endl;
std::cout<<GridLogMessage<< "CGMultiShift: Checking solutions"<<std::endl;
// Check answers
for(int s=0; s < nshift; s++) {
Linop.HermOpAndNorm(psi[s],mmp,d,qq);
axpy(tmp,mass[s],psi[s],mmp);
axpy(r,-alpha[s],src,tmp);
RealD rn = norm2(r);
RealD cn = norm2(src);
std::cout<<GridLogMessage<<"CGMultiShift: shift["<<s<<"] true residual "<<std::sqrt(rn/cn)<<std::endl;
}
std::cout << GridLogMessage << "Time Breakdown "<<std::endl;
std::cout << GridLogMessage << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tAXPY " << AXPYTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tShift " << ShiftTimer.Elapsed() <<std::endl;
IterationsToComplete = k;
return;
}
}
// ugly hack
std::cout<<GridLogMessage<<"CG multi shift did not converge"<<std::endl;
IterationsToComplete = k;
// assert(0);
}
};
}
#endif
@@ -0,0 +1,256 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ConjugateGradientReliableUpdate.h
Copyright (C) 2015
Author: Christopher Kelly <ckelly@phys.columbia.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CONJUGATE_GRADIENT_RELIABLE_UPDATE_H
#define GRID_CONJUGATE_GRADIENT_RELIABLE_UPDATE_H
namespace Grid {
template<class FieldD,class FieldF, typename std::enable_if< getPrecision<FieldD>::value == 2, int>::type = 0,typename std::enable_if< getPrecision<FieldF>::value == 1, int>::type = 0>
class ConjugateGradientReliableUpdate : public LinearFunction<FieldD> {
public:
bool ErrorOnNoConverge; // assert when the CG fails to converge.
// Defaults to true.
RealD Tolerance;
Integer MaxIterations;
Integer IterationsToComplete; //Number of iterations the CG took to finish. Filled in upon completion
Integer ReliableUpdatesPerformed;
bool DoFinalCleanup; //Final DP cleanup, defaults to true
Integer IterationsToCleanup; //Final DP cleanup step iterations
LinearOperatorBase<FieldF> &Linop_f;
LinearOperatorBase<FieldD> &Linop_d;
GridBase* SinglePrecGrid;
RealD Delta; //reliable update parameter
//Optional ability to switch to a different linear operator once the residual falls below fallback_transition_tol. Useful for single/half -> single/single
LinearOperatorBase<FieldF> *Linop_fallback;
RealD fallback_transition_tol;
ConjugateGradientReliableUpdate(RealD tol, Integer maxit, RealD _delta, GridBase* _sp_grid, LinearOperatorBase<FieldF> &_Linop_f, LinearOperatorBase<FieldD> &_Linop_d, bool err_on_no_conv = true)
: ErrorOnNoConverge(err_on_no_conv),
Tolerance(tol),
MaxIterations(maxit),
DoFinalCleanup(true),
Linop_f(_Linop_f),
Linop_d(_Linop_d),
SinglePrecGrid(_sp_grid),
Delta(_delta),
Linop_fallback(NULL)
{};
void setFallbackLinop(LinearOperatorBase<FieldF> &_Linop_fallback, const RealD _fallback_transition_tol){
Linop_fallback = &_Linop_fallback;
fallback_transition_tol = _fallback_transition_tol;
}
void operator()(const FieldD &src, FieldD &psi) {
LinearOperatorBase<FieldF> *Linop_f_use = &Linop_f;
bool using_fallback = false;
psi.checkerboard = src.checkerboard;
conformable(psi, src);
RealD cp, c, a, d, b, ssq, qq, b_pred;
FieldD p(src);
FieldD mmp(src);
FieldD r(src);
// Initial residual computation & set up
RealD guess = norm2(psi);
assert(std::isnan(guess) == 0);
Linop_d.HermOpAndNorm(psi, mmp, d, b);
r = src - mmp;
p = r;
a = norm2(p);
cp = a;
ssq = norm2(src);
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: guess " << guess << std::endl;
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: src " << ssq << std::endl;
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: mp " << d << std::endl;
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: mmp " << b << std::endl;
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: cp,r " << cp << std::endl;
std::cout << GridLogIterative << std::setprecision(4) << "ConjugateGradientReliableUpdate: p " << a << std::endl;
RealD rsq = Tolerance * Tolerance * ssq;
// Check if guess is really REALLY good :)
if (cp <= rsq) {
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate guess was REALLY good\n";
std::cout << GridLogMessage << "\tComputed residual " << sqrt(cp / ssq)<<std::endl;
return;
}
//Single prec initialization
FieldF r_f(SinglePrecGrid);
r_f.checkerboard = r.checkerboard;
precisionChange(r_f, r);
FieldF psi_f(r_f);
psi_f = zero;
FieldF p_f(r_f);
FieldF mmp_f(r_f);
RealD MaxResidSinceLastRelUp = cp; //initial residual
std::cout << GridLogIterative << std::setprecision(4)
<< "ConjugateGradientReliableUpdate: k=0 residual " << cp << " target " << rsq << std::endl;
GridStopWatch LinalgTimer;
GridStopWatch MatrixTimer;
GridStopWatch SolverTimer;
SolverTimer.Start();
int k = 0;
int l = 0;
for (k = 1; k <= MaxIterations; k++) {
c = cp;
MatrixTimer.Start();
Linop_f_use->HermOpAndNorm(p_f, mmp_f, d, qq);
MatrixTimer.Stop();
LinalgTimer.Start();
a = c / d;
b_pred = a * (a * qq - d) / c;
cp = axpy_norm(r_f, -a, mmp_f, r_f);
b = cp / c;
// Fuse these loops ; should be really easy
psi_f = a * p_f + psi_f;
//p_f = p_f * b + r_f;
LinalgTimer.Stop();
std::cout << GridLogIterative << "ConjugateGradientReliableUpdate: Iteration " << k
<< " residual " << cp << " target " << rsq << std::endl;
std::cout << GridLogDebug << "a = "<< a << " b_pred = "<< b_pred << " b = "<< b << std::endl;
std::cout << GridLogDebug << "qq = "<< qq << " d = "<< d << " c = "<< c << std::endl;
if(cp > MaxResidSinceLastRelUp){
std::cout << GridLogIterative << "ConjugateGradientReliableUpdate: updating MaxResidSinceLastRelUp : " << MaxResidSinceLastRelUp << " -> " << cp << std::endl;
MaxResidSinceLastRelUp = cp;
}
// Stopping condition
if (cp <= rsq) {
//Although not written in the paper, I assume that I have to add on the final solution
precisionChange(mmp, psi_f);
psi = psi + mmp;
SolverTimer.Stop();
Linop_d.HermOpAndNorm(psi, mmp, d, qq);
p = mmp - src;
RealD srcnorm = sqrt(norm2(src));
RealD resnorm = sqrt(norm2(p));
RealD true_residual = resnorm / srcnorm;
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate Converged on iteration " << k << " after " << l << " reliable updates" << std::endl;
std::cout << GridLogMessage << "\tComputed residual " << sqrt(cp / ssq)<<std::endl;
std::cout << GridLogMessage << "\tTrue residual " << true_residual<<std::endl;
std::cout << GridLogMessage << "\tTarget " << Tolerance << std::endl;
std::cout << GridLogMessage << "Time breakdown "<<std::endl;
std::cout << GridLogMessage << "\tElapsed " << SolverTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tMatrix " << MatrixTimer.Elapsed() <<std::endl;
std::cout << GridLogMessage << "\tLinalg " << LinalgTimer.Elapsed() <<std::endl;
IterationsToComplete = k;
ReliableUpdatesPerformed = l;
if(DoFinalCleanup){
//Do a final CG to cleanup
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate performing final cleanup.\n";
ConjugateGradient<FieldD> CG(Tolerance,MaxIterations);
CG.ErrorOnNoConverge = ErrorOnNoConverge;
CG(Linop_d,src,psi);
IterationsToCleanup = CG.IterationsToComplete;
}
else if (ErrorOnNoConverge) assert(true_residual / Tolerance < 10000.0);
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate complete.\n";
return;
}
else if(cp < Delta * MaxResidSinceLastRelUp) { //reliable update
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate "
<< cp << "(residual) < " << Delta << "(Delta) * " << MaxResidSinceLastRelUp << "(MaxResidSinceLastRelUp) on iteration " << k << " : performing reliable update\n";
precisionChange(mmp, psi_f);
psi = psi + mmp;
Linop_d.HermOpAndNorm(psi, mmp, d, qq);
r = src - mmp;
psi_f = zero;
precisionChange(r_f, r);
cp = norm2(r);
MaxResidSinceLastRelUp = cp;
b = cp/c;
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate new residual " << cp << std::endl;
l = l+1;
}
p_f = p_f * b + r_f; //update search vector after reliable update appears to help convergence
if(!using_fallback && Linop_fallback != NULL && cp < fallback_transition_tol){
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate switching to fallback linear operator on iteration " << k << " at residual " << cp << std::endl;
Linop_f_use = Linop_fallback;
using_fallback = true;
}
}
std::cout << GridLogMessage << "ConjugateGradientReliableUpdate did NOT converge"
<< std::endl;
if (ErrorOnNoConverge) assert(0);
IterationsToComplete = k;
ReliableUpdatesPerformed = l;
}
};
}
#endif
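The reliable-update idea (iterate in single precision, and periodically replace the recursively updated residual with a true residual computed in double precision) can be sketched on a dense matrix. This is a minimal sketch, assuming a real symmetric positive-definite matrix and `std::vector` storage; `reliableUpdateCG`, `matVec`, `norm2` and the type aliases are hypothetical names, not Grid API. It mirrors the trigger condition `cp < Delta * MaxResidSinceLastRelUp` and the `b = cp/c` reset used above, but one deliberate difference should be noted: instead of handing over to a final double-precision `ConjugateGradient` cleanup, it only accepts convergence once the true double-precision residual meets the target, and otherwise keeps iterating from a reliable update.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using VecD = std::vector<double>;
using VecF = std::vector<float>;
using MatD = std::vector<VecD>;

// y = A x, accumulated in the precision T of x (float or double).
template <class T>
std::vector<T> matVec(const MatD &A, const std::vector<T> &x) {
  std::vector<T> y(x.size(), T(0));
  for (size_t i = 0; i < A.size(); ++i)
    for (size_t j = 0; j < x.size(); ++j) y[i] += T(A[i][j]) * x[j];
  return y;
}

// Squared 2-norm, summed in double.
template <class T>
double norm2(const std::vector<T> &v) {
  double s = 0.0;
  for (T e : v) s += double(e) * double(e);
  return s;
}

VecD reliableUpdateCG(const MatD &A, const VecD &src, double tol, double delta,
                      int maxit, int &updates) {
  const size_t n = src.size();
  VecD psi(n, 0.0), r = src;                // zero initial guess, so r = src
  VecF r_f(r.begin(), r.end()), p_f = r_f;  // single-precision residual and direction
  VecF psi_f(n, 0.0f);                      // single-precision correction since last update
  const double rsq = tol * tol * norm2(src);
  double cp = norm2(r), maxResid = cp;
  updates = 0;
  for (int k = 1; k <= maxit; ++k) {
    const double c = cp;
    const VecF mmp_f = matVec(A, p_f);
    double d = 0.0;
    for (size_t i = 0; i < n; ++i) d += double(p_f[i]) * double(mmp_f[i]);
    assert(d > 0.0 && "operator must be positive definite");
    const float a = float(c / d);
    for (size_t i = 0; i < n; ++i) {
      r_f[i] -= a * mmp_f[i];
      psi_f[i] += a * p_f[i];
    }
    cp = norm2(r_f);
    double b = cp / c;
    maxResid = std::max(maxResid, cp);
    if (cp <= rsq || cp < delta * maxResid) {
      // Reliable update: fold the float correction into psi and replace the
      // recursively updated residual by the true residual src - A psi (double).
      for (size_t i = 0; i < n; ++i) psi[i] += double(psi_f[i]);
      const VecD Apsi = matVec(A, psi);
      for (size_t i = 0; i < n; ++i) r[i] = src[i] - Apsi[i];
      std::fill(psi_f.begin(), psi_f.end(), 0.0f);
      r_f.assign(r.begin(), r.end());
      cp = norm2(r);
      if (cp <= rsq) return psi;            // true residual meets the target
      maxResid = cp;
      b = cp / c;
      ++updates;
    }
    for (size_t i = 0; i < n; ++i) p_f[i] = float(b) * p_f[i] + r_f[i];
  }
  for (size_t i = 0; i < n; ++i) psi[i] += double(psi_f[i]);
  return psi;                               // maxit reached without convergence
}
```

The design point is that the expensive operator applications run in `float`, while the accumulated solution and the occasional true-residual check stay in `double`, which lets the solve reach tolerances well below single-precision epsilon.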
@@ -0,0 +1,111 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ConjugateResidual.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CONJUGATE_RESIDUAL_H
#define GRID_CONJUGATE_RESIDUAL_H
namespace Grid {
/////////////////////////////////////////////////////////////
// Base classes for iterative processes based on operators
// single input vec, single output vec.
/////////////////////////////////////////////////////////////
template<class Field>
class ConjugateResidual : public OperatorFunction<Field> {
public:
RealD Tolerance;
Integer MaxIterations;
int verbose;
ConjugateResidual(RealD tol,Integer maxit) : Tolerance(tol), MaxIterations(maxit) {
verbose=0;
};
void operator() (LinearOperatorBase<Field> &Linop,const Field &src, Field &psi){
RealD a, b, c, d;
RealD cp, ssq,rsq;
RealD rAr, rAAr, rArp;
RealD pAp, pAAp;
GridBase *grid = src._grid;
psi=zero;
Field r(grid), p(grid), Ap(grid), Ar(grid);
r=src;
p=src;
Linop.HermOpAndNorm(p,Ap,pAp,pAAp);
Linop.HermOpAndNorm(r,Ar,rAr,rAAr);
cp =norm2(r);
ssq=norm2(src);
rsq=Tolerance*Tolerance*ssq;
if (verbose) std::cout<<GridLogMessage<<"ConjugateResidual: iteration " <<0<<" residual "<<cp<< " target "<< rsq<<std::endl;
for(int k=1;k<MaxIterations;k++){
a = rAr/pAAp;
axpy(psi,a,p,psi);
cp = axpy_norm(r,-a,Ap,r);
rArp=rAr;
Linop.HermOpAndNorm(r,Ar,rAr,rAAr);
b =rAr/rArp;
axpy(p,b,p,r);
pAAp=axpy_norm(Ap,b,Ap,Ar);
if(verbose) std::cout<<GridLogMessage<<"ConjugateResidual: iteration " <<k<<" residual "<<cp<< " target "<< rsq<<std::endl;
if(cp<rsq) {
Linop.HermOp(psi,Ap);
axpy(r,-1.0,src,Ap);
RealD true_resid = norm2(r)/ssq;
std::cout<<GridLogMessage<<"ConjugateResidual: Converged on iteration " <<k
<< " computed residual "<<sqrt(cp/ssq)
<< " true residual "<<sqrt(true_resid)
<< " target " <<Tolerance <<std::endl;
return;
}
}
std::cout<<GridLogMessage<<"ConjugateResidual did NOT converge"<<std::endl;
assert(0);
}
};
}
#endif
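The Conjugate Residual loop above minimises the residual norm rather than the A-norm of the error, which is why it needs `pAAp = |Ap|^2` and `rAr = (r, Ar)` instead of plain CG's `(p, Ap)` and `|r|^2`. Below is a minimal sketch of the same recurrences on a dense real symmetric matrix; `conjugateResidual`, `matVec`, `dot`, `Vec` and `Mat` are hypothetical names, not Grid API. One small deviation from the Grid code: the convergence test is done before applying the operator to the new residual, which avoids a division by `rAr` when the residual is exactly zero.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

Vec matVec(const Mat &A, const Vec &x) {
  Vec y(x.size(), 0.0);
  for (size_t i = 0; i < A.size(); ++i)
    for (size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Conjugate Residual for a symmetric A with a zero initial guess.
// Returns the iteration count on convergence, or -1 if maxit is reached.
int conjugateResidual(const Mat &A, const Vec &src, Vec &psi, double tol, int maxit) {
  const size_t n = src.size();
  psi.assign(n, 0.0);
  Vec r = src, p = src;
  Vec Ap = matVec(A, p), Ar = matVec(A, r);
  double rAr = dot(r, Ar);    // (r, A r)
  double pAAp = dot(Ap, Ap);  // (A p, A p)
  const double rsq = tol * tol * dot(src, src);
  for (int k = 1; k <= maxit; ++k) {
    assert(pAAp > 0.0);
    const double a = rAr / pAAp;
    for (size_t i = 0; i < n; ++i) {
      psi[i] += a * p[i];
      r[i] -= a * Ap[i];
    }
    if (dot(r, r) < rsq) return k;
    const double rArPrev = rAr;
    Ar = matVec(A, r);
    rAr = dot(r, Ar);
    const double b = rAr / rArPrev;
    for (size_t i = 0; i < n; ++i) {
      p[i] = b * p[i] + r[i];
      Ap[i] = b * Ap[i] + Ar[i];  // A p by recurrence, no extra operator call
    }
    pAAp = dot(Ap, Ap);
  }
  return -1;
}
```

As in the Grid version, `Ap` is updated by the same linear recurrence as `p`, so each iteration costs a single operator application.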
@@ -0,0 +1,104 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/Deflation.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_DEFLATION_H
#define GRID_DEFLATION_H
namespace Grid {
template<class Field>
class ZeroGuesser: public LinearFunction<Field> {
public:
virtual void operator()(const Field &src, Field &guess) { guess = zero; };
};
template<class Field>
class SourceGuesser: public LinearFunction<Field> {
public:
virtual void operator()(const Field &src, Field &guess) { guess = src; };
};
////////////////////////////////
// Fine grid deflation
////////////////////////////////
template<class Field>
class DeflatedGuesser: public LinearFunction<Field> {
private:
const std::vector<Field> &evec;
const std::vector<RealD> &eval;
public:
DeflatedGuesser(const std::vector<Field> & _evec,const std::vector<RealD> & _eval) : evec(_evec), eval(_eval) {};
virtual void operator()(const Field &src,Field &guess) {
guess = zero;
assert(evec.size()==eval.size());
auto N = evec.size();
for (size_t i=0;i<N;i++) {
const Field& tmp = evec[i];
axpy(guess,TensorRemove(innerProduct(tmp,src)) / eval[i],tmp,guess);
}
guess.checkerboard = src.checkerboard;
}
};
template<class FineField, class CoarseField>
class LocalCoherenceDeflatedGuesser: public LinearFunction<FineField> {
private:
const std::vector<FineField> &subspace;
const std::vector<CoarseField> &evec_coarse;
const std::vector<RealD> &eval_coarse;
public:
LocalCoherenceDeflatedGuesser(const std::vector<FineField> &_subspace,
const std::vector<CoarseField> &_evec_coarse,
const std::vector<RealD> &_eval_coarse)
: subspace(_subspace),
evec_coarse(_evec_coarse),
eval_coarse(_eval_coarse)
{
}
void operator()(const FineField &src,FineField &guess) {
int N = (int)evec_coarse.size();
CoarseField src_coarse(evec_coarse[0]._grid);
CoarseField guess_coarse(evec_coarse[0]._grid); guess_coarse = zero;
blockProject(src_coarse,src,subspace);
for (int i=0;i<N;i++) {
const CoarseField & tmp = evec_coarse[i];
axpy(guess_coarse,TensorRemove(innerProduct(tmp,src_coarse)) / eval_coarse[i],tmp,guess_coarse);
}
blockPromote(guess_coarse,guess,subspace);
guess.checkerboard = src.checkerboard;
};
};
}
#endif
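`DeflatedGuesser` builds the initial guess as the low-mode approximation of the inverse, `guess = sum_i evec[i] * <evec[i], src> / eval[i]`. A minimal sketch of that sum follows, assuming real-valued vectors stored as `std::vector<double>` and orthonormal eigenvectors (Grid's fields are complex, and its `innerProduct` conjugates the first argument); `deflatedGuess` and `Vec` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// guess = sum_i evec[i] * <evec[i], src> / eval[i]
// Assumes orthonormal real eigenvectors of a symmetric operator.
Vec deflatedGuess(const std::vector<Vec> &evec, const std::vector<double> &eval,
                  const Vec &src) {
  assert(evec.size() == eval.size());
  Vec guess(src.size(), 0.0);
  for (size_t i = 0; i < evec.size(); ++i) {
    double ip = 0.0;  // <evec[i], src>
    for (size_t j = 0; j < src.size(); ++j) ip += evec[i][j] * src[j];
    for (size_t j = 0; j < src.size(); ++j) guess[j] += (ip / eval[i]) * evec[i][j];
  }
  return guess;
}
```

For example, `A = [[2,1],[1,2]]` has eigenpairs `(1, (1,-1)/sqrt(2))` and `(3, (1,1)/sqrt(2))`; with both pairs supplied the guess for `src = (1,0)` is the exact solution `(2/3, -1/3)`, while supplying only the low mode gives the part of the solution that plain CG converges to most slowly.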
@@ -0,0 +1,842 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/ImplicitlyRestartedLanczos.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
Author: Chulwoo Jung <chulwoo@bnl.gov>
Author: Christoph Lehner <clehner@bnl.gov>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_BIRL_H
#define GRID_BIRL_H
#include <string.h> //memset
//#include <zlib.h>
#include <sys/stat.h>
namespace Grid {
////////////////////////////////////////////////////////
// Move following 100 LOC to lattice/Lattice_basis.h
////////////////////////////////////////////////////////
template<class Field>
void basisOrthogonalize(std::vector<Field> &basis,Field &w,int k)
{
for(int j=0; j<k; ++j){
auto ip = innerProduct(basis[j],w);
w = w - ip*basis[j];
}
}
template<class Field>
void basisRotate(std::vector<Field> &basis,Eigen::MatrixXd& Qt,int j0, int j1, int k0,int k1,int Nm)
{
typedef typename Field::vector_object vobj;
GridBase* grid = basis[0]._grid;
parallel_region
{
std::vector < vobj , commAllocator<vobj> > B(Nm); // Thread private
parallel_for_internal(int ss=0;ss < grid->oSites();ss++){
for(int j=j0; j<j1; ++j) B[j]=0.;
for(int j=j0; j<j1; ++j){
for(int k=k0; k<k1; ++k){
B[j] +=Qt(j,k) * basis[k]._odata[ss];
}
}
for(int j=j0; j<j1; ++j){
basis[j]._odata[ss] = B[j];
}
}
}
}
// Extract a single rotated vector
template<class Field>
void basisRotateJ(Field &result,std::vector<Field> &basis,Eigen::MatrixXd& Qt,int j, int k0,int k1,int Nm)
{
typedef typename Field::vector_object vobj;
GridBase* grid = basis[0]._grid;
result.checkerboard = basis[0].checkerboard;
parallel_for(int ss=0;ss < grid->oSites();ss++){
vobj B = zero;
for(int k=k0; k<k1; ++k){
B +=Qt(j,k) * basis[k]._odata[ss];
}
result._odata[ss] = B;
}
}
template<class Field>
void basisReorderInPlace(std::vector<Field> &_v,std::vector<RealD>& sort_vals, std::vector<int>& idx)
{
int vlen = idx.size();
assert(vlen>=1);
assert(vlen<=sort_vals.size());
assert(vlen<=_v.size());
for (size_t i=0;i<vlen;i++) {
if (idx[i] != i) {
//////////////////////////////////////
// idx[i] is a table of desired sources giving a permutation.
// Swap v[i] with v[idx[i]].
// Find j>i for which _vnew[j] = _vold[i],
// track the move idx[j] => idx[i]
// track the move idx[i] => i
//////////////////////////////////////
size_t j;
for (j=i;j<idx.size();j++)
if (idx[j]==i)
break;
assert(idx[i] > i); assert(j!=idx.size()); assert(idx[j]==i);
std::swap(_v[i]._odata,_v[idx[i]]._odata); // should use vector move constructor, no data copy
std::swap(sort_vals[i],sort_vals[idx[i]]);
idx[j] = idx[i];
idx[i] = i;
}
}
}
inline std::vector<int> basisSortGetIndex(std::vector<RealD>& sort_vals)
{
std::vector<int> idx(sort_vals.size());
std::iota(idx.begin(), idx.end(), 0);
// sort indices by ascending absolute value of sort_vals
std::sort(idx.begin(), idx.end(), [&sort_vals](int i1, int i2) {
return ::fabs(sort_vals[i1]) < ::fabs(sort_vals[i2]);
});
return idx;
}
template<class Field>
void basisSortInPlace(std::vector<Field> & _v,std::vector<RealD>& sort_vals, bool reverse)
{
std::vector<int> idx = basisSortGetIndex(sort_vals);
if (reverse)
std::reverse(idx.begin(), idx.end());
basisReorderInPlace(_v,sort_vals,idx);
}
/////////////////////////////////////////////////////////////
// Implicitly restarted lanczos
/////////////////////////////////////////////////////////////
template<class Field> class ImplicitlyRestartedLanczosTester
{
public:
virtual int TestConvergence(int j,RealD resid,Field &evec, RealD &eval,RealD evalMaxApprox)=0;
virtual int ReconstructEval(int j,RealD resid,Field &evec, RealD &eval,RealD evalMaxApprox)=0;
};
enum IRLdiagonalisation {
IRLdiagonaliseWithDSTEGR,
IRLdiagonaliseWithQR,
IRLdiagonaliseWithEigen
};
template<class Field> class ImplicitlyRestartedLanczosHermOpTester : public ImplicitlyRestartedLanczosTester<Field>
{
public:
LinearFunction<Field> &_HermOp;
ImplicitlyRestartedLanczosHermOpTester(LinearFunction<Field> &HermOp) : _HermOp(HermOp) { };
int ReconstructEval(int j,RealD resid,Field &B, RealD &eval,RealD evalMaxApprox)
{
return TestConvergence(j,resid,B,eval,evalMaxApprox);
}
int TestConvergence(int j,RealD eresid,Field &B, RealD &eval,RealD evalMaxApprox)
{
Field v(B);
RealD eval_poly = eval;
// Apply operator
_HermOp(B,v);
RealD vnum = real(innerProduct(B,v)); // HermOp.
RealD vden = norm2(B);
RealD vv0 = norm2(v);
eval = vnum/vden;
v -= eval*B;
RealD vv = norm2(v) / ::pow(evalMaxApprox,2.0);
std::cout.precision(13);
std::cout<<GridLogIRL << "[" << std::setw(3)<<j<<"] "
<<"eval = "<<std::setw(25)<< eval << " (" << eval_poly << ")"
<<" |H B[i] - eval[i]B[i]|^2 / evalMaxApprox^2 " << std::setw(25) << vv
<<std::endl;
int conv=0;
if( (vv<eresid*eresid) ) conv = 1;
return conv;
}
};
template<class Field>
class ImplicitlyRestartedLanczos {
private:
const RealD small = 1.0e-8;
int MaxIter;
int MinRestart; // Minimum number of restarts; only check for convergence after
int Nstop; // Number of evecs checked for convergence
int Nk; // Number of converged vectors sought
// int Np; // Np -- Number of spare vecs in krylov space // == Nm - Nk
int Nm; // Nm -- total number of vectors
IRLdiagonalisation diagonalisation;
int orth_period;
RealD OrthoTime;
RealD eresid, betastp;
////////////////////////////////
// Embedded objects
////////////////////////////////
LinearFunction<Field> &_PolyOp;
LinearFunction<Field> &_HermOp;
ImplicitlyRestartedLanczosTester<Field> &_Tester;
// Default tester provided (we need a ref to something in default case)
ImplicitlyRestartedLanczosHermOpTester<Field> SimpleTester;
/////////////////////////
// Constructor
/////////////////////////
public:
//////////////////////////////////////////////////////////////////
// PAB:
//////////////////////////////////////////////////////////////////
// Too many options & knobs.
// Eliminate:
// orth_period
// betastp
// MinRestart
//
// Do we really need orth_period
// What are the theoretical basis and guarantees of betastp?
// Nstop=Nk viable?
// MinRestart avoidable with new convergence test?
// Could cut to PolyOp, HermOp, Tester, Nk, Nm, resid, maxiter (+diagonalisation)
// HermOp could be eliminated if we dropped the Power method for max eval.
// -- also: The eval, eval2, eval2_copy stuff is still unnecessarily unclear
//////////////////////////////////////////////////////////////////
ImplicitlyRestartedLanczos(LinearFunction<Field> & PolyOp,
LinearFunction<Field> & HermOp,
ImplicitlyRestartedLanczosTester<Field> & Tester,
int _Nstop, // number of vecs checked for convergence
int _Nk, // number of converged vecs sought
int _Nm, // total number of vecs in the Krylov space
RealD _eresid, // resid in lmdue deficit
int _MaxIter, // Max iterations
RealD _betastp=0.0, // if beta(k) < betastp: converged
int _MinRestart=1, int _orth_period = 1,
IRLdiagonalisation _diagonalisation= IRLdiagonaliseWithEigen) :
SimpleTester(HermOp), _PolyOp(PolyOp), _HermOp(HermOp), _Tester(Tester),
Nstop(_Nstop) , Nk(_Nk), Nm(_Nm),
eresid(_eresid), betastp(_betastp),
MaxIter(_MaxIter) , MinRestart(_MinRestart),
orth_period(_orth_period), diagonalisation(_diagonalisation) { };
ImplicitlyRestartedLanczos(LinearFunction<Field> & PolyOp,
LinearFunction<Field> & HermOp,
int _Nstop, // number of vecs checked for convergence
int _Nk, // number of converged vecs sought
int _Nm, // total number of vecs in the Krylov space
RealD _eresid, // resid in lmdue deficit
int _MaxIter, // Max iterations
RealD _betastp=0.0, // if beta(k) < betastp: converged
int _MinRestart=1, int _orth_period = 1,
IRLdiagonalisation _diagonalisation= IRLdiagonaliseWithEigen) :
SimpleTester(HermOp), _PolyOp(PolyOp), _HermOp(HermOp), _Tester(SimpleTester),
Nstop(_Nstop) , Nk(_Nk), Nm(_Nm),
eresid(_eresid), betastp(_betastp),
MaxIter(_MaxIter) , MinRestart(_MinRestart),
orth_period(_orth_period), diagonalisation(_diagonalisation) { };
////////////////////////////////
// Helpers
////////////////////////////////
template<typename T> static RealD normalise(T& v)
{
RealD nn = norm2(v);
nn = sqrt(nn);
v = v * (1.0/nn);
return nn;
}
void orthogonalize(Field& w, std::vector<Field>& evec,int k)
{
OrthoTime-=usecond()/1e6;
basisOrthogonalize(evec,w,k);
normalise(w);
OrthoTime+=usecond()/1e6;
}
/* Rudy Arthur's thesis pp.137
------------------------
Require: M > K, P = M - K
Compute the factorization A V_M = V_M H_M + f_M e_M^†
repeat
  Q = I
  for i = 1,...,P do
    Q_i R_i = H_M - θ_i I
    Q = Q Q_i
    H_M = Q_i^† H_M Q_i
  end for
  β_K = H_M(K+1,K)
  σ_K = Q(M,K)
  r = v_{K+1} β_K + r σ_K
  V_K = V_M(1:M) Q(1:M,1:K)
  H_K = H_M(1:K,1:K)
  → A V_K = V_K H_K + f_K e_K^†
  Extend to an M = K + P step factorization A V_M = V_M H_M + f_M e_M^†
until convergence
*/
void calc(std::vector<RealD>& eval, std::vector<Field>& evec, const Field& src, int& Nconv, bool reverse=false)
{
GridBase *grid = src._grid;
assert(grid == evec[0]._grid);
GridLogIRL.TimingMode(1);
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
std::cout << GridLogIRL <<" ImplicitlyRestartedLanczos::calc() starting iteration 0 / "<< MaxIter<< std::endl;
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
std::cout << GridLogIRL <<" -- seek Nk = " << Nk <<" vectors"<< std::endl;
std::cout << GridLogIRL <<" -- accept Nstop = " << Nstop <<" vectors"<< std::endl;
std::cout << GridLogIRL <<" -- total Nm = " << Nm <<" vectors"<< std::endl;
std::cout << GridLogIRL <<" -- size of eval = " << eval.size() << std::endl;
std::cout << GridLogIRL <<" -- size of evec = " << evec.size() << std::endl;
if ( diagonalisation == IRLdiagonaliseWithDSTEGR ) {
std::cout << GridLogIRL << "Diagonalisation is DSTEGR "<<std::endl;
} else if ( diagonalisation == IRLdiagonaliseWithQR ) {
std::cout << GridLogIRL << "Diagonalisation is QR "<<std::endl;
} else if ( diagonalisation == IRLdiagonaliseWithEigen ) {
std::cout << GridLogIRL << "Diagonalisation is Eigen "<<std::endl;
}
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
assert(Nm <= evec.size() && Nm <= eval.size());
// quickly estimate the largest eigenvalue so that the residual can be normalised more sensibly
RealD evalMaxApprox = 0.0;
{
auto src_n = src;
auto tmp = src;
const int _MAX_ITER_IRL_MEVAPP_ = 50;
for (int i=0;i<_MAX_ITER_IRL_MEVAPP_;i++) {
normalise(src_n);
_HermOp(src_n,tmp);
RealD vnum = real(innerProduct(src_n,tmp)); // HermOp.
RealD vden = norm2(src_n);
RealD na = vnum/vden;
if (fabs(evalMaxApprox/na - 1.0) < 0.05)
i=_MAX_ITER_IRL_MEVAPP_;
evalMaxApprox = na;
std::cout << GridLogIRL << " Approximation of largest eigenvalue: " << evalMaxApprox << std::endl;
src_n = tmp;
}
}
std::vector<RealD> lme(Nm);
std::vector<RealD> lme2(Nm);
std::vector<RealD> eval2(Nm);
std::vector<RealD> eval2_copy(Nm);
Eigen::MatrixXd Qt = Eigen::MatrixXd::Zero(Nm,Nm);
Field f(grid);
Field v(grid);
int k1 = 1;
int k2 = Nk;
RealD beta_k;
Nconv = 0;
// Set initial vector
evec[0] = src;
normalise(evec[0]);
// Initial Nk steps
OrthoTime=0.;
for(int k=0; k<Nk; ++k) step(eval,lme,evec,f,Nm,k);
std::cout<<GridLogIRL <<"Initial "<< Nk <<" steps done "<<std::endl;
std::cout<<GridLogIRL <<"Initial steps: OrthoTime "<<OrthoTime<< " seconds"<<std::endl;
//////////////////////////////////
// Restarting loop begins
//////////////////////////////////
int iter;
for(iter = 0; iter<MaxIter; ++iter){
OrthoTime=0.;
std::cout<< GridLogMessage <<" **********************"<< std::endl;
std::cout<< GridLogMessage <<" Restart iteration = "<< iter << std::endl;
std::cout<< GridLogMessage <<" **********************"<< std::endl;
std::cout<<GridLogIRL <<" running "<<Nm-Nk <<" steps: "<<std::endl;
for(int k=Nk; k<Nm; ++k) step(eval,lme,evec,f,Nm,k);
f *= lme[Nm-1];
std::cout<<GridLogIRL <<" "<<Nm-Nk <<" steps done "<<std::endl;
std::cout<<GridLogIRL <<"Restart steps: OrthoTime "<<OrthoTime<< " seconds"<<std::endl;
//////////////////////////////////
// getting eigenvalues
//////////////////////////////////
for(int k=0; k<Nm; ++k){
eval2[k] = eval[k+k1-1];
lme2[k] = lme[k+k1-1];
}
Qt = Eigen::MatrixXd::Identity(Nm,Nm);
diagonalize(eval2,lme2,Nm,Nm,Qt,grid);
std::cout<<GridLogIRL <<" diagonalized "<<std::endl;
//////////////////////////////////
// sorting
//////////////////////////////////
eval2_copy = eval2;
std::partial_sort(eval2.begin(),eval2.begin()+Nm,eval2.end(),std::greater<RealD>());
std::cout<<GridLogIRL <<" evals sorted "<<std::endl;
const int chunk=8;
for(int io=0; io<k2;io+=chunk){
std::cout<<GridLogIRL << "eval "<< std::setw(3) << io ;
for(int ii=0;ii<chunk;ii++){
if ( (io+ii)<k2 )
std::cout<< " "<< std::setw(12)<< eval2[io+ii];
}
std::cout << std::endl;
}
//////////////////////////////////
// Implicitly shifted QR transformations
//////////////////////////////////
Qt = Eigen::MatrixXd::Identity(Nm,Nm);
for(int ip=k2; ip<Nm; ++ip){
QR_decomp(eval,lme,Nm,Nm,Qt,eval2[ip],k1,Nm);
}
std::cout<<GridLogIRL <<"QR decomposed "<<std::endl;
assert(k2<Nm); assert(k1>0);
basisRotate(evec,Qt,k1-1,k2+1,0,Nm,Nm); /// big constraint on the basis
std::cout<<GridLogIRL <<"basisRotated by Qt"<<std::endl;
////////////////////////////////////////////////////
// Compressed vector f and beta(k2)
////////////////////////////////////////////////////
f *= Qt(k2-1,Nm-1);
f += lme[k2-1] * evec[k2];
beta_k = norm2(f);
beta_k = sqrt(beta_k);
std::cout<<GridLogIRL<<" beta(k) = "<<beta_k<<std::endl;
RealD betar = 1.0/beta_k;
evec[k2] = betar * f;
lme[k2-1] = beta_k;
////////////////////////////////////////////////////
// Convergence test
////////////////////////////////////////////////////
for(int k=0; k<Nm; ++k){
eval2[k] = eval[k];
lme2[k] = lme[k];
}
Qt = Eigen::MatrixXd::Identity(Nm,Nm);
diagonalize(eval2,lme2,Nk,Nm,Qt,grid);
std::cout<<GridLogIRL <<" Diagonalized "<<std::endl;
Nconv = 0;
if (iter >= MinRestart) {
std::cout << GridLogIRL << "Test convergence: rotate subset of vectors to test convergence " << std::endl;
Field B(grid); B.checkerboard = evec[0].checkerboard;
// power of two search pattern; not every evalue in eval2 is assessed.
int allconv =1;
for(int jj = 1; jj<=Nstop; jj*=2){
int j = Nstop-jj;
RealD e = eval2_copy[j]; // Discard the evalue
basisRotateJ(B,evec,Qt,j,0,Nk,Nm);
if( !_Tester.TestConvergence(j,eresid,B,e,evalMaxApprox) ) {
allconv=0;
}
}
// Do evec[0] for good measure
{
int j=0;
RealD e = eval2_copy[0];
basisRotateJ(B,evec,Qt,j,0,Nk,Nm);
if( !_Tester.TestConvergence(j,eresid,B,e,evalMaxApprox) ) allconv=0;
}
if ( allconv ) Nconv = Nstop;
// test if we converged, if so, terminate
std::cout<<GridLogIRL<<" #modes converged: >= "<<Nconv<<"/"<<Nstop<<std::endl;
// if( Nconv>=Nstop || beta_k < betastp){
if( Nconv>=Nstop){
goto converged;
}
} else {
std::cout << GridLogIRL << "iter < MinRestart: do not yet test for convergence\n";
}
} // end of iter loop
std::cout<<GridLogError<<"\n NOT converged.\n";
abort();
converged:
{
Field B(grid); B.checkerboard = evec[0].checkerboard;
basisRotate(evec,Qt,0,Nk,0,Nk,Nm);
std::cout << GridLogIRL << " Rotated basis"<<std::endl;
Nconv=0;
//////////////////////////////////////////////////////////////////////
// Full final convergence test; unconditionally applied
//////////////////////////////////////////////////////////////////////
for(int j = 0; j<Nk; j++){ // only evec[0..Nk-1] were rotated into Ritz vectors
B=evec[j];
if( _Tester.ReconstructEval(j,eresid,B,eval2[j],evalMaxApprox) ) {
Nconv++;
}
}
if ( Nconv < Nstop )
std::cout << GridLogIRL << "Nconv ("<<Nconv<<") < Nstop ("<<Nstop<<")"<<std::endl;
eval=eval2;
//Keep only converged
eval.resize(Nconv);// Nstop?
evec.resize(Nconv,grid);// Nstop?
basisSortInPlace(evec,eval,reverse);
}
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
std::cout << GridLogIRL << "ImplicitlyRestartedLanczos CONVERGED ; Summary :\n";
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
std::cout << GridLogIRL << " -- Iterations = "<< iter << "\n";
std::cout << GridLogIRL << " -- beta(k) = "<< beta_k << "\n";
std::cout << GridLogIRL << " -- Nconv = "<< Nconv << "\n";
std::cout << GridLogIRL <<"**************************************************************************"<< std::endl;
}
private:
/* Saad, p. 195
1. Choose an initial vector v_1 of 2-norm unity. Set β_1 ≡ 0, v_0 ≡ 0
2. For k = 1,2,...,m Do:
3.   w_k := A v_k − β_k v_{k−1}
4.   α_k := (w_k, v_k)
5.   w_k := w_k − α_k v_k          // w_k orthogonal to v_k
6.   β_{k+1} := ∥w_k∥_2. If β_{k+1} = 0 then Stop
7.   v_{k+1} := w_k / β_{k+1}
8. EndDo
*/
void step(std::vector<RealD>& lmd,
std::vector<RealD>& lme,
std::vector<Field>& evec,
Field& w,int Nm,int k)
{
const RealD tiny = 1.0e-20;
assert( k< Nm );
GridStopWatch gsw_op,gsw_o;
Field& evec_k = evec[k];
_PolyOp(evec_k,w); std::cout<<GridLogIRL << "PolyOp" <<std::endl;
if(k>0) w -= lme[k-1] * evec[k-1];
ComplexD zalph = innerProduct(evec_k,w); // 4. αk:=(wk,vk)
RealD alph = real(zalph);
w = w - alph * evec_k;// 5. wk:=wk−αkvk
RealD beta = normalise(w); // 6. βk+1 := ∥wk∥2. If βk+1 = 0 then Stop
// 7. vk+1 := wk/βk+1
lmd[k] = alph;
lme[k] = beta;
if (k>0 && k % orth_period == 0) {
orthogonalize(w,evec,k); // orthonormalise
std::cout<<GridLogIRL << "Orthogonalised " <<std::endl;
}
if(k < Nm-1) evec[k+1] = w;
std::cout<<GridLogIRL << "alpha[" << k << "] = " << zalph << " beta[" << k << "] = "<<beta<<std::endl;
if ( beta < tiny )
std::cout<<GridLogIRL << " beta is tiny "<<beta<<std::endl;
}
void diagonalize_Eigen(std::vector<RealD>& lmd, std::vector<RealD>& lme,
int Nk, int Nm,
Eigen::MatrixXd & Qt, // Nm x Nm
GridBase *grid)
{
Eigen::MatrixXd TriDiag = Eigen::MatrixXd::Zero(Nk,Nk);
for(int i=0;i<Nk;i++) TriDiag(i,i) = lmd[i];
for(int i=0;i<Nk-1;i++) TriDiag(i,i+1) = lme[i];
for(int i=0;i<Nk-1;i++) TriDiag(i+1,i) = lme[i];
Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eigensolver(TriDiag);
for (int i = 0; i < Nk; i++) {
lmd[Nk-1-i] = eigensolver.eigenvalues()(i);
}
for (int i = 0; i < Nk; i++) {
for (int j = 0; j < Nk; j++) {
Qt(Nk-1-i,j) = eigensolver.eigenvectors()(j,i);
}
}
}
///////////////////////////////////////////////////////////////////////////
// Alternatives to the Eigen diagonaliser (QR and LAPACK DSTEGR) follow; note that QR_decomp is also used by calc() for the implicit shifts
///////////////////////////////////////////////////////////////////////////
void QR_decomp(std::vector<RealD>& lmd, // Nm
std::vector<RealD>& lme, // Nm
int Nk, int Nm, // Nk, Nm
Eigen::MatrixXd& Qt, // Nm x Nm matrix
RealD Dsh, int kmin, int kmax)
{
int k = kmin-1;
RealD x;
RealD Fden = 1.0/hypot(lmd[k]-Dsh,lme[k]);
RealD c = ( lmd[k] -Dsh) *Fden;
RealD s = -lme[k] *Fden;
RealD tmpa1 = lmd[k];
RealD tmpa2 = lmd[k+1];
RealD tmpb = lme[k];
lmd[k] = c*c*tmpa1 +s*s*tmpa2 -2.0*c*s*tmpb;
lmd[k+1] = s*s*tmpa1 +c*c*tmpa2 +2.0*c*s*tmpb;
lme[k] = c*s*(tmpa1-tmpa2) +(c*c-s*s)*tmpb;
x =-s*lme[k+1];
lme[k+1] = c*lme[k+1];
for(int i=0; i<Nk; ++i){
RealD Qtmp1 = Qt(k,i);
RealD Qtmp2 = Qt(k+1,i);
Qt(k,i) = c*Qtmp1 - s*Qtmp2;
Qt(k+1,i)= s*Qtmp1 + c*Qtmp2;
}
// Givens transformations
for(int k = kmin; k < kmax-1; ++k){
RealD Fden = 1.0/hypot(x,lme[k-1]);
RealD c = lme[k-1]*Fden;
RealD s = - x*Fden;
RealD tmpa1 = lmd[k];
RealD tmpa2 = lmd[k+1];
RealD tmpb = lme[k];
lmd[k] = c*c*tmpa1 +s*s*tmpa2 -2.0*c*s*tmpb;
lmd[k+1] = s*s*tmpa1 +c*c*tmpa2 +2.0*c*s*tmpb;
lme[k] = c*s*(tmpa1-tmpa2) +(c*c-s*s)*tmpb;
lme[k-1] = c*lme[k-1] -s*x;
if(k != kmax-2){
x = -s*lme[k+1];
lme[k+1] = c*lme[k+1];
}
for(int i=0; i<Nk; ++i){
RealD Qtmp1 = Qt(k,i);
RealD Qtmp2 = Qt(k+1,i);
Qt(k,i) = c*Qtmp1 -s*Qtmp2;
Qt(k+1,i) = s*Qtmp1 +c*Qtmp2;
}
}
}
void diagonalize(std::vector<RealD>& lmd, std::vector<RealD>& lme,
int Nk, int Nm,
Eigen::MatrixXd & Qt,
GridBase *grid)
{
Qt = Eigen::MatrixXd::Identity(Nm,Nm);
if ( diagonalisation == IRLdiagonaliseWithDSTEGR ) {
diagonalize_lapack(lmd,lme,Nk,Nm,Qt,grid);
} else if ( diagonalisation == IRLdiagonaliseWithQR ) {
diagonalize_QR(lmd,lme,Nk,Nm,Qt,grid);
} else if ( diagonalisation == IRLdiagonaliseWithEigen ) {
diagonalize_Eigen(lmd,lme,Nk,Nm,Qt,grid);
} else {
assert(0);
}
}
#ifdef USE_LAPACK
void LAPACK_dstegr(char *jobz, char *range, int *n, double *d, double *e,
double *vl, double *vu, int *il, int *iu, double *abstol,
int *m, double *w, double *z, int *ldz, int *isuppz,
double *work, int *lwork, int *iwork, int *liwork,
int *info);
#endif
void diagonalize_lapack(std::vector<RealD>& lmd,
std::vector<RealD>& lme,
int Nk, int Nm,
Eigen::MatrixXd& Qt,
GridBase *grid)
{
#ifdef USE_LAPACK
const int size = Nm;
int NN = Nk;
double evals_tmp[NN];
double evec_tmp[NN][NN];
memset(evec_tmp[0],0,sizeof(double)*NN*NN);
double DD[NN];
double EE[NN];
for (int i = 0; i< NN; i++) {
for (int j = i - 1; j <= i + 1; j++) {
if ( j < NN && j >= 0 ) {
if (i==j) DD[i] = lmd[i];
if (i==j) evals_tmp[i] = lmd[i];
if (j==(i-1)) EE[j] = lme[j];
}
}
}
int evals_found;
int lwork = ( (18*NN) > (1+4*NN+NN*NN)? (18*NN):(1+4*NN+NN*NN)) ;
int liwork = 3+NN*10 ;
int iwork[liwork];
double work[lwork];
int isuppz[2*NN];
char jobz = 'V'; // calculate evals & evecs
char range = 'I'; // compute eigenvalues il..iu only (this node's share)
// char range = 'A'; // calculate all evals
char uplo = 'U'; // refer to upper half of original matrix
char compz = 'I'; // Compute eigenvectors of tridiagonal matrix
int ifail[NN];
int info;
int total = grid->_Nprocessors;
int node = grid->_processor;
int interval = (NN/total)+1;
double vl = 0.0, vu = 0.0;
int il = interval*node+1 , iu = interval*(node+1);
if (iu > NN) iu=NN;
double tol = 0.0;
if (1) {
memset(evals_tmp,0,sizeof(double)*NN);
if ( il <= NN){
LAPACK_dstegr(&jobz, &range, &NN,
(double*)DD, (double*)EE,
&vl, &vu, &il, &iu, // these four are ignored if the second parameter is 'A'
&tol, // tolerance
&evals_found, evals_tmp, (double*)evec_tmp, &NN,
isuppz,
work, &lwork, iwork, &liwork,
&info);
for (int i = iu-1; i>= il-1; i--){
evals_tmp[i] = evals_tmp[i - (il-1)];
if (il>1) evals_tmp[i-(il-1)]=0.;
for (int j = 0; j< NN; j++){
evec_tmp[i][j] = evec_tmp[i - (il-1)][j];
if (il>1) evec_tmp[i-(il-1)][j]=0.;
}
}
}
{
grid->GlobalSumVector(evals_tmp,NN);
grid->GlobalSumVector((double*)evec_tmp,NN*NN);
}
}
// It would be safer to sort than simply to reverse, but the dstegr documentation
// states that the eigenvalues are returned in ascending order, whereas the QR
// path produces them in descending order.
for(int i=0;i<NN;i++){
lmd [NN-1-i]=evals_tmp[i];
for(int j=0;j<NN;j++){
Qt((NN-1-i),j)=evec_tmp[i][j];
}
}
#else
assert(0);
#endif
}
void diagonalize_QR(std::vector<RealD>& lmd, std::vector<RealD>& lme,
int Nk, int Nm,
Eigen::MatrixXd & Qt,
GridBase *grid)
{
int QRiter = 100*Nm;
int kmin = 1;
int kmax = Nk;
// (this should be more sophisticated)
for(int iter=0; iter<QRiter; ++iter){
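// Shift Dsh: eigenvalue of the trailing 2x2 submatrix that lies closer to lmd[kmax-1]
// (Wilkinson shift); the sign is taken as +1 when the two diagonal entries are equal
RealD dsub = lmd[kmax-1]-lmd[kmax-2];
RealD dd = sqrt(dsub*dsub + 4.0*lme[kmax-2]*lme[kmax-2]);
RealD Dsh = 0.5*(lmd[kmax-2]+lmd[kmax-1] +dd*(dsub >= 0.0 ? 1.0 : -1.0));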
// transformation
QR_decomp(lmd,lme,Nk,Nm,Qt,Dsh,kmin,kmax); // Nk, Nm
// Convergence criterion (redefinition of kmin and kmax)
for(int j=kmax-1; j>= kmin; --j){
RealD dds = fabs(lmd[j-1])+fabs(lmd[j]);
if(fabs(lme[j-1])+dds > dds){
kmax = j+1;
goto continued;
}
}
QRiter = iter;
return;
continued:
for(int j=0; j<kmax-1; ++j){
RealD dds = fabs(lmd[j])+fabs(lmd[j+1]);
if(fabs(lme[j])+dds > dds){
kmin = j+1;
break;
}
}
}
std::cout << GridLogError << "[QR method] Error - too many iterations: "<<QRiter<<"\n";
abort();
}
};
}
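// ---------------------------------------------------------------------------
// Illustrative sketch, NOT part of Grid's API: the Lanczos three-term
// recurrence of Saad p. 195 (the recurrence step() implements), written for a
// small dense real symmetric matrix stored row-major. The namespace
// GridSketch and the function lanczosTridiag are hypothetical. Unlike step(),
// no polynomial filter, complex inner product or periodic
// re-orthogonalisation is used, so this only illustrates the recurrence.
// ---------------------------------------------------------------------------
#include <cassert>
#include <cmath>
#include <vector>

namespace GridSketch {

// Runs up to m Lanczos steps of the n x n symmetric matrix A from v1.
// On return V holds the Lanczos vectors and alpha/beta hold the diagonal and
// off-diagonal of the tridiagonal matrix T = V^T A V. Stops early if
// beta_{k+1} == 0, which means an invariant subspace was found.
inline void lanczosTridiag(const std::vector<double>& A, int n,
                           std::vector<double> v1, int m,
                           std::vector<std::vector<double>>& V,
                           std::vector<double>& alpha,
                           std::vector<double>& beta)
{
  assert(m <= n && (int)v1.size() == n);
  double nrm = 0.0;                          // 1. normalise v_1
  for (double x : v1) nrm += x * x;
  nrm = std::sqrt(nrm);
  for (double& x : v1) x /= nrm;
  V.assign(1, v1);
  alpha.clear();
  beta.clear();
  std::vector<double> vprev(n, 0.0);         //    v_0 = 0
  double bk = 0.0;                           //    beta_1 = 0
  for (int k = 0; k < m; ++k) {
    const std::vector<double> vk = V[k];     // copy: V may reallocate below
    std::vector<double> w(n, 0.0);
    for (int i = 0; i < n; ++i) {            // 3. w = A v_k - beta_k v_{k-1}
      for (int j = 0; j < n; ++j) w[i] += A[i * n + j] * vk[j];
      w[i] -= bk * vprev[i];
    }
    double ak = 0.0;                         // 4. alpha_k = (w, v_k)
    for (int i = 0; i < n; ++i) ak += w[i] * vk[i];
    for (int i = 0; i < n; ++i) w[i] -= ak * vk[i];  // 5. w -= alpha_k v_k
    alpha.push_back(ak);
    if (k == m - 1) break;                   // T is complete
    double bn = 0.0;                         // 6. beta_{k+1} = ||w||_2
    for (double x : w) bn += x * x;
    bn = std::sqrt(bn);
    if (bn == 0.0) break;                    //    breakdown: stop
    beta.push_back(bn);
    for (double& x : w) x /= bn;             // 7. v_{k+1} = w / beta_{k+1}
    V.push_back(w);
    vprev = vk;
    bk = bn;
  }
}

} // namespace GridSketch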
#endif
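// ---------------------------------------------------------------------------
// Illustrative sketch, NOT part of Grid's API: the shift Dsh used by
// diagonalize_QR() is the eigenvalue of the trailing 2x2 block
//   [ a  e ]
//   [ e  b ]    (a = lmd[kmax-2], b = lmd[kmax-1], e = lme[kmax-2])
// that lies closer to b, i.e. a Wilkinson-type shift. The helper name
// wilkinsonShift and the namespace GridSketch are hypothetical. The sign is
// taken as +1 when a == b, so the formula never evaluates 0/0.
// ---------------------------------------------------------------------------
#ifndef GRID_SKETCH_WILKINSON_SHIFT_H
#define GRID_SKETCH_WILKINSON_SHIFT_H
#include <cmath>

namespace GridSketch {

inline double wilkinsonShift(double a, double b, double e)
{
  const double dsub = b - a;
  const double dd   = std::sqrt(dsub * dsub + 4.0 * e * e);
  const double sgn  = (dsub >= 0.0) ? 1.0 : -1.0;
  // the 2x2 eigenvalues are 0.5*(a+b) +/- 0.5*dd; keep the one on b's side
  return 0.5 * (a + b + sgn * dd);
}

} // namespace GridSketch
#endif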
@@ -0,0 +1,406 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/LocalCoherenceLanczos.h
Copyright (C) 2015
Author: Christoph Lehner <clehner@bnl.gov>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LOCAL_COHERENCE_IRL_H
#define GRID_LOCAL_COHERENCE_IRL_H
namespace Grid {
struct LanczosParams : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(LanczosParams,
ChebyParams, Cheby,/*Chebyshev*/
int, Nstop, /*Vecs in Lanczos must converge Nstop < Nk < Nm*/
int, Nk, /*Vecs in Lanczos seek converge*/
int, Nm, /*Total vecs in Lanczos include restart*/
RealD, resid, /*residual*/
int, MaxIt,
RealD, betastp, /* stopping threshold on beta_k */
int, MinRes); /* minimum number of restarts before convergence is tested */
};
struct LocalCoherenceLanczosParams : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(LocalCoherenceLanczosParams,
bool, saveEvecs,
bool, doFine,
bool, doFineRead,
bool, doCoarse,
bool, doCoarseRead,
LanczosParams, FineParams,
LanczosParams, CoarseParams,
ChebyParams, Smoother,
RealD , coarse_relax_tol,
std::vector<int>, blockSize,
std::string, config,
std::vector < std::complex<double> >, omega,
RealD, mass,
RealD, M5);
};
// Duplicate functionality; ProjectedFunctionHermOp could be used with the trivial function
template<class Fobj,class CComplex,int nbasis>
class ProjectedHermOp : public LinearFunction<Lattice<iVector<CComplex,nbasis > > > {
public:
typedef iVector<CComplex,nbasis > CoarseSiteVector;
typedef Lattice<CoarseSiteVector> CoarseField;
typedef Lattice<CComplex> CoarseScalar; // used for inner products on fine field
typedef Lattice<Fobj> FineField;
LinearOperatorBase<FineField> &_Linop;
std::vector<FineField> &subspace;
ProjectedHermOp(LinearOperatorBase<FineField>& linop, std::vector<FineField> & _subspace) :
_Linop(linop), subspace(_subspace)
{
assert(subspace.size() >0);
};
void operator()(const CoarseField& in, CoarseField& out) {
GridBase *FineGrid = subspace[0]._grid;
int checkerboard = subspace[0].checkerboard;
FineField fin (FineGrid); fin.checkerboard= checkerboard;
FineField fout(FineGrid); fout.checkerboard = checkerboard;
blockPromote(in,fin,subspace); std::cout<<GridLogIRL<<"ProjectedHermop : Promote to fine"<<std::endl;
_Linop.HermOp(fin,fout); std::cout<<GridLogIRL<<"ProjectedHermop : HermOp (fine) "<<std::endl;
blockProject(out,fout,subspace); std::cout<<GridLogIRL<<"ProjectedHermop : Project to coarse "<<std::endl;
}
};
template<class Fobj,class CComplex,int nbasis>
class ProjectedFunctionHermOp : public LinearFunction<Lattice<iVector<CComplex,nbasis > > > {
public:
typedef iVector<CComplex,nbasis > CoarseSiteVector;
typedef Lattice<CoarseSiteVector> CoarseField;
typedef Lattice<CComplex> CoarseScalar; // used for inner products on fine field
typedef Lattice<Fobj> FineField;
OperatorFunction<FineField> & _poly;
LinearOperatorBase<FineField> &_Linop;
std::vector<FineField> &subspace;
ProjectedFunctionHermOp(OperatorFunction<FineField> & poly,
LinearOperatorBase<FineField>& linop,
std::vector<FineField> & _subspace) :
_poly(poly),
_Linop(linop),
subspace(_subspace)
{ };
void operator()(const CoarseField& in, CoarseField& out) {
GridBase *FineGrid = subspace[0]._grid;
int checkerboard = subspace[0].checkerboard;
FineField fin (FineGrid); fin.checkerboard =checkerboard;
FineField fout(FineGrid);fout.checkerboard =checkerboard;
blockPromote(in,fin,subspace); std::cout<<GridLogIRL<<"ProjectedFunctionHermop : Promote to fine"<<std::endl;
_poly(_Linop,fin,fout); std::cout<<GridLogIRL<<"ProjectedFunctionHermop : Poly "<<std::endl;
blockProject(out,fout,subspace); std::cout<<GridLogIRL<<"ProjectedFunctionHermop : Project to coarse "<<std::endl;
}
};
template<class Fobj,class CComplex,int nbasis>
class ImplicitlyRestartedLanczosSmoothedTester : public ImplicitlyRestartedLanczosTester<Lattice<iVector<CComplex,nbasis > > >
{
public:
typedef iVector<CComplex,nbasis > CoarseSiteVector;
typedef Lattice<CoarseSiteVector> CoarseField;
typedef Lattice<CComplex> CoarseScalar; // used for inner products on fine field
typedef Lattice<Fobj> FineField;
LinearFunction<CoarseField> & _Poly;
OperatorFunction<FineField> & _smoother;
LinearOperatorBase<FineField> &_Linop;
RealD _coarse_relax_tol;
std::vector<FineField> &_subspace;
ImplicitlyRestartedLanczosSmoothedTester(LinearFunction<CoarseField> &Poly,
OperatorFunction<FineField> &smoother,
LinearOperatorBase<FineField> &Linop,
std::vector<FineField> &subspace,
RealD coarse_relax_tol=5.0e3)
: _smoother(smoother), _Linop(Linop), _Poly(Poly), _subspace(subspace),
_coarse_relax_tol(coarse_relax_tol)
{ };
int TestConvergence(int j,RealD eresid,CoarseField &B, RealD &eval,RealD evalMaxApprox)
{
CoarseField v(B);
RealD eval_poly = eval;
// Apply operator
_Poly(B,v);
RealD vnum = real(innerProduct(B,v)); // HermOp.
RealD vden = norm2(B);
RealD vv0 = norm2(v);
eval = vnum/vden;
v -= eval*B;
RealD vv = norm2(v) / ::pow(evalMaxApprox,2.0);
std::cout.precision(13);
std::cout<<GridLogIRL << "[" << std::setw(3)<<j<<"] "
<<"eval = "<<std::setw(25)<< eval << " (" << eval_poly << ")"
<<" |H B[i] - eval[i]B[i]|^2 / evalMaxApprox^2 " << std::setw(25) << vv
<<std::endl;
int conv=0;
if( (vv<eresid*eresid) ) conv = 1;
return conv;
}
int ReconstructEval(int j,RealD eresid,CoarseField &B, RealD &eval,RealD evalMaxApprox)
{
GridBase *FineGrid = _subspace[0]._grid;
int checkerboard = _subspace[0].checkerboard;
FineField fB(FineGrid);fB.checkerboard =checkerboard;
FineField fv(FineGrid);fv.checkerboard =checkerboard;
blockPromote(B,fv,_subspace);
_smoother(_Linop,fv,fB);
RealD eval_poly = eval;
_Linop.HermOp(fB,fv);
RealD vnum = real(innerProduct(fB,fv)); // HermOp.
RealD vden = norm2(fB);
RealD vv0 = norm2(fv);
eval = vnum/vden;
fv -= eval*fB;
RealD vv = norm2(fv) / ::pow(evalMaxApprox,2.0);
std::cout.precision(13);
std::cout<<GridLogIRL << "[" << std::setw(3)<<j<<"] "
<<"eval = "<<std::setw(25)<< eval << " (" << eval_poly << ")"
<<" |H B[i] - eval[i]B[i]|^2 / evalMaxApprox^2 " << std::setw(25) << vv
<<std::endl;
if ( j > nbasis ) eresid = eresid*_coarse_relax_tol;
if( (vv<eresid*eresid) ) return 1;
return 0;
}
};
////////////////////////////////////////////
// Local coherence Lanczos: IRL on the fine grid builds the subspace, then IRL runs on the block-projected coarse space
////////////////////////////////////////////
template<class Fobj,class CComplex,int nbasis>
class LocalCoherenceLanczos
{
public:
typedef iVector<CComplex,nbasis > CoarseSiteVector;
typedef Lattice<CComplex> CoarseScalar; // used for inner products on fine field
typedef Lattice<CoarseSiteVector> CoarseField;
typedef Lattice<Fobj> FineField;
protected:
GridBase *_CoarseGrid;
GridBase *_FineGrid;
int _checkerboard;
LinearOperatorBase<FineField> & _FineOp;
std::vector<RealD> &evals_fine;
std::vector<RealD> &evals_coarse;
std::vector<FineField> &subspace;
std::vector<CoarseField> &evec_coarse;
private:
std::vector<RealD> _evals_fine;
std::vector<RealD> _evals_coarse;
std::vector<FineField> _subspace;
std::vector<CoarseField> _evec_coarse;
public:
LocalCoherenceLanczos(GridBase *FineGrid,
GridBase *CoarseGrid,
LinearOperatorBase<FineField> &FineOp,
int checkerboard) :
_CoarseGrid(CoarseGrid),
_FineGrid(FineGrid),
_FineOp(FineOp),
_checkerboard(checkerboard),
evals_fine (_evals_fine),
evals_coarse(_evals_coarse),
subspace (_subspace),
evec_coarse(_evec_coarse)
{
evals_fine.resize(0);
evals_coarse.resize(0);
};
//////////////////////////////////////////////////////////////////////////
// Alternate constructor: external storage for use by Hadrons module
//////////////////////////////////////////////////////////////////////////
LocalCoherenceLanczos(GridBase *FineGrid,
GridBase *CoarseGrid,
LinearOperatorBase<FineField> &FineOp,
int checkerboard,
std::vector<FineField> &ext_subspace,
std::vector<CoarseField> &ext_coarse,
std::vector<RealD> &ext_eval_fine,
std::vector<RealD> &ext_eval_coarse
) :
_CoarseGrid(CoarseGrid),
_FineGrid(FineGrid),
_FineOp(FineOp),
_checkerboard(checkerboard),
evals_fine (ext_eval_fine),
evals_coarse(ext_eval_coarse),
subspace (ext_subspace),
evec_coarse (ext_coarse)
{
evals_fine.resize(0);
evals_coarse.resize(0);
};
void Orthogonalise(void ) {
CoarseScalar InnerProd(_CoarseGrid);
std::cout << GridLogMessage <<" Gram-Schmidt pass 1"<<std::endl;
blockOrthogonalise(InnerProd,subspace);
std::cout << GridLogMessage <<" Gram-Schmidt pass 2"<<std::endl;
blockOrthogonalise(InnerProd,subspace);
};
template<typename T> static RealD normalise(T& v)
{
RealD nn = norm2(v);
nn = ::sqrt(nn);
v = v * (1.0/nn);
return nn;
}
/*
void fakeFine(void)
{
int Nk = nbasis;
subspace.resize(Nk,_FineGrid);
subspace[0]=1.0;
subspace[0].checkerboard=_checkerboard;
normalise(subspace[0]);
PlainHermOp<FineField> Op(_FineOp);
for(int k=1;k<Nk;k++){
subspace[k].checkerboard=_checkerboard;
Op(subspace[k-1],subspace[k]);
normalise(subspace[k]);
}
}
*/
void testFine(RealD resid)
{
assert(evals_fine.size() == nbasis);
assert(subspace.size() == nbasis);
PlainHermOp<FineField> Op(_FineOp);
ImplicitlyRestartedLanczosHermOpTester<FineField> SimpleTester(Op);
for(int k=0;k<nbasis;k++){
assert(SimpleTester.ReconstructEval(k,resid,subspace[k],evals_fine[k],1.0)==1);
}
}
void testCoarse(RealD resid,ChebyParams cheby_smooth,RealD relax)
{
assert(evals_fine.size() == nbasis);
assert(subspace.size() == nbasis);
//////////////////////////////////////////////////////////////////////////////////////////////////
// create a smoother and see if we can get a cheap convergence test and smooth inside the IRL
//////////////////////////////////////////////////////////////////////////////////////////////////
Chebyshev<FineField> ChebySmooth(cheby_smooth);
ProjectedFunctionHermOp<Fobj,CComplex,nbasis> ChebyOp (ChebySmooth,_FineOp,subspace);
ImplicitlyRestartedLanczosSmoothedTester<Fobj,CComplex,nbasis> ChebySmoothTester(ChebyOp,ChebySmooth,_FineOp,subspace,relax);
for(int k=0;k<evec_coarse.size();k++){
if ( k < nbasis ) {
assert(ChebySmoothTester.ReconstructEval(k,resid,evec_coarse[k],evals_coarse[k],1.0)==1);
} else {
assert(ChebySmoothTester.ReconstructEval(k,resid*relax,evec_coarse[k],evals_coarse[k],1.0)==1);
}
}
}
void calcFine(ChebyParams cheby_parms,int Nstop,int Nk,int Nm,RealD resid,
RealD MaxIt, RealD betastp, int MinRes)
{
assert(nbasis<=Nm);
Chebyshev<FineField> Cheby(cheby_parms);
FunctionHermOp<FineField> ChebyOp(Cheby,_FineOp);
PlainHermOp<FineField> Op(_FineOp);
evals_fine.resize(Nm);
subspace.resize(Nm,_FineGrid);
ImplicitlyRestartedLanczos<FineField> IRL(ChebyOp,Op,Nstop,Nk,Nm,resid,MaxIt,betastp,MinRes);
FineField src(_FineGrid); src=1.0; src.checkerboard = _checkerboard;
int Nconv;
IRL.calc(evals_fine,subspace,src,Nconv,false);
// Shrink down to number saved
assert(Nstop>=nbasis);
assert(Nconv>=nbasis);
evals_fine.resize(nbasis);
subspace.resize(nbasis,_FineGrid);
}
void calcCoarse(ChebyParams cheby_op,ChebyParams cheby_smooth,RealD relax,
int Nstop, int Nk, int Nm,RealD resid,
RealD MaxIt, RealD betastp, int MinRes)
{
Chebyshev<FineField> Cheby(cheby_op);
ProjectedHermOp<Fobj,CComplex,nbasis> Op(_FineOp,subspace);
ProjectedFunctionHermOp<Fobj,CComplex,nbasis> ChebyOp (Cheby,_FineOp,subspace);
//////////////////////////////////////////////////////////////////////////////////////////////////
// create a smoother and see if we can get a cheap convergence test and smooth inside the IRL
//////////////////////////////////////////////////////////////////////////////////////////////////
Chebyshev<FineField> ChebySmooth(cheby_smooth);
ImplicitlyRestartedLanczosSmoothedTester<Fobj,CComplex,nbasis> ChebySmoothTester(ChebyOp,ChebySmooth,_FineOp,subspace,relax);
evals_coarse.resize(Nm);
evec_coarse.resize(Nm,_CoarseGrid);
CoarseField src(_CoarseGrid); src=1.0;
ImplicitlyRestartedLanczos<CoarseField> IRL(ChebyOp,ChebyOp,ChebySmoothTester,Nstop,Nk,Nm,resid,MaxIt,betastp,MinRes);
int Nconv=0;
IRL.calc(evals_coarse,evec_coarse,src,Nconv,false);
assert(Nconv>=Nstop);
evals_coarse.resize(Nstop);
evec_coarse.resize (Nstop,_CoarseGrid);
for (int i=0;i<Nstop;i++){
std::cout << i << " Coarse eval = " << evals_coarse[i] << std::endl;
}
}
};
}
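// ---------------------------------------------------------------------------
// Illustrative sketch, NOT Grid's blockProject/blockPromote: ProjectedHermOp
// computes out = P^dag H P in, where P promotes nbasis coarse coefficients to
// a fine vector through an orthonormal subspace. For brevity the whole fine
// vector is treated as ONE block and real arithmetic is used. In Grid the
// projection is done independently on every coarse block. The names
// project, promote, projectedHermOp and GridSketch are hypothetical.
// ---------------------------------------------------------------------------
#include <cmath>
#include <cstddef>
#include <vector>

namespace GridSketch {

// coarse coefficients c_i = <s_i, fine>
inline std::vector<double> project(const std::vector<std::vector<double>>& S,
                                   const std::vector<double>& fine)
{
  std::vector<double> c(S.size(), 0.0);
  for (std::size_t i = 0; i < S.size(); ++i)
    for (std::size_t x = 0; x < fine.size(); ++x) c[i] += S[i][x] * fine[x];
  return c;
}

// fine vector sum_i c_i s_i
inline std::vector<double> promote(const std::vector<std::vector<double>>& S,
                                   const std::vector<double>& coarse)
{
  std::vector<double> fine(S[0].size(), 0.0);
  for (std::size_t i = 0; i < S.size(); ++i)
    for (std::size_t x = 0; x < fine.size(); ++x) fine[x] += coarse[i] * S[i][x];
  return fine;
}

// promote -> apply a dense row-major fine operator H -> project back,
// mirroring the three calls in ProjectedHermOp::operator()
inline std::vector<double> projectedHermOp(const std::vector<double>& H,
                                           const std::vector<std::vector<double>>& S,
                                           const std::vector<double>& in)
{
  const std::vector<double> fin = promote(S, in);
  const std::size_t n = fin.size();
  std::vector<double> fout(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) fout[i] += H[i * n + j] * fin[j];
  return project(S, fout);
}

} // namespace GridSketch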
#endif
@@ -0,0 +1,60 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/NormalEquations.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_NORMAL_EQUATIONS_H
#define GRID_NORMAL_EQUATIONS_H
namespace Grid {
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Take a matrix and form an NE solver calling a Herm solver
///////////////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class NormalEquations : public OperatorFunction<Field>{
private:
SparseMatrixBase<Field> & _Matrix;
OperatorFunction<Field> & _HermitianSolver;
public:
/////////////////////////////////////////////////////
// Wrap the usual normal equations trick
/////////////////////////////////////////////////////
NormalEquations(SparseMatrixBase<Field> &Matrix, OperatorFunction<Field> &HermitianSolver)
: _Matrix(Matrix), _HermitianSolver(HermitianSolver) {};
void operator() (const Field &in, Field &out){
Field src(in._grid);
_Matrix.Mdag(in,src);
_HermitianSolver(src,out); // Mdag M out = Mdag in
}
};
}
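// ---------------------------------------------------------------------------
// Illustrative sketch, NOT part of Grid's API: NormalEquations solves M x = b
// by passing M^dag M x = M^dag b to a Hermitian solver. Here the Hermitian
// solver is a plain conjugate gradient on a small dense real matrix (so
// M^dag = M^T). The names normalEquationsCG and GridSketch are hypothetical.
// Forming the normal equations squares the condition number of M, which is
// the usual price of this trick.
// ---------------------------------------------------------------------------
#include <cmath>
#include <vector>

namespace GridSketch {

inline std::vector<double> normalEquationsCG(const std::vector<double>& M,
                                             const std::vector<double>& b,
                                             double tol, int maxit)
{
  const int n = (int)b.size();
  // y = M x (trans == false) or y = M^T x (trans == true); M is row-major n x n
  auto apply = [&M, n](const std::vector<double>& x, bool trans) {
    std::vector<double> y(n, 0.0);
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        y[i] += (trans ? M[j * n + i] : M[i * n + j]) * x[j];
    return y;
  };
  auto dot = [n](const std::vector<double>& u, const std::vector<double>& v) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += u[i] * v[i];
    return s;
  };
  const std::vector<double> rhs = apply(b, true);     // M^T b
  std::vector<double> x(n, 0.0), r = rhs, p = rhs;    // x_0 = 0
  double rr = dot(r, r);
  const double target = tol * tol * rr;               // relative to |M^T b|^2
  for (int k = 0; k < maxit && rr > target; ++k) {
    const std::vector<double> Ap = apply(apply(p, false), true);  // M^T M p
    const double a = rr / dot(p, Ap);
    for (int i = 0; i < n; ++i) { x[i] += a * p[i]; r[i] -= a * Ap[i]; }
    const double rrNew = dot(r, r);
    for (int i = 0; i < n; ++i) p[i] = r[i] + (rrNew / rr) * p[i];
    rr = rrNew;
  }
  return x;
}

} // namespace GridSketch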
#endif
@@ -0,0 +1,119 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/PrecConjugateResidual.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_PREC_CONJUGATE_RESIDUAL_H
#define GRID_PREC_CONJUGATE_RESIDUAL_H
namespace Grid {
/////////////////////////////////////////////////////////////
// Preconditioned conjugate residual solver for a Hermitian operator:
// single input vector, single output vector.
/////////////////////////////////////////////////////////////
template<class Field>
class PrecConjugateResidual : public OperatorFunction<Field> {
public:
RealD Tolerance;
Integer MaxIterations;
int verbose;
LinearFunction<Field> &Preconditioner;
PrecConjugateResidual(RealD tol,Integer maxit,LinearFunction<Field> &Prec) : Tolerance(tol), MaxIterations(maxit), Preconditioner(Prec)
{
verbose=1;
};
void operator() (LinearOperatorBase<Field> &Linop,const Field &src, Field &psi){
RealD a, b, c, d;
RealD cp, ssq,rsq;
RealD rAr, rAAr, rArp;
RealD pAp, pAAp;
GridBase *grid = src._grid;
Field r(grid), p(grid), Ap(grid), Ar(grid), z(grid);
psi=zero;
r = src;
Preconditioner(r,p);
Linop.HermOpAndNorm(p,Ap,pAp,pAAp);
Ar=Ap;
rAr=pAp;
rAAr=pAAp;
cp =norm2(r);
ssq=norm2(src);
rsq=Tolerance*Tolerance*ssq;
if (verbose) std::cout<<GridLogMessage<<"PrecConjugateResidual: iteration " <<0<<" residual "<<cp<< " target "<< rsq<<std::endl;
for(int k=0;k<MaxIterations;k++){
Preconditioner(Ap,z);
RealD rq= real(innerProduct(Ap,z));
a = rAr/rq;
axpy(psi,a,p,psi);
cp = axpy_norm(r,-a,z,r);
rArp=rAr;
Linop.HermOpAndNorm(r,Ar,rAr,rAAr);
b =rAr/rArp;
axpy(p,b,p,r);
pAAp=axpy_norm(Ap,b,Ap,Ar);
if(verbose) std::cout<<GridLogMessage<<"PrecConjugateResidual: iteration " <<k<<" residual "<<cp<< " target "<< rsq<<std::endl;
if(cp<rsq) {
Linop.HermOp(psi,Ap);
axpy(r,-1.0,src,Ap);
RealD true_resid = norm2(r)/ssq;
std::cout<<GridLogMessage<<"PrecConjugateResidual: Converged on iteration " <<k
<< " computed residual "<<sqrt(cp/ssq)
<< " true residual "<<sqrt(true_resid)
<< " target " <<Tolerance <<std::endl;
return;
}
}
std::cout<<GridLogMessage<<"PrecConjugateResidual did NOT converge"<<std::endl;
assert(0);
}
};
}
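// ---------------------------------------------------------------------------
// Illustrative sketch, NOT Grid's implementation: a textbook preconditioned
// conjugate residual iteration, i.e. CR applied to M^{-1} A in the M inner
// product, for a symmetric positive definite A. Jacobi (diagonal) scaling is
// assumed as the preconditioner M. It carries the recursively updated
// residual r and the preconditioned residual z = M^{-1} r separately. The
// names precConjugateResidual and GridSketch are hypothetical.
// ---------------------------------------------------------------------------
#include <cmath>
#include <vector>

namespace GridSketch {

// Returns the iteration count on convergence, or -1 after maxit iterations.
inline int precConjugateResidual(const std::vector<double>& A,
                                 const std::vector<double>& b,
                                 std::vector<double>& x,
                                 double tol, int maxit)
{
  const int n = (int)b.size();
  auto matvec = [&A, n](const std::vector<double>& v) {   // A v, row-major A
    std::vector<double> y(n, 0.0);
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) y[i] += A[i * n + j] * v[j];
    return y;
  };
  auto prec = [&A, n](const std::vector<double>& v) {     // Jacobi: v_i / A_ii
    std::vector<double> z(n);
    for (int i = 0; i < n; ++i) z[i] = v[i] / A[i * n + i];
    return z;
  };
  auto dot = [n](const std::vector<double>& u, const std::vector<double>& v) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += u[i] * v[i];
    return s;
  };
  x.assign(n, 0.0);
  std::vector<double> r = b;                 // r = b - A x with x = 0
  std::vector<double> z = prec(r);           // preconditioned residual
  std::vector<double> Az = matvec(z);
  std::vector<double> p = z, Ap = Az;
  double zAz = dot(z, Az);
  const double target = tol * tol * dot(b, b);
  for (int k = 0; k < maxit; ++k) {
    if (dot(r, r) <= target) return k;       // unpreconditioned residual test
    const std::vector<double> q = prec(Ap);  // q = M^{-1} A p
    const double a = zAz / dot(Ap, q);
    for (int i = 0; i < n; ++i) {
      x[i] += a * p[i];
      r[i] -= a * Ap[i];
      z[i] -= a * q[i];
    }
    Az = matvec(z);
    const double zAzNew = dot(z, Az);
    const double beta = zAzNew / zAz;
    zAz = zAzNew;
    for (int i = 0; i < n; ++i) {
      p[i] = z[i] + beta * p[i];
      Ap[i] = Az[i] + beta * Ap[i];          // keeps Ap = A p without a matvec
    }
  }
  return -1;
}

} // namespace GridSketch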
#endif
@@ -0,0 +1,230 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/PrecGeneralisedConjugateResidual.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_PREC_GCR_H
#define GRID_PREC_GCR_H
///////////////////////////////////////////////////////////////////////////////////////////////////////
//VPGCR Abe and Zhang, 2005.
//INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING
//Computing and Information Volume 2, Number 2, Pages 147-161
//NB. Likely not the original reference, since they focus on a preconditioner variant,
// but VPGCR is nicely written up in their paper.
///////////////////////////////////////////////////////////////////////////////////////////////////////
namespace Grid {
template<class Field>
class PrecGeneralisedConjugateResidual : public OperatorFunction<Field> {
public:
RealD Tolerance;
Integer MaxIterations;
int verbose;
int mmax;
int nstep;
int steps;
GridStopWatch PrecTimer;
GridStopWatch MatTimer;
GridStopWatch LinalgTimer;
LinearFunction<Field> &Preconditioner;
PrecGeneralisedConjugateResidual(RealD tol,Integer maxit,LinearFunction<Field> &Prec,int _mmax,int _nstep) :
Tolerance(tol),
MaxIterations(maxit),
mmax(_mmax),
nstep(_nstep),
Preconditioner(Prec)
{
verbose=1;
};
void operator() (LinearOperatorBase<Field> &Linop,const Field &src, Field &psi){
psi=zero;
RealD cp, ssq,rsq;
ssq=norm2(src);
rsq=Tolerance*Tolerance*ssq;
Field r(src._grid);
PrecTimer.Reset();
MatTimer.Reset();
LinalgTimer.Reset();
GridStopWatch SolverTimer;
SolverTimer.Start();
steps=0;
for(int k=0;k<MaxIterations;k++){
cp=GCRnStep(Linop,src,psi,rsq);
std::cout<<GridLogMessage<<"VPGCR("<<mmax<<","<<nstep<<") "<< steps <<" steps cp = "<<cp<<std::endl;
if(cp<rsq) {
SolverTimer.Stop();
Linop.HermOp(psi,r);
axpy(r,-1.0,src,r);
RealD tr = norm2(r);
std::cout<<GridLogMessage<<"PrecGeneralisedConjugateResidual: Converged on iteration " <<steps
<< " computed residual "<<sqrt(cp/ssq)
<< " true residual " <<sqrt(tr/ssq)
<< " target " <<Tolerance <<std::endl;
std::cout<<GridLogMessage<<"VPGCR Time elapsed: Total "<< SolverTimer.Elapsed() <<std::endl;
std::cout<<GridLogMessage<<"VPGCR Time elapsed: Precon "<< PrecTimer.Elapsed() <<std::endl;
std::cout<<GridLogMessage<<"VPGCR Time elapsed: Matrix "<< MatTimer.Elapsed() <<std::endl;
std::cout<<GridLogMessage<<"VPGCR Time elapsed: Linalg "<< LinalgTimer.Elapsed() <<std::endl;
return;
}
}
std::cout<<GridLogMessage<<"Variable Preconditioned GCR did not converge"<<std::endl;
assert(0);
}
RealD GCRnStep(LinearOperatorBase<Field> &Linop,const Field &src, Field &psi,RealD rsq){
RealD cp;
RealD a, b, c, d;
RealD zAz, zAAz;
RealD rAq, rq;
GridBase *grid = src._grid;
Field r(grid);
Field z(grid);
Field tmp(grid);
Field ttmp(grid);
Field Az(grid);
////////////////////////////////
// history for flexible orthog
////////////////////////////////
std::vector<Field> q(mmax,grid);
std::vector<Field> p(mmax,grid);
std::vector<RealD> qq(mmax);
//////////////////////////////////
// initial guess x0 (psi) may be nonzero,
// so r0 = src - A x0
//////////////////////////////////
MatTimer.Start();
Linop.HermOpAndNorm(psi,Az,zAz,zAAz);
MatTimer.Stop();
r=src-Az;
/////////////////////
// p = Prec(r)
/////////////////////
PrecTimer.Start();
Preconditioner(r,z);
PrecTimer.Stop();
MatTimer.Start();
Linop.HermOp(z,tmp);
MatTimer.Stop();
ttmp=tmp;
tmp=tmp-r;
/*
std::cout<<GridLogMessage<<r<<std::endl;
std::cout<<GridLogMessage<<z<<std::endl;
std::cout<<GridLogMessage<<ttmp<<std::endl;
std::cout<<GridLogMessage<<tmp<<std::endl;
*/
MatTimer.Start();
Linop.HermOpAndNorm(z,Az,zAz,zAAz);
MatTimer.Stop();
//p[0],q[0],qq[0]
p[0]= z;
q[0]= Az;
qq[0]= zAAz;
cp =norm2(r);
for(int k=0;k<nstep;k++){
steps++;
int kp = k+1;
int peri_k = k %mmax;
int peri_kp= kp%mmax;
rq= real(innerProduct(r,q[peri_k])); // what if <r,q> is not real?
a = rq/qq[peri_k];
axpy(psi,a,p[peri_k],psi);
cp = axpy_norm(r,-a,q[peri_k],r);
if((k==nstep-1)||(cp<rsq)){
return cp;
}
std::cout<<GridLogMessage<< " VPGCR_step["<<steps<<"] resid " <<sqrt(cp/rsq)<<std::endl;
PrecTimer.Start();
Preconditioner(r,z);// solve Az = r
PrecTimer.Stop();
MatTimer.Start();
Linop.HermOpAndNorm(z,Az,zAz,zAAz);
Linop.HermOp(z,tmp);
MatTimer.Stop();
tmp=tmp-r;
std::cout<<GridLogMessage<< " Preconditioner resid " <<sqrt(norm2(tmp)/norm2(r))<<std::endl;
q[peri_kp]=Az;
p[peri_kp]=z;
int northog = ((kp)>(mmax-1))?(mmax-1):(kp); // orthogonalise against at most mmax-1 previous directions (the whole retained history once kp >= mmax-1)
for(int back=0;back<northog;back++){
int peri_back=(k-back)%mmax; assert((k-back)>=0);
b=-real(innerProduct(q[peri_back],Az))/qq[peri_back];
p[peri_kp]=p[peri_kp]+b*p[peri_back];
q[peri_kp]=q[peri_kp]+b*q[peri_back];
}
qq[peri_kp]=norm2(q[peri_kp]); // could use axpy_norm
}
assert(0); // never reached
return cp;
}
};
}
#endif
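`GCRnStep` keeps up to `mmax` search directions in a ring buffer (`peri_k = k % mmax`). Before each minimal-residual step it orthogonalises the new preconditioned direction against the retained ones. The sketch below shows that scheme on a small dense matrix. It is not Grid code: `Vec`, `Mat`, `Precon`, `gcrCycle` and `gcr` are hypothetical names, and plain `std::vector<double>` stands in for lattice fields. In the test, a Jacobi (diagonal) preconditioner stands in for Grid's `LinearFunction` preconditioner.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;                     // dense square matrix
using Precon = std::function<Vec(const Vec &)>;   // z = Prec(r), may vary per call

Vec matvec(const Mat &A, const Vec &x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}
double dot(const Vec &a, const Vec &b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}
void axpy(Vec &y, double a, const Vec &x) {      // y += a x
  for (std::size_t i = 0; i < y.size(); ++i) y[i] += a * x[i];
}
double trueResidual(const Mat &A, const Vec &src, const Vec &psi) {
  Vec r = matvec(A, psi);
  axpy(r, -1.0, src);
  return std::sqrt(dot(r, r) / dot(src, src));
}

// One cycle of at most nstep GCR iterations; directions p (and q = A p) sit in
// a ring buffer of length mmax, as in GCRnStep. Returns |r|^2 on exit.
double gcrCycle(const Mat &A, const Precon &prec, const Vec &src, Vec &psi,
                double rsq, int mmax, int nstep) {
  Vec r = src;
  axpy(r, -1.0, matvec(A, psi));                 // r0 = src - A psi (psi may be nonzero)
  std::vector<Vec> p(mmax), q(mmax);
  std::vector<double> qq(mmax);                  // qq = |q|^2
  p[0] = prec(r);
  q[0] = matvec(A, p[0]);
  qq[0] = dot(q[0], q[0]);
  double cp = dot(r, r);
  for (int k = 0; k < nstep; ++k) {
    const int pk = k % mmax, pkp = (k + 1) % mmax;
    const double a = dot(r, q[pk]) / qq[pk];     // minimises |r - a q|
    axpy(psi, a, p[pk]);
    axpy(r, -a, q[pk]);
    cp = dot(r, r);
    if (k == nstep - 1 || cp < rsq) return cp;
    Vec z = prec(r);
    Vec Az = matvec(A, z);
    p[pkp] = z;
    q[pkp] = Az;
    // orthogonalise the new q against at most mmax-1 stored directions
    const int northog = std::min(k + 1, mmax - 1);
    for (int back = 0; back < northog; ++back) {
      const int pb = (k - back) % mmax;
      const double b = -dot(q[pb], Az) / qq[pb];
      axpy(p[pkp], b, p[pb]);
      axpy(q[pkp], b, q[pb]);
    }
    qq[pkp] = dot(q[pkp], q[pkp]);
  }
  return cp;
}

// Restarted driver from psi = 0: returns cycles used, or -1 if not converged.
int gcr(const Mat &A, const Precon &prec, const Vec &src, Vec &psi,
        double tol, int mmax, int nstep, int maxcycles) {
  assert(mmax >= 1 && nstep >= 1 && A.size() == src.size());
  psi.assign(src.size(), 0.0);
  if (dot(src, src) == 0.0) return 0;            // zero source: psi = 0 is exact
  const double rsq = tol * tol * dot(src, src);
  for (int c = 0; c < maxcycles; ++c)
    if (gcrCycle(A, prec, src, psi, rsq, mmax, nstep) < rsq) return c + 1;
  return -1;
}
```

`A` enters only through `matvec`, so the cycle does not rely on `A` being Hermitian. A small `mmax` truncates the orthogonalisation, which trades convergence rate for memory. That trade-off is presumably why `mmax` is a separate parameter from `nstep`.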
@@ -0,0 +1,503 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/algorithms/iterative/SchurRedBlack.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_SCHUR_RED_BLACK_H
#define GRID_SCHUR_RED_BLACK_H
/*
* Red black Schur decomposition
*
* M = (Mee Meo) = (1 0 ) (Mee 0 ) (1 Mee^{-1} Meo)
* (Moe Moo) (Moe Mee^-1 1 ) (0 Moo-Moe Mee^-1 Meo) (0 1 )
* = L D U
*
* L^-1 = (1 0 )
* (-MoeMee^{-1} 1 )
* L^{dag} = ( 1 Mee^{-dag} Moe^{dag} )
* ( 0 1 )
* L^{-dag} = ( 1 -Mee^{-dag} Moe^{dag} )
* ( 0 1 )
*
* U^-1 = (1 -Mee^{-1} Meo)
* (0 1 )
* U^{dag} = ( 1 0)
* (Meo^dag Mee^{-dag} 1)
* U^{-dag} = ( 1 0)
* (-Meo^dag Mee^{-dag} 1)
***********************
* M psi = eta
***********************
*Odd
* i) D_oo psi_o = L^{-1} eta_o
* eta_o' = (D_oo)^dag (eta_o - Moe Mee^{-1} eta_e)
*
* Wilson:
* (D_oo)^{dag} D_oo psi_o = (D_oo)^dag L^{-1} eta_o
* Stag:
* D_oo psi_o = L^{-1} eta = (eta_o - Moe Mee^{-1} eta_e)
*
* L^-1 eta = (1 0 ) (eta_e) = (eta_e )
* (-MoeMee^{-1} 1 ) (eta_o) (eta_o - Moe Mee^{-1} eta_e)
*
*Even
* ii) Mee psi_e + Meo psi_o = src_e
*
* => sol_e = M_ee^-1 * ( src_e - Meo sol_o )...
*
*
* TODO: Other options:
*
* a) change checkerboards for Schur e<->o
*
* Left precon by Moo^-1
* b) Doo^{dag} M_oo^-dag Moo^-1 Doo psi_o = (D_oo)^dag M_oo^-dag Moo^-1 L^{-1} eta_o
* eta_o' = (D_oo)^dag M_oo^-dag Moo^-1 (eta_o - Moe Mee^{-1} eta_e)
*
* Right precon by Moo^-1
* c) M_oo^-dag Doo^{dag} Doo Moo^-1 phi_o = M_oo^-dag (D_oo)^dag L^{-1} eta_o
* eta_o' = M_oo^-dag (D_oo)^dag (eta_o - Moe Mee^{-1} eta_e)
* psi_o = M_oo^-1 phi_o
* TODO: Deflation
*/
namespace Grid {
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Take a matrix and form a Red Black solver calling a Herm solver
// Use of RB info prevents making SchurRedBlackSolve conform to standard interface
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Now make the norm reflect extra factor of Mee
template<class Field> class SchurRedBlackStaggeredSolve {
private:
OperatorFunction<Field> & _HermitianRBSolver;
int CBfactorise;
bool subGuess;
public:
/////////////////////////////////////////////////////
// Wrap the usual normal equations Schur trick
/////////////////////////////////////////////////////
SchurRedBlackStaggeredSolve(OperatorFunction<Field> &HermitianRBSolver, const bool initSubGuess = false) :
_HermitianRBSolver(HermitianRBSolver)
{
CBfactorise=0;
subtractGuess(initSubGuess);
};
void subtractGuess(const bool initSubGuess)
{
subGuess = initSubGuess;
}
bool isSubtractGuess(void)
{
return subGuess;
}
template<class Matrix>
void operator() (Matrix & _Matrix,const Field &in, Field &out){
ZeroGuesser<Field> guess;
(*this)(_Matrix,in,out,guess);
}
template<class Matrix, class Guesser>
void operator() (Matrix & _Matrix,const Field &in, Field &out, Guesser &guess){
// FIXME CGdiagonalMee not implemented virtual function
// FIXME use CBfactorise to control schur decomp
GridBase *grid = _Matrix.RedBlackGrid();
GridBase *fgrid= _Matrix.Grid();
SchurStaggeredOperator<Matrix,Field> _HermOpEO(_Matrix);
Field src_e(grid);
Field src_o(grid);
Field sol_e(grid);
Field sol_o(grid);
Field tmp(grid);
Field Mtmp(grid);
Field resid(fgrid);
std::cout << GridLogMessage << " SchurRedBlackStaggeredSolve " <<std::endl;
pickCheckerboard(Even,src_e,in);
pickCheckerboard(Odd ,src_o,in);
pickCheckerboard(Even,sol_e,out);
pickCheckerboard(Odd ,sol_o,out);
std::cout << GridLogMessage << " SchurRedBlackStaggeredSolve checkerboards picked" <<std::endl;
/////////////////////////////////////////////////////
// src_o = Mooee * (source_o - Moe MeeInv source_e)
/////////////////////////////////////////////////////
_Matrix.MooeeInv(src_e,tmp); assert( tmp.checkerboard ==Even);
_Matrix.Meooe (tmp,Mtmp); assert( Mtmp.checkerboard ==Odd);
tmp=src_o-Mtmp; assert( tmp.checkerboard ==Odd);
//src_o = tmp; assert(src_o.checkerboard ==Odd);
_Matrix.Mooee(tmp,src_o); // Extra factor of "m" in source from dumb choice of matrix norm.
//////////////////////////////////////////////////////////////
// Call the red-black solver
//////////////////////////////////////////////////////////////
std::cout<<GridLogMessage << "SchurRedBlackStaggeredSolver calling the Mpc solver" <<std::endl;
guess(src_o, sol_o);
Mtmp = sol_o;
_HermitianRBSolver(_HermOpEO,src_o,sol_o); assert(sol_o.checkerboard==Odd);
std::cout<<GridLogMessage << "SchurRedBlackStaggeredSolver called the Mpc solver" <<std::endl;
// Fionn A2A boolean behavioural control
if (subGuess) sol_o = sol_o-Mtmp;
///////////////////////////////////////////////////
// sol_e = M_ee^-1 * ( src_e - Meo sol_o )...
///////////////////////////////////////////////////
_Matrix.Meooe(sol_o,tmp); assert( tmp.checkerboard ==Even);
src_e = src_e-tmp; assert( src_e.checkerboard ==Even);
_Matrix.MooeeInv(src_e,sol_e); assert( sol_e.checkerboard ==Even);
std::cout<<GridLogMessage << "SchurRedBlackStaggeredSolver reconstructed other CB" <<std::endl;
setCheckerboard(out,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_o); assert( sol_o.checkerboard ==Odd );
std::cout<<GridLogMessage << "SchurRedBlackStaggeredSolver inserted solution" <<std::endl;
// Verify the unprec residual
if ( ! subGuess ) {
_Matrix.M(out,resid);
resid = resid-in;
RealD ns = norm2(in);
RealD nr = norm2(resid);
std::cout<<GridLogMessage << "SchurRedBlackStaggered solver true unprec resid "<< std::sqrt(nr/ns) <<" nr "<< nr <<" ns "<<ns << std::endl;
} else {
std::cout << GridLogMessage << "Guess subtracted after solve." << std::endl;
}
}
};
template<class Field> using SchurRedBlackStagSolve = SchurRedBlackStaggeredSolve<Field>;
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Take a matrix and form a Red Black solver calling a Herm solver
// Use of RB info prevents making SchurRedBlackSolve conform to standard interface
///////////////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class SchurRedBlackDiagMooeeSolve {
private:
OperatorFunction<Field> & _HermitianRBSolver;
int CBfactorise;
bool subGuess;
public:
/////////////////////////////////////////////////////
// Wrap the usual normal equations Schur trick
/////////////////////////////////////////////////////
SchurRedBlackDiagMooeeSolve(OperatorFunction<Field> &HermitianRBSolver,int cb=0, const bool initSubGuess = false) : _HermitianRBSolver(HermitianRBSolver)
{
CBfactorise=cb;
subtractGuess(initSubGuess);
};
void subtractGuess(const bool initSubGuess)
{
subGuess = initSubGuess;
}
bool isSubtractGuess(void)
{
return subGuess;
}
template<class Matrix>
void operator() (Matrix & _Matrix,const Field &in, Field &out){
ZeroGuesser<Field> guess;
(*this)(_Matrix,in,out,guess);
}
template<class Matrix, class Guesser>
void operator() (Matrix & _Matrix,const Field &in, Field &out,Guesser &guess){
// FIXME CGdiagonalMee not implemented virtual function
// FIXME use CBfactorise to control schur decomp
GridBase *grid = _Matrix.RedBlackGrid();
GridBase *fgrid= _Matrix.Grid();
SchurDiagMooeeOperator<Matrix,Field> _HermOpEO(_Matrix);
Field src_e(grid);
Field src_o(grid);
Field sol_e(grid);
Field sol_o(grid);
Field tmp(grid);
Field Mtmp(grid);
Field resid(fgrid);
pickCheckerboard(Even,src_e,in);
pickCheckerboard(Odd ,src_o,in);
pickCheckerboard(Even,sol_e,out);
pickCheckerboard(Odd ,sol_o,out);
/////////////////////////////////////////////////////
// src_o = Mdag * (source_o - Moe MeeInv source_e)
/////////////////////////////////////////////////////
_Matrix.MooeeInv(src_e,tmp); assert( tmp.checkerboard ==Even);
_Matrix.Meooe (tmp,Mtmp); assert( Mtmp.checkerboard ==Odd);
tmp=src_o-Mtmp; assert( tmp.checkerboard ==Odd);
// get the right MpcDag
_HermOpEO.MpcDag(tmp,src_o); assert(src_o.checkerboard ==Odd);
//////////////////////////////////////////////////////////////
// Call the red-black solver
//////////////////////////////////////////////////////////////
std::cout<<GridLogMessage << "SchurRedBlack solver calling the MpcDagMpc solver" <<std::endl;
guess(src_o,sol_o);
Mtmp = sol_o;
_HermitianRBSolver(_HermOpEO,src_o,sol_o); assert(sol_o.checkerboard==Odd);
// Fionn A2A boolean behavioural control
if (subGuess) sol_o = sol_o-Mtmp;
///////////////////////////////////////////////////
// sol_e = M_ee^-1 * ( src_e - Meo sol_o )...
///////////////////////////////////////////////////
_Matrix.Meooe(sol_o,tmp); assert( tmp.checkerboard ==Even);
src_e = src_e-tmp; assert( src_e.checkerboard ==Even);
_Matrix.MooeeInv(src_e,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_o); assert( sol_o.checkerboard ==Odd );
// Verify the unprec residual
if ( ! subGuess ) {
_Matrix.M(out,resid);
resid = resid-in;
RealD ns = norm2(in);
RealD nr = norm2(resid);
std::cout<<GridLogMessage << "SchurRedBlackDiagMooee solver true unprec resid "<< std::sqrt(nr/ns) <<" nr "<< nr <<" ns "<<ns << std::endl;
} else {
std::cout << GridLogMessage << "Guess subtracted after solve." << std::endl;
}
}
};
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Take a matrix and form a Red Black solver calling a Herm solver
// Use of RB info prevents making SchurRedBlackSolve conform to standard interface
///////////////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class SchurRedBlackDiagTwoSolve {
private:
OperatorFunction<Field> & _HermitianRBSolver;
int CBfactorise;
bool subGuess;
public:
/////////////////////////////////////////////////////
// Wrap the usual normal equations Schur trick
/////////////////////////////////////////////////////
SchurRedBlackDiagTwoSolve(OperatorFunction<Field> &HermitianRBSolver, const bool initSubGuess = false) :
_HermitianRBSolver(HermitianRBSolver)
{
CBfactorise = 0;
subtractGuess(initSubGuess);
};
void subtractGuess(const bool initSubGuess)
{
subGuess = initSubGuess;
}
bool isSubtractGuess(void)
{
return subGuess;
}
template<class Matrix>
void operator() (Matrix & _Matrix,const Field &in, Field &out){
ZeroGuesser<Field> guess;
(*this)(_Matrix,in,out,guess);
}
template<class Matrix,class Guesser>
void operator() (Matrix & _Matrix,const Field &in, Field &out,Guesser &guess){
// FIXME CGdiagonalMee not implemented virtual function
// FIXME use CBfactorise to control schur decomp
GridBase *grid = _Matrix.RedBlackGrid();
GridBase *fgrid= _Matrix.Grid();
SchurDiagTwoOperator<Matrix,Field> _HermOpEO(_Matrix);
Field src_e(grid);
Field src_o(grid);
Field sol_e(grid);
Field sol_o(grid);
Field tmp(grid);
Field Mtmp(grid);
Field resid(fgrid);
pickCheckerboard(Even,src_e,in);
pickCheckerboard(Odd ,src_o,in);
pickCheckerboard(Even,sol_e,out);
pickCheckerboard(Odd ,sol_o,out);
/////////////////////////////////////////////////////
// src_o = Mdag * (source_o - Moe MeeInv source_e)
/////////////////////////////////////////////////////
_Matrix.MooeeInv(src_e,tmp); assert( tmp.checkerboard ==Even);
_Matrix.Meooe (tmp,Mtmp); assert( Mtmp.checkerboard ==Odd);
tmp=src_o-Mtmp; assert( tmp.checkerboard ==Odd);
// get the right MpcDag
_HermOpEO.MpcDag(tmp,src_o); assert(src_o.checkerboard ==Odd);
//////////////////////////////////////////////////////////////
// Call the red-black solver
//////////////////////////////////////////////////////////////
std::cout<<GridLogMessage << "SchurRedBlack solver calling the MpcDagMpc solver" <<std::endl;
// _HermitianRBSolver(_HermOpEO,src_o,sol_o); assert(sol_o.checkerboard==Odd);
guess(src_o,tmp);
Mtmp = tmp;
_HermitianRBSolver(_HermOpEO,src_o,tmp); assert(tmp.checkerboard==Odd);
// Fionn A2A boolean behavioural control
if (subGuess) tmp = tmp-Mtmp;
_Matrix.MooeeInv(tmp,sol_o); assert( sol_o.checkerboard ==Odd);
///////////////////////////////////////////////////
// sol_e = M_ee^-1 * ( src_e - Meo sol_o )...
///////////////////////////////////////////////////
_Matrix.Meooe(sol_o,tmp); assert( tmp.checkerboard ==Even);
src_e = src_e-tmp; assert( src_e.checkerboard ==Even);
_Matrix.MooeeInv(src_e,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_o); assert( sol_o.checkerboard ==Odd );
// Verify the unprec residual
if ( ! subGuess ) {
_Matrix.M(out,resid);
resid = resid-in;
RealD ns = norm2(in);
RealD nr = norm2(resid);
std::cout<<GridLogMessage << "SchurRedBlackDiagTwo solver true unprec resid "<< std::sqrt(nr/ns) <<" nr "<< nr <<" ns "<<ns << std::endl;
} else {
std::cout << GridLogMessage << "Guess subtracted after solve." << std::endl;
}
}
};
///////////////////////////////////////////////////////////////////////////////////////////////////////
// Take a matrix and form a Red Black solver calling a Herm solver
// Use of RB info prevents making SchurRedBlackSolve conform to standard interface
///////////////////////////////////////////////////////////////////////////////////////////////////////
template<class Field> class SchurRedBlackDiagTwoMixed {
private:
LinearFunction<Field> & _HermitianRBSolver;
int CBfactorise;
bool subGuess;
public:
/////////////////////////////////////////////////////
// Wrap the usual normal equations Schur trick
/////////////////////////////////////////////////////
SchurRedBlackDiagTwoMixed(LinearFunction<Field> &HermitianRBSolver, const bool initSubGuess = false) :
_HermitianRBSolver(HermitianRBSolver)
{
CBfactorise=0;
subtractGuess(initSubGuess);
};
void subtractGuess(const bool initSubGuess)
{
subGuess = initSubGuess;
}
bool isSubtractGuess(void)
{
return subGuess;
}
template<class Matrix>
void operator() (Matrix & _Matrix,const Field &in, Field &out){
ZeroGuesser<Field> guess;
(*this)(_Matrix,in,out,guess);
}
template<class Matrix, class Guesser>
void operator() (Matrix & _Matrix,const Field &in, Field &out,Guesser &guess){
// FIXME CGdiagonalMee not implemented virtual function
// FIXME use CBfactorise to control schur decomp
GridBase *grid = _Matrix.RedBlackGrid();
GridBase *fgrid= _Matrix.Grid();
SchurDiagTwoOperator<Matrix,Field> _HermOpEO(_Matrix);
Field src_e(grid);
Field src_o(grid);
Field sol_e(grid);
Field sol_o(grid);
Field tmp(grid);
Field Mtmp(grid);
Field resid(fgrid);
pickCheckerboard(Even,src_e,in);
pickCheckerboard(Odd ,src_o,in);
pickCheckerboard(Even,sol_e,out);
pickCheckerboard(Odd ,sol_o,out);
/////////////////////////////////////////////////////
// src_o = Mdag * (source_o - Moe MeeInv source_e)
/////////////////////////////////////////////////////
_Matrix.MooeeInv(src_e,tmp); assert( tmp.checkerboard ==Even);
_Matrix.Meooe (tmp,Mtmp); assert( Mtmp.checkerboard ==Odd);
tmp=src_o-Mtmp; assert( tmp.checkerboard ==Odd);
// get the right MpcDag
_HermOpEO.MpcDag(tmp,src_o); assert(src_o.checkerboard ==Odd);
//////////////////////////////////////////////////////////////
// Call the red-black solver
//////////////////////////////////////////////////////////////
std::cout<<GridLogMessage << "SchurRedBlack solver calling the MpcDagMpc solver" <<std::endl;
// _HermitianRBSolver(_HermOpEO,src_o,sol_o); assert(sol_o.checkerboard==Odd);
// _HermitianRBSolver(_HermOpEO,src_o,tmp); assert(tmp.checkerboard==Odd);
guess(src_o,tmp);
Mtmp = tmp;
_HermitianRBSolver(_HermOpEO,src_o,tmp); assert(tmp.checkerboard==Odd);
// Fionn A2A boolean behavioural control
if (subGuess) tmp = tmp-Mtmp;
_Matrix.MooeeInv(tmp,sol_o); assert( sol_o.checkerboard ==Odd);
///////////////////////////////////////////////////
// sol_e = M_ee^-1 * ( src_e - Meo sol_o )...
///////////////////////////////////////////////////
_Matrix.Meooe(sol_o,tmp); assert( tmp.checkerboard ==Even);
src_e = src_e-tmp; assert( src_e.checkerboard ==Even);
_Matrix.MooeeInv(src_e,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_e); assert( sol_e.checkerboard ==Even);
setCheckerboard(out,sol_o); assert( sol_o.checkerboard ==Odd );
// Verify the unprec residual
if ( ! subGuess ) {
_Matrix.M(out,resid);
resid = resid-in;
RealD ns = norm2(in);
RealD nr = norm2(resid);
std::cout << GridLogMessage << "SchurRedBlackDiagTwoMixed solver true unprec resid " << std::sqrt(nr / ns) << " nr " << nr << " ns " << ns << std::endl;
} else {
std::cout << GridLogMessage << "Guess subtracted after solve." << std::endl;
}
}
};
}
#endif
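The block factorisation in the comment at the top of `SchurRedBlack.h` turns M psi = eta into two steps: a Schur-complement solve on the odd checkerboard, then reconstruction of the even checkerboard. The sketch below runs those two steps on a toy 4x4 matrix split into 2x2 checkerboard blocks. `M2`, `Blocks`, `schurSolve`, `unprecResidual` and the dense 2x2 inverse are illustrative assumptions. In Grid, Mee^{-1} is applied with `MooeeInv`, and the odd system (or its normal equations, via `MpcDag`) goes to an iterative Hermitian solver.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using M2 = std::array<std::array<double, 2>, 2>;  // one 2x2 checkerboard block
using V2 = std::array<double, 2>;

M2 makeM2(double a, double b, double c, double d) {
  M2 m{};
  m[0][0] = a; m[0][1] = b; m[1][0] = c; m[1][1] = d;
  return m;
}
M2 mul(const M2 &x, const M2 &y) {
  M2 r{};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 2; ++k) r[i][j] += x[i][k] * y[k][j];
  return r;
}
V2 mul(const M2 &x, const V2 &v) {
  return V2{x[0][0] * v[0] + x[0][1] * v[1], x[1][0] * v[0] + x[1][1] * v[1]};
}
M2 sub(const M2 &x, const M2 &y) {
  return makeM2(x[0][0] - y[0][0], x[0][1] - y[0][1],
                x[1][0] - y[1][0], x[1][1] - y[1][1]);
}
V2 add(const V2 &a, const V2 &b) { return V2{a[0] + b[0], a[1] + b[1]}; }
V2 sub(const V2 &a, const V2 &b) { return V2{a[0] - b[0], a[1] - b[1]}; }
M2 inv(const M2 &x) {
  const double det = x[0][0] * x[1][1] - x[0][1] * x[1][0];
  assert(det != 0.0);
  return makeM2(x[1][1] / det, -x[0][1] / det, -x[1][0] / det, x[0][0] / det);
}

struct Blocks { M2 Mee, Meo, Moe, Moo; };  // M = (Mee Meo ; Moe Moo)

// D_oo psi_o = eta_o - Moe Mee^{-1} eta_e, with D_oo = Moo - Moe Mee^{-1} Meo;
// then psi_e = Mee^{-1} (eta_e - Meo psi_o).
void schurSolve(const Blocks &M, const V2 &eta_e, const V2 &eta_o,
                V2 &psi_e, V2 &psi_o) {
  const M2 MeeInv = inv(M.Mee);
  const M2 Doo = sub(M.Moo, mul(M.Moe, mul(MeeInv, M.Meo)));
  const V2 src_o = sub(eta_o, mul(M.Moe, mul(MeeInv, eta_e)));
  psi_o = mul(inv(Doo), src_o);  // stands in for the Hermitian RB solve
  psi_e = mul(MeeInv, sub(eta_e, mul(M.Meo, psi_o)));
}

// |M psi - eta|, the analogue of the "true unprec resid" check
double unprecResidual(const Blocks &M, const V2 &eta_e, const V2 &eta_o,
                      const V2 &psi_e, const V2 &psi_o) {
  const V2 re = sub(add(mul(M.Mee, psi_e), mul(M.Meo, psi_o)), eta_e);
  const V2 ro = sub(add(mul(M.Moe, psi_e), mul(M.Moo, psi_o)), eta_o);
  return std::sqrt(re[0] * re[0] + re[1] * re[1] + ro[0] * ro[0] + ro[1] * ro[1]);
}
```

The odd-checkerboard system has half the unknowns of the full problem. The even half then needs only one application each of `Meo` and Mee^{-1}, which is why the solvers above call `MooeeInv` for the reconstruction.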
@@ -0,0 +1,125 @@
#include <Grid/GridCore.h>
#include <fcntl.h>
namespace Grid {
MemoryStats *MemoryProfiler::stats = nullptr;
bool MemoryProfiler::debug = false;
int PointerCache::victim;
PointerCache::PointerCacheEntry PointerCache::Entries[PointerCache::Ncache];
void *PointerCache::Insert(void *ptr,size_t bytes) {
if (bytes < 4096 ) return ptr;
#ifdef GRID_OMP
assert(omp_in_parallel()==0);
#endif
void * ret = NULL;
int v = -1;
for(int e=0;e<Ncache;e++) {
if ( Entries[e].valid==0 ) {
v=e;
break;
}
}
if ( v==-1 ) {
v=victim;
victim = (victim+1)%Ncache;
}
if ( Entries[v].valid ) {
ret = Entries[v].address;
Entries[v].valid = 0;
Entries[v].address = NULL;
Entries[v].bytes = 0;
}
Entries[v].address=ptr;
Entries[v].bytes =bytes;
Entries[v].valid =1;
return ret;
}
void *PointerCache::Lookup(size_t bytes) {
if (bytes < 4096 ) return NULL;
#ifdef _OPENMP
assert(omp_in_parallel()==0);
#endif
for(int e=0;e<Ncache;e++){
if ( Entries[e].valid && ( Entries[e].bytes == bytes ) ) {
Entries[e].valid = 0;
return Entries[e].address;
}
}
return NULL;
}
void check_huge_pages(void *Buf,uint64_t BYTES)
{
#ifdef __linux__
int fd = open("/proc/self/pagemap", O_RDONLY);
assert(fd >= 0);
const int page_size = 4096;
uint64_t virt_pfn = (uint64_t)Buf / page_size;
off_t offset = sizeof(uint64_t) * virt_pfn;
uint64_t npages = (BYTES + page_size-1) / page_size;
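std::vector<uint64_t> pagedata(npages); // heap buffer: a VLA is non-standard C++ and can overflow the stack for large BYTES
off_t ret = lseek(fd, offset, SEEK_SET);
assert(ret == offset);
ssize_t nread = ::read(fd, pagedata.data(), sizeof(uint64_t)*npages);
assert(nread == (ssize_t)(sizeof(uint64_t) * npages));
close(fd); // do not leak the pagemap file descriptor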
int nhugepages = npages / 512;
int n4ktotal, nnothuge;
n4ktotal = 0;
nnothuge = 0;
for (int i = 0; i < nhugepages; ++i) {
uint64_t baseaddr = (pagedata[i*512] & 0x7fffffffffffffULL) * page_size;
for (int j = 0; j < 512; ++j) {
uint64_t pageaddr = (pagedata[i*512+j] & 0x7fffffffffffffULL) * page_size;
++n4ktotal;
if (pageaddr != baseaddr + j * page_size)
++nnothuge;
}
}
int rank = CartesianCommunicator::RankWorld();
printf("rank %d Allocated %d 4k pages, %d not in huge pages\n", rank, n4ktotal, nnothuge);
#endif
}
std::string sizeString(const size_t bytes)
{
constexpr unsigned int bufSize = 256;
const char *suffixes[7] = {"", "K", "M", "G", "T", "P", "E"};
char buf[bufSize];
size_t s = 0;
double count = bytes;
// stop at the last suffix ("E") so that suffixes[s] stays in bounds
while (count >= 1024 && s < 6)
{
s++;
count /= 1024;
}
if (count - floor(count) == 0.0)
{
snprintf(buf, bufSize, "%d %sB", (int)count, suffixes[s]);
}
else
{
snprintf(buf, bufSize, "%.1f %sB", count, suffixes[s]);
}
return std::string(buf);
}
}
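`PointerCache::Insert` and `PointerCache::Lookup` keep a few freed buffers of at least 4096 bytes and reuse one only on an exact byte-count match. When every slot is full, they evict round-robin and hand the evicted pointer back for the caller to free. The sketch below shows that policy as a stand-alone class. `SizeCache` is a hypothetical name and not part of Grid. Like the original, it is not thread-safe; Grid asserts that it is not called inside an OpenMP parallel region.

```cpp
#include <cassert>
#include <cstddef>

template <int Ncache, std::size_t MinBytes = 4096>
class SizeCache {
  static_assert(Ncache > 0, "need at least one slot");
  struct Entry { void *address; std::size_t bytes; bool valid; };
  Entry entries[Ncache] = {};
  int victim = 0;  // next slot to evict when every slot is occupied

 public:
  // Offer a buffer that is being freed. Returns the pointer the caller must
  // really free: ptr itself if it is too small to cache, the evicted buffer
  // if all slots were full, otherwise nullptr.
  void *insert(void *ptr, std::size_t bytes) {
    assert(ptr != nullptr);
    if (bytes < MinBytes) return ptr;
    int v = -1;
    for (int e = 0; e < Ncache; ++e)
      if (!entries[e].valid) { v = e; break; }
    void *evicted = nullptr;
    if (v == -1) {  // full: round-robin eviction
      v = victim;
      victim = (victim + 1) % Ncache;
      evicted = entries[v].address;
    }
    entries[v].address = ptr;
    entries[v].bytes = bytes;
    entries[v].valid = true;
    return evicted;
  }

  // Hand back a cached buffer of exactly `bytes` bytes, or nullptr on a miss.
  void *lookup(std::size_t bytes) {
    if (bytes < MinBytes) return nullptr;
    for (Entry &e : entries)
      if (e.valid && e.bytes == bytes) { e.valid = false; return e.address; }
    return nullptr;
  }
};
```

Exact-size matching keeps lookup trivial. It likely pays off because lattice fields of identical size are allocated and freed repeatedly during a solve.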
@@ -0,0 +1,315 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/AlignedAllocator.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ALIGNED_ALLOCATOR_H
#define GRID_ALIGNED_ALLOCATOR_H
#ifdef HAVE_MALLOC_MALLOC_H
#include <malloc/malloc.h>
#endif
#ifdef HAVE_MALLOC_H
#include <malloc.h>
#endif
#ifdef HAVE_MM_MALLOC_H
#include <mm_malloc.h>
#endif
namespace Grid {
class PointerCache {
private:
static const int Ncache=8;
static int victim;
typedef struct {
void *address;
size_t bytes;
int valid;
} PointerCacheEntry;
static PointerCacheEntry Entries[Ncache];
public:
static void *Insert(void *ptr,size_t bytes) ;
static void *Lookup(size_t bytes) ;
};
std::string sizeString(size_t bytes);
struct MemoryStats
{
size_t totalAllocated{0}, maxAllocated{0},
currentlyAllocated{0}, totalFreed{0};
};
class MemoryProfiler
{
public:
static MemoryStats *stats;
static bool debug;
};
#define memString(bytes) std::to_string(bytes) + " (" + sizeString(bytes) + ")"
#define profilerDebugPrint \
if (MemoryProfiler::stats)\
{\
auto s = MemoryProfiler::stats;\
std::cout << GridLogDebug << "[Memory debug] Stats " << MemoryProfiler::stats << std::endl;\
std::cout << GridLogDebug << "[Memory debug] total : " << memString(s->totalAllocated) \
<< std::endl;\
std::cout << GridLogDebug << "[Memory debug] max : " << memString(s->maxAllocated) \
<< std::endl;\
std::cout << GridLogDebug << "[Memory debug] current: " << memString(s->currentlyAllocated) \
<< std::endl;\
std::cout << GridLogDebug << "[Memory debug] freed : " << memString(s->totalFreed) \
<< std::endl;\
}
#define profilerAllocate(bytes)\
if (MemoryProfiler::stats)\
{\
auto s = MemoryProfiler::stats;\
s->totalAllocated += (bytes);\
s->currentlyAllocated += (bytes);\
s->maxAllocated = std::max(s->maxAllocated, s->currentlyAllocated);\
}\
if (MemoryProfiler::debug)\
{\
std::cout << GridLogDebug << "[Memory debug] allocating " << memString(bytes) << std::endl;\
profilerDebugPrint;\
}
#define profilerFree(bytes)\
if (MemoryProfiler::stats)\
{\
auto s = MemoryProfiler::stats;\
s->totalFreed += (bytes);\
s->currentlyAllocated -= (bytes);\
}\
if (MemoryProfiler::debug)\
{\
std::cout << GridLogDebug << "[Memory debug] freeing " << memString(bytes) << std::endl;\
profilerDebugPrint;\
}
void check_huge_pages(void *Buf,uint64_t BYTES);
////////////////////////////////////////////////////////////////////
// A lattice of something, but assume the something is SIMDized.
////////////////////////////////////////////////////////////////////
template<typename _Tp>
class alignedAllocator {
public:
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
typedef _Tp* pointer;
typedef const _Tp* const_pointer;
typedef _Tp& reference;
typedef const _Tp& const_reference;
typedef _Tp value_type;
template<typename _Tp1> struct rebind { typedef alignedAllocator<_Tp1> other; };
alignedAllocator() throw() { }
alignedAllocator(const alignedAllocator&) throw() { }
template<typename _Tp1> alignedAllocator(const alignedAllocator<_Tp1>&) throw() { }
~alignedAllocator() throw() { }
pointer address(reference __x) const { return &__x; }
size_type max_size() const throw() { return size_t(-1) / sizeof(_Tp); }
pointer allocate(size_type __n, const void* _p= 0)
{
size_type bytes = __n*sizeof(_Tp);
profilerAllocate(bytes);
_Tp *ptr = (_Tp *) PointerCache::Lookup(bytes);
// if ( ptr != NULL )
// std::cout << "alignedAllocator "<<__n << " cache hit "<< std::hex << ptr <<std::dec <<std::endl;
//////////////////
// Hack: align to 2MB (the x86 huge-page size). This could be made an option,
// but it probably doesn't need to be configurable.
//////////////////
//define GRID_ALLOC_ALIGN (128)
#define GRID_ALLOC_ALIGN (2*1024*1024)
#ifdef HAVE_MM_MALLOC_H
if ( ptr == (_Tp *) NULL ) ptr = (_Tp *) _mm_malloc(bytes,GRID_ALLOC_ALIGN);
#else
if ( ptr == (_Tp *) NULL ) ptr = (_Tp *) memalign(GRID_ALLOC_ALIGN,bytes);
#endif
// std::cout << "alignedAllocator " << std::hex << ptr <<std::dec <<std::endl;
// First touch optimise in threaded loop; skip if the allocation failed
uint8_t *cp = (uint8_t *)ptr;
if ( ptr ) {
#ifdef GRID_OMP
#pragma omp parallel for
#endif
for(size_type n=0;n<bytes;n+=4096){
cp[n]=0;
}
}
return ptr;
}
void deallocate(pointer __p, size_type __n) {
size_type bytes = __n * sizeof(_Tp);
profilerFree(bytes);
pointer __freeme = (pointer)PointerCache::Insert((void *)__p,bytes);
#ifdef HAVE_MM_MALLOC_H
if ( __freeme ) _mm_free((void *)__freeme);
#else
if ( __freeme ) free((void *)__freeme);
#endif
}
void construct(pointer __p, const _Tp& __val) { };
void construct(pointer __p) { };
void destroy(pointer __p) { };
};
template<typename _Tp> inline bool operator==(const alignedAllocator<_Tp>&, const alignedAllocator<_Tp>&){ return true; }
template<typename _Tp> inline bool operator!=(const alignedAllocator<_Tp>&, const alignedAllocator<_Tp>&){ return false; }
//////////////////////////////////////////////////////////////////////////////////////////
// MPI3 : comms must use shm region
// SHMEM: comms must use symmetric heap
//////////////////////////////////////////////////////////////////////////////////////////
#ifdef GRID_COMMS_SHMEM
extern "C" {
#include <mpp/shmem.h>
extern void * shmem_align(size_t, size_t);
extern void shmem_free(void *);
}
#define PARANOID_SYMMETRIC_HEAP
#endif
template<typename _Tp>
class commAllocator {
public:
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
typedef _Tp* pointer;
typedef const _Tp* const_pointer;
typedef _Tp& reference;
typedef const _Tp& const_reference;
typedef _Tp value_type;
template<typename _Tp1> struct rebind { typedef commAllocator<_Tp1> other; };
commAllocator() throw() { }
commAllocator(const commAllocator&) throw() { }
template<typename _Tp1> commAllocator(const commAllocator<_Tp1>&) throw() { }
~commAllocator() throw() { }
pointer address(reference __x) const { return &__x; }
size_type max_size() const throw() { return size_t(-1) / sizeof(_Tp); }
#ifdef GRID_COMMS_SHMEM
pointer allocate(size_type __n, const void* _p= 0)
{
size_type bytes = __n*sizeof(_Tp);
profilerAllocate(bytes);
#ifdef CRAY
_Tp *ptr = (_Tp *) shmem_align(bytes,64);
#else
_Tp *ptr = (_Tp *) shmem_align(64,bytes);
#endif
#ifdef PARANOID_SYMMETRIC_HEAP
static void * bcast;
static long psync[_SHMEM_REDUCE_SYNC_SIZE];
bcast = (void *) ptr;
shmem_broadcast32((void *)&bcast,(void *)&bcast,sizeof(void *)/4,0,0,0,shmem_n_pes(),psync);
if ( bcast != ptr ) {
std::printf("inconsistent alloc pe %d %p %p \n",shmem_my_pe(),bcast,(void *)ptr);std::fflush(stdout);
// BACKTRACEFILE();
exit(EXIT_FAILURE);
}
assert( bcast == (void *) ptr);
#endif
return ptr;
}
void deallocate(pointer __p, size_type __n) {
size_type bytes = __n*sizeof(_Tp);
profilerFree(bytes);
shmem_free((void *)__p);
}
#else
pointer allocate(size_type __n, const void* _p= 0)
{
size_type bytes = __n*sizeof(_Tp);
profilerAllocate(bytes);
#ifdef HAVE_MM_MALLOC_H
_Tp * ptr = (_Tp *) _mm_malloc(bytes, GRID_ALLOC_ALIGN);
#else
_Tp * ptr = (_Tp *) memalign(GRID_ALLOC_ALIGN, bytes);
#endif
uint8_t *cp = (uint8_t *)ptr;
if ( ptr ) {
// First touch: one write per 4k page, using a static OMP schedule so pages are touched in the same thread order as later static loops
#ifdef GRID_OMP
#pragma omp parallel for schedule(static)
#endif
for(size_type n=0;n<bytes;n+=4096){
cp[n]=0;
}
}
return ptr;
}
void deallocate(pointer __p, size_type __n) {
size_type bytes = __n*sizeof(_Tp);
profilerFree(bytes);
#ifdef HAVE_MM_MALLOC_H
_mm_free((void *)__p);
#else
free((void *)__p);
#endif
}
#endif
void construct(pointer __p, const _Tp& __val) { };
void construct(pointer __p) { };
void destroy(pointer __p) { };
};
template<typename _Tp> inline bool operator==(const commAllocator<_Tp>&, const commAllocator<_Tp>&){ return true; }
template<typename _Tp> inline bool operator!=(const commAllocator<_Tp>&, const commAllocator<_Tp>&){ return false; }
////////////////////////////////////////////////////////////////////////////////
// Template typedefs
////////////////////////////////////////////////////////////////////////////////
template<class T> using Vector = std::vector<T,alignedAllocator<T> >;
template<class T> using commVector = std::vector<T,commAllocator<T> >;
template<class T> using Matrix = std::vector<std::vector<T,alignedAllocator<T> > >;
}; // namespace Grid
#endif
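A minimal standalone sketch of the allocation pattern in the non-SHMEM branch of `commAllocator` above: an aligned block followed by one write per 4 KiB page, so that under a first-touch page placement policy each page is mapped when it is allocated. The name `FirstTouchAllocator`, the 64-byte default alignment and the use of POSIX `posix_memalign` are assumptions for illustration, not Grid's API, and the OpenMP pragma is omitted. Unlike the allocators above, it defines no `construct()`/`destroy()`, so `std::allocator_traits` falls back to placement new and vector elements really are initialised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator: Align-byte aligned blocks, first-touched page by page.
template <typename T, std::size_t Align = 64>
struct FirstTouchAllocator {
  using value_type = T;
  // Explicit rebind is needed because Align is a non-type template parameter.
  template <typename U> struct rebind { using other = FirstTouchAllocator<U, Align>; };

  FirstTouchAllocator() noexcept = default;
  template <typename U>
  FirstTouchAllocator(const FirstTouchAllocator<U, Align>&) noexcept {}

  T* allocate(std::size_t n) {
    std::size_t bytes = n * sizeof(T);
    void* p = nullptr;
    if (posix_memalign(&p, Align, bytes) != 0) throw std::bad_alloc();
    auto* cp = static_cast<std::uint8_t*>(p);
    for (std::size_t b = 0; b < bytes; b += 4096) cp[b] = 0;  // one touch per 4 KiB page
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocator: all instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const FirstTouchAllocator<T, A>&, const FirstTouchAllocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const FirstTouchAllocator<T, A>&, const FirstTouchAllocator<U, A>&) { return false; }

// Analogue of the Vector/commVector template typedefs.
template <typename T>
using AlignedVec = std::vector<T, FirstTouchAllocator<T>>;
```

For example, `AlignedVec<double> v(1000, 1.5);` yields a 64-byte aligned buffer whose elements all hold `1.5`.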
@@ -0,0 +1,35 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Cartesian.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CARTESIAN_H
#define GRID_CARTESIAN_H
#include <Grid/cartesian/Cartesian_base.h>
#include <Grid/cartesian/Cartesian_full.h>
#include <Grid/cartesian/Cartesian_red_black.h>
#endif
@@ -0,0 +1,292 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cartesian/Cartesian_base.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
Author: Guido Cossu <guido.cossu@ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CARTESIAN_BASE_H
#define GRID_CARTESIAN_BASE_H
namespace Grid{
//////////////////////////////////////////////////////////////////////
// Communicator provides information on the processor grid
//////////////////////////////////////////////////////////////////////
// unsigned long _ndimension;
// std::vector<int> _processors; // processor grid
// int _processor; // linear processor rank
// std::vector<int> _processor_coor; // processor coordinates in the processor grid
//////////////////////////////////////////////////////////////////////
class GridBase : public CartesianCommunicator , public GridThread {
public:
int dummy;
// Give Lattice access
template<class object> friend class Lattice;
GridBase(const std::vector<int> & processor_grid) : CartesianCommunicator(processor_grid) {};
GridBase(const std::vector<int> & processor_grid,
const CartesianCommunicator &parent,
int &split_rank)
: CartesianCommunicator(processor_grid,parent,split_rank) {};
GridBase(const std::vector<int> & processor_grid,
const CartesianCommunicator &parent)
: CartesianCommunicator(processor_grid,parent,dummy) {};
virtual ~GridBase() = default;
// Physics Grid information.
std::vector<int> _simd_layout;// Which dimensions get laid out over simd lanes.
std::vector<int> _fdimensions;// (full) Global dimensions of array prior to cb removal
std::vector<int> _gdimensions;// Global dimensions of array after cb removal
std::vector<int> _ldimensions;// local dimensions of array with processor images removed
std::vector<int> _rdimensions;// Reduced local dimensions with simd lane images and processor images removed
std::vector<int> _ostride; // Outer stride for each dimension
std::vector<int> _istride; // Inner stride i.e. within simd lane
int _osites; // _isites*_osites = product(_ldimensions).
int _isites;
int _fsites; // product(_fdimensions): full global volume.
int _gsites; // product(_gdimensions): global volume after cb removal.
std::vector<int> _slice_block;// subslice information
std::vector<int> _slice_stride;
std::vector<int> _slice_nblock;
std::vector<int> _lstart; // local start of array in gcoor: _processor_coor[d]*_ldimensions[d]
std::vector<int> _lend ; // local end of array in gcoor: _processor_coor[d]*_ldimensions[d]+_ldimensions[d]-1
bool _isCheckerBoarded;
public:
////////////////////////////////////////////////////////////////
// Checkerboarding interface is virtual and overridden by
// GridCartesian / GridRedBlackCartesian
////////////////////////////////////////////////////////////////
virtual int CheckerBoarded(int dim)=0;
virtual int CheckerBoard(const std::vector<int> &site)=0;
virtual int CheckerBoardDestination(int source_cb,int shift,int dim)=0;
virtual int CheckerBoardShift(int source_cb,int dim,int shift,int osite)=0;
virtual int CheckerBoardShiftForCB(int source_cb,int dim,int shift,int cb)=0;
virtual int CheckerBoardFromOindex (int Oindex)=0;
virtual int CheckerBoardFromOindexTable (int Oindex)=0;
//////////////////////////////////////////////////////////////////////////////////////////////
// Local layout calculations
//////////////////////////////////////////////////////////////////////////////////////////////
// These routines are key. Subdivide the linearised cartesian index into
// "inner" index identifying which simd lane of object<vFcomplex> is associated with coord
// "outer" index identifying which element of _odata in class "Lattice" is associated with coord.
//
// Compared to, say, Blitz++ we simply need to store BOTH an inner stride and an outer
// stride per dimension. The cost of evaluating the indexing information is doubled for an n-dimensional
// coordinate. Note, however, for data parallel operations the "inner" indexing cost is not paid and all
// lanes are operated upon simultaneously.
virtual int oIndex(std::vector<int> &coor)
{
int idx=0;
// Works with either global or local coordinates
for(int d=0;d<_ndimension;d++) idx+=_ostride[d]*(coor[d]%_rdimensions[d]);
return idx;
}
virtual int iIndex(std::vector<int> &lcoor)
{
int idx=0;
for(int d=0;d<_ndimension;d++) idx+=_istride[d]*(lcoor[d]/_rdimensions[d]);
return idx;
}
inline int oIndexReduced(std::vector<int> &ocoor)
{
int idx=0;
// ocoor is already reduced so can eliminate the modulo operation
// for fast indexing and inline the routine
for(int d=0;d<_ndimension;d++) idx+=_ostride[d]*ocoor[d];
return idx;
}
inline void oCoorFromOindex (std::vector<int>& coor,int Oindex){
Lexicographic::CoorFromIndex(coor,Oindex,_rdimensions);
}
inline void InOutCoorToLocalCoor (std::vector<int> &ocoor, std::vector<int> &icoor, std::vector<int> &lcoor) {
lcoor.resize(_ndimension);
for (int d = 0; d < _ndimension; d++)
lcoor[d] = ocoor[d] + _rdimensions[d] * icoor[d];
}
//////////////////////////////////////////////////////////
// SIMD lane addressing
//////////////////////////////////////////////////////////
inline void iCoorFromIindex(std::vector<int> &coor,int lane)
{
Lexicographic::CoorFromIndex(coor,lane,_simd_layout);
}
inline int PermuteDim(int dimension){
return _simd_layout[dimension]>1;
}
inline int PermuteType(int dimension){
int permute_type=0;
//
// FIXME:
//
// Best way to encode this would be to present a mask
// for which simd dimensions are rotated, and the rotation
// size. If there is only one simd dimension rotated, this is just
// a permute.
//
// Cases: PermuteType == 1,2,4,8
// Distance should be either 0,1,2..
//
if ( _simd_layout[dimension] > 2 ) {
for(int d=0;d<_ndimension;d++){
if ( d != dimension ) assert ( (_simd_layout[d]==1) );
}
permute_type = RotateBit; // How to specify distance; this is not just direction.
return permute_type;
}
for(int d=_ndimension-1;d>dimension;d--){
if (_simd_layout[d]>1 ) permute_type++;
}
return permute_type;
}
////////////////////////////////////////////////////////////////
// Array sizing queries
////////////////////////////////////////////////////////////////
inline int iSites(void) const { return _isites; };
inline int Nsimd(void) const { return _isites; };// Synonymous with iSites
inline int oSites(void) const { return _osites; };
inline int lSites(void) const { return _isites*_osites; };
inline int gSites(void) const { return _isites*_osites*_Nprocessors; };
inline int Nd (void) const { return _ndimension;};
inline const std::vector<int> LocalStarts(void) { return _lstart; };
inline const std::vector<int> &FullDimensions(void) { return _fdimensions;};
inline const std::vector<int> &GlobalDimensions(void) { return _gdimensions;};
inline const std::vector<int> &LocalDimensions(void) { return _ldimensions;};
inline const std::vector<int> &VirtualLocalDimensions(void) { return _ldimensions;};
////////////////////////////////////////////////////////////////
// Utility to print the full decomposition details
////////////////////////////////////////////////////////////////
void show_decomposition(){
std::cout << GridLogMessage << "\tFull Dimensions : " << _fdimensions << std::endl;
std::cout << GridLogMessage << "\tSIMD layout : " << _simd_layout << std::endl;
std::cout << GridLogMessage << "\tGlobal Dimensions : " << _gdimensions << std::endl;
std::cout << GridLogMessage << "\tLocal Dimensions : " << _ldimensions << std::endl;
std::cout << GridLogMessage << "\tReduced Dimensions : " << _rdimensions << std::endl;
std::cout << GridLogMessage << "\tOuter strides : " << _ostride << std::endl;
std::cout << GridLogMessage << "\tInner strides : " << _istride << std::endl;
std::cout << GridLogMessage << "\tiSites : " << _isites << std::endl;
std::cout << GridLogMessage << "\toSites : " << _osites << std::endl;
std::cout << GridLogMessage << "\tlSites : " << lSites() << std::endl;
std::cout << GridLogMessage << "\tgSites : " << gSites() << std::endl;
std::cout << GridLogMessage << "\tNd : " << _ndimension << std::endl;
}
////////////////////////////////////////////////////////////////
// Global addressing
////////////////////////////////////////////////////////////////
void GlobalIndexToGlobalCoor(int gidx,std::vector<int> &gcoor){
assert(gidx< gSites());
Lexicographic::CoorFromIndex(gcoor,gidx,_gdimensions);
}
void LocalIndexToLocalCoor(int lidx,std::vector<int> &lcoor){
assert(lidx<lSites());
Lexicographic::CoorFromIndex(lcoor,lidx,_ldimensions);
}
void GlobalCoorToGlobalIndex(const std::vector<int> & gcoor,int & gidx){
gidx=0;
int mult=1;
for(int mu=0;mu<_ndimension;mu++) {
gidx+=mult*gcoor[mu];
mult*=_gdimensions[mu];
}
}
void GlobalCoorToProcessorCoorLocalCoor(std::vector<int> &pcoor,std::vector<int> &lcoor,const std::vector<int> &gcoor)
{
pcoor.resize(_ndimension);
lcoor.resize(_ndimension);
for(int mu=0;mu<_ndimension;mu++){
int _fld = _fdimensions[mu]/_processors[mu];
pcoor[mu] = gcoor[mu]/_fld;
lcoor[mu] = gcoor[mu]%_fld;
}
}
void GlobalCoorToRankIndex(int &rank, int &o_idx, int &i_idx ,const std::vector<int> &gcoor)
{
std::vector<int> pcoor;
std::vector<int> lcoor;
GlobalCoorToProcessorCoorLocalCoor(pcoor,lcoor,gcoor);
rank = RankFromProcessorCoor(pcoor);
/*
std::vector<int> cblcoor(lcoor);
for(int d=0;d<cblcoor.size();d++){
if( this->CheckerBoarded(d) ) {
cblcoor[d] = lcoor[d]/2;
}
}
*/
i_idx= iIndex(lcoor);
o_idx= oIndex(lcoor);
}
void RankIndexToGlobalCoor(int rank, int o_idx, int i_idx , std::vector<int> &gcoor)
{
gcoor.resize(_ndimension);
std::vector<int> coor(_ndimension);
ProcessorCoorFromRank(rank,coor);
for(int mu=0;mu<_ndimension;mu++) gcoor[mu] = _ldimensions[mu]*coor[mu];
iCoorFromIindex(coor,i_idx);
for(int mu=0;mu<_ndimension;mu++) gcoor[mu] += _rdimensions[mu]*coor[mu];
oCoorFromOindex (coor,o_idx);
for(int mu=0;mu<_ndimension;mu++) gcoor[mu] += coor[mu];
}
void RankIndexCbToFullGlobalCoor(int rank, int o_idx, int i_idx, int cb,std::vector<int> &fcoor)
{
RankIndexToGlobalCoor(rank,o_idx,i_idx ,fcoor);
if(CheckerBoarded(0)){
fcoor[0] = fcoor[0]*2+cb;
}
}
void ProcessorCoorLocalCoorToGlobalCoor(std::vector<int> &Pcoor,std::vector<int> &Lcoor,std::vector<int> &gcoor)
{
gcoor.resize(_ndimension);
for(int mu=0;mu<_ndimension;mu++) gcoor[mu] = Pcoor[mu]*_ldimensions[mu]+Lcoor[mu];
}
};
}
#endif
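The inner/outer split performed by `oIndex`, `iIndex` and `InOutCoorToLocalCoor` can be checked in isolation. The sketch below assumes a non-checkerboarded grid (as in `GridCartesian`) on a single rank; `Layout` and `toLocal` are hypothetical names. It builds the strides the same way `Init` does and checks that every local site maps to a unique (outer element, SIMD lane) pair and back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Coor = std::vector<int>;

// ld = local dims, simd = SIMD lanes per dim, rd = ld/simd ("reduced" dims).
struct Layout {
  Coor ld, simd, rd, ostride, istride;
  int osites = 1, isites = 1;

  Layout(const Coor& ldims, const Coor& simd_layout) : ld(ldims), simd(simd_layout) {
    const std::size_t nd = ld.size();
    rd.resize(nd); ostride.resize(nd); istride.resize(nd);
    for (std::size_t d = 0; d < nd; d++) {
      rd[d] = ld[d] / simd[d];
      assert(rd[d] * simd[d] == ld[d]);  // lanes must divide the local extent
      ostride[d] = (d == 0) ? 1 : ostride[d - 1] * rd[d - 1];
      istride[d] = (d == 0) ? 1 : istride[d - 1] * simd[d - 1];
      osites *= rd[d];
      isites *= simd[d];
    }
  }
  // Which element of the outer (vectorised) array holds local site l.
  int oIndex(const Coor& l) const {
    int idx = 0;
    for (std::size_t d = 0; d < l.size(); d++) idx += ostride[d] * (l[d] % rd[d]);
    return idx;
  }
  // Which SIMD lane inside that element holds local site l.
  int iIndex(const Coor& l) const {
    int idx = 0;
    for (std::size_t d = 0; d < l.size(); d++) idx += istride[d] * (l[d] / rd[d]);
    return idx;
  }
  // Inverse map, mirroring InOutCoorToLocalCoor: l = ocoor + rd * icoor.
  Coor toLocal(int oidx, int iidx) const {
    Coor l(ld.size());
    for (std::size_t d = 0; d < ld.size(); d++) {
      int o = (oidx / ostride[d]) % rd[d];
      int i = (iidx / istride[d]) % simd[d];
      l[d] = o + rd[d] * i;
    }
    return l;
  }
};
```

With a 4x4 local volume and two lanes in dimension 0 (`Layout L({4, 4}, {2, 1})`), sites `x` and `x+2` share an outer index and differ only in lane, which is why data-parallel operations never pay the inner indexing cost.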
@@ -0,0 +1,174 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cartesian/Cartesian_full.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CARTESIAN_FULL_H
#define GRID_CARTESIAN_FULL_H
namespace Grid{
/////////////////////////////////////////////////////////////////////////////////////////
// Grid Support.
/////////////////////////////////////////////////////////////////////////////////////////
class GridCartesian: public GridBase {
public:
int dummy;
virtual int CheckerBoardFromOindexTable (int Oindex) {
return 0;
}
virtual int CheckerBoardFromOindex (int Oindex)
{
return 0;
}
virtual int CheckerBoarded(int dim){
return 0;
}
virtual int CheckerBoard(const std::vector<int> &site){
return 0;
}
virtual int CheckerBoardDestination(int cb,int shift,int dim){
return 0;
}
virtual int CheckerBoardShiftForCB(int source_cb,int dim,int shift, int ocb){
return shift;
}
virtual int CheckerBoardShift(int source_cb,int dim,int shift, int osite){
return shift;
}
/////////////////////////////////////////////////////////////////////////
// Constructor takes a parent grid and possibly subdivides communicator.
/////////////////////////////////////////////////////////////////////////
GridCartesian(const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid,
const GridCartesian &parent) : GridBase(processor_grid,parent,dummy)
{
Init(dimensions,simd_layout,processor_grid);
}
GridCartesian(const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid,
const GridCartesian &parent,int &split_rank) : GridBase(processor_grid,parent,split_rank)
{
Init(dimensions,simd_layout,processor_grid);
}
/////////////////////////////////////////////////////////////////////////
// Construct from comm world
/////////////////////////////////////////////////////////////////////////
GridCartesian(const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid) : GridBase(processor_grid)
{
Init(dimensions,simd_layout,processor_grid);
}
virtual ~GridCartesian() = default;
void Init(const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid)
{
///////////////////////
// Grid information
///////////////////////
_isCheckerBoarded = false;
_ndimension = dimensions.size();
_fdimensions.resize(_ndimension);
_gdimensions.resize(_ndimension);
_ldimensions.resize(_ndimension);
_rdimensions.resize(_ndimension);
_simd_layout.resize(_ndimension);
_lstart.resize(_ndimension);
_lend.resize(_ndimension);
_ostride.resize(_ndimension);
_istride.resize(_ndimension);
_fsites = _gsites = _osites = _isites = 1;
for (int d = 0; d < _ndimension; d++)
{
_fdimensions[d] = dimensions[d]; // Full global dimensions
_gdimensions[d] = _fdimensions[d]; // Global dimensions (same as full: no checkerboarding)
_simd_layout[d] = simd_layout[d];
_fsites = _fsites * _fdimensions[d];
_gsites = _gsites * _gdimensions[d];
// Use a reduced simd grid
_ldimensions[d] = _gdimensions[d] / _processors[d]; //local dimensions
//std::cout << _ldimensions[d] << " " << _gdimensions[d] << " " << _processors[d] << std::endl;
assert(_ldimensions[d] * _processors[d] == _gdimensions[d]);
_rdimensions[d] = _ldimensions[d] / _simd_layout[d]; //overdecomposition
assert(_rdimensions[d] * _simd_layout[d] == _ldimensions[d]);
_lstart[d] = _processor_coor[d] * _ldimensions[d];
_lend[d] = _processor_coor[d] * _ldimensions[d] + _ldimensions[d] - 1;
_osites *= _rdimensions[d];
_isites *= _simd_layout[d];
// Addressing support
if (d == 0)
{
_ostride[d] = 1;
_istride[d] = 1;
}
else
{
_ostride[d] = _ostride[d - 1] * _rdimensions[d - 1];
_istride[d] = _istride[d - 1] * _simd_layout[d - 1];
}
}
///////////////////////
// subplane information
///////////////////////
_slice_block.resize(_ndimension);
_slice_stride.resize(_ndimension);
_slice_nblock.resize(_ndimension);
int block = 1;
int nblock = 1;
for (int d = 0; d < _ndimension; d++)
nblock *= _rdimensions[d];
for (int d = 0; d < _ndimension; d++)
{
nblock /= _rdimensions[d];
_slice_block[d] = block;
_slice_stride[d] = _ostride[d] * _rdimensions[d];
_slice_nblock[d] = nblock;
block = block * _rdimensions[d];
}
};
};
}
#endif
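The `_slice_block`, `_slice_stride` and `_slice_nblock` arrays computed at the end of `Init` can be read as describing a hyperplane of outer sites as `nblock` contiguous runs of `block` sites spaced `stride` apart. A sketch under that interpretation (the helper names `SliceInfo`, `makeSliceInfo` and `planeSites` are hypothetical) that rebuilds the arrays from the reduced dimensions and enumerates one plane:

```cpp
#include <cassert>
#include <vector>

// block  = product of reduced dims below d (length of a contiguous run)
// nblock = product of reduced dims above d (number of runs)
// stride = ostride[d]*rd[d] (distance between the starts of consecutive runs)
struct SliceInfo {
  std::vector<int> ostride, block, stride, nblock;
};

SliceInfo makeSliceInfo(const std::vector<int>& rd) {
  const int nd = static_cast<int>(rd.size());
  SliceInfo s{std::vector<int>(nd), std::vector<int>(nd), std::vector<int>(nd), std::vector<int>(nd)};
  int vol = 1;
  for (int d = 0; d < nd; d++) {
    s.ostride[d] = (d == 0) ? 1 : s.ostride[d - 1] * rd[d - 1];
    vol *= rd[d];
  }
  // Same recurrence as the "subplane information" loop in Init.
  int blk = 1, nblk = vol;
  for (int d = 0; d < nd; d++) {
    nblk /= rd[d];
    s.block[d] = blk;
    s.stride[d] = s.ostride[d] * rd[d];
    s.nblock[d] = nblk;
    blk *= rd[d];
  }
  return s;
}

// Outer-site indices whose reduced coordinate in dimension dim equals plane.
std::vector<int> planeSites(const SliceInfo& s, int dim, int plane) {
  std::vector<int> out;
  const int base = plane * s.ostride[dim];
  for (int n = 0; n < s.nblock[dim]; n++)
    for (int b = 0; b < s.block[dim]; b++)
      out.push_back(base + n * s.stride[dim] + b);
  return out;
}
```

For reduced dimensions `{2, 3, 4}`, plane 2 of dimension 1 consists of four runs of two sites: `4 5 10 11 16 17 22 23`.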
@@ -0,0 +1,320 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cartesian/Cartesian_red_black.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_CARTESIAN_RED_BLACK_H
#define GRID_CARTESIAN_RED_BLACK_H
namespace Grid {
static const int CbRed =0;
static const int CbBlack=1;
static const int Even =CbRed;
static const int Odd =CbBlack;
// Specialise this for red black grids storing half the data like a chess board.
class GridRedBlackCartesian : public GridBase
{
public:
std::vector<int> _checker_dim_mask;
int _checker_dim;
std::vector<int> _checker_board;
virtual int CheckerBoarded(int dim){
if( dim==_checker_dim) return 1;
else return 0;
}
virtual int CheckerBoard(const std::vector<int> &site){
int linear=0;
assert(site.size()==_ndimension);
for(int d=0;d<_ndimension;d++){
if(_checker_dim_mask[d])
linear=linear+site[d];
}
return (linear&0x1);
}
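// Convert a shift along the checkerboarded dimension (in full coordinates)
// into a shift along the half-length stored dimension. Whether the halved
// shift rounds down or up depends on the parity of source_cb+ocb.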
virtual int CheckerBoardShiftForCB(int source_cb,int dim,int shift,int ocb){
if(dim != _checker_dim) return shift;
int fulldim =_fdimensions[dim];
shift = (shift+fulldim)%fulldim;
// Probably faster with table lookup;
// or by looping over x,y,z and multiply rather than computing checkerboard.
if ( (source_cb+ocb)&1 ) {
return (shift)/2;
} else {
return (shift+1)/2;
}
}
virtual int CheckerBoardFromOindexTable (int Oindex) {
return _checker_board[Oindex];
}
virtual int CheckerBoardFromOindex (int Oindex)
{
std::vector<int> ocoor;
oCoorFromOindex(ocoor,Oindex);
return CheckerBoard(ocoor);
}
virtual int CheckerBoardShift(int source_cb,int dim,int shift,int osite){
if(dim != _checker_dim) return shift;
int ocb=CheckerBoardFromOindex(osite);
return CheckerBoardShiftForCB(source_cb,dim,shift,ocb);
}
virtual int CheckerBoardDestination(int source_cb,int shift,int dim){
if ( _checker_dim_mask[dim] ) {
// If _fdimensions[checker_dim] is odd, then shifting by 1 in other dims
// does NOT cause a parity hop.
int add=(dim==_checker_dim) ? 0 : _fdimensions[_checker_dim];
if ( (shift+add) &0x1) {
return 1-source_cb;
} else {
return source_cb;
}
} else {
return source_cb;
}
};
////////////////////////////////////////////////////////////
// Create Redblack from original grid; require full grid pointer ?
////////////////////////////////////////////////////////////
GridRedBlackCartesian(const GridBase *base) : GridBase(base->_processors,*base)
{
int dims = base->_ndimension;
std::vector<int> checker_dim_mask(dims,1);
int checker_dim = 0;
Init(base->_fdimensions,base->_simd_layout,base->_processors,checker_dim_mask,checker_dim);
};
////////////////////////////////////////////////////////////
// Create redblack from original grid, with non-trivial checker dim mask
////////////////////////////////////////////////////////////
GridRedBlackCartesian(const GridBase *base,
const std::vector<int> &checker_dim_mask,
int checker_dim
) : GridBase(base->_processors,*base)
{
Init(base->_fdimensions,base->_simd_layout,base->_processors,checker_dim_mask,checker_dim) ;
}
virtual ~GridRedBlackCartesian() = default;
#if 0
////////////////////////////////////////////////////////////
// Create redblack grid; deprecate these. There should be no
// need to create a redblack grid directly without a full grid to base it on
////////////////////////////////////////////////////////////
GridRedBlackCartesian(const GridBase *base,
const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid,
const std::vector<int> &checker_dim_mask,
int checker_dim
) : GridBase(processor_grid,*base)
{
Init(dimensions,simd_layout,processor_grid,checker_dim_mask,checker_dim);
}
////////////////////////////////////////////////////////////
// Create redblack grid
////////////////////////////////////////////////////////////
GridRedBlackCartesian(const GridBase *base,
const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid) : GridBase(processor_grid,*base)
{
std::vector<int> checker_dim_mask(dimensions.size(),1);
int checker_dim = 0;
Init(dimensions,simd_layout,processor_grid,checker_dim_mask,checker_dim);
}
#endif
void Init(const std::vector<int> &dimensions,
const std::vector<int> &simd_layout,
const std::vector<int> &processor_grid,
const std::vector<int> &checker_dim_mask,
int checker_dim)
{
_isCheckerBoarded = true;
_checker_dim = checker_dim;
assert(checker_dim_mask[checker_dim] == 1);
_ndimension = dimensions.size();
assert(checker_dim_mask.size() == _ndimension);
assert(processor_grid.size() == _ndimension);
assert(simd_layout.size() == _ndimension);
_fdimensions.resize(_ndimension);
_gdimensions.resize(_ndimension);
_ldimensions.resize(_ndimension);
_rdimensions.resize(_ndimension);
_simd_layout.resize(_ndimension);
_lstart.resize(_ndimension);
_lend.resize(_ndimension);
_ostride.resize(_ndimension);
_istride.resize(_ndimension);
_fsites = _gsites = _osites = _isites = 1;
_checker_dim_mask = checker_dim_mask;
for (int d = 0; d < _ndimension; d++)
{
_fdimensions[d] = dimensions[d];
_gdimensions[d] = _fdimensions[d];
_fsites = _fsites * _fdimensions[d];
_gsites = _gsites * _gdimensions[d];
if (d == _checker_dim)
{
assert((_gdimensions[d] & 0x1) == 0);
_gdimensions[d] = _gdimensions[d] / 2; // Remove a checkerboard
_gsites /= 2;
}
_ldimensions[d] = _gdimensions[d] / _processors[d];
assert(_ldimensions[d] * _processors[d] == _gdimensions[d]);
_lstart[d] = _processor_coor[d] * _ldimensions[d];
_lend[d] = _processor_coor[d] * _ldimensions[d] + _ldimensions[d] - 1;
// Use a reduced simd grid
_simd_layout[d] = simd_layout[d];
_rdimensions[d] = _ldimensions[d] / _simd_layout[d]; // integer division; divisibility is checked by the assert below
assert(_rdimensions[d] * _simd_layout[d] == _ldimensions[d]);
assert(_rdimensions[d] > 0);
// all elements of a simd vector must have same checkerboard.
// If Ls vectorised, this must still be the case; e.g. dwf rb5d
if (_simd_layout[d] > 1)
{
if (checker_dim_mask[d])
{
assert((_rdimensions[d] & 0x1) == 0);
}
}
_osites *= _rdimensions[d];
_isites *= _simd_layout[d];
// Addressing support
if (d == 0)
{
_ostride[d] = 1;
_istride[d] = 1;
}
else
{
_ostride[d] = _ostride[d - 1] * _rdimensions[d - 1];
_istride[d] = _istride[d - 1] * _simd_layout[d - 1];
}
}
////////////////////////////////////////////////////////////////////////////////////////////
// subplane information
////////////////////////////////////////////////////////////////////////////////////////////
_slice_block.resize(_ndimension);
_slice_stride.resize(_ndimension);
_slice_nblock.resize(_ndimension);
int block = 1;
int nblock = 1;
for (int d = 0; d < _ndimension; d++)
nblock *= _rdimensions[d];
for (int d = 0; d < _ndimension; d++)
{
nblock /= _rdimensions[d];
_slice_block[d] = block;
_slice_stride[d] = _ostride[d] * _rdimensions[d];
_slice_nblock[d] = nblock;
block = block * _rdimensions[d];
}
////////////////////////////////////////////////
// Create a checkerboard lookup table
////////////////////////////////////////////////
int rvol = 1;
for (int d = 0; d < _ndimension; d++)
{
rvol = rvol * _rdimensions[d];
}
_checker_board.resize(rvol);
for (int osite = 0; osite < _osites; osite++)
{
_checker_board[osite] = CheckerBoardFromOindex(osite);
}
};
protected:
virtual int oIndex(std::vector<int> &coor)
{
int idx = 0;
for (int d = 0; d < _ndimension; d++)
{
if (d == _checker_dim)
{
idx += _ostride[d] * ((coor[d] / 2) % _rdimensions[d]);
}
else
{
idx += _ostride[d] * (coor[d] % _rdimensions[d]);
}
}
return idx;
};
virtual int iIndex(std::vector<int> &lcoor)
{
int idx = 0;
for (int d = 0; d < _ndimension; d++)
{
if (d == _checker_dim)
{
idx += _istride[d] * (lcoor[d] / (2 * _rdimensions[d]));
}
else
{
idx += _istride[d] * (lcoor[d] / _rdimensions[d]);
}
}
return idx;
}
};
}
#endif
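A small sketch of the red-black storage convention: `checkerBoard` follows `GridRedBlackCartesian::CheckerBoard`, and halving the checkerboarded coordinate is what `oIndex` does for `_checker_dim`. The inverse `fromHalf` is a hypothetical helper, assuming parity is taken over the masked dimensions as in `CheckerBoard`. It shows why halving loses no information: once the parity is known, the low bit of the halved coordinate is fixed by the other masked coordinates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Coor = std::vector<int>;

// Parity of a site summed over the masked dimensions.
int checkerBoard(const Coor& site, const Coor& mask) {
  int linear = 0;
  for (std::size_t d = 0; d < site.size(); d++)
    if (mask[d]) linear += site[d];
  return linear & 0x1;
}

// Half-lattice coordinate: halve the checkerboarded dimension cd.
Coor toHalf(const Coor& full, int cd) {
  Coor h(full);
  h[cd] /= 2;
  return h;
}

// Hypothetical inverse of toHalf for a site of known parity cb: the low bit
// of the full cd coordinate is cb minus the other masked coordinates (mod 2).
Coor fromHalf(const Coor& half, int cb, const Coor& mask, int cd) {
  Coor full(half);
  int others = 0;
  for (std::size_t d = 0; d < half.size(); d++)
    if (mask[d] && static_cast<int>(d) != cd) others += half[d];
  full[cd] = 2 * half[cd] + ((cb + others) & 0x1);
  return full;
}
```

On a 4x4 lattice with both dimensions masked and `cd = 0`, each parity holds exactly 8 sites, and within a parity the map `(x, y) -> (x/2, y)` is one-to-one.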
@@ -0,0 +1,34 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Communicator.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_COMMUNICATOR_H
#define GRID_COMMUNICATOR_H
#include <Grid/communicator/SharedMemory.h>
#include <Grid/communicator/Communicator_base.h>
#endif
@@ -0,0 +1,76 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/Communicator_none.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <sys/mman.h>
namespace Grid {
///////////////////////////////////////////////////////////////
// Info that is set up once and independent of cartesian layout
///////////////////////////////////////////////////////////////
CartesianCommunicator::CommunicatorPolicy_t
CartesianCommunicator::CommunicatorPolicy= CartesianCommunicator::CommunicatorPolicyConcurrent;
int CartesianCommunicator::nCommThreads = -1;
/////////////////////////////////
// Grid information queries
/////////////////////////////////
int CartesianCommunicator::Dimensions(void) { return _ndimension; };
int CartesianCommunicator::IsBoss(void) { return _processor==0; };
int CartesianCommunicator::BossRank(void) { return 0; };
int CartesianCommunicator::ThisRank(void) { return _processor; };
const std::vector<int> & CartesianCommunicator::ThisProcessorCoor(void) { return _processor_coor; };
const std::vector<int> & CartesianCommunicator::ProcessorGrid(void) { return _processors; };
int CartesianCommunicator::ProcessorCount(void) { return _Nprocessors; };
////////////////////////////////////////////////////////////////////////////////
// very VERY rarely (Log, serial RNG) we need world without a grid
////////////////////////////////////////////////////////////////////////////////
void CartesianCommunicator::GlobalSum(ComplexF &c)
{
GlobalSumVector((float *)&c,2);
}
void CartesianCommunicator::GlobalSumVector(ComplexF *c,int N)
{
GlobalSumVector((float *)c,2*N);
}
void CartesianCommunicator::GlobalSum(ComplexD &c)
{
GlobalSumVector((double *)&c,2);
}
void CartesianCommunicator::GlobalSumVector(ComplexD *c,int N)
{
GlobalSumVector((double *)c,2*N);
}
}
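The complex overloads above forward to the real-valued GlobalSumVector with twice the element count, relying on each complex number being stored as an adjacent (real, imaginary) pair. The serial sketch below illustrates that idea under two assumptions: ComplexF is std::complex<float> (whose array layout as float[2] is guaranteed since C++11), and MPI_Allreduce is replaced by an in-process loop over per-rank buffers. SerialAllreduceSum and SerialAllreduceSumComplex are hypothetical names, not Grid APIs.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical serial stand-in for MPI_Allreduce(MPI_IN_PLACE, ..., MPI_SUM):
// every "rank" buffer is overwritten with the element-wise sum over all ranks.
void SerialAllreduceSum(const std::vector<float *> &rank_bufs, int n)
{
  std::vector<float> total(n, 0.0f);
  for (float *buf : rank_bufs)
    for (int i = 0; i < n; i++) total[i] += buf[i];
  for (float *buf : rank_bufs)
    for (int i = 0; i < n; i++) buf[i] = total[i];
}

// Complex reduction expressed as a real reduction over 2N floats, mirroring
// GlobalSumVector((float *)c, 2*N). Each std::complex<float> is laid out as
// float[2] = {real, imaginary}, so the cast is well defined.
void SerialAllreduceSumComplex(const std::vector<std::complex<float> *> &rank_bufs, int n)
{
  std::vector<float *> as_real;
  for (std::complex<float> *c : rank_bufs)
    as_real.push_back(reinterpret_cast<float *>(c));
  SerialAllreduceSum(as_real, 2 * n);
}
```

Because the reduction is element-wise, summing the 2N interleaved floats yields exactly the N complex sums; the same reasoning covers the double/ComplexD pair.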
@@ -0,0 +1,207 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/Communicator_base.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_COMMUNICATOR_BASE_H
#define GRID_COMMUNICATOR_BASE_H
///////////////////////////////////
// Processor layout information
///////////////////////////////////
#include <Grid/communicator/SharedMemory.h>
namespace Grid {
class CartesianCommunicator : public SharedMemory {
public:
////////////////////////////////////////////
// Policies
////////////////////////////////////////////
enum CommunicatorPolicy_t { CommunicatorPolicyConcurrent, CommunicatorPolicySequential };
static CommunicatorPolicy_t CommunicatorPolicy;
static void SetCommunicatorPolicy(CommunicatorPolicy_t policy ) { CommunicatorPolicy = policy; }
static int nCommThreads;
////////////////////////////////////////////
// Communicator should know nothing of the physics grid, only processor grid.
////////////////////////////////////////////
int _Nprocessors; // How many in all
std::vector<int> _processors; // Number of processors in each dimension of the processor grid
int _processor; // linear processor rank
std::vector<int> _processor_coor; // Cartesian coordinate of this processor in the processor grid
unsigned long _ndimension;
static Grid_MPI_Comm communicator_world;
Grid_MPI_Comm communicator;
std::vector<Grid_MPI_Comm> communicator_halo;
////////////////////////////////////////////////
// Must be called during Grid startup
////////////////////////////////////////////////
static void Init(int *argc, char ***argv);
////////////////////////////////////////////////
// Constructors to sub-divide a parent communicator
// and default to comm world
////////////////////////////////////////////////
CartesianCommunicator(const std::vector<int> &processors,const CartesianCommunicator &parent,int &srank);
CartesianCommunicator(const std::vector<int> &pdimensions_in);
virtual ~CartesianCommunicator();
private:
////////////////////////////////////////////////
// Private initialise from an MPI communicator
// Can use after an MPI_Comm_split, but hidden from user so private
////////////////////////////////////////////////
void InitFromMPICommunicator(const std::vector<int> &processors, Grid_MPI_Comm communicator_base);
public:
////////////////////////////////////////////////////////////////////////////////////////
// Wraps MPI_Cart routines, or implements equivalent on other impls
////////////////////////////////////////////////////////////////////////////////////////
void ShiftedRanks(int dim,int shift,int & source, int & dest);
int RankFromProcessorCoor(std::vector<int> &coor);
void ProcessorCoorFromRank(int rank,std::vector<int> &coor);
int Dimensions(void) ;
int IsBoss(void) ;
int BossRank(void) ;
int ThisRank(void) ;
const std::vector<int> & ThisProcessorCoor(void) ;
const std::vector<int> & ProcessorGrid(void) ;
int ProcessorCount(void) ;
////////////////////////////////////////////////////////////////////////////////
// very VERY rarely (Log, serial RNG) we need world without a grid
////////////////////////////////////////////////////////////////////////////////
static int RankWorld(void) ;
static void BroadcastWorld(int root,void* data, int bytes);
////////////////////////////////////////////////////////////
// Reduction
////////////////////////////////////////////////////////////
void GlobalSum(RealF &);
void GlobalSumVector(RealF *,int N);
void GlobalSum(RealD &);
void GlobalSumVector(RealD *,int N);
void GlobalSum(uint32_t &);
void GlobalSum(uint64_t &);
void GlobalSum(ComplexF &c);
void GlobalSumVector(ComplexF *c,int N);
void GlobalSum(ComplexD &c);
void GlobalSumVector(ComplexD *c,int N);
void GlobalXOR(uint32_t &);
void GlobalXOR(uint64_t &);
template<class obj> void GlobalSum(obj &o){
typedef typename obj::scalar_type scalar_type;
int words = sizeof(obj)/sizeof(scalar_type);
scalar_type * ptr = (scalar_type *)& o;
GlobalSumVector(ptr,words);
}
////////////////////////////////////////////////////////////
// Face exchange: buffer swap in a translationally invariant way
////////////////////////////////////////////////////////////
void SendToRecvFrom(void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes);
void SendRecvPacket(void *xmit,
void *recv,
int xmit_to_rank,
int recv_from_rank,
int bytes);
void SendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes);
void SendToRecvFromComplete(std::vector<CommsRequest_t> &waitall);
double StencilSendToRecvFrom(void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes,int dir);
double StencilSendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes,int dir);
void StencilSendToRecvFromComplete(std::vector<CommsRequest_t> &waitall,int i);
void StencilBarrier(void);
////////////////////////////////////////////////////////////
// Barrier
////////////////////////////////////////////////////////////
void Barrier(void);
////////////////////////////////////////////////////////////
// Broadcast a raw buffer, or a larger composite object
////////////////////////////////////////////////////////////
void Broadcast(int root,void* data, int bytes);
////////////////////////////////////////////////////////////
// All2All down one dimension
////////////////////////////////////////////////////////////
template<class T> void AllToAll(int dim,std::vector<T> &in, std::vector<T> &out){
assert(dim>=0);
assert(dim<_ndimension);
assert(in.size()==out.size());
int numnode = _processors[dim];
uint64_t bytes=sizeof(T);
uint64_t words=in.size()/numnode;
assert(numnode * words == in.size());
assert(words < (1ULL<<31));
AllToAll(dim,(void *)&in[0],(void *)&out[0],words,bytes);
}
void AllToAll(int dim ,void *in,void *out,uint64_t words,uint64_t bytes);
void AllToAll(void *in,void *out,uint64_t words ,uint64_t bytes);
template<class obj> void Broadcast(int root,obj &data)
{
Broadcast(root,(void *)&data,sizeof(data));
};
};
}
#endif
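The AllToAll template above splits each buffer into _processors[dim] equal blocks of `words` elements and hands them to MPI_Alltoall, so block j of a rank's input lands in the slot of rank j's output that belongs to the sender. The serial model below sketches that data movement, assuming all ranks pass equal-sized buffers; SerialAllToAll is a hypothetical helper, not part of Grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of an all-to-all exchange among in.size() ranks.
// in[src] is rank src's send buffer of numnode*words elements: block dst of it
// goes to rank dst. Block src of out[dst] holds what rank src sent to dst.
template <class T>
std::vector<std::vector<T>> SerialAllToAll(const std::vector<std::vector<T>> &in)
{
  const std::size_t numnode = in.size();
  assert(numnode > 0);
  const std::size_t words = in[0].size() / numnode;
  std::vector<std::vector<T>> out(numnode, std::vector<T>(numnode * words));
  for (std::size_t src = 0; src < numnode; src++) {
    // Same divisibility requirement the AllToAll template asserts
    assert(in[src].size() == numnode * words);
    for (std::size_t dst = 0; dst < numnode; dst++)
      for (std::size_t k = 0; k < words; k++)
        out[dst][src * words + k] = in[src][dst * words + k];
  }
  return out;
}
```

The exchange is a block transpose, so applying it twice restores the original buffers, which makes a convenient sanity check.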
@@ -0,0 +1,514 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/Communicator_mpi.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
#include <Grid/communicator/SharedMemory.h>
namespace Grid {
Grid_MPI_Comm CartesianCommunicator::communicator_world;
////////////////////////////////////////////
// First initialise of comms system
////////////////////////////////////////////
void CartesianCommunicator::Init(int *argc, char ***argv)
{
int flag;
int provided;
MPI_Initialized(&flag); // lets Grid coexist with other libraries that may already have initialised MPI
if ( !flag ) {
MPI_Init_thread(argc,argv,MPI_THREAD_MULTIPLE,&provided);
// With one comms thread any level above MPI_THREAD_SINGLE suffices; with more than one, MPI_THREAD_MULTIPLE is required
if( (nCommThreads == 1 && provided == MPI_THREAD_SINGLE) ||
(nCommThreads > 1 && provided != MPI_THREAD_MULTIPLE) )
assert(0);
}
Grid_quiesce_nodes();
// Never freed, since this is done only once.
MPI_Comm_dup (MPI_COMM_WORLD,&communicator_world);
GlobalSharedMemory::Init(communicator_world);
GlobalSharedMemory::SharedMemoryAllocate(
GlobalSharedMemory::MAX_MPI_SHM_BYTES,
GlobalSharedMemory::Hugepages);
}
///////////////////////////////////////////////////////////////////////////
// Use cartesian communicators now even in MPI3
///////////////////////////////////////////////////////////////////////////
void CartesianCommunicator::ShiftedRanks(int dim,int shift,int &source,int &dest)
{
int ierr=MPI_Cart_shift(communicator,dim,shift,&source,&dest);
assert(ierr==0);
}
int CartesianCommunicator::RankFromProcessorCoor(std::vector<int> &coor)
{
int rank;
int ierr=MPI_Cart_rank (communicator, &coor[0], &rank);
assert(ierr==0);
return rank;
}
void CartesianCommunicator::ProcessorCoorFromRank(int rank, std::vector<int> &coor)
{
coor.resize(_ndimension);
int ierr=MPI_Cart_coords (communicator, rank, _ndimension,&coor[0]);
assert(ierr==0);
}
////////////////////////////////////////////////////////////////////////////////////////////////////////
// Initialises from communicator_world
////////////////////////////////////////////////////////////////////////////////////////////////////////
CartesianCommunicator::CartesianCommunicator(const std::vector<int> &processors)
{
MPI_Comm optimal_comm;
////////////////////////////////////////////////////
// Remap using the shared memory optimising routine
// The remap creates a comm which must be freed
////////////////////////////////////////////////////
GlobalSharedMemory::OptimalCommunicator (processors,optimal_comm);
InitFromMPICommunicator(processors,optimal_comm);
SetCommunicator(optimal_comm);
///////////////////////////////////////////////////
// Free the temp communicator
///////////////////////////////////////////////////
MPI_Comm_free(&optimal_comm);
}
//////////////////////////////////
// Try to subdivide communicator
//////////////////////////////////
CartesianCommunicator::CartesianCommunicator(const std::vector<int> &processors,const CartesianCommunicator &parent,int &srank)
{
_ndimension = processors.size();
int parent_ndimension = parent._ndimension; assert(_ndimension >= parent._ndimension);
std::vector<int> parent_processor_coor(_ndimension,0);
std::vector<int> parent_processors (_ndimension,1);
// Can make 5d grid from 4d etc...
int pad = _ndimension-parent_ndimension;
for(int d=0;d<parent_ndimension;d++){
parent_processor_coor[pad+d]=parent._processor_coor[d];
parent_processors [pad+d]=parent._processors[d];
}
//////////////////////////////////////////////////////////////////////////////////////////////////////
// split the communicator
//////////////////////////////////////////////////////////////////////////////////////////////////////
// int Nparent = parent._processors ;
// std::cout << " splitting from communicator "<<parent.communicator <<std::endl;
int Nparent;
MPI_Comm_size(parent.communicator,&Nparent);
// std::cout << " Parent size "<<Nparent <<std::endl;
int childsize=1;
for(int d=0;d<processors.size();d++) {
childsize *= processors[d];
}
int Nchild = Nparent/childsize;
assert (childsize * Nchild == Nparent);
// std::cout << " child size "<<childsize <<std::endl;
std::vector<int> ccoor(_ndimension); // coor within subcommunicator
std::vector<int> scoor(_ndimension); // coor of split within parent
std::vector<int> ssize(_ndimension); // number of splits in each dimension
for(int d=0;d<_ndimension;d++){
ccoor[d] = parent_processor_coor[d] % processors[d];
scoor[d] = parent_processor_coor[d] / processors[d];
ssize[d] = parent_processors[d] / processors[d];
}
// crank is the rank within the subcommunicator; srank is the index of the subcommunicator among all the splits
int crank;
// MPI uses the reverse lexicographic convention to ours, so the reversed routines are called
Lexicographic::IndexFromCoorReversed(ccoor,crank,processors); // processors is the split grid dimensions
Lexicographic::IndexFromCoorReversed(scoor,srank,ssize); // ssize is the number of split grids
MPI_Comm comm_split;
if ( Nchild > 1 ) {
if(0){
std::cout << GridLogMessage<<"Child communicator of "<< std::hex << parent.communicator << std::dec<<std::endl;
std::cout << GridLogMessage<<" parent grid["<< parent._ndimension<<"] ";
for(int d=0;d<parent._ndimension;d++) std::cout << parent._processors[d] << " ";
std::cout<<std::endl;
std::cout << GridLogMessage<<" child grid["<< _ndimension <<"] ";
for(int d=0;d<processors.size();d++) std::cout << processors[d] << " ";
std::cout<<std::endl;
std::cout << GridLogMessage<<" old rank "<< parent._processor<<" coor ["<< parent._ndimension <<"] ";
for(int d=0;d<parent._ndimension;d++) std::cout << parent._processor_coor[d] << " ";
std::cout<<std::endl;
std::cout << GridLogMessage<<" new split "<< srank<<" scoor ["<< _ndimension <<"] ";
for(int d=0;d<processors.size();d++) std::cout << scoor[d] << " ";
std::cout<<std::endl;
std::cout << GridLogMessage<<" new rank "<< crank<<" coor ["<< _ndimension <<"] ";
for(int d=0;d<processors.size();d++) std::cout << ccoor[d] << " ";
std::cout<<std::endl;
//////////////////////////////////////////////////////////////////////////////////////////////////////
// Declare victory
//////////////////////////////////////////////////////////////////////////////////////////////////////
std::cout << GridLogMessage<<"Divided communicator "<< parent._Nprocessors<<" into "
<< Nchild <<" communicators with " << childsize << " ranks"<<std::endl;
std::cout << " Split communicator " <<comm_split <<std::endl;
}
////////////////////////////////////////////////////////////////
// Split the communicator
////////////////////////////////////////////////////////////////
int ierr= MPI_Comm_split(parent.communicator,srank,crank,&comm_split);
assert(ierr==0);
} else {
srank = 0;
int ierr = MPI_Comm_dup (parent.communicator,&comm_split);
assert(ierr==0);
}
//////////////////////////////////////////////////////////////////////////////////////////////////////
// Set up from the new split communicator
//////////////////////////////////////////////////////////////////////////////////////////////////////
InitFromMPICommunicator(processors,comm_split);
//////////////////////////////////////////////////////////////////////////////////////////////////////
// Take the right SHM buffers
//////////////////////////////////////////////////////////////////////////////////////////////////////
SetCommunicator(comm_split);
///////////////////////////////////////////////
// Free the temp communicator
///////////////////////////////////////////////
MPI_Comm_free(&comm_split);
if(0){
std::cout << " ndim " <<_ndimension<<" " << parent._ndimension << std::endl;
for(int d=0;d<processors.size();d++){
std::cout << d<< " " << _processor_coor[d] <<" " << ccoor[d]<<std::endl;
}
}
for(int d=0;d<processors.size();d++){
assert(_processor_coor[d] == ccoor[d] );
}
}
void CartesianCommunicator::InitFromMPICommunicator(const std::vector<int> &processors, MPI_Comm communicator_base)
{
////////////////////////////////////////////////////
// Creates communicator, and the communicator_halo
////////////////////////////////////////////////////
_ndimension = processors.size();
_processor_coor.resize(_ndimension);
/////////////////////////////////
// Count the requested nodes
/////////////////////////////////
_Nprocessors=1;
_processors = processors;
for(int i=0;i<_ndimension;i++){
_Nprocessors*=_processors[i];
}
std::vector<int> periodic(_ndimension,1);
MPI_Cart_create(communicator_base, _ndimension,&_processors[0],&periodic[0],0,&communicator);
MPI_Comm_rank(communicator,&_processor);
MPI_Cart_coords(communicator,_processor,_ndimension,&_processor_coor[0]);
if ( 0 && (communicator_base != communicator_world) ) {
std::cout << "InitFromMPICommunicator Cartesian communicator created with a non-world communicator"<<std::endl;
std::cout << " new communicator rank "<<_processor<< " coor ["<<_ndimension<<"] ";
for(int d=0;d<_processors.size();d++){
std::cout << _processor_coor[d]<<" ";
}
std::cout << std::endl;
}
int Size;
MPI_Comm_size(communicator,&Size);
communicator_halo.resize (2*_ndimension);
for(int i=0;i<_ndimension*2;i++){
MPI_Comm_dup(communicator,&communicator_halo[i]);
}
assert(Size==_Nprocessors);
}
CartesianCommunicator::~CartesianCommunicator()
{
int MPI_is_finalised;
MPI_Finalized(&MPI_is_finalised);
if (communicator && !MPI_is_finalised) {
MPI_Comm_free(&communicator);
for(int i=0;i<communicator_halo.size();i++){
MPI_Comm_free(&communicator_halo[i]);
}
}
}
void CartesianCommunicator::GlobalSum(uint32_t &u){
int ierr=MPI_Allreduce(MPI_IN_PLACE,&u,1,MPI_UINT32_T,MPI_SUM,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalSum(uint64_t &u){
int ierr=MPI_Allreduce(MPI_IN_PLACE,&u,1,MPI_UINT64_T,MPI_SUM,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalXOR(uint32_t &u){
int ierr=MPI_Allreduce(MPI_IN_PLACE,&u,1,MPI_UINT32_T,MPI_BXOR,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalXOR(uint64_t &u){
int ierr=MPI_Allreduce(MPI_IN_PLACE,&u,1,MPI_UINT64_T,MPI_BXOR,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalSum(float &f){
int ierr=MPI_Allreduce(MPI_IN_PLACE,&f,1,MPI_FLOAT,MPI_SUM,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalSumVector(float *f,int N)
{
int ierr=MPI_Allreduce(MPI_IN_PLACE,f,N,MPI_FLOAT,MPI_SUM,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalSum(double &d)
{
int ierr = MPI_Allreduce(MPI_IN_PLACE,&d,1,MPI_DOUBLE,MPI_SUM,communicator);
assert(ierr==0);
}
void CartesianCommunicator::GlobalSumVector(double *d,int N)
{
int ierr = MPI_Allreduce(MPI_IN_PLACE,d,N,MPI_DOUBLE,MPI_SUM,communicator);
assert(ierr==0);
}
// Basic Halo comms primitive
void CartesianCommunicator::SendToRecvFrom(void *xmit,
int dest,
void *recv,
int from,
int bytes)
{
std::vector<CommsRequest_t> reqs(0);
// unsigned long xcrc = crc32(0L, Z_NULL, 0);
// unsigned long rcrc = crc32(0L, Z_NULL, 0);
// xcrc = crc32(xcrc,(unsigned char *)xmit,bytes);
SendToRecvFromBegin(reqs,xmit,dest,recv,from,bytes);
SendToRecvFromComplete(reqs);
// rcrc = crc32(rcrc,(unsigned char *)recv,bytes);
// printf("proc %d SendToRecvFrom %d bytes %lx %lx\n",_processor,bytes,xcrc,rcrc);
}
void CartesianCommunicator::SendRecvPacket(void *xmit,
void *recv,
int sender,
int receiver,
int bytes)
{
MPI_Status stat;
assert(sender != receiver);
int tag = sender;
if ( _processor == sender ) {
MPI_Send(xmit, bytes, MPI_CHAR,receiver,tag,communicator);
}
if ( _processor == receiver ) {
MPI_Recv(recv, bytes, MPI_CHAR,sender,tag,communicator,&stat);
}
}
// Basic Halo comms primitive
void CartesianCommunicator::SendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int dest,
void *recv,
int from,
int bytes)
{
int myrank = _processor;
int ierr;
if ( CommunicatorPolicy == CommunicatorPolicyConcurrent ) {
MPI_Request xrq;
MPI_Request rrq;
ierr =MPI_Irecv(recv, bytes, MPI_CHAR,from,from,communicator,&rrq);
ierr|=MPI_Isend(xmit, bytes, MPI_CHAR,dest,_processor,communicator,&xrq);
assert(ierr==0);
list.push_back(xrq);
list.push_back(rrq);
} else {
// Give the CPU to MPI immediately; can use threads to overlap optionally
ierr=MPI_Sendrecv(xmit,bytes,MPI_CHAR,dest,myrank,
recv,bytes,MPI_CHAR,from, from,
communicator,MPI_STATUS_IGNORE);
assert(ierr==0);
}
}
double CartesianCommunicator::StencilSendToRecvFrom( void *xmit,
int dest,
void *recv,
int from,
int bytes,int dir)
{
std::vector<CommsRequest_t> list;
double offbytes = StencilSendToRecvFromBegin(list,xmit,dest,recv,from,bytes,dir);
StencilSendToRecvFromComplete(list,dir);
return offbytes;
}
double CartesianCommunicator::StencilSendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int dest,
void *recv,
int from,
int bytes,int dir)
{
int ncomm =communicator_halo.size();
int commdir=dir%ncomm;
MPI_Request xrq;
MPI_Request rrq;
int ierr;
int gdest = ShmRanks[dest];
int gfrom = ShmRanks[from];
int gme = ShmRanks[_processor];
assert(dest != _processor);
assert(from != _processor);
assert(gme == ShmRank);
double off_node_bytes=0.0;
if ( gfrom ==MPI_UNDEFINED) {
ierr=MPI_Irecv(recv, bytes, MPI_CHAR,from,from,communicator_halo[commdir],&rrq);
assert(ierr==0);
list.push_back(rrq);
off_node_bytes+=bytes;
}
if ( gdest == MPI_UNDEFINED ) {
ierr =MPI_Isend(xmit, bytes, MPI_CHAR,dest,_processor,communicator_halo[commdir],&xrq);
assert(ierr==0);
list.push_back(xrq);
off_node_bytes+=bytes;
}
if ( CommunicatorPolicy == CommunicatorPolicySequential ) {
this->StencilSendToRecvFromComplete(list,dir);
}
return off_node_bytes;
}
void CartesianCommunicator::StencilSendToRecvFromComplete(std::vector<CommsRequest_t> &waitall,int dir)
{
SendToRecvFromComplete(waitall);
}
void CartesianCommunicator::StencilBarrier(void)
{
MPI_Barrier (ShmComm);
}
void CartesianCommunicator::SendToRecvFromComplete(std::vector<CommsRequest_t> &list)
{
int nreq=list.size();
if (nreq==0) return;
std::vector<MPI_Status> status(nreq);
int ierr = MPI_Waitall(nreq,&list[0],&status[0]);
assert(ierr==0);
list.resize(0);
}
void CartesianCommunicator::Barrier(void)
{
int ierr = MPI_Barrier(communicator);
assert(ierr==0);
}
void CartesianCommunicator::Broadcast(int root,void* data, int bytes)
{
int ierr=MPI_Bcast(data,
bytes,
MPI_BYTE,
root,
communicator);
assert(ierr==0);
}
int CartesianCommunicator::RankWorld(void){
int r;
MPI_Comm_rank(communicator_world,&r);
return r;
}
void CartesianCommunicator::BroadcastWorld(int root,void* data, int bytes)
{
int ierr= MPI_Bcast(data,
bytes,
MPI_BYTE,
root,
communicator_world);
assert(ierr==0);
}
void CartesianCommunicator::AllToAll(int dim,void *in,void *out,uint64_t words,uint64_t bytes)
{
std::vector<int> row(_ndimension,1);
assert(dim>=0 && dim<_ndimension);
// Split the communicator
row[dim] = _processors[dim];
int me;
CartesianCommunicator Comm(row,*this,me);
Comm.AllToAll(in,out,words,bytes);
}
void CartesianCommunicator::AllToAll(void *in,void *out,uint64_t words,uint64_t bytes)
{
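// MPI is a pain and uses "int" arguments.
// 64*64*64*128*16 is roughly 537 million elements of data;
// with 24*4-byte elements that is about 50x10^9 bytes, far beyond the ~2x10^9 int limit (a Y2K-style overflow).
// (Turns up on 32^3 x 64 Gparity too)
// Packing each element into a contiguous MPI datatype keeps the int count at words, not words*bytes.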
MPI_Datatype object;
int iwords;
int ibytes;
iwords = words;
ibytes = bytes;
assert(words == iwords); // safe to cast to int ?
assert(bytes == ibytes); // safe to cast to int ?
MPI_Type_contiguous(ibytes,MPI_BYTE,&object);
MPI_Type_commit(&object);
MPI_Alltoall(in,iwords,object,out,iwords,object,communicator);
MPI_Type_free(&object);
}
}
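The sub-communicator constructor derives two indices from each parent coordinate: the coordinate inside its block (ccoor, giving crank, the key passed to MPI_Comm_split) and the coordinate of the block itself (scoor, giving srank, the colour). The sketch below reproduces that arithmetic serially for equal parent and child dimensionality (the original also pads a lower-dimensional parent). IndexFromCoorReversedSketch is a hypothetical stand-in for Lexicographic::IndexFromCoorReversed, written on the assumption that "reversed" means the row-major ordering (last dimension fastest) used by MPI_Cart_rank; the per-dimension divisibility assert is an extra check, since the original only checks the total rank count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major linear index (last dimension varies fastest), the ordering used by
// MPI Cartesian communicators. Hypothetical stand-in for Grid's helper.
int IndexFromCoorReversedSketch(const std::vector<int> &coor, const std::vector<int> &dims)
{
  int index = 0;
  for (std::size_t d = 0; d < dims.size(); d++) index = index * dims[d] + coor[d];
  return index;
}

struct SplitRanks {
  int crank;  // rank within the child communicator (MPI_Comm_split key)
  int srank;  // index of the child communicator among all splits (colour)
};

// Mirrors the ccoor/scoor/ssize computation in the splitting constructor.
SplitRanks SplitRanksSketch(const std::vector<int> &parent_coor,
                            const std::vector<int> &parent_procs,
                            const std::vector<int> &child_procs)
{
  const std::size_t nd = child_procs.size();
  std::vector<int> ccoor(nd), scoor(nd), ssize(nd);
  for (std::size_t d = 0; d < nd; d++) {
    assert(parent_procs[d] % child_procs[d] == 0);  // extra check (see text)
    ccoor[d] = parent_coor[d] % child_procs[d];     // coordinate within the block
    scoor[d] = parent_coor[d] / child_procs[d];     // which block along dim d
    ssize[d] = parent_procs[d] / child_procs[d];    // number of blocks along dim d
  }
  return SplitRanks{IndexFromCoorReversedSketch(ccoor, child_procs),
                    IndexFromCoorReversedSketch(scoor, ssize)};
}
```

For a 4x2 parent grid split into 2x2 children, the process at parent coordinate (3,1) becomes rank 3 of child communicator 1.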
@@ -0,0 +1,165 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/Communicator_none.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
namespace Grid {
///////////////////////////////////////////////////////////////////////////////////////////////////
// Info that is set up once and is independent of the Cartesian layout
///////////////////////////////////////////////////////////////////////////////////////////////////
Grid_MPI_Comm CartesianCommunicator::communicator_world;
void CartesianCommunicator::Init(int *argc, char ***argv)
{
GlobalSharedMemory::Init(communicator_world);
GlobalSharedMemory::SharedMemoryAllocate(
GlobalSharedMemory::MAX_MPI_SHM_BYTES,
GlobalSharedMemory::Hugepages);
}
CartesianCommunicator::CartesianCommunicator(const std::vector<int> &processors,const CartesianCommunicator &parent,int &srank)
: CartesianCommunicator(processors)
{
srank=0;
SetCommunicator(communicator_world);
}
CartesianCommunicator::CartesianCommunicator(const std::vector<int> &processors)
{
_processors = processors;
_ndimension = processors.size();
_processor_coor.resize(_ndimension);
// The single-process (fake) communicator requires a 1^N processor grid
_Nprocessors=1;
_processor = 0;
for(int d=0;d<_ndimension;d++) {
assert(_processors[d]==1);
_processor_coor[d] = 0;
}
SetCommunicator(communicator_world);
}
CartesianCommunicator::~CartesianCommunicator(){}
void CartesianCommunicator::GlobalSum(float &){}
void CartesianCommunicator::GlobalSumVector(float *,int N){}
void CartesianCommunicator::GlobalSum(double &){}
void CartesianCommunicator::GlobalSum(uint32_t &){}
void CartesianCommunicator::GlobalSum(uint64_t &){}
void CartesianCommunicator::GlobalSumVector(double *,int N){}
void CartesianCommunicator::GlobalXOR(uint32_t &){}
void CartesianCommunicator::GlobalXOR(uint64_t &){}
void CartesianCommunicator::SendRecvPacket(void *xmit,
void *recv,
int xmit_to_rank,
int recv_from_rank,
int bytes)
{
assert(0);
}
// Basic halo comms primitive -- should never be called on a single node
void CartesianCommunicator::SendToRecvFrom(void *xmit,
int dest,
void *recv,
int from,
int bytes)
{
assert(0);
}
void CartesianCommunicator::SendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int dest,
void *recv,
int from,
int bytes)
{
assert(0);
}
void CartesianCommunicator::SendToRecvFromComplete(std::vector<CommsRequest_t> &list)
{
assert(0);
}
void CartesianCommunicator::AllToAll(int dim,void *in,void *out,uint64_t words,uint64_t bytes)
{
bcopy(in,out,bytes*words);
}
void CartesianCommunicator::AllToAll(void *in,void *out,uint64_t words,uint64_t bytes)
{
bcopy(in,out,bytes*words);
}
int CartesianCommunicator::RankWorld(void){return 0;}
void CartesianCommunicator::Barrier(void){}
void CartesianCommunicator::Broadcast(int root,void* data, int bytes) {}
void CartesianCommunicator::BroadcastWorld(int root,void* data, int bytes) { }
int CartesianCommunicator::RankFromProcessorCoor(std::vector<int> &coor) { return 0;}
void CartesianCommunicator::ProcessorCoorFromRank(int rank, std::vector<int> &coor){ coor = _processor_coor; }
void CartesianCommunicator::ShiftedRanks(int dim,int shift,int &source,int &dest)
{
source =0;
dest=0;
}
double CartesianCommunicator::StencilSendToRecvFrom( void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes, int dir)
{
std::vector<CommsRequest_t> list;
// Discard the "dir"
SendToRecvFromBegin (list,xmit,xmit_to_rank,recv,recv_from_rank,bytes);
SendToRecvFromComplete(list);
return 2.0*bytes;
}
double CartesianCommunicator::StencilSendToRecvFromBegin(std::vector<CommsRequest_t> &list,
void *xmit,
int xmit_to_rank,
void *recv,
int recv_from_rank,
int bytes, int dir)
{
// Discard the "dir"
SendToRecvFromBegin(list,xmit,xmit_to_rank,recv,recv_from_rank,bytes);
return 2.0*bytes;
}
void CartesianCommunicator::StencilSendToRecvFromComplete(std::vector<CommsRequest_t> &waitall,int dir)
{
SendToRecvFromComplete(waitall);
}
void CartesianCommunicator::StencilBarrier(void){}
}
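The single-process ShiftedRanks stub returns 0 for both source and destination, which is exactly what a periodic Cartesian shift yields when every dimension has extent 1. The sketch below shows the general rule, modelled on what MPI_Cart_shift does for the periodic communicator built in InitFromMPICommunicator: the destination is the neighbour `shift` steps up dimension `dim`, the source is the one `shift` steps down, and both wrap around. RankFromCoorSketch and ShiftedRanksSketch are hypothetical names, and the row-major rank ordering is an assumption matching MPI's convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major rank of a coordinate (last dimension varies fastest).
int RankFromCoorSketch(const std::vector<int> &coor, const std::vector<int> &dims)
{
  int rank = 0;
  for (std::size_t d = 0; d < dims.size(); d++) rank = rank * dims[d] + coor[d];
  return rank;
}

// Periodic shift along `dim`: dest is `shift` steps up, source is `shift`
// steps down, both taken modulo the extent so the grid wraps around.
void ShiftedRanksSketch(const std::vector<int> &me, const std::vector<int> &dims,
                        int dim, int shift, int &source, int &dest)
{
  const int n = dims[dim];
  std::vector<int> up(me), down(me);
  up[dim]   = ((me[dim] + shift) % n + n) % n;  // non-negative modulo
  down[dim] = ((me[dim] - shift) % n + n) % n;
  dest   = RankFromCoorSketch(up, dims);
  source = RankFromCoorSketch(down, dims);
}
```

With a 1^N grid both results collapse to rank 0, which is what the stub above hard-codes.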
@@ -0,0 +1,92 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/SharedMemory.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
namespace Grid {
// static data
uint64_t GlobalSharedMemory::MAX_MPI_SHM_BYTES = 1024LL*1024LL*1024LL;
int GlobalSharedMemory::Hugepages = 0;
int GlobalSharedMemory::_ShmSetup;
int GlobalSharedMemory::_ShmAlloc;
uint64_t GlobalSharedMemory::_ShmAllocBytes;
std::vector<void *> GlobalSharedMemory::WorldShmCommBufs;
Grid_MPI_Comm GlobalSharedMemory::WorldShmComm;
int GlobalSharedMemory::WorldShmRank;
int GlobalSharedMemory::WorldShmSize;
std::vector<int> GlobalSharedMemory::WorldShmRanks;
Grid_MPI_Comm GlobalSharedMemory::WorldComm;
int GlobalSharedMemory::WorldSize;
int GlobalSharedMemory::WorldRank;
int GlobalSharedMemory::WorldNodes;
int GlobalSharedMemory::WorldNode;
void GlobalSharedMemory::SharedMemoryFree(void)
{
assert(_ShmAlloc);
assert(_ShmAllocBytes>0);
for(int r=0;r<WorldShmSize;r++){
munmap(WorldShmCommBufs[r],_ShmAllocBytes);
}
_ShmAlloc = 0;
_ShmAllocBytes = 0;
}
/////////////////////////////////
// Alloc, free shmem region
/////////////////////////////////
void *SharedMemory::ShmBufferMalloc(size_t bytes){
// bytes = (bytes+sizeof(vRealD)-1)&(~(sizeof(vRealD)-1));// align up bytes
void *ptr = (void *)heap_top;
heap_top += bytes;
heap_bytes+= bytes;
if (heap_bytes >= heap_size) {
std::cout<< " ShmBufferMalloc exceeded shared heap size -- try increasing with --shm <MB> flag" <<std::endl;
std::cout<< " Parameter specified in units of MB (megabytes) " <<std::endl;
std::cout<< " Current value is " << (heap_size/(1024*1024)) <<std::endl;
assert(heap_bytes<heap_size);
}
return ptr;
}
void SharedMemory::ShmBufferFreeAll(void) {
heap_top =(size_t)ShmBufferSelf();
heap_bytes=0;
}
void *SharedMemory::ShmBufferSelf(void)
{
return ShmCommBufs[ShmRank];
}
}
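ShmBufferMalloc is a bump allocator: each request advances heap_top within the pre-mapped shared segment, and ShmBufferFreeAll releases everything at once by resetting heap_top to this rank's buffer. The class below is a self-contained sketch of the same pattern over an ordinary buffer. BumpHeapSketch is a hypothetical name; unlike the original, it rounds each request up to a power-of-two alignment (the original's alignment line is commented out) and checks capacity before advancing, returning nullptr instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator over a caller-owned buffer, modelled on
// ShmBufferMalloc / ShmBufferFreeAll.
class BumpHeapSketch {
public:
  BumpHeapSketch(void *base, std::size_t size, std::size_t align)
    : base_(static_cast<char *>(base)), size_(size), align_(align), used_(0)
  {
    assert(align_ != 0 && (align_ & (align_ - 1)) == 0);  // power of two
  }
  void *Malloc(std::size_t bytes)
  {
    // Round the request up to the next multiple of align_
    const std::size_t rounded = (bytes + align_ - 1) & ~(align_ - 1);
    if (rounded > size_ - used_) return nullptr;  // would exceed the heap
    void *ptr = base_ + used_;
    used_ += rounded;
    return ptr;
  }
  void FreeAll() { used_ = 0; }  // release every allocation at once
  std::size_t Used() const { return used_; }
private:
  char *base_;
  std::size_t size_, align_, used_;
};
```

Individual frees are not supported, which keeps each allocation down to one addition and one comparison.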
@@ -0,0 +1,165 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/SharedMemory.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
// TODO
// 1) move includes into SharedMemory.cc
//
// 2) split shared memory into a) optimal communicator creation from comm world
//
// b) shared memory buffers container
// -- static globally shared; init once
// -- per instance set of buffers.
//
#pragma once
#include <Grid/GridCore.h>
#if defined (GRID_COMMS_MPI3)
#include <mpi.h>
#endif
#include <semaphore.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <zlib.h>
#ifdef HAVE_NUMAIF_H
#include <numaif.h>
#endif
namespace Grid {
#if defined (GRID_COMMS_MPI3)
typedef MPI_Comm Grid_MPI_Comm;
typedef MPI_Request CommsRequest_t;
#else
typedef int CommsRequest_t;
typedef int Grid_MPI_Comm;
#endif
class GlobalSharedMemory {
private:
static const int MAXLOG2RANKSPERNODE = 16;
// Init once lock on the buffer allocation
static int _ShmSetup;
static int _ShmAlloc;
static uint64_t _ShmAllocBytes;
public:
static int ShmSetup(void) { return _ShmSetup; }
static int ShmAlloc(void) { return _ShmAlloc; }
static uint64_t ShmAllocBytes(void) { return _ShmAllocBytes; }
static uint64_t MAX_MPI_SHM_BYTES;
static int Hugepages;
static std::vector<void *> WorldShmCommBufs;
static Grid_MPI_Comm WorldComm;
static int WorldRank;
static int WorldSize;
static Grid_MPI_Comm WorldShmComm;
static int WorldShmRank;
static int WorldShmSize;
static int WorldNodes;
static int WorldNode;
static std::vector<int> WorldShmRanks;
//////////////////////////////////////////////////////////////////////////////////////
// Create an optimal reordered communicator that makes MPI_Cart_create get it right
//////////////////////////////////////////////////////////////////////////////////////
static void Init(Grid_MPI_Comm comm); // Typically MPI_COMM_WORLD
static void OptimalCommunicator(const std::vector<int> &processors,Grid_MPI_Comm & optimal_comm); // Turns MPI_COMM_WORLD into right layout for Cartesian
///////////////////////////////////////////////////
// Provide shared memory facilities off comm world
///////////////////////////////////////////////////
static void SharedMemoryAllocate(uint64_t bytes, int flags);
static void SharedMemoryFree(void);
};
//////////////////////////////
// one per communicator
//////////////////////////////
class SharedMemory
{
private:
static const int MAXLOG2RANKSPERNODE = 16;
size_t heap_top;
size_t heap_bytes;
size_t heap_size;
protected:
Grid_MPI_Comm ShmComm; // for barriers
int ShmRank;
int ShmSize;
std::vector<void *> ShmCommBufs;
std::vector<int> ShmRanks;// Mapping comm ranks to Shm ranks
public:
SharedMemory() {};
~SharedMemory();
///////////////////////////////////////////////////////////////////////////////////////
// set the buffers & sizes
///////////////////////////////////////////////////////////////////////////////////////
void SetCommunicator(Grid_MPI_Comm comm);
////////////////////////////////////////////////////////////////////////
// For this instance; buffer sets are disjoint between splits when the grid is split
////////////////////////////////////////////////////////////////////////
void ShmBarrier(void);
///////////////////////////////////////////////////
// Call on any instance
///////////////////////////////////////////////////
void SharedMemoryTest(void);
void *ShmBufferSelf(void);
void *ShmBuffer (int rank);
void *ShmBufferTranslate(int rank,void * local_p);
void *ShmBufferMalloc(size_t bytes);
void ShmBufferFreeAll(void) ;
//////////////////////////////////////////////////////////////////////////
// Make info on Nodes & ranks and Shared memory available
//////////////////////////////////////////////////////////////////////////
int NodeCount(void) { return GlobalSharedMemory::WorldNodes;};
int RankCount(void) { return GlobalSharedMemory::WorldSize;};
};
}
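OptimalCommunicator, declared above, reorders ranks so that the ranks sharing a node form a sub-block of the Cartesian processor grid. In the non-HYPERCUBE branch of the MPI3 implementation that follows, the ranks per node (a power of two) are spread over the grid one factor of two at a time in round-robin order, and each (node, on-node rank) pair is then mapped to a world coordinate and a new rank. The sketch reproduces that arithmetic without MPI. CoorFromIndexRev and IndexFromCoorRev are hypothetical helpers assumed to behave like Grid's Lexicographic::CoorFromIndexReversed and IndexFromCoorReversed, with the last dimension varying fastest; that ordering is an assumption, not something shown in this diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers, assumed to mirror Lexicographic::*Reversed
// (the LAST dimension varies fastest).
static std::vector<int> CoorFromIndexRev(int index, const std::vector<int> &dims) {
  std::vector<int> coor(dims.size());
  for (int d = static_cast<int>(dims.size()) - 1; d >= 0; d--) {
    coor[d] = index % dims[d];
    index /= dims[d];
  }
  return coor;
}

static int IndexFromCoorRev(const std::vector<int> &coor, const std::vector<int> &dims) {
  int index = 0, stride = 1;
  for (int d = static_cast<int>(dims.size()) - 1; d >= 0; d--) {
    index += stride * coor[d];
    stride *= dims[d];
  }
  return index;
}

// The ShmDims loop: double one dimension per factor of two of ranks-per-node,
// skipping dimensions that cannot be split further. Assumes the grid has
// enough processors to absorb every factor (otherwise it would loop forever).
std::vector<int> ShmBlock(const std::vector<int> &world_dims, int log2_ranks_per_node) {
  int nd = static_cast<int>(world_dims.size());
  std::vector<int> shm(nd, 1);
  int dim = 0;
  for (int l2 = 0; l2 < log2_ranks_per_node; l2++) {
    while (world_dims[dim] / shm[dim] <= 1) dim = (dim + 1) % nd;
    shm[dim] *= 2;
    dim = (dim + 1) % nd;
  }
  return shm;
}

// Rank in the reordered communicator for a given node and on-node rank.
int ReorderedRank(const std::vector<int> &world_dims, const std::vector<int> &shm_dims,
                  int node, int shm_rank) {
  int nd = static_cast<int>(world_dims.size());
  std::vector<int> node_dims(nd);
  for (int d = 0; d < nd; d++) node_dims[d] = world_dims[d] / shm_dims[d];
  std::vector<int> node_coor = CoorFromIndexRev(node, node_dims);
  std::vector<int> shm_coor = CoorFromIndexRev(shm_rank, shm_dims);
  std::vector<int> world_coor(nd);
  for (int d = 0; d < nd; d++) world_coor[d] = node_coor[d] * shm_dims[d] + shm_coor[d];
  return IndexFromCoorRev(world_coor, world_dims);
}
```

For a 2x2x2x2 grid with four ranks per node, ShmBlock returns {2,2,1,1}, so each node owns a 2x2 block in the first two dimensions, and the 16 (node, rank) pairs map onto a permutation of 0..15.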
@@ -0,0 +1,651 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/SharedMemory.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
#include <pwd.h>
namespace Grid {
/*Construct from an MPI communicator*/
void GlobalSharedMemory::Init(Grid_MPI_Comm comm)
{
assert(_ShmSetup==0);
WorldComm = comm;
MPI_Comm_rank(WorldComm,&WorldRank);
MPI_Comm_size(WorldComm,&WorldSize);
// WorldComm, WorldSize, WorldRank
/////////////////////////////////////////////////////////////////////
// Split into groups that can share memory
/////////////////////////////////////////////////////////////////////
MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,&WorldShmComm);
MPI_Comm_rank(WorldShmComm ,&WorldShmRank);
MPI_Comm_size(WorldShmComm ,&WorldShmSize);
// WorldShmComm, WorldShmSize, WorldShmRank
// WorldNodes
WorldNodes = WorldSize/WorldShmSize;
assert( (WorldNodes * WorldShmSize) == WorldSize );
// FIXME: Check all WorldShmSize are the same ?
/////////////////////////////////////////////////////////////////////
// find world ranks in our SHM group (i.e. which ranks are on our node)
/////////////////////////////////////////////////////////////////////
MPI_Group WorldGroup, ShmGroup;
MPI_Comm_group (WorldComm, &WorldGroup);
MPI_Comm_group (WorldShmComm, &ShmGroup);
std::vector<int> world_ranks(WorldSize); for(int r=0;r<WorldSize;r++) world_ranks[r]=r;
WorldShmRanks.resize(WorldSize);
MPI_Group_translate_ranks (WorldGroup,WorldSize,&world_ranks[0],ShmGroup, &WorldShmRanks[0]);
///////////////////////////////////////////////////////////////////
// Identify who is in my group and nominate the leader
///////////////////////////////////////////////////////////////////
int g=0;
std::vector<int> MyGroup;
MyGroup.resize(WorldShmSize);
for(int rank=0;rank<WorldSize;rank++){
if(WorldShmRanks[rank]!=MPI_UNDEFINED){
assert(g<WorldShmSize);
MyGroup[g++] = rank;
}
}
std::sort(MyGroup.begin(),MyGroup.end(),std::less<int>());
int myleader = MyGroup[0];
std::vector<int> leaders_1hot(WorldSize,0);
std::vector<int> leaders_group(WorldNodes,0);
leaders_1hot [ myleader ] = 1;
///////////////////////////////////////////////////////////////////
// global sum leaders over comm world
///////////////////////////////////////////////////////////////////
int ierr=MPI_Allreduce(MPI_IN_PLACE,&leaders_1hot[0],WorldSize,MPI_INT,MPI_SUM,WorldComm);
assert(ierr==0);
///////////////////////////////////////////////////////////////////
// find the world rank of each group leader
///////////////////////////////////////////////////////////////////
int group=0;
for(int l=0;l<WorldSize;l++){
if(leaders_1hot[l]){
leaders_group[group++] = l;
}
}
///////////////////////////////////////////////////////////////////
// Identify the node of the group in which I (and my leader) live
///////////////////////////////////////////////////////////////////
WorldNode=-1;
for(int g=0;g<WorldNodes;g++){
if (myleader == leaders_group[g]){
WorldNode=g;
}
}
assert(WorldNode!=-1);
_ShmSetup=1;
}
// Gray encode support
int BinaryToGray (int binary) {
int gray = (binary>>1)^binary;
return gray;
}
int Log2Size(int TwoToPower,int MAXLOG2)
{
int log2size = -1;
for(int i=0;i<=MAXLOG2;i++){
if ( (0x1<<i) == TwoToPower ) {
log2size = i;
break;
}
}
return log2size;
}
void GlobalSharedMemory::OptimalCommunicator(const std::vector<int> &processors,Grid_MPI_Comm & optimal_comm)
{
#ifdef HYPERCUBE
////////////////////////////////////////////////////////////////
// Assert power of two shm_size.
////////////////////////////////////////////////////////////////
int log2size = Log2Size(WorldShmSize,MAXLOG2RANKSPERNODE);
assert(log2size != -1);
////////////////////////////////////////////////////////////////
// Identify the hypercube coordinate of this node using hostname
////////////////////////////////////////////////////////////////
// n runs 0...7 9...16 18...25 27...34 (8*4) 5 bits
// i runs 0..7 3 bits
// r runs 0..3 2 bits
// 2^10 = 1024 nodes
const int maxhdim = 10;
std::vector<int> HyperCubeCoords(maxhdim,0);
std::vector<int> RootHyperCubeCoords(maxhdim,0);
int R;
int I;
int N;
const int namelen = _POSIX_HOST_NAME_MAX;
char name[namelen];
// Parse ICE-XA hostname to get hypercube location
gethostname(name,namelen);
int nscan = sscanf(name,"r%di%dn%d",&R,&I,&N) ;
assert(nscan==3);
int nlo = N%9;
int nhi = N/9;
uint32_t hypercoor = (R<<8)|(I<<5)|(nhi<<3)|nlo ;
uint32_t rootcoor = hypercoor;
//////////////////////////////////////////////////////////////////
// Print debug info
//////////////////////////////////////////////////////////////////
for(int d=0;d<maxhdim;d++){
HyperCubeCoords[d] = (hypercoor>>d)&0x1;
}
std::string hname(name);
std::cout << "hostname "<<hname<<std::endl;
std::cout << "R " << R << " I " << I << " N "<< N
<< " hypercoor 0x"<<std::hex<<hypercoor<<std::dec<<std::endl;
//////////////////////////////////////////////////////////////////
// broadcast node 0's base coordinate for this partition.
//////////////////////////////////////////////////////////////////
MPI_Bcast(&rootcoor, sizeof(rootcoor), MPI_BYTE, 0, WorldComm);
hypercoor=hypercoor-rootcoor;
assert(hypercoor<WorldSize);
assert(hypercoor>=0);
//////////////////////////////////////
// Recompute HyperCubeCoords relative to the partition root
//////////////////////////////////////
for(int d=0;d<maxhdim;d++){
HyperCubeCoords[d] = (hypercoor>>d)&0x1;
}
////////////////////////////////////////////////////////////////
// Identify subblock of ranks on node spreading across dims
// in a maximally symmetrical way
////////////////////////////////////////////////////////////////
int ndimension = processors.size();
std::vector<int> processor_coor(ndimension);
std::vector<int> WorldDims = processors; std::vector<int> ShmDims (ndimension,1); std::vector<int> NodeDims (ndimension);
std::vector<int> ShmCoor (ndimension); std::vector<int> NodeCoor (ndimension); std::vector<int> WorldCoor(ndimension);
std::vector<int> HyperCoor(ndimension);
int dim = 0;
for(int l2=0;l2<log2size;l2++){
while ( (WorldDims[dim] / ShmDims[dim]) <= 1 ) dim=(dim+1)%ndimension;
ShmDims[dim]*=2;
dim=(dim+1)%ndimension;
}
////////////////////////////////////////////////////////////////
// Establish torus of processes and nodes with sub-blockings
////////////////////////////////////////////////////////////////
for(int d=0;d<ndimension;d++){
NodeDims[d] = WorldDims[d]/ShmDims[d];
}
////////////////////////////////////////////////////////////////
// Map the hypercube coordinate onto the physical lattice; NodeDims must
// partition it. Loop over dims, taking log2(NodeDims[d]) bits for each.
////////////////////////////////////////////////////////////////
int hcoor = hypercoor;
for(int d=0;d<ndimension;d++){
int bits = Log2Size(NodeDims[d],MAXLOG2RANKSPERNODE);
int msk = (0x1<<bits)-1;
HyperCoor[d]=hcoor & msk;
HyperCoor[d]=BinaryToGray(HyperCoor[d]); // Space filling curve magic
hcoor = hcoor >> bits;
}
////////////////////////////////////////////////////////////////
// Check processor counts match
////////////////////////////////////////////////////////////////
int Nprocessors=1;
for(int i=0;i<ndimension;i++){
Nprocessors*=processors[i];
}
assert(WorldSize==Nprocessors);
////////////////////////////////////////////////////////////////
// Establish mapping between lexico physics coord and WorldRank
////////////////////////////////////////////////////////////////
int rank;
Lexicographic::CoorFromIndexReversed(NodeCoor,WorldNode ,NodeDims);
for(int d=0;d<ndimension;d++) NodeCoor[d]=HyperCoor[d];
Lexicographic::CoorFromIndexReversed(ShmCoor ,WorldShmRank,ShmDims);
for(int d=0;d<ndimension;d++) WorldCoor[d] = NodeCoor[d]*ShmDims[d]+ShmCoor[d];
Lexicographic::IndexFromCoorReversed(WorldCoor,rank,WorldDims);
/////////////////////////////////////////////////////////////////
// Build the new communicator
/////////////////////////////////////////////////////////////////
int ierr= MPI_Comm_split(WorldComm,0,rank,&optimal_comm);
assert(ierr==0);
#else
////////////////////////////////////////////////////////////////
// Assert power of two shm_size.
////////////////////////////////////////////////////////////////
int log2size = Log2Size(WorldShmSize,MAXLOG2RANKSPERNODE);
assert(log2size != -1);
////////////////////////////////////////////////////////////////
// Identify subblock of ranks on node spreading across dims
// in a maximally symmetrical way
////////////////////////////////////////////////////////////////
int ndimension = processors.size();
std::vector<int> processor_coor(ndimension);
std::vector<int> WorldDims = processors; std::vector<int> ShmDims (ndimension,1); std::vector<int> NodeDims (ndimension);
std::vector<int> ShmCoor (ndimension); std::vector<int> NodeCoor (ndimension); std::vector<int> WorldCoor(ndimension);
int dim = 0;
for(int l2=0;l2<log2size;l2++){
while ( (WorldDims[dim] / ShmDims[dim]) <= 1 ) dim=(dim+1)%ndimension;
ShmDims[dim]*=2;
dim=(dim+1)%ndimension;
}
////////////////////////////////////////////////////////////////
// Establish torus of processes and nodes with sub-blockings
////////////////////////////////////////////////////////////////
for(int d=0;d<ndimension;d++){
NodeDims[d] = WorldDims[d]/ShmDims[d];
}
////////////////////////////////////////////////////////////////
// Check processor counts match
////////////////////////////////////////////////////////////////
int Nprocessors=1;
for(int i=0;i<ndimension;i++){
Nprocessors*=processors[i];
}
assert(WorldSize==Nprocessors);
////////////////////////////////////////////////////////////////
// Establish mapping between lexico physics coord and WorldRank
////////////////////////////////////////////////////////////////
int rank;
Lexicographic::CoorFromIndexReversed(NodeCoor,WorldNode ,NodeDims);
Lexicographic::CoorFromIndexReversed(ShmCoor ,WorldShmRank,ShmDims);
for(int d=0;d<ndimension;d++) WorldCoor[d] = NodeCoor[d]*ShmDims[d]+ShmCoor[d];
Lexicographic::IndexFromCoorReversed(WorldCoor,rank,WorldDims);
/////////////////////////////////////////////////////////////////
// Build the new communicator
/////////////////////////////////////////////////////////////////
int ierr= MPI_Comm_split(WorldComm,0,rank,&optimal_comm);
assert(ierr==0);
#endif
}
////////////////////////////////////////////////////////////////////////////////////////////
// SHMGET
////////////////////////////////////////////////////////////////////////////////////////////
#ifdef GRID_MPI3_SHMGET
void GlobalSharedMemory::SharedMemoryAllocate(uint64_t bytes, int flags)
{
std::cout << "SharedMemoryAllocate "<< bytes<< " shmget implementation "<<std::endl;
assert(_ShmSetup==1);
assert(_ShmAlloc==0);
//////////////////////////////////////////////////////////////////////////////////////////////////////////
// allocate the shared windows for our group
//////////////////////////////////////////////////////////////////////////////////////////////////////////
MPI_Barrier(WorldShmComm);
WorldShmCommBufs.resize(WorldShmSize);
std::vector<int> shmids(WorldShmSize);
if ( WorldShmRank == 0 ) {
for(int r=0;r<WorldShmSize;r++){
size_t size = bytes;
key_t key = IPC_PRIVATE;
int flags = IPC_CREAT | SHM_R | SHM_W;
#ifdef SHM_HUGETLB
if (Hugepages) flags|=SHM_HUGETLB;
#endif
if ((shmids[r]= shmget(key,size, flags)) ==-1) {
int errsv = errno;
printf("Errno %d\n",errsv);
printf("key %d\n",key);
printf("size %zu\n",size);
printf("flags %d\n",flags);
perror("shmget");
exit(1);
}
}
}
MPI_Barrier(WorldShmComm);
MPI_Bcast(&shmids[0],WorldShmSize*sizeof(int),MPI_BYTE,0,WorldShmComm);
MPI_Barrier(WorldShmComm);
for(int r=0;r<WorldShmSize;r++){
WorldShmCommBufs[r] = (uint64_t *)shmat(shmids[r], NULL,0);
if (WorldShmCommBufs[r] == (uint64_t *)-1) {
perror("Shared memory attach failure");
shmctl(shmids[r], IPC_RMID, NULL);
exit(2);
}
}
MPI_Barrier(WorldShmComm);
///////////////////////////////////
// Mark for clean up
///////////////////////////////////
for(int r=0;r<WorldShmSize;r++){
shmctl(shmids[r], IPC_RMID,(struct shmid_ds *)NULL);
}
MPI_Barrier(WorldShmComm);
_ShmAlloc=1;
_ShmAllocBytes = bytes;
}
#endif
////////////////////////////////////////////////////////////////////////////////////////////
// Hugetlbfs mapping intended
////////////////////////////////////////////////////////////////////////////////////////////
#ifdef GRID_MPI3_SHMMMAP
void GlobalSharedMemory::SharedMemoryAllocate(uint64_t bytes, int flags)
{
std::cout << "SharedMemoryAllocate "<< bytes<< " MMAP implementation "<< GRID_SHM_PATH <<std::endl;
assert(_ShmSetup==1);
assert(_ShmAlloc==0);
//////////////////////////////////////////////////////////////////////////////////////////////////////////
// allocate the shared windows for our group
//////////////////////////////////////////////////////////////////////////////////////////////////////////
MPI_Barrier(WorldShmComm);
WorldShmCommBufs.resize(WorldShmSize);
////////////////////////////////////////////////////////////////////////////////////////////
// Hugetlbfs and others map filesystems as mappable huge pages
////////////////////////////////////////////////////////////////////////////////////////////
char shm_name [NAME_MAX];
for(int r=0;r<WorldShmSize;r++){
sprintf(shm_name,GRID_SHM_PATH "/Grid_mpi3_shm_%d_%d",WorldNode,r);
int fd=open(shm_name,O_RDWR|O_CREAT,0666);
if ( fd == -1) {
printf("open %s failed\n",shm_name);
perror("open hugetlbfs");
exit(1);
}
int mmap_flag = MAP_SHARED ;
#ifdef MAP_POPULATE
mmap_flag|=MAP_POPULATE;
#endif
#ifdef MAP_HUGETLB
if ( flags ) mmap_flag |= MAP_HUGETLB;
#endif
void *ptr = (void *) mmap(NULL, bytes, PROT_READ | PROT_WRITE, mmap_flag,fd, 0);
if ( ptr == (void *)MAP_FAILED ) {
printf("mmap %s failed\n",shm_name);
perror("failed mmap"); assert(0);
}
assert(((uint64_t)ptr&0x3F)==0);
close(fd);
WorldShmCommBufs[r] =ptr;
std::cout << "Set WorldShmCommBufs["<<r<<"]="<<ptr<< "("<< bytes<< "bytes)"<<std::endl;
}
_ShmAlloc=1;
_ShmAllocBytes = bytes;
};
#endif // MMAP
#ifdef GRID_MPI3_SHM_NONE
void GlobalSharedMemory::SharedMemoryAllocate(uint64_t bytes, int flags)
{
std::cout << "SharedMemoryAllocate "<< bytes<< " MMAP anonymous implementation "<<std::endl;
assert(_ShmSetup==1);
assert(_ShmAlloc==0);
//////////////////////////////////////////////////////////////////////////////////////////////////////////
// allocate the shared windows for our group
//////////////////////////////////////////////////////////////////////////////////////////////////////////
MPI_Barrier(WorldShmComm);
WorldShmCommBufs.resize(WorldShmSize);
////////////////////////////////////////////////////////////////////////////////////////////
// Anonymous shared mapping (optionally MAP_HUGETLB); valid only with one rank per node
////////////////////////////////////////////////////////////////////////////////////////////
char shm_name [NAME_MAX];
assert(WorldShmSize == 1);
for(int r=0;r<WorldShmSize;r++){
int fd=-1;
int mmap_flag = MAP_SHARED |MAP_ANONYMOUS ;
#ifdef MAP_POPULATE
mmap_flag|=MAP_POPULATE;
#endif
#ifdef MAP_HUGETLB
if ( flags ) mmap_flag |= MAP_HUGETLB;
#endif
void *ptr = (void *) mmap(NULL, bytes, PROT_READ | PROT_WRITE, mmap_flag,fd, 0);
if ( ptr == (void *)MAP_FAILED ) {
printf("anonymous mmap failed\n");
perror("failed mmap"); assert(0);
}
assert(((uint64_t)ptr&0x3F)==0);
close(fd);
WorldShmCommBufs[r] =ptr;
std::cout << "Set WorldShmCommBufs["<<r<<"]="<<ptr<< "("<< bytes<< "bytes)"<<std::endl;
}
_ShmAlloc=1;
_ShmAllocBytes = bytes;
};
#endif // GRID_MPI3_SHM_NONE
#ifdef GRID_MPI3_SHMOPEN
////////////////////////////////////////////////////////////////////////////////////////////
// POSIX SHMOPEN: as far as I know, Linux does not allow EXPLICIT huge pages in this case.
// The POSIX shm virtual file system is backed by tmpfs, which (Larry Meadows says)
// does not support explicit huge pages.
////////////////////////////////////////////////////////////////////////////////////////////
void GlobalSharedMemory::SharedMemoryAllocate(uint64_t bytes, int flags)
{
std::cout << "SharedMemoryAllocate "<< bytes<< " SHMOPEN implementation "<<std::endl;
assert(_ShmSetup==1);
assert(_ShmAlloc==0);
MPI_Barrier(WorldShmComm);
WorldShmCommBufs.resize(WorldShmSize);
char shm_name [NAME_MAX];
if ( WorldShmRank == 0 ) {
for(int r=0;r<WorldShmSize;r++){
size_t size = bytes;
struct passwd *pw = getpwuid (getuid());
sprintf(shm_name,"/Grid_%s_mpi3_shm_%d_%d",pw->pw_name,WorldNode,r);
shm_unlink(shm_name);
int fd=shm_open(shm_name,O_RDWR|O_CREAT,0666);
if ( fd < 0 ) { perror("failed shm_open"); assert(0); }
if ( ftruncate(fd, size) != 0 ) { perror("failed ftruncate"); assert(0); }
int mmap_flag = MAP_SHARED;
#ifdef MAP_POPULATE
mmap_flag |= MAP_POPULATE;
#endif
#ifdef MAP_HUGETLB
if (flags) mmap_flag |= MAP_HUGETLB;
#endif
void * ptr = mmap(NULL,size, PROT_READ | PROT_WRITE, mmap_flag, fd, 0);
std::cout << "Set WorldShmCommBufs["<<r<<"]="<<ptr<< "("<< size<< "bytes)"<<std::endl;
if ( ptr == (void * )MAP_FAILED ) {
perror("failed mmap");
assert(0);
}
assert(((uint64_t)ptr&0x3F)==0);
WorldShmCommBufs[r] =ptr;
close(fd);
}
}
MPI_Barrier(WorldShmComm);
if ( WorldShmRank != 0 ) {
for(int r=0;r<WorldShmSize;r++){
size_t size = bytes ;
struct passwd *pw = getpwuid (getuid());
sprintf(shm_name,"/Grid_%s_mpi3_shm_%d_%d",pw->pw_name,WorldNode,r);
int fd=shm_open(shm_name,O_RDWR,0666);
if ( fd<0 ) { perror("failed shm_open"); assert(0); }
void * ptr = mmap(NULL,size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if ( ptr == MAP_FAILED ) { perror("failed mmap"); assert(0); }
assert(((uint64_t)ptr&0x3F)==0);
WorldShmCommBufs[r] =ptr;
close(fd);
}
}
_ShmAlloc=1;
_ShmAllocBytes = bytes;
}
#endif
////////////////////////////////////////////////////////
// Global shared functionality finished
// Now move to per communicator functionality
////////////////////////////////////////////////////////
void SharedMemory::SetCommunicator(Grid_MPI_Comm comm)
{
int rank, size;
MPI_Comm_rank(comm,&rank);
MPI_Comm_size(comm,&size);
ShmRanks.resize(size);
/////////////////////////////////////////////////////////////////////
// Split into groups that can share memory
/////////////////////////////////////////////////////////////////////
MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,&ShmComm);
MPI_Comm_rank(ShmComm ,&ShmRank);
MPI_Comm_size(ShmComm ,&ShmSize);
ShmCommBufs.resize(ShmSize);
//////////////////////////////////////////////////////////////////////
// Map ShmRank to WorldShmRank and use the right buffer
//////////////////////////////////////////////////////////////////////
assert (GlobalSharedMemory::ShmAlloc()==1);
heap_size = GlobalSharedMemory::ShmAllocBytes();
for(int r=0;r<ShmSize;r++){
uint32_t wsr = (r==ShmRank) ? GlobalSharedMemory::WorldShmRank : 0 ;
MPI_Allreduce(MPI_IN_PLACE,&wsr,1,MPI_UINT32_T,MPI_SUM,ShmComm);
ShmCommBufs[r] = GlobalSharedMemory::WorldShmCommBufs[wsr];
// std::cout << "SetCommunicator ShmCommBufs ["<< r<< "] = "<< ShmCommBufs[r]<< " wsr = "<<wsr<<std::endl;
}
ShmBufferFreeAll();
/////////////////////////////////////////////////////////////////////
// find comm ranks in our SHM group (i.e. which ranks are on our node)
/////////////////////////////////////////////////////////////////////
MPI_Group FullGroup, ShmGroup;
MPI_Comm_group (comm , &FullGroup);
MPI_Comm_group (ShmComm, &ShmGroup);
std::vector<int> ranks(size); for(int r=0;r<size;r++) ranks[r]=r;
MPI_Group_translate_ranks (FullGroup,size,&ranks[0],ShmGroup, &ShmRanks[0]);
}
//////////////////////////////////////////////////////////////////
// On node barrier
//////////////////////////////////////////////////////////////////
void SharedMemory::ShmBarrier(void)
{
MPI_Barrier (ShmComm);
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////
// Test that the shared memory is working
//////////////////////////////////////////////////////////////////////////////////////////////////////////
void SharedMemory::SharedMemoryTest(void)
{
ShmBarrier();
if ( ShmRank == 0 ) {
for(int r=0;r<ShmSize;r++){
uint64_t * check = (uint64_t *) ShmCommBufs[r];
check[0] = GlobalSharedMemory::WorldNode;
check[1] = r;
check[2] = 0x5A5A5A;
}
}
ShmBarrier();
for(int r=0;r<ShmSize;r++){
uint64_t * check = (uint64_t *) ShmCommBufs[r];
assert(check[0]==GlobalSharedMemory::WorldNode);
assert(check[1]==r);
assert(check[2]==0x5A5A5A);
}
ShmBarrier();
}
void *SharedMemory::ShmBuffer(int rank)
{
int gpeer = ShmRanks[rank];
if (gpeer == MPI_UNDEFINED){
return NULL;
} else {
return ShmCommBufs[gpeer];
}
}
void *SharedMemory::ShmBufferTranslate(int rank,void * local_p)
{
int gpeer = ShmRanks[rank];
assert(gpeer!=ShmRank); // never send to self
if (gpeer == MPI_UNDEFINED){
return NULL;
} else {
uint64_t offset = (uint64_t)local_p - (uint64_t)ShmCommBufs[ShmRank];
uint64_t remote = (uint64_t)ShmCommBufs[gpeer]+offset;
return (void *) remote;
}
}
SharedMemory::~SharedMemory()
{
int MPI_is_finalised; MPI_Finalized(&MPI_is_finalised);
if ( !MPI_is_finalised ) {
MPI_Comm_free(&ShmComm);
}
};
}
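ShmBufferTranslate relies on every on-node buffer being mapped into each process and on all processes using the same layout inside their buffers: a pointer into the local buffer is converted to the same byte offset inside the peer's mapping. The function below is a hypothetical stand-alone version of that translation. It uses -1 in place of MPI_UNDEFINED so it builds without MPI, and it omits the original's assertion against translating to self.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// bases[i]    : address at which on-node rank i's buffer is mapped in this process
// shm_ranks[r]: on-node rank of communicator rank r, or -1 if r is on another node
void *TranslateToPeer(const std::vector<void *> &bases, const std::vector<int> &shm_ranks,
                      int my_shm_rank, int peer_comm_rank, void *local_p) {
  int gpeer = shm_ranks[peer_comm_rank];
  if (gpeer < 0) return nullptr;  // off-node peer: no direct load/store path
  std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(local_p) -
                          reinterpret_cast<std::uintptr_t>(bases[my_shm_rank]);
  return reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(bases[gpeer]) + offset);
}
```

A nullptr result tells the caller to fall back to a regular MPI transfer for that peer.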
@@ -0,0 +1,128 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/communicator/SharedMemory.cc
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
namespace Grid {
/*Construct from an MPI communicator*/
void GlobalSharedMemory::Init(Grid_MPI_Comm comm)
{
assert(_ShmSetup==0);
WorldComm = 0;
WorldRank = 0;
WorldSize = 1;
WorldShmComm = 0 ;
WorldShmRank = 0 ;
WorldShmSize = 1 ;
WorldNodes = 1 ;
WorldNode = 0 ;
WorldShmRanks.resize(WorldSize); WorldShmRanks[0] = 0;
WorldShmCommBufs.resize(1);
_ShmSetup=1;
}
void GlobalSharedMemory::OptimalCommunicator(const std::vector<int> &processors,Grid_MPI_Comm & optimal_comm)
{
optimal_comm = WorldComm;
}
////////////////////////////////////////////////////////////////////////////////////////////
// Hugetlbfs mapping intended, use anonymous mmap
////////////////////////////////////////////////////////////////////////////////////////////
void GlobalSharedMemory::SharedMemoryAllocate(uint64_t bytes, int flags)
{
void * ShmCommBuf ;
assert(_ShmSetup==1);
assert(_ShmAlloc==0);
int mmap_flag =0;
#ifdef MAP_ANONYMOUS
mmap_flag = mmap_flag| MAP_SHARED | MAP_ANONYMOUS;
#endif
#ifdef MAP_ANON
mmap_flag = mmap_flag| MAP_SHARED | MAP_ANON;
#endif
#ifdef MAP_HUGETLB
if ( flags ) mmap_flag |= MAP_HUGETLB;
#endif
ShmCommBuf =(void *) mmap(NULL, bytes, PROT_READ | PROT_WRITE, mmap_flag, -1, 0);
if (ShmCommBuf == (void *)MAP_FAILED) {
perror("mmap failed ");
exit(EXIT_FAILURE);
}
#ifdef MADV_HUGEPAGE
if (!Hugepages ) madvise(ShmCommBuf,bytes,MADV_HUGEPAGE);
#endif
bzero(ShmCommBuf,bytes);
WorldShmCommBufs[0] = ShmCommBuf;
_ShmAllocBytes=bytes;
_ShmAlloc=1;
};
////////////////////////////////////////////////////////
// Global shared functionality finished
// Now move to per communicator functionality
////////////////////////////////////////////////////////
void SharedMemory::SetCommunicator(Grid_MPI_Comm comm)
{
assert(GlobalSharedMemory::ShmAlloc()==1);
ShmRanks.resize(1);
ShmCommBufs.resize(1);
ShmRanks[0] = 0;
ShmRank = 0;
ShmSize = 1;
//////////////////////////////////////////////////////////////////////
// Map ShmRank to WorldShmRank and use the right buffer
//////////////////////////////////////////////////////////////////////
ShmCommBufs[0] = GlobalSharedMemory::WorldShmCommBufs[0];
heap_size = GlobalSharedMemory::ShmAllocBytes();
ShmBufferFreeAll();
return;
}
//////////////////////////////////////////////////////////////////
// On node barrier
//////////////////////////////////////////////////////////////////
void SharedMemory::ShmBarrier(void){ return ; }
//////////////////////////////////////////////////////////////////////////////////////////////////////////
// Test that the shared memory is working
//////////////////////////////////////////////////////////////////////////////////////////////////////////
void SharedMemory::SharedMemoryTest(void) { return; }
void *SharedMemory::ShmBuffer(int rank)
{
return NULL;
}
void *SharedMemory::ShmBufferTranslate(int rank,void * local_p)
{
return NULL;
}
SharedMemory::~SharedMemory()
{};
}
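With a single rank, SharedMemoryAllocate above only needs an anonymous shared mapping. The sketch below shows that POSIX mmap call in isolation; AllocAnonShared is a hypothetical name. It leaves out MAP_HUGETLB and madvise, and selects MAP_ANONYMOUS or MAP_ANON with #elif rather than OR-ing both in, which is equivalent when the two macros have the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>

// Returns a zeroed, read/write, shared anonymous mapping, or nullptr on failure.
void *AllocAnonShared(std::size_t bytes) {
  int flags = MAP_SHARED;
#if defined(MAP_ANONYMOUS)
  flags |= MAP_ANONYMOUS;
#elif defined(MAP_ANON)
  flags |= MAP_ANON;
#endif
  void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (p == MAP_FAILED) return nullptr;
  // Anonymous mappings are already zero-filled; the memset mirrors the bzero
  // in the original and also touches every page up front.
  std::memset(p, 0, bytes);
  return p;
}
```

Because mmap returns page-aligned addresses, a successful mapping always satisfies the 64-byte alignment that the allocation paths assert.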
@@ -0,0 +1,52 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Cshift.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef _GRID_CSHIFT_H_
#define _GRID_CSHIFT_H_
#include <Grid/cshift/Cshift_common.h>
#ifdef GRID_COMMS_NONE
#include <Grid/cshift/Cshift_none.h>
#endif
#ifdef GRID_COMMS_MPI
#include <Grid/cshift/Cshift_mpi.h>
#endif
#ifdef GRID_COMMS_MPI3
#include <Grid/cshift/Cshift_mpi.h>
#endif
#ifdef GRID_COMMS_MPIT
#include <Grid/cshift/Cshift_mpi.h>
#endif
#ifdef GRID_COMMS_SHMEM
#include <Grid/cshift/Cshift_mpi.h> // uses same implementation of communicator
#endif
#endif
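Gather_plane_simple in Cshift_common.h first builds a table of (buffer index, lattice outer index) pairs for one plane and then copies through it in parallel. The sketch below reproduces only the table construction. Plain integers stand in for _slice_nblock, _slice_block, _slice_stride and plane*_ostride, and a caller-supplied predicate stands in for the checkerboard test through CheckerBoardFromOindex; PlaneTable is a hypothetical name. A single loop covers both of the original's branches, because when every site is kept the running counter off+bo++ equals off+n*e2+b.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// nblock, block, stride: slice geometry; plane_offset: start of the plane;
// off: first free slot in the buffer; keep(o): whether outer index o is gathered.
template <class KeepFn>
std::vector<std::pair<int, int>> PlaneTable(int nblock, int block, int stride,
                                            int plane_offset, int off, KeepFn keep) {
  std::vector<std::pair<int, int>> table;
  int bo = 0;  // next buffer slot, relative to off
  for (int n = 0; n < nblock; n++) {
    for (int b = 0; b < block; b++) {
      int o = n * stride;
      if (keep(o + b)) table.emplace_back(off + bo++, plane_offset + o + b);
    }
  }
  return table;
}
```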
@@ -0,0 +1,391 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cshift/Cshift_common.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef _GRID_CSHIFT_COMMON_H_
#define _GRID_CSHIFT_COMMON_H_
namespace Grid {
///////////////////////////////////////////////////////////////////
// Gather for when there is no need to SIMD split
///////////////////////////////////////////////////////////////////
template<class vobj> void
Gather_plane_simple (const Lattice<vobj> &rhs,commVector<vobj> &buffer,int dimension,int plane,int cbmask, int off=0)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask = 0x3;
}
int so=plane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension];
int e2=rhs._grid->_slice_block[dimension];
int ent = 0;
static std::vector<std::pair<int,int> > table; table.resize(e1*e2);
int stride=rhs._grid->_slice_stride[dimension];
if ( cbmask == 0x3 ) {
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o = n*stride;
int bo = n*e2;
table[ent++] = std::pair<int,int>(off+bo+b,so+o+b);
}
}
} else {
int bo=0;
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o = n*stride;
int ocb=1<<rhs._grid->CheckerBoardFromOindex(o+b);
if ( ocb &cbmask ) {
table[ent++]=std::pair<int,int> (off+bo++,so+o+b);
}
}
}
}
parallel_for(int i=0;i<ent;i++){
buffer[table[i].first]=rhs._odata[table[i].second];
}
}
///////////////////////////////////////////////////////////////////
// Gather for when there *is* need to SIMD split
///////////////////////////////////////////////////////////////////
template<class vobj> void
Gather_plane_extract(const Lattice<vobj> &rhs,std::vector<typename vobj::scalar_object *> pointers,int dimension,int plane,int cbmask)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask = 0x3;
}
int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension];
int e2=rhs._grid->_slice_block[dimension];
int n1=rhs._grid->_slice_stride[dimension];
if ( cbmask ==0x3){
parallel_for_nest2(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o = n*n1;
int offset = b+n*e2;
vobj temp =rhs._odata[so+o+b];
extract<vobj>(temp,pointers,offset);
}
}
} else {
// Case of SIMD split AND checker dim cannot currently be hit, except in
// Test_cshift_red_black code.
std::cout << " Dense packed buffer WARNING " <<std::endl;
parallel_for_nest2(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o=n*n1;
int ocb=1<<rhs._grid->CheckerBoardFromOindex(o+b);
int offset = b+n*e2;
if ( ocb & cbmask ) {
vobj temp =rhs._odata[so+o+b];
extract<vobj>(temp,pointers,offset);
}
}
}
}
}
//////////////////////////////////////////////////////
// Scatter for when there is no need to SIMD split
//////////////////////////////////////////////////////
template<class vobj> void Scatter_plane_simple (Lattice<vobj> &rhs,commVector<vobj> &buffer, int dimension,int plane,int cbmask)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask=0x3;
}
int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension];
int e2=rhs._grid->_slice_block[dimension];
int stride=rhs._grid->_slice_stride[dimension];
static std::vector<std::pair<int,int> > table; table.resize(e1*e2);
int ent =0;
if ( cbmask ==0x3 ) {
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*rhs._grid->_slice_stride[dimension];
int bo =n*rhs._grid->_slice_block[dimension];
table[ent++] = std::pair<int,int>(so+o+b,bo+b);
}
}
} else {
int bo=0;
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*rhs._grid->_slice_stride[dimension];
int ocb=1<<rhs._grid->CheckerBoardFromOindex(o+b);// Could easily be a table lookup
if ( ocb & cbmask ) {
table[ent++]=std::pair<int,int> (so+o+b,bo++);
}
}
}
}
parallel_for(int i=0;i<ent;i++){
rhs._odata[table[i].first]=buffer[table[i].second];
}
}
//////////////////////////////////////////////////////
// Scatter for when there *is* need to SIMD split
//////////////////////////////////////////////////////
template<class vobj> void Scatter_plane_merge(Lattice<vobj> &rhs,std::vector<typename vobj::scalar_object *> pointers,int dimension,int plane,int cbmask)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask=0x3;
}
int so = plane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension];
int e2=rhs._grid->_slice_block[dimension];
if(cbmask ==0x3 ) {
parallel_for_nest2(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o = n*rhs._grid->_slice_stride[dimension];
int offset = b+n*rhs._grid->_slice_block[dimension];
merge(rhs._odata[so+o+b],pointers,offset);
}
}
} else {
// Case of SIMD split AND checker dim cannot currently be hit, except in
// Test_cshift_red_black code.
// std::cout << "Scatter_plane merge assert(0); think this is buggy FIXME "<< std::endl;// think this is buggy FIXME
std::cout<<" Unthreaded warning -- buffer is not densely packed ??"<<std::endl;
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o = n*rhs._grid->_slice_stride[dimension];
int offset = b+n*rhs._grid->_slice_block[dimension];
int ocb=1<<rhs._grid->CheckerBoardFromOindex(o+b);
if ( ocb&cbmask ) {
merge(rhs._odata[so+o+b],pointers,offset);
}
}
}
}
}
//////////////////////////////////////////////////////
// local to node block strided copies
//////////////////////////////////////////////////////
template<class vobj> void Copy_plane(Lattice<vobj>& lhs,const Lattice<vobj> &rhs, int dimension,int lplane,int rplane,int cbmask)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask=0x3;
}
int ro = rplane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int lo = lplane*lhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension]; // clearly loop invariant for icpc
int e2=rhs._grid->_slice_block[dimension];
int stride = rhs._grid->_slice_stride[dimension];
static std::vector<std::pair<int,int> > table; table.resize(e1*e2);
int ent=0;
if(cbmask == 0x3 ){
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*stride+b;
table[ent++] = std::pair<int,int>(lo+o,ro+o);
}
}
} else {
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*stride+b;
int ocb=1<<lhs._grid->CheckerBoardFromOindex(o);
if ( ocb&cbmask ) {
table[ent++] = std::pair<int,int>(lo+o,ro+o);
}
}
}
}
parallel_for(int i=0;i<ent;i++){
lhs._odata[table[i].first]=rhs._odata[table[i].second];
}
}
template<class vobj> void Copy_plane_permute(Lattice<vobj>& lhs,const Lattice<vobj> &rhs, int dimension,int lplane,int rplane,int cbmask,int permute_type)
{
int rd = rhs._grid->_rdimensions[dimension];
if ( !rhs._grid->CheckerBoarded(dimension) ) {
cbmask=0x3;
}
int ro = rplane*rhs._grid->_ostride[dimension]; // base offset for start of plane
int lo = lplane*lhs._grid->_ostride[dimension]; // base offset for start of plane
int e1=rhs._grid->_slice_nblock[dimension];
int e2=rhs._grid->_slice_block [dimension];
int stride = rhs._grid->_slice_stride[dimension];
static std::vector<std::pair<int,int> > table; table.resize(e1*e2);
int ent=0;
double t_tab,t_perm;
if ( cbmask == 0x3 ) {
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*stride;
table[ent++] = std::pair<int,int>(lo+o+b,ro+o+b);
}}
} else {
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int o =n*stride;
int ocb=1<<lhs._grid->CheckerBoardFromOindex(o+b);
if ( ocb&cbmask ) table[ent++] = std::pair<int,int>(lo+o+b,ro+o+b);
}}
}
parallel_for(int i=0;i<ent;i++){
permute(lhs._odata[table[i].first],rhs._odata[table[i].second],permute_type);
}
}
//////////////////////////////////////////////////////
// Local to node Cshift
//////////////////////////////////////////////////////
template<class vobj> void Cshift_local(Lattice<vobj>& ret,const Lattice<vobj> &rhs,int dimension,int shift)
{
int sshift[2];
sshift[0] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Even);
sshift[1] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Odd);
double t_local;
if ( sshift[0] == sshift[1] ) {
Cshift_local(ret,rhs,dimension,shift,0x3);
} else {
Cshift_local(ret,rhs,dimension,shift,0x1);// if checkerboard is unfavourable take two passes
Cshift_local(ret,rhs,dimension,shift,0x2);// both with block stride loop iteration
}
}
template<class vobj> void Cshift_local(Lattice<vobj> &ret,const Lattice<vobj> &rhs,int dimension,int shift,int cbmask)
{
GridBase *grid = rhs._grid;
int fd = grid->_fdimensions[dimension];
int rd = grid->_rdimensions[dimension];
int ld = grid->_ldimensions[dimension];
int gd = grid->_gdimensions[dimension];
int ly = grid->_simd_layout[dimension];
// Map to always positive shift modulo global full dimension.
shift = (shift+fd)%fd;
// the permute type
ret.checkerboard = grid->CheckerBoardDestination(rhs.checkerboard,shift,dimension);
int permute_dim =grid->PermuteDim(dimension);
int permute_type=grid->PermuteType(dimension);
int permute_type_dist;
for(int x=0;x<rd;x++){
int o = 0;
int bo = x * grid->_ostride[dimension];
int cb= (cbmask==0x2)? Odd : Even;
int sshift = grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,cb);
int sx = (x+sshift)%rd;
// wrap counts the whole multiples of rd contained in sshift (taken modulo ly).
// num is sshift mod rd.
//
// shift 7
//
// XoXo YcYc
// oXoX cYcY
// XoXo YcYc
// oXoX cYcY
//
// sshift --
//
// XX YY ; 3
// XX YY ; 0
// XX YY ; 3
// XX YY ; 0
//
int permute_slice=0;
if(permute_dim){
int wrap = sshift/rd; wrap=wrap % ly;
int num = sshift%rd;
if ( x< rd-num ) permute_slice=wrap;
else permute_slice = (wrap+1)%ly;
if ( (ly>2) && (permute_slice) ) {
assert(permute_type & RotateBit);
permute_type_dist = permute_type|permute_slice;
} else {
permute_type_dist = permute_type;
}
}
if ( permute_slice ) Copy_plane_permute(ret,rhs,dimension,x,sx,cbmask,permute_type_dist);
else Copy_plane(ret,rhs,dimension,x,sx,cbmask);
}
}
}
#endif
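The wrap/num arithmetic in Cshift_local is easiest to check on a scalar model. The sketch below is a minimal illustration under stated assumptions: a single process, no checkerboarding (so sshift equals shift), and Grid's packing of a site at local coordinate g into outer index g % rd and SIMD lane g / rd. `PlaneShift`, `plane_shift` and `cshift_model` are hypothetical names, and the lane permutation is modelled as a plain rotation of a std::vector rather than Grid's SIMD permute primitives.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scalar model of one dimension of local extent fd = rd*ly on a single
// process: site g sits at outer index g % rd in SIMD lane g / rd.
struct PlaneShift {
  int sx;            // source outer plane
  int permute_slice; // lane rotation applied to that plane (0 = plain copy)
};

// Same wrap/num arithmetic as Cshift_local, without checkerboarding (sshift == shift).
PlaneShift plane_shift(int x, int shift, int rd, int ly) {
  const int fd = rd * ly;
  shift = (shift + fd) % fd;           // non-negative shift, as Cshift_local does
  const int wrap = (shift / rd) % ly;  // whole multiples of rd in the shift, mod ly
  const int num  = shift % rd;         // remainder within the outer extent
  PlaneShift p;
  p.sx = (x + shift) % rd;
  p.permute_slice = (x < rd - num) ? wrap : (wrap + 1) % ly;
  return p;
}

// Shift site values (natural order g = 0..fd-1): pack into [outer][lane] form,
// copy plane sx into plane x with its lanes rotated by permute_slice, unpack.
std::vector<int> cshift_model(const std::vector<int>& site_val, int shift, int rd, int ly) {
  const int fd = rd * ly;
  assert(static_cast<int>(site_val.size()) == fd);
  std::vector<std::vector<int>> packed(rd, std::vector<int>(ly));
  for (int g = 0; g < fd; g++) packed[g % rd][g / rd] = site_val[g];

  std::vector<std::vector<int>> out(rd, std::vector<int>(ly));
  for (int x = 0; x < rd; x++) {
    const PlaneShift p = plane_shift(x, shift, rd, ly);
    for (int lane = 0; lane < ly; lane++)
      out[x][lane] = packed[p.sx][(lane + p.permute_slice) % ly];
  }

  std::vector<int> result(fd);
  for (int g = 0; g < fd; g++) result[g] = out[g % rd][g / rd];
  return result;
}
```

With rd = 3 and ly = 4, a shift of 7 gives wrap = 2 and num = 1: planes x = 0 and 1 read plane (x+7)%3 with a lane rotation of 2, while plane x = 2 reads plane 0 with a rotation of 3. This matches the rule that only planes with x >= rd-num pick up the extra carry into the next lane.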
@@ -0,0 +1,262 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cshift/Cshift_mpi.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef _GRID_CSHIFT_MPI_H_
#define _GRID_CSHIFT_MPI_H_
namespace Grid {
template<class vobj> Lattice<vobj> Cshift(const Lattice<vobj> &rhs,int dimension,int shift)
{
typedef typename vobj::vector_type vector_type;
typedef typename vobj::scalar_type scalar_type;
Lattice<vobj> ret(rhs._grid);
int fd = rhs._grid->_fdimensions[dimension];
int rd = rhs._grid->_rdimensions[dimension];
// Map to always positive shift modulo global full dimension.
shift = (shift+fd)%fd;
ret.checkerboard = rhs._grid->CheckerBoardDestination(rhs.checkerboard,shift,dimension);
// the permute type
int simd_layout = rhs._grid->_simd_layout[dimension];
int comm_dim = rhs._grid->_processors[dimension] >1 ;
int splice_dim = rhs._grid->_simd_layout[dimension]>1 && (comm_dim);
if ( !comm_dim ) {
//std::cout << "CSHIFT: Cshift_local" <<std::endl;
Cshift_local(ret,rhs,dimension,shift); // Handles checkerboarding
} else if ( splice_dim ) {
//std::cout << "CSHIFT: Cshift_comms_simd call - splice_dim = " << splice_dim << " shift " << shift << " dimension = " << dimension << std::endl;
Cshift_comms_simd(ret,rhs,dimension,shift);
} else {
//std::cout << "CSHIFT: Cshift_comms" <<std::endl;
Cshift_comms(ret,rhs,dimension,shift);
}
return ret;
}
template<class vobj> void Cshift_comms(Lattice<vobj>& ret,const Lattice<vobj> &rhs,int dimension,int shift)
{
int sshift[2];
sshift[0] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Even);
sshift[1] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Odd);
// std::cout << "Cshift_comms dim "<<dimension<<"cb "<<rhs.checkerboard<<"shift "<<shift<<" sshift " << sshift[0]<<" "<<sshift[1]<<std::endl;
if ( sshift[0] == sshift[1] ) {
// std::cout << "Single pass Cshift_comms" <<std::endl;
Cshift_comms(ret,rhs,dimension,shift,0x3);
} else {
// std::cout << "Two pass Cshift_comms" <<std::endl;
Cshift_comms(ret,rhs,dimension,shift,0x1);// if checkerboard is unfavourable take two passes
Cshift_comms(ret,rhs,dimension,shift,0x2);// both with block stride loop iteration
}
}
template<class vobj> void Cshift_comms_simd(Lattice<vobj>& ret,const Lattice<vobj> &rhs,int dimension,int shift)
{
int sshift[2];
sshift[0] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Even);
sshift[1] = rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,Odd);
//std::cout << "Cshift_comms_simd dim "<<dimension<<"cb "<<rhs.checkerboard<<"shift "<<shift<<" sshift " << sshift[0]<<" "<<sshift[1]<<std::endl;
if ( sshift[0] == sshift[1] ) {
//std::cout << "Single pass Cshift_comms" <<std::endl;
Cshift_comms_simd(ret,rhs,dimension,shift,0x3);
} else {
//std::cout << "Two pass Cshift_comms" <<std::endl;
Cshift_comms_simd(ret,rhs,dimension,shift,0x1);// if checkerboard is unfavourable take two passes
Cshift_comms_simd(ret,rhs,dimension,shift,0x2);// both with block stride loop iteration
}
}
template<class vobj> void Cshift_comms(Lattice<vobj> &ret,const Lattice<vobj> &rhs,int dimension,int shift,int cbmask)
{
typedef typename vobj::vector_type vector_type;
typedef typename vobj::scalar_type scalar_type;
GridBase *grid=rhs._grid;
Lattice<vobj> temp(rhs._grid);
int fd = rhs._grid->_fdimensions[dimension];
int rd = rhs._grid->_rdimensions[dimension];
int pd = rhs._grid->_processors[dimension];
int simd_layout = rhs._grid->_simd_layout[dimension];
int comm_dim = rhs._grid->_processors[dimension] >1 ;
assert(simd_layout==1);
assert(comm_dim==1);
assert(shift>=0);
assert(shift<fd);
int buffer_size = rhs._grid->_slice_nblock[dimension]*rhs._grid->_slice_block[dimension];
commVector<vobj> send_buf(buffer_size);
commVector<vobj> recv_buf(buffer_size);
int cb= (cbmask==0x2)? Odd : Even;
int sshift= rhs._grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,cb);
for(int x=0;x<rd;x++){
int sx = (x+sshift)%rd;
int comm_proc = ((x+sshift)/rd)%pd;
if (comm_proc==0) {
Copy_plane(ret,rhs,dimension,x,sx,cbmask);
} else {
int words = send_buf.size();
if (cbmask != 0x3) words=words>>1;
int bytes = words * sizeof(vobj);
Gather_plane_simple (rhs,send_buf,dimension,sx,cbmask);
int rank = grid->_processor;
int recv_from_rank;
int xmit_to_rank;
grid->ShiftedRanks(dimension,comm_proc,xmit_to_rank,recv_from_rank);
grid->SendToRecvFrom((void *)&send_buf[0],
xmit_to_rank,
(void *)&recv_buf[0],
recv_from_rank,
bytes);
grid->Barrier();
Scatter_plane_simple (ret,recv_buf,dimension,x,cbmask);
}
}
}
template<class vobj> void Cshift_comms_simd(Lattice<vobj> &ret,const Lattice<vobj> &rhs,int dimension,int shift,int cbmask)
{
GridBase *grid=rhs._grid;
const int Nsimd = grid->Nsimd();
typedef typename vobj::vector_type vector_type;
typedef typename vobj::scalar_object scalar_object;
typedef typename vobj::scalar_type scalar_type;
int fd = grid->_fdimensions[dimension];
int rd = grid->_rdimensions[dimension];
int ld = grid->_ldimensions[dimension];
int pd = grid->_processors[dimension];
int simd_layout = grid->_simd_layout[dimension];
int comm_dim = grid->_processors[dimension] >1 ;
//std::cout << "Cshift_comms_simd dim "<< dimension << " fd "<<fd<<" rd "<<rd
// << " ld "<<ld<<" pd " << pd<<" simd_layout "<<simd_layout
// << " comm_dim " << comm_dim << " cbmask " << cbmask <<std::endl;
assert(comm_dim==1);
assert(simd_layout==2);
assert(shift>=0);
assert(shift<fd);
int permute_type=grid->PermuteType(dimension);
///////////////////////////////////////////////
// Simd direction uses an extract/merge pair
///////////////////////////////////////////////
int buffer_size = grid->_slice_nblock[dimension]*grid->_slice_block[dimension];
int words = sizeof(vobj)/sizeof(vector_type);
std::vector<commVector<scalar_object> > send_buf_extract(Nsimd,commVector<scalar_object>(buffer_size) );
std::vector<commVector<scalar_object> > recv_buf_extract(Nsimd,commVector<scalar_object>(buffer_size) );
int bytes = buffer_size*sizeof(scalar_object);
std::vector<scalar_object *> pointers(Nsimd); // send-buffer pointers, one per SIMD lane
std::vector<scalar_object *> rpointers(Nsimd); // received pointers
///////////////////////////////////////////
// Work out what to send where
///////////////////////////////////////////
int cb = (cbmask==0x2)? Odd : Even;
int sshift= grid->CheckerBoardShiftForCB(rhs.checkerboard,dimension,shift,cb);
// loop over outer coord planes orthog to dim
for(int x=0;x<rd;x++){
// FIXME call local permute copy if none are offnode.
for(int i=0;i<Nsimd;i++){
pointers[i] = &send_buf_extract[i][0];
}
int sx = (x+sshift)%rd;
Gather_plane_extract(rhs,pointers,dimension,sx,cbmask);
for(int i=0;i<Nsimd;i++){
int inner_bit = (Nsimd>>(permute_type+1));
int ic= (i&inner_bit)? 1:0;
int my_coor = rd*ic + x;
int nbr_coor = my_coor+sshift;
int nbr_proc = ((nbr_coor)/ld) % pd;// relative shift in processors
int nbr_ic = (nbr_coor%ld)/rd; // inner coord of peer
int nbr_ox = (nbr_coor%rd); // outer coord of peer
int nbr_lane = (i&(~inner_bit));
int recv_from_rank;
int xmit_to_rank;
if (nbr_ic) nbr_lane|=inner_bit;
assert (sx == nbr_ox);
if(nbr_proc){
grid->ShiftedRanks(dimension,nbr_proc,xmit_to_rank,recv_from_rank);
grid->SendToRecvFrom((void *)&send_buf_extract[nbr_lane][0],
xmit_to_rank,
(void *)&recv_buf_extract[i][0],
recv_from_rank,
bytes);
grid->Barrier();
rpointers[i] = &recv_buf_extract[i][0];
} else {
rpointers[i] = &send_buf_extract[nbr_lane][0];
}
}
Scatter_plane_merge(ret,rpointers,dimension,x,cbmask);
}
}
}
#endif
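For simd_layout == 1, Cshift_comms decides per outer plane whether the source plane lives on this rank (comm_proc == 0, handled by Copy_plane) or on the rank comm_proc steps along the processor dimension (Gather_plane_simple, SendToRecvFrom, Scatter_plane_simple). A serial sketch that simulates all ranks at once is shown below; it assumes the global coordinate along the dimension is rank*rd + x and that data arrives from rank + comm_proc. `CommPlan`, `comm_plan` and `cshift_ring` are hypothetical helpers, not Grid API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-plane plan for simd_layout == 1 (mirrors the arithmetic in Cshift_comms).
struct CommPlan {
  int sx;        // source outer plane on the sending rank
  int comm_proc; // ranks along the dimension to the source; 0 means local copy
};

CommPlan comm_plan(int x, int shift, int rd, int pd) {
  const int fd = rd * pd;     // global extent of the dimension
  shift = (shift + fd) % fd;  // non-negative shift, as in Cshift()
  return CommPlan{(x + shift) % rd, ((x + shift) / rd) % pd};
}

// Serial simulation of every rank on a ring of pd ranks, each holding rd planes.
// Assumes global coordinate rank*rd + x and data arriving from rank + comm_proc.
std::vector<int> cshift_ring(const std::vector<int>& global, int shift, int rd, int pd) {
  const int fd = rd * pd;
  assert(static_cast<int>(global.size()) == fd);
  std::vector<int> out(fd);
  for (int rank = 0; rank < pd; rank++) {
    for (int x = 0; x < rd; x++) {
      const CommPlan p = comm_plan(x, shift, rd, pd);
      // comm_proc == 0 corresponds to Copy_plane; otherwise gather, send/recv, scatter.
      const int src_rank = (rank + p.comm_proc) % pd;
      out[rank * rd + x] = global[src_rank * rd + p.sx];
    }
  }
  return out;
}
```

Because x + shift is below rd + fd, the source is at most pd ranks away, so each plane needs a single point-to-point exchange; planes whose comm_proc is zero never touch the network.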
@@ -0,0 +1,39 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/cshift/Cshift_none.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef _GRID_CSHIFT_NONE_H_
#define _GRID_CSHIFT_NONE_H_
namespace Grid {
template<class vobj> Lattice<vobj> Cshift(const Lattice<vobj> &rhs,int dimension,int shift)
{
Lattice<vobj> ret(rhs._grid);
ret.checkerboard = rhs._grid->CheckerBoardDestination(rhs.checkerboard,shift,dimension);
Cshift_local(ret,rhs,dimension,shift);
return ret;
}
}
#endif
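Whatever the communicator, Cshift(rhs, dimension, shift) returns ret with ret(x) = rhs(x + shift*e_dimension), periodic in the full dimension; in the no-comms build this reduces to Cshift_local. A plain reference version, which could be used to test the vectorised paths, is sketched below. It assumes lexicographic site ordering with dimension 0 fastest, which is not Grid's SIMD layout, and `cshift_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference Cshift: ret[x] = rhs[x + shift * e_dim], periodic in dims[dim].
// Sites are stored lexicographically with dimension 0 running fastest.
std::vector<double> cshift_reference(const std::vector<double>& rhs,
                                     const std::vector<int>& dims, int dim, int shift) {
  assert(dim >= 0 && dim < static_cast<int>(dims.size()));
  int volume = 1;
  for (int extent : dims) volume *= extent;
  assert(static_cast<int>(rhs.size()) == volume);

  int stride = 1;                        // distance between neighbours along dim
  for (int d = 0; d < dim; d++) stride *= dims[d];
  const int L = dims[dim];
  shift = ((shift % L) + L) % L;         // any integer shift, mapped into [0, L)

  std::vector<double> ret(volume);
  for (int s = 0; s < volume; s++) {
    const int coor = (s / stride) % L;   // coordinate of site s along dim
    const int src = (coor + shift) % L;  // periodic source coordinate
    ret[s] = rhs[s + (src - coor) * stride];
  }
  return ret;
}
```

Unlike the (shift+fd)%fd mapping in Cshift_none and Cshift_local, the double modulo here also accepts shifts below -L.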
File diff suppressed because it is too large
@@ -0,0 +1,33 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Lattice.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_H
#define GRID_LATTICE_H
#include <Grid/lattice/Lattice_base.h>
#endif
@@ -0,0 +1,466 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_ET.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: neo <cossu@post.kek.jp>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_ET_H
#define GRID_LATTICE_ET_H
#include <iostream>
#include <tuple>
#include <typeinfo>
#include <vector>
namespace Grid {
////////////////////////////////////////////////////
// Predicated where support
////////////////////////////////////////////////////
template <class iobj, class vobj, class robj>
inline vobj predicatedWhere(const iobj &predicate, const vobj &iftrue,
const robj &iffalse) {
typename std::remove_const<vobj>::type ret;
typedef typename vobj::scalar_object scalar_object;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
const int Nsimd = vobj::vector_type::Nsimd();
const int words = sizeof(vobj) / sizeof(vector_type);
std::vector<Integer> mask(Nsimd);
std::vector<scalar_object> truevals(Nsimd);
std::vector<scalar_object> falsevals(Nsimd);
extract(iftrue, truevals);
extract(iffalse, falsevals);
extract<vInteger, Integer>(TensorRemove(predicate), mask);
for (int s = 0; s < Nsimd; s++) {
if (mask[s]) falsevals[s] = truevals[s];
}
merge(ret, falsevals);
return ret;
}
////////////////////////////////////////////
// Recursive evaluation of expressions. We could
// switch to a generic approach with variadics, a la
// Antonin's Lat Sim, but repacking the tuple elements into
// a variadic argument list is hideous; C++14 introduces std::make_index_sequence for this
////////////////////////////////////////////
// Leaf eval of lattice; should be protected with enable_if using traits
template <typename T>
using is_lattice = std::is_base_of<LatticeBase, T>;
template <typename T>
using is_lattice_expr = std::is_base_of<LatticeExpressionBase, T>;
//Specialization of getVectorType for lattices
template<typename T>
struct getVectorType<Lattice<T> >{
typedef typename Lattice<T>::vector_object type;
};
template<class sobj>
inline sobj eval(const unsigned int ss, const sobj &arg)
{
return arg;
}
template <class lobj>
inline const lobj &eval(const unsigned int ss, const Lattice<lobj> &arg) {
return arg._odata[ss];
}
// handle nodes in syntax tree
template <typename Op, typename T1>
auto inline eval(
const unsigned int ss,
const LatticeUnaryExpression<Op, T1> &expr) // eval one operand
-> decltype(expr.first.func(eval(ss, std::get<0>(expr.second)))) {
return expr.first.func(eval(ss, std::get<0>(expr.second)));
}
template <typename Op, typename T1, typename T2>
auto inline eval(
const unsigned int ss,
const LatticeBinaryExpression<Op, T1, T2> &expr) // eval two operands
-> decltype(expr.first.func(eval(ss, std::get<0>(expr.second)),
eval(ss, std::get<1>(expr.second)))) {
return expr.first.func(eval(ss, std::get<0>(expr.second)),
eval(ss, std::get<1>(expr.second)));
}
template <typename Op, typename T1, typename T2, typename T3>
auto inline eval(const unsigned int ss,
const LatticeTrinaryExpression<Op, T1, T2, T3>
&expr) // eval three operands
-> decltype(expr.first.func(eval(ss, std::get<0>(expr.second)),
eval(ss, std::get<1>(expr.second)),
eval(ss, std::get<2>(expr.second)))) {
return expr.first.func(eval(ss, std::get<0>(expr.second)),
eval(ss, std::get<1>(expr.second)),
eval(ss, std::get<2>(expr.second)));
}
//////////////////////////////////////////////////////////////////////////
// Obtain the grid from an expression, ensuring conformable. This must follow a
// tree recursion
//////////////////////////////////////////////////////////////////////////
template <class T1,
typename std::enable_if<is_lattice<T1>::value, T1>::type * = nullptr>
inline void GridFromExpression(GridBase *&grid, const T1 &lat) // Lattice leaf
{
if (grid) {
conformable(grid, lat._grid);
}
grid = lat._grid;
}
template <class T1,
typename std::enable_if<!is_lattice<T1>::value, T1>::type * = nullptr>
inline void GridFromExpression(GridBase *&grid,
const T1 &notlat) // non-lattice leaf
{}
template <typename Op, typename T1>
inline void GridFromExpression(GridBase *&grid,
const LatticeUnaryExpression<Op, T1> &expr) {
GridFromExpression(grid, std::get<0>(expr.second)); // recurse
}
template <typename Op, typename T1, typename T2>
inline void GridFromExpression(
GridBase *&grid, const LatticeBinaryExpression<Op, T1, T2> &expr) {
GridFromExpression(grid, std::get<0>(expr.second)); // recurse
GridFromExpression(grid, std::get<1>(expr.second));
}
template <typename Op, typename T1, typename T2, typename T3>
inline void GridFromExpression(
GridBase *&grid, const LatticeTrinaryExpression<Op, T1, T2, T3> &expr) {
GridFromExpression(grid, std::get<0>(expr.second)); // recurse
GridFromExpression(grid, std::get<1>(expr.second));
GridFromExpression(grid, std::get<2>(expr.second));
}
//////////////////////////////////////////////////////////////////////////
// Obtain the CB from an expression, ensuring conformable. This must follow a
// tree recursion
//////////////////////////////////////////////////////////////////////////
template <class T1,
typename std::enable_if<is_lattice<T1>::value, T1>::type * = nullptr>
inline void CBFromExpression(int &cb, const T1 &lat) // Lattice leaf
{
if ((cb == Odd) || (cb == Even)) {
assert(cb == lat.checkerboard);
}
cb = lat.checkerboard;
// std::cout<<GridLogMessage<<"Lattice leaf cb "<<cb<<std::endl;
}
template <class T1,
typename std::enable_if<!is_lattice<T1>::value, T1>::type * = nullptr>
inline void CBFromExpression(int &cb, const T1 &notlat) // non-lattice leaf
{
// std::cout<<GridLogMessage<<"Non lattice leaf cb"<<cb<<std::endl;
}
template <typename Op, typename T1>
inline void CBFromExpression(int &cb,
const LatticeUnaryExpression<Op, T1> &expr) {
CBFromExpression(cb, std::get<0>(expr.second)); // recurse
// std::cout<<GridLogMessage<<"Unary node cb "<<cb<<std::endl;
}
template <typename Op, typename T1, typename T2>
inline void CBFromExpression(int &cb,
const LatticeBinaryExpression<Op, T1, T2> &expr) {
CBFromExpression(cb, std::get<0>(expr.second)); // recurse
CBFromExpression(cb, std::get<1>(expr.second));
// std::cout<<GridLogMessage<<"Binary node cb "<<cb<<std::endl;
}
template <typename Op, typename T1, typename T2, typename T3>
inline void CBFromExpression(
int &cb, const LatticeTrinaryExpression<Op, T1, T2, T3> &expr) {
CBFromExpression(cb, std::get<0>(expr.second)); // recurse
CBFromExpression(cb, std::get<1>(expr.second));
CBFromExpression(cb, std::get<2>(expr.second));
// std::cout<<GridLogMessage<<"Trinary node cb "<<cb<<std::endl;
}
////////////////////////////////////////////
// Unary operators and funcs
////////////////////////////////////////////
#define GridUnopClass(name, ret) \
template <class arg> \
struct name { \
static auto inline func(const arg a) -> decltype(ret) { return ret; } \
};
GridUnopClass(UnarySub, -a);
GridUnopClass(UnaryNot, Not(a));
GridUnopClass(UnaryAdj, adj(a));
GridUnopClass(UnaryConj, conjugate(a));
GridUnopClass(UnaryTrace, trace(a));
GridUnopClass(UnaryTranspose, transpose(a));
GridUnopClass(UnaryTa, Ta(a));
GridUnopClass(UnaryProjectOnGroup, ProjectOnGroup(a));
GridUnopClass(UnaryReal, real(a));
GridUnopClass(UnaryImag, imag(a));
GridUnopClass(UnaryToReal, toReal(a));
GridUnopClass(UnaryToComplex, toComplex(a));
GridUnopClass(UnaryTimesI, timesI(a));
GridUnopClass(UnaryTimesMinusI, timesMinusI(a));
GridUnopClass(UnaryAbs, abs(a));
GridUnopClass(UnarySqrt, sqrt(a));
GridUnopClass(UnaryRsqrt, rsqrt(a));
GridUnopClass(UnarySin, sin(a));
GridUnopClass(UnaryCos, cos(a));
GridUnopClass(UnaryAsin, asin(a));
GridUnopClass(UnaryAcos, acos(a));
GridUnopClass(UnaryLog, log(a));
GridUnopClass(UnaryExp, exp(a));
////////////////////////////////////////////
// Binary operators
////////////////////////////////////////////
#define GridBinOpClass(name, combination) \
template <class left, class right> \
struct name { \
static auto inline func(const left &lhs, const right &rhs) \
-> decltype(combination) const { \
return combination; \
} \
}
GridBinOpClass(BinaryAdd, lhs + rhs);
GridBinOpClass(BinarySub, lhs - rhs);
GridBinOpClass(BinaryMul, lhs *rhs);
GridBinOpClass(BinaryDiv, lhs /rhs);
GridBinOpClass(BinaryAnd, lhs &rhs);
GridBinOpClass(BinaryOr, lhs | rhs);
GridBinOpClass(BinaryAndAnd, lhs &&rhs);
GridBinOpClass(BinaryOrOr, lhs || rhs);
////////////////////////////////////////////////////
// Trinary conditional op
////////////////////////////////////////////////////
#define GridTrinOpClass(name, combination) \
template <class predicate, class left, class right> \
struct name { \
static auto inline func(const predicate &pred, const left &lhs, \
const right &rhs) -> decltype(combination) const { \
return combination; \
} \
}
GridTrinOpClass(
TrinaryWhere,
(predicatedWhere<predicate, typename std::remove_reference<left>::type,
typename std::remove_reference<right>::type>(pred, lhs,
rhs)));
////////////////////////////////////////////
// Operator syntactical glue
////////////////////////////////////////////
#define GRID_UNOP(name) name<decltype(eval(0, arg))>
#define GRID_BINOP(name) name<decltype(eval(0, lhs)), decltype(eval(0, rhs))>
#define GRID_TRINOP(name) \
name<decltype(eval(0, pred)), decltype(eval(0, lhs)), decltype(eval(0, rhs))>
#define GRID_DEF_UNOP(op, name) \
template <typename T1, \
typename std::enable_if<is_lattice<T1>::value || \
is_lattice_expr<T1>::value, \
T1>::type * = nullptr> \
inline auto op(const T1 &arg) \
->decltype(LatticeUnaryExpression<GRID_UNOP(name), const T1 &>( \
std::make_pair(GRID_UNOP(name)(), std::forward_as_tuple(arg)))) { \
return LatticeUnaryExpression<GRID_UNOP(name), const T1 &>( \
std::make_pair(GRID_UNOP(name)(), std::forward_as_tuple(arg))); \
}
#define GRID_BINOP_LEFT(op, name) \
template <typename T1, typename T2, \
typename std::enable_if<is_lattice<T1>::value || \
is_lattice_expr<T1>::value, \
T1>::type * = nullptr> \
inline auto op(const T1 &lhs, const T2 &rhs) \
->decltype( \
LatticeBinaryExpression<GRID_BINOP(name), const T1 &, const T2 &>( \
std::make_pair(GRID_BINOP(name)(), \
std::forward_as_tuple(lhs, rhs)))) { \
return LatticeBinaryExpression<GRID_BINOP(name), const T1 &, const T2 &>( \
std::make_pair(GRID_BINOP(name)(), std::forward_as_tuple(lhs, rhs))); \
}
#define GRID_BINOP_RIGHT(op, name) \
template <typename T1, typename T2, \
typename std::enable_if<!is_lattice<T1>::value && \
!is_lattice_expr<T1>::value, \
T1>::type * = nullptr, \
typename std::enable_if<is_lattice<T2>::value || \
is_lattice_expr<T2>::value, \
T2>::type * = nullptr> \
inline auto op(const T1 &lhs, const T2 &rhs) \
->decltype( \
LatticeBinaryExpression<GRID_BINOP(name), const T1 &, const T2 &>( \
std::make_pair(GRID_BINOP(name)(), \
std::forward_as_tuple(lhs, rhs)))) { \
return LatticeBinaryExpression<GRID_BINOP(name), const T1 &, const T2 &>( \
std::make_pair(GRID_BINOP(name)(), std::forward_as_tuple(lhs, rhs))); \
}
#define GRID_DEF_BINOP(op, name) \
GRID_BINOP_LEFT(op, name); \
GRID_BINOP_RIGHT(op, name);
#define GRID_DEF_TRINOP(op, name) \
template <typename T1, typename T2, typename T3> \
inline auto op(const T1 &pred, const T2 &lhs, const T3 &rhs) \
->decltype( \
LatticeTrinaryExpression<GRID_TRINOP(name), const T1 &, const T2 &, \
const T3 &>(std::make_pair( \
GRID_TRINOP(name)(), std::forward_as_tuple(pred, lhs, rhs)))) { \
return LatticeTrinaryExpression<GRID_TRINOP(name), const T1 &, const T2 &, \
const T3 &>(std::make_pair( \
GRID_TRINOP(name)(), std::forward_as_tuple(pred, lhs, rhs))); \
}
////////////////////////
// Operator definitions
////////////////////////
GRID_DEF_UNOP(operator-, UnarySub);
GRID_DEF_UNOP(Not, UnaryNot);
GRID_DEF_UNOP(operator!, UnaryNot);
GRID_DEF_UNOP(adj, UnaryAdj);
GRID_DEF_UNOP(conjugate, UnaryConj);
GRID_DEF_UNOP(trace, UnaryTrace);
GRID_DEF_UNOP(transpose, UnaryTranspose);
GRID_DEF_UNOP(Ta, UnaryTa);
GRID_DEF_UNOP(ProjectOnGroup, UnaryProjectOnGroup);
GRID_DEF_UNOP(real, UnaryReal);
GRID_DEF_UNOP(imag, UnaryImag);
GRID_DEF_UNOP(toReal, UnaryToReal);
GRID_DEF_UNOP(toComplex, UnaryToComplex);
GRID_DEF_UNOP(timesI, UnaryTimesI);
GRID_DEF_UNOP(timesMinusI, UnaryTimesMinusI);
GRID_DEF_UNOP(abs, UnaryAbs); // abs overloaded in cmath C++98; DON'T do the
// abs-fabs-dabs-labs thing
GRID_DEF_UNOP(sqrt, UnarySqrt);
GRID_DEF_UNOP(rsqrt, UnaryRsqrt);
GRID_DEF_UNOP(sin, UnarySin);
GRID_DEF_UNOP(cos, UnaryCos);
GRID_DEF_UNOP(asin, UnaryAsin);
GRID_DEF_UNOP(acos, UnaryAcos);
GRID_DEF_UNOP(log, UnaryLog);
GRID_DEF_UNOP(exp, UnaryExp);
GRID_DEF_BINOP(operator+, BinaryAdd);
GRID_DEF_BINOP(operator-, BinarySub);
GRID_DEF_BINOP(operator*, BinaryMul);
GRID_DEF_BINOP(operator/, BinaryDiv);
GRID_DEF_BINOP(operator&, BinaryAnd);
GRID_DEF_BINOP(operator|, BinaryOr);
GRID_DEF_BINOP(operator&&, BinaryAndAnd);
GRID_DEF_BINOP(operator||, BinaryOrOr);
GRID_DEF_TRINOP(where, TrinaryWhere);
/////////////////////////////////////////////////////////////
// Closure convenience to force expression to evaluate
/////////////////////////////////////////////////////////////
template <class Op, class T1>
auto closure(const LatticeUnaryExpression<Op, T1> &expr)
-> Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second))))> {
Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second))))> ret(
expr);
return ret;
}
template <class Op, class T1, class T2>
auto closure(const LatticeBinaryExpression<Op, T1, T2> &expr)
-> Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second)),
eval(0, std::get<1>(expr.second))))> {
Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second)),
eval(0, std::get<1>(expr.second))))>
ret(expr);
return ret;
}
template <class Op, class T1, class T2, class T3>
auto closure(const LatticeTrinaryExpression<Op, T1, T2, T3> &expr)
-> Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second)),
eval(0, std::get<1>(expr.second)),
eval(0, std::get<2>(expr.second))))> {
Lattice<decltype(expr.first.func(eval(0, std::get<0>(expr.second)),
eval(0, std::get<1>(expr.second)),
eval(0, std::get<2>(expr.second))))>
ret(expr);
return ret;
}
#undef GRID_UNOP
#undef GRID_BINOP
#undef GRID_TRINOP
#undef GRID_DEF_UNOP
#undef GRID_DEF_BINOP
#undef GRID_DEF_TRINOP
}
#if 0
using namespace Grid;
int main(int argc,char **argv){
Lattice<double> v1(16);
Lattice<double> v2(16);
Lattice<double> v3(16);
BinaryAdd<double,double> tmp;
LatticeBinaryExpression<BinaryAdd<double,double>,Lattice<double> &,Lattice<double> &>
expr(std::make_pair(tmp,
std::forward_as_tuple(v1,v2)));
tmp.func(eval(0,v1),eval(0,v2));
auto var = v1+v2;
std::cout<<GridLogMessage<<typeid(var).name()<<std::endl;
v3=v1+v2;
v3=v1+v2+v1*v2;
};
void testit(Lattice<double> &v1,Lattice<double> &v2,Lattice<double> &v3)
{
v3=v1+v2+v1*v2;
}
#endif
#endif
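The expression-template machinery in Lattice_ET.h builds an unevaluated tree of operator functors (BinaryAdd and friends wrapped in LatticeBinaryExpression), and the whole tree is evaluated site by site with eval(ss, expr) only when it is assigned to a Lattice. A stripped-down standalone sketch of the same idea follows. MiniLattice, BinaryExpr, AddOp, MulOp and is_operand are hypothetical names, and there is no SIMD, threading, grid or checkerboard handling.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical miniature of the Lattice_ET.h pattern; not Grid's API.
template <class Op, class L, class R> struct BinaryExpr;

struct MiniLattice {
  std::vector<double> data;
  explicit MiniLattice(std::size_t n, double v = 0.0) : data(n, v) {}
  // Assigning an expression runs one fused loop over sites (cf. Lattice::operator=).
  template <class Op, class L, class R>
  MiniLattice &operator=(const BinaryExpr<Op, L, R> &e);
};

// Functors in the style of GridBinOpClass: a static func combining two site values.
struct AddOp { static double func(double a, double b) { return a + b; } };
struct MulOp { static double func(double a, double b) { return a * b; } };

// Unevaluated node holding references to its operands (cf. std::forward_as_tuple
// in GRID_BINOP_LEFT). It must not outlive the operands it refers to.
template <class Op, class L, class R> struct BinaryExpr {
  const L &lhs;
  const R &rhs;
};

// Leaf evaluation: a lattice yields its value at site ss.
inline double eval(std::size_t ss, const MiniLattice &lat) { return lat.data[ss]; }

// Node evaluation: recurse into both operands, then apply the functor.
template <class Op, class L, class R>
double eval(std::size_t ss, const BinaryExpr<Op, L, R> &e) {
  return Op::func(eval(ss, e.lhs), eval(ss, e.rhs));
}

// Only lattices and expressions may select the overloaded operators below.
template <class T> struct is_operand : std::false_type {};
template <> struct is_operand<MiniLattice> : std::true_type {};
template <class Op, class L, class R>
struct is_operand<BinaryExpr<Op, L, R>> : std::true_type {};

template <class L, class R,
          typename std::enable_if<is_operand<L>::value && is_operand<R>::value, int>::type = 0>
BinaryExpr<AddOp, L, R> operator+(const L &l, const R &r) { return {l, r}; }

template <class L, class R,
          typename std::enable_if<is_operand<L>::value && is_operand<R>::value, int>::type = 0>
BinaryExpr<MulOp, L, R> operator*(const L &l, const R &r) { return {l, r}; }

template <class Op, class L, class R>
MiniLattice &MiniLattice::operator=(const BinaryExpr<Op, L, R> &e) {
  for (std::size_t ss = 0; ss < data.size(); ++ss) data[ss] = eval(ss, e);
  return *this;
}
```

With this sketch, `v3 = v1 + v2 + v1 * v2;` builds the tree inside one full-expression and evaluates it in a single pass, with no intermediate lattices. Storing the expression in an `auto` variable for later use would leave dangling references to the temporary sub-expressions.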
@@ -0,0 +1,255 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_arith.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_ARITH_H
#define GRID_LATTICE_ARITH_H
namespace Grid {
//////////////////////////////////////////////////////////////////////////////////////////////////////
// lattice-lattice variants of mult, mac, sub, add: results are written straight into ret (no copy back)
//////////////////////////////////////////////////////////////////////////////////////////////////////
template<class obj1,class obj2,class obj3> strong_inline
void mult(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,rhs);
conformable(lhs,rhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
mult(&tmp,&lhs._odata[ss],&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
mult(&ret._odata[ss],&lhs._odata[ss],&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void mac(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,rhs);
conformable(lhs,rhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp = ret._odata[ss]; // mac accumulates (tmp += lhs*rhs), so seed tmp with ret
mac(&tmp,&lhs._odata[ss],&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
mac(&ret._odata[ss],&lhs._odata[ss],&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void sub(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,rhs);
conformable(lhs,rhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
sub(&tmp,&lhs._odata[ss],&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
sub(&ret._odata[ss],&lhs._odata[ss],&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void add(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,rhs);
conformable(lhs,rhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
add(&tmp,&lhs._odata[ss],&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
add(&ret._odata[ss],&lhs._odata[ss],&rhs._odata[ss]);
#endif
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////
// lattice-object variants of mult, mac, sub, add: results are written straight into ret (no copy back)
//////////////////////////////////////////////////////////////////////////////////////////////////////
template<class obj1,class obj2,class obj3> strong_inline
void mult(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const obj3 &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(lhs,ret);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
mult(&tmp,&lhs._odata[ss],&rhs);
vstream(ret._odata[ss],tmp);
#else
mult(&ret._odata[ss],&lhs._odata[ss],&rhs);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void mac(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const obj3 &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,lhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
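#ifdef STREAMING_STORES
obj1 tmp = ret._odata[ss]; // mac accumulates, so seed tmp with ret
mac(&tmp,&lhs._odata[ss],&rhs);
vstream(ret._odata[ss],tmp);
#else
mac(&ret._odata[ss],&lhs._odata[ss],&rhs);
#endif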
}
}
template<class obj1,class obj2,class obj3> strong_inline
void sub(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const obj3 &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(ret,lhs);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
sub(&tmp,&lhs._odata[ss],&rhs);
vstream(ret._odata[ss],tmp);
#else
sub(&ret._odata[ss],&lhs._odata[ss],&rhs);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void add(Lattice<obj1> &ret,const Lattice<obj2> &lhs,const obj3 &rhs){
ret.checkerboard = lhs.checkerboard;
conformable(lhs,ret);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
add(&tmp,&lhs._odata[ss],&rhs);
vstream(ret._odata[ss],tmp);
#else
add(&ret._odata[ss],&lhs._odata[ss],&rhs);
#endif
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////
// object-lattice variants of mult, mac, sub, add: results are written straight into ret (no copy back)
//////////////////////////////////////////////////////////////////////////////////////////////////////
template<class obj1,class obj2,class obj3> strong_inline
void mult(Lattice<obj1> &ret,const obj2 &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
mult(&tmp,&lhs,&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
mult(&ret._odata[ss],&lhs,&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void mac(Lattice<obj1> &ret,const obj2 &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp = ret._odata[ss]; // mac accumulates, so seed tmp with ret
mac(&tmp,&lhs,&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
mac(&ret._odata[ss],&lhs,&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void sub(Lattice<obj1> &ret,const obj2 &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
sub(&tmp,&lhs,&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
sub(&ret._odata[ss],&lhs,&rhs._odata[ss]);
#endif
}
}
template<class obj1,class obj2,class obj3> strong_inline
void add(Lattice<obj1> &ret,const obj2 &lhs,const Lattice<obj3> &rhs){
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
#ifdef STREAMING_STORES
obj1 tmp;
add(&tmp,&lhs,&rhs._odata[ss]);
vstream(ret._odata[ss],tmp);
#else
add(&ret._odata[ss],&lhs,&rhs._odata[ss]);
#endif
}
}
template<class sobj,class vobj> strong_inline
void axpy(Lattice<vobj> &ret,sobj a,const Lattice<vobj> &x,const Lattice<vobj> &y){
ret.checkerboard = x.checkerboard;
conformable(ret,x);
conformable(x,y);
parallel_for(int ss=0;ss<x._grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = a*x._odata[ss]+y._odata[ss];
vstream(ret._odata[ss],tmp);
#else
ret._odata[ss]=a*x._odata[ss]+y._odata[ss];
#endif
}
}
template<class sobj,class vobj> strong_inline
void axpby(Lattice<vobj> &ret,sobj a,sobj b,const Lattice<vobj> &x,const Lattice<vobj> &y){
ret.checkerboard = x.checkerboard;
conformable(ret,x);
conformable(x,y);
parallel_for(int ss=0;ss<x._grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = a*x._odata[ss]+b*y._odata[ss];
vstream(ret._odata[ss],tmp);
#else
ret._odata[ss]=a*x._odata[ss]+b*y._odata[ss];
#endif
}
}
template<class sobj,class vobj> strong_inline
RealD axpy_norm(Lattice<vobj> &ret,sobj a,const Lattice<vobj> &x,const Lattice<vobj> &y){
return axpy_norm_fast(ret,a,x,y);
}
template<class sobj,class vobj> strong_inline
RealD axpby_norm(Lattice<vobj> &ret,sobj a,sobj b,const Lattice<vobj> &x,const Lattice<vobj> &y){
return axpby_norm_fast(ret,a,b,x,y);
}
}
#endif
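The routines in Lattice_arith.h share one loop shape: check conformability, compute each site's result into a temporary, then write it out once (with vstream when STREAMING_STORES is defined). For mac the temporary has to start from the existing value of ret, because mac accumulates ret += lhs*rhs. The scalar sketch below illustrates that pattern, with std::vector<double> standing in for a lattice of vectorised objects; mac_sites and axpy_sites are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical site-wise kernels mirroring the loop shape of Lattice_arith.h.

// mac: ret[ss] += lhs[ss] * rhs[ss]. The temporary is seeded from ret so the
// accumulated value survives the single store back.
inline void mac_sites(std::vector<double> &ret, const std::vector<double> &lhs,
                      const std::vector<double> &rhs) {
  assert(ret.size() == lhs.size() && lhs.size() == rhs.size()); // "conformable"
  for (std::size_t ss = 0; ss < lhs.size(); ++ss) {
    double tmp = ret[ss];     // seed with the existing value
    tmp += lhs[ss] * rhs[ss]; // accumulate
    ret[ss] = tmp;            // one write per site (stands in for vstream)
  }
}

// axpy: ret = a*x + y, computed site by site into a temporary.
inline void axpy_sites(std::vector<double> &ret, double a,
                       const std::vector<double> &x, const std::vector<double> &y) {
  assert(ret.size() == x.size() && x.size() == y.size());
  for (std::size_t ss = 0; ss < x.size(); ++ss) {
    double tmp = a * x[ss] + y[ss];
    ret[ss] = tmp;
  }
}
```

Seeding the temporary costs one extra read per site, but it is what makes the streamed write equivalent to the in-place `mac(&ret._odata[ss], ...)` path.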
@@ -0,0 +1,375 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_base.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_BASE_H
#define GRID_LATTICE_BASE_H
#define STREAMING_STORES
namespace Grid {
// TODO:
// mac,real,imag
// Functionality:
// -=,+=,*=,()
// add,+,sub,-,mult,mac,*
// adj,conjugate
// real,imag
// transpose,transposeIndex
// trace,traceIndex
// peekIndex
// innerProduct,outerProduct,
// localNorm2
// localInnerProduct
extern int GridCshiftPermuteMap[4][16];
////////////////////////////////////////////////
// Basic expressions used in Expression Template
////////////////////////////////////////////////
class LatticeBase
{
public:
virtual ~LatticeBase(void) = default;
GridBase *_grid;
};
class LatticeExpressionBase {};
template <typename Op, typename T1>
class LatticeUnaryExpression : public std::pair<Op,std::tuple<T1> > , public LatticeExpressionBase {
public:
LatticeUnaryExpression(const std::pair<Op,std::tuple<T1> > &arg): std::pair<Op,std::tuple<T1> >(arg) {};
};
template <typename Op, typename T1, typename T2>
class LatticeBinaryExpression : public std::pair<Op,std::tuple<T1,T2> > , public LatticeExpressionBase {
public:
LatticeBinaryExpression(const std::pair<Op,std::tuple<T1,T2> > &arg): std::pair<Op,std::tuple<T1,T2> >(arg) {};
};
template <typename Op, typename T1, typename T2, typename T3>
class LatticeTrinaryExpression :public std::pair<Op,std::tuple<T1,T2,T3> >, public LatticeExpressionBase {
public:
LatticeTrinaryExpression(const std::pair<Op,std::tuple<T1,T2,T3> > &arg): std::pair<Op,std::tuple<T1,T2,T3> >(arg) {};
};
void inline conformable(GridBase *lhs,GridBase *rhs)
{
assert(lhs == rhs);
}
template<class vobj>
class Lattice : public LatticeBase
{
public:
int checkerboard;
Vector<vobj> _odata;
// Plain integer begin()/end() give a computable loop range, so threaded loops need no iterator induction
int begin(void) { return 0;};
int end(void) { return _odata.size(); }
vobj & operator[](int i) { return _odata[i]; };
const vobj & operator[](int i) const { return _odata[i]; };
public:
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
typedef vobj vector_object;
////////////////////////////////////////////////////////////////////////////////
// Expression Template closure support
////////////////////////////////////////////////////////////////////////////////
template <typename Op, typename T1> strong_inline Lattice<vobj> & operator=(const LatticeUnaryExpression<Op,T1> &expr)
{
GridBase *egrid(nullptr);
GridFromExpression(egrid,expr);
assert(egrid!=nullptr);
conformable(_grid,egrid);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss]=eval(ss,expr);
#endif
}
return *this;
}
template <typename Op, typename T1,typename T2> strong_inline Lattice<vobj> & operator=(const LatticeBinaryExpression<Op,T1,T2> &expr)
{
GridBase *egrid(nullptr);
GridFromExpression(egrid,expr);
assert(egrid!=nullptr);
conformable(_grid,egrid);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss]=eval(ss,expr);
#endif
}
return *this;
}
template <typename Op, typename T1,typename T2,typename T3> strong_inline Lattice<vobj> & operator=(const LatticeTrinaryExpression<Op,T1,T2,T3> &expr)
{
GridBase *egrid(nullptr);
GridFromExpression(egrid,expr);
assert(egrid!=nullptr);
conformable(_grid,egrid);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss] = eval(ss,expr);
#endif
}
return *this;
}
//GridFromExpression is tricky to do
template<class Op,class T1>
Lattice(const LatticeUnaryExpression<Op,T1> & expr) {
_grid = nullptr;
GridFromExpression(_grid,expr);
assert(_grid!=nullptr);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
_odata.resize(_grid->oSites());
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss]=eval(ss,expr);
#endif
}
};
template<class Op,class T1, class T2>
Lattice(const LatticeBinaryExpression<Op,T1,T2> & expr) {
_grid = nullptr;
GridFromExpression(_grid,expr);
assert(_grid!=nullptr);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
_odata.resize(_grid->oSites());
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss]=eval(ss,expr);
#endif
}
};
template<class Op,class T1, class T2, class T3>
Lattice(const LatticeTrinaryExpression<Op,T1,T2,T3> & expr) {
_grid = nullptr;
GridFromExpression(_grid,expr);
assert(_grid!=nullptr);
int cb=-1;
CBFromExpression(cb,expr);
assert( (cb==Odd) || (cb==Even));
checkerboard=cb;
_odata.resize(_grid->oSites());
parallel_for(int ss=0;ss<_grid->oSites();ss++){
#ifdef STREAMING_STORES
vobj tmp = eval(ss,expr);
vstream(_odata[ss] ,tmp);
#else
_odata[ss]=eval(ss,expr);
#endif
}
};
//////////////////////////////////////////////////////////////////
// Constructor requires "grid" passed.
// what about a default grid?
//////////////////////////////////////////////////////////////////
Lattice(GridBase *grid) : _odata(grid->oSites()) {
_grid = grid;
// _odata.reserve(_grid->oSites());
// _odata.resize(_grid->oSites());
// std::cout << "Constructing lattice object with Grid pointer "<<_grid<<std::endl;
assert((((uint64_t)&_odata[0])&0xF) ==0);
checkerboard=0;
}
Lattice(const Lattice& r){ // copy constructor
_grid = r._grid;
checkerboard = r.checkerboard;
_odata.resize(_grid->oSites());// essential
parallel_for(int ss=0;ss<_grid->oSites();ss++){
_odata[ss]=r._odata[ss];
}
}
Lattice(Lattice&& r){ // move constructor
_grid = r._grid;
checkerboard = r.checkerboard;
_odata=std::move(r._odata);
}
inline Lattice<vobj> & operator = (Lattice<vobj> && r)
{
_grid = r._grid;
checkerboard = r.checkerboard;
_odata =std::move(r._odata);
return *this;
}
inline Lattice<vobj> & operator = (const Lattice<vobj> & r){
_grid = r._grid;
checkerboard = r.checkerboard;
_odata.resize(_grid->oSites());// essential
parallel_for(int ss=0;ss<_grid->oSites();ss++){
_odata[ss]=r._odata[ss];
}
return *this;
}
template<class robj> strong_inline Lattice<vobj> & operator = (const Lattice<robj> & r){
this->checkerboard = r.checkerboard;
conformable(*this,r);
parallel_for(int ss=0;ss<_grid->oSites();ss++){
this->_odata[ss]=r._odata[ss];
}
return *this;
}
virtual ~Lattice(void) = default;
void reset(GridBase* grid) {
if (_grid != grid) {
_grid = grid;
_odata.resize(grid->oSites());
checkerboard = 0;
}
}
template<class sobj> strong_inline Lattice<vobj> & operator = (const sobj & r){
parallel_for(int ss=0;ss<_grid->oSites();ss++){
this->_odata[ss]=r;
}
return *this;
}
// *=,+=,-= operators inherit their behaviour from the corresponding *, +, - operations
template<class T> strong_inline Lattice<vobj> &operator *=(const T &r) {
*this = (*this)*r;
return *this;
}
template<class T> strong_inline Lattice<vobj> &operator -=(const T &r) {
*this = (*this)-r;
return *this;
}
template<class T> strong_inline Lattice<vobj> &operator +=(const T &r) {
*this = (*this)+r;
return *this;
}
}; // class Lattice
template<class vobj> std::ostream& operator<< (std::ostream& stream, const Lattice<vobj> &o){
std::vector<int> gcoor;
typedef typename vobj::scalar_object sobj;
sobj ss;
for(int g=0;g<o._grid->_gsites;g++){
o._grid->GlobalIndexToGlobalCoor(g,gcoor);
peekSite(ss,o,gcoor);
stream<<"[";
for(int d=0;d<gcoor.size();d++){
stream<<gcoor[d];
if(d!=gcoor.size()-1) stream<<",";
}
stream<<"]\t";
stream<<ss<<std::endl;
}
return stream;
}
}
#include "Lattice_conformable.h"
#define GRID_LATTICE_EXPRESSION_TEMPLATES
#ifdef GRID_LATTICE_EXPRESSION_TEMPLATES
#include "Lattice_ET.h"
#else
#include "Lattice_overload.h"
#endif
#include "Lattice_arith.h"
#include "Lattice_trace.h"
#include "Lattice_transpose.h"
#include "Lattice_local.h"
#include "Lattice_reduction.h"
#include "Lattice_peekpoke.h"
#include "Lattice_reality.h"
#include "Lattice_comparison_utils.h"
#include "Lattice_comparison.h"
#include "Lattice_coordinate.h"
#include "Lattice_where.h"
#include "Lattice_rng.h"
#include "Lattice_unary.h"
#include "Lattice_transfer.h"
#endif
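Before the site loop, each expression assignment and constructor in Lattice_base.h asks GridFromExpression and CBFromExpression for a grid pointer and a checkerboard, then asserts the grid is non-null and the parity is Odd or Even. Those helpers are not defined in this excerpt, so the sketch below is an assumption about their logic: walk the expression tree, require all lattice operands to agree, and let scalar operands contribute nothing. MiniGrid, Leaf, Node and collect are hypothetical names; the -1 sentinel mirrors `int cb=-1;` in the source.

```cpp
#include <cassert>

// Hypothetical miniature of the checks made before evaluating an expression.
struct MiniGrid {};

// A lattice operand: the grid it lives on and its checkerboard parity.
struct Leaf {
  const MiniGrid *grid;
  int cb;
};

// An expression node holding references to its two operands.
template <class L, class R> struct Node {
  const L &lhs;
  const R &rhs;
};

// Lattice leaf: every lattice must share one grid and one parity.
inline void collect(const MiniGrid *&grid, int &cb, const Leaf &l) {
  if (grid != nullptr) assert(grid == l.grid); // cf. conformable(): same grid object
  if (cb >= 0) assert(cb == l.cb);             // operands must agree on parity
  grid = l.grid;
  cb = l.cb;
}

// Scalar leaf: carries no grid or checkerboard, so it contributes nothing.
inline void collect(const MiniGrid *&, int &, double) {}

// Interior node: visit both operands.
template <class L, class R>
void collect(const MiniGrid *&grid, int &cb, const Node<L, R> &n) {
  collect(grid, cb, n.lhs);
  collect(grid, cb, n.rhs);
}
```

After the walk, the caller can assert `grid != nullptr` (at least one lattice operand was present) and that `cb` holds a valid parity, matching the two asserts in the Lattice assignment operators.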
@@ -0,0 +1,169 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_comparison.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_COMPARISON_H
#define GRID_LATTICE_COMPARISON_H
namespace Grid {
//////////////////////////////////////////////////////////////////////////
// relational operators
//
// Support <,>,<=,>=,==,!=
//
//Query supporting bitwise &, |, ^, !
//Query supporting logical &&, ||,
//////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////
// compare lattice to lattice
//////////////////////////////////////////////////////////////////////////
template<class vfunctor,class lobj,class robj>
inline Lattice<vInteger> LLComparison(vfunctor op,const Lattice<lobj> &lhs,const Lattice<robj> &rhs)
{
Lattice<vInteger> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
ret._odata[ss]=op(lhs._odata[ss],rhs._odata[ss]);
}
return ret;
}
//////////////////////////////////////////////////////////////////////////
// compare lattice to scalar
//////////////////////////////////////////////////////////////////////////
template<class vfunctor,class lobj,class robj>
inline Lattice<vInteger> LSComparison(vfunctor op,const Lattice<lobj> &lhs,const robj &rhs)
{
Lattice<vInteger> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites(); ss++){
ret._odata[ss]=op(lhs._odata[ss],rhs);
}
return ret;
}
//////////////////////////////////////////////////////////////////////////
// compare scalar to lattice
//////////////////////////////////////////////////////////////////////////
template<class vfunctor,class lobj,class robj>
inline Lattice<vInteger> SLComparison(vfunctor op,const lobj &lhs,const Lattice<robj> &rhs)
{
Lattice<vInteger> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
ret._odata[ss]=op(lhs,rhs._odata[ss]);
}
return ret;
}
//////////////////////////////////////////////////////////////////////////
// Map to functors
//////////////////////////////////////////////////////////////////////////
// Less than
template<class lobj,class robj>
inline Lattice<vInteger> operator < (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(vlt<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator < (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(vlt<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator < (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(vlt<lobj,robj>(),lhs,rhs);
}
// Less than equal
template<class lobj,class robj>
inline Lattice<vInteger> operator <= (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(vle<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator <= (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(vle<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator <= (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(vle<lobj,robj>(),lhs,rhs);
}
// Greater than
template<class lobj,class robj>
inline Lattice<vInteger> operator > (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(vgt<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator > (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(vgt<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator > (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(vgt<lobj,robj>(),lhs,rhs);
}
// Greater than equal
template<class lobj,class robj>
inline Lattice<vInteger> operator >= (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(vge<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator >= (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(vge<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator >= (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(vge<lobj,robj>(),lhs,rhs);
}
// equal
template<class lobj,class robj>
inline Lattice<vInteger> operator == (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(veq<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator == (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(veq<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator == (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(veq<lobj,robj>(),lhs,rhs);
}
// not equal
template<class lobj,class robj>
inline Lattice<vInteger> operator != (const Lattice<lobj> & lhs, const Lattice<robj> & rhs) {
return LLComparison(vne<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator != (const Lattice<lobj> & lhs, const robj & rhs) {
return LSComparison(vne<lobj,robj>(),lhs,rhs);
}
template<class lobj,class robj>
inline Lattice<vInteger> operator != (const lobj & lhs, const Lattice<robj> & rhs) {
return SLComparison(vne<lobj,robj>(),lhs,rhs);
}
}
#endif
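LLComparison, LSComparison and SLComparison differ only in which operand is indexed per site: lattice-lattice indexes both, lattice-scalar indexes the left, and scalar-lattice must pass the scalar first and index the lattice on the right. A scalar sketch of the three shapes follows, with std::vector<double> standing in for Lattice<lobj>, std::vector<int> for the Lattice<vInteger> mask, and standard library functors in place of vlt, vge, etc. The names ll_compare, ls_compare and sl_compare are hypothetical, and the 0/1 mask encoding is specific to this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar analogue of LLComparison / LSComparison / SLComparison.
using Field = std::vector<double>;
using Mask = std::vector<int>;

// Lattice vs lattice: index both operands at each site.
template <class Op>
Mask ll_compare(Op op, const Field &lhs, const Field &rhs) {
  assert(lhs.size() == rhs.size());
  Mask ret(rhs.size());
  for (std::size_t ss = 0; ss < rhs.size(); ++ss) ret[ss] = op(lhs[ss], rhs[ss]);
  return ret;
}

// Lattice vs scalar: index the left operand, reuse the scalar.
template <class Op>
Mask ls_compare(Op op, const Field &lhs, double rhs) {
  Mask ret(lhs.size());
  for (std::size_t ss = 0; ss < lhs.size(); ++ss) ret[ss] = op(lhs[ss], rhs);
  return ret;
}

// Scalar vs lattice: the scalar stays first and the right operand is indexed.
template <class Op>
Mask sl_compare(Op op, double lhs, const Field &rhs) {
  Mask ret(rhs.size());
  for (std::size_t ss = 0; ss < rhs.size(); ++ss) ret[ss] = op(lhs, rhs[ss]);
  return ret;
}
```

Keeping the scalar as the first argument in the scalar-lattice case preserves the meaning of asymmetric operators: `2 < x` and `x < 2` give different masks.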
@@ -0,0 +1,232 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_comparison_utils.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_COMPARISON_H
#define GRID_COMPARISON_H
namespace Grid {
/////////////////////////////////////////
// This implementation is a bit poor.
//
// Only relational operations (<, >, etc.) on scalar objects are supported,
// so any tensor structure can be stripped.
//
// Should this be guarded with an isGridTensor<> enable_if?
/////////////////////////////////////////
//
// Generic list of SIMD comparison functors (return vInteger)
//
template<class lobj,class robj> class veq {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) == (rhs);
}
};
template<class lobj,class robj> class vne {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) != (rhs);
}
};
template<class lobj,class robj> class vlt {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) < (rhs);
}
};
template<class lobj,class robj> class vle {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) <= (rhs);
}
};
template<class lobj,class robj> class vgt {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) > (rhs);
}
};
template<class lobj,class robj> class vge {
public:
vInteger operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) >= (rhs);
}
};
// Generic list of scalar comparison functors (return Integer)
template<class lobj,class robj> class seq {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) == (rhs);
}
};
template<class lobj,class robj> class sne {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) != (rhs);
}
};
template<class lobj,class robj> class slt {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) < (rhs);
}
};
template<class lobj,class robj> class sle {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) <= (rhs);
}
};
template<class lobj,class robj> class sgt {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) > (rhs);
}
};
template<class lobj,class robj> class sge {
public:
Integer operator()(const lobj &lhs, const robj &rhs)
{
return (lhs) >= (rhs);
}
};
//////////////////////////////////////////////////////////////////////////////////////////////////////
// Integer and real get extra relational functions.
//////////////////////////////////////////////////////////////////////////////////////////////////////
template<class sfunctor, class vsimd,IfNotComplex<vsimd> = 0>
inline vInteger Comparison(sfunctor sop,const vsimd & lhs, const vsimd & rhs)
{
typedef typename vsimd::scalar_type scalar;
std::vector<scalar> vlhs(vsimd::Nsimd()); // Use functors to reduce this to single implementation
std::vector<scalar> vrhs(vsimd::Nsimd());
std::vector<Integer> vpred(vsimd::Nsimd());
vInteger ret;
extract<vsimd,scalar>(lhs,vlhs);
extract<vsimd,scalar>(rhs,vrhs);
for(int s=0;s<vsimd::Nsimd();s++){
vpred[s] = sop(vlhs[s],vrhs[s]);
}
merge<vInteger,Integer>(ret,vpred);
return ret;
}
template<class sfunctor, class vsimd,IfNotComplex<vsimd> = 0>
inline vInteger Comparison(sfunctor sop,const vsimd & lhs, const typename vsimd::scalar_type & rhs)
{
typedef typename vsimd::scalar_type scalar;
std::vector<scalar> vlhs(vsimd::Nsimd()); // Use functors to reduce this to single implementation
std::vector<Integer> vpred(vsimd::Nsimd());
vInteger ret;
extract<vsimd,scalar>(lhs,vlhs);
for(int s=0;s<vsimd::Nsimd();s++){
vpred[s] = sop(vlhs[s],rhs);
}
merge<vInteger,Integer>(ret,vpred);
return ret;
}
template<class sfunctor, class vsimd,IfNotComplex<vsimd> = 0>
inline vInteger Comparison(sfunctor sop,const typename vsimd::scalar_type & lhs, const vsimd & rhs)
{
typedef typename vsimd::scalar_type scalar;
std::vector<scalar> vrhs(vsimd::Nsimd()); // Use functors to reduce this to single implementation
std::vector<Integer> vpred(vsimd::Nsimd());
vInteger ret;
extract<vsimd,scalar>(rhs,vrhs);
for(int s=0;s<vsimd::Nsimd();s++){
vpred[s] = sop(lhs,vrhs[s]);
}
merge<vInteger,Integer>(ret,vpred);
return ret;
}
#define DECLARE_RELATIONAL_EQ(op,functor) \
template<class vsimd,IfSimd<vsimd> = 0>\
inline vInteger operator op (const vsimd & lhs, const vsimd & rhs)\
{\
typedef typename vsimd::scalar_type scalar;\
return Comparison(functor<scalar,scalar>(),lhs,rhs);\
}\
template<class vsimd,IfSimd<vsimd> = 0>\
inline vInteger operator op (const vsimd & lhs, const typename vsimd::scalar_type & rhs) \
{\
typedef typename vsimd::scalar_type scalar;\
return Comparison(functor<scalar,scalar>(),lhs,rhs);\
}\
template<class vsimd,IfSimd<vsimd> = 0>\
inline vInteger operator op (const typename vsimd::scalar_type & lhs, const vsimd & rhs) \
{\
typedef typename vsimd::scalar_type scalar;\
return Comparison(functor<scalar,scalar>(),lhs,rhs);\
}\
template<class vsimd>\
inline vInteger operator op(const iScalar<vsimd> &lhs,const typename vsimd::scalar_type &rhs) \
{ \
return lhs._internal op rhs; \
} \
template<class vsimd>\
inline vInteger operator op(const typename vsimd::scalar_type &lhs,const iScalar<vsimd> &rhs) \
{ \
return lhs op rhs._internal; \
}
#define DECLARE_RELATIONAL(op,functor) \
DECLARE_RELATIONAL_EQ(op,functor) \
template<class vsimd>\
inline vInteger operator op(const iScalar<vsimd> &lhs,const iScalar<vsimd> &rhs)\
{ \
return lhs._internal op rhs._internal; \
}
DECLARE_RELATIONAL(<,slt);
DECLARE_RELATIONAL(<=,sle);
DECLARE_RELATIONAL(>,sgt);
DECLARE_RELATIONAL(>=,sge);
DECLARE_RELATIONAL_EQ(==,seq);
DECLARE_RELATIONAL(!=,sne);
#undef DECLARE_RELATIONAL
#undef DECLARE_RELATIONAL_EQ
}
#endif
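/*
Example (illustrative sketch, not part of Grid): the Comparison() helpers above extract every SIMD lane to a scalar, apply the scalar functor (slt, sle, ...) lane by lane, and merge the per-lane Integer results into a vInteger mask. The sketch below shows the same lane-wise pattern, assuming a hypothetical 4-lane type Vec4 (a std::array) in place of Grid's vector types, and direct lane indexing in place of extract()/merge().

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-ins for a 4-lane SIMD real vector and its integer mask.
using Vec4  = std::array<double, 4>;
using Mask4 = std::array<int, 4>;

// Apply a scalar predicate lane by lane, in the spirit of Grid's Comparison():
// each lane is compared independently and yields 1 (true) or 0 (false).
template <class Pred>
Mask4 laneCompare(Pred pred, const Vec4 &lhs, const Vec4 &rhs) {
  Mask4 ret{};
  for (std::size_t s = 0; s < lhs.size(); s++) {
    ret[s] = pred(lhs[s], rhs[s]) ? 1 : 0;
  }
  return ret;
}

// Vector-versus-scalar overload: the scalar is compared against every lane.
template <class Pred>
Mask4 laneCompare(Pred pred, const Vec4 &lhs, double rhs) {
  Mask4 ret{};
  for (std::size_t s = 0; s < lhs.size(); s++) {
    ret[s] = pred(lhs[s], rhs) ? 1 : 0;
  }
  return ret;
}
```

laneCompare(std::less<double>(), a, b) yields 1 in each lane where a < b and 0 elsewhere; the scalar overload compares rhs against every lane, like the vsimd-versus-scalar_type operators.
*/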
@@ -0,0 +1,40 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_conformable.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_CONFORMABLE_H
#define GRID_LATTICE_CONFORMABLE_H
namespace Grid {
template<class obj1,class obj2> void conformable(const Lattice<obj1> &lhs,const Lattice<obj2> &rhs)
{
assert(lhs._grid == rhs._grid);
assert(lhs.checkerboard == rhs.checkerboard);
}
}
#endif
@@ -0,0 +1,56 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_coordinate.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_COORDINATE_H
#define GRID_LATTICE_COORDINATE_H
namespace Grid {
template<class iobj> inline void LatticeCoordinate(Lattice<iobj> &l,int mu)
{
typedef typename iobj::scalar_type scalar_type;
typedef typename iobj::vector_type vector_type;
GridBase *grid = l._grid;
int Nsimd = grid->iSites();
std::vector<int> gcoor;
std::vector<scalar_type> mergebuf(Nsimd);
vector_type vI;
for(int o=0;o<grid->oSites();o++){
for(int i=0;i<grid->iSites();i++){
grid->RankIndexToGlobalCoor(grid->ThisRank(),o,i,gcoor);
mergebuf[i]=(Integer)gcoor[mu];
}
merge<vector_type,scalar_type>(vI,mergebuf);
l._odata[o]=vI;
}
};
}
#endif
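/*
Example (illustrative sketch, not part of Grid): LatticeCoordinate() asks RankIndexToGlobalCoor() for the global coordinate of every (outer site o, SIMD lane i) pair and merges the mu component into one vector. The mapping below is an assumption for a single rank in 1D, chosen to match the `ldx = rt + icoor[orthogdim]*rd` indexing used later in sliceSum(): lane i owns a contiguous block of rd = L/Nsimd sites, so x = o + i*rd. The function name latticeCoordinate1D is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D layout on a single rank: L sites split over Nsimd lanes,
// each lane owning a contiguous block of rd = L / Nsimd sites.
// Returns coor[o][i], the global coordinate held by lane i of outer site o.
std::vector<std::vector<int>> latticeCoordinate1D(int L, int Nsimd) {
  assert(L % Nsimd == 0);
  const int rd = L / Nsimd;  // number of outer (reduced) sites
  std::vector<std::vector<int>> coor(rd, std::vector<int>(Nsimd));
  for (int o = 0; o < rd; o++) {
    for (int i = 0; i < Nsimd; i++) {
      coor[o][i] = o + i * rd;  // lane i is offset by one block of rd sites
    }
  }
  return coor;
}
```

For L = 8 and Nsimd = 2, outer site 0 holds coordinates {0, 4} and outer site 3 holds {3, 7}.
*/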
@@ -0,0 +1,75 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_local.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_LOCALREDUCTION_H
#define GRID_LATTICE_LOCALREDUCTION_H
///////////////////////////////////////////////
// localInner, localNorm, outerProduct
///////////////////////////////////////////////
namespace Grid {
/////////////////////////////////////////////////////
// Non site, reduced locally reduced routines
/////////////////////////////////////////////////////
// localNorm2,
template<class vobj>
inline auto localNorm2 (const Lattice<vobj> &rhs)-> Lattice<typename vobj::tensor_reduced>
{
Lattice<typename vobj::tensor_reduced> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
ret._odata[ss]=innerProduct(rhs._odata[ss],rhs._odata[ss]);
}
return ret;
}
// localInnerProduct
template<class vobj>
inline auto localInnerProduct (const Lattice<vobj> &lhs,const Lattice<vobj> &rhs) -> Lattice<typename vobj::tensor_reduced>
{
Lattice<typename vobj::tensor_reduced> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
ret._odata[ss]=innerProduct(lhs._odata[ss],rhs._odata[ss]);
}
return ret;
}
// outerProduct Scalar x Scalar -> Scalar
// Vector x Vector -> Matrix
template<class ll,class rr>
inline auto outerProduct (const Lattice<ll> &lhs,const Lattice<rr> &rhs) -> Lattice<decltype(outerProduct(lhs._odata[0],rhs._odata[0]))>
{
Lattice<decltype(outerProduct(lhs._odata[0],rhs._odata[0]))> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
ret._odata[ss]=outerProduct(lhs._odata[ss],rhs._odata[ss]);
}
return ret;
}
}
#endif
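/*
Example (illustrative sketch, not part of Grid): localInnerProduct() and localNorm2() reduce only the internal (tensor) indices at each site and return one value per site, with no sum over the lattice. The sketch assumes the conventions innerProduct(a,b) = sum_i conj(a_i) b_i and outerProduct(a,b)_ij = a_i conj(b_j), uses hypothetical 3-component complex site vectors, and ignores SIMD.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Vec3 = std::array<Complex, 3>;                 // hypothetical 3-component site vector
using Mat3 = std::array<std::array<Complex, 3>, 3>;  // hypothetical 3x3 site matrix

// Site-level inner product, assumed convention: sum_i conj(a_i) * b_i.
Complex siteInner(const Vec3 &a, const Vec3 &b) {
  Complex s = 0.0;
  for (int i = 0; i < 3; i++) s += std::conj(a[i]) * b[i];
  return s;
}

// Site-level outer product, assumed convention: M_ij = a_i * conj(b_j).
Mat3 siteOuter(const Vec3 &a, const Vec3 &b) {
  Mat3 m{};
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++) m[i][j] = a[i] * std::conj(b[j]);
  return m;
}

// localInnerProduct analogue: one reduced value per site, no sum over sites.
std::vector<Complex> localInner(const std::vector<Vec3> &lhs, const std::vector<Vec3> &rhs) {
  assert(lhs.size() == rhs.size());
  std::vector<Complex> ret(lhs.size());
  for (std::size_t ss = 0; ss < lhs.size(); ss++) ret[ss] = siteInner(lhs[ss], rhs[ss]);
  return ret;
}
```

localNorm2 corresponds to localInner(f, f): each entry is the real, non-negative squared norm of one site.
*/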
@@ -0,0 +1,138 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_overload.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_OVERLOAD_H
#define GRID_LATTICE_OVERLOAD_H
namespace Grid {
//////////////////////////////////////////////////////////////////////////////////////////////////////
// unary negation
//////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vobj>
inline Lattice<vobj> operator -(const Lattice<vobj> &r)
{
Lattice<vobj> ret(r._grid);
parallel_for(int ss=0;ss<r._grid->oSites();ss++){
vstream(ret._odata[ss], -r._odata[ss]);
}
return ret;
}
/////////////////////////////////////////////////////////////////////////////////////
// Lattice BinOp Lattice,
//NB mult performs conformable check. Do not reapply here for performance.
/////////////////////////////////////////////////////////////////////////////////////
template<class left,class right>
inline auto operator * (const Lattice<left> &lhs,const Lattice<right> &rhs)-> Lattice<decltype(lhs._odata[0]*rhs._odata[0])>
{
Lattice<decltype(lhs._odata[0]*rhs._odata[0])> ret(rhs._grid);
mult(ret,lhs,rhs);
return ret;
}
template<class left,class right>
inline auto operator + (const Lattice<left> &lhs,const Lattice<right> &rhs)-> Lattice<decltype(lhs._odata[0]+rhs._odata[0])>
{
Lattice<decltype(lhs._odata[0]+rhs._odata[0])> ret(rhs._grid);
add(ret,lhs,rhs);
return ret;
}
template<class left,class right>
inline auto operator - (const Lattice<left> &lhs,const Lattice<right> &rhs)-> Lattice<decltype(lhs._odata[0]-rhs._odata[0])>
{
Lattice<decltype(lhs._odata[0]-rhs._odata[0])> ret(rhs._grid);
sub(ret,lhs,rhs);
return ret;
}
// Scalar BinOp Lattice; generate return type
template<class left,class right>
inline auto operator * (const left &lhs,const Lattice<right> &rhs) -> Lattice<decltype(lhs*rhs._odata[0])>
{
Lattice<decltype(lhs*rhs._odata[0])> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
decltype(lhs*rhs._odata[0]) tmp=lhs*rhs._odata[ss];
vstream(ret._odata[ss],tmp);
// ret._odata[ss]=lhs*rhs._odata[ss];
}
return ret;
}
template<class left,class right>
inline auto operator + (const left &lhs,const Lattice<right> &rhs) -> Lattice<decltype(lhs+rhs._odata[0])>
{
Lattice<decltype(lhs+rhs._odata[0])> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
decltype(lhs+rhs._odata[0]) tmp =lhs+rhs._odata[ss];
vstream(ret._odata[ss],tmp);
// ret._odata[ss]=lhs+rhs._odata[ss];
}
return ret;
}
template<class left,class right>
inline auto operator - (const left &lhs,const Lattice<right> &rhs) -> Lattice<decltype(lhs-rhs._odata[0])>
{
Lattice<decltype(lhs-rhs._odata[0])> ret(rhs._grid);
parallel_for(int ss=0;ss<rhs._grid->oSites(); ss++){
decltype(lhs-rhs._odata[0]) tmp=lhs-rhs._odata[ss];
vstream(ret._odata[ss],tmp);
}
return ret;
}
template<class left,class right>
inline auto operator * (const Lattice<left> &lhs,const right &rhs) -> Lattice<decltype(lhs._odata[0]*rhs)>
{
Lattice<decltype(lhs._odata[0]*rhs)> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites(); ss++){
decltype(lhs._odata[0]*rhs) tmp =lhs._odata[ss]*rhs;
vstream(ret._odata[ss],tmp);
// ret._odata[ss]=lhs._odata[ss]*rhs;
}
return ret;
}
template<class left,class right>
inline auto operator + (const Lattice<left> &lhs,const right &rhs) -> Lattice<decltype(lhs._odata[0]+rhs)>
{
Lattice<decltype(lhs._odata[0]+rhs)> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites(); ss++){
decltype(lhs._odata[0]+rhs) tmp=lhs._odata[ss]+rhs;
vstream(ret._odata[ss],tmp);
// ret._odata[ss]=lhs._odata[ss]+rhs;
}
return ret;
}
template<class left,class right>
inline auto operator - (const Lattice<left> &lhs,const right &rhs) -> Lattice<decltype(lhs._odata[0]-rhs)>
{
Lattice<decltype(lhs._odata[0]-rhs)> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites(); ss++){
decltype(lhs._odata[0]-rhs) tmp=lhs._odata[ss]-rhs;
vstream(ret._odata[ss],tmp);
// ret._odata[ss]=lhs._odata[ss]-rhs;
}
return ret;
}
}
#endif
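/*
Example (illustrative sketch, not part of Grid): the scalar/lattice operators above deduce the result's site type with decltype on a single site expression, then apply the same expression at every site. The sketch uses a hypothetical Field<T> wrapper around std::vector (no SIMD, no vstream) to show that pattern for scalar + field and field - scalar, taking the site count from the Field operand because the scalar operand has none.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal "lattice": one value per site, no SIMD, no grid object.
template <class T>
struct Field {
  std::vector<T> data;
  explicit Field(std::size_t n) : data(n) {}
};

// Scalar + Field: the element type of the result is deduced from lhs + rhs.data[0],
// mirroring the decltype-based return types in Lattice_overload.h.
template <class L, class R>
auto operator+(const L &lhs, const Field<R> &rhs) -> Field<decltype(lhs + rhs.data[0])> {
  Field<decltype(lhs + rhs.data[0])> ret(rhs.data.size());
  for (std::size_t ss = 0; ss < rhs.data.size(); ss++) ret.data[ss] = lhs + rhs.data[ss];
  return ret;
}

// Field - scalar: the loop bound comes from the Field operand (lhs), since the
// scalar operand carries no site count.
template <class L, class R>
auto operator-(const Field<L> &lhs, const R &rhs) -> Field<decltype(lhs.data[0] - rhs)> {
  Field<decltype(lhs.data[0] - rhs)> ret(lhs.data.size());
  for (std::size_t ss = 0; ss < lhs.data.size(); ss++) ret.data[ss] = lhs.data[ss] - rhs;
  return ret;
}
```

With f holding {1, 2, 3}, 2.5 + f is a Field<double> holding {3.5, 4.5, 5.5}, and f - 1 is a Field<int> holding {0, 1, 2}.
*/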
@@ -0,0 +1,205 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_peekpoke.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Peter Boyle <peterboyle@Peters-MacBook-Pro-2.local>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_PEEK_H
#define GRID_LATTICE_PEEK_H
///////////////////////////////////////////////
// Peeking and poking around
///////////////////////////////////////////////
namespace Grid {
////////////////////////////////////////////////////////////////////////////////////////////////////
// Peek internal indices of a Lattice object
////////////////////////////////////////////////////////////////////////////////////////////////////
template<int Index,class vobj>
auto PeekIndex(const Lattice<vobj> &lhs,int i) -> Lattice<decltype(peekIndex<Index>(lhs._odata[0],i))>
{
Lattice<decltype(peekIndex<Index>(lhs._odata[0],i))> ret(lhs._grid);
ret.checkerboard=lhs.checkerboard;
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = peekIndex<Index>(lhs._odata[ss],i);
}
return ret;
};
template<int Index,class vobj>
auto PeekIndex(const Lattice<vobj> &lhs,int i,int j) -> Lattice<decltype(peekIndex<Index>(lhs._odata[0],i,j))>
{
Lattice<decltype(peekIndex<Index>(lhs._odata[0],i,j))> ret(lhs._grid);
ret.checkerboard=lhs.checkerboard;
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = peekIndex<Index>(lhs._odata[ss],i,j);
}
return ret;
};
////////////////////////////////////////////////////////////////////////////////////////////////////
// Poke internal indices of a Lattice object
////////////////////////////////////////////////////////////////////////////////////////////////////
template<int Index,class vobj>
void PokeIndex(Lattice<vobj> &lhs,const Lattice<decltype(peekIndex<Index>(lhs._odata[0],0))> & rhs,int i)
{
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
pokeIndex<Index>(lhs._odata[ss],rhs._odata[ss],i);
}
}
template<int Index,class vobj>
void PokeIndex(Lattice<vobj> &lhs,const Lattice<decltype(peekIndex<Index>(lhs._odata[0],0,0))> & rhs,int i,int j)
{
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
pokeIndex<Index>(lhs._odata[ss],rhs._odata[ss],i,j);
}
}
//////////////////////////////////////////////////////
// Poke a scalar object into the SIMD array
//////////////////////////////////////////////////////
template<class vobj,class sobj>
void pokeSite(const sobj &s,Lattice<vobj> &l,const std::vector<int> &site){
GridBase *grid=l._grid;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nsimd = grid->Nsimd();
assert( l.checkerboard== l._grid->CheckerBoard(site));
assert( sizeof(sobj)*Nsimd == sizeof(vobj));
int rank,odx,idx;
// Optional to broadcast from node 0.
grid->GlobalCoorToRankIndex(rank,odx,idx,site);
grid->Broadcast(grid->BossRank(),s);
std::vector<sobj> buf(Nsimd);
// The extract-modify-merge cycle is the easiest way, and this is not performance critical
if ( rank == grid->ThisRank() ) {
extract(l._odata[odx],buf);
buf[idx] = s;
merge(l._odata[odx],buf);
}
return;
};
//////////////////////////////////////////////////////////
// Peek a scalar object from the SIMD array
//////////////////////////////////////////////////////////
template<class vobj,class sobj>
void peekSite(sobj &s,const Lattice<vobj> &l,const std::vector<int> &site){
GridBase *grid=l._grid;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nsimd = grid->Nsimd();
assert( l.checkerboard == l._grid->CheckerBoard(site));
int rank,odx,idx;
grid->GlobalCoorToRankIndex(rank,odx,idx,site);
std::vector<sobj> buf(Nsimd);
extract(l._odata[odx],buf);
s = buf[idx];
grid->Broadcast(rank,s);
return;
};
//////////////////////////////////////////////////////////
// Peek a scalar object from the local SIMD array (no communication)
//////////////////////////////////////////////////////////
template<class vobj,class sobj>
void peekLocalSite(sobj &s,const Lattice<vobj> &l,std::vector<int> &site){
GridBase *grid = l._grid;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nsimd = grid->Nsimd();
assert( l.checkerboard== l._grid->CheckerBoard(site));
assert( sizeof(sobj)*Nsimd == sizeof(vobj));
static const int words=sizeof(vobj)/sizeof(vector_type);
int odx,idx;
idx= grid->iIndex(site);
odx= grid->oIndex(site);
scalar_type * vp = (scalar_type *)&l._odata[odx];
scalar_type * pt = (scalar_type *)&s;
for(int w=0;w<words;w++){
pt[w] = vp[idx+w*Nsimd];
}
return;
};
template<class vobj,class sobj>
void pokeLocalSite(const sobj &s,Lattice<vobj> &l,std::vector<int> &site){
GridBase *grid=l._grid;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nsimd = grid->Nsimd();
assert( l.checkerboard== l._grid->CheckerBoard(site));
assert( sizeof(sobj)*Nsimd == sizeof(vobj));
static const int words=sizeof(vobj)/sizeof(vector_type);
int odx,idx;
idx= grid->iIndex(site);
odx= grid->oIndex(site);
scalar_type * vp = (scalar_type *)&l._odata[odx];
scalar_type * pt = (scalar_type *)&s;
for(int w=0;w<words;w++){
vp[idx+w*Nsimd] = pt[w];
}
return;
};
}
#endif
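/*
Example (illustrative sketch, not part of Grid): peekLocalSite() and pokeLocalSite() treat one outer-site object as an array of scalars in which word w of SIMD lane idx sits at offset idx + w*Nsimd. The hypothetical InterleavedSite struct below reproduces that indexing for real scalars on a single site; peekSite()/pokeSite() additionally locate the owning rank and broadcast, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SIMD-interleaved storage for one outer site: an object made of
// `words` scalars, replicated over Nsimd lanes. Word w of lane idx lives at
// offset idx + w * Nsimd, as in peekLocalSite/pokeLocalSite.
struct InterleavedSite {
  int words, Nsimd;
  std::vector<double> data;
  InterleavedSite(int words_, int Nsimd_)
      : words(words_), Nsimd(Nsimd_), data(static_cast<std::size_t>(words_) * Nsimd_, 0.0) {}

  // Copy the scalar object held by lane idx out of the vector (peek).
  std::vector<double> peek(int idx) const {
    std::vector<double> s(words);
    for (int w = 0; w < words; w++) s[w] = data[idx + w * Nsimd];
    return s;
  }

  // Write a scalar object into lane idx, leaving the other lanes untouched (poke).
  void poke(int idx, const std::vector<double> &s) {
    assert(static_cast<int>(s.size()) == words);
    for (int w = 0; w < words; w++) data[idx + w * Nsimd] = s[w];
  }
};
```

Because consecutive words of one lane are Nsimd scalars apart, a poke into one lane never disturbs the neighbouring lanes, which is why the local variants need no extract/merge round trip.
*/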
@@ -0,0 +1,57 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_reality.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: neo <cossu@post.kek.jp>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_REALITY_H
#define GRID_LATTICE_REALITY_H
// FIXME: this is the sector of the code whose
// direction I am most worried about.
// The choice of burying complex in the SIMD
// is making the use of "real" and "imag" very cumbersome.
namespace Grid {
template<class vobj> inline Lattice<vobj> adj(const Lattice<vobj> &lhs){
Lattice<vobj> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = adj(lhs._odata[ss]);
}
return ret;
};
template<class vobj> inline Lattice<vobj> conjugate(const Lattice<vobj> &lhs){
Lattice<vobj> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = conjugate(lhs._odata[ss]);
}
return ret;
};
}
#endif
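/*
Example (illustrative sketch, not part of Grid): adj() and conjugate() act independently on every site. For a matrix-valued site, adj is the conjugate transpose, adj(M)_ij = conj(M_ji), while conjugate conjugates each entry without transposing. The sketch assumes a hypothetical 2x2 complex site matrix Mat2 and a std::vector of sites in place of a Lattice.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Mat2 = std::array<std::array<Complex, 2>, 2>;  // hypothetical 2x2 site matrix

// Site-level adjoint (conjugate transpose): adj(M)_ij = conj(M_ji).
Mat2 adjSite(const Mat2 &m) {
  Mat2 r{};
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++) r[i][j] = std::conj(m[j][i]);
  return r;
}

// Site-level complex conjugate: each entry is conjugated, no transpose.
Mat2 conjugateSite(const Mat2 &m) {
  Mat2 r{};
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++) r[i][j] = std::conj(m[i][j]);
  return r;
}

// Field-level version: the site operation is applied independently at every
// site, like the parallel_for loops in Lattice_reality.h.
std::vector<Mat2> adjField(const std::vector<Mat2> &f) {
  std::vector<Mat2> ret(f.size());
  for (std::size_t ss = 0; ss < f.size(); ss++) ret[ss] = adjSite(f[ss]);
  return ret;
}
```
*/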
@@ -0,0 +1,733 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_reduction.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_REDUCTION_H
#define GRID_LATTICE_REDUCTION_H
#include <Grid/Grid_Eigen_Dense.h>
namespace Grid {
#ifdef GRID_WARN_SUBOPTIMAL
#warning "Optimisation alert all these reduction loops are NOT threaded "
#endif
////////////////////////////////////////////////////////////////////////////////////////////////////
// Deterministic Reduction operations
////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vobj> inline RealD norm2(const Lattice<vobj> &arg){
auto nrm = innerProduct(arg,arg);
return std::real(nrm);
}
// Double inner product
template<class vobj>
inline ComplexD innerProduct(const Lattice<vobj> &left,const Lattice<vobj> &right)
{
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_typeD vector_type;
GridBase *grid = left._grid;
const int pad = 8;
ComplexD inner;
Vector<ComplexD> sumarray(grid->SumArraySize()*pad);
parallel_for(int thr=0;thr<grid->SumArraySize();thr++){
int nwork, mywork, myoff;
GridThread::GetWork(left._grid->oSites(),thr,mywork,myoff);
decltype(innerProductD(left._odata[0],right._odata[0])) vinner=zero; // private to thread; sub summation
for(int ss=myoff;ss<mywork+myoff; ss++){
vinner = vinner + innerProductD(left._odata[ss],right._odata[ss]);
}
// All threads sum across SIMD; reduce serial work at end
// one write per cacheline with streaming store
ComplexD tmp = Reduce(TensorRemove(vinner)) ;
vstream(sumarray[thr*pad],tmp);
}
inner=0.0;
for(int i=0;i<grid->SumArraySize();i++){
inner = inner+sumarray[i*pad];
}
right._grid->GlobalSum(inner);
return inner;
}
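/*
Example (illustrative sketch, not part of Grid): innerProduct() splits the outer sites into SumArraySize() contiguous blocks, accumulates one partial sum per block (one block per thread), stores each partial pad = 8 entries apart, and then adds the partials serially before the global sum. The sketch reproduces that scheme for plain std::complex<double> arrays; nslots stands in for SumArraySize(), the block split is an assumption rather than GridThread::GetWork(), and the MPI GlobalSum() step is left out.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using ComplexD = std::complex<double>;

// Block-partitioned inner product in the style of innerProduct() above.
// Slot thr sums conj(l)*r over one contiguous block of sites and stores its
// partial result at index thr*pad, spacing the slots 128 bytes apart so that
// concurrent writers do not share a cache line. The slots are then combined
// serially in a fixed order, so for a fixed nslots the result does not depend
// on scheduling. Without OpenMP the pragma is ignored and the loop runs serially.
ComplexD blockedInnerProduct(const std::vector<ComplexD> &left,
                             const std::vector<ComplexD> &right, int nslots) {
  assert(left.size() == right.size() && nslots > 0);
  const int pad = 8;
  const long n = static_cast<long>(left.size());
  std::vector<ComplexD> sumarray(static_cast<std::size_t>(nslots) * pad);
#pragma omp parallel for
  for (int thr = 0; thr < nslots; thr++) {
    long begin = n * thr / nslots;      // first site of this block
    long end = n * (thr + 1) / nslots;  // one past the last site of this block
    ComplexD partial = 0.0;             // private to this iteration
    for (long ss = begin; ss < end; ss++) partial += std::conj(left[ss]) * right[ss];
    sumarray[static_cast<std::size_t>(thr) * pad] = partial;
  }
  ComplexD inner = 0.0;
  for (int i = 0; i < nslots; i++) inner += sumarray[static_cast<std::size_t>(i) * pad];
  return inner;  // Grid additionally calls GlobalSum() to reduce across ranks
}
```

norm2() is then the real part of blockedInnerProduct(x, x, nslots), matching how norm2() above is built on innerProduct().
*/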
/////////////////////////
// Fast axpby_norm
// z = a x + b y
// return norm z
/////////////////////////
template<class sobj,class vobj> strong_inline RealD
axpy_norm_fast(Lattice<vobj> &z,sobj a,const Lattice<vobj> &x,const Lattice<vobj> &y)
{
sobj one(1.0);
return axpby_norm_fast(z,a,one,x,y);
}
template<class sobj,class vobj> strong_inline RealD
axpby_norm_fast(Lattice<vobj> &z,sobj a,sobj b,const Lattice<vobj> &x,const Lattice<vobj> &y)
{
const int pad = 8;
z.checkerboard = x.checkerboard;
conformable(z,x);
conformable(x,y);
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_typeD vector_type;
RealD nrm;
GridBase *grid = x._grid;
Vector<RealD> sumarray(grid->SumArraySize()*pad);
parallel_for(int thr=0;thr<grid->SumArraySize();thr++){
int nwork, mywork, myoff;
GridThread::GetWork(x._grid->oSites(),thr,mywork,myoff);
// private to thread; sub summation
decltype(innerProductD(z._odata[0],z._odata[0])) vnrm=zero;
for(int ss=myoff;ss<mywork+myoff; ss++){
vobj tmp = a*x._odata[ss]+b*y._odata[ss];
vnrm = vnrm + innerProductD(tmp,tmp);
vstream(z._odata[ss],tmp);
}
vstream(sumarray[thr*pad],real(Reduce(TensorRemove(vnrm)))) ;
}
nrm = 0.0; // sum across threads; linear in thread count but fast
for(int i=0;i<grid->SumArraySize();i++){
nrm = nrm+sumarray[i*pad];
}
z._grid->GlobalSum(nrm);
return nrm;
}
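/*
Example (illustrative sketch, not part of Grid): axpby_norm_fast() fuses z = a*x + b*y with the computation of norm2(z), so each site of z is written once and its contribution to the norm is taken while the value is still at hand, instead of re-reading z in a separate pass. A real-valued, single-threaded version of that fusion, with hypothetical names axpbyNorm and axpyNorm:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fused z = a*x + b*y that also returns sum_ss z[ss]^2 from the same loop.
double axpbyNorm(std::vector<double> &z, double a, double b,
                 const std::vector<double> &x, const std::vector<double> &y) {
  assert(x.size() == y.size());
  z.resize(x.size());
  double nrm = 0.0;
  for (std::size_t ss = 0; ss < x.size(); ss++) {
    double tmp = a * x[ss] + b * y[ss];
    nrm += tmp * tmp;  // accumulate the norm before storing
    z[ss] = tmp;
  }
  return nrm;
}

// axpy_norm_fast is the b = 1 special case.
double axpyNorm(std::vector<double> &z, double a,
                const std::vector<double> &x, const std::vector<double> &y) {
  return axpbyNorm(z, a, 1.0, x, y);
}
```

For x = {1, 2, 3}, y = {1, 0, -1}, a = 2 and b = 3, z becomes {5, 4, 3} and the returned norm is 50.
*/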
template<class Op,class T1>
inline auto sum(const LatticeUnaryExpression<Op,T1> & expr)
->typename decltype(expr.first.func(eval(0,std::get<0>(expr.second))))::scalar_object
{
return sum(closure(expr));
}
template<class Op,class T1,class T2>
inline auto sum(const LatticeBinaryExpression<Op,T1,T2> & expr)
->typename decltype(expr.first.func(eval(0,std::get<0>(expr.second)),eval(0,std::get<1>(expr.second))))::scalar_object
{
return sum(closure(expr));
}
template<class Op,class T1,class T2,class T3>
inline auto sum(const LatticeTrinaryExpression<Op,T1,T2,T3> & expr)
->typename decltype(expr.first.func(eval(0,std::get<0>(expr.second)),
eval(0,std::get<1>(expr.second)),
eval(0,std::get<2>(expr.second))
))::scalar_object
{
return sum(closure(expr));
}
template<class vobj>
inline typename vobj::scalar_object sum(const Lattice<vobj> &arg)
{
GridBase *grid=arg._grid;
int Nsimd = grid->Nsimd();
std::vector<vobj,alignedAllocator<vobj> > sumarray(grid->SumArraySize());
for(int i=0;i<grid->SumArraySize();i++){
sumarray[i]=zero;
}
parallel_for(int thr=0;thr<grid->SumArraySize();thr++){
int nwork, mywork, myoff;
GridThread::GetWork(grid->oSites(),thr,mywork,myoff);
vobj vvsum=zero;
for(int ss=myoff;ss<mywork+myoff; ss++){
vvsum = vvsum + arg._odata[ss];
}
sumarray[thr]=vvsum;
}
vobj vsum=zero; // sum across threads
for(int i=0;i<grid->SumArraySize();i++){
vsum = vsum+sumarray[i];
}
typedef typename vobj::scalar_object sobj;
sobj ssum=zero;
std::vector<sobj> buf(Nsimd);
extract(vsum,buf);
for(int i=0;i<Nsimd;i++) ssum = ssum + buf[i];
arg._grid->GlobalSum(ssum);
return ssum;
}
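/*
Example (illustrative sketch, not part of Grid): after the per-thread partial sums, sum() still holds a SIMD vector object; it extracts the Nsimd lanes and adds them to obtain a scalar_object, then performs the global sum. The sketch shows the lane-wise accumulation followed by the lane reduction, assuming a hypothetical 4-lane real type VReal, one thread and one rank.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int Nsimd = 4;                  // hypothetical lane count
using VReal = std::array<double, Nsimd>;  // hypothetical SIMD vector: one value per lane

// Two-stage reduction in the style of sum(): first add the vector objects of
// all outer sites lane by lane (cheap and vectorisable), then extract the
// lanes and add them to obtain a single scalar. The per-thread split and the
// final GlobalSum() across ranks are omitted.
double simdSum(const std::vector<VReal> &odata) {
  VReal vsum{};  // zero-initialised lane accumulator
  for (const VReal &v : odata)
    for (int l = 0; l < Nsimd; l++) vsum[l] += v[l];
  double ssum = 0.0;
  for (int l = 0; l < Nsimd; l++) ssum += vsum[l];
  return ssum;
}
```

Deferring the lane reduction to the end means the expensive cross-lane work is done once per rank rather than once per site.
*/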
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// sliceSum, sliceInnerProduct, sliceAxpy, sliceNorm etc...
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vobj> inline void sliceSum(const Lattice<vobj> &Data,std::vector<typename vobj::scalar_object> &result,int orthogdim)
{
///////////////////////////////////////////////////////
// FIXME: precision-promoted summation
// may be important for correlation functions,
// but is easily avoided by using double precision fields.
///////////////////////////////////////////////////////
typedef typename vobj::scalar_object sobj;
GridBase *grid = Data._grid;
assert(grid!=NULL);
const int Nd = grid->_ndimension;
const int Nsimd = grid->Nsimd();
assert(orthogdim >= 0);
assert(orthogdim < Nd);
int fd=grid->_fdimensions[orthogdim];
int ld=grid->_ldimensions[orthogdim];
int rd=grid->_rdimensions[orthogdim];
std::vector<vobj,alignedAllocator<vobj> > lvSum(rd); // will locally sum vectors first
std::vector<sobj> lsSum(ld,zero); // sum across these down to scalars
std::vector<sobj> extracted(Nsimd); // splitting the SIMD
result.resize(fd); // And then global sum to return the same vector to every node
for(int r=0;r<rd;r++){
lvSum[r]=zero;
}
int e1= grid->_slice_nblock[orthogdim];
int e2= grid->_slice_block [orthogdim];
int stride=grid->_slice_stride[orthogdim];
// sum over reduced dimension planes, breaking out orthog dir
// Parallel over orthog direction
parallel_for(int r=0;r<rd;r++){
int so=r*grid->_ostride[orthogdim]; // base offset for start of plane
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int ss= so+n*stride+b;
lvSum[r]=lvSum[r]+Data._odata[ss];
}
}
}
// Sum across simd lanes in the plane, breaking out orthog dir.
std::vector<int> icoor(Nd);
for(int rt=0;rt<rd;rt++){
extract(lvSum[rt],extracted);
for(int idx=0;idx<Nsimd;idx++){
grid->iCoorFromIindex(icoor,idx);
int ldx =rt+icoor[orthogdim]*rd;
lsSum[ldx]=lsSum[ldx]+extracted[idx];
}
}
// sum over nodes.
sobj gsum;
for(int t=0;t<fd;t++){
int pt = t/ld; // processor plane
int lt = t%ld;
if ( pt == grid->_processor_coor[orthogdim] ) {
gsum=lsSum[lt];
} else {
gsum=zero;
}
grid->GlobalSum(gsum);
result[t]=gsum;
}
}
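/*
Example (illustrative sketch, not part of Grid): sliceSum() first sums each reduced plane rt in vector form, then extracts the lanes and scatters lane l of plane rt to the local slice rt + l*rd, and finally sums each slice across the processor planes. The sketch covers the first two steps on one rank for a hypothetical Lx x Lt lattice whose two SIMD lanes split the t direction; the memory layout odata[rt*Lx + x] is an assumption for illustration, not Grid's _ostride/_slice_stride layout.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int Nsimd = 2;                  // hypothetical: lanes split the t direction
using VReal = std::array<double, Nsimd>;  // hypothetical SIMD vector

// odata[rt * Lx + x] holds, in lane l, the value at (x, t = rt + l * rd),
// with rd = Lt / Nsimd reduced planes.
// Stage 1 sums each reduced plane rt over x in vector form; stage 2 extracts
// the lanes and scatters them to t = rt + l * rd, mirroring ldx = rt + icoor*rd.
std::vector<double> sliceSumT(const std::vector<VReal> &odata, int Lx, int Lt) {
  assert(Lt % Nsimd == 0);
  const int rd = Lt / Nsimd;
  assert(static_cast<int>(odata.size()) == rd * Lx);
  std::vector<VReal> lvSum(rd, VReal{});  // one vector accumulator per reduced plane
  for (int rt = 0; rt < rd; rt++)
    for (int x = 0; x < Lx; x++)
      for (int l = 0; l < Nsimd; l++) lvSum[rt][l] += odata[rt * Lx + x][l];
  std::vector<double> result(Lt, 0.0);
  for (int rt = 0; rt < rd; rt++)
    for (int l = 0; l < Nsimd; l++) result[rt + l * rd] += lvSum[rt][l];
  return result;
}
```

With several ranks along t, each rank would fill only its own ld slices and the GlobalSum() loop in sliceSum() would assemble the full fd-length result on every node.
*/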
template<class vobj>
static void mySliceInnerProductVector( std::vector<ComplexD> & result, const Lattice<vobj> &lhs,const Lattice<vobj> &rhs,int orthogdim)
{
// std::cout << GridLogMessage << "Start mySliceInnerProductVector" << std::endl;
typedef typename vobj::scalar_type scalar_type;
std::vector<scalar_type> lsSum;
localSliceInnerProductVector(result, lhs, rhs, lsSum, orthogdim);
globalSliceInnerProductVector(result, lhs, lsSum, orthogdim);
// std::cout << GridLogMessage << "End mySliceInnerProductVector" << std::endl;
}
template <class vobj>
static void localSliceInnerProductVector(std::vector<ComplexD> &result, const Lattice<vobj> &lhs, const Lattice<vobj> &rhs, std::vector<typename vobj::scalar_type> &lsSum, int orthogdim)
{
// std::cout << GridLogMessage << "Start prep" << std::endl;
typedef typename vobj::vector_type vector_type;
typedef typename vobj::scalar_type scalar_type;
GridBase *grid = lhs._grid;
assert(grid!=NULL);
conformable(grid,rhs._grid);
const int Nd = grid->_ndimension;
const int Nsimd = grid->Nsimd();
assert(orthogdim >= 0);
assert(orthogdim < Nd);
int fd=grid->_fdimensions[orthogdim];
int ld=grid->_ldimensions[orthogdim];
int rd=grid->_rdimensions[orthogdim];
// std::cout << GridLogMessage << "Start alloc" << std::endl;
std::vector<vector_type,alignedAllocator<vector_type> > lvSum(rd); // will locally sum vectors first
lsSum.resize(ld,scalar_type(0.0)); // sum across these down to scalars
std::vector<iScalar<scalar_type>> extracted(Nsimd); // splitting the SIMD
// std::cout << GridLogMessage << "End alloc" << std::endl;
result.resize(fd); // And then global sum to return the same vector to every node for IO to file
for(int r=0;r<rd;r++){
lvSum[r]=zero;
}
int e1= grid->_slice_nblock[orthogdim];
int e2= grid->_slice_block [orthogdim];
int stride=grid->_slice_stride[orthogdim];
// std::cout << GridLogMessage << "End prep" << std::endl;
// std::cout << GridLogMessage << "Start parallel inner product, _rd = " << rd << std::endl;
parallel_for(int r=0;r<rd;r++)
{
int so=r*grid->_ostride[orthogdim]; // base offset for start of plane
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int ss = so + n * stride + b;
// vv is declared inside the loop so each thread has its own copy; a shared vv would be a data race
vector_type vv = TensorRemove(innerProduct(lhs._odata[ss], rhs._odata[ss]));
lvSum[r] = lvSum[r] + vv;
}
}
}
// std::cout << GridLogMessage << "End parallel inner product" << std::endl;
// Sum across simd lanes in the plane, breaking out orthog dir.
std::vector<int> icoor(Nd);
for(int rt=0;rt<rd;rt++){
iScalar<vector_type> temp;
temp._internal = lvSum[rt];
extract(temp,extracted);
for(int idx=0;idx<Nsimd;idx++){
grid->iCoorFromIindex(icoor,idx);
int ldx =rt+icoor[orthogdim]*rd;
lsSum[ldx]=lsSum[ldx]+extracted[idx]._internal;
}
}
// std::cout << GridLogMessage << "End sum over simd lanes" << std::endl;
}
template <class vobj>
static void globalSliceInnerProductVector(std::vector<ComplexD> &result, const Lattice<vobj> &lhs, std::vector<typename vobj::scalar_type> &lsSum, int orthogdim)
{
typedef typename vobj::scalar_type scalar_type;
GridBase *grid = lhs._grid;
int fd = result.size();
int ld = lsSum.size();
// sum over nodes.
std::vector<scalar_type> gsum;
gsum.resize(fd, scalar_type(0.0));
// std::cout << GridLogMessage << "Start of gsum[t] creation:" << std::endl;
for(int t=0;t<fd;t++){
int pt = t/ld; // processor plane
int lt = t%ld;
if ( pt == grid->_processor_coor[orthogdim] ) {
gsum[t]=lsSum[lt];
}
}
// std::cout << GridLogMessage << "End of gsum[t] creation:" << std::endl;
// std::cout << GridLogMessage << "Start of GlobalSumVector:" << std::endl;
grid->GlobalSumVector(&gsum[0], fd);
// std::cout << GridLogMessage << "End of GlobalSumVector:" << std::endl;
result.assign(gsum.begin(), gsum.end()); // element-wise conversion from scalar_type (e.g. ComplexF) to ComplexD
}
template<class vobj>
static void sliceInnerProductVector( std::vector<ComplexD> & result, const Lattice<vobj> &lhs,const Lattice<vobj> &rhs,int orthogdim)
{
typedef typename vobj::vector_type vector_type;
typedef typename vobj::scalar_type scalar_type;
GridBase *grid = lhs._grid;
assert(grid!=NULL);
conformable(grid,rhs._grid);
const int Nd = grid->_ndimension;
const int Nsimd = grid->Nsimd();
assert(orthogdim >= 0);
assert(orthogdim < Nd);
int fd=grid->_fdimensions[orthogdim];
int ld=grid->_ldimensions[orthogdim];
int rd=grid->_rdimensions[orthogdim];
std::vector<vector_type,alignedAllocator<vector_type> > lvSum(rd); // will locally sum vectors first
std::vector<scalar_type > lsSum(ld,scalar_type(0.0)); // sum across these down to scalars
std::vector<iScalar<scalar_type> > extracted(Nsimd); // splitting the SIMD
result.resize(fd); // And then global sum to return the same vector to every node for IO to file
for(int r=0;r<rd;r++){
lvSum[r]=zero;
}
int e1= grid->_slice_nblock[orthogdim];
int e2= grid->_slice_block [orthogdim];
int stride=grid->_slice_stride[orthogdim];
parallel_for(int r=0;r<rd;r++){
int so=r*grid->_ostride[orthogdim]; // base offset for start of plane
for(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int ss= so+n*stride+b;
vector_type vv = TensorRemove(innerProduct(lhs._odata[ss],rhs._odata[ss]));
lvSum[r]=lvSum[r]+vv;
}
}
}
// Sum across simd lanes in the plane, breaking out orthog dir.
std::vector<int> icoor(Nd);
for(int rt=0;rt<rd;rt++){
iScalar<vector_type> temp;
temp._internal = lvSum[rt];
extract(temp,extracted);
for(int idx=0;idx<Nsimd;idx++){
grid->iCoorFromIindex(icoor,idx);
int ldx =rt+icoor[orthogdim]*rd;
lsSum[ldx]=lsSum[ldx]+extracted[idx]._internal;
}
}
// sum over nodes.
scalar_type gsum;
for(int t=0;t<fd;t++){
int pt = t/ld; // processor plane
int lt = t%ld;
if ( pt == grid->_processor_coor[orthogdim] ) {
gsum=lsSum[lt];
} else {
gsum=scalar_type(0.0);
}
grid->GlobalSum(gsum);
result[t]=gsum;
}
}
template<class vobj>
static void sliceNorm (std::vector<RealD> &sn,const Lattice<vobj> &rhs,int Orthog)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nblock = rhs._grid->GlobalDimensions()[Orthog];
std::vector<ComplexD> ip(Nblock);
sn.resize(Nblock);
sliceInnerProductVector(ip,rhs,rhs,Orthog);
for(int ss=0;ss<Nblock;ss++){
sn[ss] = real(ip[ss]);
}
};
template<class vobj>
static void sliceMaddVector(Lattice<vobj> &R,std::vector<RealD> &a,const Lattice<vobj> &X,const Lattice<vobj> &Y,
int orthogdim,RealD scale=1.0)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
typedef typename vobj::tensor_reduced tensor_reduced;
scalar_type zscale(scale);
GridBase *grid = X._grid;
int Nsimd =grid->Nsimd();
int Nblock =grid->GlobalDimensions()[orthogdim];
int fd =grid->_fdimensions[orthogdim];
int ld =grid->_ldimensions[orthogdim];
int rd =grid->_rdimensions[orthogdim];
int e1 =grid->_slice_nblock[orthogdim];
int e2 =grid->_slice_block [orthogdim];
int stride =grid->_slice_stride[orthogdim];
std::vector<int> icoor;
for(int r=0;r<rd;r++){
int so=r*grid->_ostride[orthogdim]; // base offset for start of plane
vector_type av;
for(int l=0;l<Nsimd;l++){
grid->iCoorFromIindex(icoor,l);
int ldx =r+icoor[orthogdim]*rd;
scalar_type *as =(scalar_type *)&av;
as[l] = scalar_type(a[ldx])*zscale;
}
tensor_reduced at; at=av;
parallel_for_nest2(int n=0;n<e1;n++){
for(int b=0;b<e2;b++){
int ss= so+n*stride+b;
R._odata[ss] = at*X._odata[ss]+Y._odata[ss];
}
}
}
};
/*
inline GridBase *makeSubSliceGrid(const GridBase *BlockSolverGrid,int Orthog)
{
int NN = BlockSolverGrid->_ndimension;
int nsimd = BlockSolverGrid->Nsimd();
std::vector<int> latt_phys(0);
std::vector<int> simd_phys(0);
std::vector<int> mpi_phys(0);
for(int d=0;d<NN;d++){
if( d!=Orthog ) {
latt_phys.push_back(BlockSolverGrid->_fdimensions[d]);
simd_phys.push_back(BlockSolverGrid->_simd_layout[d]);
mpi_phys.push_back(BlockSolverGrid->_processors[d]);
}
}
return (GridBase *)new GridCartesian(latt_phys,simd_phys,mpi_phys);
}
*/
template<class vobj>
static void sliceMaddMatrix (Lattice<vobj> &R,Eigen::MatrixXcd &aa,const Lattice<vobj> &X,const Lattice<vobj> &Y,int Orthog,RealD scale=1.0)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nblock = X._grid->GlobalDimensions()[Orthog];
GridBase *FullGrid = X._grid;
// GridBase *SliceGrid = makeSubSliceGrid(FullGrid,Orthog);
// Lattice<vobj> Xslice(SliceGrid);
// Lattice<vobj> Rslice(SliceGrid);
assert( FullGrid->_simd_layout[Orthog]==1);
int nh = FullGrid->_ndimension;
// int nl = SliceGrid->_ndimension;
int nl = nh-1;
//FIXME package in a convenient iterator
//Should loop over a plane orthogonal to direction "Orthog"
int stride=FullGrid->_slice_stride[Orthog];
int block =FullGrid->_slice_block [Orthog];
int nblock=FullGrid->_slice_nblock[Orthog];
int ostride=FullGrid->_ostride[Orthog];
#pragma omp parallel
{
std::vector<vobj> s_x(Nblock);
#pragma omp for collapse(2)
for(int n=0;n<nblock;n++){
for(int b=0;b<block;b++){
int o = n*stride + b;
for(int i=0;i<Nblock;i++){
s_x[i] = X[o+i*ostride];
}
vobj dot;
for(int i=0;i<Nblock;i++){
dot = Y[o+i*ostride];
for(int j=0;j<Nblock;j++){
dot = dot + s_x[j]*(scale*aa(j,i));
}
R[o+i*ostride]=dot;
}
}}
}
};
template<class vobj>
static void sliceMulMatrix (Lattice<vobj> &R,Eigen::MatrixXcd &aa,const Lattice<vobj> &X,int Orthog,RealD scale=1.0)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
int Nblock = X._grid->GlobalDimensions()[Orthog];
GridBase *FullGrid = X._grid;
// GridBase *SliceGrid = makeSubSliceGrid(FullGrid,Orthog);
// Lattice<vobj> Xslice(SliceGrid);
// Lattice<vobj> Rslice(SliceGrid);
assert( FullGrid->_simd_layout[Orthog]==1);
int nh = FullGrid->_ndimension;
// int nl = SliceGrid->_ndimension;
int nl=1;
//FIXME package in a convenient iterator
//Should loop over a plane orthogonal to direction "Orthog"
int stride=FullGrid->_slice_stride[Orthog];
int block =FullGrid->_slice_block [Orthog];
int nblock=FullGrid->_slice_nblock[Orthog];
int ostride=FullGrid->_ostride[Orthog];
#pragma omp parallel
{
std::vector<vobj> s_x(Nblock);
#pragma omp for collapse(2)
for(int n=0;n<nblock;n++){
for(int b=0;b<block;b++){
int o = n*stride + b;
for(int i=0;i<Nblock;i++){
s_x[i] = X[o+i*ostride];
}
vobj dot;
for(int i=0;i<Nblock;i++){
dot = s_x[0]*(scale*aa(0,i));
for(int j=1;j<Nblock;j++){
dot = dot + s_x[j]*(scale*aa(j,i));
}
R[o+i*ostride]=dot;
}
}}
}
};
template<class vobj>
static void sliceInnerProductMatrix( Eigen::MatrixXcd &mat, const Lattice<vobj> &lhs,const Lattice<vobj> &rhs,int Orthog)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
GridBase *FullGrid = lhs._grid;
// GridBase *SliceGrid = makeSubSliceGrid(FullGrid,Orthog);
int Nblock = FullGrid->GlobalDimensions()[Orthog];
// Lattice<vobj> Lslice(SliceGrid);
// Lattice<vobj> Rslice(SliceGrid);
mat = Eigen::MatrixXcd::Zero(Nblock,Nblock);
assert( FullGrid->_simd_layout[Orthog]==1);
int nh = FullGrid->_ndimension;
// int nl = SliceGrid->_ndimension;
int nl = nh-1;
//FIXME package in a convenient iterator
//Should loop over a plane orthogonal to direction "Orthog"
int stride=FullGrid->_slice_stride[Orthog];
int block =FullGrid->_slice_block [Orthog];
int nblock=FullGrid->_slice_nblock[Orthog];
int ostride=FullGrid->_ostride[Orthog];
typedef typename vobj::vector_typeD vector_typeD;
#pragma omp parallel
{
std::vector<vobj> Left(Nblock);
std::vector<vobj> Right(Nblock);
Eigen::MatrixXcd mat_thread = Eigen::MatrixXcd::Zero(Nblock,Nblock);
#pragma omp for collapse(2)
for(int n=0;n<nblock;n++){
for(int b=0;b<block;b++){
int o = n*stride + b;
for(int i=0;i<Nblock;i++){
Left [i] = lhs[o+i*ostride];
Right[i] = rhs[o+i*ostride];
}
for(int i=0;i<Nblock;i++){
for(int j=0;j<Nblock;j++){
auto tmp = innerProduct(Left[i],Right[j]);
auto rtmp = TensorRemove(tmp);
mat_thread(i,j) += Reduce(rtmp);
}}
}}
#pragma omp critical
{
mat += mat_thread;
}
}
for(int i=0;i<Nblock;i++){
for(int j=0;j<Nblock;j++){
ComplexD sum = mat(i,j);
FullGrid->GlobalSum(sum);
mat(i,j)=sum;
}}
return;
}
} /*END NAMESPACE GRID*/
#endif
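The slice reductions above fold the orthogonal direction into `rd` outer sites times a number of SIMD lanes, map lane `l` of outer slice `r` back to local slice `r + l*rd`, and then place each node's `ld` local slices at global offset `pt*ld` before a global sum. Below is a minimal, Grid-independent sketch of those two index maps. It assumes a plain `std::vector` of per-lane sums in place of Grid's SIMD vectors and an in-process loop over hypothetical ranks in place of `GlobalSumVector`; the names `unfoldLanes` and `globalSlices` are illustrative only.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using Cplx = std::complex<double>;

// lvSum[r][lane] holds the per-lane partial sum for outer slice r
// (rd outer slices, nlanes SIMD lanes along the orthogonal direction).
// Lane `lane` of outer slice r corresponds to local slice r + lane*rd.
std::vector<Cplx> unfoldLanes(const std::vector<std::vector<Cplx>> &lvSum, int nlanes) {
  const int rd = static_cast<int>(lvSum.size());
  std::vector<Cplx> lsSum(rd * nlanes, Cplx(0.0, 0.0));
  for (int r = 0; r < rd; r++) {
    assert(static_cast<int>(lvSum[r].size()) == nlanes);
    for (int lane = 0; lane < nlanes; lane++) {
      lsSum[r + lane * rd] += lvSum[r][lane];
    }
  }
  return lsSum;
}

// Emulates the cross-node step in a single process. Rank p owns global
// slices [p*ld, (p+1)*ld). Each rank fills a zeroed length-fd vector with
// its own slices only; summing those vectors over ranks stands in for the
// global sum, so every rank ends up with the full result.
std::vector<Cplx> globalSlices(const std::vector<std::vector<Cplx>> &perRank) {
  const int nranks = static_cast<int>(perRank.size());
  const int ld = static_cast<int>(perRank.at(0).size());
  const int fd = nranks * ld;
  std::vector<Cplx> result(fd, Cplx(0.0, 0.0));
  for (int p = 0; p < nranks; p++) {
    std::vector<Cplx> gsum(fd, Cplx(0.0, 0.0));
    for (int t = 0; t < fd; t++) {
      const int pt = t / ld;  // processor plane owning global slice t
      const int lt = t % ld;  // local index of slice t on that plane
      if (pt == p) gsum[t] = perRank[p][lt];
    }
    for (int t = 0; t < fd; t++) result[t] += gsum[t];  // the "global sum"
  }
  return result;
}
```

Because each global slice has exactly one owner and every other rank contributes zero, the sum over ranks reproduces the owner's value everywhere, which is why a plain global sum doubles as a broadcast in the code above.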
@@ -0,0 +1,516 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_rng.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Guido Cossu <guido.cossu@ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_RNG_H
#define GRID_LATTICE_RNG_H
#include <random>
#ifdef RNG_SITMO
#include <Grid/sitmo_rng/sitmo_prng_engine.hpp>
#endif
#if defined(RNG_SITMO)
#define RNG_FAST_DISCARD
#else
#undef RNG_FAST_DISCARD
#endif
namespace Grid {
//////////////////////////////////////////////////////////////
// Allow the RNG state to be less dense than the fine grid
//////////////////////////////////////////////////////////////
inline int RNGfillable(GridBase *coarse,GridBase *fine)
{
int rngdims = coarse->_ndimension;
// trivially extended in higher dims, with locality guaranteeing RNG state is local to node
int lowerdims = fine->_ndimension - coarse->_ndimension;
assert(lowerdims >= 0);
for(int d=0;d<lowerdims;d++){
assert(fine->_simd_layout[d]==1);
assert(fine->_processors[d]==1);
}
int multiplicity=1;
for(int d=0;d<lowerdims;d++){
multiplicity=multiplicity*fine->_rdimensions[d];
}
// local and global volumes subdivide cleanly after SIMDization
for(int d=0;d<rngdims;d++){
int fd= d+lowerdims;
assert(coarse->_processors[d] == fine->_processors[fd]);
assert(coarse->_simd_layout[d] == fine->_simd_layout[fd]);
assert(((fine->_rdimensions[fd] / coarse->_rdimensions[d])* coarse->_rdimensions[d])==fine->_rdimensions[fd]);
multiplicity = multiplicity *fine->_rdimensions[fd] / coarse->_rdimensions[d];
}
return multiplicity;
}
// merge of April 11 2017
// this function is necessary for the LS vectorised field
inline int RNGfillable_general(GridBase *coarse,GridBase *fine)
{
int rngdims = coarse->_ndimension;
// trivially extended in higher dims, with locality guaranteeing RNG state is local to node
int lowerdims = fine->_ndimension - coarse->_ndimension; assert(lowerdims >= 0);
// assumes that the higher dimensions are not using more processors
// all further divisions are local
for(int d=0;d<lowerdims;d++) assert(fine->_processors[d]==1);
for(int d=0;d<rngdims;d++) assert(coarse->_processors[d] == fine->_processors[d+lowerdims]);
// then divide the number of local sites
// check that the number of SIMD lanes agrees, meaning the iSites are the same
assert(fine->Nsimd() == coarse->Nsimd());
// check that the two grids divide cleanly
assert( (fine->lSites() / coarse->lSites() ) * coarse->lSites() == fine->lSites() );
return fine->lSites() / coarse->lSites();
}
// real scalars are one component
template<class scalar,class distribution,class generator>
void fillScalar(scalar &s,distribution &dist,generator & gen)
{
s=dist(gen);
}
template<class distribution,class generator>
void fillScalar(ComplexF &s,distribution &dist, generator &gen)
{
s=ComplexF(dist(gen),dist(gen));
}
template<class distribution,class generator>
void fillScalar(ComplexD &s,distribution &dist,generator &gen)
{
s=ComplexD(dist(gen),dist(gen));
}
class GridRNGbase {
public:
// One generator per site.
// Uniform and Gaussian distributions from these generators.
#ifdef RNG_RANLUX
typedef std::ranlux48 RngEngine;
typedef uint64_t RngStateType;
static const int RngStateCount = 15;
#endif
#ifdef RNG_MT19937
typedef std::mt19937 RngEngine;
typedef uint32_t RngStateType;
static const int RngStateCount = std::mt19937::state_size;
#endif
#ifdef RNG_SITMO
typedef sitmo::prng_engine RngEngine;
typedef uint64_t RngStateType;
static const int RngStateCount = 13;
#endif
std::vector<RngEngine> _generators;
std::vector<std::uniform_real_distribution<RealD> > _uniform;
std::vector<std::normal_distribution<RealD> > _gaussian;
std::vector<std::discrete_distribution<int32_t> > _bernoulli;
std::vector<std::uniform_int_distribution<uint32_t> > _uid;
///////////////////////
// support for parallel init
///////////////////////
#ifdef RNG_FAST_DISCARD
static void Skip(RngEngine &eng,uint64_t site)
{
/////////////////////////////////////////////////////////////////////////////////////
// Skip ahead by 2^shift draws between successive lattice sites, so that
// each site draws from its own disjoint segment of one master stream.
//
// The original skip was 2^40 (~10^12) draws per site. Assuming quenched
// updating never exceeds ~1000 sweeps per second on any machine, that gives
// of order 10^9 seconds (roughly 30 years) before neighbouring segments overlap.
// HMC is unlikely to run faster than one solve per second, with tens of
// seconds per trajectory, so the margin of safety was orders of magnitude.
// If necessary, Sitmo could be modified to skip in the higher-order words of its state.
//
// The skip is now 2^30 (~10^9) to avoid overflowing the 64-bit skip count
// (site << shift) on large volumes; this shrinks each site's segment, and
// hence the margin above, by a factor of 2^10. The assert below checks for overflow.
//
/////////////////////////////////////////////////////////////////////////////////////
// uint64_t skip = site+1; // Old init Skipped then drew. Checked compat with faster init
const int shift = 30;
uint64_t skip = site;
skip = skip<<shift;
assert((skip >> shift)==site); // check for overflow
eng.discard(skip);
// std::cout << " Engine " <<site << " state " <<eng<<std::endl;
}
#endif
static RngEngine Reseed(RngEngine &eng)
{
std::vector<uint32_t> newseed;
std::uniform_int_distribution<uint32_t> uid;
return Reseed(eng,newseed,uid);
}
static RngEngine Reseed(RngEngine &eng,std::vector<uint32_t> & newseed,
std::uniform_int_distribution<uint32_t> &uid)
{
const int reseeds=4;
newseed.resize(reseeds);
for(int i=0;i<reseeds;i++){
newseed[i] = uid(eng);
}
std::seed_seq sseq(newseed.begin(),newseed.end());
return RngEngine(sseq);
}
void GetState(std::vector<RngStateType> & saved,RngEngine &eng) {
saved.resize(RngStateCount);
std::stringstream ss;
ss<<eng;
ss.seekg(0,ss.beg);
for(int i=0;i<RngStateCount;i++){
ss>>saved[i];
}
}
void GetState(std::vector<RngStateType> & saved,int gen) {
GetState(saved,_generators[gen]);
}
void SetState(std::vector<RngStateType> & saved,RngEngine &eng){
assert(saved.size()==RngStateCount);
std::stringstream ss;
for(int i=0;i<RngStateCount;i++){
ss<< saved[i]<<" ";
}
ss.seekg(0,ss.beg);
ss>>eng;
}
void SetState(std::vector<RngStateType> & saved,int gen){
SetState(saved,_generators[gen]);
}
void SetEngine(RngEngine &Eng, int gen){
_generators[gen]=Eng;
}
void GetEngine(RngEngine &Eng, int gen){
Eng=_generators[gen];
}
template<class source> void Seed(source &src, int gen)
{
_generators[gen] = RngEngine(src);
}
};
class GridSerialRNG : public GridRNGbase {
public:
GridSerialRNG() : GridRNGbase() {
_generators.resize(1);
_uniform.resize(1,std::uniform_real_distribution<RealD>{0,1});
_gaussian.resize(1,std::normal_distribution<RealD>(0.0,1.0) );
_bernoulli.resize(1,std::discrete_distribution<int32_t>{1,1});
_uid.resize(1,std::uniform_int_distribution<uint32_t>() );
}
template <class sobj,class distribution> inline void fill(sobj &l,std::vector<distribution> &dist){
typedef typename sobj::scalar_type scalar_type;
int words = sizeof(sobj)/sizeof(scalar_type);
scalar_type *buf = (scalar_type *) & l;
dist[0].reset();
for(int idx=0;idx<words;idx++){
fillScalar(buf[idx],dist[0],_generators[0]);
}
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
};
template <class distribution> inline void fill(ComplexF &l,std::vector<distribution> &dist){
dist[0].reset();
fillScalar(l,dist[0],_generators[0]);
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(ComplexD &l,std::vector<distribution> &dist){
dist[0].reset();
fillScalar(l,dist[0],_generators[0]);
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(RealF &l,std::vector<distribution> &dist){
dist[0].reset();
fillScalar(l,dist[0],_generators[0]);
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(RealD &l,std::vector<distribution> &dist){
dist[0].reset();
fillScalar(l,dist[0],_generators[0]);
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
// vector fill
template <class distribution> inline void fill(vComplexF &l,std::vector<distribution> &dist){
RealF *pointer=(RealF *)&l;
dist[0].reset();
for(int i=0;i<2*vComplexF::Nsimd();i++){
fillScalar(pointer[i],dist[0],_generators[0]);
}
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(vComplexD &l,std::vector<distribution> &dist){
RealD *pointer=(RealD *)&l;
dist[0].reset();
for(int i=0;i<2*vComplexD::Nsimd();i++){
fillScalar(pointer[i],dist[0],_generators[0]);
}
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(vRealF &l,std::vector<distribution> &dist){
RealF *pointer=(RealF *)&l;
dist[0].reset();
for(int i=0;i<vRealF::Nsimd();i++){
fillScalar(pointer[i],dist[0],_generators[0]);
}
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
template <class distribution> inline void fill(vRealD &l,std::vector<distribution> &dist){
RealD *pointer=(RealD *)&l;
dist[0].reset();
for(int i=0;i<vRealD::Nsimd();i++){
fillScalar(pointer[i],dist[0],_generators[0]);
}
CartesianCommunicator::BroadcastWorld(0,(void *)&l,sizeof(l));
}
void SeedFixedIntegers(const std::vector<int> &seeds){
CartesianCommunicator::BroadcastWorld(0,(void *)&seeds[0],sizeof(int)*seeds.size());
std::seed_seq src(seeds.begin(),seeds.end());
Seed(src,0);
}
void SeedUniqueString(const std::string &s){
std::vector<int> seeds;
std::stringstream sha;
seeds = GridChecksum::sha256_seeds(s);
for(int i=0;i<seeds.size();i++) {
sha << std::hex << seeds[i];
}
std::cout << GridLogMessage << "Initialising serial RNG with unique string '"
<< s << "'" << std::endl;
std::cout << GridLogMessage << "Seed SHA256: " << sha.str() << std::endl;
SeedFixedIntegers(seeds);
}
};
class GridParallelRNG : public GridRNGbase {
double _time_counter = 0.0; // accumulated fill() time in microseconds; previously left uninitialised
public:
GridBase *_grid;
unsigned int _vol;
int generator_idx(int os,int is) {
return is*_grid->oSites()+os;
}
GridParallelRNG(GridBase *grid) : GridRNGbase() {
_grid = grid;
_vol =_grid->iSites()*_grid->oSites();
_generators.resize(_vol);
_uniform.resize(_vol,std::uniform_real_distribution<RealD>{0,1});
_gaussian.resize(_vol,std::normal_distribution<RealD>(0.0,1.0) );
_bernoulli.resize(_vol,std::discrete_distribution<int32_t>{1,1});
_uid.resize(_vol,std::uniform_int_distribution<uint32_t>() );
}
template <class vobj,class distribution> inline void fill(Lattice<vobj> &l,std::vector<distribution> &dist){
typedef typename vobj::scalar_object scalar_object;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
double inner_time_counter = usecond();
int multiplicity = RNGfillable_general(_grid, l._grid); // l has finer or same grid
int Nsimd = _grid->Nsimd(); // guaranteed to be the same for l._grid too
int osites = _grid->oSites(); // guaranteed to be <= l._grid->oSites() by a factor multiplicity
int words = sizeof(scalar_object) / sizeof(scalar_type);
parallel_for(int ss=0;ss<osites;ss++){
std::vector<scalar_object> buf(Nsimd);
for (int m = 0; m < multiplicity; m++) { // Draw from same generator multiplicity times
int sm = multiplicity * ss + m; // Maps the generator site to the fine site
for (int si = 0; si < Nsimd; si++) {
int gdx = generator_idx(ss, si); // index of generator state
scalar_type *pointer = (scalar_type *)&buf[si];
dist[gdx].reset();
for (int idx = 0; idx < words; idx++)
fillScalar(pointer[idx], dist[gdx], _generators[gdx]);
}
// merge into SIMD lanes, FIXME suboptimal implementation
merge(l._odata[sm], buf);
}
}
_time_counter += usecond()- inner_time_counter;
};
void SeedUniqueString(const std::string &s){
std::vector<int> seeds;
seeds = GridChecksum::sha256_seeds(s);
std::cout << GridLogMessage << "Initialising parallel RNG with unique string '"
<< s << "'" << std::endl;
std::cout << GridLogMessage << "Seed SHA256: " << GridChecksum::sha256_string(seeds) << std::endl;
SeedFixedIntegers(seeds);
}
void SeedFixedIntegers(const std::vector<int> &seeds){
// Everyone generates the same seed_seq based on input seeds
CartesianCommunicator::BroadcastWorld(0,(void *)&seeds[0],sizeof(int)*seeds.size());
std::seed_seq source(seeds.begin(),seeds.end());
RngEngine master_engine(source);
#ifdef RNG_FAST_DISCARD
////////////////////////////////////////////////
// Skip ahead through a single stream.
// Applicable to SITMO and other hash-based/crypto RNGs
// Should be applicable to Mersenne Twister, but the C++11
// MT implementation does not implement fast discard even though
// in principle this is possible
////////////////////////////////////////////////
// Everybody loops over global volume.
parallel_for(int gidx=0;gidx<_grid->_gsites;gidx++){
// Where is it?
int rank,o_idx,i_idx;
std::vector<int> gcoor;
_grid->GlobalIndexToGlobalCoor(gidx,gcoor);
_grid->GlobalCoorToRankIndex(rank,o_idx,i_idx,gcoor);
// If this is one of mine we take it
if( rank == _grid->ThisRank() ){
int l_idx=generator_idx(o_idx,i_idx);
_generators[l_idx] = master_engine;
Skip(_generators[l_idx],gidx); // Skip to next RNG sequence
}
}
#else
////////////////////////////////////////////////////////////////
// Machine and thread decomposition dependent seeding is efficient
// and maximally parallel; but NOT reproducible from machine to machine.
// Not ideal, but fastest way to reseed all nodes.
////////////////////////////////////////////////////////////////
{
// Obtain one Reseed per processor
int Nproc = _grid->ProcessorCount();
std::vector<RngEngine> seeders(Nproc);
int me= _grid->ThisRank();
for(int p=0;p<Nproc;p++){
seeders[p] = Reseed(master_engine);
}
master_engine = seeders[me];
}
{
// Obtain one reseeded generator per thread
int Nthread = GridThread::GetThreads();
std::vector<RngEngine> seeders(Nthread);
for(int t=0;t<Nthread;t++){
seeders[t] = Reseed(master_engine);
}
parallel_for(int t=0;t<Nthread;t++) {
// set up one per local site in threaded fashion
std::vector<uint32_t> newseeds;
std::uniform_int_distribution<uint32_t> uid;
for(int l=0;l<_grid->lSites();l++) {
if ( (l%Nthread)==t ) {
_generators[l] = Reseed(seeders[t],newseeds,uid);
}
}
}
}
#endif
}
void Report(){
std::cout << GridLogMessage << "Time spent in the fill() routine by GridParallelRNG: "<< _time_counter/1e3 << " ms" << std::endl;
}
////////////////////////////////////////////////////////////////////////
// Support for rigorous test of RNG's
// Return uniform random uint32_t from requested site generator
////////////////////////////////////////////////////////////////////////
uint32_t GlobalU01(int gsite){
uint32_t the_number;
// who
std::vector<int> gcoor;
int rank,o_idx,i_idx;
_grid->GlobalIndexToGlobalCoor(gsite,gcoor);
_grid->GlobalCoorToRankIndex(rank,o_idx,i_idx,gcoor);
// draw
int l_idx=generator_idx(o_idx,i_idx);
if( rank == _grid->ThisRank() ){
the_number = _uid[l_idx](_generators[l_idx]);
}
// share & return
_grid->Broadcast(rank,(void *)&the_number,sizeof(the_number));
return the_number;
}
};
template <class vobj> inline void random(GridParallelRNG &rng,Lattice<vobj> &l) { rng.fill(l,rng._uniform); }
template <class vobj> inline void gaussian(GridParallelRNG &rng,Lattice<vobj> &l) { rng.fill(l,rng._gaussian); }
template <class vobj> inline void bernoulli(GridParallelRNG &rng,Lattice<vobj> &l){ rng.fill(l,rng._bernoulli);}
template <class sobj> inline void random(GridSerialRNG &rng,sobj &l) { rng.fill(l,rng._uniform ); }
template <class sobj> inline void gaussian(GridSerialRNG &rng,sobj &l) { rng.fill(l,rng._gaussian ); }
template <class sobj> inline void bernoulli(GridSerialRNG &rng,sobj &l){ rng.fill(l,rng._bernoulli); }
}
#endif
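`Skip` gives every site a copy of one master engine advanced by `site << shift` draws, so site k draws from its own segment of a single stream. The following is a minimal sketch of that scheme with a standard-library engine. It assumes `std::mt19937_64` and a tiny shift so that it runs quickly; as the comment in `SeedFixedIntegers` notes, the C++11 Mersenne Twister has no fast discard, so this is only practical here because the skips are small. `skipAheadSeed` is a hypothetical helper, not part of Grid.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Give each site a copy of one master engine advanced by (site << shift)
// draws, so site k owns draws [k*2^shift, (k+1)*2^shift) of a single stream.
template <class Engine>
std::vector<Engine> skipAheadSeed(const Engine &master, uint64_t nsites, int shift) {
  std::vector<Engine> gens;
  gens.reserve(nsites);
  for (uint64_t site = 0; site < nsites; site++) {
    const uint64_t skip = site << shift;
    assert((skip >> shift) == site);  // overflow check, mirroring Skip()
    Engine e = master;                // every site starts from the same master state
    e.discard(skip);                  // linear time for mt19937_64
    gens.push_back(e);
  }
  return gens;
}
```

With a counter-based engine such as Sitmo, `discard` is cheap, which is what makes a per-site skip viable at `shift = 30`; the non-fast-discard branch instead reseeds per rank and per thread, trading reproducibility for speed.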
@@ -0,0 +1,67 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_trace.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_TRACE_H
#define GRID_LATTICE_TRACE_H
///////////////////////////////////////////////
// Tracing, transposing, peeking, poking
///////////////////////////////////////////////
namespace Grid {
////////////////////////////////////////////////////////////////////////////////////////////////////
// Trace
////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vobj>
inline auto trace(const Lattice<vobj> &lhs)
-> Lattice<decltype(trace(lhs._odata[0]))>
{
Lattice<decltype(trace(lhs._odata[0]))> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = trace(lhs._odata[ss]);
}
return ret;
};
////////////////////////////////////////////////////////////////////////////////////////////////////
// Trace Index level dependent operation
////////////////////////////////////////////////////////////////////////////////////////////////////
template<int Index,class vobj>
inline auto TraceIndex(const Lattice<vobj> &lhs) -> Lattice<decltype(traceIndex<Index>(lhs._odata[0]))>
{
Lattice<decltype(traceIndex<Index>(lhs._odata[0]))> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = traceIndex<Index>(lhs._odata[ss]);
}
return ret;
};
}
#endif
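`trace` contracts every tensor index of each site object, while `TraceIndex<Index>` contracts only one level and leaves the others intact. Both act independently per site, which is why the loops parallelise trivially. The sketch below illustrates the difference on a hypothetical two-level site type (an `Ns x Ns` outer matrix of `Nc x Nc` inner matrices, loosely modelled on a spin-colour matrix) using plain `std::array`. `traceAll`, `traceOuter`, `traceInner` and `sitewise` are illustrative names, not Grid functions.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Cplx = std::complex<double>;
template <int N> using Mat = std::array<std::array<Cplx, N>, N>;
// Hypothetical two-level site type: Ns x Ns outer matrix of Nc x Nc inner matrices.
template <int Ns, int Nc> using Site = std::array<std::array<Mat<Nc>, Ns>, Ns>;

// Full trace: contracts both levels down to a scalar.
template <int Ns, int Nc> Cplx traceAll(const Site<Ns, Nc> &m) {
  Cplx t(0.0, 0.0);
  for (int s = 0; s < Ns; s++)
    for (int c = 0; c < Nc; c++) t += m[s][s][c][c];
  return t;
}

// Trace over the outer level only: leaves an Nc x Nc matrix.
template <int Ns, int Nc> Mat<Nc> traceOuter(const Site<Ns, Nc> &m) {
  Mat<Nc> r{};  // value-initialised to zero
  for (int s = 0; s < Ns; s++)
    for (int a = 0; a < Nc; a++)
      for (int b = 0; b < Nc; b++) r[a][b] += m[s][s][a][b];
  return r;
}

// Trace over the inner level only: leaves an Ns x Ns matrix.
template <int Ns, int Nc> Mat<Ns> traceInner(const Site<Ns, Nc> &m) {
  Mat<Ns> r{};
  for (int s1 = 0; s1 < Ns; s1++)
    for (int s2 = 0; s2 < Ns; s2++)
      for (int c = 0; c < Nc; c++) r[s1][s2] += m[s1][s2][c][c];
  return r;
}

// Lattice-wide version: apply a site function independently at every site.
template <class Out, class In, class F>
std::vector<Out> sitewise(const std::vector<In> &lat, F f) {
  std::vector<Out> out(lat.size());
  for (std::size_t ss = 0; ss < lat.size(); ss++) out[ss] = f(lat[ss]);
  return out;
}
```

For an identity site with `Ns = 4`, `Nc = 3`, the full trace is 12, the outer trace is 4 times the colour identity, and the inner trace is 3 times the spin identity.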
File diff suppressed because it is too large
@@ -0,0 +1,63 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_transpose.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_TRANSPOSE_H
#define GRID_LATTICE_TRANSPOSE_H
///////////////////////////////////////////////
// Transpose
///////////////////////////////////////////////
namespace Grid {
////////////////////////////////////////////////////////////////////////////////////////////////////
// Transpose
////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vobj>
inline Lattice<vobj> transpose(const Lattice<vobj> &lhs){
Lattice<vobj> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = transpose(lhs._odata[ss]);
}
return ret;
};
////////////////////////////////////////////////////////////////////////////////////////////////////
// Index level dependent transpose
////////////////////////////////////////////////////////////////////////////////////////////////////
template<int Index,class vobj>
inline auto TransposeIndex(const Lattice<vobj> &lhs) -> Lattice<decltype(transposeIndex<Index>(lhs._odata[0]))>
{
Lattice<decltype(transposeIndex<Index>(lhs._odata[0]))> ret(lhs._grid);
parallel_for(int ss=0;ss<lhs._grid->oSites();ss++){
ret._odata[ss] = transposeIndex<Index>(lhs._odata[ss]);
}
return ret;
};
}
#endif
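Both `transpose` and `TransposeIndex` follow the same pattern. They allocate a result on the same grid, then apply a per-site operation to every outer site independently, which is why the `parallel_for` is safe. The sketch below shows that pattern in isolation. It assumes a toy `Field` type that stores one 3x3 real matrix per site. This is a hypothetical stand-in, not Grid's `Lattice`, which also handles SIMD layout and checkerboarding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a site object: a 3x3 real matrix.
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical stand-in for a lattice: one matrix per (outer) site.
struct Field {
  std::vector<Mat3> odata;
  explicit Field(std::size_t sites) : odata(sites) {}  // zero-initialised
};

// Per-site transpose, the analogue of transpose(lhs._odata[ss]).
inline Mat3 transposeSite(const Mat3 &m) {
  Mat3 r{};
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++) r[i][j] = m[j][i];
  return r;
}

// Field-level transpose: sites are independent, so this loop could be
// parallelised exactly like the parallel_for in the header above.
inline Field transpose(const Field &lhs) {
  Field ret(lhs.odata.size());
  for (std::size_t ss = 0; ss < lhs.odata.size(); ss++)
    ret.odata[ss] = transposeSite(lhs.odata[ss]);
  return ret;
}
```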
@@ -0,0 +1,84 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_unary.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: neo <cossu@post.kek.jp>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_UNARY_H
#define GRID_LATTICE_UNARY_H
namespace Grid {
template<class obj> Lattice<obj> pow(const Lattice<obj> &rhs,RealD y){
Lattice<obj> ret(rhs._grid);
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
ret._odata[ss]=pow(rhs._odata[ss],y);
}
return ret;
}
template<class obj> Lattice<obj> mod(const Lattice<obj> &rhs,Integer y){
Lattice<obj> ret(rhs._grid);
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
ret._odata[ss]=mod(rhs._odata[ss],y);
}
return ret;
}
template<class obj> Lattice<obj> div(const Lattice<obj> &rhs,Integer y){
Lattice<obj> ret(rhs._grid);
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
ret._odata[ss]=div(rhs._odata[ss],y);
}
return ret;
}
template<class obj> Lattice<obj> expMat(const Lattice<obj> &rhs, RealD alpha, Integer Nexp = DEFAULT_MAT_EXP){
Lattice<obj> ret(rhs._grid);
ret.checkerboard = rhs.checkerboard;
conformable(ret,rhs);
parallel_for(int ss=0;ss<rhs._grid->oSites();ss++){
ret._odata[ss]=Exponentiate(rhs._odata[ss],alpha, Nexp);
}
return ret;
}
}
#endif
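`expMat` is another site-local map. It copies the checkerboard and then calls `Exponentiate(site, alpha, Nexp)` on each site. The sketch below assumes that `Exponentiate` evaluates exp(alpha*A) as a Taylor series truncated after `Nexp` terms; that assumption is not confirmed by the code shown here. It uses a hypothetical dense 2x2 type instead of Grid's tensor classes and evaluates the series Horner-style.

```cpp
#include <array>
#include <cassert>

// Hypothetical dense 2x2 matrix standing in for a Grid site tensor.
using Mat2 = std::array<std::array<double, 2>, 2>;

inline Mat2 identity2() { return Mat2{{{1.0, 0.0}, {0.0, 1.0}}}; }

inline Mat2 mul(const Mat2 &a, const Mat2 &b) {
  Mat2 r{};
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      for (int k = 0; k < 2; k++) r[i][j] += a[i][k] * b[k][j];
  return r;
}

// Assumed behaviour: exp(alpha*A) ~= sum_{k=0}^{Nexp} (alpha*A)^k / k!,
// evaluated Horner-style as
// I + (alpha*A/1)(I + (alpha*A/2)(I + ... (I + alpha*A/Nexp))).
inline Mat2 expTaylor(const Mat2 &A, double alpha, int Nexp) {
  Mat2 acc = identity2();
  for (int k = Nexp; k >= 1; k--) {
    Mat2 scaled = A;
    for (auto &row : scaled)
      for (auto &x : row) x *= alpha / k;
    Mat2 t = mul(scaled, acc);
    acc = identity2();
    for (int i = 0; i < 2; i++)
      for (int j = 0; j < 2; j++) acc[i][j] += t[i][j];
  }
  return acc;
}
```

A nilpotent input is a convenient check: A = [[0,1],[0,0]] satisfies A*A = 0, so every truncation with `Nexp >= 1` gives exactly I + alpha*A.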
@@ -0,0 +1,86 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/lattice/Lattice_where.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Peter Boyle <peterboyle@Peters-MacBook-Pro-2.local>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_LATTICE_WHERE_H
#define GRID_LATTICE_WHERE_H
namespace Grid {
// Must implement the predicate gating the
// Must be able to reduce the predicate down to a single vInteger per site.
// Must be able to require the type be iScalar x iScalar x ....
// give a GetVtype method in iScalar
// and blow away the tensor structures.
//
template<class vobj,class iobj>
inline void whereWolf(Lattice<vobj> &ret,const Lattice<iobj> &predicate,Lattice<vobj> &iftrue,Lattice<vobj> &iffalse)
{
conformable(iftrue,iffalse);
conformable(iftrue,predicate);
conformable(iftrue,ret);
GridBase *grid=iftrue._grid;
typedef typename vobj::scalar_object scalar_object;
typedef typename vobj::scalar_type scalar_type;
typedef typename vobj::vector_type vector_type;
typedef typename iobj::vector_type mask_type;
const int Nsimd = grid->Nsimd();
parallel_for(int ss=0;ss<iftrue._grid->oSites(); ss++){
// Per-site scratch buffers: declaring them outside the parallel loop would let threads race on them
std::vector<Integer> mask(Nsimd);
std::vector<scalar_object> truevals (Nsimd);
std::vector<scalar_object> falsevals(Nsimd);
extract(iftrue._odata[ss] ,truevals);
extract(iffalse._odata[ss] ,falsevals);
extract<vInteger,Integer>(TensorRemove(predicate._odata[ss]),mask);
for(int s=0;s<Nsimd;s++){
if (mask[s]) falsevals[s]=truevals[s];
}
merge(ret._odata[ss],falsevals);
}
}
template<class vobj,class iobj>
inline Lattice<vobj> whereWolf(const Lattice<iobj> &predicate,Lattice<vobj> &iftrue,Lattice<vobj> &iffalse)
{
conformable(iftrue,iffalse);
conformable(iftrue,predicate);
Lattice<vobj> ret(iftrue._grid);
whereWolf(ret,predicate,iftrue,iffalse);
return ret;
}
}
#endif
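`whereWolf` works lane by lane. For each outer site it unpacks the `Nsimd` scalar lanes of both inputs and of the predicate, picks the true or false value for each lane, and packs the result back with `merge`. The sketch below reproduces that select step on plain arrays. It assumes a fixed lane count of 4 and `int` scalars, and it replaces Grid's `extract`/`merge` with direct array access. The scratch copy is local to each site, which keeps a parallel outer loop race-free.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int Nsimd = 4;               // assumed lane count
using Lanes = std::array<int, Nsimd>;  // one "vectorised" site
using Mask  = std::array<int, Nsimd>;  // nonzero lane = predicate true

// Per-site, per-lane select: ret = predicate ? iftrue : iffalse.
inline std::vector<Lanes> whereLanes(const std::vector<Mask> &predicate,
                                     const std::vector<Lanes> &iftrue,
                                     const std::vector<Lanes> &iffalse) {
  assert(predicate.size() == iftrue.size() && iftrue.size() == iffalse.size());
  std::vector<Lanes> ret(iftrue.size());
  for (std::size_t ss = 0; ss < iftrue.size(); ss++) {
    Lanes vals = iffalse[ss];  // per-site scratch copy (the "extract" step)
    for (int s = 0; s < Nsimd; s++)
      if (predicate[ss][s]) vals[s] = iftrue[ss][s];
    ret[ss] = vals;  // the "merge" step
  }
  return ret;
}
```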
@@ -0,0 +1,116 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Log.cc
Copyright (C) 2015
Author: Antonin Portelli <antonin.portelli@me.com>
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
#include <Grid/util/CompilerCompatible.h>
#include <cxxabi.h>
#include <memory>
namespace Grid {
std::string demangle(const char* name) {
int status = -4; // some arbitrary value to eliminate the compiler warning
// enable c++11 by passing the flag -std=c++11 to g++
std::unique_ptr<char, void(*)(void*)> res {
abi::__cxa_demangle(name, NULL, NULL, &status),
std::free
};
return (status==0) ? res.get() : name ;
}
GridStopWatch Logger::GlobalStopWatch;
int Logger::timestamp;
std::ostream Logger::devnull(0);
void GridLogTimestamp(int on){
Logger::Timestamp(on);
}
Colours GridLogColours(0);
GridLogger GridLogIRL (1, "IRL" , GridLogColours, "NORMAL");
GridLogger GridLogSolver (1, "Solver", GridLogColours, "NORMAL");
GridLogger GridLogError (1, "Error" , GridLogColours, "RED");
GridLogger GridLogWarning(1, "Warning", GridLogColours, "YELLOW");
GridLogger GridLogMessage(1, "Message", GridLogColours, "NORMAL");
GridLogger GridLogDebug (1, "Debug", GridLogColours, "PURPLE");
GridLogger GridLogPerformance(1, "Performance", GridLogColours, "GREEN");
GridLogger GridLogIterative (1, "Iterative", GridLogColours, "BLUE");
GridLogger GridLogIntegrator (1, "Integrator", GridLogColours, "BLUE");
void GridLogConfigure(std::vector<std::string> &logstreams) {
GridLogError.Active(0);
GridLogWarning.Active(0);
GridLogMessage.Active(1); // at least the messages should be always on
GridLogIterative.Active(0);
GridLogDebug.Active(0);
GridLogPerformance.Active(0);
GridLogIntegrator.Active(0);
GridLogColours.Active(0);
for (int i = 0; i < logstreams.size(); i++) {
if (logstreams[i] == std::string("Error")) GridLogError.Active(1);
if (logstreams[i] == std::string("Warning")) GridLogWarning.Active(1);
if (logstreams[i] == std::string("NoMessage")) GridLogMessage.Active(0);
if (logstreams[i] == std::string("Iterative")) GridLogIterative.Active(1);
if (logstreams[i] == std::string("Debug")) GridLogDebug.Active(1);
if (logstreams[i] == std::string("Performance"))
GridLogPerformance.Active(1);
if (logstreams[i] == std::string("Integrator")) GridLogIntegrator.Active(1);
if (logstreams[i] == std::string("Colours")) GridLogColours.Active(1);
}
}
////////////////////////////////////////////////////////////
// Verbose limiter on MPI tasks
////////////////////////////////////////////////////////////
void Grid_quiesce_nodes(void) {
int me = 0;
#if defined(GRID_COMMS_MPI) || defined(GRID_COMMS_MPI3) || defined(GRID_COMMS_MPIT)
MPI_Comm_rank(MPI_COMM_WORLD, &me);
#endif
#ifdef GRID_COMMS_SHMEM
me = shmem_my_pe();
#endif
if (me) {
std::cout.setstate(std::ios::badbit);
}
}
void Grid_unquiesce_nodes(void) {
// Undo Grid_quiesce_nodes for every comms backend it handles; clear() is harmless on rank 0
std::cout.clear();
}
}
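`GridLogConfigure` uses an opt-in scheme. Every channel except `Message` starts disabled. Each recognised name in `logstreams` switches one channel on, and the special name `NoMessage` switches `Message` off. The sketch below shows that mapping on its own. It uses a hypothetical `std::map` of channel flags in place of the `GridLogger` globals.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical channel table (name -> active flag) mirroring GridLogConfigure.
inline std::map<std::string, bool> configureLogs(const std::vector<std::string> &logstreams) {
  std::map<std::string, bool> active{
      {"Error", false},      {"Warning", false}, {"Message", true},
      {"Iterative", false},  {"Debug", false},   {"Performance", false},
      {"Integrator", false}, {"Colours", false}};
  for (const auto &s : logstreams) {
    if (s == "NoMessage")
      active["Message"] = false;  // the only opt-out switch
    else if (active.count(s))
      active[s] = true;           // opt in to a known channel
    // unknown names are ignored, as in the original loop
  }
  return active;
}
```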
@@ -0,0 +1,216 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Log.h
Copyright (C) 2015
Author: Antonin Portelli <antonin.portelli@me.com>
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <map>
#ifndef GRID_LOG_H
#define GRID_LOG_H
#ifdef HAVE_EXECINFO_H
#include <execinfo.h>
#endif
namespace Grid {
//////////////////////////////////////////////////////////////////////////////////////////////////
// Dress the output; use std::chrono for time stamping via the StopWatch class
//////////////////////////////////////////////////////////////////////////////////////////////////
class Colours{
protected:
bool is_active;
public:
std::map<std::string, std::string> colour;
Colours(bool activate=false){
Active(activate);
};
void Active(bool activate){
is_active=activate;
if (is_active){
colour["BLACK"] ="\033[30m";
colour["RED"] ="\033[31m";
colour["GREEN"] ="\033[32m";
colour["YELLOW"] ="\033[33m";
colour["BLUE"] ="\033[34m";
colour["PURPLE"] ="\033[35m";
colour["CYAN"] ="\033[36m";
colour["WHITE"] ="\033[37m";
colour["NORMAL"] ="\033[0;39m";
} else {
colour["BLACK"] ="";
colour["RED"] ="";
colour["GREEN"] ="";
colour["YELLOW"]="";
colour["BLUE"] ="";
colour["PURPLE"]="";
colour["CYAN"] ="";
colour["WHITE"] ="";
colour["NORMAL"]="";
}
};
};
class Logger {
protected:
Colours &Painter;
int active;
int timing_mode;
int topWidth{-1}, chanWidth{-1};
static int timestamp;
std::string name, topName;
std::string COLOUR;
public:
static GridStopWatch GlobalStopWatch;
GridStopWatch LocalStopWatch;
GridStopWatch *StopWatch;
static std::ostream devnull;
std::string background() {return Painter.colour["NORMAL"];}
std::string evidence() {return Painter.colour["YELLOW"];}
std::string colour() {return Painter.colour[COLOUR];}
Logger(std::string topNm, int on, std::string nm, Colours& col_class, std::string col) : Painter(col_class),
active(on),
timing_mode(0),
name(nm),
topName(topNm),
COLOUR(col)
{
StopWatch = & GlobalStopWatch;
};
void Active(int on) {active = on;};
int isActive(void) {return active;};
static void Timestamp(int on) {timestamp = on;};
void Reset(void) {
StopWatch->Reset();
StopWatch->Start();
}
void TimingMode(int on) {
timing_mode = on;
if(on) {
StopWatch = &LocalStopWatch;
Reset();
}
}
void setTopWidth(const int w) {topWidth = w;}
void setChanWidth(const int w) {chanWidth = w;}
friend std::ostream& operator<< (std::ostream& stream, Logger& log){
if ( log.active ) {
stream << log.background()<< std::left;
if (log.topWidth > 0)
{
stream << std::setw(log.topWidth);
}
stream << log.topName << log.background()<< " : ";
stream << log.colour() << std::left;
if (log.chanWidth > 0)
{
stream << std::setw(log.chanWidth);
}
stream << log.name << log.background() << " : ";
if ( log.timestamp ) {
log.StopWatch->Stop();
GridTime now = log.StopWatch->Elapsed();
if ( log.timing_mode==1 ) log.StopWatch->Reset();
log.StopWatch->Start();
stream << log.evidence()<< std::setw(6)<<now << log.background() << " : " ;
}
stream << log.colour();
return stream;
} else {
return devnull;
}
}
};
class GridLogger: public Logger {
public:
GridLogger(int on, std::string nm, Colours&col_class, std::string col_key = "NORMAL"):
Logger("Grid", on, nm, col_class, col_key){};
};
void GridLogConfigure(std::vector<std::string> &logstreams);
extern GridLogger GridLogIRL;
extern GridLogger GridLogSolver;
extern GridLogger GridLogError;
extern GridLogger GridLogWarning;
extern GridLogger GridLogMessage;
extern GridLogger GridLogDebug ;
extern GridLogger GridLogPerformance;
extern GridLogger GridLogIterative ;
extern GridLogger GridLogIntegrator ;
extern Colours GridLogColours;
std::string demangle(const char* name) ;
#define _NBACKTRACE (256)
extern void * Grid_backtrace_buffer[_NBACKTRACE];
#define BACKTRACEFILE() {\
char string[32]; \
std::snprintf(string,sizeof(string),"backtrace.%d",CartesianCommunicator::RankWorld()); \
std::FILE * fp = std::fopen(string,"w"); \
if ( fp ) { \
BACKTRACEFP(fp)\
std::fclose(fp); \
} \
}
#ifdef HAVE_EXECINFO_H
#define BACKTRACEFP(fp) { \
int symbols = backtrace (Grid_backtrace_buffer,_NBACKTRACE);\
char **strings = backtrace_symbols(Grid_backtrace_buffer,symbols);\
for (int i = 0; strings && i < symbols; i++){\
std::fprintf (fp,"BackTrace Strings: %d %s\n",i, demangle(strings[i]).c_str()); std::fflush(fp); \
}\
std::free(strings);\
}
#else
#define BACKTRACEFP(fp) { \
std::fprintf (fp,"BT %d %p\n",0, __builtin_return_address(0)); std::fflush(fp); \
std::fprintf (fp,"BT %d %p\n",1, __builtin_return_address(1)); std::fflush(fp); \
std::fprintf (fp,"BT %d %p\n",2, __builtin_return_address(2)); std::fflush(fp); \
std::fprintf (fp,"BT %d %p\n",3, __builtin_return_address(3)); std::fflush(fp); \
}
#endif
#define BACKTRACE() BACKTRACEFP(stdout)
}
#endif
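When a channel is inactive, `operator<<` returns `Logger::devnull`, which is an `std::ostream` constructed with a null stream buffer. Writes to such a stream set `badbit` and are silently discarded, so a disabled log line costs little beyond evaluating its arguments. The sketch below shows the same trick with a hypothetical `MiniLogger` that prefixes its channel name on an active stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal logger mirroring the active/devnull split in Logger.
struct MiniLogger {
  bool active;
  std::string name;

  friend std::ostream &operator<<(std::ostream &os, MiniLogger &log) {
    // Null streambuf: every write sets badbit and is discarded.
    static std::ostream devnull(nullptr);
    if (!log.active) return devnull;
    return os << log.name << " : ";
  }
};
```

Because the inactive branch returns a different stream, anything chained after `<< log` goes to the sink, and the caller's stream is left untouched.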
@@ -0,0 +1,729 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/parallelIO/BinaryIO.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: Guido Cossu<guido.cossu@ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_BINARY_IO_H
#define GRID_BINARY_IO_H
#if defined(GRID_COMMS_MPI) || defined(GRID_COMMS_MPI3) || defined(GRID_COMMS_MPIT)
#define USE_MPI_IO
#else
#undef USE_MPI_IO
#endif
#ifdef HAVE_ENDIAN_H
#include <endian.h>
#endif
#include <arpa/inet.h>
#include <algorithm>
namespace Grid {
/////////////////////////////////////////////////////////////////////////////////
// Byte reversal garbage
/////////////////////////////////////////////////////////////////////////////////
inline uint32_t byte_reverse32(uint32_t f) {
f = ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
return f;
}
inline uint64_t byte_reverse64(uint64_t f) {
uint64_t g;
g = ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
g = g << 32;
f = f >> 32;
g|= ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
return g;
}
#if BYTE_ORDER == BIG_ENDIAN
inline uint64_t Grid_ntohll(uint64_t A) { return A; }
#else
inline uint64_t Grid_ntohll(uint64_t A) {
return byte_reverse64(A);
}
#endif
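`byte_reverse64` reverses each 32-bit half independently and then swaps the halves, which reverses all 8 bytes. The sketch below uses the same mask-and-shift technique and checks it against known input/output pairs. The names `rev32` and `rev64` are hypothetical and were chosen to avoid clashing with Grid's helpers.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the 4 bytes of a 32-bit word with masks and shifts.
inline uint32_t rev32(uint32_t f) {
  return ((f & 0xFFu) << 24) | ((f & 0xFF00u) << 8) |
         ((f & 0xFF0000u) >> 8) | ((f & 0xFF000000u) >> 24);
}

// Reverse 8 bytes: the reversed low half becomes the high half and vice versa.
inline uint64_t rev64(uint64_t f) {
  uint64_t lo = rev32(static_cast<uint32_t>(f));
  uint64_t hi = rev32(static_cast<uint32_t>(f >> 32));
  return (lo << 32) | hi;
}
```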
// A little helper
inline void removeWhitespace(std::string &key)
{
key.erase(std::remove_if(key.begin(), key.end(), ::isspace),key.end());
}
///////////////////////////////////////////////////////////////////////////////////////////////////
// Static class holding the parallel IO code
// Could just use a namespace
///////////////////////////////////////////////////////////////////////////////////////////////////
class BinaryIO {
public:
/////////////////////////////////////////////////////////////////////////////
// more byte manipulation helpers
/////////////////////////////////////////////////////////////////////////////
template<class vobj> static inline void Uint32Checksum(Lattice<vobj> &lat,uint32_t &nersc_csum)
{
typedef typename vobj::scalar_object sobj;
GridBase *grid = lat._grid;
uint64_t lsites = grid->lSites();
std::vector<sobj> scalardata(lsites);
unvectorizeToLexOrdArray(scalardata,lat);
NerscChecksum(grid,scalardata,nersc_csum);
}
template <class fobj>
static inline void NerscChecksum(GridBase *grid, std::vector<fobj> &fbuf, uint32_t &nersc_csum)
{
const uint64_t size32 = sizeof(fobj) / sizeof(uint32_t);
uint64_t lsites = grid->lSites();
if (fbuf.size() == 1)
{
lsites = 1;
}
PARALLEL_REGION
{
uint32_t nersc_csum_thr = 0;
PARALLEL_FOR_LOOP_INTERN
for (uint64_t local_site = 0; local_site < lsites; local_site++)
{
uint32_t *site_buf = (uint32_t *)&fbuf[local_site];
for (uint64_t j = 0; j < size32; j++)
{
nersc_csum_thr = nersc_csum_thr + site_buf[j];
}
}
PARALLEL_CRITICAL
{
nersc_csum += nersc_csum_thr;
}
}
}
template<class fobj> static inline void ScidacChecksum(GridBase *grid,std::vector<fobj> &fbuf,uint32_t &scidac_csuma,uint32_t &scidac_csumb)
{
const uint64_t size32 = sizeof(fobj)/sizeof(uint32_t);
int nd = grid->_ndimension;
uint64_t lsites =grid->lSites();
if (fbuf.size()==1) {
lsites=1;
}
std::vector<int> local_vol =grid->LocalDimensions();
std::vector<int> local_start =grid->LocalStarts();
std::vector<int> global_vol =grid->FullDimensions();
PARALLEL_REGION
{
std::vector<int> coor(nd);
uint32_t scidac_csuma_thr=0;
uint32_t scidac_csumb_thr=0;
uint32_t site_crc=0;
PARALLEL_FOR_LOOP_INTERN
for(uint64_t local_site=0;local_site<lsites;local_site++){
uint32_t * site_buf = (uint32_t *)&fbuf[local_site];
/*
* Scidac csum is rather more heavyweight
* FIXME -- 128^3 x 256 x 16 will overflow.
*/
int global_site;
Lexicographic::CoorFromIndex(coor,local_site,local_vol);
for(int d=0;d<nd;d++) {
coor[d] = coor[d]+local_start[d];
}
Lexicographic::IndexFromCoor(coor,global_site,global_vol);
uint32_t gsite29 = global_site%29;
uint32_t gsite31 = global_site%31;
site_crc = crc32(0,(unsigned char *)site_buf,sizeof(fobj));
// std::cout << "Site "<<local_site << " crc "<<std::hex<<site_crc<<std::dec<<std::endl;
// std::cout << "Site "<<local_site << std::hex<<site_buf[0] <<site_buf[1]<<std::dec <<std::endl;
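// Rotate left by gsite29 / gsite31; masking the right shift avoids an undefined shift by 32 when the rotation is 0
scidac_csuma_thr ^= (site_crc<<gsite29) | (site_crc>>((32-gsite29)&31));
scidac_csumb_thr ^= (site_crc<<gsite31) | (site_crc>>((32-gsite31)&31));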
}
PARALLEL_CRITICAL
{
scidac_csuma^= scidac_csuma_thr;
scidac_csumb^= scidac_csumb_thr;
}
}
}
// Network is big endian
static inline void htobe32_v(void *file_object,uint64_t bytes){ be32toh_v(file_object,bytes);}
static inline void htobe64_v(void *file_object,uint64_t bytes){ be64toh_v(file_object,bytes);}
static inline void htole32_v(void *file_object,uint64_t bytes){ le32toh_v(file_object,bytes);}
static inline void htole64_v(void *file_object,uint64_t bytes){ le64toh_v(file_object,bytes);}
static inline void be32toh_v(void *file_object,uint64_t bytes)
{
uint32_t * f = (uint32_t *)file_object;
uint64_t count = bytes/sizeof(uint32_t);
parallel_for(uint64_t i=0;i<count;i++){
f[i] = ntohl(f[i]);
}
}
// LE must swap and switch to host
static inline void le32toh_v(void *file_object,uint64_t bytes)
{
uint32_t *fp = (uint32_t *)file_object;
uint64_t count = bytes/sizeof(uint32_t);
parallel_for(uint64_t i=0;i<count;i++){
uint32_t f = fp[i]; // thread-private copy; a shared temporary would race
// swap to network (big-endian) order, then convert network to host
f = ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
fp[i] = ntohl(f);
}
}
// BE is same as network
static inline void be64toh_v(void *file_object,uint64_t bytes)
{
uint64_t * f = (uint64_t *)file_object;
uint64_t count = bytes/sizeof(uint64_t);
parallel_for(uint64_t i=0;i<count;i++){
f[i] = Grid_ntohll(f[i]);
}
}
// LE must swap and switch to host
static inline void le64toh_v(void *file_object,uint64_t bytes)
{
uint64_t *fp = (uint64_t *)file_object;
uint64_t count = bytes/sizeof(uint64_t);
parallel_for(uint64_t i=0;i<count;i++){
uint64_t f = fp[i]; // thread-private copies; shared temporaries would race
uint64_t g;
// swap to network (big-endian) order, then convert network to host
g = ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
g = g << 32;
f = f >> 32;
g|= ((f&0xFF)<<24) | ((f&0xFF00)<<8) | ((f&0xFF0000)>>8) | ((f&0xFF000000UL)>>24) ;
fp[i] = Grid_ntohll(g);
}
}
/////////////////////////////////////////////////////////////////////////////
// Real action:
// Read or Write distributed lexico array of ANY object to a specific location in file
//////////////////////////////////////////////////////////////////////////////////////
static const int BINARYIO_MASTER_APPEND = 0x10;
static const int BINARYIO_UNORDERED = 0x08;
static const int BINARYIO_LEXICOGRAPHIC = 0x04;
static const int BINARYIO_READ = 0x02;
static const int BINARYIO_WRITE = 0x01;
template<class word,class fobj>
static inline void IOobject(word w,
GridBase *grid,
std::vector<fobj> &iodata,
std::string file,
uint64_t& offset,
const std::string &format, int control,
uint32_t &nersc_csum,
uint32_t &scidac_csuma,
uint32_t &scidac_csumb)
{
grid->Barrier();
GridStopWatch timer;
GridStopWatch bstimer;
nersc_csum=0;
scidac_csuma=0;
scidac_csumb=0;
int ndim = grid->Dimensions();
int nrank = grid->ProcessorCount();
int myrank = grid->ThisRank();
std::vector<int> psizes = grid->ProcessorGrid();
std::vector<int> pcoor = grid->ThisProcessorCoor();
std::vector<int> gLattice= grid->GlobalDimensions();
std::vector<int> lLattice= grid->LocalDimensions();
std::vector<int> lStart(ndim);
std::vector<int> gStart(ndim);
// Flatten the file
uint64_t lsites = grid->lSites();
if ( control & BINARYIO_MASTER_APPEND ) {
assert(iodata.size()==1);
} else {
assert(lsites==iodata.size());
}
for(int d=0;d<ndim;d++){
gStart[d] = lLattice[d]*pcoor[d];
lStart[d] = 0;
}
#ifdef USE_MPI_IO
std::vector<int> distribs(ndim,MPI_DISTRIBUTE_BLOCK);
std::vector<int> dargs (ndim,MPI_DISTRIBUTE_DFLT_DARG);
MPI_Datatype mpiObject;
MPI_Datatype fileArray;
MPI_Datatype localArray;
MPI_Datatype mpiword;
MPI_Offset disp = offset;
MPI_File fh ;
MPI_Status status;
int numword;
if ( sizeof( word ) == sizeof(float ) ) {
numword = sizeof(fobj)/sizeof(float);
mpiword = MPI_FLOAT;
} else {
numword = sizeof(fobj)/sizeof(double);
mpiword = MPI_DOUBLE;
}
//////////////////////////////////////////////////////////////////////////////
// Sobj in MPI phrasing
//////////////////////////////////////////////////////////////////////////////
int ierr;
ierr = MPI_Type_contiguous(numword,mpiword,&mpiObject); assert(ierr==0);
ierr = MPI_Type_commit(&mpiObject); assert(ierr==0);
//////////////////////////////////////////////////////////////////////////////
// File global array data type
//////////////////////////////////////////////////////////////////////////////
ierr=MPI_Type_create_subarray(ndim,&gLattice[0],&lLattice[0],&gStart[0],MPI_ORDER_FORTRAN, mpiObject,&fileArray); assert(ierr==0);
ierr=MPI_Type_commit(&fileArray); assert(ierr==0);
//////////////////////////////////////////////////////////////////////////////
// local lattice array
//////////////////////////////////////////////////////////////////////////////
ierr=MPI_Type_create_subarray(ndim,&lLattice[0],&lLattice[0],&lStart[0],MPI_ORDER_FORTRAN, mpiObject,&localArray); assert(ierr==0);
ierr=MPI_Type_commit(&localArray); assert(ierr==0);
#endif
//////////////////////////////////////////////////////////////////////////////
// Byte order
//////////////////////////////////////////////////////////////////////////////
int ieee32big = (format == std::string("IEEE32BIG"));
int ieee32 = (format == std::string("IEEE32"));
int ieee64big = (format == std::string("IEEE64BIG"));
int ieee64 = (format == std::string("IEEE64"));
//////////////////////////////////////////////////////////////////////////////
// Do the I/O
//////////////////////////////////////////////////////////////////////////////
if ( control & BINARYIO_READ ) {
timer.Start();
if ( (control & BINARYIO_LEXICOGRAPHIC) && (nrank > 1) ) {
#ifdef USE_MPI_IO
std::cout<< GridLogMessage<<"IOobject: MPI read I/O "<< file<< std::endl;
ierr=MPI_File_open(grid->communicator,(char *) file.c_str(), MPI_MODE_RDONLY, MPI_INFO_NULL, &fh); assert(ierr==0);
ierr=MPI_File_set_view(fh, disp, mpiObject, fileArray, "native", MPI_INFO_NULL); assert(ierr==0);
ierr=MPI_File_read_all(fh, &iodata[0], 1, localArray, &status); assert(ierr==0);
MPI_File_close(&fh);
MPI_Type_free(&fileArray);
MPI_Type_free(&localArray);
#else
assert(0);
#endif
} else {
std::cout << GridLogMessage <<"IOobject: C++ read I/O " << file << " : "
<< iodata.size() * sizeof(fobj) << " bytes" << std::endl;
std::ifstream fin;
fin.open(file, std::ios::binary | std::ios::in);
if (control & BINARYIO_MASTER_APPEND)
{
fin.seekg(-static_cast<std::streamoff>(sizeof(fobj)), fin.end);
}
else
{
fin.seekg(offset + myrank * lsites * sizeof(fobj));
}
fin.read((char *)&iodata[0], iodata.size() * sizeof(fobj));
assert(fin.fail() == 0);
fin.close();
}
timer.Stop();
grid->Barrier();
bstimer.Start();
ScidacChecksum(grid,iodata,scidac_csuma,scidac_csumb);
if (ieee32big) be32toh_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee32) le32toh_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee64big) be64toh_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee64) le64toh_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
NerscChecksum(grid,iodata,nersc_csum);
bstimer.Stop();
}
if ( control & BINARYIO_WRITE ) {
bstimer.Start();
NerscChecksum(grid,iodata,nersc_csum);
if (ieee32big) htobe32_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee32) htole32_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee64big) htobe64_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
if (ieee64) htole64_v((void *)&iodata[0], sizeof(fobj)*iodata.size());
ScidacChecksum(grid,iodata,scidac_csuma,scidac_csumb);
bstimer.Stop();
grid->Barrier();
timer.Start();
if ( (control & BINARYIO_LEXICOGRAPHIC) && (nrank > 1) ) {
#ifdef USE_MPI_IO
std::cout << GridLogMessage <<"IOobject: MPI write I/O " << file << std::endl;
ierr = MPI_File_open(grid->communicator, (char *)file.c_str(), MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
// std::cout << GridLogMessage << "Checking for errors" << std::endl;
if (ierr != MPI_SUCCESS)
{
char error_string[BUFSIZ];
int length_of_error_string, error_class;
MPI_Error_class(ierr, &error_class);
MPI_Error_string(error_class, error_string, &length_of_error_string);
fprintf(stderr, "%3d: %s\n", myrank, error_string);
MPI_Error_string(ierr, error_string, &length_of_error_string);
fprintf(stderr, "%3d: %s\n", myrank, error_string);
MPI_Abort(MPI_COMM_WORLD, 1); //assert(ierr == 0);
}
std::cout << GridLogDebug << "MPI write I/O set view " << file << std::endl;
ierr = MPI_File_set_view(fh, disp, mpiObject, fileArray, "native", MPI_INFO_NULL);
assert(ierr == 0);
std::cout << GridLogDebug << "MPI write I/O write all " << file << std::endl;
ierr = MPI_File_write_all(fh, &iodata[0], 1, localArray, &status);
assert(ierr == 0);
MPI_Offset os;
MPI_File_get_position(fh, &os);
MPI_File_get_byte_offset(fh, os, &disp);
offset = disp;
MPI_File_close(&fh);
MPI_Type_free(&fileArray);
MPI_Type_free(&localArray);
#else
assert(0);
#endif
} else {
std::cout << GridLogMessage << "IOobject: C++ write I/O " << file << " : "
<< iodata.size() * sizeof(fobj) << " bytes and offset " << offset << std::endl;
std::ofstream fout;
fout.exceptions ( std::fstream::failbit | std::fstream::badbit );
try {
if (offset) { // Must already exist and contain data
fout.open(file,std::ios::binary|std::ios::out|std::ios::in);
} else { // Allow create
fout.open(file,std::ios::binary|std::ios::out);
}
} catch (const std::fstream::failure& exc) {
std::cout << GridLogError << "Error in opening the file " << file << " for output" <<std::endl;
std::cout << GridLogError << "Exception description: " << exc.what() << std::endl;
// std::cout << GridLogError << "Probable cause: wrong path, inaccessible location "<< std::endl;
#ifdef USE_MPI_IO
MPI_Abort(MPI_COMM_WORLD,1);
#else
exit(1);
#endif
}
if ( control & BINARYIO_MASTER_APPEND ) {
try {
fout.seekp(0,fout.end);
} catch (const std::fstream::failure& exc) {
std::cout << "Exception in seeking file end " << file << std::endl;
}
} else {
try {
fout.seekp(offset+myrank*lsites*sizeof(fobj));
} catch (const std::fstream::failure& exc) {
std::cout << "Exception in seeking file " << file <<" offset "<< offset << std::endl;
}
}
try {
fout.write((char *)&iodata[0],iodata.size()*sizeof(fobj));//assert( fout.fail()==0);
}
catch (const std::fstream::failure& exc) {
std::cout << "Exception in writing file " << file << std::endl;
std::cout << GridLogError << "Exception description: "<< exc.what() << std::endl;
#ifdef USE_MPI_IO
MPI_Abort(MPI_COMM_WORLD,1);
#else
exit(1);
#endif
}
offset = fout.tellp();
fout.close();
}
timer.Stop();
}
std::cout<<GridLogMessage<<"IOobject: ";
if ( control & BINARYIO_READ) std::cout << " read ";
else std::cout << " write ";
uint64_t bytes = sizeof(fobj)*iodata.size()*nrank;
std::cout<< bytes <<" bytes in "<<timer.Elapsed() <<" "
<< (double)bytes/ (double)timer.useconds() <<" MB/s "<<std::endl;
std::cout<<GridLogMessage<<"IOobject: endian and checksum overhead "<<bstimer.Elapsed() <<std::endl;
//////////////////////////////////////////////////////////////////////////////
// Safety check
//////////////////////////////////////////////////////////////////////////////
// if the data size is 1 we do not want to sum over the MPI ranks
if (iodata.size() != 1){
grid->Barrier();
grid->GlobalSum(nersc_csum);
grid->GlobalXOR(scidac_csuma);
grid->GlobalXOR(scidac_csumb);
grid->Barrier();
}
}
/////////////////////////////////////////////////////////////////////////////
// Read a Lattice of object
//////////////////////////////////////////////////////////////////////////////////////
template<class vobj,class fobj,class munger>
static inline void readLatticeObject(Lattice<vobj> &Umu,
std::string file,
munger munge,
uint64_t offset,
const std::string &format,
uint32_t &nersc_csum,
uint32_t &scidac_csuma,
uint32_t &scidac_csumb)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::Realified::scalar_type word; word w=0;
GridBase *grid = Umu._grid;
uint64_t lsites = grid->lSites();
std::vector<sobj> scalardata(lsites);
std::vector<fobj> iodata(lsites); // Munge, checksum, byte order in here
IOobject(w,grid,iodata,file,offset,format,BINARYIO_READ|BINARYIO_LEXICOGRAPHIC,
nersc_csum,scidac_csuma,scidac_csumb);
GridStopWatch timer;
timer.Start();
parallel_for(uint64_t x=0;x<lsites;x++) munge(iodata[x], scalardata[x]);
vectorizeFromLexOrdArray(scalardata,Umu);
grid->Barrier();
timer.Stop();
std::cout<<GridLogMessage<<"readLatticeObject: vectorize overhead "<<timer.Elapsed() <<std::endl;
}
/////////////////////////////////////////////////////////////////////////////
// Write a Lattice of objects
//////////////////////////////////////////////////////////////////////////////////////
template<class vobj,class fobj,class munger>
static inline void writeLatticeObject(Lattice<vobj> &Umu,
std::string file,
munger munge,
uint64_t offset,
const std::string &format,
uint32_t &nersc_csum,
uint32_t &scidac_csuma,
uint32_t &scidac_csumb)
{
typedef typename vobj::scalar_object sobj;
typedef typename vobj::Realified::scalar_type word; word w=0;
GridBase *grid = Umu._grid;
uint64_t lsites = grid->lSites();
std::vector<sobj> scalardata(lsites);
std::vector<fobj> iodata(lsites); // Munge, checksum, byte order in here
//////////////////////////////////////////////////////////////////////////////
// Munge [ e.g. 3rd row recon ]
//////////////////////////////////////////////////////////////////////////////
GridStopWatch timer; timer.Start();
unvectorizeToLexOrdArray(scalardata,Umu);
parallel_for(uint64_t x=0;x<lsites;x++) munge(scalardata[x],iodata[x]);
grid->Barrier();
timer.Stop();
IOobject(w,grid,iodata,file,offset,format,BINARYIO_WRITE|BINARYIO_LEXICOGRAPHIC,
nersc_csum,scidac_csuma,scidac_csumb);
std::cout<<GridLogMessage<<"writeLatticeObject: unvectorize overhead "<<timer.Elapsed() <<std::endl;
}
/////////////////////////////////////////////////////////////////////////////
// Read an RNG; use IOobject and map lexicographically to an array of states
//////////////////////////////////////////////////////////////////////////////////////
static inline void readRNG(GridSerialRNG &serial,
GridParallelRNG &parallel,
std::string file,
uint64_t offset,
uint32_t &nersc_csum,
uint32_t &scidac_csuma,
uint32_t &scidac_csumb)
{
typedef typename GridSerialRNG::RngStateType RngStateType;
const int RngStateCount = GridSerialRNG::RngStateCount;
typedef std::array<RngStateType,RngStateCount> RNGstate;
typedef RngStateType word; word w=0;
std::string format = "IEEE32BIG";
GridBase *grid = parallel._grid;
uint64_t gsites = grid->gSites();
uint64_t lsites = grid->lSites();
uint32_t nersc_csum_tmp = 0;
uint32_t scidac_csuma_tmp = 0;
uint32_t scidac_csumb_tmp = 0;
GridStopWatch timer;
std::cout << GridLogMessage << "RNG read I/O on file " << file << std::endl;
std::vector<RNGstate> iodata(lsites);
IOobject(w,grid,iodata,file,offset,format,BINARYIO_READ|BINARYIO_LEXICOGRAPHIC,
nersc_csum,scidac_csuma,scidac_csumb);
timer.Start();
parallel_for(uint64_t lidx=0;lidx<lsites;lidx++){
std::vector<RngStateType> tmp(RngStateCount);
std::copy(iodata[lidx].begin(),iodata[lidx].end(),tmp.begin());
parallel.SetState(tmp,lidx);
}
timer.Stop();
iodata.resize(1);
IOobject(w,grid,iodata,file,offset,format,BINARYIO_READ|BINARYIO_MASTER_APPEND,
nersc_csum_tmp,scidac_csuma_tmp,scidac_csumb_tmp);
{
std::vector<RngStateType> tmp(RngStateCount);
std::copy(iodata[0].begin(),iodata[0].end(),tmp.begin());
serial.SetState(tmp,0);
}
nersc_csum = nersc_csum + nersc_csum_tmp;
scidac_csuma = scidac_csuma ^ scidac_csuma_tmp;
scidac_csumb = scidac_csumb ^ scidac_csumb_tmp;
std::cout << GridLogMessage << "RNG file nersc_checksum " << std::hex << nersc_csum << std::dec << std::endl;
std::cout << GridLogMessage << "RNG file scidac_checksuma " << std::hex << scidac_csuma << std::dec << std::endl;
std::cout << GridLogMessage << "RNG file scidac_checksumb " << std::hex << scidac_csumb << std::dec << std::endl;
std::cout << GridLogMessage << "RNG state overhead " << timer.Elapsed() << std::endl;
}
/////////////////////////////////////////////////////////////////////////////
// Write an RNG; map lexicographically to an array of states and use IOobject
//////////////////////////////////////////////////////////////////////////////////////
static inline void writeRNG(GridSerialRNG &serial,
GridParallelRNG &parallel,
std::string file,
uint64_t offset,
uint32_t &nersc_csum,
uint32_t &scidac_csuma,
uint32_t &scidac_csumb)
{
typedef typename GridSerialRNG::RngStateType RngStateType;
typedef RngStateType word; word w=0;
const int RngStateCount = GridSerialRNG::RngStateCount;
typedef std::array<RngStateType,RngStateCount> RNGstate;
GridBase *grid = parallel._grid;
uint64_t gsites = grid->gSites();
uint64_t lsites = grid->lSites();
uint32_t nersc_csum_tmp = 0;
uint32_t scidac_csuma_tmp = 0;
uint32_t scidac_csumb_tmp = 0;
GridStopWatch timer;
std::string format = "IEEE32BIG";
std::cout << GridLogMessage << "RNG write I/O on file " << file << std::endl;
timer.Start();
std::vector<RNGstate> iodata(lsites);
parallel_for(uint64_t lidx=0;lidx<lsites;lidx++){
std::vector<RngStateType> tmp(RngStateCount);
parallel.GetState(tmp,lidx);
std::copy(tmp.begin(),tmp.end(),iodata[lidx].begin());
}
timer.Stop();
IOobject(w,grid,iodata,file,offset,format,BINARYIO_WRITE|BINARYIO_LEXICOGRAPHIC,
nersc_csum,scidac_csuma,scidac_csumb);
iodata.resize(1);
{
std::vector<RngStateType> tmp(RngStateCount);
serial.GetState(tmp,0);
std::copy(tmp.begin(),tmp.end(),iodata[0].begin());
}
IOobject(w,grid,iodata,file,offset,format,BINARYIO_WRITE|BINARYIO_MASTER_APPEND,
nersc_csum_tmp,scidac_csuma_tmp,scidac_csumb_tmp);
nersc_csum = nersc_csum + nersc_csum_tmp;
scidac_csuma = scidac_csuma ^ scidac_csuma_tmp;
scidac_csumb = scidac_csumb ^ scidac_csumb_tmp;
std::cout << GridLogMessage << "RNG file checksum " << std::hex << nersc_csum << std::dec << std::endl;
std::cout << GridLogMessage << "RNG file checksuma " << std::hex << scidac_csuma << std::dec << std::endl;
std::cout << GridLogMessage << "RNG file checksumb " << std::hex << scidac_csumb << std::dec << std::endl;
std::cout << GridLogMessage << "RNG state overhead " << timer.Elapsed() << std::endl;
}
};
}
#endif
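readRNG and writeRNG above fold the checksum of the single serial-RNG record into the checksum of the lattice-ordered record: the NERSC checksum by unsigned 32-bit addition and the two SciDAC checksums by XOR. These are the same operations IOobject uses when it reduces across ranks with GlobalSum and GlobalXOR. Because both operations are commutative and associative, the combined value should not depend on how the lattice is partitioned or on the order in which the parts are merged. The standalone sketch below illustrates this, together with the hex round trip used by writeLimeLatticeBinaryObject (std::hex into a stringstream) and scidacChecksumVerify (stoull with base 16). The struct Checksums and the functions combine, combineAll, toHex and fromHex are hypothetical names chosen for illustration; they are not part of Grid.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the three running checksums (not a Grid type).
struct Checksums {
  uint32_t nersc;     // combined by unsigned 32-bit addition (wraps mod 2^32)
  uint32_t scidac_a;  // combined by XOR
  uint32_t scidac_b;  // combined by XOR
  Checksums(uint32_t n = 0, uint32_t a = 0, uint32_t b = 0)
      : nersc(n), scidac_a(a), scidac_b(b) {}
};

// Merge two partial checksums, mirroring how readRNG/writeRNG merge the
// serial-RNG record into the lattice record.
inline Checksums combine(const Checksums &x, const Checksums &y) {
  return Checksums(x.nersc + y.nersc,
                   x.scidac_a ^ y.scidac_a,
                   x.scidac_b ^ y.scidac_b);
}

// Reduce a list of partial checksums. Checksums() (all zeros) is the
// identity for both + and ^, so the starting value does not bias the result.
inline Checksums combineAll(const std::vector<Checksums> &parts) {
  Checksums acc;
  for (const Checksums &p : parts) acc = combine(acc, p);
  return acc;
}

// Render a checksum as lower-case hex without leading zeros,
// as writeLimeLatticeBinaryObject does with std::hex.
inline std::string toHex(uint32_t v) {
  std::stringstream s;
  s << std::hex << v;
  return s.str();
}

// Parse a hex checksum back, as scidacChecksumVerify does with base-16
// stoull, but assert that the value fits in 32 bits instead of truncating.
inline uint32_t fromHex(const std::string &s) {
  unsigned long long v = std::stoull(s, nullptr, 16);
  assert(v <= 0xFFFFFFFFull);
  return static_cast<uint32_t>(v);
}
```

For example, merging partial NERSC checksums 0xFFFFFFF0, 0x20 and 0x1 gives 0x11 (the sum wraps modulo 2^32) in either order, and toHex(0xF0F0F0F0u) yields "f0f0f0f0". One design note: the original assigns the result of stoull directly to a uint32_t, so an over-long hex string would be truncated silently; the sketch asserts instead.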
@@ -0,0 +1,875 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/parallelIO/IldgIO.h
Copyright (C) 2015
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ILDG_IO_H
#define GRID_ILDG_IO_H
#ifdef HAVE_LIME
#include <algorithm>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
#include <pwd.h>
#include <sys/utsname.h>
#include <unistd.h>
//C-Lime is a must have for this functionality
extern "C" {
#include "lime.h"
}
namespace Grid {
namespace QCD {
/////////////////////////////////
// Encode word types as strings
/////////////////////////////////
template<class word> inline std::string ScidacWordMnemonic(void){ return std::string("unknown"); }
template<> inline std::string ScidacWordMnemonic<double> (void){ return std::string("D"); }
template<> inline std::string ScidacWordMnemonic<float> (void){ return std::string("F"); }
template<> inline std::string ScidacWordMnemonic< int32_t>(void){ return std::string("I32_t"); }
template<> inline std::string ScidacWordMnemonic<uint32_t>(void){ return std::string("U32_t"); }
template<> inline std::string ScidacWordMnemonic< int64_t>(void){ return std::string("I64_t"); }
template<> inline std::string ScidacWordMnemonic<uint64_t>(void){ return std::string("U64_t"); }
/////////////////////////////////////////
// Encode a generic tensor as a string
/////////////////////////////////////////
template<class vobj> std::string ScidacRecordTypeString(int &colors, int &spins, int & typesize,int &datacount) {
typedef typename getPrecision<vobj>::real_scalar_type stype;
int _ColourN = indexRank<ColourIndex,vobj>();
int _ColourScalar = isScalar<ColourIndex,vobj>();
int _ColourVector = isVector<ColourIndex,vobj>();
int _ColourMatrix = isMatrix<ColourIndex,vobj>();
int _SpinN = indexRank<SpinIndex,vobj>();
int _SpinScalar = isScalar<SpinIndex,vobj>();
int _SpinVector = isVector<SpinIndex,vobj>();
int _SpinMatrix = isMatrix<SpinIndex,vobj>();
int _LorentzN = indexRank<LorentzIndex,vobj>();
int _LorentzScalar = isScalar<LorentzIndex,vobj>();
int _LorentzVector = isVector<LorentzIndex,vobj>();
int _LorentzMatrix = isMatrix<LorentzIndex,vobj>();
std::stringstream stream;
stream << "GRID_";
stream << ScidacWordMnemonic<stype>();
if ( _LorentzVector ) stream << "_LorentzVector"<<_LorentzN;
if ( _LorentzMatrix ) stream << "_LorentzMatrix"<<_LorentzN;
if ( _SpinVector ) stream << "_SpinVector"<<_SpinN;
if ( _SpinMatrix ) stream << "_SpinMatrix"<<_SpinN;
if ( _ColourVector ) stream << "_ColourVector"<<_ColourN;
if ( _ColourMatrix ) stream << "_ColourMatrix"<<_ColourN;
if ( _ColourScalar && _LorentzScalar && _SpinScalar ) stream << "_Complex";
typesize = sizeof(typename vobj::scalar_type);
if ( _ColourMatrix ) typesize*= _ColourN*_ColourN;
else typesize*= _ColourN;
if ( _SpinMatrix ) typesize*= _SpinN*_SpinN;
else typesize*= _SpinN;
colors = _ColourN;
spins = _SpinN;
datacount = _LorentzN;
return stream.str();
}
template<class vobj> std::string ScidacRecordTypeString(Lattice<vobj> & lat,int &colors, int &spins, int & typesize,int &datacount) {
return ScidacRecordTypeString<vobj>(colors,spins,typesize,datacount);
};
////////////////////////////////////////////////////////////
// Helper to fill out metadata
////////////////////////////////////////////////////////////
template<class vobj> void ScidacMetaData(Lattice<vobj> & field,
FieldMetaData &header,
scidacRecord & _scidacRecord,
scidacFile & _scidacFile)
{
typedef typename getPrecision<vobj>::real_scalar_type stype;
/////////////////////////////////////
// Pull Grid's metadata
/////////////////////////////////////
PrepareMetaData(field,header);
/////////////////////////////////////
// Scidac Private File structure
/////////////////////////////////////
_scidacFile = scidacFile(field._grid);
/////////////////////////////////////
// Scidac Private Record structure
/////////////////////////////////////
scidacRecord sr;
sr.datatype = ScidacRecordTypeString(field,sr.colors,sr.spins,sr.typesize,sr.datacount);
sr.date = header.creation_date;
sr.precision = ScidacWordMnemonic<stype>();
sr.recordtype = GRID_IO_FIELD;
_scidacRecord = sr;
// std::cout << GridLogMessage << "Build SciDAC datatype " <<sr.datatype<<std::endl;
}
///////////////////////////////////////////////////////
// Scidac checksum
///////////////////////////////////////////////////////
static int scidacChecksumVerify(scidacChecksum &scidacChecksum_,uint32_t scidac_csuma,uint32_t scidac_csumb)
{
uint32_t scidac_checksuma = stoull(scidacChecksum_.suma,0,16);
uint32_t scidac_checksumb = stoull(scidacChecksum_.sumb,0,16);
if ( scidac_csuma !=scidac_checksuma) return 0;
if ( scidac_csumb !=scidac_checksumb) return 0;
return 1;
}
////////////////////////////////////////////////////////////////////////////////////
// Lime, ILDG and Scidac I/O classes
////////////////////////////////////////////////////////////////////////////////////
class GridLimeReader : public BinaryIO {
public:
///////////////////////////////////////////////////
// FIXME: format for RNG? Now just binary out instead
///////////////////////////////////////////////////
FILE *File;
LimeReader *LimeR;
std::string filename;
/////////////////////////////////////////////
// Open the file
/////////////////////////////////////////////
void open(const std::string &_filename)
{
filename= _filename;
File = fopen(filename.c_str(), "r");
if (File == nullptr)
{
std::cerr << "cannot open file '" << filename << "'" << std::endl;
abort();
}
LimeR = limeCreateReader(File);
}
/////////////////////////////////////////////
// Close the file
/////////////////////////////////////////////
void close(void){
fclose(File);
// limeDestroyReader(LimeR);
}
////////////////////////////////////////////
// Read a generic lattice field and verify checksum
////////////////////////////////////////////
template<class vobj>
void readLimeLatticeBinaryObject(Lattice<vobj> &field,std::string record_name)
{
typedef typename vobj::scalar_object sobj;
scidacChecksum scidacChecksum_;
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
std::string format = getFormatString<vobj>();
while ( limeReaderNextRecord(LimeR) == LIME_SUCCESS ) {
uint64_t file_bytes =limeReaderBytes(LimeR);
// std::cout << GridLogMessage << limeReaderType(LimeR) << " "<< file_bytes <<" bytes "<<std::endl;
// std::cout << GridLogMessage<< " readLimeObject seeking "<< record_name <<" found record :" <<limeReaderType(LimeR) <<std::endl;
if ( !strncmp(limeReaderType(LimeR), record_name.c_str(),strlen(record_name.c_str()) ) ) {
// std::cout << GridLogMessage<< " readLimeLatticeBinaryObject matches ! " <<std::endl;
uint64_t PayloadSize = sizeof(sobj) * field._grid->_gsites;
// std::cout << "R sizeof(sobj)= " <<sizeof(sobj)<<std::endl;
// std::cout << "R Gsites " <<field._grid->_gsites<<std::endl;
// std::cout << "R Payload expected " <<PayloadSize<<std::endl;
// std::cout << "R file size " <<file_bytes <<std::endl;
assert(PayloadSize == file_bytes);// Must match or user error
uint64_t offset= ftello(File);
// std::cout << " ReadLatticeObject from offset "<<offset << std::endl;
BinarySimpleMunger<sobj,sobj> munge;
BinaryIO::readLatticeObject< vobj, sobj >(field, filename, munge, offset, format,nersc_csum,scidac_csuma,scidac_csumb);
/////////////////////////////////////////////
// Insist checksum is next record
/////////////////////////////////////////////
readLimeObject(scidacChecksum_,std::string("scidacChecksum"),std::string(SCIDAC_CHECKSUM));
/////////////////////////////////////////////
// Verify checksums
/////////////////////////////////////////////
assert(scidacChecksumVerify(scidacChecksum_,scidac_csuma,scidac_csumb)==1);
return;
}
}
}
////////////////////////////////////////////
// Read a generic serialisable object
////////////////////////////////////////////
void readLimeObject(std::string &xmlstring,std::string record_name)
{
// should this be a do while; can we miss a first record??
while ( limeReaderNextRecord(LimeR) == LIME_SUCCESS ) {
// std::cout << GridLogMessage<< " readLimeObject seeking "<< record_name <<" found record :" <<limeReaderType(LimeR) <<std::endl;
uint64_t nbytes = limeReaderBytes(LimeR);//size of this record (configuration)
if ( !strncmp(limeReaderType(LimeR), record_name.c_str(),strlen(record_name.c_str()) ) ) {
// std::cout << GridLogMessage<< " readLimeObject matches ! " << record_name <<std::endl;
std::vector<char> xmlc(nbytes+1,'\0');
limeReaderReadData((void *)&xmlc[0], &nbytes, LimeR);
// std::cout << GridLogMessage<< " readLimeObject matches XML " << &xmlc[0] <<std::endl;
xmlstring = std::string(&xmlc[0]);
return;
}
}
assert(0);
}
template<class serialisable_object>
void readLimeObject(serialisable_object &object,std::string object_name,std::string record_name)
{
std::string xmlstring;
readLimeObject(xmlstring, record_name);
XmlReader RD(xmlstring, true, "");
read(RD,object_name,object);
}
};
class GridLimeWriter : public BinaryIO
{
public:
///////////////////////////////////////////////////
// FIXME: format for RNG? Now just binary out instead
// FIXME: collective calls or not ?
// : must know if I am the I/O boss
///////////////////////////////////////////////////
FILE *File;
LimeWriter *LimeW;
std::string filename;
bool boss_node;
GridLimeWriter( bool isboss = true) {
boss_node = isboss;
}
void open(const std::string &_filename) {
filename= _filename;
if ( boss_node ) {
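File = fopen(filename.c_str(), "w");
if (File == nullptr)
{
std::cerr << "cannot open file '" << filename << "'" << std::endl;
abort();
}
LimeW = limeCreateWriter(File); assert(LimeW != NULL );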
}
}
/////////////////////////////////////////////
// Close the file
/////////////////////////////////////////////
void close(void) {
if ( boss_node ) {
fclose(File);
}
// limeDestroyWriter(LimeW);
}
///////////////////////////////////////////////////////
// Lime utility functions
///////////////////////////////////////////////////////
int createLimeRecordHeader(std::string message, int MB, int ME, size_t PayloadSize)
{
if ( boss_node ) {
LimeRecordHeader *h;
h = limeCreateHeader(MB, ME, const_cast<char *>(message.c_str()), PayloadSize);
assert(limeWriteRecordHeader(h, LimeW) >= 0);
limeDestroyHeader(h);
}
return LIME_SUCCESS;
}
////////////////////////////////////////////
// Write a generic serialisable object
////////////////////////////////////////////
void writeLimeObject(int MB,int ME,XmlWriter &writer,std::string object_name,std::string record_name)
{
if ( boss_node ) {
std::string xmlstring = writer.docString();
// std::cout << "WriteLimeObject" << record_name <<std::endl;
uint64_t nbytes = xmlstring.size();
// std::cout << " xmlstring "<< nbytes<< " " << xmlstring <<std::endl;
int err;
LimeRecordHeader *h = limeCreateHeader(MB, ME,const_cast<char *>(record_name.c_str()), nbytes);
assert(h!= NULL);
err=limeWriteRecordHeader(h, LimeW); assert(err>=0);
err=limeWriteRecordData(&xmlstring[0], &nbytes, LimeW); assert(err>=0);
err=limeWriterCloseRecord(LimeW); assert(err>=0);
limeDestroyHeader(h);
}
}
template<class serialisable_object>
void writeLimeObject(int MB,int ME,serialisable_object &object,std::string object_name,std::string record_name, const unsigned int scientificPrec = 0)
{
XmlWriter WR("","");
if (scientificPrec)
{
WR.scientificFormat(true);
WR.setPrecision(scientificPrec);
}
write(WR,object_name,object);
writeLimeObject(MB, ME, WR, object_name, record_name);
}
////////////////////////////////////////////////////
// Write a generic lattice field and csum
// This routine is Collectively called by all nodes
// in communicator used by the field._grid
////////////////////////////////////////////////////
template<class vobj>
void writeLimeLatticeBinaryObject(Lattice<vobj> &field,std::string record_name)
{
////////////////////////////////////////////////////////////////////
// NB: FILE and iostream are jointly writing disjoint sequences in
// the same file through different file handles (integer units).
//
// Both are buffered; the reasoning for why this code is nonetheless correct is as follows.
//
// i) write record header to FILE *File, telegraphing the size; flush
// ii) ftello reads the offset from FILE *File.
// iii) iostream / MPI Open independently seek this offset. Write sequence direct to disk.
// Closes iostream and flushes.
// iv) fseek on FILE * to end of this disjoint section.
// v) Continue writing scidac record.
////////////////////////////////////////////////////////////////////
GridBase *grid = field._grid;
assert(boss_node == field._grid->IsBoss() );
////////////////////////////////////////////
// Create record header
////////////////////////////////////////////
typedef typename vobj::scalar_object sobj;
int err;
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
uint64_t PayloadSize = sizeof(sobj) * grid->_gsites;
if ( boss_node ) {
createLimeRecordHeader(record_name, 0, 0, PayloadSize);
fflush(File);
}
// std::cout << "W sizeof(sobj)" <<sizeof(sobj)<<std::endl;
// std::cout << "W Gsites " <<field._grid->_gsites<<std::endl;
// std::cout << "W Payload expected " <<PayloadSize<<std::endl;
////////////////////////////////////////////////
// Check all nodes agree on file position
////////////////////////////////////////////////
uint64_t offset1;
if ( boss_node ) {
offset1 = ftello(File);
}
grid->Broadcast(0,(void *)&offset1,sizeof(offset1));
///////////////////////////////////////////
// The above is collective. Write by other means into the binary record
///////////////////////////////////////////
std::string format = getFormatString<vobj>();
BinarySimpleMunger<sobj,sobj> munge;
BinaryIO::writeLatticeObject<vobj,sobj>(field, filename, munge, offset1, format,nersc_csum,scidac_csuma,scidac_csumb);
///////////////////////////////////////////
// Wind forward and check that the MPI-2 I/O wrote
// exactly the expected number of bytes to the file
///////////////////////////////////////////
if ( boss_node ) {
fseek(File,0,SEEK_END);
uint64_t offset2 = ftello(File); // std::cout << " now at offset "<<offset2 << std::endl;
assert( (offset2-offset1) == PayloadSize);
}
/////////////////////////////////////////////////////////////
// Close the binary data record
/////////////////////////////////////////////////////////////
if ( boss_node ) {
err=limeWriterCloseRecord(LimeW); assert(err>=0);
}
////////////////////////////////////////
// Write checksum element, propagating forward from the BinaryIO
// Always pair a checksum with a binary object, and close message
////////////////////////////////////////
scidacChecksum checksum;
std::stringstream streama; streama << std::hex << scidac_csuma;
std::stringstream streamb; streamb << std::hex << scidac_csumb;
checksum.suma= streama.str();
checksum.sumb= streamb.str();
if ( boss_node ) {
writeLimeObject(0,1,checksum,std::string("scidacChecksum"),std::string(SCIDAC_CHECKSUM));
}
}
};
class ScidacWriter : public GridLimeWriter {
public:
ScidacWriter(bool isboss =true ) : GridLimeWriter(isboss) { };
template<class SerialisableUserFile>
void writeScidacFileRecord(GridBase *grid,SerialisableUserFile &_userFile)
{
scidacFile _scidacFile(grid);
if ( this->boss_node ) {
writeLimeObject(1,0,_scidacFile,_scidacFile.SerialisableClassName(),std::string(SCIDAC_PRIVATE_FILE_XML));
writeLimeObject(0,1,_userFile,_userFile.SerialisableClassName(),std::string(SCIDAC_FILE_XML));
}
}
////////////////////////////////////////////////
// Write generic lattice field in scidac format
////////////////////////////////////////////////
template <class vobj, class userRecord>
void writeScidacFieldRecord(Lattice<vobj> &field,userRecord _userRecord,
const unsigned int recordScientificPrec = 0)
{
GridBase * grid = field._grid;
////////////////////////////////////////
// fill the Grid header
////////////////////////////////////////
FieldMetaData header;
scidacRecord _scidacRecord;
scidacFile _scidacFile;
ScidacMetaData(field,header,_scidacRecord,_scidacFile);
//////////////////////////////////////////////
// Fill the Lime file record by record
//////////////////////////////////////////////
if ( this->boss_node ) {
writeLimeObject(1,0,header ,std::string("FieldMetaData"),std::string(GRID_FORMAT)); // Open message
writeLimeObject(0,0,_userRecord,_userRecord.SerialisableClassName(),std::string(SCIDAC_RECORD_XML), recordScientificPrec);
writeLimeObject(0,0,_scidacRecord,_scidacRecord.SerialisableClassName(),std::string(SCIDAC_PRIVATE_RECORD_XML));
}
// Collective call
writeLimeLatticeBinaryObject(field,std::string(ILDG_BINARY_DATA)); // Closes message with checksum
}
};
class ScidacReader : public GridLimeReader {
public:
template<class SerialisableUserFile>
void readScidacFileRecord(GridBase *grid,SerialisableUserFile &_userFile)
{
scidacFile _scidacFile(grid);
readLimeObject(_scidacFile,_scidacFile.SerialisableClassName(),std::string(SCIDAC_PRIVATE_FILE_XML));
readLimeObject(_userFile,_userFile.SerialisableClassName(),std::string(SCIDAC_FILE_XML));
}
////////////////////////////////////////////////
// Read generic lattice field in scidac format
////////////////////////////////////////////////
template <class vobj, class userRecord>
void readScidacFieldRecord(Lattice<vobj> &field,userRecord &_userRecord)
{
typedef typename vobj::scalar_object sobj;
GridBase * grid = field._grid;
////////////////////////////////////////
// fill the Grid header
////////////////////////////////////////
FieldMetaData header;
scidacRecord _scidacRecord;
scidacFile _scidacFile;
//////////////////////////////////////////////
// Read the Lime file record by record
//////////////////////////////////////////////
readLimeObject(header ,std::string("FieldMetaData"),std::string(GRID_FORMAT)); // Open message
readLimeObject(_userRecord,_userRecord.SerialisableClassName(),std::string(SCIDAC_RECORD_XML));
readLimeObject(_scidacRecord,_scidacRecord.SerialisableClassName(),std::string(SCIDAC_PRIVATE_RECORD_XML));
readLimeLatticeBinaryObject(field,std::string(ILDG_BINARY_DATA));
}
void skipPastBinaryRecord(void) {
std::string rec_name(ILDG_BINARY_DATA);
while ( limeReaderNextRecord(LimeR) == LIME_SUCCESS ) {
if ( !strncmp(limeReaderType(LimeR), rec_name.c_str(),strlen(rec_name.c_str()) ) ) {
skipPastObjectRecord(std::string(SCIDAC_CHECKSUM));
return;
}
}
}
void skipPastObjectRecord(std::string rec_name) {
while ( limeReaderNextRecord(LimeR) == LIME_SUCCESS ) {
if ( !strncmp(limeReaderType(LimeR), rec_name.c_str(),strlen(rec_name.c_str()) ) ) {
return;
}
}
}
void skipScidacFieldRecord() {
skipPastObjectRecord(std::string(GRID_FORMAT));
skipPastObjectRecord(std::string(SCIDAC_RECORD_XML));
skipPastObjectRecord(std::string(SCIDAC_PRIVATE_RECORD_XML));
skipPastBinaryRecord();
}
};
class IldgWriter : public ScidacWriter {
public:
IldgWriter(bool isboss) : ScidacWriter(isboss) {};
///////////////////////////////////
// A little helper
///////////////////////////////////
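void writeLimeIldgLFN(std::string &LFN)
{
uint64_t PayloadSize = LFN.size();
int err;
createLimeRecordHeader(ILDG_DATA_LFN, 0 , 0, PayloadSize);
// Only the I/O boss opened a LimeWriter; guard the data write
// in the same way as createLimeRecordHeader and writeLimeObject.
if ( boss_node ) {
err=limeWriteRecordData(const_cast<char*>(LFN.c_str()), &PayloadSize,LimeW); assert(err>=0);
err=limeWriterCloseRecord(LimeW); assert(err>=0);
}
}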
////////////////////////////////////////////////////////////////
// Special ILDG operations; gauge configs only.
// Don't require scidac records EXCEPT checksum
// Use Grid MetaData object if present.
////////////////////////////////////////////////////////////////
template <class vsimd>
void writeConfiguration(Lattice<iLorentzColourMatrix<vsimd> > &Umu,int sequence,std::string LFN,std::string description)
{
GridBase * grid = Umu._grid;
typedef Lattice<iLorentzColourMatrix<vsimd> > GaugeField;
typedef iLorentzColourMatrix<vsimd> vobj;
typedef typename vobj::scalar_object sobj;
////////////////////////////////////////
// fill the Grid header
////////////////////////////////////////
FieldMetaData header;
scidacRecord _scidacRecord;
scidacFile _scidacFile;
ScidacMetaData(Umu,header,_scidacRecord,_scidacFile);
std::string format = header.floating_point;
header.ensemble_id = description;
header.ensemble_label = description;
header.sequence_number = sequence;
header.ildg_lfn = LFN;
assert ( (format == std::string("IEEE32BIG"))
||(format == std::string("IEEE64BIG")) );
//////////////////////////////////////////////////////
// Fill ILDG header data struct
//////////////////////////////////////////////////////
ildgFormat ildgfmt ;
ildgfmt.field = std::string("su3gauge");
if ( format == std::string("IEEE32BIG") ) {
ildgfmt.precision = 32;
} else {
ildgfmt.precision = 64;
}
ildgfmt.version = 1.0;
ildgfmt.lx = header.dimension[0];
ildgfmt.ly = header.dimension[1];
ildgfmt.lz = header.dimension[2];
ildgfmt.lt = header.dimension[3];
assert(header.nd==4);
assert(header.nd==header.dimension.size());
//////////////////////////////////////////////////////////////////////////////
// Fill the USQCD info field
//////////////////////////////////////////////////////////////////////////////
usqcdInfo info;
info.version=1.0;
info.plaq = header.plaquette;
info.linktr = header.link_trace;
std::cout << GridLogMessage << " Writing config; IldgIO "<<std::endl;
//////////////////////////////////////////////
// Fill the Lime file record by record
//////////////////////////////////////////////
writeLimeObject(1,0,header ,std::string("FieldMetaData"),std::string(GRID_FORMAT)); // Open message
writeLimeObject(0,0,_scidacFile,_scidacFile.SerialisableClassName(),std::string(SCIDAC_PRIVATE_FILE_XML));
writeLimeObject(0,1,info,info.SerialisableClassName(),std::string(SCIDAC_FILE_XML));
writeLimeObject(1,0,_scidacRecord,_scidacRecord.SerialisableClassName(),std::string(SCIDAC_PRIVATE_RECORD_XML));
writeLimeObject(0,0,info,info.SerialisableClassName(),std::string(SCIDAC_RECORD_XML));
writeLimeObject(0,0,ildgfmt,std::string("ildgFormat") ,std::string(ILDG_FORMAT)); // rec
writeLimeIldgLFN(header.ildg_lfn); // rec
writeLimeLatticeBinaryObject(Umu,std::string(ILDG_BINARY_DATA)); // Closes message with checksum
// limeDestroyWriter(LimeW);
}
};
class IldgReader : public GridLimeReader {
public:
////////////////////////////////////////////////////////////////
// Read either Grid/SciDAC/ILDG configuration
// Don't require scidac records EXCEPT checksum
// Use Grid MetaData object if present.
// Else use ILDG MetaData object if present.
// Else use SciDAC MetaData object if present.
////////////////////////////////////////////////////////////////
template <class vsimd>
void readConfiguration(Lattice<iLorentzColourMatrix<vsimd> > &Umu, FieldMetaData &FieldMetaData_) {
typedef Lattice<iLorentzColourMatrix<vsimd> > GaugeField;
typedef typename GaugeField::vector_object vobj;
typedef typename vobj::scalar_object sobj;
typedef LorentzColourMatrixF fobj;
typedef LorentzColourMatrixD dobj;
GridBase *grid = Umu._grid;
std::vector<int> dims = Umu._grid->FullDimensions();
assert(dims.size()==4);
// Metadata holders
ildgFormat ildgFormat_ ;
std::string ildgLFN_ ;
scidacChecksum scidacChecksum_;
usqcdInfo usqcdInfo_ ;
// track what we read from file
int found_ildgFormat =0;
int found_ildgLFN =0;
int found_scidacChecksum=0;
int found_usqcdInfo =0;
int found_ildgBinary =0;
int found_FieldMetaData =0;
uint32_t nersc_csum;
uint32_t scidac_csuma;
uint32_t scidac_csumb;
// Binary format
std::string format;
//////////////////////////////////////////////////////////////////////////
// Loop over all records
// -- Order is poorly guaranteed, except that the ILDG header precedes the binary section.
// -- Run like an event loop.
// -- Impose a trust hierarchy: Grid metadata takes precedence; failing that,
// look for ILDG, and failing that, SciDAC.
// -- Insist on Scidac checksum record.
//////////////////////////////////////////////////////////////////////////
while ( limeReaderNextRecord(LimeR) == LIME_SUCCESS ) {
uint64_t nbytes = limeReaderBytes(LimeR);//size of this record (configuration)
//////////////////////////////////////////////////////////////////
// If not BINARY_DATA read a string and parse
//////////////////////////////////////////////////////////////////
if ( strncmp(limeReaderType(LimeR), ILDG_BINARY_DATA,strlen(ILDG_BINARY_DATA) ) ) {
// Copy out the string
std::vector<char> xmlc(nbytes+1,'\0');
limeReaderReadData((void *)&xmlc[0], &nbytes, LimeR);
// std::cout << GridLogMessage<< "Non binary record :" <<limeReaderType(LimeR) <<std::endl; //<<"\n"<<(&xmlc[0])<<std::endl;
//////////////////////////////////
// ILDG format record
std::string xmlstring(&xmlc[0]);
if ( !strncmp(limeReaderType(LimeR), ILDG_FORMAT,strlen(ILDG_FORMAT)) ) {
XmlReader RD(xmlstring, true, "");
read(RD,"ildgFormat",ildgFormat_);
if ( ildgFormat_.precision == 64 ) format = std::string("IEEE64BIG");
if ( ildgFormat_.precision == 32 ) format = std::string("IEEE32BIG");
assert( ildgFormat_.lx == dims[0]);
assert( ildgFormat_.ly == dims[1]);
assert( ildgFormat_.lz == dims[2]);
assert( ildgFormat_.lt == dims[3]);
found_ildgFormat = 1;
}
if ( !strncmp(limeReaderType(LimeR), ILDG_DATA_LFN,strlen(ILDG_DATA_LFN)) ) {
FieldMetaData_.ildg_lfn = xmlstring;
found_ildgLFN = 1;
}
if ( !strncmp(limeReaderType(LimeR), GRID_FORMAT,strlen(GRID_FORMAT)) ) {
XmlReader RD(xmlstring, true, "");
read(RD,"FieldMetaData",FieldMetaData_);
format = FieldMetaData_.floating_point;
assert(FieldMetaData_.dimension[0] == dims[0]);
assert(FieldMetaData_.dimension[1] == dims[1]);
assert(FieldMetaData_.dimension[2] == dims[2]);
assert(FieldMetaData_.dimension[3] == dims[3]);
found_FieldMetaData = 1;
}
if ( !strncmp(limeReaderType(LimeR), SCIDAC_RECORD_XML,strlen(SCIDAC_RECORD_XML)) ) {
// is it a USQCD info field
if ( xmlstring.find(std::string("usqcdInfo")) != std::string::npos ) {
// std::cout << GridLogMessage<<"...found a usqcdInfo field"<<std::endl;
XmlReader RD(xmlstring, true, "");
read(RD,"usqcdInfo",usqcdInfo_);
found_usqcdInfo = 1;
}
}
if ( !strncmp(limeReaderType(LimeR), SCIDAC_CHECKSUM,strlen(SCIDAC_CHECKSUM)) ) {
XmlReader RD(xmlstring, true, "");
read(RD,"scidacChecksum",scidacChecksum_);
found_scidacChecksum = 1;
}
} else {
/////////////////////////////////
// Binary data
/////////////////////////////////
std::cout << GridLogMessage << "ILDG Binary record found : " ILDG_BINARY_DATA << std::endl;
uint64_t offset= ftello(File);
if ( format == std::string("IEEE64BIG") ) {
GaugeSimpleMunger<dobj, sobj> munge;
BinaryIO::readLatticeObject< vobj, dobj >(Umu, filename, munge, offset, format,nersc_csum,scidac_csuma,scidac_csumb);
} else {
GaugeSimpleMunger<fobj, sobj> munge;
BinaryIO::readLatticeObject< vobj, fobj >(Umu, filename, munge, offset, format,nersc_csum,scidac_csuma,scidac_csumb);
}
found_ildgBinary = 1;
}
}
//////////////////////////////////////////////////////
// Minimally must find binary segment and checksum
// Since this is an ILDG reader require ILDG format
//////////////////////////////////////////////////////
assert(found_ildgBinary);
assert(found_ildgFormat);
assert(found_scidacChecksum);
// Must find something with the lattice dimensions
assert(found_FieldMetaData||found_ildgFormat);
if ( found_FieldMetaData ) {
std::cout << GridLogMessage<<"Grid MetaData record was found: configuration was probably written by Grid ! Yay ! "<<std::endl;
} else {
assert(found_ildgFormat);
assert ( ildgFormat_.field == std::string("su3gauge") );
///////////////////////////////////////////////////////////////////////////////////////
// Populate our Grid metadata as best we can
///////////////////////////////////////////////////////////////////////////////////////
std::ostringstream vers; vers << ildgFormat_.version;
FieldMetaData_.hdr_version = vers.str();
FieldMetaData_.data_type = std::string("4D_SU3_GAUGE_3X3");
FieldMetaData_.nd=4;
FieldMetaData_.dimension.resize(4);
FieldMetaData_.dimension[0] = ildgFormat_.lx ;
FieldMetaData_.dimension[1] = ildgFormat_.ly ;
FieldMetaData_.dimension[2] = ildgFormat_.lz ;
FieldMetaData_.dimension[3] = ildgFormat_.lt ;
if ( found_usqcdInfo ) {
FieldMetaData_.plaquette = usqcdInfo_.plaq;
FieldMetaData_.link_trace= usqcdInfo_.linktr;
std::cout << GridLogMessage <<"This configuration was probably written by USQCD "<<std::endl;
std::cout << GridLogMessage <<"USQCD xml record Plaquette : "<<FieldMetaData_.plaquette<<std::endl;
std::cout << GridLogMessage <<"USQCD xml record LinkTrace : "<<FieldMetaData_.link_trace<<std::endl;
} else {
FieldMetaData_.plaquette = 0.0;
FieldMetaData_.link_trace= 0.0;
std::cout << GridLogWarning << "This configuration is unsafe: no plaquette record is available to verify it !!! "<<std::endl;
}
}
////////////////////////////////////////////////////////////
// Really really want to mandate a scidac checksum
////////////////////////////////////////////////////////////
if ( found_scidacChecksum ) {
FieldMetaData_.scidac_checksuma = stoull(scidacChecksum_.suma,0,16);
FieldMetaData_.scidac_checksumb = stoull(scidacChecksum_.sumb,0,16);
scidacChecksumVerify(scidacChecksum_,scidac_csuma,scidac_csumb);
assert( scidac_csuma ==FieldMetaData_.scidac_checksuma);
assert( scidac_csumb ==FieldMetaData_.scidac_checksumb);
std::cout << GridLogMessage<<"SciDAC checksums match " << std::endl;
} else {
std::cout << GridLogWarning<<"SciDAC checksums not found. This is unsafe. " << std::endl;
assert(0); // Can I insist always checksum ?
}
if ( found_FieldMetaData || found_usqcdInfo ) {
FieldMetaData checker;
GaugeStatistics(Umu,checker);
assert(fabs(checker.plaquette - FieldMetaData_.plaquette )<1.0e-5);
assert(fabs(checker.link_trace - FieldMetaData_.link_trace)<1.0e-5);
std::cout << GridLogMessage<<"Plaquette and link trace match " << std::endl;
}
}
};
}}
//HAVE_LIME
#endif
#endif
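The reader above treats the LIME file as an unordered stream of typed records: it classifies each record by a prefix match on its type string, sets a flag, and only checks afterwards which required records turned up. The sketch below isolates that pattern outside Grid and LIME. `SketchRecord`, `SketchFound`, `recordIs`, `scanRecords` and `acceptable` are hypothetical names, and records are passed in as plain strings rather than read with `limeReaderNextRecord()`.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for one LIME record: its type string and payload.
// The real reader obtains these via limeReaderType()/limeReaderReadData().
struct SketchRecord {
  std::string type;
  std::string payload;
};

// Flags mirroring the found_* variables in the reader.
struct SketchFound {
  bool ildgFormat = false;
  bool ildgBinary = false;
  bool scidacChecksum = false;
  bool gridFormat = false;
};

// True if 'type' starts with 'tag': compares strlen(tag) characters,
// like the strncmp(...) == 0 tests in the reader.
inline bool recordIs(const std::string &type, const char *tag) {
  return std::strncmp(type.c_str(), tag, std::strlen(tag)) == 0;
}

// Event-loop style pass: records may arrive in any order, so each one is
// classified on its own and only the flags are accumulated.
inline SketchFound scanRecords(const std::vector<SketchRecord> &records) {
  SketchFound found;
  for (const auto &rec : records) {
    if (recordIs(rec.type, "ildg-binary-data")) {
      found.ildgBinary = true;      // payload would go to the binary reader
    } else if (recordIs(rec.type, "ildg-format")) {
      found.ildgFormat = true;      // payload is XML with lx, ly, lz, lt, precision
    } else if (recordIs(rec.type, "grid-format")) {
      found.gridFormat = true;      // Grid's own FieldMetaData XML
    } else if (recordIs(rec.type, "scidac-checksum")) {
      found.scidacChecksum = true;  // XML holding suma/sumb as hex strings
    }
  }
  return found;
}

// Minimal acceptance rule used by the reader: binary data, an ILDG format
// record and a SciDAC checksum record must all be present.
inline bool acceptable(const SketchFound &f) {
  return f.ildgBinary && f.ildgFormat && f.scidacChecksum;
}
```

Separating classification from validation is what makes record order irrelevant: a checksum record that appears after the binary data is handled exactly like one that appears before it.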
@@ -0,0 +1,237 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/parallelIO/IldgIO.h
Copyright (C) 2015
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution
directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_ILDGTYPES_IO_H
#define GRID_ILDGTYPES_IO_H
#ifdef HAVE_LIME
extern "C" { // for linkage
#include "lime.h"
}
namespace Grid {
/////////////////////////////////////////////////////////////////////////////////
// Data representation of records that enter ILDG and SciDac formats
/////////////////////////////////////////////////////////////////////////////////
#define GRID_FORMAT "grid-format"
#define ILDG_FORMAT "ildg-format"
#define ILDG_BINARY_DATA "ildg-binary-data"
#define ILDG_DATA_LFN "ildg-data-lfn"
#define SCIDAC_CHECKSUM "scidac-checksum"
#define SCIDAC_PRIVATE_FILE_XML "scidac-private-file-xml"
#define SCIDAC_FILE_XML "scidac-file-xml"
#define SCIDAC_PRIVATE_RECORD_XML "scidac-private-record-xml"
#define SCIDAC_RECORD_XML "scidac-record-xml"
#define SCIDAC_BINARY_DATA "scidac-binary-data"
// SciDAC record names not used yet; kept so this functionality can be supported later
#define SCIDAC_SITELIST "scidac-sitelist"
////////////////////////////////////////////////////////////
const int GRID_IO_SINGLEFILE = 0; // hardcode lift from QIO compat
const int GRID_IO_MULTIFILE = 1; // hardcode lift from QIO compat
const int GRID_IO_FIELD = 0; // hardcode lift from QIO compat
const int GRID_IO_GLOBAL = 1; // hardcode lift from QIO compat
////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////
// QIO uses mandatory "private" records fixed format
// Private is in principle "opaque"; however, it can't be changed now because that would break existing
// file compatibility, so it should be safe to assume the undocumented but de facto file structure.
/////////////////////////////////////////////////////////////////////////////////
struct emptyUserRecord : Serializable {
GRID_SERIALIZABLE_CLASS_MEMBERS(emptyUserRecord,int,dummy);
emptyUserRecord() { dummy=0; };
};
////////////////////////
// Scidac private file xml
// <?xml version="1.0" encoding="UTF-8"?><scidacFile><version>1.1</version><spacetime>4</spacetime><dims>16 16 16 32 </dims><volfmt>0</volfmt></scidacFile>
////////////////////////
struct scidacFile : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(scidacFile,
double, version,
int, spacetime,
std::string, dims, // must convert to int
int, volfmt);
std::vector<int> getDimensions(void) {
std::stringstream stream(dims);
std::vector<int> dimensions;
int n;
while(stream >> n){
dimensions.push_back(n);
}
return dimensions;
}
void setDimensions(std::vector<int> dimensions) {
char delimiter = ' ';
std::stringstream stream;
for(int i=0;i<dimensions.size();i++){
stream << dimensions[i];
if ( i != dimensions.size()-1) {
stream << delimiter; // single space, as in "16 16 16 32" in the example above
}
}
dims = stream.str();
}
// Constructor from a Grid
scidacFile() =default; // default constructor
scidacFile(GridBase * grid){
version = 1.0;
spacetime = grid->_ndimension;
setDimensions(grid->FullDimensions());
volfmt = GRID_IO_SINGLEFILE;
}
};
///////////////////////////////////////////////////////////////////////
// scidac-private-record-xml : example
// <scidacRecord>
// <version>1.1</version><date>Tue Jul 26 21:14:44 2011 UTC</date><recordtype>0</recordtype>
// <datatype>QDP_D3_ColorMatrix</datatype><precision>D</precision><colors>3</colors><spins>4</spins>
// <typesize>144</typesize><datacount>4</datacount>
// </scidacRecord>
///////////////////////////////////////////////////////////////////////
struct scidacRecord : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(scidacRecord,
double, version,
std::string, date,
int, recordtype,
std::string, datatype,
std::string, precision,
int, colors,
int, spins,
int, typesize,
int, datacount);
scidacRecord()
: version(1.0), recordtype(0), colors(0), spins(0), typesize(0), datacount(0)
{}
};
////////////////////////
// ILDG format
////////////////////////
struct ildgFormat : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(ildgFormat,
double, version,
std::string, field,
int, precision,
int, lx,
int, ly,
int, lz,
int, lt);
ildgFormat() { version=1.0; };
};
////////////////////////
// USQCD info
////////////////////////
struct usqcdInfo : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(usqcdInfo,
double, version,
double, plaq,
double, linktr,
std::string, info);
usqcdInfo() {
version=1.0;
};
};
////////////////////////
// Scidac Checksum
////////////////////////
struct scidacChecksum : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(scidacChecksum,
double, version,
std::string, suma,
std::string, sumb);
scidacChecksum() {
version=1.0;
};
};
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Type: scidac-file-xml <title>MILC ILDG archival gauge configuration</title>
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Type:
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#if 0
////////////////////////////////////////////////////////////////////////////////////////
// From http://www.physics.utah.edu/~detar/scidac/qio_2p3.pdf
////////////////////////////////////////////////////////////////////////////////////////
struct usqcdPropFile : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(usqcdPropFile,
double, version,
std::string, type,
std::string, info);
usqcdPropFile() {
version=1.0;
};
};
struct usqcdSourceInfo : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(usqcdSourceInfo,
double, version,
std::string, info);
usqcdSourceInfo() {
version=1.0;
};
};
struct usqcdPropInfo : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(usqcdPropInfo,
double, version,
int, spin,
int, color,
std::string, info);
usqcdPropInfo() {
version=1.0;
};
};
#endif
}
#endif
#endif
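`scidacFile` stores the lattice extents as a whitespace-separated string and converts in both directions. A standalone sketch of that round trip using only the standard library; `parseDims` and `formatDims` are illustrative names, not Grid functions:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse a whitespace-separated dims string such as "16 16 16 32 " into ints.
// operator>> skips spaces and newlines alike, so trailing blanks are harmless.
inline std::vector<int> parseDims(const std::string &dims) {
  std::istringstream stream(dims);
  std::vector<int> dimensions;
  int n;
  while (stream >> n) {
    dimensions.push_back(n);
  }
  return dimensions;
}

// Format dimensions as one space-separated line, e.g. "16 16 16 32".
inline std::string formatDims(const std::vector<int> &dimensions) {
  std::ostringstream stream;
  for (std::size_t i = 0; i < dimensions.size(); i++) {
    if (i != 0) stream << ' ';  // separator only between entries
    stream << dimensions[i];
  }
  return stream.str();
}
```

Because `operator>>` skips any whitespace, the parser accepts both the single-line form shown in the QIO example and strings that contain newlines.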
@@ -0,0 +1,327 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/parallelIO/NerscIO.h
Copyright (C) 2015
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <algorithm>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <map>
#include <unistd.h>
#include <sys/utsname.h>
#include <pwd.h>
#include <ctime>
#include <sstream>
namespace Grid {
///////////////////////////////////////////////////////
// Precision mapping
///////////////////////////////////////////////////////
template<class vobj> static std::string getFormatString (void)
{
std::string format;
typedef typename getPrecision<vobj>::real_scalar_type stype;
if ( sizeof(stype) == sizeof(float) ) {
format = std::string("IEEE32BIG");
}
if ( sizeof(stype) == sizeof(double) ) {
format = std::string("IEEE64BIG");
}
return format;
}
////////////////////////////////////////////////////////////////////////////////
// header specification/interpretation
////////////////////////////////////////////////////////////////////////////////
class FieldMetaData : Serializable {
public:
GRID_SERIALIZABLE_CLASS_MEMBERS(FieldMetaData,
int, nd,
std::vector<int>, dimension,
std::vector<std::string>, boundary,
int, data_start,
std::string, hdr_version,
std::string, storage_format,
double, link_trace,
double, plaquette,
uint32_t, checksum,
uint32_t, scidac_checksuma,
uint32_t, scidac_checksumb,
unsigned int, sequence_number,
std::string, data_type,
std::string, ensemble_id,
std::string, ensemble_label,
std::string, ildg_lfn,
std::string, creator,
std::string, creator_hardware,
std::string, creation_date,
std::string, archive_date,
std::string, floating_point);
// WARNING: non-initialised values might lead to subtle parallel IO
// issues; std::string members are fine because they initialise to size 0
// as per the C++ standard.
FieldMetaData(void)
: nd(4), dimension(4,0), boundary(4, ""), data_start(0),
link_trace(0.), plaquette(0.), checksum(0),
scidac_checksuma(0), scidac_checksumb(0), sequence_number(0)
{}
};
namespace QCD {
using namespace Grid;
//////////////////////////////////////////////////////////////////////
// Bit and Physical Checksumming and QA of data
//////////////////////////////////////////////////////////////////////
inline void GridMetaData(GridBase *grid,FieldMetaData &header)
{
int nd = grid->_ndimension;
header.nd = nd;
header.dimension.resize(nd);
header.boundary.resize(nd);
header.data_start = 0;
for(int d=0;d<nd;d++) {
header.dimension[d] = grid->_fdimensions[d];
}
for(int d=0;d<nd;d++) {
header.boundary[d] = std::string("PERIODIC");
}
}
inline void MachineCharacteristics(FieldMetaData &header)
{
// Who
struct passwd *pw = getpwuid (getuid());
if (pw) header.creator = std::string(pw->pw_name);
// When
std::time_t t = std::time(nullptr);
std::tm tm_ = *std::localtime(&t);
std::ostringstream oss;
// oss << std::put_time(&tm_, "%c %Z");
header.creation_date = oss.str();
header.archive_date = header.creation_date;
// What
struct utsname name; uname(&name);
header.creator_hardware = std::string(name.nodename)+"-";
header.creator_hardware+= std::string(name.machine)+"-";
header.creator_hardware+= std::string(name.sysname)+"-";
header.creator_hardware+= std::string(name.release);
}
#define dump_meta_data(field, s) \
s << "BEGIN_HEADER" << std::endl; \
s << "HDR_VERSION = " << field.hdr_version << std::endl; \
s << "DATATYPE = " << field.data_type << std::endl; \
s << "STORAGE_FORMAT = " << field.storage_format << std::endl; \
for(int i=0;i<4;i++){ \
s << "DIMENSION_" << i+1 << " = " << field.dimension[i] << std::endl ; \
} \
s << "LINK_TRACE = " << std::setprecision(10) << field.link_trace << std::endl; \
s << "PLAQUETTE = " << std::setprecision(10) << field.plaquette << std::endl; \
for(int i=0;i<4;i++){ \
s << "BOUNDARY_"<<i+1<<" = " << field.boundary[i] << std::endl; \
} \
\
s << "CHECKSUM = "<< std::hex << std::setw(10) << field.checksum << std::dec<<std::endl; \
s << "SCIDAC_CHECKSUMA = "<< std::hex << std::setw(10) << field.scidac_checksuma << std::dec<<std::endl; \
s << "SCIDAC_CHECKSUMB = "<< std::hex << std::setw(10) << field.scidac_checksumb << std::dec<<std::endl; \
s << "ENSEMBLE_ID = " << field.ensemble_id << std::endl; \
s << "ENSEMBLE_LABEL = " << field.ensemble_label << std::endl; \
s << "SEQUENCE_NUMBER = " << field.sequence_number << std::endl; \
s << "CREATOR = " << field.creator << std::endl; \
s << "CREATOR_HARDWARE = "<< field.creator_hardware << std::endl; \
s << "CREATION_DATE = " << field.creation_date << std::endl; \
s << "ARCHIVE_DATE = " << field.archive_date << std::endl; \
s << "FLOATING_POINT = " << field.floating_point << std::endl; \
s << "END_HEADER" << std::endl;
template<class vobj> inline void PrepareMetaData(Lattice<vobj> & field, FieldMetaData &header)
{
GridBase *grid = field._grid;
std::string format = getFormatString<vobj>();
header.floating_point = format;
header.checksum = 0x0; // Nersc checksum unused in ILDG, Scidac
GridMetaData(grid,header);
MachineCharacteristics(header);
}
inline void GaugeStatistics(Lattice<vLorentzColourMatrixF> & data,FieldMetaData &header)
{
// How to convert data precision etc...
header.link_trace=Grid::QCD::WilsonLoops<PeriodicGimplF>::linkTrace(data);
header.plaquette =Grid::QCD::WilsonLoops<PeriodicGimplF>::avgPlaquette(data);
}
inline void GaugeStatistics(Lattice<vLorentzColourMatrixD> & data,FieldMetaData &header)
{
// How to convert data precision etc...
header.link_trace=Grid::QCD::WilsonLoops<PeriodicGimplD>::linkTrace(data);
header.plaquette =Grid::QCD::WilsonLoops<PeriodicGimplD>::avgPlaquette(data);
}
template<> inline void PrepareMetaData<vLorentzColourMatrixF>(Lattice<vLorentzColourMatrixF> & field, FieldMetaData &header)
{
GridBase *grid = field._grid;
std::string format = getFormatString<vLorentzColourMatrixF>();
header.floating_point = format;
header.checksum = 0x0; // Nersc checksum unused in ILDG, Scidac
GridMetaData(grid,header);
GaugeStatistics(field,header);
MachineCharacteristics(header);
}
template<> inline void PrepareMetaData<vLorentzColourMatrixD>(Lattice<vLorentzColourMatrixD> & field, FieldMetaData &header)
{
GridBase *grid = field._grid;
std::string format = getFormatString<vLorentzColourMatrixD>();
header.floating_point = format;
header.checksum = 0x0; // Nersc checksum unused in ILDG, Scidac
GridMetaData(grid,header);
GaugeStatistics(field,header);
MachineCharacteristics(header);
}
//////////////////////////////////////////////////////////////////////
// Utilities; these are QCD aware
//////////////////////////////////////////////////////////////////////
inline void reconstruct3(LorentzColourMatrix & cm)
{
const int x=0;
const int y=1;
const int z=2;
for(int mu=0;mu<Nd;mu++){
cm(mu)()(2,x) = adj(cm(mu)()(0,y)*cm(mu)()(1,z)-cm(mu)()(0,z)*cm(mu)()(1,y)); //x= yz-zy
cm(mu)()(2,y) = adj(cm(mu)()(0,z)*cm(mu)()(1,x)-cm(mu)()(0,x)*cm(mu)()(1,z)); //y= zx-xz
cm(mu)()(2,z) = adj(cm(mu)()(0,x)*cm(mu)()(1,y)-cm(mu)()(0,y)*cm(mu)()(1,x)); //z= xy-yx
}
}
////////////////////////////////////////////////////////////////////////////////
// Some data types for intermediate storage
////////////////////////////////////////////////////////////////////////////////
template<typename vtype> using iLorentzColour2x3 = iVector<iVector<iVector<vtype, Nc>, 2>, Nd >;
typedef iLorentzColour2x3<Complex> LorentzColour2x3;
typedef iLorentzColour2x3<ComplexF> LorentzColour2x3F;
typedef iLorentzColour2x3<ComplexD> LorentzColour2x3D;
/////////////////////////////////////////////////////////////////////////////////
// Simple classes for precision conversion
/////////////////////////////////////////////////////////////////////////////////
template <class fobj, class sobj>
struct BinarySimpleUnmunger {
typedef typename getPrecision<fobj>::real_scalar_type fobj_stype;
typedef typename getPrecision<sobj>::real_scalar_type sobj_stype;
void operator()(sobj &in, fobj &out) {
// take word by word and transform according to the status
fobj_stype *out_buffer = (fobj_stype *)&out;
sobj_stype *in_buffer = (sobj_stype *)&in;
size_t fobj_words = sizeof(out) / sizeof(fobj_stype);
size_t sobj_words = sizeof(in) / sizeof(sobj_stype);
assert(fobj_words == sobj_words);
for (unsigned int word = 0; word < sobj_words; word++)
out_buffer[word] = in_buffer[word]; // type conversion on the fly
}
};
template <class fobj, class sobj>
struct BinarySimpleMunger {
typedef typename getPrecision<fobj>::real_scalar_type fobj_stype;
typedef typename getPrecision<sobj>::real_scalar_type sobj_stype;
void operator()(fobj &in, sobj &out) {
// take word by word and transform according to the status
fobj_stype *in_buffer = (fobj_stype *)&in;
sobj_stype *out_buffer = (sobj_stype *)&out;
size_t fobj_words = sizeof(in) / sizeof(fobj_stype);
size_t sobj_words = sizeof(out) / sizeof(sobj_stype);
assert(fobj_words == sobj_words);
for (unsigned int word = 0; word < sobj_words; word++)
out_buffer[word] = in_buffer[word]; // type conversion on the fly
}
};
template<class fobj,class sobj>
struct GaugeSimpleMunger{
void operator()(fobj &in, sobj &out) {
for (int mu = 0; mu < Nd; mu++) {
for (int i = 0; i < Nc; i++) {
for (int j = 0; j < Nc; j++) {
out(mu)()(i, j) = in(mu)()(i, j);
}}
}
};
};
template <class fobj, class sobj>
struct GaugeSimpleUnmunger {
void operator()(sobj &in, fobj &out) {
for (int mu = 0; mu < Nd; mu++) {
for (int i = 0; i < Nc; i++) {
for (int j = 0; j < Nc; j++) {
out(mu)()(i, j) = in(mu)()(i, j);
}}
}
};
};
template<class fobj,class sobj>
struct Gauge3x2munger{
void operator() (fobj &in,sobj &out){
for(int mu=0;mu<Nd;mu++){
for(int i=0;i<2;i++){
for(int j=0;j<3;j++){
out(mu)()(i,j) = in(mu)(i)(j);
}}
}
reconstruct3(out);
}
};
template<class fobj,class sobj>
struct Gauge3x2unmunger{
void operator() (sobj &in,fobj &out){
for(int mu=0;mu<Nd;mu++){
for(int i=0;i<2;i++){
for(int j=0;j<3;j++){
out(mu)(i)(j) = in(mu)()(i,j);
}}
}
}
};
}
}
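`reconstruct3()` rebuilds the third row of each link from the first two, using the fact that for an SU(3) matrix the third row is the complex conjugate of the cross product of rows 0 and 1; this is what lets `Gauge3x2munger` read two-row (`4D_SU3_GAUGE`) NERSC files. A minimal sketch of the same formula on a single 3x3 matrix with `std::complex<double>`; the `Mat3` type and the function names are illustrative assumptions, not Grid types:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;
using Row = std::array<cplx, 3>;
using Mat3 = std::array<Row, 3>;

// Rebuild row 2 of an SU(3) matrix from rows 0 and 1:
// row2 = conj(row0 x row1), the formula reconstruct3() applies per link.
inline void reconstructThirdRow(Mat3 &m) {
  const Row &a = m[0];
  const Row &b = m[1];
  m[2][0] = std::conj(a[1] * b[2] - a[2] * b[1]);  // x = yz - zy
  m[2][1] = std::conj(a[2] * b[0] - a[0] * b[2]);  // y = zx - xz
  m[2][2] = std::conj(a[0] * b[1] - a[1] * b[0]);  // z = xy - yx
}

// Largest elementwise distance between two matrices, used for checking.
inline double maxDiff(const Mat3 &x, const Mat3 &y) {
  double d = 0.0;
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
      d = std::max(d, std::abs(x[i][j] - y[i][j]));
  return d;
}
```

Storing only two rows saves a third of the space. The reconstruction is valid only for unitary matrices with unit determinant, so it does not apply to general 3x3 data.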
@@ -0,0 +1,363 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/parallelIO/NerscIO.h
Copyright (C) 2015
Author: Matt Spraggs <matthew.spraggs@gmail.com>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_NERSC_IO_H
#define GRID_NERSC_IO_H
namespace Grid {
namespace QCD {
using namespace Grid;
////////////////////////////////////////////////////////////////////////////////
// Write and read from fstream; compute header offset for payload
////////////////////////////////////////////////////////////////////////////////
class NerscIO : public BinaryIO {
public:
static inline void truncate(std::string file){
std::ofstream fout(file,std::ios::out);
}
static inline unsigned int writeHeader(FieldMetaData &field,std::string file)
{
std::ofstream fout(file,std::ios::out|std::ios::in);
fout.seekp(0,std::ios::beg);
dump_meta_data(field, fout);
field.data_start = fout.tellp();
return field.data_start;
}
// for the header-reader
static inline int readHeader(std::string file,GridBase *grid, FieldMetaData &field)
{
uint64_t offset=0;
std::map<std::string,std::string> header;
std::string line;
//////////////////////////////////////////////////
// read the header
//////////////////////////////////////////////////
std::ifstream fin(file);
getline(fin,line); // read the first line and insist it is BEGIN_HEADER
removeWhitespace(line);
std::cout << GridLogMessage << "* " << line << std::endl;
assert(line==std::string("BEGIN_HEADER"));
do {
getline(fin,line); // read one line
std::cout << GridLogMessage << "* "<<line<< std::endl;
int eq = line.find("=");
if(eq >0) {
std::string key=line.substr(0,eq);
std::string val=line.substr(eq+1);
removeWhitespace(key);
removeWhitespace(val);
header[key] = val;
}
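} while( fin && line.find("END_HEADER") == std::string::npos ); // stop at EOF instead of looping forever
assert(line.find("END_HEADER") != std::string::npos); // header must be terminated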
field.data_start = fin.tellg();
//////////////////////////////////////////////////
// chomp the values
//////////////////////////////////////////////////
field.hdr_version = header["HDR_VERSION"];
field.data_type = header["DATATYPE"];
field.storage_format = header["STORAGE_FORMAT"];
field.dimension[0] = std::stol(header["DIMENSION_1"]);
field.dimension[1] = std::stol(header["DIMENSION_2"]);
field.dimension[2] = std::stol(header["DIMENSION_3"]);
field.dimension[3] = std::stol(header["DIMENSION_4"]);
assert(grid->_ndimension == 4);
for(int d=0;d<4;d++){
assert(grid->_fdimensions[d]==field.dimension[d]);
}
field.link_trace = std::stod(header["LINK_TRACE"]);
field.plaquette = std::stod(header["PLAQUETTE"]);
field.boundary[0] = header["BOUNDARY_1"];
field.boundary[1] = header["BOUNDARY_2"];
field.boundary[2] = header["BOUNDARY_3"];
field.boundary[3] = header["BOUNDARY_4"];
field.checksum = std::stoul(header["CHECKSUM"],0,16);
field.ensemble_id = header["ENSEMBLE_ID"];
field.ensemble_label = header["ENSEMBLE_LABEL"];
field.sequence_number = std::stol(header["SEQUENCE_NUMBER"]);
field.creator = header["CREATOR"];
field.creator_hardware = header["CREATOR_HARDWARE"];
field.creation_date = header["CREATION_DATE"];
field.archive_date = header["ARCHIVE_DATE"];
field.floating_point = header["FLOATING_POINT"];
return field.data_start;
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Now the meat: the object readers
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
template<class vsimd>
static inline void readConfiguration(Lattice<iLorentzColourMatrix<vsimd> > &Umu,
FieldMetaData& header,
std::string file)
{
typedef Lattice<iLorentzColourMatrix<vsimd> > GaugeField;
GridBase *grid = Umu._grid;
uint64_t offset = readHeader(file,Umu._grid,header);
FieldMetaData clone(header);
std::string format(header.floating_point);
int ieee32big = (format == std::string("IEEE32BIG"));
int ieee32 = (format == std::string("IEEE32"));
int ieee64big = (format == std::string("IEEE64BIG"));
int ieee64 = (format == std::string("IEEE64"));
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
// depending on datatype, set up munger;
// munger is a function of <floating point, Real, data_type>
if ( header.data_type == std::string("4D_SU3_GAUGE") ) {
if ( ieee32 || ieee32big ) {
BinaryIO::readLatticeObject<iLorentzColourMatrix<vsimd>, LorentzColour2x3F>
(Umu,file,Gauge3x2munger<LorentzColour2x3F,LorentzColourMatrix>(), offset,format,
nersc_csum,scidac_csuma,scidac_csumb);
}
if ( ieee64 || ieee64big ) {
BinaryIO::readLatticeObject<iLorentzColourMatrix<vsimd>, LorentzColour2x3D>
(Umu,file,Gauge3x2munger<LorentzColour2x3D,LorentzColourMatrix>(),offset,format,
nersc_csum,scidac_csuma,scidac_csumb);
}
} else if ( header.data_type == std::string("4D_SU3_GAUGE_3x3") ) {
if ( ieee32 || ieee32big ) {
BinaryIO::readLatticeObject<iLorentzColourMatrix<vsimd>,LorentzColourMatrixF>
(Umu,file,GaugeSimpleMunger<LorentzColourMatrixF,LorentzColourMatrix>(),offset,format,
nersc_csum,scidac_csuma,scidac_csumb);
}
if ( ieee64 || ieee64big ) {
BinaryIO::readLatticeObject<iLorentzColourMatrix<vsimd>,LorentzColourMatrixD>
(Umu,file,GaugeSimpleMunger<LorentzColourMatrixD,LorentzColourMatrix>(),offset,format,
nersc_csum,scidac_csuma,scidac_csumb);
}
} else {
assert(0);
}
GaugeStatistics(Umu,clone);
std::cout<<GridLogMessage <<"NERSC Configuration "<<file<<" checksum "<<std::hex<<nersc_csum<< std::dec
<<" header "<<std::hex<<header.checksum<<std::dec <<std::endl;
std::cout<<GridLogMessage <<"NERSC Configuration "<<file<<" plaquette "<<clone.plaquette
<<" header "<<header.plaquette<<std::endl;
std::cout<<GridLogMessage <<"NERSC Configuration "<<file<<" link_trace "<<clone.link_trace
<<" header "<<header.link_trace<<std::endl;
if ( fabs(clone.plaquette -header.plaquette ) >= 1.0e-5 ) {
std::cout << " Plaquette mismatch "<<std::endl;
std::cout << Umu[0]<<std::endl;
std::cout << Umu[1]<<std::endl;
}
if ( nersc_csum != header.checksum ) {
std::cerr << " checksum mismatch " << std::endl;
std::cerr << " plaqs " << clone.plaquette << " " << header.plaquette << std::endl;
std::cerr << " trace " << clone.link_trace<< " " << header.link_trace<< std::endl;
std::cerr << " nersc_csum " <<std::hex<< nersc_csum << " " << header.checksum<< std::dec<< std::endl;
exit(1); // non-zero status: the configuration failed verification
}
assert(fabs(clone.plaquette -header.plaquette ) < 1.0e-5 );
assert(fabs(clone.link_trace-header.link_trace) < 1.0e-6 );
assert(nersc_csum == header.checksum );
std::cout<<GridLogMessage <<"NERSC Configuration "<<file<< " and plaquette, link trace, and checksum agree"<<std::endl;
}
template<class vsimd>
static inline void writeConfiguration(Lattice<iLorentzColourMatrix<vsimd> > &Umu,
std::string file,
int two_row,
int bits32)
{
typedef Lattice<iLorentzColourMatrix<vsimd> > GaugeField;
typedef iLorentzColourMatrix<vsimd> vobj;
typedef typename vobj::scalar_object sobj;
FieldMetaData header;
///////////////////////////////////////////
// Following should become arguments
///////////////////////////////////////////
header.sequence_number = 1;
header.ensemble_id = "UKQCD";
header.ensemble_label = "DWF";
typedef LorentzColourMatrixD fobj3D;
typedef LorentzColour2x3D fobj2D;
GridBase *grid = Umu._grid;
GridMetaData(grid,header);
assert(header.nd==4);
GaugeStatistics(Umu,header);
MachineCharacteristics(header);
uint64_t offset;
// Sod it -- always write 3x3 double
header.floating_point = std::string("IEEE64BIG");
header.data_type = std::string("4D_SU3_GAUGE_3x3");
GaugeSimpleUnmunger<fobj3D,sobj> munge;
if ( grid->IsBoss() ) {
truncate(file);
offset = writeHeader(header,file);
}
grid->Broadcast(0,(void *)&offset,sizeof(offset));
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
BinaryIO::writeLatticeObject<vobj,fobj3D>(Umu,file,munge,offset,header.floating_point,
nersc_csum,scidac_csuma,scidac_csumb);
header.checksum = nersc_csum;
if ( grid->IsBoss() ) {
writeHeader(header,file);
}
std::cout<<GridLogMessage <<"Written NERSC Configuration on "<< file << " checksum "
<<std::hex<<header.checksum
<<std::dec<<" plaq "<< header.plaquette <<std::endl;
}
///////////////////////////////
// RNG state
///////////////////////////////
static inline void writeRNGState(GridSerialRNG &serial,GridParallelRNG &parallel,std::string file)
{
typedef typename GridParallelRNG::RngStateType RngStateType;
// Following should become arguments
FieldMetaData header;
header.sequence_number = 1;
header.ensemble_id = "UKQCD";
header.ensemble_label = "DWF";
GridBase *grid = parallel._grid;
GridMetaData(grid,header);
assert(header.nd==4);
header.link_trace=0.0;
header.plaquette=0.0;
MachineCharacteristics(header);
uint64_t offset;
#ifdef RNG_RANLUX
header.floating_point = std::string("UINT64");
header.data_type = std::string("RANLUX48");
#endif
#ifdef RNG_MT19937
header.floating_point = std::string("UINT32");
header.data_type = std::string("MT19937");
#endif
#ifdef RNG_SITMO
header.floating_point = std::string("UINT64");
header.data_type = std::string("SITMO");
#endif
if ( grid->IsBoss() ) {
truncate(file);
offset = writeHeader(header,file);
}
grid->Broadcast(0,(void *)&offset,sizeof(offset));
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
BinaryIO::writeRNG(serial,parallel,file,offset,nersc_csum,scidac_csuma,scidac_csumb);
header.checksum = nersc_csum;
if ( grid->IsBoss() ) {
offset = writeHeader(header,file);
}
std::cout<<GridLogMessage
<<"Written NERSC RNG STATE "<<file<< " checksum "
<<std::hex<<header.checksum
<<std::dec<<std::endl;
}
static inline void readRNGState(GridSerialRNG &serial,GridParallelRNG & parallel,FieldMetaData& header,std::string file)
{
typedef typename GridParallelRNG::RngStateType RngStateType;
GridBase *grid = parallel._grid;
uint64_t offset = readHeader(file,grid,header);
FieldMetaData clone(header);
std::string format(header.floating_point);
std::string data_type(header.data_type);
#ifdef RNG_RANLUX
assert(format == std::string("UINT64"));
assert(data_type == std::string("RANLUX48"));
#endif
#ifdef RNG_MT19937
assert(format == std::string("UINT32"));
assert(data_type == std::string("MT19937"));
#endif
#ifdef RNG_SITMO
assert(format == std::string("UINT64"));
assert(data_type == std::string("SITMO"));
#endif
// depending on datatype, set up munger;
// munger is a function of <floating point, Real, data_type>
uint32_t nersc_csum,scidac_csuma,scidac_csumb;
BinaryIO::readRNG(serial,parallel,file,offset,nersc_csum,scidac_csuma,scidac_csumb);
if ( nersc_csum != header.checksum ) {
std::cerr << "checksum mismatch "<<std::hex<< nersc_csum <<" "<<header.checksum<<std::dec<<std::endl;
exit(EXIT_FAILURE);
}
assert(nersc_csum == header.checksum );
std::cout<<GridLogMessage <<"Read NERSC RNG file "<<file<< " format "<< data_type <<std::endl;
}
};
}}
#endif
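The NERSC writer above writes the header once to learn the payload offset, writes the lattice data, and then rewrites the header with `nersc_csum`. The following is a minimal sketch of a 32-bit additive checksum. It assumes the NERSC checksum is the wrapping sum of the payload viewed as 32-bit words; the byte order and data layout that `BinaryIO` actually sums over are not shown here. `additiveChecksum32` is a hypothetical helper, not a Grid function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not Grid's BinaryIO): wrapping sum of a buffer viewed as
// 32-bit words in host byte order. Trailing bytes (nbytes % 4) are ignored.
uint32_t additiveChecksum32(const void *data, std::size_t nbytes) {
  const unsigned char *p = static_cast<const unsigned char *>(data);
  uint32_t sum = 0;
  for (std::size_t i = 0; i + sizeof(uint32_t) <= nbytes; i += sizeof(uint32_t)) {
    uint32_t word;
    std::memcpy(&word, p + i, sizeof(word));  // avoids unaligned and aliasing issues
    sum += word;                              // unsigned overflow wraps modulo 2^32
  }
  return sum;
}

// Convenience overload for a buffer that is already a sequence of 32-bit words.
uint32_t additiveChecksum32(const std::vector<uint32_t> &words) {
  return additiveChecksum32(words.data(), words.size() * sizeof(uint32_t));
}
```

Because addition modulo 2^32 is commutative and associative, partial sums computed independently per thread or per rank can be combined in any order and still give the same result.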
@@ -0,0 +1,75 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/PerfCount.cc
Copyright (C) 2015
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#include <Grid/GridCore.h>
#include <Grid/perfmon/PerfCount.h>
namespace Grid {
#define CacheControl(L,O,R) ((PERF_COUNT_HW_CACHE_##L)|(PERF_COUNT_HW_CACHE_OP_##O<<8)| (PERF_COUNT_HW_CACHE_RESULT_##R<<16))
#define RawConfig(A,B) (A<<8|B)
const PerformanceCounter::PerformanceCounterConfig PerformanceCounter::PerformanceCounterConfigs [] = {
#ifdef __linux__
{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES , "CACHE_REFERENCES..." , INSTRUCTIONS},
{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES , "CACHE_MISSES......." , CACHE_REFERENCES},
{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES , "CPUCYCLES.........." , INSTRUCTIONS},
{ PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS , "INSTRUCTIONS......." , CPUCYCLES },
// 4
#ifdef KNL
{ PERF_TYPE_RAW, RawConfig(0x40,0x04), "ALL_LOADS..........", CPUCYCLES },
{ PERF_TYPE_RAW, RawConfig(0x01,0x04), "L1_MISS_LOADS......", L1D_READ_ACCESS },
{ PERF_TYPE_RAW, RawConfig(0x40,0x04), "ALL_LOADS..........", L1D_READ_ACCESS },
{ PERF_TYPE_RAW, RawConfig(0x02,0x04), "L2_HIT_LOADS.......", L1D_READ_ACCESS },
{ PERF_TYPE_RAW, RawConfig(0x04,0x04), "L2_MISS_LOADS......", L1D_READ_ACCESS },
{ PERF_TYPE_RAW, RawConfig(0x10,0x04), "UTLB_MISS_LOADS....", L1D_READ_ACCESS },
{ PERF_TYPE_RAW, RawConfig(0x08,0x04), "DTLB_MISS_LOADS....", L1D_READ_ACCESS },
// 11
#else
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,READ,ACCESS) , "L1D_READ_ACCESS....",INSTRUCTIONS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,READ,MISS) , "L1D_READ_MISS......",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,WRITE,MISS) , "L1D_WRITE_MISS.....",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,WRITE,ACCESS) , "L1D_WRITE_ACCESS...",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,PREFETCH,MISS) , "L1D_PREFETCH_MISS..",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,PREFETCH,ACCESS) , "L1D_PREFETCH_ACCESS",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1D,PREFETCH,ACCESS) , "L1D_PREFETCH_ACCESS",L1D_READ_ACCESS},
// 11
#endif
{ PERF_TYPE_HW_CACHE, CacheControl(LL,READ,MISS) , "LL_READ_MISS.......",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(LL,READ,ACCESS) , "LL_READ_ACCESS.....",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(LL,WRITE,MISS) , "LL_WRITE_MISS......",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(LL,WRITE,ACCESS) , "LL_WRITE_ACCESS....",L1D_READ_ACCESS},
//15
{ PERF_TYPE_HW_CACHE, CacheControl(LL,PREFETCH,MISS) , "LL_PREFETCH_MISS...",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(LL,PREFETCH,ACCESS) , "LL_PREFETCH_ACCESS.",L1D_READ_ACCESS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1I,READ,MISS) , "L1I_READ_MISS......",INSTRUCTIONS},
{ PERF_TYPE_HW_CACHE, CacheControl(L1I,READ,ACCESS) , "L1I_READ_ACCESS....",INSTRUCTIONS}
//19
// { PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, "STALL_CYCLES" },
#endif
};
}
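The `CacheControl(L,O,R)` macro packs a generic hardware-cache event the way `perf_event_open(2)` expects for `PERF_TYPE_HW_CACHE`: cache id in bits 0-7, operation in bits 8-15, and result in bits 16-23. `RawConfig(A,B)` places `A` in bits 8-15 and `B` in bits 0-7. The following is a minimal constexpr equivalent. The kernel's enum values are written out by hand (L1D=0, L1I=1, LL=2; READ=0, WRITE=1, PREFETCH=2; ACCESS=0, MISS=1) so it builds without `<linux/perf_event.h>`. The `k...` names are illustrative, not kernel identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror perf_hw_cache_id / perf_hw_cache_op_id / perf_hw_cache_op_result_id.
enum CacheId : uint64_t { kL1D = 0, kL1I = 1, kLL = 2 };
enum CacheOp : uint64_t { kRead = 0, kWrite = 1, kPrefetch = 2 };
enum CacheResult : uint64_t { kAccess = 0, kMiss = 1 };

// Same packing as CacheControl(L,O,R): id | (op << 8) | (result << 16).
constexpr uint64_t cacheConfig(CacheId id, CacheOp op, CacheResult res) {
  return static_cast<uint64_t>(id) | (static_cast<uint64_t>(op) << 8) |
         (static_cast<uint64_t>(res) << 16);
}

// Same packing as RawConfig(A,B): A in bits 8-15, B in bits 0-7.
constexpr uint64_t rawConfig(uint64_t a, uint64_t b) { return (a << 8) | b; }

static_assert(cacheConfig(kLL, kRead, kMiss) == 0x10002, "LL read miss");
```

Parenthesising the macro arguments, as in `(((A)<<8)|(B))`, would make `RawConfig` safe for expression arguments. The table only passes literals, so the unparenthesised form works there.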
@@ -0,0 +1,245 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/PerfCount.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <peterboyle@MacBook-Pro.local>
Author: paboyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_PERFCOUNT_H
#define GRID_PERFCOUNT_H
#include <sys/time.h>
#include <ctime>
#include <chrono>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#ifdef __linux__
#include <syscall.h>
#include <linux/perf_event.h>
#else
#include <sys/syscall.h>
#endif
#ifdef __x86_64__
#include <x86intrin.h>
#endif
namespace Grid {
#ifdef __linux__
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
int cpu, int group_fd, unsigned long flags)
{
int ret=0;
ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
group_fd, flags);
return ret;
}
#endif
#ifdef TIMERS_OFF
inline uint64_t cyclecount(void){
return 0;
}
#define __SSC_MARK(mark) __asm__ __volatile__ ("movl %0, %%ebx; .byte 0x64, 0x67, 0x90 " ::"i"(mark):"%ebx")
#define __SSC_STOP __SSC_MARK(0x110)
#define __SSC_START __SSC_MARK(0x111)
#else
#define __SSC_MARK(mark)
#define __SSC_STOP
#define __SSC_START
/*
* cycle counters arch dependent
*/
#ifdef __bgq__
inline uint64_t cyclecount(void){
uint64_t tmp;
asm volatile ("mfspr %0,0x10C" : "=&r" (tmp) );
return tmp;
}
#elif defined __x86_64__
inline uint64_t cyclecount(void){
return __rdtsc();
// unsigned int dummy;
// return __rdtscp(&dummy);
}
#else
inline uint64_t cyclecount(void){
return 0;
}
#endif
#endif
class PerformanceCounter {
private:
typedef struct {
public:
uint32_t type;
uint64_t config;
const char *name;
int normalisation;
} PerformanceCounterConfig;
static const PerformanceCounterConfig PerformanceCounterConfigs [];
public:
enum PerformanceCounterType {
CACHE_REFERENCES=0,
CACHE_MISSES=1,
CPUCYCLES=2,
INSTRUCTIONS=3,
L1D_READ_ACCESS=4,
PERFORMANCE_COUNTER_NUM_TYPES=19
};
public:
int PCT;
long long count;
long long cycles;
int fd;
int cyclefd;
unsigned long long elapsed;
uint64_t begin;
static int NumTypes(void){
return PERFORMANCE_COUNTER_NUM_TYPES;
}
PerformanceCounter(int _pct) {
#ifdef __linux__
assert(_pct>=0);
assert(_pct<PERFORMANCE_COUNTER_NUM_TYPES);
fd=-1;
cyclefd=-1;
count=0;
cycles=0;
PCT =_pct;
Open();
#endif
}
void Open(void)
{
#ifdef __linux__
struct perf_event_attr pe;
memset(&pe, 0, sizeof(struct perf_event_attr));
pe.size = sizeof(struct perf_event_attr);
pe.disabled = 1;
pe.exclude_kernel = 1;
pe.exclude_hv = 1;
pe.inherit = 1;
pe.type = PerformanceCounterConfigs[PCT].type;
pe.config= PerformanceCounterConfigs[PCT].config;
const char * name = PerformanceCounterConfigs[PCT].name;
fd = perf_event_open(&pe, 0, -1, -1, 0); // pid 0, cpu -1 current process any cpu. group -1
if (fd == -1) {
fprintf(stderr, "Error opening leader %llx for event %s\n",(long long) pe.config,name);
perror("Error is");
}
int norm = PerformanceCounterConfigs[PCT].normalisation;
pe.type = PerformanceCounterConfigs[norm].type;
pe.config= PerformanceCounterConfigs[norm].config;
name = PerformanceCounterConfigs[norm].name;
cyclefd = perf_event_open(&pe, 0, -1, -1, 0); // pid 0, cpu -1 current process any cpu. group -1
if (cyclefd == -1) {
fprintf(stderr, "Error opening leader %llx for event %s\n",(long long) pe.config,name);
perror("Error is");
}
#endif
}
void Start(void)
{
#ifdef __linux__
if ( fd!= -1) {
::ioctl(fd, PERF_EVENT_IOC_RESET, 0);
::ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
::ioctl(cyclefd, PERF_EVENT_IOC_RESET, 0);
::ioctl(cyclefd, PERF_EVENT_IOC_ENABLE, 0);
}
begin =cyclecount();
#else
begin = 0;
#endif
}
void Stop(void) {
count=0;
cycles=0;
#ifdef __linux__
ssize_t ign;
if ( fd!= -1) {
::ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
::ioctl(cyclefd, PERF_EVENT_IOC_DISABLE, 0);
ign=::read(fd, &count, sizeof(long long));
ign+=::read(cyclefd, &cycles, sizeof(long long));
assert(ign==(ssize_t)(2*sizeof(long long)));
}
elapsed = cyclecount() - begin;
#else
elapsed = 0;
#endif
}
void Report(void) {
#ifdef __linux__
int N = PerformanceCounterConfigs[PCT].normalisation;
const char * sn = PerformanceCounterConfigs[N].name ;
const char * sc = PerformanceCounterConfigs[PCT].name;
std::printf("tsc = %llu %s = %lld %s = %20lld\n (%s/%s) rate = %lf\n", elapsed,sn ,cycles,
sc, count, sc,sn, (double)count/(double)cycles);
#else
std::printf("%llu cycles \n", elapsed );
#endif
}
~PerformanceCounter()
{
#ifdef __linux__
::close(fd); ::close(cyclefd);
#endif
}
};
}
#endif
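`PerformanceCounter` opens two events: the requested counter and its normalisation counter. It resets and enables them in `Start()`, then disables and reads them in `Stop()`. The following is a stripped-down, single-event sketch of the same sequence. It assumes Linux with `<linux/perf_event.h>`, and `countInstructions` is a hypothetical helper. It returns -1 when the kernel refuses the counter, which is common in containers or when `perf_event_paranoid` is restrictive.

```cpp
#include <cassert>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: counts user-space instructions retired while running work().
// Returns -1 if the counter cannot be opened or read.
template <class Work>
long long countInstructions(Work work) {
  perf_event_attr pe;
  std::memset(&pe, 0, sizeof(pe));
  pe.size = sizeof(pe);
  pe.type = PERF_TYPE_HARDWARE;
  pe.config = PERF_COUNT_HW_INSTRUCTIONS;
  pe.disabled = 1;        // created stopped; enabled explicitly below
  pe.exclude_kernel = 1;  // user space only, as in PerformanceCounter::Open()
  pe.exclude_hv = 1;
  // pid 0, cpu -1: this process on any CPU; group_fd -1: no event group.
  int fd = static_cast<int>(
      syscall(SYS_perf_event_open, &pe, (pid_t)0, -1, -1, 0UL));
  if (fd == -1) {
    work();  // still run the work so callers see the same side effects
    return -1;
  }
  ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  work();
  ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  long long count = 0;
  ssize_t got = read(fd, &count, sizeof(count));
  close(fd);
  // Accept the value only if exactly one 64-bit count was read.
  return got == static_cast<ssize_t>(sizeof(count)) ? count : -1;
}
```

The final length check is the comparison that the `assert` in `Stop()` is meant to perform on its two reads.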
@@ -0,0 +1,245 @@
#include <Grid/GridCore.h>
#include <Grid/perfmon/PerfCount.h>
#include <Grid/perfmon/Stat.h>
namespace Grid {
bool PmuStat::pmu_initialized=false;
void PmuStat::init(const char *regname)
{
#ifdef __x86_64__
name = regname;
if (!pmu_initialized)
{
std::cout<<"initialising pmu"<<std::endl;
pmu_initialized = true;
pmu_init();
}
clear();
#endif
}
void PmuStat::clear(void)
{
#ifdef __x86_64__
count = 0;
tregion = 0;
pmc0 = 0;
pmc1 = 0;
inst = 0;
cyc = 0;
ref = 0;
tcycles = 0;
reads = 0;
writes = 0;
#endif
}
void PmuStat::print(void)
{
#ifdef __x86_64__
std::cout <<"Reg "<<std::string(name)<<":\n";
std::cout <<" region "<<tregion<<std::endl;
std::cout <<" cycles "<<tcycles<<std::endl;
std::cout <<" inst "<<inst <<std::endl;
std::cout <<" cyc "<<cyc <<std::endl;
std::cout <<" ref "<<ref <<std::endl;
std::cout <<" pmc0 "<<pmc0 <<std::endl;
std::cout <<" pmc1 "<<pmc1 <<std::endl;
std::cout <<" count "<<count <<std::endl;
std::cout <<" reads "<<reads <<std::endl;
std::cout <<" writes "<<writes <<std::endl;
#endif
}
void PmuStat::start(void)
{
#ifdef __x86_64__
pmu_start();
++count;
xmemctrs(&mrstart, &mwstart);
tstart = __rdtsc();
#endif
}
void PmuStat::enter(int t)
{
#ifdef __x86_64__
counters[0][t] = __rdpmc(0);
counters[1][t] = __rdpmc(1);
counters[2][t] = __rdpmc((1<<30)|0);
counters[3][t] = __rdpmc((1<<30)|1);
counters[4][t] = __rdpmc((1<<30)|2);
counters[5][t] = __rdtsc();
#endif
}
void PmuStat::exit(int t)
{
#ifdef __x86_64__
counters[0][t] = __rdpmc(0) - counters[0][t];
counters[1][t] = __rdpmc(1) - counters[1][t];
counters[2][t] = __rdpmc((1<<30)|0) - counters[2][t];
counters[3][t] = __rdpmc((1<<30)|1) - counters[3][t];
counters[4][t] = __rdpmc((1<<30)|2) - counters[4][t];
counters[5][t] = __rdtsc() - counters[5][t];
#endif
}
void PmuStat::accum(int nthreads)
{
#ifdef __x86_64__
tend = __rdtsc();
xmemctrs(&mrend, &mwend);
pmu_stop();
for (int t = 0; t < nthreads; ++t) {
pmc0 += counters[0][t];
pmc1 += counters[1][t];
inst += counters[2][t];
cyc += counters[3][t];
ref += counters[4][t];
tcycles += counters[5][t];
}
uint64_t region = tend - tstart;
tregion += region;
uint64_t mreads = mrend - mrstart;
reads += mreads;
uint64_t mwrites = mwend - mwstart;
writes += mwrites;
#endif
}
void PmuStat::pmu_fini(void) {}
void PmuStat::pmu_start(void) {}
void PmuStat::pmu_stop(void) {}
void PmuStat::pmu_init(void)
{
#ifdef _KNIGHTS_LANDING_
KNLsetup();
#endif
}
void PmuStat::xmemctrs(uint64_t *mr, uint64_t *mw)
{
#ifdef _KNIGHTS_LANDING_
ctrs c;
KNLreadctrs(c);
uint64_t emr = 0, emw = 0;
for (int i = 0; i < NEDC; ++i)
{
emr += c.edcrd[i];
emw += c.edcwr[i];
}
*mr = emr;
*mw = emw;
#else
*mr = *mw = 0;
#endif
}
#ifdef _KNIGHTS_LANDING_
struct knl_gbl_ PmuStat::gbl;
#define PMU_MEM
void PmuStat::KNLevsetup(const char *ename, int &fd, int event, int umask)
{
char fname[1024];
snprintf(fname, sizeof(fname), "%s/type", ename);
FILE *fp = fopen(fname, "r");
if (fp == 0) {
::fprintf(stderr, "Failed to open %s\n", fname);
::exit(EXIT_FAILURE);
}
int type;
int ret = fscanf(fp, "%d", &type);
assert(ret == 1);
fclose(fp);
// std::cout << "Using PMU type "<<type<<" from " << std::string(ename) <<std::endl;
struct perf_event_attr hw = {};
hw.size = sizeof(hw);
hw.type = type;
// see /sys/devices/uncore_*/format/*
// All of the events we are interested in are configured the same way, but
// that isn't always true. Proper code would parse the format files
hw.config = event | (umask << 8);
//hw.read_format = PERF_FORMAT_GROUP;
// unfortunately the above only works within a single PMU; might
// as well just read them one at a time
int cpu = 0;
fd = perf_event_open(&hw, -1, cpu, -1, 0);
if (fd == -1) {
::fprintf(stderr, "perf_event_open failed: CPU %d, box %s, event 0x%llx\n", cpu, ename, (unsigned long long) hw.config);
::exit(EXIT_FAILURE);
} else {
// std::cout << "event "<<std::string(ename)<<" set up for fd "<<fd<<" hw.config "<<hw.config <<std::endl;
}
}
void PmuStat::KNLsetup(void){
int ret;
char fname[1024];
// MC RPQ inserts and WPQ inserts (reads & writes)
for (int mc = 0; mc < NMC; ++mc)
{
::snprintf(fname, sizeof(fname), "/sys/devices/uncore_imc_%d",mc);
// RPQ Inserts
KNLevsetup(fname, gbl.mc_rd[mc], 0x1, 0x1);
// WPQ Inserts
KNLevsetup(fname, gbl.mc_wr[mc], 0x2, 0x1);
}
// EDC RPQ inserts and WPQ inserts
for (int edc=0; edc < NEDC; ++edc)
{
::snprintf(fname, sizeof(fname), "/sys/devices/uncore_edc_eclk_%d",edc);
// RPQ inserts
KNLevsetup(fname, gbl.edc_rd[edc], 0x1, 0x1);
// WPQ inserts
KNLevsetup(fname, gbl.edc_wr[edc], 0x2, 0x1);
}
// EDC HitE, HitM, MissE, MissM
for (int edc=0; edc < NEDC; ++edc)
{
::snprintf(fname, sizeof(fname), "/sys/devices/uncore_edc_uclk_%d", edc);
KNLevsetup(fname, gbl.edc_hite[edc], 0x2, 0x1);
KNLevsetup(fname, gbl.edc_hitm[edc], 0x2, 0x2);
KNLevsetup(fname, gbl.edc_misse[edc], 0x2, 0x4);
KNLevsetup(fname, gbl.edc_missm[edc], 0x2, 0x8);
}
}
uint64_t PmuStat::KNLreadctr(int fd)
{
uint64_t data;
size_t s = ::read(fd, &data, sizeof(data));
if (s != sizeof(uint64_t)){
::fprintf(stderr, "read counter returned %zu bytes\n", s);
::exit(EXIT_FAILURE);
}
return data;
}
void PmuStat::KNLreadctrs(ctrs &c)
{
for (int i = 0; i < NMC; ++i)
{
c.mcrd[i] = KNLreadctr(gbl.mc_rd[i]);
c.mcwr[i] = KNLreadctr(gbl.mc_wr[i]);
}
for (int i = 0; i < NEDC; ++i)
{
c.edcrd[i] = KNLreadctr(gbl.edc_rd[i]);
c.edcwr[i] = KNLreadctr(gbl.edc_wr[i]);
}
for (int i = 0; i < NEDC; ++i)
{
c.edchite[i] = KNLreadctr(gbl.edc_hite[i]);
c.edchitm[i] = KNLreadctr(gbl.edc_hitm[i]);
c.edcmisse[i] = KNLreadctr(gbl.edc_misse[i]);
c.edcmissm[i] = KNLreadctr(gbl.edc_missm[i]);
}
}
#endif
}
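`KNLevsetup` does two things for each uncore PMU. It reads the PMU's dynamic type id from `/sys/devices/<pmu>/type`, and it encodes the event as `event | (umask << 8)`. Its own comment notes that this layout is not universal; the authoritative layout lives in `/sys/devices/<pmu>/format/*`. The following is a small sketch of those two steps that returns an error instead of calling `exit`. `readPmuType` and `uncoreConfig` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Reads the dynamic PMU type id that the kernel publishes in <pmuDir>/type.
// Returns -1 if the file is missing or unparsable (e.g. not a KNL node).
int readPmuType(const std::string &pmuDir) {
  std::ifstream in(pmuDir + "/type");
  int type = -1;
  if (!(in >> type)) return -1;
  return type;
}

// Encodes an uncore event as KNLevsetup does: event code in bits 0-7,
// unit mask in bits 8-15. Other PMUs may differ; see <pmuDir>/format/*.
uint64_t uncoreConfig(uint64_t event, uint64_t umask) {
  return event | (umask << 8);
}
```

With this encoding, the EDC "MissE" event set up in `KNLsetup` (event 0x2, umask 0x4) becomes config 0x402.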
@@ -0,0 +1,104 @@
#ifndef _GRID_STAT_H
#define _GRID_STAT_H
#ifdef AVX512
#define _KNIGHTS_LANDING_ROOTONLY
#endif
namespace Grid {
///////////////////////////////////////////////////////////////////////////////
// Extra KNL counters from MCDRAM
///////////////////////////////////////////////////////////////////////////////
#ifdef _KNIGHTS_LANDING_
#define NMC 6
#define NEDC 8
struct ctrs
{
uint64_t mcrd[NMC];
uint64_t mcwr[NMC];
uint64_t edcrd[NEDC];
uint64_t edcwr[NEDC];
uint64_t edchite[NEDC];
uint64_t edchitm[NEDC];
uint64_t edcmisse[NEDC];
uint64_t edcmissm[NEDC];
};
// Peter/Azusa:
// Our modification of code provided by Larry Meadows from Intel.
// Confirmed by (non-NDA) email exchange that this is OK to publish on GitHub: it only
// uses the /sys/devices/ filesystem interface, which is already public in the Linux kernel for KNL.
struct knl_gbl_
{
int mc_rd[NMC];
int mc_wr[NMC];
int edc_rd[NEDC];
int edc_wr[NEDC];
int edc_hite[NEDC];
int edc_hitm[NEDC];
int edc_misse[NEDC];
int edc_missm[NEDC];
};
#endif
///////////////////////////////////////////////////////////////////////////////
class PmuStat
{
uint64_t counters[8][256];
#ifdef _KNIGHTS_LANDING_
static struct knl_gbl_ gbl;
#endif
const char *name;
uint64_t reads; // memory reads
uint64_t writes; // memory writes
uint64_t mrstart; // memory read counter at start of parallel region
uint64_t mrend; // memory read counter at end of parallel region
uint64_t mwstart; // memory write counter at start of parallel region
uint64_t mwend; // memory write counter at end of parallel region
// cumulative counters
uint64_t count; // number of invocations
uint64_t tregion; // total time in parallel region (from thread 0)
uint64_t tcycles; // total cycles inside parallel region
uint64_t inst, ref, cyc; // fixed counters
uint64_t pmc0, pmc1;// pmu
// add memory counters here
// temp variables
uint64_t tstart; // tsc at start of parallel region
uint64_t tend; // tsc at end of parallel region
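// layout of counters[i][t], i = counter, t = thread (at most 256 threads);
// enter(t) stores a snapshot, exit(t) replaces it with the delta
// 0 pmc0 (general-purpose counter 0)
// 1 pmc1 (general-purpose counter 1)
// 2 instructions retired (fixed counter 0)
// 3 core cycles (fixed counter 1)
// 4 reference cycles (fixed counter 2)
// 5 tsc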
static bool pmu_initialized;
public:
static bool is_init(void){ return pmu_initialized;}
static void pmu_init(void);
static void pmu_fini(void);
static void pmu_start(void);
static void pmu_stop(void);
void accum(int nthreads);
static void xmemctrs(uint64_t *mr, uint64_t *mw);
void start(void);
void enter(int t);
void exit(int t);
void print(void);
void init(const char *regname);
void clear(void);
#ifdef _KNIGHTS_LANDING_
static void KNLsetup(void);
static uint64_t KNLreadctr(int fd);
static void KNLreadctrs(ctrs &c);
static void KNLevsetup(const char *ename, int &fd, int event, int umask);
#endif
};
}
#endif
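`PmuStat` keeps one row of `counters` per counter and one column per thread. `enter(t)` stores a snapshot, `exit(t)` replaces it with the delta, and `accum(nthreads)` sums the deltas into the cumulative totals. The following is a portable sketch of that pattern. It uses `std::chrono::steady_clock` in place of `__rdpmc`/`__rdtsc`, and bounds-checked storage in place of the fixed 256-thread array. `RegionTimer` is hypothetical. Unlike `PmuStat`, which bumps `count` in `start()`, it counts invocations in `accum()`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-thread delta accumulator in the style of PmuStat.
class RegionTimer {
 public:
  explicit RegionTimer(int maxThreads)
      : delta_(static_cast<std::size_t>(maxThreads), 0) {}
  void enter(int t) { delta_.at(t) = now(); }                // snapshot
  void exit(int t) { delta_.at(t) = now() - delta_.at(t); }  // snapshot -> delta
  void accum(int nthreads) {                                 // fold deltas into totals
    for (int t = 0; t < nthreads; ++t) total_ += delta_.at(t);
    ++count_;
  }
  uint64_t total() const { return total_; }  // summed nanoseconds over threads
  uint64_t count() const { return count_; }  // number of accum() calls

 private:
  static uint64_t now() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
  }
  std::vector<uint64_t> delta_;  // .at() throws instead of overrunning
  uint64_t total_ = 0;
  uint64_t count_ = 0;
};
```

In a threaded region, each thread would call `enter(t)` and `exit(t)` with its own index, and a single thread would call `accum` afterwards.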
@@ -0,0 +1,111 @@
/*************************************************************************************
Grid physics library, www.github.com/paboyle/Grid
Source file: ./lib/Timer.h
Copyright (C) 2015
Author: Azusa Yamaguchi <ayamaguc@staffmail.ed.ac.uk>
Author: Peter Boyle <paboyle@ph.ed.ac.uk>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
See the full license in the file "LICENSE" in the top level distribution directory
*************************************************************************************/
/* END LEGAL */
#ifndef GRID_TIME_H
#define GRID_TIME_H
#include <sys/time.h>
#include <ctime>
#include <chrono>
namespace Grid {
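// usecond() returns wall-clock time in microseconds, or 0 when TIMERS_ON is not defined.
// The std::chrono facilities below may be the better C++11 alternative.
inline double usecond(void) {
struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 0;
#ifdef TIMERS_ON
gettimeofday(&tv,NULL);
#endif
return 1.0*tv.tv_usec + 1.0e6*tv.tv_sec;
}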
typedef std::chrono::system_clock GridClock;
typedef std::chrono::time_point<GridClock> GridTimePoint;
typedef std::chrono::milliseconds GridMillisecs;
typedef std::chrono::microseconds GridTime;
typedef std::chrono::microseconds GridUsecs;
inline std::ostream& operator<< (std::ostream & stream, const std::chrono::milliseconds & time)
{
stream << time.count()<<" ms";
return stream;
}
inline std::ostream& operator<< (std::ostream & stream, const std::chrono::microseconds & time)
{
stream << time.count()<<" usec";
return stream;
}
class GridStopWatch {
private:
bool running;
GridTimePoint start;
GridUsecs accumulator;
public:
GridStopWatch () {
Reset();
}
void Start(void) {
assert(running == false);
#ifdef TIMERS_ON
start = GridClock::now();
#endif
running = true;
}
void Stop(void) {
assert(running == true);
#ifdef TIMERS_ON
accumulator+= std::chrono::duration_cast<GridUsecs>(GridClock::now()-start);
#endif
running = false;
}
void Reset(void){
running = false;
#ifdef TIMERS_ON
start = GridClock::now();
#endif
accumulator = std::chrono::duration_cast<GridUsecs>(start-start);
}
GridTime Elapsed(void) {
assert(running == false);
return std::chrono::duration_cast<GridTime>( accumulator );
}
uint64_t useconds(void){
assert(running == false);
return (uint64_t) accumulator.count();
}
bool isRunning(void){
return running;
}
};
}
#endif
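`GridStopWatch` accumulates elapsed time across `Start()`/`Stop()` pairs and asserts on misuse. The following is a minimal sketch of the same accumulate-on-stop design. `StopWatch` is hypothetical, and it always times: there is no `TIMERS_ON` switch. It uses `std::chrono::steady_clock` rather than the `system_clock` behind `GridClock`, because a monotonic clock cannot run backwards when the wall clock is adjusted.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical accumulating stopwatch in the spirit of GridStopWatch.
class StopWatch {
 public:
  void start() {
    assert(!running_);
    t0_ = Clock::now();
    running_ = true;
  }
  void stop() {
    assert(running_);
    acc_ += std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - t0_);
    running_ = false;
  }
  void reset() {
    running_ = false;
    acc_ = std::chrono::microseconds::zero();
  }
  uint64_t useconds() const {  // total over all completed start/stop pairs
    assert(!running_);
    return static_cast<uint64_t>(acc_.count());
  }
  bool isRunning() const { return running_; }

 private:
  using Clock = std::chrono::steady_clock;
  Clock::time_point t0_{};
  std::chrono::microseconds acc_{0};
  bool running_ = false;
};
```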
@@ -0,0 +1,74 @@
/**
* pugixml parser - version 1.9
* --------------------------------------------------------
* Copyright (C) 2006-2018, by Arseny Kapoulkine (arseny.kapoulkine@gmail.com)
* Report bugs and download new versions at http://pugixml.org/
*
* This library is distributed under the MIT License. See notice at the end
* of this file.
*
* This work is based on the pugxml parser, which is:
* Copyright (C) 2003, by Kristen Wegner (kristen@tima.net)
*/
#ifndef HEADER_PUGICONFIG_HPP
#define HEADER_PUGICONFIG_HPP
// Uncomment this to enable wchar_t mode
// #define PUGIXML_WCHAR_MODE
// Uncomment this to enable compact mode
// #define PUGIXML_COMPACT
// Uncomment this to disable XPath
// #define PUGIXML_NO_XPATH
// Uncomment this to disable STL
// #define PUGIXML_NO_STL
// Uncomment this to disable exceptions
// #define PUGIXML_NO_EXCEPTIONS
// Set this to control attributes for public classes/functions, i.e.:
// #define PUGIXML_API __declspec(dllexport) // to export all public symbols from DLL
// #define PUGIXML_CLASS __declspec(dllimport) // to import all classes from DLL
// #define PUGIXML_FUNCTION __fastcall // to set calling conventions to all public functions to fastcall
// In absence of PUGIXML_CLASS/PUGIXML_FUNCTION definitions PUGIXML_API is used instead
// Tune these constants to adjust memory-related behavior
// #define PUGIXML_MEMORY_PAGE_SIZE 32768
// #define PUGIXML_MEMORY_OUTPUT_STACK 10240
// #define PUGIXML_MEMORY_XPATH_PAGE_SIZE 4096
// Uncomment this to switch to header-only version
// #define PUGIXML_HEADER_ONLY
// Uncomment this to enable long long support
// #define PUGIXML_HAS_LONG_LONG
#endif
/**
* Copyright (c) 2006-2018 Arseny Kapoulkine
*
* Permission is hereby granted, free of charge, to any person
* obtaining a copy of this software and associated documentation
* files (the "Software"), to deal in the Software without
* restriction, including without limitation the rights to use,
* copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following
* conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
File diff suppressed because it is too large

Some files were not shown because too many files have changed in this diff