1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-09-20 17:25:37 +01:00
Commit Graph

1273 Commits

Author SHA1 Message Date
Peter Boyle
5c90c3b457 Merge pull request #34 from aportelli/master
Polymorphic lattices & various small updates
2016-05-24 10:50:04 +01:00
91e04056f9 fix of the empty SIMD 2016-05-12 19:24:10 +01:00
3789e3f31c additional fixed in slice functions 2016-05-12 18:35:38 +01:00
0c66719210 const fix in slice functions 2016-05-12 13:01:35 +01:00
paboyle
3a5b5c8bec Save an old tar of tree 2016-05-12 03:20:17 -07:00
4bc21ec7cb thread CL argument fix 2016-05-11 15:21:29 +01:00
e3083b6dfc Merge commit 'ab894186589224d570e0ecef8eea06443194a8ab' 2016-05-11 15:20:41 +01:00
paboyle
ab89418658 Precision change going in; useful for mixed precision algorithms for example. 2016-05-11 15:18:47 +01:00
paboyle
28cd99882c Subslicing 2016-05-11 15:06:54 +01:00
paboyle
aceaee774c ExtractSlice / InsertSlice for lower dimensional lattices where the lattice is not
distributed in the orthogonal direction.
Useful for fermion 4d/5d etc..
2016-05-11 14:12:02 +01:00
101aa769eb LatticeBase contain the grid pointer and a virtual destructor to allow polymorphic lattice pointers 2016-05-04 12:15:31 -07:00
0bf99bfde5 log polish 2016-05-04 12:14:49 -07:00
64bf6fe54e macro to dump NERSC header to a stream 2016-05-04 12:14:38 -07:00
1161d566b9 minor code cleaning 2016-05-02 19:32:11 -07:00
c698b16d75 function to generate Chroma-style gamma matrix products 2016-05-01 18:30:35 -07:00
c4c89336fe SliceSum: shutting down warning about non-threaded code for now 2016-05-01 18:29:57 -07:00
fa59789580 ConjugateGradient: cleaner output 2016-05-01 18:29:20 -07:00
92c2c7d3b5 SchurRedBlackDiagMooeeSolve: fix: guess was not initialised from input 2016-05-01 16:07:55 -07:00
e99ce0875f directly exit when using '--help' option 2016-05-01 16:05:16 -07:00
paboyle
c23375cd65 Testing travis CI integration 2016-04-30 06:30:56 -07:00
paboyle
f7ca6ca889 Bernoulli reenabled -- using integral type for the discrete_distribution, but
then casts in the fill
2016-04-30 03:48:28 -07:00
paboyle
ec4a9b7f6c The Bernoulli gives a no compile due to a static assertion that the type be integral
in 4.7 random.h

Probably need to go through an Integer type, and then conver to real after the random draw
to make clean.
2016-04-30 03:42:24 -07:00
paboyle
5341977948 IMCI fixes. Thought I had committed these. The "real" disambiguation
between std::real and Grid::real shouldn't have been necessary and I don't
know why only the icpc v16.0 on babbage hits it.
May need a longer term rename of Grid::real or some careful EnableIf work.
2016-04-30 03:34:16 -07:00
f6c53e5039 Merge commit '1e554350acae0e67fa7177ed0db9d4f684a54af2' 2016-04-30 00:17:52 -07:00
ba09cbae3e function to read std::vector from a string (blank separated values) 2016-04-30 00:15:44 -07:00
6aa000176f Fermion <-> Propagator functions 2016-04-30 00:14:33 -07:00
23b6172c31 Bernoulli RNG 2016-04-30 00:14:13 -07:00
3f128443ab OS X icpc fix 2016-04-30 00:13:33 -07:00
paboyle
1e554350ac The threaded coms didn't agree with GCC. Suprised, and looks like GCC bug. 2016-04-29 16:49:18 -07:00
paboyle
c79ea0dcef Fixingn IMCI 2016-04-22 21:52:54 -07:00
paboyle
e3f141f82f Fixed SSE compile with typecasts 2016-04-22 10:30:30 -07:00
paboyle
a6dfa2386b GCC choked on intrinsics calls that ICPC did not 2016-04-22 06:33:41 -07:00
Peter Boyle
d9b5e66877 Update Make.inc 2016-04-20 18:25:48 +01:00
paboyle
8fd8bc25e9 simd 5th dim with rotation 2016-04-19 15:39:00 -07:00
paboyle
ba427abde9 simd 5d 2016-04-19 15:38:39 -07:00
paboyle
9b6ab6db16 simd in 5th dimension support 2016-04-19 15:38:01 -07:00
paboyle
806a83d38b simd in fifth dim support for dwf 2016-04-19 15:36:19 -07:00
paboyle
7223753355 Rotate in a direction > 2 for simd_layout 2016-04-19 15:35:15 -07:00
paboyle
b27bac4669 Updates for simd in one dir 2016-04-19 15:34:10 -07:00
paboyle
c8a93d6a93 Cartesian changes to allow all simd in one direction 2016-04-19 15:18:12 -07:00
paboyle
04072a5e1f Rotate is a temporary hack. Would like to merge ALL
permutes as rotates of length 2, and make any rotate active
over any subset of lane bits. This is hard, and requires general
permute; current intrinsics mean this is only really possible for specific
case by case encodings as presently performed. Intel could produce a general
permute.. would help. IBM did it in VMX.
2016-04-19 15:15:34 -07:00
paboyle
574ea4f843 const safety 2016-04-19 15:15:11 -07:00
paboyle
587f80cd93 Updated to compile and pass under intel SDE 2016-04-19 15:13:54 -07:00
paboyle
528eb773ad Merged.
Merge branch 'master' of https://github.com/paboyle/Grid
2016-04-19 22:24:34 +01:00
paboyle
e5657510b0 Rotate support for Ls simd-ized 2016-04-19 22:24:18 +01:00
paboyle
f473919526 Rotate support 2016-04-19 22:23:51 +01:00
Christopher Kelly
ab56ccdd25 -Complete and working implementation of Grid_empty 2016-04-15 13:17:42 -04:00
neo
339be37dba Debugging smeared HMC 2016-04-13 17:00:14 +09:00
neo
a87b744621 HMC runs but does not accept with smearing on 2016-04-07 16:45:11 +09:00
Christopher Kelly
a646260e82 Merge remote-tracking branch 'origin/master' into ckelly-dec12-2015 2016-04-06 13:57:28 -04:00
Christopher Kelly
af9c8d1372 -Checkerboard fixes for Lanczos 2016-04-06 13:50:56 -04:00
paboyle
b1192a8908 Benchmark_zmm added 2016-04-06 03:00:07 -07:00
paboyle
e8dddb1596 Adding extra benchmark 2016-04-06 10:32:54 +01:00
97d0d56bcb Debugging Smearing routines (set_fj) 2016-04-06 17:58:43 +09:00
paboyle
c7ba47bdc7 Merge branch 'master' of https://github.com/paboyle/Grid 2016-04-06 02:56:28 +01:00
7c7ea35ffb Putting the Traceless Antihermitian part outside the deriv in pseudofermion actions 2016-04-05 16:28:09 +09:00
4b1cf580e0 Debugging the Smearing routines 2016-04-05 16:19:30 +09:00
paboyle
e67fc2be18 Adding a trial for openmp overhead minimisation 2016-03-31 16:00:37 +01:00
paboyle
f473ef7591 Fixing the compile 2016-03-31 07:47:42 -07:00
paboyle
8052556275 Cleaning up the single/double kernel implementation switch 2016-03-31 14:51:32 +01:00
paboyle
60d965f79e AVX512 improvements; sigfpe trapping too 2016-03-30 08:42:34 +01:00
paboyle
83b15bfcdd Better Avx512 assembly sequence for SU3 using fmaddsub to get the imag imag sign 2016-03-30 08:39:39 +01:00
paboyle
1ecbf9794d Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-30 08:37:55 +01:00
paboyle
2ded354403 configure 2016-03-30 00:17:43 -07:00
paboyle
340428a1fe Eigen fixes and HDCR work 2016-03-30 00:16:02 -07:00
paboyle
c77b7ee897 AddSub based alternate SU3 routine 2016-03-28 17:55:22 -06:00
paboyle
b6c3bc574b Moving to a more coherent organisation of the inline assembly and arch dependencies. 2016-03-28 16:24:37 +01:00
paboyle
1e355a51e1 Interface change 2016-03-27 23:46:55 -07:00
paboyle
ad80f61fba AVX512 shaken out 2016-03-28 00:38:05 -06:00
paboyle
21abaf7e91 Gamma sign change 2016-03-28 00:35:45 -06:00
paboyle
165bffc2e7 Avx512 changes for assembler kernels 2016-03-26 22:25:45 -06:00
paboyle
644fd6d32e Build avx512 clean 2016-03-25 09:35:33 -07:00
azusa
f54e0ec9bd Try lanczos to set up hdcr subspace 2016-03-17 10:36:16 +00:00
paboyle
60d4564151 ICC no compile fix 2016-03-16 02:30:40 -07:00
paboyle
d4e57f4bc6 IO Bandwidth reporting 2016-03-16 02:30:16 -07:00
paboyle
3920b2c0ab HDCR updates 2016-03-16 02:29:58 -07:00
paboyle
2733c4b93c hdcr updates 2016-03-16 02:29:37 -07:00
paboyle
36a800f26c Microsecond granularity support 2016-03-16 02:28:51 -07:00
paboyle
b75da563d9 Resurrect timestamp. Should make optional 2016-03-16 02:28:17 -07:00
paboyle
f9faec38be Printing fix under comms none 2016-03-16 02:27:53 -07:00
paboyle
d6b64f47d9 Uint64 sum for IO rates 2016-03-16 02:27:22 -07:00
paboyle
a359f7a9f5 Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-11 16:07:07 -08:00
paboyle
b606deb3f0 Uint64 gsum 2016-03-11 16:06:54 -08:00
paboyle
090e7aa930 Merge remote-tracking branch 'origin/chulwoo-dec12-2015'
Merge Chulwoo's Lanczos related improvements.
Merge Nd!=4 fixes for pure gauge HMC from Evan.
2016-03-08 09:55:14 +00:00
paboyle
2dce9c3cff HDCR running on 16^3 with 2x-3x speed up. 2016-03-08 01:01:50 -08:00
paboyle
dc72293398 More timing info 2016-03-06 10:46:55 -08:00
paboyle
e55c35734b Fix a nocompile 2016-03-03 20:33:28 +00:00
paboyle
325e745daa Merge branch 'master' of https://github.com/paboyle/Grid 2016-03-02 07:04:03 -08:00
paboyle
61413565d0 Back off the inlined spin proj as not working 2016-03-02 07:03:09 -08:00
paboyle
ff129d9ad9 Redundant operations removed 2016-03-02 07:02:37 -08:00
paboyle
03fcd3b33a Back out of the colour 2016-03-02 07:01:15 -08:00
paboyle
68b02da483 Backing off the colour 2016-03-02 07:00:43 -08:00
paboyle
e051119769 extern "C" should have been in the header file, but Cray is apparently not C++ friendly. 2016-03-02 07:00:00 -08:00
2d8bb356e3 Smearing routines compile (still untested) 2016-02-25 02:43:59 +09:00
a7251f28c7 Stout smearing compiles (untested) 2016-02-24 03:16:50 +09:00
1eb169ac0b compatibility fix 2016-02-23 16:36:50 +00:00
5674c3e241 cycle count fix for x86 2016-02-23 16:08:18 +00:00
Antonin Portelli
497e7e4c53 BG/Q compatibility fix 2016-02-23 15:57:38 +00:00
19526d09c2 Merge commit '6aeaf6f568a391e34b913f08be6a11beb28d8842' 2016-02-22 15:23:26 +00:00
Peter Boyle
6aeaf6f568 Parallel IO worked on. I'm puzzled because I already thought I shook this out on MacOS + OpenMPI and then
turned up problems on the BlueWaters Cray.

Gets 75MB/s from home filesystem on parallel configuration read. Need to make the RNG IO parallel,
and also to look at aggregating bigger writes for the parallel write.
Not sure what the home filesystem is.
2016-02-21 08:03:21 -06:00
Peter Boyle
40f2db9bc0 Disable metropolis step until 10 traj covered. Should move to exposing these
in XML input and start having "applications" directory.
2016-02-21 08:01:44 -06:00
Peter Boyle
2cfa20cc4e Improving the logging, got fed up with color so optionally disable.
Backtrace macro used everwhere
2016-02-21 07:58:53 -06:00
Peter Boyle
a5f683d124 Machine generated 2016-02-21 07:57:42 -06:00
Jung
9f0d9ade68 Added configure flag for LAPACK. Tested ImplicitlyRestartedLanczos::calc()
Checking in before cleaning up
2016-02-20 02:50:32 -05:00
neo
c1b1b89d17 More on smearing routines, writing APEsmear (dev) 2016-02-19 17:15:27 +09:00
neo
771235017d Adding smearing routines (development) 2016-02-19 15:30:41 +09:00
paboyle
3425751cb8 Missing return value 2016-02-19 01:06:03 +00:00
paboyle
db5e8050a8 Attempts at some optimisation 2016-02-18 22:33:58 +00:00
paboyle
a3fbabf404 Bug fix 2016-02-18 18:08:24 +00:00
Peter Boyle
22422a84d9 Small problem in compressor fix 2016-02-17 19:03:09 -06:00
Peter Boyle
c9fadf97a5 Simplify the compressor interface again. 2016-02-17 18:16:45 -06:00
Peter Boyle
c650bb3f3d Very small merge speed up. 2016-02-16 18:41:53 -06:00
Peter Boyle
81395e85d1 Regressing to not overlap comms and compute becasue bluewaters, edison, and cori are so rubbish at it. 2016-02-16 13:56:44 -06:00
Peter Boyle
340a29b735 More careful sequencing of comms 2016-02-15 16:04:59 -06:00
Peter Boyle
a0fc47c6f9 Cheaper implementation 2016-02-15 16:02:36 -06:00
Peter Boyle
42a9ac71d2 BUg fix, wait till complete. 2016-02-14 16:21:21 -06:00
Peter Boyle
41c2b09184 Shmem comms [NO MPI] target added. The dwf test runs and passes.
Not really shaken out to my satisfaction though as I want more tests done, so don't declare as working.
But committing my current while I try a few experimentals.
2016-02-14 14:24:38 -06:00
paboyle
294dbf1bf0 Compile on OpenMPI shmem 2016-02-11 23:45:51 +00:00
Peter Boyle
9548c8b91f Had to break this out for universal access through the code base. 2016-02-11 07:40:09 -06:00
Peter Boyle
7f927a541c Shmem related fixes for shmem compile 2016-02-11 07:37:39 -06:00
paboyle
e2f73e3ead Updates for shmem 2016-02-10 16:50:32 -08:00
neo
6371676a75 Correcting some compilation errors for clang-sse 2016-02-10 11:37:03 +09:00
Jung
bd84c23298 definitions reconciled. 2016-01-25 16:30:59 -05:00
Jung
7aa8d5e8af Faiing to compile, comparing with master 2016-01-25 16:03:02 -05:00
Jung
6012b0ec23 Checking in changes before changing to chulwoo-dec12-2015 2016-01-25 09:40:58 -05:00
Jung
411ac49dd7 GparityWilsonTM typedef added. Not yet tested
Conflicts:
	configure
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-25 01:36:28 -05:00
Jung
b8fb05a422 Addtional routines for Lanczos (SYM2, Chebyshef).. 2016-01-25 01:26:25 -05:00
Jung
5c57d4f403 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-11 11:36:45 -05:00
paboyle
fc6ad65751 Pushed the overlap comms tweaks 2016-01-11 06:34:22 -08:00
paboyle
dafc74020c Overlap comms compute improvements in hand op kernels, and better timing from Edison and Cori 2016-01-10 16:54:27 -08:00
paboyle
d19321dfde Overlap comms compute changes 2016-01-10 19:20:16 +00:00
Jung
5924e5a562 Merge branch 'master' of https://github.com/paboyle/Grid into scidac1_2
Conflicts:
	configure
	lib/qcd/action/Actions.h
	lib/qcd/action/fermion/WilsonKernels.h
2016-01-06 03:44:57 -05:00
paboyle
c99d748da6 Timing reports in benchmarks now reflect the asynch comms thread statistics 2016-01-04 14:42:16 +00:00
paboyle
02452afd36 Optional overlap of comms with compute 2016-01-04 14:18:40 +00:00
paboyle
331768dcff Added overlap comms compute mode 2016-01-03 01:38:11 +00:00
paboyle
4aac345bea Updated logging to colour code according to message type 2016-01-02 17:21:14 +00:00
paboyle
15c0022042 GPLv2 clarified, and copyright message and banner in Init function.
Color is just showing off....
2016-01-02 15:22:30 +00:00
paboyle
aae8bf31a7 Global edit adding copyright and license info to every source file. 2016-01-02 14:51:32 +00:00
paboyle
1e68b1c1bd Create a benign default for gparity twists 2016-01-02 14:06:53 +00:00
paboyle
5a80930dd2 Charge conjugation boundary conditions for gauge fields implemented as a policy
class, changing the nature of covariant Cshifts used in
plaquettes, rectangles and staples.

As a result same code is used for the plaq and rect action independent of the BC type.

Should probably isolate the BC in a separate class that Gimpl takes as a template param.
Do the same with smearing policies.

This would then allow composition of BC with smearing etc....
2016-01-02 13:37:25 +00:00
paboyle
145a295231 Bug fix for stencil with large shifts (3+), would be important to naik term for example but did not
impact Wilson based nearest neighbour stencils.
2015-12-30 19:29:48 +00:00
paboyle
841a37f941 Fix to WilsonCompressor that fixes a bug in comms phase due to the sign change on gamma
matrix in hopping term.
Add logging of time spent in CG.
2015-12-29 23:49:41 +00:00
Azusa Yamaguchi
e6cad3821c Logging improvement 2015-12-29 19:51:18 +00:00
Azusa Yamaguchi
98de1cbb6a Optimised version of rectangle term staples.
~3.4x faster than the naive.
2015-12-29 19:22:59 +00:00
Azusa Yamaguchi
f7d61b8b81 Plaq plus rectangle and Iwasaki, Symanzik DBW2.
http://arxiv.org/pdf/hep-lat/0610075.pdf plaq and rect regress plausibly over 100 trajectories
and under HMC with average plaq and rectangle coming out ok.
2015-12-28 16:39:26 +00:00
Azusa Yamaguchi
78c4e862ef Plaq, Rectangle, Iwasaki, Symanzik and DBW2 workign and HMC regresses to http://arxiv.org/pdf/hep-lat/0610075.pdf 2015-12-28 16:38:31 +00:00
1e0be161e5 MacroMagic: inline functions to avoid double symbol issues 2015-12-23 14:20:05 +00:00
paboyle
0afcf1cf13 Moved all the HMC tests over to using a single HmcRunner class that manages checkpoint strategies and such like 2015-12-22 11:19:25 +00:00
paboyle
08edbb5cbe HMC bit repro across checkpoints. Fixed parallel RNG issue with threading.
Conclusion: c++11 distributions not thread safe and must us distinct dist as well as distinct engine
per site. Makes sense when you think of box muller. Also added a reset of dist on fill to ensure
repro across checkpoints.
2015-12-22 08:54:40 +00:00
paboyle
0abfbcc8eb Naming of files improvement. 2015-12-21 15:37:26 +00:00