1
0
mirror of https://github.com/paboyle/Grid.git synced 2025-06-10 11:26:56 +01:00
Commit Graph

1383 Commits

Author SHA1 Message Date
d35d63b171 Algorithm in 2015-11-04 04:27:44 -06:00
41299da406 files added 2015-10-09 01:01:46 +02:00
8889af45ca FMA4 added 2015-10-09 01:00:53 +02:00
d4289a33b8 AMD FMA4 addition 2015-10-09 00:44:20 +02:00
83afb2e26a Poly support for lanczos 2015-10-09 00:43:21 +02:00
3726fe7481 Bigger vec length 2015-10-09 00:42:54 +02:00
6d06bd9493 Minor change in commented out code 2015-10-09 00:42:21 +02:00
6ee23f409e Lanczos addition 2015-10-09 00:41:00 +02:00
2d95dac6b6 Lanczos untested/partially tested additions. In middle of shake out but at least compiles 2015-10-09 00:40:25 +02:00
44fecd4d8d Lanczos test 2015-10-09 00:39:21 +02:00
814c79f38d SIMD improvements for mac and madd use in complex for avx, sse 2015-10-09 00:38:52 +02:00
af89c40462 Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes. 2015-09-28 16:09:04 -07:00
9f4f65cb46 Added a decoupled memory system benchmark to remove thread synch overhead 2015-09-26 18:23:57 -07:00
64d64d1ab6 Updating to modify non-inlining permute routines and hopefully get better reg use and
enhance performance.
2015-09-25 08:55:04 -07:00
5ef42add2d Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
2f38ebc446 Reintroducing the hand unrolled loops 2015-09-08 17:45:30 +01:00
638d6675ee Tested rms dH is ~ dt^4 numerically, so believe the ForceGradient is correct now.
Paranoia makes me want to diddle with the FG step to ensure dt^2 reappears.
2015-08-31 16:33:20 +01:00
357c6ab46d Reunitarise. Complete the HMC and integrator changes. 2015-08-31 16:32:04 +01:00
755dca9533 Added ForceGradient integrator. dH dropped so seems to work. Will only
believe it is right once I have pulled a dt^4 error scaling plot out.
2015-08-31 06:23:02 +01:00
29fd004d54 Unified integrator and integrator algorithm into virtual class used as a policy for the
HMC.
2015-08-30 13:39:19 +01:00
eed889ea05 Update on todo list 2015-08-30 12:23:08 +01:00
aa52fdadcc Global edit on HMC sector -- making GaugeField a template parameter and
preparing to pass integrator, smearing, bc's as policy classes to hmc.

Propose to unify "integrator" and integrator algorithm in a base/derived
way to override step. Want to read through ForceGradient to ensure
that abstraction covers the force gradient case.
2015-08-30 12:18:34 +01:00
76d752585b Started a tidy up in the HMC sector. Now comfortable with the two level integrators;
to a little figure out what Guido had done & why -- but there is a neat saving of force
evaluations across the nesting time boundary making use of linearity of the leapP in dt.

I cleaned up the printing, reduced the volume of code, in the process sharing printing
between all integrators. Placed an assert that the total integration time for all integrators
must match at end of trajectory.

Have now verified e-dH = 1 for nested integrators in Wilson/Wilson runs with both
Omelyan and with Leapfrog so substantial confidence gained.
2015-08-29 17:18:43 +01:00
dc814f30da Binary IO file for generic Grid array parallel I/O.
Number of IO MPI tasks can be varied by selecting which
dimensions use parallel IO and which dimensions use Serial send to boss
I/O.

Thus can neck down from, say 1024 nodes = 4x4x8x8 to {1,8,32,64,128,256,1024} nodes
doing the I/O.

Interpolates nicely between ALL nodes write their data, a single boss per time-plane
in processor space [old UKQCD fortran code did this], and a single node doing all I/O.

Not sure I have the transfer sizes big enough and am not overly convinced fstream
is guaranteed to not give buffer inconsistencies unless I set streambuf size to zero.

Practically it has worked on 8 tasks, 2x1x2x2 writing /cloning NERSC configurations
on my MacOS + OpenMPI and Clang environment.

It is VERY easy to switch to pwrite at a later date, and also easy to send x-strips around from
each node in order to gather bigger chunks at the syscall level.

That would push us up to the circa 8x 18*4*8 == 4KB size write chunk, and by taking, say, x/y non
parallel we get to 16MB contiguous chunks written in multi 4KB transactions
per IOnode in 64^3 lattices for configuration I/O.

I suspect this is fine for system performance.
2015-08-26 13:40:29 +01:00
612957f057 pull in original license. 2015-08-21 10:19:08 +01:00
cea8ac9a22 Credits to orig source where I found the macro tricks. 2015-08-21 10:14:53 +01:00
476da3ee62 Separated IO reader/writers into a proper abstract base,
derived relationship. Have Text/Binary/Xml versions of
Reader & Writer.

Any new Reader/Writer class inheriting the interface can give object serialisation
to any desired format now.

      new file:   lib/serialisation/BaseIO.h
      modified:   lib/serialisation/BinaryIO.h
      modified:   lib/serialisation/Serialisation.h
      modified:   lib/serialisation/TextIO.h
      modified:   lib/serialisation/XmlIO.h

The test uses the Xml, Binary and Text formats as well as cout << Object.
2015-08-21 10:06:33 +01:00
35818fdf6c Text and Binary readers 2015-08-20 23:04:38 +01:00
091785e5f5 Better list 2015-08-20 17:19:48 +01:00
77d299b414 Cosmetic 2015-08-20 16:30:52 +01:00
ab81a25073 XMLReader implementation and a virtual Reader/Writer template framework.
Test_serialisation has an example of *code* *free* object serialisation
to both ostream and to XML using macro magic.

Implementing TextReader/TextWriter, YAML, JSON etc.. should be trivial
and we can use configure time options to select the default "Reader" typedef.

Present done with

"using XMLPolicy::Reader"

to pick up the default serialisation strategy.
2015-08-20 16:21:26 +01:00
fdfe194c41 Threading bug in RNG fill fixed. 2015-08-19 14:41:05 +01:00
8b070ae54c Gparity now accepting twists through constructor 2015-08-19 11:26:01 +01:00
4e085dd0ed Domain wall even-odd 2f HMC with wilson gauge and PV 2f ratio now running and giving small dH.
Azusa is working hard on the rectangle term and we'll hopefully start reproducing plaquettes
from RBC-UKQCD parameters soon !

My new laptop is pretty warm and is starting to groan ;)
2015-08-19 10:26:07 +01:00
e8d63c9178 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-19 05:49:00 +01:00
c54c086f17 Even odd preconditioned one flavour ratio
(no support for non-const EE schur block)
2015-08-19 05:46:58 +01:00
dd6bb73ee0 Added one flavour rational ratios (unprec) 2015-08-19 04:58:40 +01:00
fc160eeccc Added one flavour rational ratios (unprec) 2015-08-19 04:58:40 +01:00
48db72259e EvenOdd schur decomposed mpcdagmpc version of rhmc determinant.
dH is also small and plaquette looks right.
2015-08-18 18:37:39 +01:00
570150f1d3 EvenOdd schur decomposed mpcdagmpc version of rhmc determinant.
dH is also small and plaquette looks right.
2015-08-18 18:37:39 +01:00
9c7840c3a7 rhmc for 1+1 wilson is conserving dH~0.
A good days work  ;)
2015-08-18 16:58:56 +01:00
aef98b7226 rhmc for 1+1 wilson is conserving dH~0.
A good days work  ;)
2015-08-18 16:58:56 +01:00
5c364f8082 One flavour rational unprec added; untested but does compile.
Moving param structs into a single header for later connection to file I/O using
macromagic.h
2015-08-18 14:40:08 +01:00
a842a6c94d One flavour rational unprec added; untested but does compile.
Moving param structs into a single header for later connection to file I/O using
macromagic.h
2015-08-18 14:40:08 +01:00
2dd9ad7b0f Update TODO list 2015-08-18 10:43:32 +01:00
cd242a2637 Update TODO list 2015-08-18 10:43:32 +01:00
bdcbfe9310 Even Odd two flavour ratio added and dH == small 2015-08-18 10:37:08 +01:00
9306921ded Even Odd two flavour ratio added and dH == small 2015-08-18 10:37:08 +01:00
76f3855629 Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-18 09:23:58 +01:00
8621e2409f Merge branch 'master' of https://github.com/paboyle/Grid 2015-08-18 09:23:58 +01:00