4e65ad21ac
Adding a routine for AVX512 / IMCI with explicit assembly implementations
2015-11-04 03:15:08 -08:00
dfc1de6f60
Merge branch 'master' of github.com:paboyle/Grid
2015-11-04 05:14:26 -06:00
f87526a04f
Make ICC happy
2015-11-04 05:14:03 -06:00
3b7576ad53
Switch off for now
2015-11-04 05:13:29 -06:00
9b5d31ffc1
mac, mult routines
2015-11-04 03:10:34 -08:00
a38762159c
Inline assembly hooks for AVX 512. Better way in some ways than BAGEL to generate assembly.
Updated Grid_avx512.h
2015-11-04 03:09:06 -08:00
ffc5dab17f
AMD FMA4 support added for Interlagos/BlueWaters
2015-11-04 04:29:58 -06:00
96608c70d1
chrono causing some problems on Cray systems. Suspend use for now
2015-11-04 04:28:31 -06:00
d35d63b171
Algorithm in
2015-11-04 04:27:44 -06:00
9183920e8b
Added an even odd stencil test, shook out a problem with spread out x-direction.
Generalise test to allow different types of "Field" to be used.
2015-11-04 10:03:04 +00:00
01f286c9fe
Better testing for red black cshift which was sufficient to chase down a spread out x-direction problem.
2015-11-04 10:02:17 +00:00
24044dbc56
Debugged a problem with checkerboarded cshift in the checker dimension which arose
only when mpi spread out in the checker dimension. Added a test that trapped and helped debug this
2015-11-04 10:00:55 +00:00
abb23df83f
formatting only
2015-11-04 10:00:27 +00:00
12c5ec813c
Useful debug messages (commented out) are included for preservation in case I need to revisit this
2015-11-04 09:59:27 +00:00
1271508ca2
Bug fix for spread out in x (EO) direction.
This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
2015-11-04 09:57:57 +00:00
ec5af35166
EO bug fix when spread out in x-direction
2015-11-04 09:56:58 +00:00
b3d70a3bb2
Ncall change
2015-11-04 09:55:21 +00:00
c26220e9ab
EO benchmark as well as non-eo
2015-11-04 09:54:48 +00:00
0f59356e86
Problem in comms fixed
2015-11-02 00:00:15 +00:00
8709117aea
Log: generalised Logger class to allow separate logs in Grid-based applications
2015-10-27 17:31:13 +00:00
1b22ce5720
tests Make.inc fix
2015-10-27 10:47:52 +00:00
e6b9aa9076
Config.h removed from repository
2015-10-27 10:47:07 +00:00
d9f2e2e06a
Merge pull request #2 from paboyle/master
Update from Peter
2015-10-19 14:52:52 +01:00
41299da406
files added
2015-10-09 01:01:46 +02:00
8889af45ca
FMA4 added
2015-10-09 01:00:53 +02:00
d4289a33b8
AMD FMA4 addition
2015-10-09 00:44:20 +02:00
83afb2e26a
Poly support for lanczos
2015-10-09 00:43:21 +02:00
3726fe7481
Bigger vec length
2015-10-09 00:42:54 +02:00
6d06bd9493
Minor change in commented out code
2015-10-09 00:42:21 +02:00
6ee23f409e
Lanczos addition
2015-10-09 00:41:00 +02:00
2d95dac6b6
Lanczos untested/partially tested additions. In middle of shake out but at least compiles
2015-10-09 00:40:25 +02:00
44fecd4d8d
Lanczos test
2015-10-09 00:39:21 +02:00
814c79f38d
SIMD improvements for mac and madd use in complex for avx, sse
2015-10-09 00:38:52 +02:00
1878bf97d0
Babbage fix
2015-09-30 16:04:01 -07:00
3a478e5f2a
No compile babbage fix
2015-09-30 16:03:05 -07:00
a660ce716b
No compile babbage fix
2015-09-30 16:02:44 -07:00
f4b6d1dfea
NGO stores reenabled
2015-09-30 16:02:14 -07:00
23813ac798
No compile on babbage fix
2015-09-30 16:01:28 -07:00
af89c40462
Better timing tweaks to give sensible results on 24 threads on Edison dual ivybridge nodes.
2015-09-28 16:09:04 -07:00
9f4f65cb46
Added a decoupled memory system benchmark to remove thread synch overhead
2015-09-26 18:23:57 -07:00
64d64d1ab6
Updating to modify non-inlining permute routines and hopefully get better reg use and
enhance performance.
2015-09-25 08:55:04 -07:00
5ef42add2d
Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
2f38ebc446
Reintroducing the hand unrolled loops
2015-09-08 17:45:30 +01:00
638d6675ee
Tested rms dH is ~ dt^4 numerically, so believe the ForceGradient is correct now.
Paranoia makes me want to diddle with the FG step to ensure dt^2 reappears.
2015-08-31 16:33:20 +01:00
357c6ab46d
Reunitarise. Complete the HMC and integrator changes.
2015-08-31 16:32:04 +01:00
755dca9533
Added ForceGradient integrator. dH dropped so seems to work. Will only
believe it is right once I have pulled a dt^4 error scaling plot out.
2015-08-31 06:23:02 +01:00
29fd004d54
Unified integrator and integrator algorithm into virtual class used as a policy for the
HMC.
2015-08-30 13:39:19 +01:00
eed889ea05
Update on todo list
2015-08-30 12:23:08 +01:00
aa52fdadcc
Global edit on HMC sector -- making GaugeField a template parameter and
preparing to pass integrator, smearing, bc's as policy classes to hmc.
Propose to unify "integrator" and integrator algorithm in a base/derived
way to override step. Want to read through ForceGradient to ensure
that abstraction covers the force gradient case.
2015-08-30 12:18:34 +01:00
76d752585b
Started a tidy up in the HMC sector. Now comfortable with the two level integrators;
took a little to figure out what Guido had done & why -- but there is a neat saving of force
evaluations across the nesting time boundary making use of linearity of the leapP in dt.
I cleaned up the printing, reduced the volume of code, in the process sharing printing
between all integrators. Placed an assert that the total integration time for all integrators
must match at end of trajectory.
Have now verified exp(-dH) = 1 for nested integrators in Wilson/Wilson runs with both
Omelyan and with Leapfrog so substantial confidence gained.
2015-08-29 17:18:43 +01:00