ece86f717b
Checked performance of new vector libraries.
Added a check for C++11 support to configure.ac
2015-05-26 12:02:54 +09:00
1c862dc15b
Completed implementation of new Grid_simd classes
Tested performance for SSE4: OK.
AVX1/2 and AVX512 not yet tested.
2015-05-22 17:33:15 +09:00
9098d7d0a3
Merge remote-tracking branch 'upstream/master'
Conflicts:
lib/simd/Grid_vector_types.h
tests/Makefile.am
2015-05-20 17:32:46 +09:00
3a3f54932a
Implemented all SSE4 functions.
A test program, Grid_simd_new.cc, has been created to test the new class.
All tests pass.
2015-05-20 17:22:40 +09:00
3d66d00313
Merged
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
lib/simd/Grid_vector_types.h
2015-05-19 15:05:07 +01:00
ffc00caea3
Got unpreconditioned conjugate gradient to run and converge on a random (uniform random,
not even SU(3) for now) gauge field. Convergence history is correctly independent of decomposition
on 1, 2, 4, 8 and 16 MPI tasks.
Found a couple of SIMD bugs that required fixing, and enhanced the Grid_simd.cc test suite.
Implemented M, Mdag, MdagM, Meooe and Mooee (the Schur-type even/odd pieces) in the Wilson Dirac operator.
2015-05-19 13:57:35 +01:00
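A minimal sketch of unpreconditioned CG, for illustration only: it solves A x = b for a small dense symmetric positive-definite matrix standing in for MdagM (CG needs a Hermitian positive-definite operator, which is why it is normally applied to MdagM rather than to M). The names `conjugate_gradient`, `Vec` and `Op` are hypothetical and are not Grid's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Op = std::function<Vec(const Vec&)>;  // hypothetical operator type

static double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Unpreconditioned CG for A x = b, A symmetric positive definite.
// Stops when |r| <= tol * |b|; returns the iteration count, x holds the solution.
int conjugate_gradient(const Op& A, const Vec& b, Vec& x, double tol, int maxit) {
  Vec r = b, Ax = A(x);
  for (std::size_t i = 0; i < r.size(); ++i) r[i] -= Ax[i];  // r = b - A x
  Vec p = r;
  double rr = dot(r, r);
  const double stop = tol * tol * dot(b, b);
  for (int k = 0; k < maxit; ++k) {
    if (rr <= stop) return k;
    Vec Ap = A(p);
    const double alpha = rr / dot(p, Ap);
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * p[i];   // update solution
      r[i] -= alpha * Ap[i];  // update residual
    }
    const double rr_new = dot(r, r);
    const double beta = rr_new / rr;
    for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];  // new search direction
    rr = rr_new;
  }
  return maxit;
}
```

For a 2x2 system CG converges in at most two iterations in exact arithmetic, which makes it a convenient smoke test.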
b29caead32
Partial implementation of the vector types SIMD
Implementing SSE4 now.
A systematic series of tests still needs to be written.
2015-05-19 17:21:17 +09:00
cf9bbee256
Merge branch 'master' of https://github.com/coppolachan/Grid into coppolachan-master
Conflicts:
lib/algorithms/approx/bigfloat.h
2015-05-18 16:34:21 +01:00
2843264bd8
Remez tested
2015-05-18 12:09:25 +01:00
17e4e478cd
Minor modifications to configure.ac:
Enables silent rules (use make V=1 to override).
Prints a summary after configure completes.
2015-05-18 17:15:14 +09:00
cee363e28c
Corrected some compilation errors (zolotarev.h) and fixed SSE4 vsplat and conj so that the cshift test passes.
2015-05-18 16:48:14 +09:00
d0e4673a3f
Getting closer to having a Wilson solver: introducing a first, untested
cut at conjugate gradient. Also copied in Remez, Zolotarev and Chebyshev from
Mike Clark, Tony Kennedy and my BFM package respectively, since we know we will
need these. I wanted the structure of
algorithms/approx
algorithms/iterative
etc. to start taking shape.
2015-05-18 07:47:05 +01:00
a98f3e0f5e
Out of source compile now working
2015-05-15 12:21:40 +01:00
e6e72d23df
RNG test
2015-05-13 09:24:30 +01:00
add4495a4a
cout IO for all types
2015-05-13 09:24:10 +01:00
556befaaaa
Enhanced SIMD interfacing
2015-05-12 20:41:44 +01:00
c6baa3e657
Threading support rework.
Placed parallel pragmas behind macros; implemented deterministic thread reduction in the style of
BFM.
2015-05-12 07:51:41 +01:00
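One common way to make a threaded reduction deterministic is sketched below with std::thread rather than OpenMP pragmas; it is not necessarily how BFM or Grid does it. Each thread gets a fixed block of indices and its own partial-sum slot, and the partials are then combined in thread order. For a fixed thread count the floating-point additions always happen in the same order, so the result is bitwise reproducible from run to run. `deterministic_sum` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: sums data using nthreads workers with a static
// partition, then combines the per-thread partials in a fixed order.
double deterministic_sum(const std::vector<double>& data, int nthreads) {
  std::vector<double> partial(nthreads, 0.0);  // one slot per thread, no sharing
  std::vector<std::thread> workers;
  const std::size_t n = data.size();
  for (int t = 0; t < nthreads; ++t) {
    workers.emplace_back([&, t] {
      // Static block partition: thread t always owns the same index range.
      const std::size_t begin = n * t / nthreads;
      const std::size_t end = n * (t + 1) / nthreads;
      double s = 0.0;
      for (std::size_t i = begin; i < end; ++i) s += data[i];
      partial[t] = s;
    });
  }
  for (auto& w : workers) w.join();
  // Combine in thread order 0, 1, ..., nthreads-1 instead of using atomics
  // or a critical section, whose ordering depends on thread timing.
  double total = 0.0;
  for (int t = 0; t < nthreads; ++t) total += partial[t];
  return total;
}
```

Changing the thread count can still change the last bits of the result, since it changes the summation tree; reproducibility here is per thread count.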
6e6843ac69
Moving some things around for tidiness
2015-05-11 19:09:49 +01:00
c8dc8ff891
Adding a better-controlled threading class, in preparation for
forcing in deterministic reduction.
2015-05-11 18:59:03 +01:00
b613ed0bb8
Got command line args working
2015-05-11 14:36:48 +01:00
4eb08ac9de
Command-line parsing
2015-05-11 12:56:27 +01:00
b42453d1fd
Command-line args and a general cleanup
2015-05-11 12:43:10 +01:00
2203c6e597
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
4da2c2ea00
Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
79c51ac51f
Hack; must bring norm2 into the unary operator list.
Expression templates (ETs) are still incomplete.
2015-05-10 15:30:29 +01:00
7119bce9f3
Default to single node. Move to command line args.
2015-05-10 15:27:38 +01:00
cd90f55536
Default to single node. Should expose this as command-line args, but haven't yet sorted out
Grid_initialize to handle this. Should put this on the TODO list.
2015-05-10 15:26:06 +01:00
52403d587c
Wilson perf improvements with Gauge prefetching
2015-05-06 06:37:21 +01:00
cdd5cdeda2
Cleaned up for Linux
2015-05-05 22:09:22 +01:00
9d93d1e6d4
Comms and memory benchmarks added
2015-05-03 09:44:47 +01:00
6a39089a43
Starting a benchmarking sub dir
2015-05-02 17:52:36 +01:00
c0ead94791
Integrated Lebesgue code and have been playing with alternative implementations of the Wilson dop, without
any particular success in increasing performance.
2015-04-30 16:39:06 +01:00
d8ffa09e3b
Benchmarking Wilson dhop now: 14.6 Gflop/s on one core, not as fast as SU(3)xSU(3) [23 Gflop/s] but still not too shabby.
Disassembling the output shows ugly sequences in the permute sector. Could benchmark with and without
the if-else structure to see how much I'm losing.
Drops to 9 Gflop/s as it falls out of cache. Moving to Lebesgue ordering should help there. Substantive progress.
2015-04-29 06:50:18 +01:00
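Lebesgue ordering is the Z-order (Morton) space-filling curve. Below is a minimal 2-D sketch of the index computation, assuming 16-bit coordinates; it is not taken from Grid's Lebesgue code. Interleaving the coordinate bits keeps sites that are close on the lattice close in memory, which is why it is expected to help once the working set falls out of cache. `morton2d` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the bits of x and y: bit i of x goes to bit 2i,
// bit i of y goes to bit 2i+1 of the Morton index.
std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
  std::uint32_t code = 0;
  for (int i = 0; i < 16; ++i) {
    code |= static_cast<std::uint32_t>((x >> i) & 1u) << (2 * i);
    code |= static_cast<std::uint32_t>((y >> i) & 1u) << (2 * i + 1);
  }
  return code;
}
```

For example, the four sites of each aligned 2x2 block get consecutive indices: (0,0) -> 0, (1,0) -> 1, (0,1) -> 2, (1,1) -> 3, and the same pattern repeats recursively at larger block sizes.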
dcc23faa4a
Fixed the stencil sector; Wilson now agrees between the stencil-based implementation
and the cshift-based implementation. Managed to reduce the volume of code in this
sector a little, but further consolidation would be good, perhaps by moving common
logic out into simple helper functions.
2015-04-29 06:23:56 +01:00
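The stencil-versus-cshift cross-check can be illustrated with a toy 1-D nearest-neighbour operator computed two ways: once from two circular shifts, and once from a precomputed neighbour table, as a stencil would. This is a hypothetical sketch, not Grid's Cshift or stencil code, and the convention out(x) = in(x + shift) is assumed. Because both versions perform the same additions in the same order, their results should agree bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Field = std::vector<double>;

// out(x) = in(x + shift) with periodic boundaries (convention assumed here).
Field cshift(const Field& in, int shift) {
  const int L = static_cast<int>(in.size());
  Field out(in.size());
  for (int x = 0; x < L; ++x)
    out[x] = in[((x + shift) % L + L) % L];  // double modulo keeps the index non-negative
  return out;
}

// Toy 1-D hopping term, out(x) = in(x+1) + in(x-1), built from two cshifts.
Field hop_cshift(const Field& in) {
  const Field fwd = cshift(in, +1);
  const Field bwd = cshift(in, -1);
  Field out(in.size());
  for (std::size_t x = 0; x < in.size(); ++x) out[x] = fwd[x] + bwd[x];
  return out;
}

// Same operator via a precomputed neighbour table, in the spirit of a stencil.
Field hop_stencil(const Field& in) {
  const int L = static_cast<int>(in.size());
  std::vector<int> up(L), dn(L);
  for (int x = 0; x < L; ++x) {
    up[x] = (x + 1) % L;      // forward neighbour
    dn[x] = (x - 1 + L) % L;  // backward neighbour
  }
  Field out(in.size());
  for (int x = 0; x < L; ++x) out[x] = in[up[x]] + in[dn[x]];
  return out;
}
```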
b0485894b3
Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00
0b7d389258
Reworking CSHIFT and Stencil. Implementing Wilson revealed that a rework was required.
2015-04-27 13:45:07 +01:00
35cfef2129
Big updates with progress towards wilson matrix
2015-04-26 15:51:09 +01:00
2d8cf9e456
Added two spinor functionality required to support the Wilson hopping term.
2015-04-25 12:54:06 +01:00
fc32450360
Improved the gamma quite a bit.
Serial RNGs, which are set on node zero and broadcast.
2015-04-24 20:21:40 +01:00
71d5927a66
Vectors now supported too, plus right multiplication of a matrix by gamma
2015-04-24 19:08:29 +01:00
b8eef54fa7
First implementation of Dirac matrices as a Gamma class.
2015-04-24 18:20:03 +01:00
e2e3ea5742
Reorganised the TODO. Really getting somewhere
2015-04-23 20:42:30 +01:00
62e8d2d127
Slice summation working. May move this into lattice/Grid_lattice_reduction, however.
2015-04-23 15:13:00 +01:00
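Slice summation reduces a field to one value per coordinate along a chosen dimension, for example one sum per time slice. Below is a minimal single-node sketch, assuming a lexicographically stored field with the first dimension running fastest; `slice_sum`, `dims` and `orthog` are illustrative names, and a distributed version would additionally need a global sum across nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a lexicographically stored field over all sites that share the same
// coordinate in dimension `orthog`; returns dims[orthog] partial sums.
std::vector<double> slice_sum(const std::vector<double>& field,
                              const std::vector<int>& dims, int orthog) {
  std::vector<double> sums(dims[orthog], 0.0);
  for (std::size_t site = 0; site < field.size(); ++site) {
    // Peel coordinates off the lexicographic index, fastest dimension first,
    // until we reach the coordinate in dimension `orthog`.
    std::size_t rem = site;
    int coord = 0;
    for (int mu = 0; mu <= orthog; ++mu) {
      coord = static_cast<int>(rem % dims[mu]);
      rem /= dims[mu];
    }
    sums[coord] += field[site];
  }
  return sums;
}
```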
1851327d19
Got the NERSC IO working and fixed a bug in cshift.
2015-04-22 22:46:48 +01:00
a5b0c492d7
Rework of RNG to use C++11 <random>. Should correctly maintain parallel RNGs across
a machine. If a "fixedSeed" is used, random numbers should be reproducible across different machine
decompositions, since the generators are physically indexed and assigned in lexicographic order.
2015-04-19 14:55:58 +01:00
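A minimal single-process sketch of the reproducibility argument, assuming one std::mt19937 per site and a master generator, seeded with the fixed seed, that hands out per-site seeds in global lexicographic order. `make_site_rngs` and `Slab` are hypothetical names, and Grid's own seeding scheme may differ. Every "node" walks the whole global lattice and keeps only the generators for the sites it owns, so each site's generator depends only on its global position, not on how the lattice is decomposed.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical decomposition: this "node" owns sites with x0 <= x < x1.
struct Slab { int x0, x1; };

// One generator per local site, in local lexicographic order (x fastest).
std::vector<std::mt19937> make_site_rngs(std::uint32_t fixedSeed, int Lx, int Ly, Slab slab) {
  std::mt19937 master(fixedSeed);
  std::vector<std::mt19937> local;
  for (int y = 0; y < Ly; ++y)
    for (int x = 0; x < Lx; ++x) {                      // global lexicographic walk
      const std::uint32_t s = static_cast<std::uint32_t>(master());  // every node draws every seed...
      if (x >= slab.x0 && x < slab.x1)                  // ...but keeps only its own sites
        local.emplace_back(s);
    }
  return local;
}
```

Splitting a 4x2 lattice into two slabs gives the same generator at each global site as the undecomposed lattice, e.g. site (2,1) is index 6 of the full lattice and index 2 of the right-hand slab.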
1556c2ba3f
Finishing the reorg
2015-04-18 21:24:10 +01:00
e6ec92d0e4
More files, shorter each.
2015-04-18 20:45:00 +01:00
1674f899e0
Cleaning up
2015-04-18 16:42:47 +01:00
f678be5f94
Shaken out the peekIndex support.
Hardwired the constants SpinIndex, ColourIndex and LorentzIndex in Grid_QCD.h
2015-04-18 16:17:41 +01:00
3e3df092bb
Reorg of build structure
2015-04-18 14:55:00 +01:00