Peter Boyle
1771f97551
Key of mm_malloc.h
2015-05-15 11:32:11 +01:00
Peter Boyle
2eaf73e8b3
strong inline required to force icpc
2015-05-15 11:31:41 +01:00
Peter Boyle
43bdbb5080
Linear op added
2015-05-13 11:25:34 +01:00
Peter Boyle
7f3ae64a31
OMP dslash working
2015-05-13 10:59:22 +01:00
Peter Boyle
457cc0d5a3
RNG test
2015-05-13 09:24:30 +01:00
Peter Boyle
d388b831b4
cout IO for all types
2015-05-13 09:24:10 +01:00
Peter Boyle
b4a570477c
I have made the Cshift work successfully with OpenMP threading in
every routine. Collapse(2) is now working under clang-omp++.
2015-05-13 00:31:00 +01:00
Peter Boyle
52174da232
Enhanced SIMD interfacing
2015-05-12 20:41:44 +01:00
Peter Boyle
65c91eae64
Threading support rework.
Placed parallel pragmas as macros; implemented deterministic thread reduction in the style of BFM.
2015-05-12 07:51:41 +01:00
Peter Boyle
8b765be2b1
Moving some things around for pretty
2015-05-11 19:09:49 +01:00
Peter Boyle
a411b48a91
Adding a better controlled threading class, preparing to
force in deterministic reduction.
2015-05-11 18:59:03 +01:00
Peter Boyle
ebcb87abe1
Got command line args working
2015-05-11 14:36:48 +01:00
paboyle
1576b7837a
CML parse
2015-05-11 12:56:27 +01:00
paboyle
fa5779537c
Command line args and a general clean up
2015-05-11 12:43:10 +01:00
paboyle
5548fd6928
Updated to do list
2015-05-11 09:44:50 +01:00
Peter Boyle
242e447bc5
Lots of changes required to compile for MIC under ICPC
2015-05-10 23:29:21 +01:00
Peter Boyle
352bccf6ca
Merge branch 'master' of https://github.com/paboyle/Grid
Conflicts:
lib/qcd/Grid_qcd_wilson_dop.cc
2015-05-10 15:37:47 +01:00
Peter Boyle
c946e77143
Expression template hack
2015-05-10 15:35:30 +01:00
Peter Boyle
015fbee772
Expression template engine
2015-05-10 15:34:20 +01:00
Peter Boyle
8215893152
Updated TODO list
2015-05-10 15:32:56 +01:00
Peter Boyle
5fcf42cb30
Hack; must bring norm2 into the unary operator list.
ETs are still incomplete.
2015-05-10 15:30:29 +01:00
Peter Boyle
e647cf0459
Default to single node. Move to command line args.
2015-05-10 15:27:38 +01:00
Peter Boyle
8919bf9e0a
Single node default. Should expose this as command line args, but haven't sorted out
Grid_initialize to handle this. Should put this on the TODO list.
2015-05-10 15:26:06 +01:00
Peter Boyle
133493dc79
Small tweak to enable benchmarking to suppress gauge field bandwidth as a test.
This is a short term hack while I benchmark.
2015-05-10 15:25:23 +01:00
Peter Boyle
58d32a4d0e
Assertion should never hit, but did due to a bug
2015-05-10 15:24:37 +01:00
Peter Boyle
6bb17502f9
Moving operator stuff into separate file so that we can switch on/off replacement with
expression templates
2015-05-10 15:23:49 +01:00
Peter Boyle
8299bc39ea
Fixing breakage in the Comms non compile
2015-05-10 15:23:09 +01:00
Peter Boyle
7f04b85368
Bringing in expression templates for faster vector loops
2015-05-10 15:22:31 +01:00
Peter Boyle
a115f3b086
ET ready benchmark with bytes counted assuming loop interchange
2015-05-10 15:18:04 +01:00
Peter Boyle
27c2d13968
Updated todo list
2015-05-10 15:13:50 +01:00
Peter Boyle
5415180676
Wilson perf improvements with Gauge prefetching
2015-05-06 06:37:21 +01:00
Peter Boyle
7b0dd6c5d6
Cleaned up for Linux
2015-05-05 22:09:22 +01:00
Peter Boyle
cb4b82b09f
streaming store cases
2015-05-05 18:14:09 +01:00
Peter Boyle
cd990ba13d
Streaming store option
2015-05-05 18:13:06 +01:00
Peter Boyle
249165d1b2
Added streaming stores
2015-05-05 18:09:28 +01:00
Peter Boyle
b720222d98
Updated bandwidth test
2015-05-05 18:08:53 +01:00
Peter Boyle
0e8415de1b
Added a makefile
2015-05-05 17:56:42 +01:00
Peter Boyle
2b46ad38e2
Back to vector for now; cost of init loop is clear in the a*x + y
loop in memory benchmark and must move to better container class.
2015-05-03 09:48:13 +01:00
Peter Boyle
9d93d1e6d4
Comms and memory benchmarks added
2015-05-03 09:44:47 +01:00
Peter Boyle
253362f978
Added a comms benchmark
2015-05-02 23:51:43 +01:00
Peter Boyle
ea52562527
Added a comms benchmark
2015-05-02 23:42:30 +01:00
Peter Boyle
6a39089a43
Starting a benchmarking sub dir
2015-05-02 17:52:36 +01:00
Peter Boyle
bdf18941a2
Improving the byte swap support for portability
2015-05-01 10:57:33 +01:00
Peter Boyle
d904e2b9ac
Merge branch 'master' of https://github.com/paboyle/Grid
2015-04-30 16:40:13 +01:00
Peter Boyle
c0ead94791
Integrated Lebesgue code and have been playing with alternate implementations of the wilson dop without
any particular success in increasing the performance.
2015-04-30 16:39:06 +01:00
Peter Boyle
7ac997bd58
Merge pull request #1 from mspraggs/patch-1
Added <map> include to GridNerscIO.h
2015-04-30 09:46:48 +01:00
mspraggs
24fc71b2e9
Added <map> include to GridNerscIO.h
Adding this allows clang to compile Grid to completion.
2015-04-29 23:44:03 +01:00
Peter Boyle
d8ffa09e3b
Benchmark wilson dhop now; 14.6GF on one core, not as fast as SU(3)xSU(3) [23GF] but still not too shabby.
Disassembling output shows ugly sequences in the permute sector. Could comparatively benchmark with and without
the if-else structure to see how much I'm losing.
Drops to 9GF as it falls out of cache. Moving to Lebesgue ordering should help there. Substantive progress.
2015-04-29 06:50:18 +01:00
Peter Boyle
dcc23faa4a
Fixed the stencil sector and Wilson now agrees between stencil based implementation
and the cshift based implementation. Managed to reduce the volume of code in this
sector a little, but consolidation would be good, perhaps taking common
logic out into simple helper functions
2015-04-29 06:23:56 +01:00
Peter Boyle
b0485894b3
Shaken out stencil to the point where I think wilson dslash is correct.
Need to audit code carefully, consolidate between stencil and cshift,
and then benchmark and optimise.
2015-04-28 08:11:59 +01:00