1
0
mirror of https://github.com/paboyle/Grid.git synced 2024-11-10 07:55:35 +00:00
Commit Graph

970 Commits

Author SHA1 Message Date
paboyle
899ca41cb8 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/qcd/action/fermion/WilsonFermion5D.cc
2015-11-06 03:50:04 -08:00
paboyle
d29b4c1dee Assembler files 2015-11-06 03:48:48 -08:00
paboyle
a2ff068e29 Asm and threading for many core 2015-11-06 03:47:14 -08:00
paboyle
b362f8d27b Threading for many core 2015-11-06 03:46:41 -08:00
paboyle
64770d9052 Threading changes for many core and asm calls 2015-11-06 03:46:21 -08:00
paboyle
17af18dcab Changes for AVX512 assembler 2015-11-06 03:45:51 -08:00
Peter Boyle
28022755ae Stencil class name global change to StencilImpl typedef 2015-11-06 05:30:17 -06:00
Peter Boyle
955b482aaf Partial optimisation of the extraction/merger of simd vecs. 2015-11-06 05:26:20 -06:00
Peter Boyle
f9b2fce93b Changing whole stencil class to be template and not just single functions 2015-11-06 05:25:10 -06:00
Peter Boyle
473fa28a6c Partial optimisation; comms in x-dir for red black dslash will be slow as the checker skipping block strided
loops are non threadable. Will need to write a kernel for these instead and drive them with a lookup table
to make a look sufficiently simple to thread.
2015-11-06 05:23:23 -06:00
Peter Boyle
5d854c869c Stencil interface changes 2015-11-06 05:22:33 -06:00
Peter Boyle
880ff88362 Comms optimisation 2015-11-06 05:22:18 -06:00
Azusa Yamaguchi
4690acc3c8 Don't know why peter committed these as they didn't compile 2015-11-06 10:31:48 +00:00
Azusa Yamaguchi
3281745fde Exec info and linux check to stop non-portable code breaking 2015-11-06 10:31:24 +00:00
paboyle
1159de165c Asm option for AVX512 2015-11-05 22:04:51 -08:00
paboyle
16c7993434 Merge branch 'master' of github.com:paboyle/Grid
Conflicts:
	lib/simd/Grid_avx512.h
	lib/simd/Grid_imci.h
2015-11-04 03:32:10 -08:00
paboyle
6be9716e6f New file 2015-11-04 03:26:28 -08:00
paboyle
4a41c885ed Use Linux kernel interface to hardware performance counters. Dead useful. 2015-11-04 03:24:19 -08:00
paboyle
757b31ed42 Threading for KNC mods. 2015-11-04 03:22:14 -08:00
paboyle
ac7d1f26ad Either blocking or lebesgue curve 2015-11-04 03:19:16 -08:00
paboyle
1a8bf938b3 Use either sub-blocking or lebesgue 2015-11-04 03:18:51 -08:00
paboyle
63a2993827 Exec info an cache blocking 2015-11-04 03:16:56 -08:00
paboyle
4e65ad21ac Adding a routine for AVX512 / IMCI with explicit assembly implementations 2015-11-04 03:15:08 -08:00
Peter Boyle
dfc1de6f60 Merge branch 'master' of github.com:paboyle/Grid 2015-11-04 05:14:26 -06:00
Peter Boyle
3b7576ad53 Switch off for now 2015-11-04 05:13:29 -06:00
paboyle
9b5d31ffc1 mac , mult routines
Lines# with '#' will be ignored, and an empty message aborts the commit.
2015-11-04 03:10:34 -08:00
paboyle
a38762159c Inline assembly hooks for AVX 512. Better way in some ways than BAGEL to generate assembly.
Updated Grid_avx512.h
2015-11-04 03:09:06 -08:00
Peter Boyle
ffc5dab17f AMD FMA4 support added for Interlagos/BlueWaters 2015-11-04 04:29:58 -06:00
Peter Boyle
96608c70d1 chrono causing some problems on Cray systems. Suspend use for now 2015-11-04 04:28:31 -06:00
Peter Boyle
d35d63b171 Algorithm in 2015-11-04 04:27:44 -06:00
Peter Boyle
24044dbc56 Debugged a problem with checkerboarded cshift in the checker dimension which arose
only when mpi spread out in the checker dimension. Added a test that trapped and helped debug this
2015-11-04 10:00:55 +00:00
Peter Boyle
abb23df83f formatting only 2015-11-04 10:00:27 +00:00
Peter Boyle
12c5ec813c Useful debug messages (commented out) are included for preservation in case I need to revisit this 2015-11-04 09:59:27 +00:00
Peter Boyle
1271508ca2 Bug fix for spread out in x (EO) direction.
This is really annoying -- it is very hard to thread the loops with the index
recursion on buffer offset in the red-black case. Must think of a good threading
solution here.
2015-11-04 09:57:57 +00:00
Peter Boyle
ec5af35166 EO bug fix when spread out in x-direction 2015-11-04 09:56:58 +00:00
Peter Boyle
0f59356e86 Problem in comms fixed 2015-11-02 00:00:15 +00:00
Peter Boyle
8889af45ca FMA4 added 2015-10-09 01:00:53 +02:00
Peter Boyle
83afb2e26a Poly support for lanczos 2015-10-09 00:43:21 +02:00
Peter Boyle
6d06bd9493 Minor change in commented out code 2015-10-09 00:42:21 +02:00
Peter Boyle
6ee23f409e Lanczos addition 2015-10-09 00:41:00 +02:00
Peter Boyle
2d95dac6b6 Lanczos untested/partially tested additions. In middle of shake out but at least compiles 2015-10-09 00:40:25 +02:00
Peter Boyle
814c79f38d SIMD improvements for mac and madd use in complex for avx, sse 2015-10-09 00:38:52 +02:00
paboyle
1878bf97d0 Babbage fix 2015-09-30 16:04:01 -07:00
paboyle
a660ce716b No compile babbage fix 2015-09-30 16:02:44 -07:00
paboyle
f4b6d1dfea NGO stores reenabled 2015-09-30 16:02:14 -07:00
paboyle
23813ac798 No compile on babbage fix 2015-09-30 16:01:28 -07:00
Peter Boyle
9f4f65cb46 Added a decoupled memory system benchmark to remove thread synch overhead 2015-09-26 18:23:57 -07:00
Peter Boyle
64d64d1ab6 Updating to modify non-inlining permute routines and hopefully get better reg use and
enhance performance.
2015-09-25 08:55:04 -07:00
Peter Boyle
5ef42add2d Changes to remove warnings under icc; disambiguate AVX512 from IMCI correctly
and drop swizzles in AVX512. Don't know why these compiled.
2015-09-23 05:23:45 -07:00
Peter Boyle
2f38ebc446 Reintroducing the hand unrolled loops 2015-09-08 17:45:30 +01:00