Meifeng Lin
|
c2f8ba194e
|
Working simple OpenMP offloading with cudaMallocManaged; cshift not working
|
2021-09-29 15:23:13 -07:00 |
|
Meifeng Lin
|
229ce57fef
|
Added example config-command
|
2021-09-27 19:01:32 -04:00 |
|
Meifeng Lin
|
712b326e40
|
Added OpenMP target offloading support
|
2021-09-27 19:00:18 -04:00 |
|
Peter Boyle
|
ca9816bfbb
|
Typo
|
2021-09-21 04:12:04 +02:00 |
|
Peter Boyle
|
814d5abc7e
|
Merge branch 'develop' of https://github.com/paboyle/Grid into develop
|
2021-09-21 04:05:51 +02:00 |
|
Peter Boyle
|
a29122e2bf
|
Rebench
|
2021-09-21 04:05:04 +02:00 |
|
Peter Boyle
|
e188c0512e
|
Update
|
2021-09-21 01:04:30 +02:00 |
|
Peter Boyle
|
1fb6aaf150
|
Device 2 Device with cudaMemcpy
|
2021-09-21 01:03:07 +02:00 |
|
Peter Boyle
|
894654f7ef
|
Simplification, always gather faces
|
2021-09-21 01:02:34 +02:00 |
|
Peter Boyle
|
109507888b
|
Option to force use of MPI over Nvlink
|
2021-09-21 00:53:25 +02:00 |
|
Peter Boyle
|
68650b61fe
|
Options controlling behaviour
|
2021-09-21 00:51:01 +02:00 |
|
Peter Boyle
|
af98525766
|
Merge pull request #359 from paboyle/feature/serialisation-update
Feature/serialisation update
|
2021-09-16 10:24:52 -04:00 |
|
Peter Boyle
|
1c2f218519
|
Merge pull request #360 from pjgeorg/ld-nvcc-openmp
nvcc: Add -fopenmp to LDFLAGS
|
2021-09-16 10:24:30 -04:00 |
|
Peter Boyle
|
c9aa1f507c
|
Merge pull request #363 from felixerben/feature/testMesonField
Feature/test meson field
|
2021-09-16 10:23:58 -04:00 |
|
Peter Boyle
|
ea7126496d
|
Merge pull request #361 from edbennett/fix-setdevice-message
make message about setdevice consistent with configure script
|
2021-09-16 10:23:37 -04:00 |
|
Peter Boyle
|
f660dc67e4
|
Merge pull request #366 from lehner/feature/gpt
Avx512 mixed prec
|
2021-09-15 20:27:13 -04:00 |
|
Christoph Lehner
|
ede8faea74
|
Merge branch 'paboyle:develop' into feature/gpt
|
2021-09-16 02:23:15 +02:00 |
|
Christoph Lehner
|
1b750761c2
|
Merge pull request #26 from waterret/feature/gpt
AVX512 drop mixed precision as well
|
2021-09-16 02:22:52 +02:00 |
|
Peter Boyle
|
145acf2919
|
Perf results
|
2021-09-16 01:06:28 +01:00 |
|
Peter Boyle
|
cc4a27b9e6
|
Scripts and performance
|
2021-09-16 00:15:35 +01:00 |
|
Peter Boyle
|
b4690e6091
|
Adding build basics for different systems
|
2021-09-16 00:00:38 +01:00 |
|
Luchang Jin
|
4b24800132
|
AVX512 drop mixed precision as well
|
2021-09-15 16:29:47 -04:00 |
|
Peter Boyle
|
9d2238148c
|
Merge branch 'develop' of https://www.github.com/paboyle/Grid into develop
|
2021-09-15 19:25:57 +01:00 |
|
Peter Boyle
|
c15493218d
|
Two extra routines to break out SchurRedBlack on many RHS into stages to allow efficient deflation & split grid
Split grid solver still to do.
|
2021-09-15 19:24:39 +01:00 |
|
Peter Boyle
|
001a556a34
|
Merge pull request #365 from lehner/feature/gpt
Sync
|
2021-09-15 13:34:02 -04:00 |
|
Christoph Lehner
|
3d0f88e702
|
A64FX drop mixed precision as well
|
2021-09-15 18:38:32 +02:00 |
|
Christoph Lehner
|
dd091d0960
|
consistent pointer offloading instead of views
|
2021-09-15 16:58:05 +02:00 |
|
Christoph Lehner
|
e2abbf9520
|
Merge pull request #25 from paboyle/develop
Sync
|
2021-09-15 10:02:43 +02:00 |
|
Peter Boyle
|
402d80e197
|
Merge branch 'develop' of https://www.github.com/paboyle/Grid into develop
|
2021-09-14 16:16:06 +01:00 |
|
Peter Boyle
|
86e33c8ab2
|
Significant GPU perf speed up finished
|
2021-09-14 16:14:23 +01:00 |
|
Peter Boyle
|
5dae6a6dac
|
Deprecate half prec comms
|
2021-09-14 15:06:59 +01:00 |
|
Peter Boyle
|
361bb8a101
|
Remove half prec comms
|
2021-09-14 15:06:29 +01:00 |
|
Peter Boyle
|
7efdb3cd2b
|
Remove half prec comms
|
2021-09-14 15:06:06 +01:00 |
|
Peter Boyle
|
65ef4ec29f
|
Move tables to device memory
|
2021-09-14 15:05:01 +01:00 |
|
Peter Boyle
|
d5835c0222
|
Switch to coalesced stencil face gather
|
2021-09-14 15:04:14 +01:00 |
|
Peter Boyle
|
a7b943b33e
|
Remove half prec comms
|
2021-09-14 05:05:33 +01:00 |
|
Peter Boyle
|
7440cde92f
|
No half prec comms; coalesced access on GPU
|
2021-09-14 05:04:56 +01:00 |
|
Peter Boyle
|
0fc662bb24
|
Dirac cuda 11.4 happy; force host for functions accessing mult table
ET runs these on host BEFORE lodging result in AST for kernel
|
2021-09-14 05:00:44 +01:00 |
|
Peter Boyle
|
8195890640
|
Force MPI over NVLINK
|
2021-09-14 05:00:17 +01:00 |
|
Peter Boyle
|
4c88104a73
|
Fix compile warns
|
2021-09-11 23:08:05 +01:00 |
|
Peter Boyle
|
73b944c152
|
Drop half prec comms for now.
|
2021-09-11 23:07:18 +01:00 |
|
Peter Boyle
|
d1b0b7f5c6
|
Half prec comms dropping
|
2021-09-11 23:05:40 +01:00 |
|
Peter Boyle
|
381d8797d0
|
Drop half prec comms for now
|
2021-09-11 23:05:02 +01:00 |
|
Peter Boyle
|
b06526bc1e
|
Comment update
|
2021-08-30 21:15:39 -04:00 |
|
Peter Boyle
|
3044419111
|
Some sample code
|
2021-08-30 20:32:11 -04:00 |
|
Peter Boyle
|
114920b8de
|
Some example clean up
|
2021-08-25 12:24:17 +01:00 |
|
Peter Boyle
|
0d588b95f4
|
Bug fix to Example_Laplacian test
|
2021-08-23 23:14:26 +01:00 |
|
Peter Boyle
|
5b3c530aa7
|
Return value
|
2021-08-23 15:30:45 +01:00 |
|
Peter Boyle
|
c6a5499c8b
|
Fail on non-apple
|
2021-08-22 18:40:55 +01:00 |
|
Peter Boyle
|
ec9c3fe77a
|
Remove the file
|
2021-08-22 18:28:39 +01:00 |
|