3950X / 2080Ti / 9980XE: 150k-class lab compute machines - 3C

By Isabella
at 2020-05-08T18:58


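Last year I set budget aside to buy a 3950X in September,
but AMD slipped the launch and the 3900X was out of stock too,
so I ended up buying a 60k 9980XE plus a few 9900Ks.
(The 3950X didn't ship until November, too late for procurement to finish before the year-end close.)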

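Now let's test whether this year's budget should go to the 25k 3950X or the 35k 10980XE
(the 10980XE is essentially a refreshed 18-core 9980XE, so the 9980XE results serve as a proxy).
Intel probably won't squeeze out anything new in the short term,
so this post should stay relevant for a while.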

For details of the test software, see #1UjJiMol (PC_Shopping)

===

Test hardware
AMD Ryzen 9 3950X
Thermalright Silver Arrow IB-E Extreme
ASUS Pro WS X570-ACE
4x Kingston KVR32N22D8/32
2x GIGABYTE RTX2080Ti TURBO 11G (rev. 2.0)
MSI NVLink GPU Bridge 3-Slots
XPG SX8200Pro 1TB
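FSP CANNON 2000W
FSP CMT230
(I moved the case's two front fans up:
originally they blew at the drive/PSU bay and the graphics cards;
now the lower one blows at the graphics cards and the upper one is aimed at the M.2 drive.)
(Only after delivery did I notice the Gigabyte cards are rev. 2.0.
The difference between 1.0 and 2.0 is where the power connectors sit:
on 1.0 they are on the side, which is easier to install in a desktop,
though in a case that isn't wide enough the power cables may hit the side panel;
on 2.0 they are at the rear, which should suit rack-mount chassis better,
but they are also hard to fit in a case that isn't deep enough.)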

Intel Core i9-9980XE
Thermalright Silver Arrow IB-E Extreme
ASUS WS X299 PRO
8x A-DATA AD4U2666732G19-RGN
2x ASUS TURBO-RTX2080TI-11G
Quadro RTX 6000/8000 NVLink HB Bridge 2-Slot
ASUS HYPER M.2 X4 MINI CARD
└XPG SX8200Pro 1TB
FSP CANNON 2000W
MSI MPG GUNGNIR 100
(This case has little cable-routing space behind the motherboard tray,
and the front fans don't move much air.)

BIOS versions and settings
ASUS Pro WS X570-ACE 1302
PBO manual
PPT 1000W
TDC 1000A
EDC 1000A
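All else default
DDR4-3200 (22-22-22) 1.2V
(I suspect PBO in this BIOS version is buggy, so treat these results as indicative only.
Default: p95 all-core ~3.8GHz SSE2, ~3.4GHz AVX2
PBO Enabled: p95 AVX2 instantly goes to a black screen
1000/1000/1000: ~3.8GHz SSE2, ~3.8GHz AVX2
Manually tuned to 200~300A: ~4.0GHz SSE2, ~3.9GHz AVX2
Setting Max CPU Boost Clock Override to 200MHz leaves a bunch of cores stuck at 500MHz.)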

ASUS WS X299 PRO 2002
Long Duration Package Power Limit 4095W
Package Power Time Window 127s
Short Duration Package Power Limit 4095W
CPU Integrated VR Current Limit 1023.875A
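Front top fan (1) follows the VRM sensor
Front bottom fans (2, 3) follow the PCH sensor
Rear fan follows the PCH sensor
Fan curve: 20°C 20%, 65°C 70%, 70°C 100%
All else default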
DDR4-2666 (19-19-19) 1.2V
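To see what the X299 fan curve above does between its three points, here is a small hypothetical helper. It ASSUMES the BIOS interpolates linearly between points and holds the end values outside them; the post doesn't say how ASUS actually interpolates.

```python
# Hypothetical helper: evaluate the X299 fan curve (20°C 20%, 65°C 70%, 70°C 100%),
# ASSUMING the BIOS interpolates linearly between points and clamps outside them.
from bisect import bisect_right

FAN_CURVE = [(20.0, 20.0), (65.0, 70.0), (70.0, 100.0)]  # (sensor °C, duty %)


def fan_duty(temp_c: float, curve=FAN_CURVE) -> float:
    """Fan duty (%) for a sensor temperature under a piecewise-linear curve."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]           # at or below the first point: minimum duty
    if temp_c >= temps[-1]:
        return curve[-1][1]          # at or above the last point: maximum duty
    i = bisect_right(temps, temp_c)  # curve[i-1][0] <= temp_c < curve[i][0]
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (temp_c - t0) * (d1 - d0) / (t1 - t0)


if __name__ == "__main__":
    for t in (30, 50, 65, 68, 75):
        print(f"{t}°C -> {fan_duty(t):.1f}%")
```

Under that assumption the 65 to 70°C segment is the steep one: the fans go from 70% to full speed over just 5°C.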

Also ran
nvidia-smi -pm 1
nvidia-smi -pl 280
to enable persistence mode and raise the 2080 Ti power limit to 280W
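Before raising the limit it's worth checking the range the driver allows. A minimal sketch, assuming the range is read with nvidia-smi's documented `--query-gpu` fields (`power.limit`, `power.min_limit`, `power.max_limit`); the helper names are hypothetical, and the two commands it emits are the ones used above.

```python
# Sketch: validate a requested power limit against what nvidia-smi reports,
# then emit the two commands from the post. Helper names are hypothetical.
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,power.limit,power.min_limit,power.max_limit",
         "--format=csv,noheader,nounits"]


def parse_limits(csv_text: str) -> dict:
    """Map GPU index -> (current, min, max) power limit in watts."""
    limits = {}
    for line in csv_text.strip().splitlines():
        idx, cur, lo, hi = (field.strip() for field in line.split(","))
        limits[int(idx)] = (float(cur), float(lo), float(hi))
    return limits


def power_limit_commands(limits: dict, watts: float) -> list:
    """Refuse limits outside any GPU's reported range; otherwise build the commands."""
    for idx, (_, lo, hi) in limits.items():
        if not lo <= watts <= hi:
            raise ValueError(f"GPU {idx}: {watts} W outside {lo}-{hi} W")
    return [["nvidia-smi", "-pm", "1"],              # persistence mode on
            ["nvidia-smi", "-pl", str(int(watts))]]  # no -i, so every GPU


if __name__ == "__main__":
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"nvidia-smi not available: {err}")
    else:
        for cmd in power_limit_commands(parse_limits(out), 280):
            print(" ".join(cmd))
```

The script only prints the commands; `-pl` still needs root to actually apply them.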

OS
Ubuntu Server 20.04 LTS kernel 5.4.0-26
CUDA driver 440.64

Frequency / temperature / power readings
3950x
Temperature: sensors
Frequency and power: turbostat
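`sensors` (lm-sensors) reads the kernel's hwmon sysfs files, so the same 3950X temperature can be read directly. A minimal sketch, assuming the Ryzen sensor is exposed by the k10temp driver as `temp1_input` in millidegrees C; which label (Tctl or Tdie) sits on temp1 varies by kernel version.

```python
# Sketch: read a CPU temperature from /sys/class/hwmon, where lm-sensors gets it.
# Assumes the k10temp driver and temp1_input (millidegrees C); adjust per kernel.
from pathlib import Path


def read_hwmon_temp(driver: str = "k10temp", sensor: str = "temp1_input",
                    root: str = "/sys/class/hwmon") -> float:
    """Return °C from the first hwmon device whose `name` file matches `driver`."""
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if name_file.exists() and name_file.read_text().strip() == driver:
            return int((dev / sensor).read_text()) / 1000.0  # m°C -> °C
    raise FileNotFoundError(f"no hwmon device named {driver!r} under {root}")


if __name__ == "__main__":
    try:
        print(f"{read_hwmon_temp():.1f} °C")
    except FileNotFoundError as err:
        print(err)
```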

9980xe
Temperature, frequency and power: turbostat
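The CPU watt figures come from RAPL energy counters: sample the cumulative energy twice and divide by the elapsed time. turbostat reads the MSRs itself; this sketch does the same arithmetic through the Linux powercap sysfs files (`energy_uj`, `max_energy_range_uj`). The `intel-rapl:0` path is an assumption, may need root, and may be missing on AMD with older kernels.

```python
# Sketch of the energy-counter -> watts arithmetic behind turbostat's package power.
# Uses powercap sysfs; the domain path below is an assumption.
import time
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0")  # package 0 domain (assumed)


def average_watts(e1_uj: int, e2_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power between two energy readings, allowing one counter wraparound."""
    delta = e2_uj - e1_uj
    if delta < 0:               # the counter wrapped at max_energy_range_uj
        delta += max_range_uj
    return delta / 1e6 / dt_s   # µJ -> J, then J/s = W


def sample_package_watts(interval_s: float = 1.0) -> float:
    max_range = int((RAPL / "max_energy_range_uj").read_text())
    e1, t1 = int((RAPL / "energy_uj").read_text()), time.monotonic()
    time.sleep(interval_s)
    e2, t2 = int((RAPL / "energy_uj").read_text()), time.monotonic()
    return average_watts(e1, e2, t2 - t1, max_range)


if __name__ == "__main__":
    try:
        print(f"package: {sample_package_watts():.1f} W")
    except OSError as err:      # missing domain or no permission
        print(f"RAPL not readable: {err}")
```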

2080ti
Temperature, clock and power: nvidia-smi
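For logging the GPU side, here is a sketch that queries temperature, SM clock and board power through nvidia-smi's documented `--query-gpu` fields and parses the CSV. Sampling it at 1 s and 1 min into a run would give snapshots like the ones below; the helper names are hypothetical.

```python
# Sketch: read GPU temperature / SM clock / power via nvidia-smi CSV output.
import subprocess

FIELDS = "index,temperature.gpu,clocks.sm,power.draw"


def parse_sample(csv_text: str) -> list:
    """Parse `--format=csv,noheader,nounits` lines into dicts (°C, MHz, W)."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, temp, clock, power = (f.strip() for f in line.split(","))
        rows.append({"gpu": int(idx), "temp_c": float(temp),
                     "sm_mhz": float(clock), "power_w": float(power)})
    return rows


def query_gpus() -> list:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_sample(out)


if __name__ == "__main__":
    try:
        for row in query_gpus():
            print(row)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"nvidia-smi not available: {err}")
```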

Idle
3950x+2x2080ti
CPU 2200MHz 32°C 20W
GPU 300MHz 32°C 13W
Power strip 95W

9980xe+2x2080ti
CPU 1200MHz 34°C 12W
GPU 300MHz 35°C 10W
Power strip 95W

Prime95 Version 29.8 build 6
Small FFTs(L1/L2/L3)
3950x sse2
After 1 s
CPU 3826MHz 54.5°C 131W
Power strip 227W
After 1 min
CPU 3768MHz 62.5°C 125W
Power strip 218W
https://youtu.be/kDgSxc9guZc

3950x fma3
After 1 s
CPU 3775MHz 60.3°C 156W
Power strip 263W
After 1 min
CPU 3753MHz 72.5°C 161W
Power strip 271W
https://youtu.be/fZ3C3hk8TCk

9980xe sse2
After 1 s
CPU 3800MHz 66°C 257W
Power strip 418W
After 1 min
CPU 3800MHz 87°C 265W
Power strip 430W
https://youtu.be/WZj_AQrFpME

9980xe fma3
After 1 s
CPU 3300MHz 61°C 241W
Power strip 388W
After 1 min
CPU 3300MHz 80°C 243W
Power strip 395W
https://youtu.be/i1VyFFrVi0U

9980xe avx512
After 1 s
CPU 2800MHz 59°C 210W
Power strip 344W
After 1 min
CPU 2800MHz 74°C 208W
Power strip 343W
https://youtu.be/vKs5G91rL7c

1xGPU tensorflow resnet50 training fp16 batch128
1x2080ti on 3950x
After 1 s
GPU 1830MHz 48°C 283W
Power strip 416W
After 1 min
GPU 1815MHz 68°C 277W
Power strip 369W
https://youtu.be/T2XE2HlIeLg

1x2080ti on 9980xe
After 1 s
GPU 1875MHz 52°C 274W
Power strip 428W
After 1 min
GPU 1815MHz 78°C 262W
Power strip 388W
https://youtu.be/pGnW6Am8jaA

Prime95 + 2-GPU TensorFlow
3950x avx2 + 2x2080ti
Power strip 796W
https://youtu.be/MzaYkBRSAX0

9980xe sse2 + 2x2080ti
Power strip 946W
https://youtu.be/EAauv9QAHkQ

CPU theoretical throughput test
./2006-Core2 // SSE2: stands in for ordinary/legacy/ancient applications
./2013-Haswell // AVX/FMA3: stands in for highly optimized modern applications
./2017-SkylakePurley // AVX512: Intel's bonus round

       | SSE2 1T | SSE2 nT | AVX 1T | AVX nT  | FMA3 1T | FMA3 nT
3950x  | 44.928  | 995.664 | 78.912 | 1552.99 | 123.072 | 1791.36
9980xe | 35.04   | 546.144 | 62.016 | 948.384 | 123.264 | 1882.37
(SSE2 = 128-bit multiply + add, AVX = 256-bit multiply + add, FMA3 = 256-bit fused multiply-add;
results in GFLOPS, 1T = single thread, nT = all threads)

       | AVX512 1T | AVX512 nT
9980xe | 235.008   | 3227.14
(AVX512 = 512-bit fused multiply-add)
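These numbers can be sanity-checked with the usual peak formula: cores × clock × FLOP per cycle per core. A sketch under assumptions the post doesn't state: single-precision (32-bit) lanes, an FMA counted as 2 FLOP, and two FMA units per core on both chips (256-bit on Zen 2, 512-bit on the 9980XE). The 9980XE clocks plugged in are the all-core clocks seen under Prime95 above.

```python
# Sketch: theoretical peak = cores x GHz x FLOP/cycle/core.
# ASSUMED: single precision, FMA = 2 FLOP, two FMA units per core.

def flops_per_cycle(vector_bits: int, fma_units: int = 2, lane_bits: int = 32) -> int:
    lanes = vector_bits // lane_bits
    return lanes * 2 * fma_units          # multiply + add per lane per FMA unit


def peak_gflops(cores: int, ghz: float, vector_bits: int) -> float:
    return cores * ghz * flops_per_cycle(vector_bits)


def implied_ghz(measured_gflops: float, cores: int, vector_bits: int) -> float:
    """Back out the average clock a measured all-core result implies."""
    return measured_gflops / (cores * flops_per_cycle(vector_bits))


if __name__ == "__main__":
    print(f"9980XE AVX512 @2.8GHz: {peak_gflops(18, 2.8, 512):.1f} (measured 3227.14)")
    print(f"9980XE FMA3   @3.3GHz: {peak_gflops(18, 3.3, 256):.1f} (measured 1882.37)")
    print(f"3950X FMA3 implied clock: {implied_ghz(1791.36, 16, 256):.2f} GHz")
```

Under these assumptions the 9980XE's AVX512 lead is mostly the doubled vector width, partly eaten by the lower AVX512 clock.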

CPU compute benchmark
                  | Cholesky | Det     | Dot     | Fft  | Inv    | Lu     | Qr     | Svd
3950x pip         | 511.02   | 639.38  | 648.55  | 5.17 | 433.32 | 575.97 | 122.69 | 7.22
3950x mkl         | 585.48   | 624.04  | 247.31  | 5.29 | 285.64 | 536.54 | 333.59 | 11.15
3950x debug mkl   | 561.61   | 519.77  | 626.46  | 6.40 | 479.98 | 454.04 | 376.93 | 12.73
9980xe pip        | 597.74   | 699.82  | 766.01  | 3.91 | 483.11 | 573.80 | 160.14 | 11.59
9980xe mkl        | 820.29   | 1086.11 | 1355.97 | 3.74 | 712.80 | 749.14 | 366.21 | 14.17
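The pip/mkl rows look like Python numpy builds linked against different BLAS libraries (pip wheels bundle OpenBLAS), and "debug mkl" most likely means MKL run with `MKL_DEBUG_CPU_TYPE=5`, the then-common trick to force MKL's AVX2 path on AMD; the post doesn't say so explicitly, and its actual script and units are in #1UjJiMol. This is NOT that script, just a sketch of a similar timing loop. numpy has no LU routine (that would need `scipy.linalg.lu`), so Lu is left out.

```python
# Sketch of a numpy linear-algebra timing loop; reports seconds, not the post's units.
import time
import numpy as np


def make_spd(n: int, rng) -> np.ndarray:
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)   # symmetric positive definite, so Cholesky works


def bench(n: int = 1000, repeats: int = 3, seed: int = 0) -> dict:
    """Best wall-clock time (s) per operation on n x n float64 matrices."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    spd = make_spd(n, rng)
    ops = {
        "Cholesky": lambda: np.linalg.cholesky(spd),
        "Det": lambda: np.linalg.det(a),
        "Dot": lambda: a @ a,
        "Fft": lambda: np.fft.fft2(a),
        "Inv": lambda: np.linalg.inv(a),
        "Qr": lambda: np.linalg.qr(a),
        "Svd": lambda: np.linalg.svd(a, compute_uv=False),
    }
    best = {}
    for name, fn in ops.items():
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn()
            times.append(time.perf_counter() - t0)
        best[name] = min(times)      # best-of-N reduces noise from other load
    return best


if __name__ == "__main__":
    np.show_config()                 # shows which BLAS/LAPACK numpy is linked to
    for name, secs in bench().items():
        print(f"{name:9s} {secs * 1e3:8.1f} ms")
```

Running it once with the default environment and once with the MKL variable set is a quick way to reproduce the kind of comparison in the table.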

I/O test (IOPS in parentheses)
           | 3950x            | 9980xe
1MSeqQ8T1r | 2784MB/s         | 2387MB/s
1MSeqQ8T1w | 2867MB/s         | 2324MB/s
1MSeqQ1T1r | 2779MB/s         | 2405MB/s
1MSeqQ1T1w | 2834MB/s         | 2283MB/s
4kQ32T16r  | 697MB/s (170k)   | 655MB/s (160k)
4kQ32T16w  | 1498MB/s (366k)  | 1492MB/s (364k)
4kQ1T1r    | 79.7MB/s (19.5k) | 65.9MB/s (16.1k)
4kQ1T1w    | 234MB/s (57.1k)  | 230MB/s (56.2k)
(Both SSDs are new and both are attached to the CPU's PCIe lanes,
so the gap is probably the cost of the mitigations for Intel's CPU vulnerabilities.)
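The IOPS figures are just throughput divided by block size. The table is consistent with MB = 10^6 bytes and 4 KiB (4096-byte) blocks, which is the assumption in this sketch.

```python
# Sketch: convert between MB/s and IOPS, assuming MB = 1e6 bytes and 4 KiB blocks.
BLOCK_4K = 4096


def iops(mb_per_s: float, block_bytes: int = BLOCK_4K) -> float:
    return mb_per_s * 1e6 / block_bytes


def mb_per_s(iops_value: float, block_bytes: int = BLOCK_4K) -> float:
    return iops_value * block_bytes / 1e6


if __name__ == "__main__":
    for label, mbps in [("3950x 4kQ32T16r", 697), ("9980xe 4kQ32T16r", 655),
                        ("3950x 4kQ1T1r", 79.7), ("9980xe 4kQ1T1r", 65.9)]:
        print(f"{label}: {iops(mbps) / 1000:.1f}k IOPS")
```

For QD1 4K reads the 3950X machine does about 21% more IOPS (79.7 vs 65.9 MB/s), the largest relative gap in the table.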

nvidia-smi topo -m

3950x
        GPU0  GPU1  CPU Affinity
GPU0    X     NV2   0-31
GPU1    NV2   X     0-31

9980xe
        GPU0  GPU1  CPU Affinity
GPU0    X     NV2   0-35
GPU1    NV2   X     0-35

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between
NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe
Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically
the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the
PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
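If a training script needs to confirm the cards are actually talking over NVLink, the matrix can be parsed into a lookup table. A sketch assuming the layout shown above (a header row, then one row per GPU); it also strips ANSI styling in case the driver underlines the header, and skips the legend lines.

```python
# Sketch: parse `nvidia-smi topo -m` text into {(src, dst): link_type}.
import re

ANSI = re.compile(r"\x1b\[[0-9;]*m")


def parse_topo(text: str) -> dict:
    rows = [ln.split() for ln in ANSI.sub("", text).strip().splitlines() if ln.strip()]
    gpus = [h for h in rows[0] if h.startswith("GPU")]   # header: GPU0 GPU1 ...
    links = {}
    for row in rows[1:]:
        if not row[0].startswith("GPU"):                  # legend or NIC rows
            continue
        for dst, link in zip(gpus, row[1:1 + len(gpus)]):
            if dst != row[0]:
                links[(row[0], dst)] = link
    return links


SAMPLE_3950X = """
GPU0 GPU1 CPU Affinity
GPU0 X NV2 0-31
GPU1 NV2 X 0-31
"""

if __name__ == "__main__":
    print(parse_topo(SAMPLE_3950X)[("GPU0", "GPU1")])
```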

nvidia-smi topo -mp

3950x
        GPU0  GPU1  CPU Affinity
GPU0    X     PHB   0-31
GPU1    PHB   X     0-31

9980xe
        GPU0  GPU1  CPU Affinity
GPU0    X     SYS   0-35
GPU1    SYS   X     0-35

p2pBandwidthLatencyTest

3950x
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 529.77 6.24
1 6.25 531.67
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 530.74 46.92
1 46.93 531.33
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 533.64 11.11
1 11.10 535.07
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 533.64 93.47
1 93.68 532.94
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.90 15.96
1 12.55 1.93

CPU 0 1
0 2.82 7.58
1 7.61 3.00
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 1.90 2.04
1 2.06 1.94

CPU 0 1
0 3.07 2.50
1 2.51 3.06

9980xe
Unidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 528.38 11.23
1 11.24 531.12
Unidirectional P2P=Enabled Bandwidth (P2P Writes) Matrix (GB/s)
D\D 0 1
0 530.90 46.94
1 46.97 531.39
Bidirectional P2P=Disabled Bandwidth Matrix (GB/s)
D\D 0 1
0 535.18 20.01
1 20.07 534.61
Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 533.96 93.68
1 93.53 532.69
P2P=Disabled Latency Matrix (us)
GPU 0 1
0 1.88 15.22
1 13.27 1.83

CPU 0 1
0 2.59 6.90
1 6.93 2.51
P2P=Enabled Latency (P2P Writes) Matrix (us)
GPU 0 1
0 1.88 1.77
1 1.75 1.84

CPU 0 1
0 2.73 1.93
1 1.92 2.56
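A rough yardstick for the bandwidth matrices is to compare them with nominal link rates. The assumptions here are mine, not the post's: NVLink 2.0 at 25 GB/s per link per direction (NV2 = 2 links), PCIe 3.0 at 8 GT/s with 128b/130b encoding, and the X570 board running the two cards at x8/x8 versus x16/x16 on the X299 board. P2P=Disabled traffic is staged through host memory, so it is not a pure link test.

```python
# Sketch: measured P2P bandwidth vs nominal link rates (all rates ASSUMED, see above).
NVLINK2_GBPS_PER_LINK = 25.0


def pcie3_gbps(lanes: int) -> float:
    return lanes * 8.0 * (128 / 130) / 8   # GT/s x encoding efficiency / 8 bits -> GB/s


def efficiency(measured: float, nominal: float) -> float:
    return measured / nominal


if __name__ == "__main__":
    nv = 2 * NVLINK2_GBPS_PER_LINK
    print(f"NV2: {nv:.0f} GB/s nominal, 46.92 measured ({efficiency(46.92, nv):.0%})")
    for name, lanes, measured in [("3950x", 8, 6.24), ("9980xe", 16, 11.23)]:
        nominal = pcie3_gbps(lanes)
        print(f"{name} P2P off, x{lanes}: {nominal:.2f} GB/s nominal, "
              f"{measured} measured ({efficiency(measured, nominal):.0%})")
```

If the lane-width assumption holds, it would explain why the 9980XE box roughly doubles the P2P=Disabled numbers while the NVLink numbers are identical on both.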

TensorFlow test: resnet50
1x2080Ti
       | fp32 batch64 | fp32 batch128 | fp16 batch64 | fp16 batch128 | fp16 batch256
3950x  | 266.86       | 240.54        | 669.76       | 683.40        | 566.51
9980xe | 269.42       | 264.58        | 672.30       | 685.81        | 640.76

2x2080ti fp32
       | batch32  | batch64   | batch128
       | global64 | global128 | global256
3950x  | 540.22   | 592.19    | 387.67
9980xe | 541.25   | 597.76    | 486.02

2x2080ti fp16
       | batch32  | batch64   | batch128  | batch256
       | global64 | global128 | global256 | global512
3950x  | 1103.82  | 1333.90   | 1479.71   | 1180.18
9980xe | 1078.67  | 1288.72   | 1400.09   | 1333.67
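One way to read the two-GPU tables against the single-GPU one is scaling efficiency: 2-GPU throughput / (2 × 1-GPU throughput) at the same per-GPU batch. This sketch reads the two header rows as per-GPU batch and global batch, and uses the fp16 numbers from the tables above; a value above 1.0 means the two-GPU run did more than twice the single-GPU throughput.

```python
# Sketch: two-GPU scaling efficiency from the fp16 resnet50 tables.
ONE_GPU_FP16 = {   # per-GPU batch -> throughput, 1x2080Ti table
    "3950x": {64: 669.76, 128: 683.40, 256: 566.51},
    "9980xe": {64: 672.30, 128: 685.81, 256: 640.76},
}
TWO_GPU_FP16 = {   # per-GPU batch -> throughput, 2x2080ti fp16 table
    "3950x": {64: 1333.90, 128: 1479.71, 256: 1180.18},
    "9980xe": {64: 1288.72, 128: 1400.09, 256: 1333.67},
}


def scaling_efficiency(one_gpu: float, two_gpu: float, n_gpus: int = 2) -> float:
    return two_gpu / (n_gpus * one_gpu)


if __name__ == "__main__":
    for cpu, by_batch in ONE_GPU_FP16.items():
        for batch, single in by_batch.items():
            eff = scaling_efficiency(single, TWO_GPU_FP16[cpu][batch])
            print(f"{cpu} per-GPU batch {batch}: {eff:.2f}")
```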

PyTorch with AMP (Apex) test
bert (run time, mm:ss) | fp32     | fp16
3950x 2x2080ti         | 00:26.38 | 00:26.22
9980xe 2x2080ti        | 00:29.67 | 00:34.92
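Since these are run times, lower is better. A small sketch converting the mm:ss.xx values to seconds and comparing the two machines; a ratio above 1 means the 9980XE machine took longer.

```python
# Sketch: compare the BERT run times from the table above.

def to_seconds(mmss: str) -> float:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


BERT = {"3950x": {"fp32": "00:26.38", "fp16": "00:26.22"},
        "9980xe": {"fp32": "00:29.67", "fp16": "00:34.92"}}

if __name__ == "__main__":
    for precision in ("fp32", "fp16"):
        amd = to_seconds(BERT["3950x"][precision])
        intel = to_seconds(BERT["9980xe"][precision])
        print(f"{precision}: 9980xe/3950x time ratio = {intel / amd:.2f}")
```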

===

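So in this price range (100k~200k), if the budget allows
and you need multi-core CPU math performance or lots of RAM, buy the 10980XE.
Quad-channel memory, AVX512's doubled throughput and MKL optimization are no joke.
RAM capacity is also twice as large (256GB with 8 DIMMs vs 128GB with 4).
With an ASUS WS X299 PRO/SE motherboard you also get onboard video + IPMI.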
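The quad-channel point can be put in numbers with the usual peak DRAM bandwidth formula: channels × transfer rate × 8 bytes per 64-bit channel. This sketch uses the memory configurations of the two test machines listed earlier; real sustained bandwidth is lower than these peaks.

```python
# Sketch: theoretical peak DRAM bandwidth for the two tested memory configs.

def peak_dram_gbps(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    return channels * mt_per_s * bus_bytes / 1000   # MT/s x bytes -> GB/s


if __name__ == "__main__":
    print(f"3950X, 2ch DDR4-3200:  {peak_dram_gbps(2, 3200):.1f} GB/s")
    print(f"9980XE, 4ch DDR4-2666: {peak_dram_gbps(4, 2666):.1f} GB/s")
```

Even with slower DIMMs, four channels give roughly two thirds more peak bandwidth than two.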

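If the budget is tight, a 3900X is probably the more sensible choice.

For a dual-GPU machine doing nothing but DL,
a 3600X with an x8/x8 board plus two used 1080 Tis should be the best value-for-money combo.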

--

Tags: 3C