CUDA and cuDNN Installation

Xing Abao

CUDA

Check the Supported Version

Search for NVIDIA Control Panel in the Start menu and open it:

Find System Information in the lower-left corner and click Components, which brings up the screen below; the CUDA 12.9.40 entry on the NVCUDA64.DLL row means that the highest CUDA version my machine supports is 12.9.
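If you prefer the command line, nvidia-smi (installed with the driver) reports the same information: its header shows the highest CUDA version the current driver supports. A minimal sketch that pulls out that line from Python:

import subprocess

# nvidia-smi ships with the NVIDIA driver; its header reports the driver
# version and the highest CUDA version that driver supports.
output = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
for line in output.splitlines():
    if "CUDA Version" in line:
        print(line)  # e.g. "| NVIDIA-SMI ...  Driver Version: ...  CUDA Version: 12.9  |"
        break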

Download the Matching CUDA Installer

Open the CUDA Toolkit Archive page and download the matching version.

Then, based on your own system information, tick the options step by step:

Install CUDA

Once the download finishes, double-click the installer to start the installation. It first asks for a temporary extraction directory; the default is usually fine:

After extraction the installer launches; accept the license agreement, and in the next step choose Custom installation options:

Under the CUDA section, untick the component related to Visual Studio Integration:

The three items at the bottom are not strictly necessary either; whether to install them is up to you:

For the installation path I still recommend keeping the default under Program Files, which makes it easier to find later; make a note of the CUDA path shown here:

Then keep confirming until the installer finishes and close it; no need to go into more detail here.

Verify the Installation

Add C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\bin to the system PATH environment variable, then run nvcc -V:

> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Apr__9_19:29:17_Pacific_Daylight_Time_2025
Cuda compilation tools, release 12.9, V12.9.41
Build cuda_12.9.r12.9/compiler.35813241_0
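If you want to script this check instead of eyeballing the terminal, a small sketch using only the Python standard library confirms that nvcc is resolvable through the updated PATH:

import shutil
import subprocess

# shutil.which resolves nvcc through the PATH environment variable, so it
# returns None if the CUDA bin directory has not been added yet.
nvcc = shutil.which("nvcc")
if nvcc is None:
    print("nvcc not found - check that the CUDA bin directory is on PATH")
else:
    print(f"nvcc found at: {nvcc}")
    print(subprocess.run([nvcc, "-V"], capture_output=True, text=True).stdout)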

cuDNN

Download the Matching cuDNN Package

Go to https://developer.nvidia.com/cudnn, pick the cuDNN release that supports your CUDA version, and download it:

Install cuDNN

Next, extract the archive and copy the folders inside it:

Paste them into the CUDA installation directory; that completes the cuDNN installation:
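The copy can also be scripted. The sketch below assumes the usual layout of the extracted cuDNN archive (bin, include and lib folders) and the default CUDA path from above; both paths are placeholders, so adjust them to your own setup, and note that writing into Program Files needs an elevated (administrator) prompt:

import shutil
from pathlib import Path

# Placeholder paths - point them at your extracted cuDNN archive and your CUDA install.
cudnn_dir = Path(r"C:\Downloads\cudnn-windows-x86_64")
cuda_dir = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9")

# Merge the cuDNN bin, include and lib folders into the CUDA directory.
for folder in ("bin", "include", "lib"):
    shutil.copytree(cudnn_dir / folder, cuda_dir / folder, dirs_exist_ok=True)
    print(f"copied {folder}")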

Verify the Installation

Go to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\extras\demo_suite and run bandwidthTest.exe; if PASS appears in the output, the installation succeeded.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: NVIDIA GeForce GTX 1080 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6367.3

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6446.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 383865.7

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Then run deviceQuery.exe to query the device; it prints your GPU model along with PASS, indicating that both CUDA and cuDNN are installed successfully.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...

CUDA Device Query (Runtime API)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 12.9 / 12.9
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11264 MBytes (11811028992 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1645 MHz (1.64 GHz)
Memory Clock rate: 5505 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 2883584 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: zu bytes
Total amount of shared memory per block: zu bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: zu bytes
Texture alignment: zu bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.9, CUDA Runtime Version = 12.9, NumDevs = 1, Device0 = NVIDIA GeForce GTX 1080 Ti
Result = PASS

PyTorch

Go to the PyTorch website and pick the PyTorch build you want to install. Choose pip as the installation method; the page generates the command below, which downloads the wheels from the given index URL:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
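Once the install finishes, a quick sanity check from the Python prompt confirms that the wheel sees the GPU and that its bundled cuDNN is usable. Note that torch.version.cuda reports the CUDA version the wheel was built against (12.8 for the cu128 wheel), so it does not have to match the 12.9 toolkit exactly:

import torch

print("PyTorch:", torch.__version__)            # e.g. 2.x.x+cu128
print("Built with CUDA:", torch.version.cuda)   # CUDA version the wheel was compiled against
print("GPU available:", torch.cuda.is_available())
print("cuDNN enabled:", torch.backends.cudnn.is_available())
print("cuDNN version:", torch.backends.cudnn.version())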

GPU test

import torch
import time

def check_gpu_availability():
    """Check whether a GPU is available."""
    if torch.cuda.is_available():
        print(f"✅ GPU available, device name: {torch.cuda.get_device_name(0)}")
        print(f"CUDA version: {torch.version.cuda}")
        return True
    else:
        print("❌ GPU not available, falling back to CPU")
        return False

def test_gpu_performance():
    """Benchmark GPU computation."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Create two large random tensors
    size = (10000, 10000)  # adjust the size to vary the workload
    a = torch.randn(size, device=device)
    b = torch.randn(size, device=device)

    # Time a matrix multiplication (GPU accelerated)
    start_time = time.time()
    c = torch.matmul(a, b)
    elapsed_time = time.time() - start_time

    print(f"Matrix multiplication took: {elapsed_time:.4f} s")

if __name__ == "__main__":
    print("=== GPU test script ===")
    if check_gpu_availability():
        test_gpu_performance()

=== GPU test script ===
✅ GPU available, device name: NVIDIA GeForce GTX 1080 Ti
CUDA version: 12.8
Using device: cuda
Matrix multiplication took: 0.0302 s

GPU vs CPU

import time
import torch

def compare_cpu_gpu():
    size = (20000, 20000)

    # CPU computation
    start = time.time()
    a_cpu = torch.randn(size)
    b_cpu = torch.randn(size)
    torch.matmul(a_cpu, b_cpu)
    cpu_time = time.time() - start

    # GPU computation
    start = time.time()
    a_gpu = torch.randn(size, device="cuda")
    b_gpu = torch.randn(size, device="cuda")
    torch.matmul(a_gpu, b_gpu)
    gpu_time = time.time() - start

    print(f"CPU time: {cpu_time:.4f} s")
    print(f"GPU time: {gpu_time:.4f} s")
    print(f"GPU speedup: {cpu_time / gpu_time:.2f}x")

compare_cpu_gpu()

CPU time: 17.3024 s
GPU time: 0.0095 s
GPU speedup: 1819.75x
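One caveat about the numbers above: CUDA kernel launches are asynchronous, so time.time() can return before the GPU matmul has actually finished, and the timed regions also include generating the random tensors. A stricter sketch that creates the inputs first and calls torch.cuda.synchronize() before stopping the clock reports a smaller (though still substantial) speedup:

import time
import torch

def compare_cpu_gpu_synced(size=(20000, 20000)):
    # Create the inputs outside the timed regions so only matmul is measured.
    a_cpu = torch.randn(size)
    b_cpu = torch.randn(size)
    a_gpu = a_cpu.to("cuda")
    b_gpu = b_cpu.to("cuda")

    start = time.time()
    torch.matmul(a_cpu, b_cpu)
    cpu_time = time.time() - start

    torch.cuda.synchronize()   # make sure the transfers above have completed
    start = time.time()
    torch.matmul(a_gpu, b_gpu)
    torch.cuda.synchronize()   # wait for the kernel to finish before stopping the clock
    gpu_time = time.time() - start

    print(f"CPU: {cpu_time:.4f} s, GPU: {gpu_time:.4f} s, speedup: {cpu_time / gpu_time:.2f}x")

compare_cpu_gpu_synced()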