Error when deploying the NVIDIA 470.82.01 GPU driver in a virtual machine
[root@cloud-master ~]# ./NVIDIA-Linux-x86_64-470.82.01.run --kernel-source-path=/usr/src/kernels/$(uname -r) -k $(uname -r) -s
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 470.82.01..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
WARNING: You do not appear to have an NVIDIA GPU supported by the 470.82.01 NVIDIA Linux graphics driver installed in this system. For further details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in the README available on the Linux driver download page at
www.nvidia.com.
ERROR: Unable to load the 'nvidia-drm' kernel module.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
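The installer message above points to /var/log/nvidia-installer.log. As a minimal sketch (the function name show_installer_problems is hypothetical, and the log's exact contents are not shown in this note), the error and warning lines can be pulled out with grep:

```shell
#!/bin/sh
# Print the numbered lines of the installer log that mention an error or warning.
# The optional argument overrides the default log path named in the installer message.
show_installer_problems() {
    log="${1:-/var/log/nvidia-installer.log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -n -iE 'error|warning' "$log"
}

# Run against the default log; ignore the exit status if nothing is found.
show_installer_problems || true
```

Reading the log as root may be required, depending on the file permissions on the system.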
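Troubleshooting approach
Official driver release notes: https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-82-01/index.html
GPU passthrough is not enabled on the virtual machine
If GPU passthrough is not enabled, the virtual machine has no NVIDIA GPU for the installer to detect. The following command confirms that the system is running under a hypervisor: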
[root@cloud-master ~]# cpuid |grep hypervisor_id
Disclaimer: cpuid may not support decoding of all cpuid registers.
hypervisor_id = "VMwareVMware"
hypervisor_id = "VMwareVMware"
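The cpuid tool is not installed on every system. As an alternative sketch, on x86 Linux guests the kernel normally lists a "hypervisor" flag in /proc/cpuinfo; the function name is_virtualized and its optional path argument are assumptions made for illustration:

```shell
#!/bin/sh
# Succeed if the CPU flags line contains the "hypervisor" flag.
# The optional argument is a cpuinfo-style file; it defaults to /proc/cpuinfo.
is_virtualized() {
    cpuinfo="${1:-/proc/cpuinfo}"
    grep -E '^flags' "$cpuinfo" 2>/dev/null | grep -qw hypervisor
}

if is_virtualized; then
    echo "hypervisor flag present: this looks like a virtual machine"
else
    echo "no hypervisor flag found"
fi
```

This only shows that the system is a guest. It does not identify the hypervisor vendor the way the hypervisor_id field above does.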
Once GPU passthrough is enabled on the virtual machine, the lshw -c video command shows the GPU's name directly:
[root@cloud-master ~]# lshw -c video
  *-display:0
       description: VGA compatible controller
       product: GD 5446
       vendor: Cirrus Logic
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller rom
       configuration: driver=cirrus latency=0
       resources: irq:0 memory:f0000000-f1ffffff memory:fe050000-fe050fff memory:fe040000-fe04ffff
  *-display:1 UNCLAIMED
       description: 3D controller
       product: TU104GL [Tesla T4]
       vendor: NVIDIA Corporation
       physical id: 5
       bus info: pci@0000:00:05.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msix cap_list
       configuration: latency=0
       resources: memory:fc000000-fcffffff memory:d0000000-dfffffff memory:f2000000-f3ffffff
  *-display:2 UNCLAIMED
       description: 3D controller
       product: TU104GL [Tesla T4]
       vendor: NVIDIA Corporation
       physical id: 6
       bus info: pci@0000:00:06.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msix cap_list
       configuration: latency=0
       resources: memory:fd000000-fdffffff memory:e0000000-efffffff memory:f4000000-f5ffffff
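In the output above, both Tesla T4 entries are marked UNCLAIMED, which is how lshw labels a device that has no kernel driver bound to it yet. The following sketch filters lshw output down to the NVIDIA entries and their claimed state; the function name list_nvidia_gpus is hypothetical, and the parsing assumes the field order shown above (product before vendor):

```shell
#!/bin/sh
# Read "lshw -c video" text on stdin and print each NVIDIA product
# together with whether lshw reports the device as claimed by a driver.
list_nvidia_gpus() {
    awk '
        # A new display node starts; remember whether it is UNCLAIMED.
        $1 ~ /^\*-display/ { state = ($2 == "UNCLAIMED") ? "unclaimed" : "claimed"; product = "" }
        # Keep the product string without its "product:" label.
        $1 == "product:"   { product = $0; sub(/^[ \t]*product:[ \t]*/, "", product) }
        # Print only nodes whose vendor line mentions NVIDIA.
        $1 == "vendor:" && $0 ~ /NVIDIA/ { print product " (" state ")" }
    '
}

# Only query the hardware when lshw is available on this system.
if command -v lshw >/dev/null 2>&1; then
    lshw -c video 2>/dev/null | list_nvidia_gpus
fi
```

An empty result suggests that no NVIDIA device is visible to the guest, which matches the "You do not appear to have an NVIDIA GPU" warning from the installer.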
Kernel version does not meet the requirements
See the official documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
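The install command at the top passes --kernel-source-path=/usr/src/kernels/$(uname -r), so the kernel sources for the running kernel need to exist at that path. A minimal sketch of that check follows; the function name check_kernel_sources and its two optional arguments are assumptions for illustration, and it does not compare the kernel against the supported versions listed in the documentation:

```shell
#!/bin/sh
# Check that a kernel source directory exists for the given kernel release.
# $1: kernel release (defaults to the running kernel, uname -r)
# $2: base directory (defaults to /usr/src/kernels, as used by the install command)
check_kernel_sources() {
    release="${1:-$(uname -r)}"
    base="${2:-/usr/src/kernels}"
    if [ -d "$base/$release" ]; then
        echo "kernel sources found: $base/$release"
    else
        echo "kernel sources missing for $release under $base"
        return 1
    fi
}

# Report on the running kernel without aborting the script on failure.
check_kernel_sources || true
```

If the directory is missing, the installed kernel development package probably does not match the running kernel version reported by uname -r.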