
Troubleshooting installing the NVIDIA driver for a GeForce RTX 2080



vannesa
2022-10-12, 21:23
Hi everyone,

I set up a Kali/Mint dual boot by following the docs, and then followed the "Install NVIDIA GPU Drivers" page. Nouveau keeps causing random crashes, and unfortunately, after installing nvidia-driver and nvidia-cuda-toolkit, only the nouveau module is being loaded. Secure Boot is disabled. Google only turned up troubleshooting for other distros; does anyone know where I should start?

I tried blacklisting nouveau by editing the GRUB entry to add "rdblacklist=nouveau" as a kernel parameter, and I also placed a config file in modprobe.d/. After rebooting, Kali crashes shortly after the boot menu (before the GUI appears) and doesn't respond to commands. I had to do a fresh install.
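
As far as I know, rdblacklist= is a dracut-style option. Kali and Mint use initramfs-tools, and the kernel command line form that kmod's modprobe understands is modprobe.blacklist=module1,module2 (see modprobe.d(5)). A minimal sketch, assuming a POSIX shell, that checks whether a module is blacklisted that way on the running kernel's command line; the function name blacklisted_on_cmdline is hypothetical.

```shell
#!/bin/sh
# Succeed if MODULE is listed in a modprobe.blacklist= parameter on the kernel
# command line. CMDLINE_FILE defaults to /proc/cmdline (the running kernel's
# parameters); pass another file to try it out.
blacklisted_on_cmdline() {
    module=$1
    cmdline_file=${2:-/proc/cmdline}
    for word in $(cat "$cmdline_file"); do
        case $word in
            modprobe.blacklist=*)
                # The value is a comma-separated list of module names.
                list=${word#modprobe.blacklist=}
                old_ifs=$IFS
                IFS=,
                for m in $list; do
                    if [ "$m" = "$module" ]; then
                        IFS=$old_ifs
                        return 0
                    fi
                done
                IFS=$old_ifs
                ;;
        esac
    done
    return 1
}
```

Usage: `blacklisted_on_cmdline nouveau && echo "nouveau is blacklisted on this boot"`.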


$ lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)

$ lspci -s 01:00.0 -v
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Dell TU104 [GeForce RTX 2080 Rev. A]
Flags: bus master, fast devsel, latency 0, IRQ 145, IOMMU group 1
Memory at eb000000 (32-bit, non-prefetchable) [size=16M]
Memory at a0000000 (64-bit, prefetchable) [size=256M]
Memory at b0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nouveau
Kernel modules: nouveau, nvidia_tesla_drm, nvidia_current_drm, nvidia_tesla, nvidia_current
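
The "Kernel driver in use" line is the one to watch while trying fixes. "Kernel modules" only lists modules capable of handling the device, per the lspci man page. A small sketch (the helper name is hypothetical) that extracts that field from lspci -v or lspci -k output on stdin, so it can be rechecked after each reboot:

```shell
#!/bin/sh
# Print the value of the "Kernel driver in use:" field from lspci -v / lspci -k
# output read on stdin (prints nothing if no driver is bound).
kernel_driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}
```

Usage: `lspci -k -s 01:00.0 | kernel_driver_in_use` should print nouveau for now, and an NVIDIA driver name once the proprietary module has taken over.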


Output of nvidia-smi:


$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
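
nvidia-smi talks to the NVIDIA kernel module, so this error is consistent with nouveau being bound to the card. A sketch for checking loaded modules directly: /proc/modules is the file lsmod reads, with the module name in the first column (underscores rather than hyphens). module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if a module called NAME appears in /proc/modules (the data lsmod
# prints). MODULES_FILE defaults to /proc/modules; override it for testing.
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$modules_file"
}
```

Usage: `module_loaded nouveau && echo "nouveau is loaded"`. The NVIDIA module name depends on packaging (lspci's "Kernel modules" line shows the candidates), so check the exact name rather than guessing.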


Output of clinfo:


$ clinfo
Number of platforms 1
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 3.0+debian Linux, None+Asserts, RELOC, LLVM 13.0.1, SLEEF, DISTRO, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Extensions with Version cl_khr_icd 0x400000 (1.0.0)
cl_pocl_content_size 0x400000 (1.0.0)
Platform Numeric Version 0xc00000 (3.0.0)
Platform Extensions function suffix POCL
Platform Host timer resolution 0ns

Platform Name Portable Computing Language
Number of devices 1
Device Name pthread-Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Device Vendor GenuineIntel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 PoCL HSTR: pthread-x86_64-pc-linux-gnu-haswell
Driver Version 3.0+debian
Device OpenCL C Version OpenCL C 1.2 PoCL
Device Type CPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 4600MHz
Device Partition (core)
Max number of sub-devices 12
Supported partition types equally, by counts
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 4096x4096x4096
Max work group size 4096
Preferred work group size multiple (kernel) 8
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 8 / 8
long 4 / 4
half 0 / 0 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 14445117440 (13.45GiB)
Error Correction support No
Max memory allocation 4294967296 (4GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 12582912 (12MiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 268435456 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 128
Local memory type Global
Local memory size 262144 (256KiB)
Max number of constant args 8
Max constant buffer size 262144 (256KiB)
Max size of kernel argument 1024
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
printf() buffer size 16777216 (16MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Portable Computing Language
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [POCL]
clCreateContext(NULL, ...) [default] Success [POCL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name Portable Computing Language
Device Name pthread-Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) Success (1)
Platform Name Portable Computing Language
Device Name pthread-Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Portable Computing Language
Device Name pthread-Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.1
ICD loader Profile OpenCL 3.0



$ dpkg -l | grep -i icd
ii nvidia-egl-icd:amd64 470.141.03-2 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-vulkan-icd:amd64 470.141.03-2 amd64 NVIDIA Vulkan installable client driver (ICD)
ii ocl-icd-libopencl1:amd64 2.3.1-1 amd64 Generic OpenCL ICD Loader
ii ocl-icd-opencl-dev:amd64 2.3.1-1 amd64 OpenCL development files
ii pocl-opencl-icd:amd64 3.0-6 amd64 pocl ICD
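
clinfo only reports the pocl CPU platform, and this list shows NVIDIA EGL and Vulkan ICDs but no OpenCL one. The ocl-icd loader finds platforms through *.icd files in /etc/OpenCL/vendors, each naming a library to load. On Debian-based systems I believe the NVIDIA one comes from the nvidia-opencl-icd package, but confirm that with apt. A minimal sketch, with a hypothetical function name, that lists what the loader will see:

```shell
#!/bin/sh
# List the OpenCL ICD registrations in VENDORS_DIR (default /etc/OpenCL/vendors):
# one line per *.icd file, showing the library that file points the loader at.
list_opencl_icds() {
    vendors_dir=${1:-/etc/OpenCL/vendors}
    for f in "$vendors_dir"/*.icd; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s -> %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```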


Device: Alienware Aurora R7

Fred Sheehan
2022-10-23, 10:55
Create the file /etc/modprobe.d/blacklist-nouveau.conf with the following contents (it sounds like you already did this):


blacklist nouveau
options nouveau modeset=0


Then regenerate the initramfs (I'm not sure whether you did this afterwards):

$ sudo update-initramfs -u
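
Here are the two steps above as a small shell sketch. The helper name is hypothetical. The target directory is a parameter only so the function can be tried in a scratch directory first, since writing to /etc/modprobe.d and running update-initramfs both need root.

```shell
#!/bin/sh
# Write the nouveau blacklist from the post above into TARGET_DIR
# (default /etc/modprobe.d, which requires root).
write_nouveau_blacklist() {
    target_dir=${1:-/etc/modprobe.d}
    cat > "$target_dir/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Intended use, as root (not run here):
#   write_nouveau_blacklist && update-initramfs -u
```

lsinitramfs (from initramfs-tools) lists the contents of an initramfs image. Running `lsinitramfs /boot/initrd.img-$(uname -r) | grep blacklist-nouveau` should confirm that the regenerated image picked up the file.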

Fred Sheehan
2022-10-23, 11:00
Of course, you could also try removing the nouveau Xorg driver package:

$ sudo apt-get --purge remove xserver-xorg-video-nouveau
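
xserver-xorg-video-nouveau is only the X.Org display driver. The nouveau kernel module ships with the kernel image, so purging this package does not by itself stop the module from loading. To check what is actually installed, here is a sketch (hypothetical helper name) that reads `dpkg -l` output like the listing earlier in the thread:

```shell
#!/bin/sh
# Succeed if PKG is marked installed ("ii") in `dpkg -l` output read on stdin.
# Also matches names carrying an architecture suffix such as "pkg:amd64".
dpkg_list_has() {
    pkg=$1
    awk -v p="$pkg" '$1 == "ii" && ($2 == p || index($2, p ":") == 1) { found = 1 } END { exit !found }'
}
```

Usage: `dpkg -l | dpkg_list_has xserver-xorg-video-nouveau && echo "still installed"`.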

vannesa
2022-10-28, 00:04
Thanks, Fred. Even after updating the initramfs, it still boots to an unresponsive screen. update-initramfs also prints some missing-firmware warnings:


$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.18.0-kali7-amd64
W: Possible missing firmware /lib/firmware/i915/skl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/cml_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/dg1_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/adlp_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/adlp_dmc_ver2_14.bin for module i915
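
All of these warnings are for i915, the driver for the Intel integrated graphics in the i7-8700, not for nouveau or the NVIDIA driver. Several paths repeat, presumably because more than one Intel platform uses the same file. A sketch (hypothetical function name) that collapses such output to unique module/file pairs, which makes it easier to see what is actually missing:

```shell
#!/bin/sh
# Reduce update-initramfs "Possible missing firmware" warnings read on stdin
# to a sorted, de-duplicated list of "module: path" lines.
summarize_missing_firmware() {
    sed -n 's/^W: Possible missing firmware \([^ ]*\) for module \([^ ]*\)$/\2: \1/p' | sort -u
}
```

Usage: `sudo update-initramfs -u 2>&1 | summarize_missing_firmware`.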


The firmware-misc package does not have v69 firmware, so I tried this: https://wiki.debian.org/Firmware#Firmware_missing_from_Debian

After rebooting, that brought me to a black screen with a blinking cursor in the corner. Ctrl+Alt+F1 through F6 respond, and I can get shell access. Also, nvidia-smi now sees my card. Am I on the right track? Should I now try troubleshooting the X server?
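
Since nvidia-smi now sees the card, the X log is a reasonable next place to look. Xorg marks errors with (EE) and warnings with (WW). Here is a sketch (hypothetical helper name) that prints the error lines. /var/log/Xorg.0.log is assumed as the default, but when Xorg runs rootless the log may be under ~/.local/share/xorg/ instead, and `journalctl -b` is another place to look.

```shell
#!/bin/sh
# Print the timestamped error lines ("] (EE)") from an Xorg log, skipping the
# marker legend near the top of the file, which also mentions "(EE)".
# LOG defaults to /var/log/Xorg.0.log (an assumption; rootless Xorg may log to
# ~/.local/share/xorg/Xorg.0.log instead).
xorg_errors() {
    log=${1:-/var/log/Xorg.0.log}
    grep -F '] (EE)' "$log"
}
```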

Fred Sheehan
2022-11-07, 04:40
Have a look at this:

https://linuxconfig.org/how-to-install-nvidia-driver-on-debian-10-buster-linux

vannesa
2022-12-09, 19:08
Thanks very much! I followed the second method, took a wild guess that my driver version should be 525.60.11, and it worked.