[gentoo-user] "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(

* [gentoo-user] "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
@ 2018-08-14 10:35 tuxic
  2018-08-14 20:16 ` [gentoo-user] " Nikos Chantziaras
  0 siblings, 1 reply; 20+ messages in thread
From: tuxic @ 2018-08-14 10:35 UTC
  To: Gentoo

Hi,

After upgrading to nvidia-drivers-396.51, no CUDA devices are found.
The last version that works for me is nvidia-drivers-396.24-r1.

I first noticed the problem in Blender, which normally renders on the
GPU but now falls back to CPU rendering (yawn!).

I submitted a bug report to the blender devs and Brecht van Lommel
pointed me to the tool  http://cuda-z.sourceforge.net/.

CUDA-Z detects CUDA devices with the old driver and none with any
driver newer than that.

So it is not a Blender-related problem.

libcuda is installed with both versions of the driver, and I checked
that it matches the currently installed driver version.
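
A check like this can be made reproducible with a small script. The sketch below is only an illustration in Python: it assumes the user-space library is installed under a fully versioned name such as libcuda.so.<driver version>, and that the version of the loaded kernel module is available as a string (for example from /sys/module/nvidia/version). Neither location is taken from this thread, and the helper names are hypothetical.

```python
import re

# Matches fully versioned library names such as "libcuda.so.396.51".
# The bare "libcuda.so" and the soname link "libcuda.so.1" are skipped.
LIBCUDA_RE = re.compile(r"^libcuda\.so\.(\d+(?:\.\d+)+)$")


def libcuda_versions(filenames):
    """Return the set of driver versions encoded in libcuda file names."""
    versions = set()
    for name in filenames:
        match = LIBCUDA_RE.match(name)
        if match:
            versions.add(match.group(1))
    return versions


def userspace_matches_kernel(filenames, kernel_module_version):
    """True only if exactly one libcuda version is present and it equals
    the version string reported for the loaded kernel module."""
    return libcuda_versions(filenames) == {kernel_module_version.strip()}
```

On a live system the file names could come from os.listdir() on the library directory and the second argument from reading the sysfs version file; a leftover library from an older driver would make the comparison fail.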

How can I fix that?

Description of the hardware and software used:
Hardware:
ASUS Crosshair IV Formula motherboard
CPU : AMD Phenom(tm) II X6 1090T Processor
8 GB Dual channel RAM
The system is NOT overclocked.

Two Nvidia-cards are installed. One for the desktop
and one for Blender alone:
02:00.0 VGA compatible controller: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1) (Desktop)
07:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1) (Blender)

Software: Linux kernel 4.18.0 currently. The same problem arises
with older kernels (4.17 ..).
Gentoo is updated daily.

Cheers!
Meino


* [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-14 10:35 [gentoo-user] "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1( tuxic
@ 2018-08-14 20:16 ` Nikos Chantziaras
  2018-08-15  2:19   ` tuxic
  2018-08-15 10:33   ` Corentin “Nado” Pazdera
  0 siblings, 2 replies; 20+ messages in thread
From: Nikos Chantziaras @ 2018-08-14 20:16 UTC
  To: gentoo-user

On 14/08/18 13:35, tuxic@posteo.de wrote:
> Hi,
> 
> after upgrading to nvidia-drivers-396.51 no CUDA devices were found.
> Last version, which works for me is nvidia-drivers-396.24-r1.

Do you have the "uvm" USE flag set? It might be required for CUDA, but 
it's disabled by default (perhaps wrongly, because USE flags should 
follow upstream defaults unless there's a reason not to.)




* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-14 20:16 ` [gentoo-user] " Nikos Chantziaras
@ 2018-08-15  2:19   ` tuxic
  2018-08-15 10:33   ` Corentin “Nado” Pazdera
  1 sibling, 0 replies; 20+ messages in thread
From: tuxic @ 2018-08-15  2:19 UTC
  To: gentoo-user

On 08/14 11:16, Nikos Chantziaras wrote:
> On 14/08/18 13:35, tuxic@posteo.de wrote:
> > Hi,
> > 
> > after upgrading to nvidia-drivers-396.51 no CUDA devices were found.
> > Last version, which works for me is nvidia-drivers-396.24-r1.
> 
> Do you have the "uvm" USE flag set? It might be required for CUDA, but it's
> disabled by default (perhaps wrongly, because USE flags should follow
> upstream defaults unless there's a reason not to.)
> 
> 
Yes it is:

(this is the version that is currently still working)
    Installed versions:  396.24-r1(0/396)^md(08:31:04 PM 08/14/2018)(X driver kms static-libs tools uvm -acpi -compat -gtk3 -multilib -pax_kernel -wayland ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" ABI_S390="-32 -64" ABI_X86="64 -32 -x32" KERNEL="linux -FreeBSD")

and set via /etc/portage/package.use 

# required by app-admin/conky-1.10.6-r1::gentoo[nvidia,X]
# required by @selected
# required by @world (argument)
>=x11-drivers/nvidia-drivers-378.13 static-libs uvm
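
For anyone who wants to confirm the flag from the eix output mechanically, a minimal Python sketch follows. It assumes the "Installed versions" line has the shape shown above, with the USE flags in the last parenthesised group; the function name is hypothetical.

```python
import re

# The "Installed versions" line quoted above, copied from the eix output.
EIX_LINE = (
    '396.24-r1(0/396)^md(08:31:04 PM 08/14/2018)'
    '(X driver kms static-libs tools uvm -acpi -compat -gtk3 -multilib '
    '-pax_kernel -wayland ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64" '
    'ABI_S390="-32 -64" ABI_X86="64 -32 -x32" KERNEL="linux -FreeBSD")'
)


def enabled_use_flags(eix_line):
    """Return the enabled USE flags from one eix 'Installed versions' line."""
    # The last parenthesised group holds the USE flags.
    flags = eix_line[eix_line.rindex("(") + 1 : eix_line.rindex(")")]
    # Drop USE_EXPAND groups such as ABI_X86="64 -32 -x32".
    flags = re.sub(r'\w+="[^"]*"', "", flags)
    # Flags with a leading "-" are disabled.
    return {flag for flag in flags.split() if not flag.startswith("-")}
```

With the line above, "uvm" is in the returned set, which agrees with the package.use entry.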

What else could be the reason for the problem?
How can I fix it?




* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-14 20:16 ` [gentoo-user] " Nikos Chantziaras
  2018-08-15  2:19   ` tuxic
@ 2018-08-15 10:33   ` Corentin “Nado” Pazdera
  2018-08-15 11:16     ` tuxic
  2018-08-15 11:39     ` Corentin “Nado” Pazdera
  1 sibling, 2 replies; 20+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-15 10:33 UTC
  To: gentoo-user

August 15, 2018 4:19 AM, tuxic@posteo.de wrote:

> On 08/14 11:16, Nikos Chantziaras wrote:
> 
>> On 14/08/18 13:35, tuxic@posteo.de wrote:
>> Hi,
>> 
>> after upgrading to nvidia-drivers-396.51 no CUDA devices were found.
>> Last version, which works for me is nvidia-drivers-396.24-r1.
>> 
>> Do you have the "uvm" USE flag set? It might be required for CUDA, but it's
>> disabled by default (perhaps wrongly, because USE flags should follow
>> upstream defaults unless there's a reason not to.)
> 
> Yes it is:
> 
> (this is the version, which is currentlu still working
> Installed versions: 396.24-r1(0/396)^md(08:31:04 PM 08/14/2018)(X driver kms static-libs tools uvm
> -acpi -compat -gtk3 -multilib -pax_kernel -wayland ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64"
> ABI_S390="-32 -64" ABI_X86="64 -32 -x32" KERNEL="linux -FreeBSD")
> 
> and set via /etc/portage/package.use
> 
> # required by app-admin/conky-1.10.6-r1::gentoo[nvidia,X]
> # required by @selected
> # required by @world (argument)
> >=x11-drivers/nvidia-drivers-378.13 static-libs uvm
> 
> What else could be the reason for the problem?
> How can I fix it?

Can you also show the contents of your modprobe.d files?
Did you read the whole wiki page? Did you check for MSI interrupts?
https://wiki.gentoo.org/wiki/NVidia/nvidia-drivers#Driver_fails_to_initialize_when_MSI_interrupts_are_enabled

Regards,
--
Corentin “Nado” Pazdera



* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 10:33   ` Corentin “Nado” Pazdera
@ 2018-08-15 11:16     ` tuxic
  2018-08-15 11:39     ` Corentin “Nado” Pazdera
  1 sibling, 0 replies; 20+ messages in thread
From: tuxic @ 2018-08-15 11:16 UTC
  To: gentoo-user

On 08/15 10:33, Corentin “Nado” Pazdera wrote:
> August 15, 2018 4:19 AM, tuxic@posteo.de wrote:
> 
> > On 08/14 11:16, Nikos Chantziaras wrote:
> > 
> >> On 14/08/18 13:35, tuxic@posteo.de wrote:
> >> Hi,
> >> 
> >> after upgrading to nvidia-drivers-396.51 no CUDA devices were found.
> >> Last version, which works for me is nvidia-drivers-396.24-r1.
> >> 
> >> Do you have the "uvm" USE flag set? It might be required for CUDA, but it's
> >> disabled by default (perhaps wrongly, because USE flags should follow
> >> upstream defaults unless there's a reason not to.)
> > 
> > Yes it is:
> > 
> > (this is the version, which is currentlu still working
> > Installed versions: 396.24-r1(0/396)^md(08:31:04 PM 08/14/2018)(X driver kms static-libs tools uvm
> > -acpi -compat -gtk3 -multilib -pax_kernel -wayland ABI_MIPS="-n32 -n64 -o32" ABI_PPC="-32 -64"
> > ABI_S390="-32 -64" ABI_X86="64 -32 -x32" KERNEL="linux -FreeBSD")
> > 
> > and set via /etc/portage/package.use
> > 
> > # required by app-admin/conky-1.10.6-r1::gentoo[nvidia,X]
> > # required by @selected
> > # required by @world (argument)
> > >=x11-drivers/nvidia-drivers-378.13 static-libs uvm
> > 
> > What else could be the reason for the problem?
> > How can I fix it?
> 
> Can you also show content of modprobe.d file ?
> Did you read the whole wiki page ? Did you check for MSI interrupts ?
> https://wiki.gentoo.org/wiki/NVidia/nvidia-drivers#Driver_fails_to_initialize_when_MSI_interrupts_are_enabled
> 
> Regards,
> --
> Corentin “Nado” Pazdera
> 

The wiki page is old... it talks about nvidia-driver-174.


modprobe.d/nvidia.conf:
----------------------------
# Nvidia drivers support
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195

# To tweak the driver the following options can be used, note that
# you should be careful, as it could cause instability!! For more 
# options see /usr/share/doc/nvidia-drivers-396.24-r1/README 
#
# !!! SECURITY WARNING !!!
# DO NOT MODIFY OR REMOVE THE DEVICE FILE RELATED OPTIONS UNLESS YOU KNOW
# WHAT YOU ARE DOING.
# ONLY ADD TRUSTED USERS TO THE VIDEO GROUP, THESE USERS MAY BE ABLE TO CRASH,
# COMPROMISE, OR IRREPARABLY DAMAGE THE MACHINE.
options nvidia NVreg_DeviceFileMode=432 NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=27 NVreg_ModifyDeviceFiles=1


modprobe.d/nvidia-rmmod.conf
----------------------------
# Nvidia UVM support
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia


All of these configurations have worked for years, up to and including
x11-drivers/nvidia-drivers-396.24-r1. After that, no CUDA device is found.
Logically, I would therefore expect the cause to be something
version-specific rather than a global setting that has been valid since
nvidia-driver-174.

Regards,




* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 10:33   ` Corentin “Nado” Pazdera
  2018-08-15 11:16     ` tuxic
@ 2018-08-15 11:39     ` Corentin “Nado” Pazdera
  2018-08-15 12:02       ` tuxic
  2018-08-15 12:45       ` Corentin “Nado” Pazdera
  1 sibling, 2 replies; 20+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-15 11:39 UTC
  To: gentoo-user

August 15, 2018 1:16 PM, tuxic@posteo.de wrote:

> The wiki-page is old...it speaks of nvidia-driver-174.

Yeah, for legacy cards... If you check the history, it's also been updated quite frequently.

> modprobe.d/nvidia.conf:
> ----------------------------
> # Nvidia drivers support
> alias char-major-195 nvidia
> alias /dev/nvidiactl char-major-195
> 
> # To tweak the driver the following options can be used, note that
> # you should be careful, as it could cause instability!! For more
> # options see /usr/share/doc/nvidia-drivers-396.24-r1/README
> #
> # !!! SECURITY WARNING !!!
> # DO NOT MODIFY OR REMOVE THE DEVICE FILE RELATED OPTIONS UNLESS YOU KNOW
> # WHAT YOU ARE DOING.
> # ONLY ADD TRUSTED USERS TO THE VIDEO GROUP, THESE USERS MAY BE ABLE TO CRASH,
> # COMPROMISE, OR IRREPARABLY DAMAGE THE MACHINE.
> options nvidia NVreg_DeviceFileMode=432 NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=27
> NVreg_ModifyDeviceFiles=1
> 
> modprobe.d/nvidia-rmmod.conf
> ----------------------------
> # Nvidia UVM support
> remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia
> 
> All the configurations are working all the years up to x11-drivers/nvidia-drivers-396.24-r1.
> After that, no CUDA device was found.
> Based on logical reasons, I would tend to think, that it is something
> version specific and no global setting which is valid since
> nvidia-driver-174.

An update that brings breaking changes may require a change to a config file. I agree that it
might not be the cause, but it's possible.

Is CUDA disabled on both cards? I have a 970Ti; although my motherboard is different, we could try
comparing the major differences between our systems.

--
Corentin “Nado” Pazdera



* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 11:39     ` Corentin “Nado” Pazdera
@ 2018-08-15 12:02       ` tuxic
  2018-08-15 12:35         ` Nikos Chantziaras
  2018-08-15 12:45       ` Corentin “Nado” Pazdera
  1 sibling, 1 reply; 20+ messages in thread
From: tuxic @ 2018-08-15 12:02 UTC
  To: gentoo-user

On 08/15 11:39, Corentin “Nado” Pazdera wrote:
> August 15, 2018 1:16 PM, tuxic@posteo.de wrote:
> 
> > The wiki-page is old...it speaks of nvidia-driver-174.
> 
> Yeah, for legacy cards... If you check the history its also been updated quite frequently.
> 
> > modprobe.d/nvidia.conf:
> > ----------------------------
> > # Nvidia drivers support
> > alias char-major-195 nvidia
> > alias /dev/nvidiactl char-major-195
> > 
> > # To tweak the driver the following options can be used, note that
> > # you should be careful, as it could cause instability!! For more
> > # options see /usr/share/doc/nvidia-drivers-396.24-r1/README
> > #
> > # !!! SECURITY WARNING !!!
> > # DO NOT MODIFY OR REMOVE THE DEVICE FILE RELATED OPTIONS UNLESS YOU KNOW
> > # WHAT YOU ARE DOING.
> > # ONLY ADD TRUSTED USERS TO THE VIDEO GROUP, THESE USERS MAY BE ABLE TO CRASH,
> > # COMPROMISE, OR IRREPARABLY DAMAGE THE MACHINE.
> > options nvidia NVreg_DeviceFileMode=432 NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=27
> > NVreg_ModifyDeviceFiles=1
> > 
> > modprobe.d/nvidia-rmmod.conf
> > ----------------------------
> > # Nvidia UVM support
> > remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia
> > 
> > All the configurations are working all the years up to x11-drivers/nvidia-drivers-396.24-r1.
> > After that, no CUDA device was found.
> > Based on logical reasons, I would tend to think, that it is something
> > version specific and no global setting which is valid since
> > nvidia-driver-174.
> 
> Updates may need to change a config file after bringing breaking changes, it might not be the cause
> I agree. But its possible.
> 
> Is CUDA disabled on both cards? I have a 970Ti, although my MB is different, we might try to
> compare the big differences in our systems?
> 
> --
> Corentin “Nado” Pazdera
> 

...sorry, I am not a native speaker... I don't understand.

I do not know how to disable CUDA on both cards.
So... since it works perfectly with the old driver, I would think:
No, CUDA is enabled (or at least the old driver does this for me).

Then I do an "emerge <new nvidia-driver-version>" and CUDA stops
working. I do not change anything else, nor do I know who or what could
disable CUDA on both cards ... except for the driver itself.

This is weird.

Again, for logical reasons I think that the culprit is either the
driver itself or a missing (and therefore undocumented) configuration
step needed for the new drivers.

The cards are not "old" in any sense. 




* [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 12:02       ` tuxic
@ 2018-08-15 12:35         ` Nikos Chantziaras
  2018-08-15 12:49           ` tuxic
  0 siblings, 1 reply; 20+ messages in thread
From: Nikos Chantziaras @ 2018-08-15 12:35 UTC
  To: gentoo-user

On 15/08/18 15:02, tuxic@posteo.de wrote:
> Then I do an "emerge <new nvidia-driver-version>" and CUDA stops
> working. I do not change anything else nor do I know, who/what could
> disable CUDA on both cards ... except for the driver itsself.

Dumb question, but just to be sure: did you reboot after upgrading the
driver? The driver has never worked correctly for me unless I reboot.
Unloading the driver with "modprobe -r" and loading the new one doesn't
work correctly; only rebooting does.




* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 11:39     ` Corentin “Nado” Pazdera
  2018-08-15 12:02       ` tuxic
@ 2018-08-15 12:45       ` Corentin “Nado” Pazdera
  2018-08-15 12:59         ` tuxic
  2018-08-15 14:32         ` Corentin “Nado” Pazdera
  1 sibling, 2 replies; 20+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-15 12:45 UTC
  To: gentoo-user

August 15, 2018 2:02 PM, tuxic@posteo.de wrote:
> ...sorry, I am no native speaker...I dont understand.
> 
> I did not know how to disable CUDA on both cards.
> So...since it works perfectly with the old driver I would think:
> No, CUDA is enabled (or at least the old driver does this for me).
> 
> Then I do an "emerge <new nvidia-driver-version>" and CUDA stops
> working. I do not change anything else nor do I know, who/what could
> disable CUDA on both cards ... except for the driver itsself.
> 
> This is weird.
> 
> Again, for logical reasons I think, that the culprit is either the
> driver itsself or a missing (and therefor undocumented) configuration
> step needed for the new drivers.
> 
> The cards are not "old" in any sense.

OK, what I meant is: how did you determine that CUDA "was not working"? And could you check it on both cards?

Also, as realnc said, did you reboot?

--
Corentin “Nado” Pazdera



* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 12:35         ` Nikos Chantziaras
@ 2018-08-15 12:49           ` tuxic
  0 siblings, 0 replies; 20+ messages in thread
From: tuxic @ 2018-08-15 12:49 UTC
  To: gentoo-user

On 08/15 03:35, Nikos Chantziaras wrote:
> On 15/08/18 15:02, tuxic@posteo.de wrote:
> > Then I do an "emerge <new nvidia-driver-version>" and CUDA stops
> > working. I do not change anything else nor do I know, who/what could
> > disable CUDA on both cards ... except for the driver itsself.
> 
> Dumb question, but just to be sure: did you reboot after upgrading the
> driver? The driver never worked for me correctly, unless I reboot. Unloading
> the driver with "modprobe -r" and loading the new one doesn't work
> correctly, only rebooting does.
> 
> 

Yes, I did reboot the system.




* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 12:45       ` Corentin “Nado” Pazdera
@ 2018-08-15 12:59         ` tuxic
  2018-08-15 14:32         ` Corentin “Nado” Pazdera
  1 sibling, 0 replies; 20+ messages in thread
From: tuxic @ 2018-08-15 12:59 UTC
  To: gentoo-user

On 08/15 12:45, Corentin “Nado” Pazdera wrote:
> August 15, 2018 2:02 PM, tuxic@posteo.de wrote:
> > ...sorry, I am no native speaker...I dont understand.
> > 
> > I did not know how to disable CUDA on both cards.
> > So...since it works perfectly with the old driver I would think:
> > No, CUDA is enabled (or at least the old driver does this for me).
> > 
> > Then I do an "emerge <new nvidia-driver-version>" and CUDA stops
> > working. I do not change anything else nor do I know, who/what could
> > disable CUDA on both cards ... except for the driver itsself.
> > 
> > This is weird.
> > 
> > Again, for logical reasons I think, that the culprit is either the
> > driver itsself or a missing (and therefor undocumented) configuration
> > step needed for the new drivers.
> > 
> > The cards are not "old" in any sense.
> 
> Ok, what I meant is : how did you check CUDA "was not working"? And could you check it on both cards.
> 
> Also, as said by realnc, did you reboot ?
> 
> --
> Corentin “Nado” Pazdera
> 

Yes, I did reboot the system. In my initial mail I mentioned a tool
called CUDA-Z and Blender, both of which report a missing CUDA device.





* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 12:45       ` Corentin “Nado” Pazdera
  2018-08-15 12:59         ` tuxic
@ 2018-08-15 14:32         ` Corentin “Nado” Pazdera
  2018-08-15 15:11           ` tuxic
  1 sibling, 1 reply; 20+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-15 14:32 UTC
  To: gentoo-user

August 15, 2018 2:59 PM, tuxic@posteo.de wrote:

> Yes I did reboot the sustem. In my initial mail I mentioned a tool
> called CUDA-Z and Blender, which both reports a missing CUDA device.

OK, so you do not have a specific error that the module might have thrown?
Other ideas: check the dev-util/nvidia-cuda-toolkit version, and double-check nvidia/nvidia_uvm with modinfo to ensure they are installed and loaded correctly with the right version.
Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from nvidia-cuda-toolkit) and show its output?

My installation works, so at least we know that version is not completely broken...
Driver version: 396.51
Cuda version: 9.2.88

--
Corentin “Nado” Pazdera


* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 14:32         ` Corentin “Nado” Pazdera
@ 2018-08-15 15:11           ` tuxic
  2018-08-15 15:45             ` tuxic
  2018-08-15 16:05             ` Corentin “Nado” Pazdera
  0 siblings, 2 replies; 20+ messages in thread
From: tuxic @ 2018-08-15 15:11 UTC
  To: gentoo-user

On 08/15 02:32, Corentin “Nado” Pazdera wrote:
> August 15, 2018 2:59 PM, tuxic@posteo.de wrote:
> 
> > Yes I did reboot the sustem. In my initial mail I mentioned a tool
> > called CUDA-Z and Blender, which both reports a missing CUDA device.
> 
> Ok, so you do not have a specific error which might have been thrown by the module?
> Other ideas, check dev-util/nvidia-cuda-toolkit version and double check nvidia/nvidia_uvm with modinfo to ensure they are installed and loaded correctly with the right version?
> Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from nvidia-cuda-toolkit) and show its output?
> 
> My installation works, so at least we know their version is not completely broken...
> Driver version: 396.51
> Cuda version: 9.2.88
> 
> --
> Corentin “Nado” Pazdera
> 

I compiled the new version of the driver again and rebooted the
system.

# dmesg | grep -i nvidia

[   11.375631] nvidia_drm: module license 'MIT' taints kernel.
[   12.313260] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   12.313586] nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   12.313691] nvidia 0000:02:00.0: enabling device (0000 -> 0003)
[   12.313737] nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
[   12.491106] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input9
[   12.492291] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input10
[   12.493772] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input11
[   12.494605] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input12
[   13.963644] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.236553] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018
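
The version strings in the dmesg output above can also be compared mechanically, which helps rule out a stale module left over from the old driver. The Python sketch below is only an illustration: the regular expression is an assumption based on the two log lines shown here, and the function name is hypothetical.

```python
import re

# Two lines copied from the dmesg output above.
SAMPLE_DMESG = """\
[   12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
[   34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018
"""

# The version number follows "Kernel Module" (NVRM line) or
# "UNIX platforms" (nvidia-modeset line).
VERSION_RE = re.compile(r"(?:Kernel Module|UNIX platforms)\s+(\d+(?:\.\d+)+)")


def module_versions(dmesg_text):
    """Return the set of driver versions announced by the NVIDIA modules."""
    return set(VERSION_RE.findall(dmesg_text))


print(module_versions(SAMPLE_DMESG))  # {'396.51'}
```

A set with more than one element would point to modules from different driver builds being loaded together; here both modules report 396.51.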

# modprobe -a nvidia-uvm

# dmesg | grep uvm

[  209.441956] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245

# /opt/cuda/extras/demo_suite/deviceQuery
/opt/cuda/extras/demo_suite/deviceQuery Starting...      

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
[1]    5086 exit 1     /opt/cuda/extras/demo_suite/deviceQuery

CUDA-Z also shows "no CUDA device".
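
The number 30 in the deviceQuery output is the numeric cudaError_t value returned by cudaGetDeviceCount. The small Python lookup below lists a few values as they are defined in the CUDA 9.x runtime headers; treat the table as an assumption that applies to this toolkit generation only, since later toolkit releases are believed to use different numbers for some codes. The function name is hypothetical.

```python
# Numeric cudaError_t values from the CUDA 9.x runtime headers.
# Assumption: valid for the 9.x toolkit used in this thread only.
CUDA9_RUNTIME_ERRORS = {
    0: "cudaSuccess",
    3: "cudaErrorInitializationError",
    30: "cudaErrorUnknown",
    35: "cudaErrorInsufficientDriver",
    38: "cudaErrorNoDevice",
}


def describe_cuda9_error(code):
    """Map a numeric runtime error code to its CUDA 9.x enum name."""
    return CUDA9_RUNTIME_ERRORS.get(code, "unrecognised code %d" % code)


print(describe_cuda9_error(30))  # cudaErrorUnknown
```

Read this way, the runtime is not reporting "no device" (38) or an insufficient driver (35) but a generic failure, which matches the "unknown error" text printed by deviceQuery and may mean the failure happens while the runtime talks to the driver rather than during device enumeration.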

# modinfo nvidia-uvm
filename:       /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
supported:      external
license:        MIT
depends:        nvidia
name:           nvidia_uvm
vermagic:       4.18.0-RT SMP preempt mod_unload 
parm:           uvm_perf_prefetch_enable:uint
parm:           uvm_perf_prefetch_threshold:uint
parm:           uvm_perf_prefetch_min_faults:uint
parm:           uvm_perf_thrashing_enable:uint
parm:           uvm_perf_thrashing_threshold:uint
parm:           uvm_perf_thrashing_pin_threshold:uint
parm:           uvm_perf_thrashing_lapse_usec:uint
parm:           uvm_perf_thrashing_nap_usec:uint
parm:           uvm_perf_thrashing_epoch_msec:uint
parm:           uvm_perf_thrashing_max_resets:uint
parm:           uvm_perf_thrashing_pin_msec:uint
parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
parm:           uvm_hmm:Enable (1) or disable (0) HMM mode. Default: 0. Ignored if CONFIG_HMM is not set, or if NEXT settings conflict with HMM. (int)
parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
parm:           uvm_force_prefetch_fault_support:uint
parm:           uvm_debug_enable_push_desc:Enable push description tracking (int)
parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
parm:           uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_batch_count:uint
parm:           uvm_perf_access_counter_granularity:Size of the physical memory region tracked by each counter. Valid values asof Volta: 64k, 2m, 16m, 16g (charp)
parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
parm:           uvm_perf_fault_batch_count:uint
parm:           uvm_perf_fault_replay_policy:uint
parm:           uvm_perf_fault_replay_update_put_ratio:uint
parm:           uvm_perf_fault_max_batches_per_service:uint
parm:           uvm_perf_fault_max_throttle_per_service:uint
parm:           uvm_perf_fault_coalesce:uint
parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
parm:           uvm_perf_map_remote_on_eviction:int
parm:           uvm_channel_num_gpfifo_entries:uint
parm:           uvm_channel_gpfifo_loc:charp
parm:           uvm_channel_gpput_loc:charp
parm:           uvm_channel_pushbuffer_loc:charp
parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
parm:           uvm8_ats_mode:Override the default ATS (Address Translation Services) UVM mode by disabling (0) or enabling (1) (int)
parm:           uvm_driver_mode:Set the uvm kernel driver mode. Choices include: 8 (charp)
parm:           uvm_debug_prints:Enable uvm debug prints. (int)
parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)

# ls -l /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
-rw-r--r-- 1 root root 1405808 Aug 15 16:49 /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
(just installed minutes before)

# uname -a
Linux solfire 4.18.0-RT #1 SMP PREEMPT Mon Aug 13 05:15:26 CEST 2018 x86_64 AMD Phenom(tm) II X6 1090T Processor AuthenticAMD GNU/Linux
(the kernel version matches)

# eix nvidia-cuda-toolkit

[I] dev-util/nvidia-cuda-toolkit
     Available versions:  [M](~)6.5.14(0/6.5.14) [M](~)6.5.19-r1(0/6.5.19) [M](~)7.5.18-r2(0/7.5.18) [M](~)8.0.44(0/8.0.44) [M](~)8.0.61(0/8.0.61) (~)9.0.176(0/9.0.176) (~)9.1.85(0/9.1.85) (~)9.2.88(0/9.2.88) {debugger doc eclipse profiler}
     Installed versions:  9.2.88(0/9.2.88)(06:31:32 PM 08/14/2018)(-debugger -doc -eclipse -profiler)
     Homepage:            https://developer.nvidia.com/cuda-zone
     Description:         NVIDIA CUDA Toolkit (compiler and friends)

It gets even weirder...


* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 15:11           ` tuxic
@ 2018-08-15 15:45             ` tuxic
  2018-08-15 17:13               ` Nikos Chantziaras
  2018-08-15 16:05             ` Corentin “Nado” Pazdera
  1 sibling, 1 reply; 20+ messages in thread
From: tuxic @ 2018-08-15 15:45 UTC
  To: gentoo-user


I put nvidia-uvm explicitly into /etc/conf.d/modules, which was never
necessary before... and it shows the same problem: no CUDA
devices.

I think I will dream tonight of missing CUDA devices... ;(


On 08/15 05:11, tuxic@posteo.de wrote:
> On 08/15 02:32, Corentin “Nado” Pazdera wrote:
> > August 15, 2018 2:59 PM, tuxic@posteo.de wrote:
> > 
> > > Yes I did reboot the sustem. In my initial mail I mentioned a tool
> > > called CUDA-Z and Blender, which both reports a missing CUDA device.
> > 
> > Ok, so you do not have a specific error which might have been thrown by the module?
> > Other ideas, check dev-util/nvidia-cuda-toolkit version and double check nvidia/nvidia_uvm with modinfo to ensure they are installed and loaded correctly with the right version?
> > Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from nvidia-cuda-toolkit) and show its output?
> > 
> > My installation works, so at least we know their version is not completely broken...
> > Driver version: 396.51
> > Cuda version: 9.2.88
> > 
> > --
> > Corentin “Nado” Pazdera
> > 
> 
> I compiled the new version of the driver again and rebooted the
> system.
> 
> # dmesg | grep -i nvidia:
> 
> [   11.375631] nvidia_drm: module license 'MIT' taints kernel.
> [   12.313260] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
> [   12.313586] nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
> [   12.313691] nvidia 0000:02:00.0: enabling device (0000 -> 0003)
> [   12.313737] nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
> [   12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
> [   12.491106] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input9
> [   12.492291] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input10
> [   12.493772] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input11
> [   12.494605] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input12
> [   13.963644] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
> [   34.236553] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
> [   34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018
> 
> # modprobe -a nvidia-uvm
> 
> # dmesg | grep uvm
> 
> [  209.441956] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245
> 
> 
> # /opt/cuda/extras/demo_suite/deviceQuery
> /opt/cuda/extras/demo_suite/deviceQuery Starting...      
> 
>  CUDA Device Query (Runtime API) version (CUDART static linking)
> 
> cudaGetDeviceCount returned 30
> -> unknown error
> Result = FAIL
> [1]    5086 exit 1     /opt/cuda/extras/demo_suite/deviceQuery
> 
> CUDA-Z shows also "no CUDA device" 
> 
> # modinfo nvidia-uvm
> filename:       /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
> supported:      external
> license:        MIT
> depends:        nvidia
> name:           nvidia_uvm
> vermagic:       4.18.0-RT SMP preempt mod_unload 
> parm:           uvm_perf_prefetch_enable:uint
> parm:           uvm_perf_prefetch_threshold:uint
> parm:           uvm_perf_prefetch_min_faults:uint
> parm:           uvm_perf_thrashing_enable:uint
> parm:           uvm_perf_thrashing_threshold:uint
> parm:           uvm_perf_thrashing_pin_threshold:uint
> parm:           uvm_perf_thrashing_lapse_usec:uint
> parm:           uvm_perf_thrashing_nap_usec:uint
> parm:           uvm_perf_thrashing_epoch_msec:uint
> parm:           uvm_perf_thrashing_max_resets:uint
> parm:           uvm_perf_thrashing_pin_msec:uint
> parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
> parm:           uvm_hmm:Enable (1) or disable (0) HMM mode. Default: 0. Ignored if CONFIG_HMM is not set, or if NEXT settings conflict with HMM. (int)
> parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
> parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
> parm:           uvm_force_prefetch_fault_support:uint
> parm:           uvm_debug_enable_push_desc:Enable push description tracking (int)
> parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
> parm:           uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
> parm:           uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
> parm:           uvm_perf_access_counter_batch_count:uint
> parm:           uvm_perf_access_counter_granularity:Size of the physical memory region tracked by each counter. Valid values asof Volta: 64k, 2m, 16m, 16g (charp)
> parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
> parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
> parm:           uvm_perf_fault_batch_count:uint
> parm:           uvm_perf_fault_replay_policy:uint
> parm:           uvm_perf_fault_replay_update_put_ratio:uint
> parm:           uvm_perf_fault_max_batches_per_service:uint
> parm:           uvm_perf_fault_max_throttle_per_service:uint
> parm:           uvm_perf_fault_coalesce:uint
> parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
> parm:           uvm_perf_map_remote_on_eviction:int
> parm:           uvm_channel_num_gpfifo_entries:uint
> parm:           uvm_channel_gpfifo_loc:charp
> parm:           uvm_channel_gpput_loc:charp
> parm:           uvm_channel_pushbuffer_loc:charp
> parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
> parm:           uvm8_ats_mode:Override the default ATS (Address Translation Services) UVM mode by disabling (0) or enabling (1) (int)
> parm:           uvm_driver_mode:Set the uvm kernel driver mode. Choices include: 8 (charp)
> parm:           uvm_debug_prints:Enable uvm debug prints. (int)
> parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)
> 
> 
> # ls -l /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
> -rw-r--r-- 1 root root 1405808 Aug 15 16:49 /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
> (just installed minutes before)
> 
> # uname -a
> Linux solfire 4.18.0-RT #1 SMP PREEMPT Mon Aug 13 05:15:26 CEST 2018 x86_64 AMD Phenom(tm) II X6 1090T Processor AuthenticAMD GNU/Linux
> (the kernel version matches)
> 
> # eix nvidia-cuda-toolkit
> 
> [I] dev-util/nvidia-cuda-toolkit
>      Available versions:  [M](~)6.5.14(0/6.5.14) [M](~)6.5.19-r1(0/6.5.19) [M](~)7.5.18-r2(0/7.5.18) [M](~)8.0.44(0/8.0.44) [M](~)8.0.61(0/8.0.61) (~)9.0.176(0/9.0.176) (~)9.1.85(0/9.1.85) (~)9.2.88(0/9.2.88) {debugger doc eclipse profiler}
>      Installed versions:  9.2.88(0/9.2.88)(06:31:32 PM 08/14/2018)(-debugger -doc -eclipse -profiler)
>      Homepage:            https://developer.nvidia.com/cuda-zone
>      Description:         NVIDIA CUDA Toolkit (compiler and friends)
> 
> 
> 
> It becomes even more weird...
> 
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 15:11           ` tuxic
  2018-08-15 15:45             ` tuxic
@ 2018-08-15 16:05             ` Corentin “Nado” Pazdera
  1 sibling, 0 replies; 20+ messages in thread
From: Corentin “Nado” Pazdera @ 2018-08-15 16:05 UTC (permalink / raw
  To: gentoo-user

August 15, 2018 5:45 PM, tuxic@posteo.de wrote:

> I put nvidia-uvm explicitly into /etc/conf.d/modules - which was not
> necessary ever before....and it shows the same problems: No cuda
> devices.
> 
> I think I will dream this night of no cuda devices... ;(
> 
> On 08/15 05:11, tuxic@posteo.de wrote:
> 
>> On 08/15 02:32, Corentin “Nado” Pazdera wrote:
>> August 15, 2018 2:59 PM, tuxic@posteo.de wrote:
> 
> Yes, I did reboot the system. In my initial mail I mentioned a tool
> called CUDA-Z and Blender, which both report a missing CUDA device.
>> Ok, so you do not have a specific error which might have been thrown by the module?
>> Other ideas, check dev-util/nvidia-cuda-toolkit version and double check nvidia/nvidia_uvm with
>> modinfo to ensure they are installed and loaded correctly with the right version?
>> Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from nvidia-cuda-toolkit) and show its
>> output?
>> 
>> My installation works, so at least we know their version is not completely broken...
>> Driver version: 396.51
>> Cuda version: 9.2.88
>> 
>> --
>> Corentin “Nado” Pazdera
>> 
>> I compiled the new version of the driver again and rebooted the
>> system.
>> 
>> # dmesg | grep -i nvidia:
>> 
>> [ 11.375631] nvidia_drm: module license 'MIT' taints kernel.
>> [ 12.313260] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
>> [ 12.313586] nvidia 0000:07:00.0: vgaarb: changed VGA decodes:
>> olddecodes=io+mem,decodes=none:owns=io+mem
>> [ 12.313691] nvidia 0000:02:00.0: enabling device (0000 -> 0003)
>> [ 12.313737] nvidia 0000:02:00.0: vgaarb: changed VGA decodes:
>> olddecodes=io+mem,decodes=none:owns=none
>> [ 12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 396.51 Tue Jul 31 10:43:06 PDT 2018
>> (using threaded interrupts)
>> [ 12.491106] input: HDA NVidia HDMI as
>> /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input9
>> [ 12.492291] input: HDA NVidia HDMI as
>> /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input10
>> [ 12.493772] input: HDA NVidia HDMI as
>> /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input11
>> [ 12.494605] input: HDA NVidia HDMI as
>> /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input12
>> [ 13.963644] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
>> [ 34.236553] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
>> [ 34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 396.51
>> Tue Jul 31 14:52:09 PDT 2018
>> 
>> # modprobe -a nvidia-uvm
>> 
>> # dmesg | grep uvm
>> 
>> [ 209.441956] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245
>> 
>> # /opt/cuda/extras/demo_suite/deviceQuery
>> /opt/cuda/extras/demo_suite/deviceQuery Starting...
>> 
>> CUDA Device Query (Runtime API) version (CUDART static linking)
>> 
>> cudaGetDeviceCount returned 30
>> -> unknown error
>> Result = FAIL
>> [1] 5086 exit 1 /opt/cuda/extras/demo_suite/deviceQuery
>> 
>> CUDA-Z shows also "no CUDA device"
>> 
>> # modinfo nvidia-uvm
>> filename: /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
>> supported: external
>> license: MIT
>> depends: nvidia
>> name: nvidia_uvm
>> vermagic: 4.18.0-RT SMP preempt mod_unload
>> parm: uvm_perf_prefetch_enable:uint
>> parm: uvm_perf_prefetch_threshold:uint
>> parm: uvm_perf_prefetch_min_faults:uint
>> parm: uvm_perf_thrashing_enable:uint
>> parm: uvm_perf_thrashing_threshold:uint
>> parm: uvm_perf_thrashing_pin_threshold:uint
>> parm: uvm_perf_thrashing_lapse_usec:uint
>> parm: uvm_perf_thrashing_nap_usec:uint
>> parm: uvm_perf_thrashing_epoch_msec:uint
>> parm: uvm_perf_thrashing_max_resets:uint
>> parm: uvm_perf_thrashing_pin_msec:uint
>> parm: uvm_perf_map_remote_on_native_atomics_fault:uint
>> parm: uvm_hmm:Enable (1) or disable (0) HMM mode. Default: 0. Ignored if CONFIG_HMM is not set, or
>> if NEXT settings conflict with HMM. (int)
>> parm: uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
>> parm: uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes
>> allocated and freed, 2 = per-allocation origin tracking. (int)
>> parm: uvm_force_prefetch_fault_support:uint
>> parm: uvm_debug_enable_push_desc:Enable push description tracking (int)
>> parm: uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid,
>> sys. (charp)
>> parm: uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger
>> migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
>> parm: uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger
>> migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
>> parm: uvm_perf_access_counter_batch_count:uint
>> parm: uvm_perf_access_counter_granularity:Size of the physical memory region tracked by each
>> counter. Valid values asof Volta: 64k, 2m, 16m, 16g (charp)
>> parm: uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a
>> notification.Valid values: [1, 65535] (uint)
>> parm: uvm_perf_reenable_prefetch_faults_lapse_msec:uint
>> parm: uvm_perf_fault_batch_count:uint
>> parm: uvm_perf_fault_replay_policy:uint
>> parm: uvm_perf_fault_replay_update_put_ratio:uint
>> parm: uvm_perf_fault_max_batches_per_service:uint
>> parm: uvm_perf_fault_max_throttle_per_service:uint
>> parm: uvm_perf_fault_coalesce:uint
>> parm: uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0.
>> (int)
>> parm: uvm_perf_map_remote_on_eviction:int
>> parm: uvm_channel_num_gpfifo_entries:uint
>> parm: uvm_channel_gpfifo_loc:charp
>> parm: uvm_channel_gpput_loc:charp
>> parm: uvm_channel_pushbuffer_loc:charp
>> parm: uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
>> parm: uvm8_ats_mode:Override the default ATS (Address Translation Services) UVM mode by disabling
>> (0) or enabling (1) (int)
>> parm: uvm_driver_mode:Set the uvm kernel driver mode. Choices include: 8 (charp)
>> parm: uvm_debug_prints:Enable uvm debug prints. (int)
>> parm: uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)
>> 
>> # ls -l /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
>> -rw-r--r-- 1 root root 1405808 Aug 15 16:49 /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
>> (just installed minutes before)
>> 
>> # uname -a
>> Linux solfire 4.18.0-RT #1 SMP PREEMPT Mon Aug 13 05:15:26 CEST 2018 x86_64 AMD Phenom(tm) II X6
>> 1090T Processor AuthenticAMD GNU/Linux
>> (the kernel version matches)
>> 
>> # eix nvidia-cuda-toolkit
>> 
>> [I] dev-util/nvidia-cuda-toolkit
>> Available versions: [M](~)6.5.14(0/6.5.14) [M](~)6.5.19-r1(0/6.5.19) [M](~)7.5.18-r2(0/7.5.18)
>> [M](~)8.0.44(0/8.0.44) [M](~)8.0.61(0/8.0.61) (~)9.0.176(0/9.0.176) (~)9.1.85(0/9.1.85)
>> (~)9.2.88(0/9.2.88) {debugger doc eclipse profiler}
>> Installed versions: 9.2.88(0/9.2.88)(06:31:32 PM 08/14/2018)(-debugger -doc -eclipse -profiler)
>> Homepage: https://developer.nvidia.com/cuda-zone
>> Description: NVIDIA CUDA Toolkit (compiler and friends)
>> 
>> It becomes even more weird...

It is weird indeed... I'm running kernel 4.15.16, and I needed to disable MSI in
/etc/modprobe.d/nvidia.conf by appending "NVreg_EnableMSI=0" to the line "options nvidia ...".
That's the main difference I see between our setups on the software side.
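
A minimal sketch of checking for that setting from the shell. The option name NVreg_EnableMSI and the path /etc/modprobe.d/nvidia.conf are taken from the paragraph above; the helper name msi_disabled_in is made up for illustration. The check only reads the configuration file, so it does not prove that the loaded module actually picked the option up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given modprobe config file has an
# uncommented "options nvidia ..." line that contains NVreg_EnableMSI=0.
msi_disabled_in() {
    conf_file=$1
    grep -Eq '^[[:space:]]*options[[:space:]]+nvidia[[:space:]].*NVreg_EnableMSI=0([[:space:]]|$)' "$conf_file"
}

# Example use against the path mentioned above (it may not exist everywhere).
if [ -r /etc/modprobe.d/nvidia.conf ] && msi_disabled_in /etc/modprobe.d/nvidia.conf; then
    echo "MSI is disabled for the nvidia module in nvidia.conf"
else
    echo "No NVreg_EnableMSI=0 setting found in nvidia.conf"
fi
```

A changed option only takes effect once the nvidia module has been reloaded, which in practice usually means a reboot.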

According to Google, this kind of error is usually caused by a failed module reload (without a
reboot) or by a version mismatch, but I can't find any mismatch in the info you gave us.
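
One way to rule out a version mismatch is to compare the version reported by the loaded kernel module with the version suffix of the libcuda file. This is a rough sketch only: the helper names are invented, and both /proc/driver/nvidia/version and /usr/lib64/libcuda.so.1 are assumptions about the local system that may need adjusting.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the kernel module and libcuda versions.

# Extract the version from an NVRM banner line such as
# "NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 ...".
kernel_module_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Extract the version suffix from a library path such as
# "/usr/lib64/libcuda.so.396.51".
libcuda_version() {
    sed -n 's/.*libcuda\.so\.\([0-9][0-9.]*\)$/\1/p' | head -n 1
}

# Live check; both paths below are assumptions about the local system.
if [ -r /proc/driver/nvidia/version ]; then
    kmod=$(kernel_module_version < /proc/driver/nvidia/version)
    lib=$(readlink -f /usr/lib64/libcuda.so.1 2>/dev/null | libcuda_version)
    if [ -n "$kmod" ] && [ "$kmod" = "$lib" ]; then
        echo "kernel module and libcuda agree: $kmod"
    else
        echo "possible mismatch: kernel module '$kmod', libcuda '$lib'"
    fi
fi
```

If the two versions differ, a stale module left loaded after an upgrade is the usual suspect.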

Good luck

--
Corentin “Nado” Pazdera


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 15:45             ` tuxic
@ 2018-08-15 17:13               ` Nikos Chantziaras
  2018-08-15 17:24                 ` tuxic
  0 siblings, 1 reply; 20+ messages in thread
From: Nikos Chantziaras @ 2018-08-15 17:13 UTC (permalink / raw
  To: gentoo-user

On 15/08/18 18:45, tuxic@posteo.de wrote:
> 
> I put nvidia-uvm explicitly into /etc/conf.d/modules - which was not
> necessary ever before....and it shows the same problems: No cuda
> devices.
> 
> I think I will dream this night of no cuda devices... ;(

Or you might want to use the LTS (Long Term Support) driver series for 
now, which is 390.x (390.77 being the latest of that series.)

You can see what the latest LTS series is by going here:

   https://nvidia.com/drivers

Select your GPU and "Linux 64-bit" and click search. This will tell you 
what the currently recommended "stable" driver is.
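
On Gentoo, staying on the 390.x series would typically mean masking the newer versions in Portage. The sketch below only prints the mask line; the helper name mask_from_major is made up, and the assumption is that the package lives in the x11-drivers category and that masking everything from 391 upwards leaves 390.77 as the newest candidate.

```shell
#!/bin/sh
# Hypothetical helper: print a Portage mask atom that blocks every
# nvidia-drivers version at or above the given major version.
mask_from_major() {
    printf '>=x11-drivers/nvidia-drivers-%s\n' "$1"
}

# Print the line that would keep Portage on the 390.x series.
# Adding it under /etc/portage/package.mask is left as a manual step,
# because that path can be either a file or a directory.
mask_from_major 391
```

After adding the printed line to the mask, re-emerging nvidia-drivers and rebooting should bring in the older series.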



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 17:13               ` Nikos Chantziaras
@ 2018-08-15 17:24                 ` tuxic
  2018-08-16  6:24                   ` Nikos Chantziaras
  0 siblings, 1 reply; 20+ messages in thread
From: tuxic @ 2018-08-15 17:24 UTC (permalink / raw
  To: gentoo-user

On 08/15 08:13, Nikos Chantziaras wrote:
> On 15/08/18 18:45, tuxic@posteo.de wrote:
> > 
> > I put nvidia-uvm explicitly into /etc/conf.d/modules - which was not
> > necessary ever before....and it shows the same problems: No cuda
> > devices.
> > 
> > I think I will dream this night of no cuda devices... ;(
> 
> Or you might want to use the LTS (Long Term Support) driver series for now,
> which is 390.x (390.77 being the latest of that series.)
> 
> You can see what the latest LTS series is by going here:
> 
>   https://nvidia.com/drivers
> 
> Select your GPU and "Linux 64-bit" and click search. This will tell you what
> the currently recommended "stable" driver is.
> 
> 


And the show must go on:


Secure Connection Failed

An error occurred during a connection to nvidia.com. Peer attempted old style (potentially vulnerable) handshake. Error code: SSL_ERROR_UNSAFE_NEGOTIATION

    The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
    Please contact the website owners to inform them of this problem.

Learn more…

Report errors like this to help Mozilla identify and block malicious sites


Sigh...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-15 17:24                 ` tuxic
@ 2018-08-16  6:24                   ` Nikos Chantziaras
  2018-08-16  6:26                     ` Nikos Chantziaras
  0 siblings, 1 reply; 20+ messages in thread
From: Nikos Chantziaras @ 2018-08-16  6:24 UTC (permalink / raw
  To: gentoo-user

On 15/08/18 20:24, tuxic@posteo.de wrote:
> On 08/15 08:13, Nikos Chantziaras wrote:
>> On 15/08/18 18:45, tuxic@posteo.de wrote:
>>>
>>> I put nvidia-uvm explicitly into /etc/conf.d/modules - which was not
>>> necessary ever before....and it shows the same problems: No cuda
>>> devices.
>>>
>>> I think I will dream this night of no cuda devices... ;(
>>
>> Or you might want to use the LTS (Long Term Support) driver series for now,
>> which is 390.x (390.77 being the latest of that series.)
>>
>> You can see what the latest LTS series is by going here:
>>
>>    https://nvidia.com/drivers
>>
>> Select your GPU and "Linux 64-bit" and click search. This will tell you what
>> the currently recommended "stable" driver is.
> 
> And the show must go on:
> 
> Secure Connection Failed
> 
> An error occurred during a connection to nvidia.com. Peer attempted old style (potentially vulnerable) handshake. Error code: SSL_ERROR_UNSAFE_NEGOTIATION

Click "Advanced" and then "add exception". If you uncheck the
"permanent" checkbox, the exception will not be saved and will only be valid
for this session.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-16  6:24                   ` Nikos Chantziaras
@ 2018-08-16  6:26                     ` Nikos Chantziaras
  2018-08-16  6:55                       ` tuxic
  0 siblings, 1 reply; 20+ messages in thread
From: Nikos Chantziaras @ 2018-08-16  6:26 UTC (permalink / raw
  To: gentoo-user

On 16/08/18 09:24, Nikos Chantziaras wrote:
> On 15/08/18 20:24, tuxic@posteo.de wrote:
>> Secure Connection Failed
>>
>> An error occurred during a connection to nvidia.com. Peer attempted 
>> old style (potentially vulnerable) handshake. Error code: 
>> SSL_ERROR_UNSAFE_NEGOTIATION
> 
> Click "Advanced" and then "add exception". If you uncheck the
> "permanent" checkbox, the exception will not be saved and will only be valid
> for this session.

Or use this URL instead:

https://www.nvidia.com/drivers



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
  2018-08-16  6:26                     ` Nikos Chantziaras
@ 2018-08-16  6:55                       ` tuxic
  0 siblings, 0 replies; 20+ messages in thread
From: tuxic @ 2018-08-16  6:55 UTC (permalink / raw
  To: gentoo-user

On 08/16 09:26, Nikos Chantziaras wrote:
> On 16/08/18 09:24, Nikos Chantziaras wrote:
> > On 15/08/18 20:24, tuxic@posteo.de wrote:
> > > Secure Connection Failed
> > > 
> > > An error occurred during a connection to nvidia.com. Peer attempted
> > > old style (potentially vulnerable) handshake. Error code:
> > > SSL_ERROR_UNSAFE_NEGOTIATION
> > 
> > Click "Advanced" and then "add exception". If you uncheck the
> > "permanent" checkbox, the exception will not be saved and will only be valid
> > for this session.
> 
> Or use this URL instead:
> 
> https://www.nvidia.com/drivers
> 
> 

...my comment about the not so well implemented NVIDIA homepage
was only a slightly ironic/cynical additional "sigh" in all that
trouble...




^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-08-16  6:55 UTC | newest]

Thread overview: 20+ messages
2018-08-14 10:35 [gentoo-user] "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1( tuxic
2018-08-14 20:16 ` [gentoo-user] " Nikos Chantziaras
2018-08-15  2:19   ` tuxic
2018-08-15 10:33   ` Corentin “Nado” Pazdera
2018-08-15 11:16     ` tuxic
2018-08-15 11:39     ` Corentin “Nado” Pazdera
2018-08-15 12:02       ` tuxic
2018-08-15 12:35         ` Nikos Chantziaras
2018-08-15 12:49           ` tuxic
2018-08-15 12:45       ` Corentin “Nado” Pazdera
2018-08-15 12:59         ` tuxic
2018-08-15 14:32         ` Corentin “Nado” Pazdera
2018-08-15 15:11           ` tuxic
2018-08-15 15:45             ` tuxic
2018-08-15 17:13               ` Nikos Chantziaras
2018-08-15 17:24                 ` tuxic
2018-08-16  6:24                   ` Nikos Chantziaras
2018-08-16  6:26                     ` Nikos Chantziaras
2018-08-16  6:55                       ` tuxic
2018-08-15 16:05             ` Corentin “Nado” Pazdera
