Odd crash after cudaStreamCreate()

This is for a multi-GPU application which worked fine when I had a GTX 1080 and a Maxwell GTX Titan X.

I put in the Pascal GTX Titan X and ran the same code. The application crashed early on, right after a number of streams were created for GPU 0.

This was the error message:

Problem signature:
  Problem Event Name:	APPCRASH
  Application Name:	ConsoleApplication1.exe
  Application Version:	0.0.0.0
  Application Timestamp:	57a41d93
  Fault Module Name:	nvcuda.dll
  Fault Module Version:	6.14.13.6905
  Fault Module Timestamp:	579960a8
  Exception Code:	c0000005
  Exception Offset:	00000000002a06ee
  OS Version:	6.1.7601.2.1.0.256.48
  Locale ID:	1033
  Additional Information 1:	2197
  Additional Information 2:	21974c2f0c4b150437903a381074a443
  Additional Information 3:	0a66
  Additional Information 4:	0a66a844300b398517249f03b63a9317

All CUDA calls check the return code for an error, but no error messages appear for the stream creation. The application did not even get to the device allocations. The CUDA context was set up, some pinned host allocations were made, the streams were created, then Boom!

Again, this application worked fine before the GPU change and driver update.

Any ideas?

You have a GTX 1080 and a Pascal Titan X in the same system, is that correct?

Which driver version are you running?

Does the driver have listed support for both devices on the driver download page?

Yes, I have both a GTX 1080 and a Pascal GTX Titan X in the same system. I downloaded and installed the 369.05 Pascal Titan driver today, and both GPUs seemed to be working fine.

It was only when I ran a multi-GPU application that this error appeared.

So I guess you are saying that this driver does not support both the GTX 1080 and the Pascal GTX Titan X.

For that driver, this is what appears on the download page:

Supported Products
GeForce 10 Series
NVIDIA TITAN X (Pascal)

Is the GTX 1080 not a GeForce 10 Series product?

It is, but it’s not listed on the download page under supported products.
Only the Titan X is listed there.

The GeForce 10 Series text that you’ve listed actually has a colon after it, indicating that this is a category header, not an actual product.

Within the category of GeForce 10 Series products, the only supported product is Titan X.

GeForce 10 Series does not mean it supports all GeForce 10 Series products.

Is it unreasonable to expect that a Pascal GTX Titan X and another Pascal GPU should be able to work in the same PC using the same driver?

The previous driver had no problem with a Pascal GTX 1080 and a Maxwell GTX Titan X working in the same PC.

No, it’s not. But I already stated in another thread that this doesn’t happen to be the case for this particular driver. NVIDIA drivers commonly support a fairly wide range of products. It’s just not (officially) the case with this particular driver.

Perhaps you may wish to reread this thread:

https://devtalk.nvidia.com/default/topic/953961/cuda-setup-and-installation/driver-support-for-pascal-titan-x/

Where I state:

  1. “The only supported product for the 369.05 driver is Pascal Titan X (take a look at the supported products tab).”

  2. “This driver is also somewhat unusual in that it supports only Pascal Titan X.”

Anyway, you’re welcome to do whatever you wish. But since you’re running this driver on a product that is not officially supported by that driver, I thought you might want to be aware of that.

Fair enough, but I would like to have this multi-GPU configuration be usable.

In the past NVIDIA has updated drivers to be functional across multiple GPU models, so I hope that future versions will support such a configuration.

Is there any way to determine if this is the case by contacting NVIDIA directly?

I’m sure it will be usable in the future, if not now.

Titan X is a brand new product. You’re on the cutting edge. You’re using the very first Titan X driver ever.

It’s reasonable to assume that being on the cutting edge means the comfy ecosystem features you’re used to might not all be in place yet.

But NVIDIA has a history of a unified driver program and it’s not going to stop now. This was just a point driver release so that the highest quality could be ensured on Titan X at release, without having to delay the release for a huge QA cycle across every GPU that was ever built.

I updated to the most recent driver, 372.54, which claims to support both the Pascal GTX Titan X and the GTX 1080, but I am still getting a crash when I try to use both in a multi-GPU application.

That driver page explicitly lists both GPUs:

GeForce 10 Series:
NVIDIA TITAN X (Pascal), GeForce GTX 1080, GeForce GTX 1070, GeForce GTX 1060

Again, this same code had no problem with a Maxwell Titan X and a Pascal GTX 1080, but with two Pascal GPUs it crashes every time it reaches the same point.

CUDA 8 RC, Windows 7 x64, compiling only for compute capability 6.1 (sm_61).

Of course I realize this must be my fault…

Does one of the CUDA samples that uses streams (say, simpleMultiGPU) work on your configuration when built with the same compilation parameters?

I swapped out the Pascal Titan X and re-installed the Maxwell Titan X. Between this issue with CUDA streams, the issue with warps out of sync, and the lower percentage of usable global memory bandwidth (480 GB/s is completely false; at best, with 16-byte coalesced loads, I was able to get 369 GB/s), I just gave up.

At least with Maxwell and CUDA 7.5 I know what to expect.