cudaMalloc is not working on the GTX 1070 with CUDA 8.0

Hello all.

I developed a random forest trainer using a GTX 980 with CUDA 7.5 on Windows 8.1 and VS 2013.
It was working well (thanks to the GTX 980).

Then I bought a GTX 1070 and installed CUDA 8.0, so I expected my program to keep working.
However, I got an error from the cudaMalloc function, and the error was cudaErrorUnknown :(

I don’t know how to handle this problem.
My environment is Windows 10, VS 2013 and CUDA 8.0.


I changed the CUDA C/C++ configuration
from compute_35,sm_35 to compute_60,sm_60.

However, why does CUDA 8 not support compute 35??

CUDA 8 supports compute 35

Your GPU does not support compute 35

compute_35 is a virtual architecture though, and the resulting PTX should JIT compile for both GTX 980 and GTX 1070. Note that I do not recommend using JIT compilation unless necessary, and it is not necessary here. Simply build a fat binary that incorporates the machine code for all relevant GPU architectures.
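As a rough sketch of such a fat binary build, the flags below assume the two cards from this thread: the GTX 980 is compute capability 5.2 and the GTX 1070 is 6.1. The first two lines embed machine code (SASS) for each card. The third line is optional and additionally embeds PTX as a JIT fallback for newer GPUs. In a VS project the equivalent Code Generation setting would presumably be compute_52,sm_52;compute_61,sm_61, but that is an assumption about the project setup, not something stated in the thread.

```text
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_61,code=sm_61
-gencode arch=compute_61,code=compute_61
```

All of these go on the same nvcc command line.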

Without a Minimal, Complete, and Verifiable Example it is not possible to diagnose issues like this, as the root cause is likely not cudaMalloc().
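One way to narrow down where an error really originates is to check the status of every runtime call and every kernel launch. The following is only a minimal sketch: the CUDA_CHECK macro and the fill kernel are hypothetical names made up for illustration, not code from OP's program. The cudaFree(0) call is a common idiom to force context creation up front, so that initialization failures are not attributed to the first cudaMalloc.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report file/line and exit on any CUDA error.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                         __LINE__, #call, cudaGetErrorString(err_));     \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical kernel: writes a constant into every element.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    const int n = 1 << 20;
    int *d_data = nullptr;

    // Force context creation so setup problems show up here, not in cudaMalloc.
    CUDA_CHECK(cudaFree(0));
    CUDA_CHECK(cudaMalloc((void **)&d_data, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d_data, n, 42);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    std::puts("all CUDA calls succeeded");
    return 0;
}
```

With checks like these in place, the first failing line in the output points at the call that actually failed, rather than at a later call that merely reports the earlier error.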

A target specified in the fashion indicated in OP’s statement:

compute_35,sm_35

which in a Windows VS project typically expands to:

-gencode arch=compute_35,code=sm_35

will generate SASS only, not PTX.

In such a scenario, the resulting SASS code is not compatible with the cc 6.x architecture, and this appears to be consistent with OP’s report of cudaErrorUnknown from cudaMalloc.

Given the above context, my response was in answer to OP’s question of why CUDA 8 does not support compute 35.

It probably would have been more accurate to say:

CUDA 8 supports compute_35

Your GPU does not support sm_35
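To see which physical architecture a given GPU reports, the runtime can be queried directly. This is a small sketch using cudaGetDeviceCount and cudaGetDeviceProperties; the output format is just an illustration. A GTX 1070 should report compute capability 6.1, which is why SASS built only for sm_35 cannot run on it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }
        // major.minor is the compute capability; the binary needs SASS for a
        // compatible sm_XY target or embedded PTX that can be JIT compiled.
        std::printf("device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```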

I think we are very much on the same page. My comment was meant to express a point of clarification, not disagreement. I would encourage OP to read the CUDA documentation on how to compile for virtual and physical architectures.

Thanks all.

I understand what you are saying. It was very helpful for me.