Problem: sm_70 PTX not supported by OptiX on Volta GPUs

Hi OptiX team,

I created a container for a DGX-1 with Volta GPUs and recompiled my PTX files using the following command:

nvcc -I…/…/…/Programs/NVIDIA-OptiX-SDK-5.0.1-linux64/include -O3 -gencode arch=compute_70,code=sm_70 -ptx -m64 shaders.cu

My code is built with the compute_70/sm_70 flags too (btw, my non-OptiX code does not use CUDA at all). However, when launching the program I get the following error:

Parse error (Details: Function “RTresult _rtProgramCreateFromPTXFile(RTcontext, const char*, const char*, RTprogram_api**)” caught exception: shaders/shaders.ptx: error: Failed to parse input PTX string
shaders/shaders.ptx, line 10; fatal : Unsupported .target ‘sm_70’
Cannot parse input PTX string

When I compile and run an NVIDIA CUDA sample with sm_70 inside the same container, it works, and I get:

./transpose
Transpose Starting…

GPU Device 0: “Tesla V100-SXM2-16GB” with compute capability 7.0

Device 0: “Tesla V100-SXM2-16GB”
SM Capability 7.0 detected:
[Tesla V100-SXM2-16GB] has 80 MP(s) x 64 (Cores/MP) = 5120 (Cores)
Compute performance scaling factor = 1.00

Matrix size: 1024x1024 (64x64 tiles), tile size: 16x16, block size: 16x16

transpose simple copy , Throughput = 528.7176 GB/s, Time = 0.01478 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose shared memory copy, Throughput = 536.1486 GB/s, Time = 0.01457 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose naive , Throughput = 280.4925 GB/s, Time = 0.02785 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose coalesced , Throughput = 515.1516 GB/s, Time = 0.01517 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose optimized , Throughput = 533.5241 GB/s, Time = 0.01464 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose coarse-grained , Throughput = 539.1799 GB/s, Time = 0.01449 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose fine-grained , Throughput = 538.7991 GB/s, Time = 0.01450 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose diagonal , Throughput = 523.9968 GB/s, Time = 0.01491 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
Test passed

I am using the following combination:

OptiX 5.0.1 / CUDA 9.0 / Driver 384.125 / nvidia-docker 2.0

So my question is: which OptiX/CUDA versions support sm_70?

Thanks,

Benjamin

OptiX 5.1 with CUDA 9 did the trick.
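
For anyone hitting the same parse error: the message points at the `.target` directive in the PTX header, so it can help to confirm what a given PTX file actually declares before handing it to OptiX. Below is a minimal sketch, assuming Python is available in the container. The helper name `ptx_header` is hypothetical, and the sample header is illustrative only, not the real contents of shaders.ptx.

```python
import re

def ptx_header(ptx_text):
    """Return (version, target) read from the PTX header directives.

    Either element is None if the corresponding directive is missing.
    """
    # The .version and .target directives each sit on their own line
    # near the top of a PTX module.
    version = re.search(r"^\s*\.version\s+(\S+)", ptx_text, re.MULTILINE)
    # .target may carry extra comma-separated options; keep only the first.
    target = re.search(r"^\s*\.target\s+([^\s,]+)", ptx_text, re.MULTILINE)
    return (version.group(1) if version else None,
            target.group(1) if target else None)

# Illustrative header only (assumption), not the actual shaders.ptx.
sample = """//
// Generated by NVIDIA NVVM Compiler
//
.version 6.0
.target sm_70
.address_size 64
"""

print(ptx_header(sample))  # ('6.0', 'sm_70')
```

To check a real file, read it first, e.g. `ptx_header(open("shaders/shaders.ptx").read())`. If the reported target is newer than what the installed OptiX release can parse, the error above is the expected result.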