Hi OptiX team,
I created a container for a DGX-1 with Volta GPUs and recompiled the PTX files using the following command:
nvcc -I…/…/…/Programs/NVIDIA-OptiX-SDK-5.0.1-linux64/include -O3 -gencode arch=compute_70,code=sm_70 -ptx -m64 shaders.cu
My code is also built with the compute_70/sm_70 flags (by the way, my non-OptiX code does not use CUDA at all). However, when launching the program I get the following error:
Parse error (Details: Function “RTresult _rtProgramCreateFromPTXFile(RTcontext, const char*, const char*, RTprogram_api**)” caught exception: shaders/shaders.ptx: error: Failed to parse input PTX string
shaders/shaders.ptx, line 10; fatal : Unsupported .target ‘sm_70’
Cannot parse input PTX string
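To double-check which architecture the OptiX PTX parser is rejecting, one can look at the `.target` directive that nvcc wrote into shaders.ptx (the error points at line 10 of that file). Below is a minimal sketch in Python, assuming the PTX is the plain-text file produced by the nvcc command above. The helper names `ptx_target` and `ptx_target_line` are hypothetical and not part of OptiX or the CUDA toolkit.

```python
import re
from typing import Optional

# Matches a PTX ".target" directive such as ".target sm_70" or
# ".target sm_30, texmode_independent" and captures the sm_XX name.
TARGET_RE = re.compile(r"^\s*\.target\s+(sm_\d+)", re.MULTILINE)


def ptx_target(ptx_text: str) -> Optional[str]:
    """Return the first sm_XX architecture named in a .target directive, or None."""
    match = TARGET_RE.search(ptx_text)
    return match.group(1) if match else None


def ptx_target_line(ptx_text: str) -> Optional[int]:
    """Return the 1-based line number of the first .target directive, or None."""
    for number, line in enumerate(ptx_text.splitlines(), start=1):
        if line.lstrip().startswith(".target"):
            return number
    return None
```

For example, `ptx_target(open("shaders/shaders.ptx").read())` should return "sm_70" for the file built with the command above, and `ptx_target_line` should report the line the parser complains about.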
When compiling and running an NVIDIA CUDA sample with sm_70 inside the container, I get:
./transpose
Transpose Starting…
GPU Device 0: “Tesla V100-SXM2-16GB” with compute capability 7.0
Device 0: “Tesla V100-SXM2-16GB”
SM Capability 7.0 detected:
[Tesla V100-SXM2-16GB] has 80 MP(s) x 64 (Cores/MP) = 5120 (Cores)
Compute performance scaling factor = 1.00
Matrix size: 1024x1024 (64x64 tiles), tile size: 16x16, block size: 16x16
transpose simple copy , Throughput = 528.7176 GB/s, Time = 0.01478 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose shared memory copy, Throughput = 536.1486 GB/s, Time = 0.01457 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose naive , Throughput = 280.4925 GB/s, Time = 0.02785 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose coalesced , Throughput = 515.1516 GB/s, Time = 0.01517 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose optimized , Throughput = 533.5241 GB/s, Time = 0.01464 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose coarse-grained , Throughput = 539.1799 GB/s, Time = 0.01449 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose fine-grained , Throughput = 538.7991 GB/s, Time = 0.01450 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
transpose diagonal , Throughput = 523.9968 GB/s, Time = 0.01491 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
Test passed
I am using the following combination:
Optix 5.0.1 / CUDA 9.0 / Driver 384.125 / nvidia-docker 2.0
So my question is: which OptiX/CUDA version supports sm_70?
Thanks,
Benjamin