Migrating from Optix 7.2 to 7.5: DemandLoading problems

miguel.t · January 24, 2023, 4:34pm

Hi,
We have a custom raytracer written in cuda that uses DemandLoading 7.2.
Our .cu files are compiled into PTX and loaded with cuModuleLoad.

Now we’d like to use version 7.5 of DemandLoading library with the texture eviction capability, but when we recompile our modules using the new Texture2D.h file, cuModuleLoad fails with CUDA_ERROR_INVALID_PTX.
By commenting out the lines in Texture2D.h with calls to optixTexFootprint2DGrad and optixTexFootprint2DLod, cuModuleLoad loads the PTX but, obviously, no textures are loaded.

Is there any substitute or a workaround for these functions?

Our Driver Version: 516.59
CUDA Version: 11.7

Best regards.

mleone · January 24, 2023, 10:46pm

Hi Miguel, that sounds like a bug in cuModuleLoad. We don’t have much experience using the demand loading library in plain-old-CUDA kernels. The OptiX kernel loading machinery doesn’t rely on cuModuleLoad, so we haven’t encountered this issue.
I’ll see if I can replicate it. If you’re able to provide a simple reproducer, that would be a big help. In particular, it would be helpful to see in detail how you’re compiling the PTX and how you’re calling cuModuleLoad.

mleone · January 24, 2023, 10:48pm

Also, as long as you’re going to the effort of upgrading the demand loading library, it would be worth switching to the latest version, which is now distributed as part of the OptiX Toolkit on GitHub: GitHub - NVIDIA/optix-toolkit: Set of utilities supporting workflows common in GPU raytracing applications

miguel.t · January 24, 2023, 10:54pm

Hi, thanks for the answer. I’ll do a simple example and post it here.

Thanks.

miguel.t · January 25, 2023, 5:56pm

Hi,

here’s the code:

include <DemandLoading/DeviceContext.h>
include <DemandLoading/Paging.h>
include <DemandLoading/Texture2D.h>

include <cuda.h>

namespace demandLoading {

extern “C” global void test( DeviceContext context)
{
const unsigned int startIndex = 32;
const unsigned int endIndex = 64;
unsigned int       globalIndex = threadIdx.x + blockIdx.x * blockDim.x + startIndex;
const unsigned int laneId      = globalIndex % 32;

unsigned int  textureId = 11;
float s = 0.1;
float t = 0.1;
float2 ddx = make_float2(0., 0.);
float2 ddy = make_float2(0., 0.);
bool   isResident = true;

float4 color = demandLoading::tex2DGrad<float4>(context, textureId, s, t, ddx, ddy, &isResident);
int4 intcolor = make_int4(color.x, color.y, color.z, color.w);
}
}

And the command line to compile to ptx:

H:\projects\temp>“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe” -gencode=arch=compute_75,code="sm_75,compute_75" --use-local-env -ccbin “C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64” -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/include” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK/cuda” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/build/include” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK/lib/DemandLoading/include” -maxrregcount=0 --machine 64 --ptx -cudart static -std=c++17 -Xcompiler=“/EHsc -Zi -Ob0” -D_WINDOWS -D_DEBUG -DNOMINMAX -D__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__=1 -D"CMAKE_INTDIR="Debug"" -D"CMAKE_INTDIR="Debug"" -D_MBCS h:\projects\temp\test_00.cu -o h:\projects\temp\test_00.ptx

Just for testing, if i compile this to .cubin, (previously we compiled our cuda code to .cubin files and loaded with cuModuleLoad) it gives me these errors:

“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe” -gencode=arch=compute_75,code="sm_75,compute_75" --use-local-env -ccbin “C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64” -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/include” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK/cuda” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/build/include” “-IH:/projects/NVIDIA Corporation/OptiX SDK 7.5.0/SDK/lib/DemandLoading/include” -maxrregcount=0 --machine 64 -cudart static -std=c++17 -Xcompiler=“/EHsc -Zi -Ob0” -D_WINDOWS -D_DEBUG -DNOMINMAX -D__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__=1 -D"CMAKE_INTDIR="Debug"" -D"CMAKE_INTDIR="Debug"" -D_MBCS h:\projects\temp\test_00.cu -o h:\projects\temp\test_00.cubin

test_00.cu
h:\projects\temp\test_00.cu(12): warning #177-D: variable “endIndex” was declared but never referenced

h:\projects\temp\test_00.cu(15): warning #177-D: variable “laneId” was declared but never referenced

ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 137; error : Call to ‘_optix_tex_footprint_2d_grad_v2’ requires call prototype
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 451; error : Call to ‘_optix_tex_footprint_2d_grad_v2’ requires call prototype
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 824; error : Call to ‘_optix_tex_footprint_2d_grad_v2’ requires call prototype
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 1021; error : Call to ‘_optix_tex_footprint_2d_grad_v2’ requires call prototype
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 137; error : Unknown symbol ‘_optix_tex_footprint_2d_grad_v2’
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 451; error : Unknown symbol ‘_optix_tex_footprint_2d_grad_v2’
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 824; error : Unknown symbol ‘_optix_tex_footprint_2d_grad_v2’
ptxas C:/Users/Miguel/AppData/Local/Temp/tmpxft_00007cf0_00000000-7_test_00.ptx, line 1021; error : Unknown symbol ‘_optix_tex_footprint_2d_grad_v2’
ptxas fatal : Ptx assembly aborted due to errors

What am i doing wrong?

Best regards.

mleone · January 25, 2023, 8:40pm

Thanks, I’ll take a look. That looks quite straightforward to reproduce, thanks.
Here’s my attempt at a reproducer, which is not so straightforward, but you might find it informative.

miguel.t · January 25, 2023, 8:50pm

Hi,
here’s a small program that exhibits the error CUDA_INVALID_PTX

include <stdio.h>
include <cuda.h>
include

int main( int argc, char** argv )
{
CUresult res = cuInit ( 0 );

if( res != CUDA_SUCCESS )
{
    printf( "Error initializing cuda" );
    return 1;
}

CUdevice device;
res = cuDeviceGet ( &device, 0 );
if( res != CUDA_SUCCESS )
{
    printf( "Error getting device 0" );
    return 1;
}

CUcontext pctx;
res = cuCtxCreate ( &pctx, 0, 0 );
if( res != CUDA_SUCCESS )
{
    printf( "Error creating context on device" );
    return 1;
}

res = cuCtxSetCurrent ( pctx );
if( res != CUDA_SUCCESS )
{
    printf( "Error binding context" );
    return 1;
}

CUmodule module;
std::string fname("h:/projects/temp/test_00.ptx");
res = cuModuleLoad ( &module, fname.c_str() );

if( res != CUDA_SUCCESS )
{
    printf( "Error loading ptx : %d", res );
    return 1;
}

return 0;

}

The command lines use to compile an lik:

“C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\CL.exe” /c /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" /Zi /nologo /W1 /WX- /diagnostics:column /Od /Ob0 /D WIN32 /D _WINDOWS /D NOMINMAX /D CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS=1 /D _MBCS /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /external:W1 /Gd /TP /wd4996 /errorReport:prompt H:\projects\temp\test_00.cpp

“C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\CL.exe” /OUT:“H:\projects\temp\test_00.exe” /MANIFEST /NXCOMPAT /DYNAMICBASE “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\lib\x64\cudart_static.lib” “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\lib\x64\cuda.lib” “kernel32.lib” “user32.lib” “gdi32.lib” “winspool.lib” “shell32.lib” “ole32.lib” “oleaut32.lib” “uuid.lib” “comdlg32.lib” “advapi32.lib” /DEBUG /MACHINE:X64 /INCREMENTAL /SUBSYSTEM:CONSOLE /MANIFESTUAC:“level=‘asInvoker’ uiAccess=‘false’” /NOLOGO /TLBID:1 h:\projects\temp\test_00.obj

Best regards.

mleone · January 25, 2023, 9:27pm

I think I see the problem. Starting in OptiX 7.3, our tex2DGrad function calls an OptiX intrinsic function called optixTexFootprint2DGrad. That intrinsic calculates the tiles that are required for the given texture coordinates and derivatives using a hardware accelerated instruction (on Turing and later architectures. It’s emulated on older architectures).
The problem is that the intrinsic function is implemented by the OptiX module loader, but it’s not recognized by cuModuleLoad. Unfortunately there is no PTX instruction for the texture footprint operation.
So the short answer is that the the demand loading texture call can only be used in OptiX kernels, not in plain old CUDA kernels.
That’s an unfortunate limitation. I’ll discuss this with the rest of the team. We might be able to provide a workaround.

miguel.t · January 25, 2023, 11:56pm

Hi Mark,

thanks for the explanation. It would be awesome to have a workaround since the DemandLoading library greatly fits our needs.

Thanks and best regards.