OptiX launch dimension inquiry

Hello,

Previous question: Is there a way to know how much GPU memory OptiX will use?

If the number of meshes is small it runs, but with many meshes it does not, so I assumed it was a GPU memory problem.

However, it did not run on the GTX 960 4GB graphics card,
but it did run on the GTX 1050 2GB graphics card.

I execute optixLaunch(…, width, height, 1) inside a for loop (width = mesh count).
“height” is set so that (width * height) uses about 30% of the available GPU memory (and stays within the 2^30 launch size range); the for-loop index increment is adjusted accordingly.

In the above situation, the GTX 960 4GB did not run with width (= mesh count) = 138056 and height = 7777.

When “height” was set to 1, it worked on the GTX 960 4GB.

I think it may be an index access issue when putting the results into the buffer, but I don’t understand why the same code works on the GTX 1050 but not on the GTX 960.

Are there any restrictions on the execution size of optixLaunch for each graphics card?

==============================================
This is part of the source code.
o main.cpp
// dimensionH is chosen so that (width * dimensionH) is about 30% of the available GPU memory (and <= 2^30).
for (unsigned int i = 0; i < meshesCnt; i += dimensionH)
{
    const unsigned int width  = meshesCnt;
    const unsigned int height = (meshesCnt - i < dimensionH) ? (meshesCnt - i) : dimensionH; // last (partial) block
    cudaMemset(params.Visibility, 0, blockSize * sizeof(char)); // blockSize = width * dimensionH
    cudaMemcpy(… startIndex … );                                // upload the start index of this block
    optixLaunch(… width, height, 1);
    CUDA_SYNC_CHECK();
}

o ray_gen()
{
    const uint3 launch_index = optixGetLaunchIndex();        // x in [0, width), y in [0, height)
    const unsigned int width = optixGetLaunchDimensions().x; // = mesh count
    const unsigned int resultIndex = (launch_index.y * width) + launch_index.x;
    params.Visibility[resultIndex] = hit or miss;            // pseudocode: 1-byte visibility result
}

==============================================
Error output when running on the GTX 960 4GB
(options.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL set, Debug x64 build):

[ 2][ ERROR]: Error recording event to prevent concurrent launches on the same OptixPipeline (CUDA error string: unknown error, CUDA error code: 999)
Error recording resource event on user stream (CUDA error string: unknown error, CUDA error code: 999)
Caught exception: OPTIX_ERROR_CUDA_ERROR: Optix call ‘optixLaunch( scene.pipeline(), 0, reinterpret_cast(d_paramsOpX), sizeof(opx::LaunchParamsOpX), scene.sbt(), width, height, 1 )’ failed:

[ 2][ PIPELINE]: Error releasing namedConstant’s internal resources (CUDA error string: unknown error, CUDA error code: 999)
Error synching on OptixPipeline event (CUDA error string: unknown error, CUDA error code: 999)
Error destroying OptixPipeline event (CUDA error string: unknown error, CUDA error code: 999)

==============================================

Thank you

If you get a CUDA_ERROR_UNKNOWN with big OptiX launch sizes and no error with small launch sizes, that might be related to the performance of the old board.

I think it may be an index access issue when putting the results into the buffer

If there were access errors inside your program, those should result in errors like CUDA_ERROR_MISALIGNED_ADDRESS or CUDA_ERROR_INVALID_ADDRESS_SPACE.

A GTX 960 is an entry-level Maxwell GPU from 2015. Maxwell GPUs do not support compute preemption; Pascal GPUs like your GTX 1050 do.
So if you run kernels on that Maxwell GPU which take longer than the Windows Timeout Detection and Recovery (TDR) limit, 2 seconds under Windows 10, the OS resets the display driver on the long-running kernel, and all kinds of error behavior can be reported.
Usually there should be CUDA_ERROR_LAUNCH_TIMEOUT (702) errors when the CUDA driver is able to detect that, but maybe the driver doesn't even get the chance to report it and just throws CUDA_ERROR_UNKNOWN (999) because it exited unexpectedly.

Are you benchmarking that with fully optimized device code in release builds?
Your error output shows validation enabled and a Debug x64 build. That runs even slower.

How long does the optixLaunch take when using the bigger size?
(Mind that optixLaunch is an asynchronous call, you must add synchronizations before and after it when measuring that time on the host.)
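For illustration, a minimal timing sketch on the host (reusing the names from your error output and a hypothetical CUDA_CHECK/OPTIX_CHECK error-check macro pair) could look like this:

#include <chrono>
#include <iostream>

CUDA_CHECK(cudaDeviceSynchronize());              // make sure nothing else is still in flight
const auto t0 = std::chrono::steady_clock::now();
OPTIX_CHECK(optixLaunch(scene.pipeline(), 0, reinterpret_cast<CUdeviceptr>(d_paramsOpX),
                        sizeof(opx::LaunchParamsOpX), scene.sbt(), width, height, 1));
CUDA_CHECK(cudaDeviceSynchronize());              // wait for the launch itself before stopping the clock
const auto t1 = std::chrono::steady_clock::now();
std::cout << "optixLaunch " << width << " x " << height << " took "
          << std::chrono::duration<double, std::milli>(t1 - t0).count() << " ms\n";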

How big can you make the launch size until it fails?
Is that near the 2-second kernel timeout? If yes, that would be your limit.

What is the memory requirement in that case?
Did you use cudaMemGetInfo or nvidia-smi to see if there is enough memory left?
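If not, a minimal sketch of such a check before the launch loop, keeping your ~30% budget as an assumption and also respecting the 2^30 launch size limit, might be:

#include <algorithm>
#include <iostream>

size_t freeBytes = 0, totalBytes = 0;
CUDA_CHECK(cudaMemGetInfo(&freeBytes, &totalBytes));   // free/total device memory in bytes
// Budget roughly 30% of the free memory for the 1-byte-per-cell visibility buffer,
// and never exceed the 2^30 cells a single optixLaunch allows.
const size_t budget = std::min<size_t>(static_cast<size_t>(freeBytes * 0.3), size_t(1) << 30);
const unsigned int dimensionH = static_cast<unsigned int>(std::min<size_t>(budget / meshesCnt, meshesCnt));
std::cout << "free: " << (freeBytes >> 20) << " MiB, dimensionH: " << dimensionH << "\n";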

OptiX requires stack space and you should have calculated that and called optixPipelineSetStackSize inside your app. The overall memory requirement of that also depends on the number of threads launched.
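Just as a rough sketch with the optix_stack_size.h helpers (the exact optixUtilAccumulateStackSizes signature changed slightly in OptiX 7.7, and the program group list, trace depth, and traversable graph depth below are placeholders for your app):

#include <optix_stack_size.h>

OptixStackSizes stackSizes = {};
for (OptixProgramGroup group : programGroups)            // every program group linked into the pipeline
    OPTIX_CHECK(optixUtilAccumulateStackSizes(group, &stackSizes, pipeline)); // 7.7+ also takes the pipeline

unsigned int dcStackFromTraversal = 0, dcStackFromState = 0, continuationStackSize = 0;
OPTIX_CHECK(optixUtilComputeStackSizes(&stackSizes,
                                       1,                // maxTraceDepth: visibility rays only
                                       0, 0,             // maxCCDepth, maxDCDepth
                                       &dcStackFromTraversal, &dcStackFromState, &continuationStackSize));
OPTIX_CHECK(optixPipelineSetStackSize(pipeline, dcStackFromTraversal, dcStackFromState,
                                      continuationStackSize,
                                      1));               // maxTraversableGraphDepth: 1 for a single GAS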

This simply sounds as if your GTX 960 is too slow for the workload you're running.
Actually, neither of your two entry-level boards has enough VRAM or is fast enough for workloads near the OptiX launch size limit of 2^30, irrespective of what the kernels do. Nothing I can do about that.


You were right about the cause of the problem.

With height = 7,777 on the GTX 960 4GB, the error occurs after about 2 seconds.
When I tested smaller values, it worked until height was about 40; beyond that, the error occurred.

Release builds use optimized code.
I was testing OptiX 7.7 because it was said to be available starting from Maxwell.

I am using cudaMemGetInfo to check if I have enough memory.

I solved this by restricting the launch size based on the user's GPU specifications and available memory.

Thank you

I was testing OptiX 7.7 because it was said to be available starting from Maxwell.

All OptiX SDK versions from 7.0.0 to the currently available 8.0.0 release support Maxwell GPUs.

But these GPUs are really, really slow compared to the GPUs available today and can no longer be recommended for serious GPU ray tracing workloads.
If you upgrade to an RTX board (Ada GPU generation recommended), you’ll see what I mean.
