Why is there no `cudaMallocArrayAsync`?

I am working on an OptiX-based application that needs to stream geometry and textures from disk while the application is running.

To improve performance, I am updating the OptiX rendering code to upload any new assets needed for frame N+1 while frame N is still rendering. I am currently doing this with `cudaMallocAsync` and `cudaFreeAsync`, which let me allocate and free device memory at the right point in the stream in a non-blocking way.
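For reference, the stream-ordered pattern described above looks roughly like this (the helper names are illustrative, not from my actual code):

```cpp
#include <cuda_runtime.h>

// Sketch: allocation, upload, and free are all enqueued into the stream,
// so the host thread never blocks on them.
void uploadAssetAsync(const void* hostData, size_t bytes,
                      void** devPtr, cudaStream_t stream)
{
    // The allocation is stream-ordered: it happens after work already
    // enqueued (e.g. rendering frame N) without a device synchronization.
    cudaMallocAsync(devPtr, bytes, stream);
    cudaMemcpyAsync(*devPtr, hostData, bytes,
                    cudaMemcpyHostToDevice, stream);
}

void releaseAssetAsync(void* devPtr, cudaStream_t stream)
{
    // The free is also enqueued; the memory returns to the pool once
    // prior work in the stream has finished with it.
    cudaFreeAsync(devPtr, stream);
}
```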

This works as expected, but I am running into an issue with textures. As far as I can tell, there is no way to allocate the backing memory for a `cudaTextureObject_t` in a non-blocking way.

I have two questions about this:

  • Is there some way to create a CUDA texture asynchronously that I am missing? There is no `cudaMallocArrayAsync`, but is there some way to access or map memory allocated with `cudaMallocAsync` as a CUDA array, or something like that?
  • If this just isn’t possible, I would be curious to learn why. As far as I know, memory allocated with `cudaMallocArray` isn’t inherently special or stored in some special way; it exists so that a CUDA kernel can access device memory through the texture unit rather than through regular memory accesses. Is there something I am missing that makes CUDA array memory impossible to allocate asynchronously?
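For context, this is the synchronous path I am stuck with. Only the allocation lacks an async variant; the copy into the array can already be stream-ordered via `cudaMemcpy2DToArrayAsync`. (A minimal sketch; the channel format and texture settings are placeholder choices, and error checking is omitted.)

```cpp
#include <cuda_runtime.h>

cudaTextureObject_t createTexture(const float* hostPixels,
                                  size_t width, size_t height,
                                  cudaStream_t stream)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    cudaArray_t array;
    // Blocking: there is no cudaMallocArrayAsync equivalent of this call.
    cudaMallocArray(&array, &desc, width, height);

    // The upload itself can be enqueued into the stream.
    cudaMemcpy2DToArrayAsync(array, 0, 0, hostPixels,
                             width * sizeof(float),   // source pitch
                             width * sizeof(float),   // row width in bytes
                             height, cudaMemcpyHostToDevice, stream);

    cudaResourceDesc resDesc{};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc{};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode     = cudaFilterModeLinear;
    texDesc.readMode       = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}
```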

There must be something special about it. If there were not, the CUDA designers presumably would have just used `cudaMalloc` for the underlying array allocations.

I’m not aware of any method to provide an underlying allocation for a `cudaArray` other than `cudaMallocArray` and its close cousins (`cudaMalloc3DArray`, `cudaMallocMipmappedArray`).

As far as I know, the reason is not documented by NVIDIA.

If you’d like to see a change in CUDA or its documentation, one option is to file a bug with NVIDIA.