How to improve texture loading speed?

The upper-level business code hands me image data as a cv::cuda::GpuMat. Right now I load the texture by downloading the GpuMat into a cv::Mat and then calling glTexImage2D().

example:
void loadtexture(const cv::cuda::GpuMat& gpu_mat, unsigned int textureid)
{
    cv::Mat cpu_mat;
    gpu_mat.download(cpu_mat);  // device -> host copy
    glBindTexture(GL_TEXTURE_2D, textureid);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);  // downloaded rows are tightly packed, not 4-byte aligned
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, cpu_mat.cols, cpu_mat.rows, 0, GL_BGR, GL_UNSIGNED_BYTE, cpu_mat.data);
    glTexParameteri(); …
}

Is there a way to load the GpuMat data directly into a texture object?

A GpuMat keeps its image data in device memory. So to load that data directly into a texture, you could:

  1. Use OpenGL to create a texture
  2. Use CUDA/OpenGL interop to get a CUDA handle to the texture, as a cudaArray
  3. Get the device data from the GpuMat into the cudaArray. Two options:
    3a. Create a CUDA surface from the cudaArray, then use ordinary kernel code to write the GpuMat data to the underlying cudaArray.
    3b. Just use cudaMemcpyToArray to populate the cudaArray with your GpuMat data. (cudaMemcpyToArray is deprecated in current CUDA releases; cudaMemcpy2DToArray is the replacement and also takes the source pitch.)
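
For what it's worth, steps 1 through 3b might look roughly like the sketch below. It is untested and rests on several assumptions: a GL context is current, the texture already has storage allocated as GL_RGBA8, and the GpuMat is CV_8UC3 in BGR order. As far as I know, interop registration does not accept 3-channel formats such as GL_RGB, hence the BGR to RGBA conversion on the device. registerTexture is a hypothetical helper name, and cudaMemcpy2DToArray is used instead of cudaMemcpyToArray because it accepts the source pitch. Error checking is omitted.

```cpp
#include <GL/gl.h>                  // on Windows, include <windows.h> first
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>  // cv::cuda::cvtColor

// Hypothetical helper: register the texture once and reuse the handle every frame.
// Assumes the texture was created with glTexImage2D(..., GL_RGBA8, ...) beforehand.
cudaGraphicsResource_t registerTexture(unsigned int textureid)
{
    cudaGraphicsResource_t res = nullptr;
    cudaGraphicsGLRegisterImage(&res, textureid, GL_TEXTURE_2D,
                                cudaGraphicsRegisterFlagsWriteDiscard);
    return res;
}

// Copies a BGR GpuMat into the registered texture without leaving device memory.
void loadtexture(const cv::cuda::GpuMat& gpu_bgr, cudaGraphicsResource_t res)
{
    // Expand to 4 channels on the GPU so the data matches the GL_RGBA8 texture.
    cv::cuda::GpuMat rgba;
    cv::cuda::cvtColor(gpu_bgr, rgba, cv::COLOR_BGR2RGBA);

    // Step 2: map the texture and get its cudaArray.
    cudaArray_t array = nullptr;
    cudaGraphicsMapResources(1, &res, 0);
    cudaGraphicsSubResourceGetMappedArray(&array, res, 0, 0);

    // Step 3b: pitched device-to-device copy into the array.
    cudaMemcpy2DToArray(array, 0, 0,
                        rgba.data, rgba.step,               // source pointer and pitch in bytes
                        rgba.cols * rgba.elemSize(),        // bytes actually used per row
                        rgba.rows, cudaMemcpyDeviceToDevice);

    // Unmap before OpenGL samples from the texture again.
    cudaGraphicsUnmapResources(1, &res, 0);
}
```

Registration is the expensive part, so it is kept out of the per-frame function; call cudaGraphicsUnregisterResource on the handle before deleting the texture.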

I haven’t done this myself, so there may be a gotcha that I have overlooked. But that is generally the roadmap I would follow.

I don’t have complete code for you, and I generally don’t work with OpenCV much. You may find an example on the web. Otherwise, the key steps are to learn more about CUDA/OpenGL interop, perhaps by studying the relevant sample codes, and to get familiar with how to extract and use the GpuMat data via a bare CUDA pointer. AFAIK, GpuMat uses an underlying pitched allocation, which may complicate a plain linear copy. It will not complicate the surface approach (much).
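
To make the pitched-allocation point concrete, here is a small host-only sketch of the addressing involved. PitchedView, byteOffset and packRows are hypothetical names for illustration; the fields are meant to mirror what a GpuMat exposes (data, step, rows, cols, elemSize()), but nothing here touches OpenCV or the GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a pitched image: each row starts `step` bytes
// after the previous one, and `step` may exceed cols * elemSize (row padding).
struct PitchedView {
    const std::uint8_t* data;
    std::size_t step;      // bytes between row starts (the pitch)
    int rows;
    int cols;
    std::size_t elemSize;  // bytes per pixel
};

// Byte offset of pixel (x, y): rows advance by the pitch, not by the row width.
std::size_t byteOffset(const PitchedView& v, int x, int y)
{
    return static_cast<std::size_t>(y) * v.step +
           static_cast<std::size_t>(x) * v.elemSize;
}

// Copy into a tightly packed buffer, dropping the padding at the end of each row.
// This is the row-by-row work that a 2D copy does and a single linear copy does not.
std::vector<std::uint8_t> packRows(const PitchedView& v)
{
    const std::size_t rowBytes = v.cols * v.elemSize;
    std::vector<std::uint8_t> out(rowBytes * v.rows);
    for (int y = 0; y < v.rows; ++y)
        std::memcpy(out.data() + y * rowBytes, v.data + y * v.step, rowBytes);
    return out;
}
```

On the device, the same bookkeeping is what the pitch arguments of the 2D copy functions handle, which is why a single linear cudaMemcpy is only safe when step equals cols * elemSize.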

If, instead, you could work with a pixel buffer object (PBO) on the OpenGL side, then you could extract an ordinary linear memory pointer to the underlying data during OpenGL interop, and just do a cudaMemcpy (device to device) to move the data into it, or a cudaMemcpy2D if the GpuMat rows are padded.
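
A rough, untested sketch of the PBO route follows. The assumptions: GLEW (or another loader) is initialized and a GL context is current, the texture storage was already allocated with glTexImage2D, the PBO was created with glBufferData(GL_PIXEL_UNPACK_BUFFER, cols * rows * 3, nullptr, GL_STREAM_DRAW), and it was registered once with cudaGraphicsGLRegisterBuffer. loadtexture_pbo is a hypothetical name. cudaMemcpy2D is used rather than a plain cudaMemcpy so that any row padding in the GpuMat is skipped. Error checking is omitted.

```cpp
#include <GL/glew.h>          // assumed loader; provides glBindBuffer and friends
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <opencv2/core/cuda.hpp>

// `res` comes from a one-time call such as:
//   cudaGraphicsGLRegisterBuffer(&res, pbo, cudaGraphicsRegisterFlagsWriteDiscard);
void loadtexture_pbo(const cv::cuda::GpuMat& gpu_mat, unsigned int textureid,
                     unsigned int pbo, cudaGraphicsResource_t res)
{
    const size_t rowBytes = gpu_mat.cols * gpu_mat.elemSize();

    // Map the PBO and get an ordinary linear device pointer to it.
    void* dptr = nullptr;
    size_t nbytes = 0;
    cudaGraphicsMapResources(1, &res, 0);
    cudaGraphicsResourceGetMappedPointer(&dptr, &nbytes, res);

    // Device-to-device copy; the destination is tightly packed, the source is pitched.
    cudaMemcpy2D(dptr, rowBytes, gpu_mat.data, gpu_mat.step,
                 rowBytes, gpu_mat.rows, cudaMemcpyDeviceToDevice);
    cudaGraphicsUnmapResources(1, &res, 0);

    // With a PBO bound, the last argument is a byte offset into the buffer, not a host pointer.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBindTexture(GL_TEXTURE_2D, textureid);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, gpu_mat.cols, gpu_mat.rows,
                    GL_BGR, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

One point in favor of this route is that OpenGL does the BGR unpacking, so the texture can keep a 3-channel internal format and no color conversion is needed on the CUDA side.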

Depending on your use case, perhaps this would be better? I don’t know whether it does a device->host copy under the hood, though, which I guess is what you are trying to avoid.