cuda::calcHist: misaligned address in a very simple example. Would anyone care to explain what's going on?

I’m using VS2013, CUDA 8.0, OpenCV 3.0.0.

This is my test code:

// Load (blank) initial image (it could be a real image - I get the same error)
Mat test = Mat::zeros(256, 256, CV_8UC1);
cuda::GpuMat test2;

// Initialize the output histogram
cuda::GpuMat histogram_gpu(1, 256, CV_32SC1);

// Run calcHist
cuda::calcHist(test2, histogram_gpu);

I get this error:
OpenCV Error: Gpu API call (misaligned address) in hist::histogram256, file C:/opencv/sources/modules/cudaimgproc/src/cuda/, line 106

I’m a beginner to CUDA. I’ve tried my best to read up on Global Memory, Warp Size, etc., but I must not understand it well, since I can’t seem to fix this example.

Would someone mind explaining, in a simple way, what’s going on here?

It’s impossible to tell from the minimal tidbits provided, but this may be a problem:

Mat test = Mat::zeros(256, 256, CV_8UC1);
cuda::GpuMat histogram_gpu(1, 256, CV_32SC1);

It seems you are allocating a matrix of bytes, then treating it as a matrix of 32-bit integers when computing the histogram. If so, that won’t work.

On the GPU (as on various non-x86 CPUs), all data must be naturally aligned: the address of each element must be a multiple of the element's size. An array of 32-bit elements must therefore start on a 4-byte boundary, whereas an array of 8-bit elements only needs 1-byte alignment. If you now treat the array of bytes as an array of 32-bit integers (which I think the two lines above indicate), there is a three-in-four chance that the alignment will be insufficient.
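To make the natural-alignment rule concrete, here is a minimal host-side sketch in plain C++. The names is_aligned_for and count_misaligned_offsets are made up for illustration and are not part of CUDA or OpenCV. It takes a 4-byte-aligned byte buffer and counts how many of the four byte offsets 0..3 would be invalid as the address of a 32-bit integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `p` is a valid address for an object of
// type T under natural alignment, i.e. a multiple of alignof(T).
template <typename T>
bool is_aligned_for(const void* p) {
    assert(p != nullptr);  // a null pointer is not a meaningful address to check
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: starting from a 4-byte-aligned byte buffer, count
// how many of the byte offsets 0..3 are NOT aligned for a 32-bit integer.
int count_misaligned_offsets() {
    alignas(4) static unsigned char buffer[8] = {};
    int misaligned = 0;
    for (std::size_t offset = 0; offset < 4; ++offset) {
        if (!is_aligned_for<std::uint32_t>(buffer + offset)) {
            ++misaligned;
        }
    }
    return misaligned;
}
```

On platforms where a 32-bit integer has 4-byte alignment, only offset 0 passes the check, which is where the "three in four" figure comes from. Every offset is fine for 8-bit elements.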

You can always look at the details by observing the value of relevant variables at the file/line location indicated by the error message.
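One way to inspect the relevant values is to check the start address of every row of the image buffer. The sketch below is a hypothetical helper, not an OpenCV function. It assumes the kernel reads 8-bit pixels through wider loads (for example 4 bytes at a time), which is an assumption and is not confirmed for this OpenCV version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if the first byte of every row of a
// pitched 2D buffer lies on a multiple of `alignment` bytes.
//   base       - address of the first row
//   step_bytes - distance in bytes between consecutive rows (the pitch)
//   rows       - number of rows
bool all_rows_aligned(std::uintptr_t base, std::size_t step_bytes,
                      int rows, std::size_t alignment) {
    assert(alignment != 0);
    for (int r = 0; r < rows; ++r) {
        const std::uintptr_t row_start =
            base + static_cast<std::uintptr_t>(r) * step_bytes;
        if (row_start % alignment != 0) {
            return false;  // this row would fault on an `alignment`-byte load
        }
    }
    return true;
}
```

For a cuda::GpuMat m, the inputs would come from reinterpret_cast<std::uintptr_t>(m.data), m.step and m.rows. A base address or step that is not a multiple of 4 would make the check fail for 4-byte loads.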

Thanks for your reply. I’m not sure I follow you though.

According to the documentation of cuda::calcHist :

Calculates histogram for one channel 8-bit image.

C++: void cuda::calcHist(InputArray src, OutputArray hist, Stream& stream=Stream::Null())

    src – Source image with CV_8UC1 type.
    hist – Destination histogram with one row, 256 columns, and the CV_32SC1 type.
    stream – Stream for the asynchronous version.

So the src image needs to be CV_8UC1, and the destination histogram needs to be one row, 256 columns, CV_32SC1. Isn’t that what I’m doing?
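To illustrate why the documented types differ, here is a CPU reference sketch in plain C++ of what a 256-bin histogram computes. It is not OpenCV's implementation, and reference_histogram is a name made up for this example. The input is a buffer of 8-bit pixels and the output is a separate buffer of 256 32-bit counters, so neither buffer is reinterpreted as the other.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical CPU reference: counts how often each 8-bit value occurs.
// Input:  8-bit single-channel pixels (the CV_8UC1 side).
// Output: 256 signed 32-bit counters (the 1 x 256 CV_32SC1 side).
std::array<std::int32_t, 256> reference_histogram(
        const std::vector<std::uint8_t>& pixels) {
    // No single bin can overflow if the total pixel count fits in int32.
    assert(pixels.size() <=
           static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    std::array<std::int32_t, 256> bins{};  // all counters start at zero
    for (std::uint8_t value : pixels) {
        ++bins[value];  // the pixel value selects the bin
    }
    return bins;
}
```

For the blank 256 x 256 test image, all 65536 pixels land in bin 0 and every other bin stays at zero.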

I speculated that CV_8UC1 is a 1-byte type and that CV_32SC1 is a 4-byte type, and that using one in place of the other may be the cause of the reported alignment error. Of course, I could be totally wrong about this. It is not realistic to expect strangers on the internet to remotely debug an application based on a small code snippet.

What you need to do here is debugging work. The error message gives you a point of failure, and it tells you what kind of failure it is. Work backwards from there, and eventually things will become clear. I am confident you can figure it out all by yourself. It may turn out to be easy, or it may be a lot of work (in my career, I have worked on multiple bugs that each took two weeks to get to the bottom of; I would expect this to be much easier). Debugging is an integral part of the software engineering process, and one can learn a lot from it.

Anyone else?