OpenCV + CUDA - Getting “Misaligned Address” Error

I am writing a C++ project in Visual Studio 2013 using OpenCV 2.4.10 and CUDA 6.5. The project uses the “Local histogram Entropy based method” for object detection in images.
My graphics card is a GTX 760.

In the program I iterate over the rows and columns of the image in a nested loop. For each pixel I take a 9x9 neighborhood and compute its histogram with gpu::calcHist, from which I calculate the entropy. As soon as I reach a pixel in a column greater than 0, I get a “Misaligned Address” error.

As long as I stay in the first column (col 0), iterating over the rows works fine.

for (int row = 0; row < filt.rows - 1; row++) {
    for (int col = 0; col < filt.cols - 1; col++) {
        Neighborhood = Padded(Rect(col, row, k.width, k.height)); // take the current neighborhood
        gpu::calcHist(Neighborhood, temp); // temp is one row by 256 cols

        hist.row(256*row+col) = temp; // store this pixel's histogram
    }
}
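
For reference, here is a CPU-only sketch of the per-pixel computation described above, written without OpenCV. It assumes an 8-bit single-channel image stored row-major and Shannon entropy in bits; the function name windowEntropy and its parameters are hypothetical and are not part of my project or of OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: Shannon entropy (in bits) of the k x k window whose
// top-left corner is (row, col) in an 8-bit, row-major image of width `cols`.
double windowEntropy(const std::vector<std::uint8_t>& img, int cols,
                     int row, int col, int k) {
    // The window must lie completely inside the image.
    assert(cols > 0 && k > 0 && row >= 0 && col >= 0 && col + k <= cols);
    assert(static_cast<std::size_t>(row + k) * cols <= img.size());

    // 256-bin histogram of the window (the role gpu::calcHist plays above).
    int hist[256] = {0};
    for (int r = row; r < row + k; ++r)
        for (int c = col; c < col + k; ++c)
            ++hist[img[static_cast<std::size_t>(r) * cols + c]];

    // Entropy: -sum(p * log2(p)) over the non-empty bins.
    const double n = static_cast<double>(k) * k;
    double entropy = 0.0;
    for (int bin = 0; bin < 256; ++bin) {
        if (hist[bin] == 0) continue;
        const double p = hist[bin] / n;
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A window with a single gray level gives an entropy of 0, and a window split evenly between two gray levels gives 1 bit.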

If I start from any column other than 0, I get the error. The only column I can use is 0.
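
One thing that may be relevant to this symptom: taking a sub-rectangle of a matrix does not copy any data, so for an 8-bit single-channel image the neighborhood's data pointer is the parent's pointer plus row * step + col bytes. The sketch below only illustrates that arithmetic. The helper names, the 512-byte step, and the 4-byte alignment figure are assumptions for illustration; I have not confirmed that gpu::calcHist actually requires this alignment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: byte offset of an ROI's first pixel from the start of
// the parent buffer, for a single-channel 8-bit image whose rows are `step`
// bytes apart.
std::size_t roiByteOffset(int row, int col, std::size_t step) {
    return static_cast<std::size_t>(row) * step + static_cast<std::size_t>(col);
}

// True if the ROI start remains `alignment`-byte aligned, assuming the parent
// buffer itself starts on an `alignment`-byte boundary and `step` is a
// multiple of `alignment`.
bool roiStartIsAligned(int row, int col, std::size_t step, std::size_t alignment) {
    assert(alignment > 0);
    return roiByteOffset(row, col, step) % alignment == 0;
}
```

Under these assumptions, roiStartIsAligned(3, 0, 512, 4) is true while roiStartIsAligned(3, 1, 512, 4) is false, which would match the error appearing only once col is greater than 0.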

Any suggestions? Thank you.