Image processing (divide image)

Well, making it fail in cases like this is exactly the point of checking every return value, which I tried to stress at the end of my previous message…

The original image has a photometric value of PHOTOMETRIC_RGB, as you can verify by running the tiffinfo utility on the downloaded file. Here is the tiffinfo output:

TIFF Directory at offset 0xc0008 (786440)

  Subfile Type: (0 = 0x0)

  Image Width: 512 Image Length: 512

  Bits/Sample: 8

  Compression Scheme: None

  Photometric Interpretation: RGB color

  Samples/Pixel: 3

  Planar Configuration: single image plane

However, the original image is 512x512, and you mentioned a resolution of 8601x7561 in your first post, which means someone scaled the original image, and the photometric value probably changed along the way. So you have to either re-scale the original image yourself, with better control over the output options so that the result matches the input format my code expects, or change my code to match the actual format of the image you intend to process. In any case, TIFF is a rather complex format with many options. You have to pin down most of the input format options your program accepts (just as my code does) and then convert your test images into exactly that format; otherwise your host/device code will have to be very complex to work properly. I would certainly suggest that you first download the original image and get your kernel working with it, and only then re-size the image and see how your kernel behaves with big images.

I get this error with the original Lena image: Assertion `cudaMalloc(&idata_d, size) == cudaSuccess’ failed.

Well, then - debug it:

    print the value of the “size” variable before this cudaMalloc() call, and check that it is 512*512*3=786432, as it should be

    if so, change the cudaMalloc() call so that you save its return value, print that value, and look it up in the cudaError enumeration in the driver_types.h header file of your CUDA installation; there you’ll find the symbolic name of the error, which should help you work out what caused the problem
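The first check above can be written down as a small host-side helper. This is only a sketch: the function name expected_image_bytes is made up for illustration, and it assumes an uncompressed, strip-based image whose samples are padded to whole bytes, which is what the `mul` calculation in the code posted in this thread does as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of libtiff or CUDA): the number of bytes an
// uncompressed image occupies, assuming each sample is rounded up to whole bytes.
std::size_t expected_image_bytes(std::size_t width,
                                 std::size_t length,
                                 std::size_t samples_per_pixel,
                                 std::size_t bits_per_sample)
{
    assert(bits_per_sample > 0);
    // Same rounding as the div()/mul computation in the posted code.
    const std::size_t bytes_per_sample = (bits_per_sample + 7) / 8;
    return width * length * samples_per_pixel * bytes_per_sample;
}
```

For the 512x512 RGB Lena image this gives 512*512*3*1 = 786432, the value “size” should hold just before the cudaMalloc() call.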


Yeah :) It works. Now I’m starting to write a new kernel.

This code works on a small image, but on a large 9000x8000 one it doesn’t, and I get the error: invalid configuration argument

#include <assert.h>

#include <stdlib.h>

#include <tiffio.h>

#include <cuda.h>

#define ROZMIAR_BLOKU 256


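// One thread per byte of the image: values below 120 are raised to 120,
// everything else is copied unchanged.
__global__ void waterFind (char *idata_d, char *odata_d, int size)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;

	if (i < size)
	{
		if (idata_d[i] < 120)
		{
			odata_d[i] = 120;
		}
		else
		{
			odata_d[i] = idata_d[i];
		}
	}
}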

int main(int argc, char **argv)
{
	//assert(argc == 2);
	TIFF *iimage = TIFFOpen("B10.TIF", "r");
	assert(iimage != NULL);

	// Read the tags and check the file has the format this program expects.
	uint32 width;
	assert(TIFFGetField(iimage, TIFFTAG_IMAGEWIDTH, &width));
	assert(width > 0);
	uint32 length;
	assert(TIFFGetField(iimage, TIFFTAG_IMAGELENGTH, &length));
	assert(length > 0);
	uint16 bits_per_sample;
	assert(TIFFGetField(iimage, TIFFTAG_BITSPERSAMPLE, &bits_per_sample) != 0);
	assert(bits_per_sample == 8);
	uint16 photometric;
	assert(TIFFGetField(iimage, TIFFTAG_PHOTOMETRIC, &photometric));
	//assert(photometric == PHOTOMETRIC_RGB);
	assert(photometric == 1);	// 1 is PHOTOMETRIC_MINISBLACK (grayscale)
	uint16 planar_config;
	assert(TIFFGetField(iimage, TIFFTAG_PLANARCONFIG, &planar_config) != 0);
	uint16 samples_per_pixel;
	assert(TIFFGetField(iimage, TIFFTAG_SAMPLESPERPIXEL, &samples_per_pixel));
	assert(samples_per_pixel == 1);

	// Bytes per sample, rounded up to a whole byte.
	div_t pair = std::div(bits_per_sample, 8);
	int mul = (pair.rem == 0) ? pair.quot : pair.quot + 1;
	int size = width * length * samples_per_pixel * mul * sizeof(char);
	//fprintf( stderr, "\n mul %d\nsize: %d\n", mul, size);

	// Read all strips into one contiguous host buffer.
	char *idata = (char *) malloc(size);
	assert(idata != NULL);
	char *curr = idata;
	int count = TIFFNumberOfStrips(iimage);
	for (int i = 0; i < count; ++i) {
		tsize_t in = TIFFReadEncodedStrip(iimage, i, curr, -1);
		assert(in != -1);
		curr += in;
	}

	//cudaError_t error;
	char *idata_d;
	assert(cudaMalloc((void**)&idata_d, size) == cudaSuccess);
	//error = cudaMalloc(&idata_d,size);
	//fprintf(stderr,"Erros: %i\n", error);
	assert(cudaMemcpy(idata_d, idata, size, cudaMemcpyHostToDevice) == cudaSuccess);

	char *odata_d;
	assert(cudaMalloc((void**)&odata_d, size) == cudaSuccess);

	// One-dimensional launch: one thread per byte, rounded up to whole blocks.
	int nBlocks = size/ROZMIAR_BLOKU + (size%ROZMIAR_BLOKU == 0 ? 0 : 1);
	waterFind<<<nBlocks, ROZMIAR_BLOKU>>>(idata_d, odata_d, size);
	fprintf (stderr,"%s\n",cudaGetErrorString(cudaGetLastError()));

	char *odata = (char *) malloc(size);
	assert(odata != NULL);
	assert(cudaMemcpy(odata, odata_d, size, cudaMemcpyDeviceToHost) == cudaSuccess);

	// Write the result as a single strip with the same tags as the input.
	TIFF *oimage = TIFFOpen("output.tif", "w");
	assert(oimage != NULL);
	assert(TIFFSetField(oimage, TIFFTAG_IMAGEWIDTH, width));
	assert(TIFFSetField(oimage, TIFFTAG_IMAGELENGTH, length));
	assert(TIFFSetField(oimage, TIFFTAG_BITSPERSAMPLE, bits_per_sample));
	assert(TIFFSetField(oimage, TIFFTAG_PHOTOMETRIC, photometric));
	assert(TIFFSetField(oimage, TIFFTAG_SAMPLESPERPIXEL, samples_per_pixel));
	assert(TIFFSetField(oimage, TIFFTAG_PLANARCONFIG, planar_config));
	assert(TIFFSetField(oimage, TIFFTAG_ROWSPERSTRIP, length));
	tsize_t on = size;
	assert(TIFFWriteEncodedStrip(oimage, 0, odata, on) == on);

	TIFFClose(oimage);
	TIFFClose(iimage);
	free(idata);
	free(odata);
	assert(cudaFree(idata_d) == cudaSuccess);
	assert(cudaFree(odata_d) == cudaSuccess);
	return 0;
}

Your grid size is too large: 9000*8000/256 = 281250 blocks, which is larger than the maximum of 65535 for a single grid dimension. You will need to modify your kernel to use a two-dimensional grid.

I have changed the kernel:
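The arithmetic behind that reply can be checked on the host before launching anything. The sketch below is an illustration only: the helper names are invented, the 16x16 block shape in the usage note is an assumption (the thread does not say which block size was used for the two-dimensional launch), and 65535 is the per-dimension limit quoted above, which newer devices may exceed in x.

```cpp
#include <cassert>
#include <cstddef>

// Per-dimension grid limit quoted in the thread (assumed device limit).
const std::size_t kMaxGridDim = 65535;

// Ceiling division: the number of blocks needed to cover n elements.
std::size_t blocks_needed(std::size_t n, std::size_t block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Hypothetical stand-in for the x and y members of CUDA's dim3.
struct Grid2D {
    std::size_t x;
    std::size_t y;
};

// Grid shape that covers a width x length image with block_x x block_y threads per block.
Grid2D grid_for_image(std::size_t width, std::size_t length,
                      std::size_t block_x, std::size_t block_y)
{
    return Grid2D{ blocks_needed(width, block_x), blocks_needed(length, block_y) };
}

// True if a block count is allowed in a single grid dimension.
bool fits_in_one_dimension(std::size_t blocks)
{
    return blocks <= kMaxGridDim;
}
```

With 256 threads per block, blocks_needed(9000 * 8000, 256) is 281250, which does not fit in one dimension. With an assumed 16x16 block, grid_for_image(9000, 8000, 16, 16) gives 563 x 500 blocks, and both values are well under the limit.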


__global__ void waterFind (char *idata_d, char *odata_d, int size)
{
	int x = blockIdx.x * blockDim.x + threadIdx.x;
	int y = blockIdx.y * blockDim.y + threadIdx.y;
	int i = x + y;

	if (i < size)
	{
		odata_d[i] = (char)200;
	}
}





	dim3 grid((width + block.x-1) / block.x, (length + block.y -1) /block.y);


but it doesn’t help. I don’t get any errors, but the image is wrong.

The index calculation for i is wrong. Try this:

int x = blockIdx.x * blockDim.x + threadIdx.x;

	int y = blockIdx.y * blockDim.y + threadIdx.y;

	int i = x + gridDim.x*y;

It doesn’t help. Now I have an image of which 2/3 is black.

int i = x + width*y;

I changed it to this and now it works.

Thanks, all. It’s working; I made some changes :)
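To see why only the last index expression fills the whole image, the two-dimensional launch can be emulated on the host with plain loops. This is a rough sketch, not the poster's final code: Dim2, IndexRule, count_writes and pixels_never_written are names invented here, and the row-major branch adds an x < width and y < length guard that the thread itself does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the x and y members of CUDA's dim3.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// The three index expressions tried in the thread.
enum class IndexRule { SumXY, GridStride, RowMajor };

// Emulates every thread of a 2D launch and counts how often each pixel is written.
std::vector<int> count_writes(unsigned width, unsigned length, Dim2 block, IndexRule rule)
{
    assert(block.x > 0 && block.y > 0);
    const Dim2 grid = { (width + block.x - 1) / block.x,
                        (length + block.y - 1) / block.y };
    const std::size_t size = static_cast<std::size_t>(width) * length;
    std::vector<int> hits(size, 0);

    for (unsigned by = 0; by < grid.y; ++by)
    for (unsigned bx = 0; bx < grid.x; ++bx)
    for (unsigned ty = 0; ty < block.y; ++ty)
    for (unsigned tx = 0; tx < block.x; ++tx) {
        const std::size_t x = static_cast<std::size_t>(bx) * block.x + tx;
        const std::size_t y = static_cast<std::size_t>(by) * block.y + ty;
        std::size_t i = 0;
        if (rule == IndexRule::SumXY) {
            i = x + y;               // many (x, y) pairs collide on one index
        } else if (rule == IndexRule::GridStride) {
            i = x + grid.x * y;      // row stride counts blocks, not pixels
        } else {
            if (x >= width || y >= length) continue;  // guard added in this sketch
            i = x + width * y;       // row stride equals the image width
        }
        if (i < size) ++hits[i];
    }
    return hits;
}

// Number of pixels that no emulated thread wrote to.
std::size_t pixels_never_written(const std::vector<int>& hits)
{
    std::size_t missing = 0;
    for (int h : hits) {
        if (h == 0) ++missing;
    }
    return missing;
}
```

For a 10x7 test image with 4x4 blocks, the first two rules leave most of the 70 pixels untouched, while the row-major rule writes each pixel exactly once. How much of a real image stays black with the other rules depends on the block shape, which the thread does not state.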