Problem with image processing filter

Hello CUDA developers,

My name is Edison Gustavo Muenz. I’m a student at the Federal University of Santa Catarina (UFSC), and I’m currently working at the digital image processing laboratory (LaPIX). We started using CUDA to compute our image filters, but now I’m having a problem with the anisotropic diffusion filter.
The problem is that with one implementation, where I was writing to a float4, the image came out totally distorted. Now I’ve tried to follow the approach of the ‘simpleTexture’ example from the SDK, but the image doesn’t seem to be processed at all. I think the problem is related to the number of threads, but I can’t figure out how to do it differently.
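For reference, here is a minimal sketch of one explicit Perona-Malik diffusion step as a CUDA kernel. It assumes a single-channel image stored row-major as one float per pixel (not float4) in global memory, rather than in a texture as simpleTexture does. Each pixel moves toward its four neighbours, weighted by the conductance g(d) = exp(-(d/kappa)^2), so small differences are smoothed while large ones (edges) are largely preserved. The names `diffusionStep`, `kappa` (edge threshold) and `dt` (time step) are hypothetical; the post does not say how the `-l` lambda option maps onto them. The kernel reads only from `in` and writes only to `out`. Updating the image in place would let some threads read neighbours that other threads have already overwritten, which is one possible source of distorted output.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Perona-Malik conductance g(d) = exp(-(d/kappa)^2); kappa must be > 0.
__device__ float conductance(float d, float kappa)
{
    float r = d / kappa;
    return expf(-r * r);
}

// One explicit diffusion step on a width x height single-channel image
// (one float per pixel, row-major). One thread per pixel. Reads only
// from `in` and writes only to `out`, so no thread sees a neighbour that
// was already updated during this step. dt <= 0.25 keeps the
// 4-neighbour scheme stable.
__global__ void diffusionStep(const float *in, float *out,
                              int width, int height,
                              float kappa, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;  // threads outside the image do nothing

    int idx = y * width + x;
    float c = in[idx];

    // Differences to the four neighbours. At the border a missing
    // neighbour contributes nothing (zero-flux boundary).
    float dN = (y > 0)          ? in[idx - width] - c : 0.0f;
    float dS = (y < height - 1) ? in[idx + width] - c : 0.0f;
    float dW = (x > 0)          ? in[idx - 1] - c     : 0.0f;
    float dE = (x < width - 1)  ? in[idx + 1] - c     : 0.0f;

    out[idx] = c + dt * (conductance(dN, kappa) * dN +
                         conductance(dS, kappa) * dS +
                         conductance(dW, kappa) * dW +
                         conductance(dE, kappa) * dE);
}
```

With a very large `kappa` (so g is close to 1) and `dt = 0.25`, a single bright pixel on a dark background spreads a quarter of its value to each of its four neighbours. That gives a quick sanity check to compare against the CPU version.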

I’ll post the source code, a sample image, and the same image processed with the diffusion filter running on the CPU.

To compile the code, use the following command:

nvcc simpleTexture.cu -L"$CUDA_SDK/lib/" -I"$CUDA_SDK/common/inc/" -lcutil -lglut

where $CUDA_SDK points to the CUDA SDK directory.

Then the program can be run with

$ ./a.out -i 1 -l 1000 01.pgm

Where:
-l -> Controls the ‘lambda’ parameter
-i -> Controls the number of iterations

You can pass any PGM image as an argument, so you can test the program with other images, e.g.:
$ ./a.out another_image.pgm

Please run the program and let me know what I am doing wrong. I’ve been stuck on this for more than a week.

PS: I know the for loop that runs the iterations is totally unoptimized, but for now the goal is just to make it work.
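However the loop is written, one detail matters for correctness: each iteration should read the previous result and write into a second buffer, and the two buffers should swap roles afterwards. Below is a minimal sketch of that ping-pong pattern. `stepKernel` is a hypothetical stand-in that only adds 1 to every pixel so the result is easy to check; in the real program the diffusion kernel would take its place. After the loop, the newest data sits in whichever buffer `src` points to, which is the second buffer after an odd number of iterations. Copying back from the original input buffer, for example with `-i 1`, would therefore make it look as if the image had not been processed at all.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the real step kernel: adds 1 to every pixel.
__global__ void stepKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] + 1.0f;
}

// Runs `iterations` steps, alternating between two device buffers of
// n floats. Returns the buffer holding the final result: bufA after an
// even number of iterations, bufB after an odd number.
float *runIterations(float *bufA, float *bufB, int n, int iterations)
{
    float *src = bufA;
    float *dst = bufB;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n

    for (int it = 0; it < iterations; ++it) {
        // Launches on the default stream run in order, so each step
        // sees the complete output of the previous one.
        stepKernel<<<blocks, threads>>>(src, dst, n);

        // This step's output becomes the next step's input.
        float *tmp = src;
        src = dst;
        dst = tmp;
    }
    return src;  // after the last swap, src points at the newest data
}
```

If the input is instead read through a texture bound to a `cudaArray`, as in simpleTexture, kernel writes do not update that array, so it has to be refreshed with a copy before each iteration.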

Attachments can be found here: http://www.inf.ufsc.br/~gangster/cyclops/d…n_filter.tar.gz

I’m not seeing the attachment… :(

Me neither. I wonder if he still needs help. He has edited the post; maybe he deleted the links to the source and the images.

I don’t know what happened with the attachments.
They can be found here:
http://www.inf.ufsc.br/~gangster/cyclops/d…n_filter.tar.gz

Thanks in advance!

So nobody knows what’s happening here?
I suspect it has something to do with coalesced reads/writes, or with the number of threads I’m launching for the kernel.
I have a GeForce 8800 GTS, if that helps anybody.
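Coalescing affects speed, not correctness. Uncoalesced reads and writes make a kernel slower on a card like the 8800 GTS, but they do not change its output. When the image seems untouched, the launch configuration and error checking are the more likely suspects. A launch that fails, for example because a block has more than 512 threads (the limit on that hardware generation), simply does not run, and nothing reports it unless the code checks. The sketch below assumes 16x16 blocks. It rounds the grid size up so the blocks cover the whole image, guards the partial blocks at the right and bottom edges, maps `threadIdx.x` to the column so neighbouring threads touch neighbouring addresses, and checks for errors after the launch. `markPixels` and `launchMarkPixels` are hypothetical diagnostic helpers that write 1.0 to every pixel the grid reaches. `cudaDeviceSynchronize` is the current name; older toolkits call it `cudaThreadSynchronize`.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Diagnostic kernel: each thread writes 1.0f to the pixel it owns.
// Pixels that the grid fails to cover keep their previous value.
__global__ void markPixels(float *img, int width, int height)
{
    // threadIdx.x maps to the column, so consecutive threads access
    // consecutive addresses within a row (coalescing-friendly).
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)  // skip threads past the image edge
        img[y * width + x] = 1.0f;
}

// Launches enough 16x16 blocks to cover a width x height image and
// reports launch or execution errors. Returns 0 on success, -1 on error.
int launchMarkPixels(float *dImg, int width, int height)
{
    dim3 block(16, 16);  // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,    // round up in x
              (height + block.y - 1) / block.y);  // round up in y
    markPixels<<<grid, block>>>(dImg, width, height);

    cudaError_t err = cudaGetLastError();   // invalid configuration, etc.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while running
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

If any pixel still holds its old value after `launchMarkPixels` returns 0, the grid does not cover the image. The same rounding and bounds check carry over directly to the diffusion kernel.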