Data corruption with GT 730

Hi,

I am converting large TIFF images to JPEG2000 images. I found some code online that does the conversion using CUDA (http://apps.man.poznan.pl/trac/jpeg2k). It required a bit of work to allow creation of tiled JPEG2000 images (so larger images could be processed), but I have it working. I did the initial work on my Ubuntu box, where I have a GeForce GTX 750 Ti installed. I then set up a Windows 10 box with a GeForce GT 730 and ported the code to Visual Studio 2010. It processes the images without error; however, the resulting JPEG2000 images have data corruption. After investigation I found that it periodically zeroes the byte that follows a 0xff byte: it will take a byte pair like 0xff3d and change it to 0xff00. The pairs that get corrupted always originally start with 0xff, and the corrupted pair is always 0xff00. There are also valid 0xff00 pairs in the binary data. I could find no apparent pattern to the offsets of the corrupted bytes and nothing in the source that would cause the issue. I finally pulled the GTX 750 Ti from my Linux box and used it to replace the GT 730 in my Windows box, and the corruption went away.

Anybody seen anything like this?

Thanx,

Bob

One possibility is a hardware issue (as you are probably imagining).

Based on your limited description, however, I don’t think it’s possible to rule out a software defect. Race conditions can occur that appear on one GPU architecture but not another. This doesn’t seem likely to me but it can’t be ruled out without more information or verification of the code. Simple verifications can include checking all CUDA API calls and kernel calls for errors (google “proper cuda error checking”), and running your code with cuda-memcheck including all the relevant sub-tool options (initcheck, racecheck, etc.):

http://docs.nvidia.com/cuda/cuda-memcheck/index.html#abstract
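
As a rough sketch of the cuda-memcheck step, the snippet below builds one command line per sub-tool (memcheck, racecheck, initcheck, synccheck) using the `--tool` option and can run them one after another. The function names and the `./encoder input.tif` command are placeholders for illustration, not anything taken from the code in question.

```python
import subprocess

# Sub-tools selectable through cuda-memcheck's --tool option.
TOOLS = ("memcheck", "racecheck", "initcheck", "synccheck")


def memcheck_commands(app_cmd):
    """Build one cuda-memcheck command line per sub-tool.

    app_cmd is the application and its arguments as a list,
    e.g. ["./encoder", "input.tif"] (hypothetical names).
    """
    return [["cuda-memcheck", "--tool", tool, *app_cmd] for tool in TOOLS]


def run_all(app_cmd):
    """Run the application under each sub-tool in turn.

    Each tool prints its own report; read those reports rather than
    relying only on the exit code.
    """
    for cmd in memcheck_commands(app_cmd):
        print("running:", " ".join(cmd))
        subprocess.run(cmd)
```

Calling `run_all(["./encoder", "input.tif"])` would run the placeholder application four times, once per tool. Expect each run to be much slower than a normal run, so a small test image is a sensible starting point.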

Regarding hardware issues, it does happen from time to time that some GeForce cards are “bad actors”. Of course, defects can occur even with Quadro and Tesla cards, although the defect rate is probably lower.

If the card itself is the issue, there isn’t much you can do about it. (You could try manually down-clocking the card, but this seems somewhat tedious on a card that retails for ~$60 currently. However, down-clocking could help to confirm that the observation is hardware-related.) If the card passes all relevant graphics tests, it might be considered a perfectly good product by the manufacturer (in this case, the manufacturer is not NVIDIA). If there is an underlying hardware weakness, it might be the NVIDIA GPU chip (there are test escapes in almost any manufacturing process), the on-board memory (which does not come from NVIDIA) or practically any other component or aspect of the board.
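
To judge whether down-clocking (or a card swap) actually changes anything, it helps to count the corrupted bytes rather than eyeball the output. Here is a minimal sketch, assuming you have a known-good output file (for example one produced on the GTX 750 Ti) and a suspect file of the same length. It looks for the pattern described in the question, a byte after 0xff replaced by 0x00, and reports any other differences separately. The function and file names are made up for illustration.

```python
from pathlib import Path


def compare_outputs(good: bytes, bad: bytes):
    """Compare a known-good buffer with a suspect one, byte by byte.

    Returns (zeroed, other):
      zeroed - list of (offset, original_byte) where the byte follows 0xff
               in the good data and reads 0x00 in the suspect data
      other  - offsets of all remaining mismatches
    """
    if len(good) != len(bad):
        raise ValueError("buffers differ in length; byte-wise comparison is not meaningful")
    zeroed, other = [], []
    for i, (g, b) in enumerate(zip(good, bad)):
        if g == b:
            continue
        if i > 0 and good[i - 1] == 0xFF and b == 0x00:
            zeroed.append((i, g))
        else:
            other.append(i)
    return zeroed, other


def compare_files(good_path, bad_path):
    """Read both files fully and print a short summary of the differences."""
    zeroed, other = compare_outputs(Path(good_path).read_bytes(),
                                    Path(bad_path).read_bytes())
    print(f"{len(zeroed)} bytes zeroed after 0xff, {len(other)} other mismatches")
    for offset, original in zeroed[:20]:  # show only the first few hits
        print(f"  offset {offset:#x}: {original:#04x} -> 0x00")
    return zeroed, other
```

For example, `compare_files("good.jp2", "suspect.jp2")` (hypothetical file names) on the same input image at different clock settings would show whether the number of zeroed bytes changes between runs.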

If you’re reasonably convinced the board is unsatisfactory, and it is still under warranty or you just purchased it, you might see what the return or replacement options are from the manufacturer.

We are looking for a GPU-based JPEG2000 decoder solution for our player project.
I have heard of your experience with JPEG2000 GPU decoding; maybe you can help us on a paid basis.
We receive the data over the network and need to decode up to 8 channels simultaneously. We can discuss the details by email (zxianteng@gmail.com) if you are interested.
Thanks.