Mutual Information (MI) Source Code for CUDA GPUs

Since I posted the histogram computation source code back in May, quite a few people have asked me about mutual information (MI) source code. The MI code can be found here:

CUDA Source Codes

The code is related to the following publication:

author = "R. Shams and N. Barnes",
title = "Speeding up Mutual Information Computation Using {NVIDIA} {CUDA} Hardware",
booktitle = "Proc. Digital Image Computing: Techniques and Applications ({DICTA})",
address = "Adelaide, Australia",
month = dec,
year = "2007",
pages = "555--560",
doi = "10.1109/DICTA.2007.4426846",

Hi, I know I’m raising this from the dead, but I ported the code to RHEL6 and CUDA 6.5. This is the output I get:

Initializing the first random array of 10000000 elements…
Initializing the second array of 10000000 elements…
cpuMI (80x80 bins): mi = 0.486209, 333.919 ms, 119.8 MB/s
cudaMIa (80x80 bins): mi = 16.357243, 21.143 ms, 1891.9 MB/s
cudaMIb (80x80 bins): mi = 16.134764, 37.753 ms, 1059.5 MB/s
cudaMI_Approx (80x80 bins): mi = 6.767943, 5.483 ms, 7295.3 MB/s

Shouldn’t the ‘mi’ values be almost the same?
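A quick sanity check supports that suspicion. It is only a sketch, and it assumes the code reports MI in bits or nats from marginal histograms with B = 80 bins each (the output does not say which unit is used). MI can never exceed the smaller marginal entropy, and an 80-bin histogram cannot carry more than log(80) of entropy:

```latex
I(X;Y) = H(X) + H(Y) - H(X,Y) \le \min\bigl(H(X),\, H(Y)\bigr) \le \log B
\qquad
B = 80:\quad \log_2 80 \approx 6.32 \ \text{bits}, \quad \ln 80 \approx 4.38 \ \text{nats}
```

Under that assumption all three GPU values (16.36, 16.13 and 6.77) are above the bound in either unit, so they cannot be valid MI values, while the CPU value of 0.486 is well inside it.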


As the code is quite old (2007?), it may not follow current coding best practices, such as marking shared variables as volatile. It was probably developed for compute capability 1.x hardware.

The code may need a careful review for compatibility with later CUDA toolkits.

Also, try using a slightly older CUDA toolkit, such as 5.0 or 6.0, and build explicitly for sm_10 or sm_13. This invokes the older nvcc compiler that is still based on Open64 (and not LLVM).

There is a chance that the resulting code will work better (even though it has to involve the JIT compiler in the driver).

In the latest CUDA toolkits compilation for Compute 1.x has been deprecated…


Thank you for the feedback, Christian. The cluster has the 6.5 toolkit. I compiled with -arch=sm_11. The compiler said the option was deprecated, but it built. I get the same, seemingly incorrect, results.

I’ll try to figure out which coding practices are wrong, but if you see something obvious, I’d appreciate a pointer in the right direction. I’m new to CUDA programming.


Christian, you nailed it. The erroneous entropy calculations were caused by a missing volatile keyword on the shared variables declared in the macros of gpu_basics.h. With volatile added, the output is now:

Initializing the first random array of 10000000 elements…
Initializing the second array of 10000000 elements…
cpuMI (80x80 bins): mi = 0.486422, 333.942 ms, 119.8 MB/s
cudaMIa (80x80 bins): mi = 0.486418, 21.189 ms, 1887.8 MB/s
cudaMIb (80x80 bins): mi = 0.486418, 37.801 ms, 1058.2 MB/s
cudaMI_Approx (80x80 bins): mi = 0.521578, 5.540 ms, 7220.2 MB/s

The updated source and makefile are attached. (32.4 KB)
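For readers who hit the same symptom, here is a minimal sketch of the kind of pattern where a missing volatile bites. It is not the code from gpu_basics.h: the kernel `blockSum`, the helper `sumOnGpu`, the `WARP_SYNC` macro and the block size of 64 are all made up for illustration. It shows a shared-memory sum whose last steps run inside a single warp without `__syncthreads()`, which is how much code of that era was written.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  // threads per block; a choice made for this sketch

// __syncwarp() exists only in CUDA 9.0 and later. Older toolkits (such as the
// 6.5 used in this thread) rely on implicit warp-lockstep execution instead.
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#define WARP_SYNC() __syncwarp()
#else
#define WARP_SYNC()
#endif

// Sums BLOCK floats with a single thread block.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float s[BLOCK];
    const unsigned t = threadIdx.x;
    s[t] = in[t];
    __syncthreads();  // all BLOCK values are in shared memory from here on

    if (t < 32) {
        // Without volatile the compiler may keep s[] values in registers,
        // so a thread can miss the partial sums written by its neighbours.
        volatile float *v = s;
        for (unsigned off = 32; off > 0; off >>= 1) {
            if (t < off) v[t] = v[t] + v[t + off];
            WARP_SYNC();
        }
        if (t == 0) *out = v[0];
    }
}

// Host helper (hypothetical): returns the sum of BLOCK floats computed on the
// GPU, or -1.0f if a CUDA call fails.
float sumOnGpu(const float *h_in)
{
    float *d_in = 0, *d_out = 0, result = -1.0f;
    if (cudaMalloc((void **)&d_in, BLOCK * sizeof(float)) != cudaSuccess)
        return result;
    if (cudaMalloc((void **)&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return result;
    }
    cudaMemcpy(d_in, h_in, BLOCK * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<1, BLOCK>>>(d_in, d_out);
    if (cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1.0f;
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `sumOnGpu` on an array of 64 ones should return 64. On Volta and newer GPUs volatile alone is not sufficient for this style of code, because threads in a warp are no longer guaranteed to run in lockstep; that is why the sketch also calls `__syncwarp()` when the toolkit provides it.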

The source code is based on ITK (the Insight Toolkit) and the CUDA histogram sample code.

Instead of initializing random arrays, it takes two real images as input.

The Mattes mutual information implementation is buggy; I am still working on it. Use it at your own risk.


(My CUDA environment: CUDA 8, Quadro M1000M, compute capability 5.0)