K-Nearest Neighbor Ball Tree Implementation in CUDA

Hi all, I am new to CUDA. There is plenty of source code available for kNN search with a kd-tree, but in high dimensions a ball tree is generally faster than a kd-tree. However, I don't know how to implement a parallel kNN search on a ball tree in CUDA. I need to compare the performance of ball tree kNN on the CPU and on the GPU, and this is very important for me.

Does anyone have a kNN ball tree implementation in CUDA?

Thanks for your help!