Implementing a Quadtree using cuda

I’m trying to implementing a Quadtree using Cuda. I found a quadtree code sample in CUDA Samples,but it works incorrectly. The code sample builds quadtree with 1024 points data, m_begin and m_end are pointers to the data range begin and end respectively in a quadtree node.

When the quadtree depth level is more than 3 (the first level is quadtree root), code works incorrectly.
There are print all leaf nodes data range when depth is 3, leaf nodes are 16, but many node data range info is missing, bounding box info missing too.

____________m_begin   m_end
data range:  0,        56
data range:  56,      120
data range:  120,     197
data range:  197,     256
data range:  0,       0 
data range:  0,       0 
data range:  0,       0 
data range:  0,       0
data range:  0,       0 
data range:  0,       0
data range:  0,       0 
data range:  0,       0 
data range:  0,       0 
data range:  0,       0 
data range:  0,       0 
data range:  0,       0

I don’t know where the problem is and how to fix it. Or I missed something or made mistakes?

With the level of detail you’ve given us, you might as well consider hiring a psychic.

Please post code or a full reference as to where to find the code, ideally self contained and complete (so it can be compiled without much effort).

If your code is part of a CUDA sample, please state the toolkit version where this is included. Also the specs of the hardware you are running this on may be important. Which are the compile options you’ve used?

Please post any modified code from the original sample. You may have modified it to get to more than 3 levels deep.


cross posting (has some code):

I ALREADY said "a quadtree code sample in CUDA Samples,but it works incorrectly. "
@cbuchner1 , You really need a psychic.

Dont tell me you can not find CUDA Samples.

The bug comes from the wrong offset in original samples


kudos for trying to up-level the activity. Thanks for trying anyway.

Hopeless case. Tuning out.

@cbuchner1, “talk is cheap, show me your code”. I already clearly said where is the bug and what it is. You even NO run quadtree code sample in UCDA Samples, 在这bb你麻痹啊!!!

I disagree. You did no such thing.

I ran the cdpQuadtree sample code.

It produces output like this:

$ /usr/local/cuda/samples/bin/x86_64/linux/release/cdpQuadtree
GPU Device 0: "Tesla V100-PCIE-32GB" with compute capability 7.0

GPU device Tesla V100-PCIE-32GB has compute capabilities (SM 7.0)
Launching CDP kernel to build the quadtree
Results: OK

I have no idea how to connect it to your observation; you have obviously modified the code, if nothing else to produce this printout:

Sure, I could hunt around in the code, and try and guess where you are printing this out, and wonder whether or not you have made any other changes to the code. If you expect that level of effort from others, when you have put so little effort into writing the question, I think your expectations need to be reset.

In this community, you are the one asking for help. Others have no responsibility to give it to you. Therefore, when asking for help:

  1. Be polite
  2. Be helpful
  3. Be responsive to requests

That formula is not a “law” or any set of published rules. However it is the formula that is most likely to get you help. You can choose not to follow it if you wish. If you are grossly impolite I will just delete your post. Otherwise, the ramification is that you’re less likely to get help, based on my experience working on this forum.

Asking for code is not an unreasonable request, even if it’s only slightly modified or not modified at all. There are a number of good reasons for this:

  1. It means people don’t have to go searching for it. See item 2 above. (You want people to help, right? Then be helfpul.)
  2. If by chance you made modifications, it will remove any lack of clarity around that.
  3. If the original source of the code is removed (e.g. that CUDA sample code “goes away”) the question will still be coherent for future readers. (Yes, this matters to me.)

Further posts of any kind, from anyone, to this topic, if they are in any way impolite according to my opinion, will be deleted without notice or warning by me.

set max_depth = 3, I add printf in cdpQuadtree(), print last level node data ranage info

// Allocate GPU structs, launch kernel and clean up
bool cdpQuadtree(int warp_size)
    // Constants to control the algorithm.
    const int num_points = 1024;
    const int max_depth  = 3;
    const int min_points_per_node = 16;

    // Allocate memory for points.
    thrust::device_vector<float> x_d0(num_points);
    thrust::device_vector<float> x_d1(num_points);
    thrust::device_vector<float> y_d0(num_points);
    thrust::device_vector<float> y_d1(num_points);

    // Generate random points.
    Random_generator rnd;
        thrust::make_zip_iterator(thrust::make_tuple(x_d0.begin(), y_d0.begin())),
        thrust::make_zip_iterator(thrust::make_tuple(x_d0.end(), y_d0.end())),

    // Host structures to analyze the device ones.
    Points points_init[2];
    points_init[0].set(thrust::raw_pointer_cast(&x_d0[0]), thrust::raw_pointer_cast(&y_d0[0]));
    points_init[1].set(thrust::raw_pointer_cast(&x_d1[0]), thrust::raw_pointer_cast(&y_d1[0]));

    // Allocate memory to store points.
    Points *points;
    checkCudaErrors(cudaMalloc((void **) &points, 2*sizeof(Points)));
    checkCudaErrors(cudaMemcpy(points, points_init, 2*sizeof(Points), cudaMemcpyHostToDevice));

    // We could use a close form...
    int max_nodes = 0;

    for (int i = 0, num_nodes_at_level = 1 ; i < max_depth ; ++i, num_nodes_at_level *= 4)
        max_nodes += num_nodes_at_level;

    // Allocate memory to store the tree.
    Quadtree_node root;
    root.set_range(0, num_points);
    Quadtree_node *nodes;
    checkCudaErrors(cudaMalloc((void **) &nodes, max_nodes*sizeof(Quadtree_node)));
    checkCudaErrors(cudaMemcpy(nodes, &root, sizeof(Quadtree_node), cudaMemcpyHostToDevice));

    // We set the recursion limit for CDP to max_depth.
    cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, max_depth);

    // Build the quadtree.
    Parameters params(max_depth, min_points_per_node);
    std::cout << "Launching CDP kernel to build the quadtree" << std::endl;
    const int NUM_THREADS_PER_BLOCK = 128; // Do not use less than 128 threads.
    const int NUM_WARPS_PER_BLOCK = NUM_THREADS_PER_BLOCK / warp_size;
    const size_t smem_size = 4*NUM_WARPS_PER_BLOCK*sizeof(int);
    build_quadtree_kernel<NUM_THREADS_PER_BLOCK><<<1, NUM_THREADS_PER_BLOCK, smem_size>>>(nodes, points, params);

    // Copy points to CPU.
    thrust::host_vector<float> x_h(x_d0);
    thrust::host_vector<float> y_h(y_d0);
    Points host_points;
    host_points.set(thrust::raw_pointer_cast(&x_h[0]), thrust::raw_pointer_cast(&y_h[0]));

    // Copy nodes to CPU.
    Quadtree_node *host_nodes = new Quadtree_node[max_nodes];
    checkCudaErrors(cudaMemcpy(host_nodes, nodes, max_nodes *sizeof(Quadtree_node), cudaMemcpyDeviceToHost));

    // Validate the results.
    bool ok = check_quadtree(host_nodes, 0, num_points, &host_points, params);
    std::cout << "Results: " << (ok ? "OK" : "FAILED") << std::endl;

	printf("max_nodes: %d\n", max_nodes);

	int leaf_node_count = pow(4, max_depth - 1);
	printf("leaf_node_count: %d\n", leaf_node_count);

	Quadtree_node* node_ptr = host_nodes;

	node_ptr = node_ptr + (max_nodes - leaf_node_count);
	Quadtree_node* node_end_ptr = host_nodes + max_nodes - 1;
	for (;;) {
		if (node_ptr != node_end_ptr + 1) {
			/*printf("coor: %7f, %7f___%7f, %7f\n", node_ptr->bounding_box().get_min().x, node_ptr->bounding_box().get_min().y,
				node_ptr->bounding_box().get_max().x, node_ptr->bounding_box().get_max().y);*/
			printf("coor: %7d, %7d___\n", node_ptr->points_begin(), node_ptr->points_end());
		else {

    // Free CPU memory.
    delete[] host_nodes;

    // Free memory.

    return ok;

what is this? “you might as well consider hiring a psychic.”
from cbuchner1. That’s rude.

Yes, it is.

We’re not always perfect on this forum, whether that be in our words or in our (my) moderation skills. However:

  • I cannot police everything, and don’t really want to. I view this (psychic) statement as a mild issue. The thread overall was heading in a direction I didn’t like, so I felt it was necessary to put a stop to the back-and-forth.

  • cbuchner1 has been a long time contributor to this forum, helping many people over time. You don’t have to take my word for it, its all public. In short cbuchner1 is a respected and valued member of the community. In those cases, I’ll overlook trifles. We are all quirky in our own various ways.

  • The statement you excerpted was followed by a whole paragraph of what I consider to be good advice. I consider that paragraph to be helpful, and in line with the goals of this forum. Words like “please” were used. In light of that evident display of what is commonly recognized as being “polite”, I don’t over-emphasize the significance of the (psychic) statement.

Taking things like that into account, I’m inclined to overlook such things, if it doesn’t devolve into a conflict. Unfortunately in this case it didn’t work out.

Thanks for your consideration. No disrespect is intended towards you, I assure you, whether it is by me or cbuchner1. I feel comfortable and confident saying that. I’m sorry this thread got to this point.

Questions in the style of #1 that lack pretty much all information that would allow one to formulate an answer appear disrepectful and yes, rude, to me and maybe to others as well. Given human nature, rudeness often engenders rudeness.

While I am able to suppress the urge to reply in a similar vein most of the time, I am sure I have injected a comment like “My magic eightball is broken” or “I flunked clairvoyance class” in these forums on a few occasions. In other words, I have some definite ideas where answers like #2 come from.

I am not saying that is a good thing: snarky responses do not improve anything, and can lead to further escalation. IMHO, a better strategy is to completely ignore such questions (or on Stackoverflow: downvote and close vote without comment). I find that it takes some discipline to apply that approach consistently …