寻找cuda中n个点之间的最小距离

时间:2021-12-20 03:35:49

I am trying to find the minimum distance between n points in cuda. I wrote the below code. This is working fine for number of points from 1 to 1024 i.e., 1 block. But if num_points is greater than 1024 i am getting wrong value for minimum distance. I am checking the gpu min value with the value I found in CPU using brute force algorithm. The min value is stored in the temp1[0] at the end of kernel function.

我试图找到cuda中n点之间的最小距离。我写了下面的代码。这适用于从1到1024的点数,即1个块。但是如果num_points大于1024,我得到的最小距离值是错误的。我正在使用暴力算法检查cpu min值和我在CPU中找到的值。最小值存储在内核函数末尾的temp1 [0]中。

I don't know what is wrong in this. Please help me out..

我不知道这有什么问题。请帮帮我..

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#define MAX_POINTS 50000

__global__ void minimum_distance(float * X, float * Y, float * D, int n) {

__shared__ float temp[1024];
float temp1[1024];
int tid = threadIdx.x;
int bid = blockIdx.x;
int ref = tid+bid*blockDim.x;
temp[ref] = 1E+37F;
temp1[bid] = 1E+37F;
float dx,dy;
float Dij;
int i;

            //each thread will take a point and find min dist to all
            // points greater than its unique id(ref)
    for (i = ref + 1; i < n; i++) 
    {
        dx = X[ref]-X[i];
        dy = Y[ref]-Y[i];
        Dij = sqrtf(dx*dx+dy*dy);
        if (temp[tid] > Dij)
        {
         temp[tid] = Dij;
        }
    }   

    __syncthreads();

            //In each block the min value is stored in temp[0]
    if(tid == 0)
    {
        if( bid == (n-1)/1024 ) {
        int end = n - (bid) * 1024;
        for (i = 1; i < end; i++ )   
        {
            if (temp[i] < temp[tid])
            temp[tid] = temp[i];
        }
        temp1[bid] = temp[tid];
        }
        else {
        for (i = 1; i < 1024; i++ )   
        {
            if (temp[i] < temp[tid])
            temp[tid] = temp[i];
        }
        temp1[bid] = temp[tid];
        }   
    }

__syncthreads();

    //Here the min value is stored in temp1[0]
if (ref == 0)
{
    for (i = 1; i <= (n-1)/1024; i++)
        if( temp1[bid] > temp1[i])
            temp1[bid] = temp1[i];

    *D=temp1[bid];
}
}

//part of Main function 
//kernel function invocation
// Invoking kernel of 1D grid and block sizes  
// Vx and Vy are arrays of x-coordinates and y-coordinates respectively  

int main(int argc, char* argv[]) {
.
.
blocks = (num_points-1)/1024 + 1;
minimum_distance<<<blocks,1024>>>(Vx,Vy,dmin_dist,num_points);
.
.

1 个解决方案

#1


-1  

I'd say what's wrong is your choice of algorithm. You can certainly do better than O(n^2) - even if yours is pretty straightforward. Sure, on 5,000 points it might not seem terrible, but try 50,000 points and you'll feel the pain...

我会说你的算法选择有什么不对。你当然可以做得比O(n ^ 2)更好 - 即使你很简单。当然,在5,000分上看起来似乎并不可怕,但尝试50,000分,你会感受到痛苦......

I'd think about parallelizing the construction of a Voronoi Diagram, or maybe some kind of BSP-like structure which might be easier to query with less code divergence.

我想考虑并行化Voronoi图的构造,或者某种类似BSP的结构,这可能更容易查询,代码分歧更少。

#1


-1  

I'd say what's wrong is your choice of algorithm. You can certainly do better than O(n^2) - even if yours is pretty straightforward. Sure, on 5,000 points it might not seem terrible, but try 50,000 points and you'll feel the pain...

我会说你的算法选择有什么不对。你当然可以做得比O(n ^ 2)更好 - 即使你很简单。当然,在5,000分上看起来似乎并不可怕,但尝试50,000分,你会感受到痛苦......

I'd think about parallelizing the construction of a Voronoi Diagram, or maybe some kind of BSP-like structure which might be easier to query with less code divergence.

我想考虑并行化Voronoi图的构造,或者某种类似BSP的结构,这可能更容易查询,代码分歧更少。