Measuring the run time of kernel code in OpenCL C

Date: 2022-08-06 13:54:58

I want to measure the performance (that is, the run time) of my kernel code on various devices, namely a CPU and GPUs. The kernel code that I wrote is:

__kernel void dataParallel(__global int* A)
{  
    sleep(10);
    A[0]=2;
    A[1]=3;
    A[2]=5;
    int pnp;//pnp=probable next prime
    int pprime;//previous prime
    int i,j;
    for(i=3;i<500;i++)
    {
        j=0;
        pprime=A[i-1];
        pnp=pprime+2;
        while((j<i) && A[j]<=sqrt((float)pnp))
        {
            if(pnp%A[j]==0)
                {
                    pnp+=2;
                    j=0;
                }
            j++;

        }
        A[i]=pnp;

    }
}

However I have been told that it is not possible to use sleep() in the kernel code. If that is true, can someone explain why? If it isn't, please tell me how to implement it.

Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is to compute the run time of the kernel code on each device. If there were a way to get the code to start executing on all the devices at the same time, I would only have to list each device's end time, and that would serve the purpose as well. Is that possible?

Hardware Details:

GPU: AMD FirePro W7000, NVIDIA TESLA C2075

CPU: Intel(R) XEON(R) CPU X5660 @ 2.80GHz

1 Answer

#1



However I have been told that it is not possible to use sleep() in the kernel code.

It's not that it's necessarily impossible; I don't know for certain. sleep() is not part of standard C (it comes from POSIX), and OpenCL C does not list it among its built-in functions, so I would not expect it to be available inside a kernel. Either way, it's simply not a good idea to block execution of a kernel until a period of time has elapsed. Even in general-purpose programming, that is rarely a good idea. Your function should finish processing as soon as possible, or hand control back so that the scheduler can find something else to do while the task is idle.

Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to do that is to compute the run time of the kernel code on each device. If there were a way to get the code to start executing on all the devices at the same time, I would only have to list each device's end time, and that would serve the purpose as well. Is that possible?

Sure, something like that... but... I'm not even sure why you think injecting sleep(10) into each task will help you; you haven't explained that here. It doesn't seem like a requirement for profiling your code (e.g. checking its speed). Have you ever heard of the XY problem? I think sleep is your Y variable, in this case.
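To illustrate that timing needs nothing inside the measured function, here is a minimal sketch: a host-side C port of the question's loop with the sleep(10) call simply dropped, timed from the outside with clock(). The function names first_primes and time_first_primes are my own, and I have assumed an integer comparison (a[j] * a[j] <= pnp) in place of the sqrt call so the sketch needs no math library. It only gives a CPU baseline on the host; it is not a measurement of the OpenCL kernel itself.

```c
#include <time.h>

/* Host-side port of the question's loop, without sleep(10).
 * Fills a[0..n-1] with the first n primes; n must be at least 3.
 * Assumption: a[j] * a[j] <= pnp replaces a[j] <= sqrt((float)pnp). */
void first_primes(int *a, int n)
{
    a[0] = 2;
    a[1] = 3;
    a[2] = 5;
    for (int i = 3; i < n; i++) {
        int pnp = a[i - 1] + 2;   /* probable next prime (always odd) */
        int j = 0;
        while (j < i && a[j] * a[j] <= pnp) {
            if (pnp % a[j] == 0) {
                pnp += 2;         /* not prime: try the next odd number */
                j = 0;            /* restart the trial divisions */
            }
            j++;
        }
        a[i] = pnp;
    }
}

/* Times one call from the outside.
 * Returns the CPU time in seconds as reported by clock(). */
double time_first_primes(int *a, int n)
{
    clock_t start = clock();
    first_primes(a, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A single run of 500 iterations is probably too short for clock() to resolve, so in practice you would repeat the call many times and divide the total by the repeat count.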

I mentioned profiling just now. Have you learnt about profilers? They do exactly what it is you're aiming to do, except that they do it without you having to write any code. Here's a tutorial on using perf to profile the Linux kernel...
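OpenCL also has profiling built into its command queues, which may be closer to what the question needs. If the queue is created with CL_QUEUE_PROFILING_ENABLE and an event is passed to clEnqueueNDRangeKernel, then clGetEventProfilingInfo can be queried with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, each of which yields a device time counter value in nanoseconds. Because each device reports its own start and end, the devices do not have to start at the same moment; only the differences are compared. The sketch below covers just that last step in plain C. The kernel_timing struct and both function names are hypothetical helpers of mine, not OpenCL types, and the timestamps are assumed to have been read from the events already.

```c
#include <stdint.h>

/* One device's profiling result. Illustrative struct, not an OpenCL type. */
typedef struct {
    const char *device_name;
    uint64_t start_ns;  /* e.g. the CL_PROFILING_COMMAND_START value */
    uint64_t end_ns;    /* e.g. the CL_PROFILING_COMMAND_END value   */
} kernel_timing;

/* Kernel duration in milliseconds; the timestamps are in nanoseconds. */
double elapsed_ms(const kernel_timing *t)
{
    return (double)(t->end_ns - t->start_ns) / 1.0e6;
}

/* Index of the device with the shortest duration, or -1 if count is 0. */
int fastest_device(const kernel_timing *timings, int count)
{
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (best < 0 || elapsed_ms(&timings[i]) < elapsed_ms(&timings[best]))
            best = i;
    }
    return best;
}
```

Comparing durations rather than raw end times matters here, since the counters of different devices are not guaranteed to share a common origin.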
