How to detect whether a thread or process is being starved by operating system scheduling

This is on Linux. The app is written in C++ with the ACE library.


I suspect that one of the threads in the process sometimes gets blocked for an unusually long time (5 to 40 seconds). The app runs fine most of the time, but a couple of times a day it has this issue. There are 5 other similar apps running on the box, which are also I/O bound due to heavy incoming socket data.


I would like to know if there is anything I can do programmatically to see whether the thread/process is getting its time slice.


1 solution

#1

If a process is being starved, having that process monitor itself would not be very productive. But if you just want the process to notice that it hasn't run in a while, it can call times() periodically and compare the difference in elapsed time with the difference in scheduled user time (add the tms_utime and tms_cutime fields if you want to count waiting for children as productive time, and also add the tms_stime and tms_cstime fields if you count kernel time spent on your behalf as productive time). For per-thread times, the only way I know of is to consult the /proc filesystem.

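A minimal sketch of that self-check, assuming the process samples itself at whatever interval suits it. The names CpuSample, take_sample and cpu_fraction are made up for illustration; only times() and the tms_* fields come from the system.

```cpp
#include <sys/times.h>
#include <unistd.h>

// One reading of elapsed ticks and CPU ticks for this process.
// (Hypothetical helper type, not part of any library.)
struct CpuSample {
    clock_t elapsed;  // ticks since an arbitrary point in the past
    clock_t cpu;      // ticks of CPU time charged to this process
};

// Takes a sample with times(). User and system time are both counted;
// pass true to also count time used by waited-for children.
inline CpuSample take_sample(bool include_children = false) {
    struct tms t;
    clock_t now = times(&t);
    clock_t cpu = t.tms_utime + t.tms_stime;
    if (include_children) {
        cpu += t.tms_cutime + t.tms_cstime;
    }
    return CpuSample{now, cpu};
}

// Fraction of the interval between two samples that was spent on a CPU.
// Returns 0.0 if no time has elapsed between the samples.
inline double cpu_fraction(const CpuSample& before, const CpuSample& after) {
    clock_t elapsed = after.elapsed - before.elapsed;
    if (elapsed <= 0) {
        return 0.0;
    }
    return static_cast<double>(after.cpu - before.cpu) /
           static_cast<double>(elapsed);
}
```

Call take_sample() at two points and pass both results to cpu_fraction(). Note that times() covers the whole process, so with several busy threads the fraction can exceed 1.0.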

A high-priority external process or high-priority thread could externally monitor the processes (and threads) of interest by reading the appropriate /proc/<pid>/stat entries for the process (and /proc/<pid>/task/<tid>/stat for the threads). The user times (utime and cutime) are found in the 14th and 16th fields of the stat file. The system times (stime and cstime) are found in the 15th and 17th fields. (The field positions are accurate for my Linux 2.6 kernel.)

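Reading those fields could look roughly like the sketch below. StatTimes, parse_stat_line and read_stat_times are hypothetical names, and the field positions are the ones given above, so check them against your own kernel.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// CPU times, in clock ticks, taken from one stat line.
// (Hypothetical helper type for illustration.)
struct StatTimes {
    unsigned long utime = 0;  // field 14: user time
    unsigned long stime = 0;  // field 15: system time
    long cutime = 0;          // field 16: user time of waited-for children
    long cstime = 0;          // field 17: system time of waited-for children
};

// Parses the text of a stat file. Returns false if the line is malformed.
inline bool parse_stat_line(const std::string& line, StatTimes& out) {
    // Field 2 (the command name) is wrapped in parentheses and may contain
    // spaces, so skip past the last ')' before splitting on whitespace.
    std::string::size_type close = line.rfind(')');
    if (close == std::string::npos) {
        return false;
    }
    std::istringstream rest(line.substr(close + 1));
    std::string skipped;
    // Fields 3 through 13 are not needed here.
    for (int field = 3; field <= 13; ++field) {
        if (!(rest >> skipped)) {
            return false;
        }
    }
    return static_cast<bool>(
        rest >> out.utime >> out.stime >> out.cutime >> out.cstime);
}

// Reads and parses a stat file such as /proc/<pid>/stat or
// /proc/<pid>/task/<tid>/stat. Returns false if it cannot be read.
inline bool read_stat_times(const std::string& path, StatTimes& out) {
    std::ifstream in(path);
    std::string line;
    if (!std::getline(in, line)) {
        return false;
    }
    return parse_stat_line(line, out);
}
```

A monitor would call read_stat_times() with the path of the process or task it is watching each time it wakes up.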

Between two points in time, you determine how much elapsed time has passed (a monitor process or thread would usually wake up at regular intervals). The difference between the cumulative processing times at those two points then represents how much time the thread of interest got to run during that interval. The ratio of processing time to elapsed time represents its share of the CPU, i.e. the time slice it actually received.

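As a sketch of that arithmetic: the /proc times are counted in clock ticks, so they need converting with sysconf(_SC_CLK_TCK) before being compared with elapsed seconds. The function names and the threshold parameter below are assumptions for illustration.

```cpp
#include <unistd.h>

// Number of clock ticks per second used for the /proc time fields.
inline long ticks_per_second() {
    return sysconf(_SC_CLK_TCK);
}

// Share of an interval that a task spent on a CPU, given its cumulative
// tick counts at both ends of the interval and the elapsed wall time.
// Returns 0.0 for inputs that make no sense (no elapsed time, counter
// going backwards, unknown tick rate).
inline double share_of_interval(unsigned long ticks_before,
                                unsigned long ticks_after,
                                double elapsed_seconds,
                                long ticks_per_sec) {
    if (elapsed_seconds <= 0.0 || ticks_per_sec <= 0 ||
        ticks_after < ticks_before) {
        return 0.0;
    }
    double cpu_seconds =
        static_cast<double>(ticks_after - ticks_before) / ticks_per_sec;
    return cpu_seconds / elapsed_seconds;
}

// Hypothetical policy check: flags an interval whose share fell below
// whatever minimum the monitor expects for that task.
inline bool looks_starved(double share, double min_expected_share) {
    return share < min_expected_share;
}
```

A low share on its own does not prove starvation, since a thread that is blocked waiting for I/O also accumulates no processing time. It only tells you the thread did not run.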

One last bit of info: On Linux, I use the following to obtain the tid of the current thread for examining the right task in the /proc/<pid>/task/ directory:


tid = syscall(__NR_gettid);

I do this because I could not find the gettid system call actually exported by any library on my system, even though it is documented. It might be available on yours, though.

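Put together, a thread could build the path to its own stat file along these lines. This is only a sketch; current_tid and own_task_stat_path are hypothetical helper names.

```cpp
#include <string>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Kernel thread id of the calling thread, obtained through the raw
// system call as described above.
inline pid_t current_tid() {
    return static_cast<pid_t>(syscall(__NR_gettid));
}

// Path of the stat file for the calling thread:
// /proc/<pid>/task/<tid>/stat
inline std::string own_task_stat_path() {
    return "/proc/" + std::to_string(getpid()) + "/task/" +
           std::to_string(current_tid()) + "/stat";
}
```

Each thread has to call current_tid() itself, since the value depends on which thread makes the call. A thread could record its tid at startup so that a monitor thread knows which task directory to read.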
