Scaling multithreaded applications on multi-core machines

Time: 2022-09-23 04:07:47

I'm working on a project where we need more performance. Over time we've continued to evolve the design to work more in parallel (both threaded and distributed). The latest step has been to move part of it onto a new machine with 16 cores. I'm finding that we need to rethink how we do things to scale to that many cores in a shared memory model. For example, the standard memory allocator isn't good enough.

What resources would people recommend?

So far I've found Sutter's column in Dr. Dobb's to be a good start. I just got The Art of Multiprocessor Programming and the O'Reilly book on Intel Threading Building Blocks.

8 solutions

#1


5  

A couple of other books that are going to be helpful are:

Also, consider relying less on sharing state between concurrent processes. You'll scale much, much better if you can avoid it because you'll be able to parcel out independent units of work without having to do as much synchronization between them.

Even if you need to share some state, see if you can partition the shared state from the actual processing. That will let you do as much of the processing as possible in parallel, independently of the integration of the completed units of work back into the shared state. Obviously this doesn't work if you have dependencies among units of work, but it's worth investigating instead of just assuming that the state is always going to be shared.
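
To make that concrete, here is a minimal C++ sketch of the idea, not anyone's production code. The function name `parallel_sum_of_squares` and the workload are invented for illustration. Each worker does its processing on private data and touches the shared total only once, in a short critical section, when its unit of work is complete.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: sum the squares of `data` using `num_threads` workers.
// Processing happens on thread-private state; integration into the shared
// state happens once per worker, under a lock.
long long parallel_sum_of_squares(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    long long shared_total = 0;  // the only shared state
    std::mutex total_mutex;      // protects shared_total

    // Split the input into roughly equal, independent chunks.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &shared_total, &total_mutex, begin, end] {
            long long local = 0;  // private to this thread, no synchronization needed
            for (std::size_t i = begin; i < end; ++i)
                local += static_cast<long long>(data[i]) * data[i];
            // Integrate the finished unit of work: one short critical section.
            std::lock_guard<std::mutex> lock(total_mutex);
            shared_total += local;
        });
    }
    for (auto& w : workers) w.join();
    return shared_total;
}
```

Calling `parallel_sum_of_squares(values, 16)` takes the lock 16 times in total, rather than once per element as a naive version that updates the shared total inside the loop would.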

#2


3  

You might want to check out Google's Performance Tools. They've released the version of malloc they use for multi-threaded applications. It also includes a nice set of profiling tools.

#3


2  

Jeffrey Richter writes a lot about threading. He has a few chapters on threading in his books, and his blog is worth checking out:

http://www.wintellect.com/cs/blogs/jeffreyr/default.aspx

#4


2  

As Monty Python would say, "and now for something completely different" - you could try a language/environment that uses processes and messaging (no shared state) instead of threads. One of the most mature is Erlang (see this excellent and fun book: http://www.pragprog.com/titles/jaerlang/programming-erlang). It may not be exactly relevant to your circumstances, but you can still learn a lot of ideas that you may be able to apply in other tools.

For other environments:

.NET has F# (to learn functional programming). The JVM has Scala (which has actors, very much like Erlang, and is a hybrid functional language). There is also the "fork join" framework from Doug Lea for Java, which does a lot of the hard work for you.

#5


1  

The allocator in FreeBSD recently got an update for FreeBSD 7. The new one is called jemalloc and is apparently much more scalable with respect to multiple threads.

You didn't mention which platform you are using, so perhaps this allocator is available to you. (I believe Firefox 3 uses jemalloc, even on Windows, so ports must exist somewhere.)

#6


0  

Take a look at Hoard if you are doing a lot of memory allocation.

Roll your own Lock Free List. A good resource is here - it's in C# but the ideas are portable. Once you get used to how they work you start seeing other places where they can be used and not just in lists.

#7


0  

I will have to check out Hoard, Google Perftools and jemalloc sometime. For now we are using scalable_malloc from Intel Threading Building Blocks and it performs well enough.

For better or worse, we're using C++ on Windows, though much of our code will compile with gcc just fine. Unless there's a compelling reason to move to Red Hat (the main Linux distro we use), I doubt it's worth the headache/political trouble to move.

I would love to use Erlang, but there's way too much here to redo it now. If we think about the requirements around the development of Erlang in a telco setting, they are very similar to our world (electronic trading). Armstrong's book is on my to-read stack :)

我很想使用Erlang,但现在有很多方法可以重做它。如果我们考虑在电信公司环境中围绕Erlang开发的要求,它们与我们的世界(电子交易)非常相似。阿姆斯特朗的书在我看书堆栈:)

In my testing to scale out from 4 cores to 16 cores I've learned to appreciate the cost of any locking/contention in the parallel portion of the code. Luckily we have a large portion that scales with the data, but even that didn't work at first because of an extra lock and the memory allocator.

#8


0  

I maintain a concurrency link blog that may be of ongoing interest:

http://concurrency.tumblr.com
