How to perform atomic operations on Linux that work on x86, ARM, GCC and ICC?

Time: 2022-12-29 03:14:55

Every modern OS today provides some atomic operations:

  • Windows has the Interlocked* API
  • FreeBSD has <machine/atomic.h>
  • Solaris has <atomic.h>
  • Mac OS X has <libkern/OSAtomic.h>

Is there anything like that for Linux?

  • I need it to work on most Linux-supported platforms, including x86, x86_64 and ARM.
  • I need it to work with at least GCC and the Intel compiler.
  • I must not use a third-party library like glib or Qt.
  • I need it to work in C++ (C is not required).

Issues:

  • The GCC atomic builtins __sync_* are not supported on all platforms (ARM) and are not supported by the Intel compiler.
  • AFAIK <asm/atomic.h> should not be used in user space, and I haven't managed to use it successfully at all. Also, I'm not sure whether it would work with the Intel compiler.

Any suggestions?

I know that there are many related questions, but some of them point to __sync_*, which is not feasible for me (ARM), and some point to asm/atomic.h.

Maybe there is an inline assembly library that does this for GCC (ICC supports GCC inline assembly)?

Edit:

There is a very partial solution for add operations only (it allows implementing an atomic counter, but not lock-free structures that require CAS):

If you use libstdc++ (the Intel compiler uses libstdc++), you can use __gnu_cxx::__exchange_and_add, which is defined in <ext/atomicity.h> or <bits/atomicity.h>, depending on the compiler version.
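
A minimal sketch of such a counter, assuming a libstdc++ recent enough to ship <ext/atomicity.h>. The AtomicCounter class and the example_counter function are hypothetical names made up for illustration; only __gnu_cxx::__exchange_and_add and the _Atomic_word typedef come from libstdc++.

```cpp
#include <ext/atomicity.h>  // libstdc++ extension; older releases used <bits/atomicity.h>
#include <cassert>

// Hypothetical wrapper class, not part of libstdc++.
class AtomicCounter {
public:
    explicit AtomicCounter(int initial = 0) : value_(initial) {}

    // Atomically adds delta and returns the value held before the addition.
    int fetch_add(int delta) {
        return __gnu_cxx::__exchange_and_add(&value_, delta);
    }

    // Reads the current value by atomically adding zero.
    int load() {
        return __gnu_cxx::__exchange_and_add(&value_, 0);
    }

private:
    volatile _Atomic_word value_;  // _Atomic_word is libstdc++'s counter type
};

// Usage example: call this from your own main().
inline void example_counter() {
    AtomicCounter counter(10);
    int before = counter.fetch_add(5);
    assert(before == 10);
    assert(counter.load() == 15);
}
```

This only covers addition; there is no compare-and-swap in this interface.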

However, I'd still like to see something that supports CAS.

9 Solutions

#1


19  

Projects are using this:

http://packages.debian.org/source/sid/libatomic-ops

If you want simple operations such as CAS, can't you just use the arch-specific implementations out of the kernel, and do arch checks in user space with autotools/cmake? As far as licensing goes, although the kernel is GPL, I think it's arguable that the inline assembly for these operations is provided by Intel/AMD, not that the kernel has a license on them. They just happen to be in an easily accessible form in the kernel source.

#2


12  

Recent standards (from 2011) of C and C++ now specify atomic operations (<stdatomic.h> in C11, <atomic> in C++11).

Regardless, your platform or compiler may not support these newer headers and features.
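
For compilers that do support C++11, compare-and-swap is available through std::atomic. The sketch below is only an illustration: atomic_fetch_max and example_fetch_max are hypothetical names, and the memory orderings are one reasonable choice, not the only one.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically raises `target` to at least `candidate`
// and returns the value observed before any update.
inline int atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak writes the current value into
    // `observed`, so the loop retries with fresh data until the store
    // succeeds or no update is needed any more.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
    }
    return observed;
}

// Usage example: call this from your own main().
inline void example_fetch_max() {
    std::atomic<int> high_water(3);
    assert(atomic_fetch_max(high_water, 7) == 3);  // 3 -> 7
    assert(atomic_fetch_max(high_water, 5) == 7);  // unchanged
    assert(high_water.load() == 7);
}
```

The same retry loop is the building block for most lock-free structures.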

#3


3  

Darn. I was going to suggest the GCC primitives, then you said they were off limits. :-)

In that case, I would do an #ifdef for each architecture/compiler combination you care about and code up the inline asm. And maybe check for __GNUC__ or some similar macro and use the GCC primitives if they are available, because it feels so much more right to use those. :-)

You are going to have a lot of duplication and it might be difficult to verify correctness, but this seems to be the way a lot of projects do this, and I've had good results with it.

Some gotchas that have bitten me in the past: when using GCC, don't forget "asm volatile" and the clobbers for "memory" and "cc", etc.
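
A rough sketch of that #ifdef layout, assuming a compiler that understands GCC-style inline assembly (GCC, ICC, clang). The names cas_int and example_cas are hypothetical; only the x86 branch uses hand-written asm here, and every other architecture falls back to the GCC primitive, where a real project would add its own asm branch.

```cpp
#include <cassert>

// Hypothetical portable wrapper: returns true if *ptr equalled `expected`
// and was atomically replaced by `desired`.
inline bool cas_int(volatile int* ptr, int expected, int desired) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int previous;
    // lock cmpxchg compares EAX with *ptr. If they are equal it stores
    // `desired` into *ptr, otherwise it loads *ptr into EAX. Either way
    // EAX ends up holding the old value of *ptr.
    __asm__ __volatile__(
        "lock; cmpxchgl %2, %1"
        : "=a"(previous), "+m"(*ptr)
        : "r"(desired), "0"(expected)
        : "memory", "cc");  // the clobbers mentioned above
    return previous == expected;
#elif defined(__GNUC__)
    // Other architectures: use the GCC primitive where it exists.
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#else
#error "no compare-and-swap implementation for this compiler/architecture"
#endif
}

// Usage example: call this from your own main().
inline void example_cas() {
    volatile int value = 1;
    assert(cas_int(&value, 1, 2));   // matches, value becomes 2
    assert(!cas_int(&value, 1, 3));  // stale expectation, value stays 2
    assert(value == 2);
}
```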

#4


1  

Boost, which has a non-intrusive license, and other frameworks already offer portable atomic counters, as long as they are supported on the target platform.

Third-party libraries are good for us. And if for strange reasons your company forbids you from using them, you can still have a look at how they proceed (as long as the license permits it for your use) to implement what you are looking for.

#5


1  

I recently did an implementation of such a thing and I was confronted with the same difficulties as you are. My solution was basically the following:

  • Try to detect the GCC builtins with the feature macro.
  • If they are not available, just implement something like cmpxchg with __asm__ for the other architectures (ARM is a bit more complicated than that). Just do that for one possible size, e.g. sizeof(int).
  • Implement all other functionality on top of those one or two primitives with inline functions.
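
The steps above could be sketched roughly as follows. This assumes __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 is the feature macro meant and that int is 4 bytes; cas_word, fetch_add_word, exchange_word and example_word_ops are hypothetical names, and the __asm__ fallback is left out.

```cpp
#include <cassert>

static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

#if !defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#error "__sync CAS unavailable for 4-byte objects; an __asm__ fallback would go here"
#endif

// The single primitive: compare-and-swap on one size, sizeof(int).
inline bool cas_word(volatile int* ptr, int expected, int desired) {
    return __sync_bool_compare_and_swap(ptr, expected, desired);
}

// Everything else is an inline retry loop around that primitive.

// Atomically adds delta to *ptr and returns the previous value.
inline int fetch_add_word(volatile int* ptr, int delta) {
    int old;
    do {
        old = *ptr;
    } while (!cas_word(ptr, old, old + delta));
    return old;
}

// Atomically stores desired in *ptr and returns the previous value.
inline int exchange_word(volatile int* ptr, int desired) {
    int old;
    do {
        old = *ptr;
    } while (!cas_word(ptr, old, desired));
    return old;
}

// Usage example: call this from your own main().
inline void example_word_ops() {
    volatile int value = 5;
    assert(fetch_add_word(&value, 3) == 5);
    assert(exchange_word(&value, 100) == 8);
    assert(value == 100);
}
```

Keeping a single primitive means only cas_word has to be ported and verified per architecture.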

#6


1  

There is a patch for GCC here to support ARM atomic operations. It will not help you on Intel, but you could examine the code: there is recent kernel support for older ARM architectures, and newer ones have the instructions built in, so you should be able to build something that works.

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00050.html

#7


1  

__sync* certainly is (and has been) supported by the Intel compiler, because GCC adopted these built-ins from there. Read the first paragraph on this page. Also see "Intel® C++ Compiler for Linux* Intrinsics Reference", page 198. It's from 2006 and describes exactly those built-ins.

Regarding ARM support for older ARM CPUs: it cannot be done entirely in user space, but it can be done in kernel space (by disabling interrupts during the operation), and I think I read somewhere that this has been supported for quite a while now.

According to this PHP bug, dated 2011-10-08, __sync_* will only fail on:

  • PA-RISC with anything other than Linux
  • SPARCv7 and lower
  • ARM with GCC < 4.3
  • ARMv5 and lower with anything other than Linux
  • MIPS1

So with GCC > 4.3 (and 4.7 is the current one), you shouldn't have a problem with ARMv6 and newer. You shouldn't have a problem with ARMv5 either, as long as you compile for Linux.

#8


0  

On Debian/Ubuntu I recommend:

sudo apt-get install libatomic-ops-dev

Examples: http://www.hpl.hp.com/research/linux/atomic_ops/example.php4

It is compatible with GCC and ICC.

Compared to Intel Threading Building Blocks (TBB) using atomic<T>, libatomic-ops-dev is over twice as fast (with the Intel compiler)! In a test on Ubuntu on an i7, producer-consumer threads piped 10 million ints through a ring buffer in 0.5 s, as opposed to 1.2 s for TBB.

And it is easy to use, e.g.:

volatile AO_t head;

AO_fetch_and_add1(&head);

#9


0  

See kernel_user_helpers.txt or entry-arm.c and look for __kuser_cmpxchg. As seen in the comments of other ARM Linux versions:

kuser_cmpxchg

Location:       0xffff0fc0

Reference prototype:

  int __kuser_cmpxchg(int32_t oldval, int32_t newval, volatile int32_t *ptr);

Input:

  r0 = oldval
  r1 = newval
  r2 = ptr
  lr = return address

Output:

  r0 = success code (zero or non-zero)
  C flag = set if r0 == 0, clear if r0 != 0

Clobbered registers:

  r3, ip, flags

Definition:

  Atomically store newval in *ptr only if *ptr is equal to oldval.
  Return zero if *ptr was changed or non-zero if no exchange happened.
  The C flag is also set if *ptr was changed to allow for assembly
  optimization in the calling code.

Usage example:

  typedef int (__kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
  #define __kuser_cmpxchg (*(__kuser_cmpxchg_t *)0xffff0fc0)

  int atomic_add(volatile int *ptr, int val)
  {
          int old, new;

          do {
                  old = *ptr;
                  new = old + val;
          } while(__kuser_cmpxchg(old, new, ptr));

          return new;
  }

Notes:

  • This routine already includes memory barriers as needed.
  • Valid only if __kuser_helper_version >= 2 (from kernel version 2.6.12).

This is for use on Linux with ARMv3, using the swp primitive. You must have a very ancient ARM for it not to support this. Only a data abort or an interrupt can cause the spinning to fail, so the kernel watches for this address (~0xffff0fc0) and performs a user-space PC fix-up when either a data abort or an interrupt occurs. All user-space libraries that support ARMv5 and lower will use this facility.

For instance, QtConcurrent uses this.
