How to compile TensorFlow with SSE4.2 and AVX instructions?

This is the message I get when running a script to check whether TensorFlow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
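
If you want to pull the missing instruction sets out of a log like this automatically, here is a minimal sketch. It assumes the exact warning wording shown above (other TensorFlow versions may phrase it differently), and the helper name missing_instruction_sets is made up for illustration.

```python
import re

# Matches the cpu_feature_guard wording shown in the log above (assumed format).
_MISSING = re.compile(r"wasn't compiled to use (\S+) instructions")


def missing_instruction_sets(log_text):
    """Return the instruction sets named in cpu_feature_guard warnings, in order."""
    found = []
    for line in log_text.splitlines():
        if "cpu_feature_guard" not in line:
            continue  # skip unrelated log lines
        match = _MISSING.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


sample = (
    "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library "
    "wasn't compiled to use SSE4.2 instructions, but these are available on your "
    "machine and could speed up CPU computations.\n"
    "W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library "
    "wasn't compiled to use AVX instructions, but these are available on your "
    "machine and could speed up CPU computations.\n"
)
print(missing_instruction_sets(sample))
```

For the two warnings in the log above, this reports SSE4.2 and AVX.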

I noticed that it mentions SSE4.2 and AVX:

1) What are SSE4.2 and AVX?

2) How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?

3) How do I make TensorFlow compile using these two instruction sets?

10 answers

#1

115

I just ran into this same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 would suffice. In the end, I successfully built with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warnings or errors.

Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be eating -march=native, possibly because it contains an =.)

-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high performance for number crunching, you should build 64-bit binaries.)

I'm not sure whether TensorFlow defaults to -O2 or -O3. gcc -O3 enables full optimization including auto-vectorization, but that can sometimes make code slower.

What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not for linking, so you need a different option for cross-file link-time optimization).

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so that the binaries run on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html.) That's not what you want. You want a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine.) Also, if your CPU doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
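
To make the "already enables" point concrete, here is a small sketch that spots redundant flags in a list. The implication table is my own reading of gcc's x86 options (an assumption, and limited to the flags discussed here), and redundant_flags is a hypothetical helper, not part of any build tool.

```python
# Assumed gcc implication chain for the flags discussed in this thread:
# each option also turns on the option it maps to.
IMPLIES = {
    "-mavx2": "-mavx",
    "-mavx": "-msse4.2",
    "-msse4.2": "-msse4.1",
}


def redundant_flags(flags):
    """Return the flags that another flag in the same list already implies."""
    implied = set()
    for flag in flags:
        current = IMPLIES.get(flag)
        while current is not None:  # walk down the chain
            implied.add(current)
            current = IMPLIES.get(current)
    return [flag for flag in flags if flag in implied]


print(redundant_flags(["-mavx", "-mavx2", "-mfma", "-msse4.2"]))
```

With the flags from the first build command, this reports -mavx and -msse4.2 as redundant, since -mavx2 already pulls them in. Passing them anyway is harmless.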

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid the need to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)

#2

Let's start by explaining why you see these warnings in the first place.

Most probably you did not install TF from source and instead used something like pip install tensorflow. That means you installed binaries pre-built by someone else, which were not optimized for your architecture. These warnings tell you exactly that: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the relevant part of the documentation.

TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.

The good news is that you most probably just want to learn or experiment with TF, so everything will work properly and you should not worry about it.

What are SSE4.2 and AVX?

Wikipedia has good explanations of SSE4.2 and AVX. You don't need this knowledge to be good at machine learning. You can think of them as sets of additional instructions that let a computer apply a single instruction to multiple data points, for operations that parallelize naturally (for example, adding two arrays).

Both SSE and AVX are implementations of the abstract idea of SIMD (single instruction, multiple data), which is

a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.

This is enough to answer your next question.

How do SSE4.2 and AVX improve CPU computations for TF tasks?

They allow more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides.
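
To illustrate the idea, here is a toy model of why fewer instructions are needed. It does not use real SIMD instructions (plain Python cannot), and the function names are made up. It only counts how many "instructions" it would take to add two arrays if each one handled 8 values at once, which is how many 32-bit floats fit in a 256-bit AVX register.

```python
def add_scalar(a, b):
    """Scalar model: one addition per loop step."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out


def add_in_lanes(a, b, lanes=8):
    """SIMD model: each loop step stands for one instruction covering `lanes` values."""
    out = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1  # one modeled vector instruction
    return out, steps


a = [float(i) for i in range(32)]
b = [1.0] * 32
result, steps = add_in_lanes(a, b)
print(result == add_scalar(a, b), steps)
```

Both versions produce the same sums, but the lane model needs 4 steps for 32 elements where the scalar loop needs 32. Real speedups depend on memory bandwidth and on how well the code vectorizes.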

How do I make TensorFlow compile using these two instruction sets?

You need a binary that was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package


#3

Let me answer your 3rd question first:

If you want to run a self-compiled version within a conda env, you can. These are the general instructions I run to get TensorFlow installed on my system with the additional instructions enabled. Note: this build was for an AMD A10-7850 (check which instructions your CPU supports; it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda env. Credit goes to the TensorFlow source install page and the answers provided above.

git clone https://github.com/tensorflow/tensorflow
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
# Package names are separated by spaces, not commas.
pip install six numpy wheel packaging appdirs
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build as shown below. Note: check which instructions your CPU
# supports. Also, if resources are limited, consider adding the
# option --local_resources 2048,.5,1.0 . This limits how much RAM and
# other local resources are used, but increases the compile time.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack, for example:
pip install keras jupyter

As to your 2nd question:

A self-compiled version with optimizations is well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now take only about 300 seconds! Although the exact numbers will vary, I think you can expect roughly a 35-50% speedup in general on your particular setup.

Lastly, your 1st question:

A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2, and FMA are different kinds of extended instruction sets on x86 CPUs. Many contain optimized instructions for processing matrix or vector operations.

I will highlight my own misconception to hopefully save you some time: SSE4.2 is not a newer version of the instructions that supersedes SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

In the context of TensorFlow compilation, if your computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put in the optimization flags for all of them. Don't do what I did and go with just SSE4.2, thinking that it's newer and should supersede SSE4.1. That's clearly WRONG! I had to recompile because of that, which cost me a good 40 minutes.
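
To check which flags apply to your machine, here is a rough sketch for Linux. It assumes the flag names that /proc/cpuinfo reports (sse4_1, sse4_2, avx, avx2, fma); the mapping table and both function names are made up for illustration and cover only the instruction sets mentioned in this thread.

```python
# /proc/cpuinfo flag name -> gcc option (illustrative mapping, not exhaustive).
FLAG_TO_COPT = [
    ("sse4_1", "-msse4.1"),
    ("sse4_2", "-msse4.2"),
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
]


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def copt_args(flags):
    """Build the --copt arguments for every supported instruction set."""
    return ["--copt=" + opt for name, opt in FLAG_TO_COPT if name in flags]


# A shortened, made-up cpuinfo excerpt for a CPU without AVX2 or FMA.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx\n"
print(" ".join(copt_args(cpu_flags(sample))))
```

On a real Linux machine you would pass in open("/proc/cpuinfo").read() instead of the sample string. On a Mac the CPU features come from a different source, so this sketch does not apply there as written.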

#4

These are SIMD vector processing instruction sets.

Using vector instructions is faster for many tasks; machine learning is such a task.

Quoting the TensorFlow installation docs:

To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.

#5

Thanks to all these replies plus some trial and error, I managed to install it on a Mac with clang. I'm sharing my solution in case it is useful to someone.

Follow the instructions in the documentation: Installing TensorFlow from Sources

When prompted for

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

then copy-paste this string:

-mavx -mavx2 -mfma -msse4.2

(The default option caused errors, and so did some of the other flags. I got no errors with the above flags. BTW, I replied n to all the other questions.)

After installing, I measured a ~2x to 2.5x speedup when training deep models compared with another installation based on the default wheels (Installing TensorFlow on macOS).

Hope it helps.

#6

I recently installed it from source, and below are all the steps needed to build it with the mentioned instructions enabled.

Other answers already describe why those messages are shown. My answer gives a step-by-step guide on how to install, which may help people who struggle with the actual installation as I did.

Install Bazel

Download it from one of their available releases, for example 0.5.2. Extract it, go into the directory and build it: bash ./compile.sh. Copy the executable to /usr/local/bin: sudo cp ./output/bazel /usr/local/bin

Install TensorFlow

Clone TensorFlow: git clone https://github.com/tensorflow/tensorflow.git

Go to the cloned directory to configure it: ./configure

It will prompt you with several questions. Below are the responses I suggest for each; you can, of course, choose your own as you prefer:

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow

The pip package. To build it you have to specify which instructions you want (you know, the ones TensorFlow told you are missing).

Build pip script: bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package

Build pip package: bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install the TensorFlow pip package you just built: sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl

The next time you start up TensorFlow, it will no longer complain about missing instructions.

#7

I put together a small Bash script for Mac (it can easily be ported to Linux) that retrieves all CPU features and applies some of them to build TF. I'm on TF master and use it fairly often (a couple of times a month).

https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f

#8

This is the simplest method. Only one step.

It has a significant impact on speed. In my case, the time taken for a training step almost halved.

Refer to custom builds of TensorFlow.

#9

When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:

If you are building TensorFlow on the same type of CPU as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.

If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc documentation.

After configuring TensorFlow as described above, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.

#10

To hide those warnings, you could do this before your actual code:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
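
For reference, here is a small sketch of what the values mean, as I understand them: the number is the lowest severity that still gets printed (0 = INFO, 1 = WARNING, 2 = ERROR, 3 = FATAL). The LEVELS table and the helper name set_min_log_level are my own and are not part of TensorFlow.

```python
import os

# Assumed meaning of TF_CPP_MIN_LOG_LEVEL: lowest severity that is still shown.
LEVELS = {"info": "0", "warning": "1", "error": "2", "fatal": "3"}


def set_min_log_level(minimum):
    """Set the environment variable; call this before importing tensorflow."""
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = LEVELS[minimum]


set_min_log_level("error")  # hides INFO and WARNING, same as '2' above
print(os.environ["TF_CPP_MIN_LOG_LEVEL"])
# import tensorflow as tf  # must come after the variable is set
```

Note that this only hides the messages. The binary is still not using the extra instructions.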
