References
- https://tensorflow.google.cn/install/install_linux
- http://nvidia.com/cuda
- http://developer.nvidia.com/cudnn
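Notes
- The machine must have an Nvidia GPU, and not too old a one (an antique card isn't worth the electricity for this anyway). You can check which cards are supported on Nvidia's site: https://developer.nvidia.com/cuda-gpus
- Every command in the installation needs root: either switch with su root or prefix each command with sudo. Building and running the test code works fine as a normal user.
Tips from the pits I fell into (my own fault for not reading carefully) [facepalm]
- Install exactly the versions listed on the TensorFlow site: TensorFlow 1.9 goes with CUDA 9.0, and for CUDA 9.0 you must download the matching cuDNN build.
- If you like to tinker, use a disk with no important data on it.
- It's best to download the installers on another computer and copy them over with scp. I reinstalled Ubuntu several times, and each download is about 2 GB; on a China Unicom 40 GB so-called "unlimited" data plan, that hurts.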
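Download the main installers
- CUDA® Toolkit
#http://nvidia.com/cuda
#I chose the run file for Ubuntu 16.04; I didn't dare try the other options
cuda_9.0.176_384.81_linux.run
- cuDNN, the deep neural network (DNN) development library; downloading requires registering on the site
#http://developer.nvidia.com/cudnn
libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb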
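The version matching described above can be checked from the file names alone before copying anything to the target machine. The following is a minimal sketch, assuming the naming pattern of the three files listed here holds for other releases too; the function names are made up for illustration and are not part of any Nvidia tool.

```python
import re


def parse_cuda_run(filename):
    """Split a CUDA run-file name into (toolkit version, bundled driver version)."""
    m = re.fullmatch(r"cuda_(\d+(?:\.\d+)+)_(\d+(?:\.\d+)+)_linux\.run", filename)
    if m is None:
        raise ValueError("unrecognised CUDA run file name: " + filename)
    return m.group(1), m.group(2)


def cudnn_deb_cuda_version(filename):
    """Return the CUDA version a cuDNN .deb targets (the part after '+cuda')."""
    m = re.search(r"\+cuda(\d+\.\d+)_", filename)
    if m is None:
        raise ValueError("no +cuda tag in: " + filename)
    return m.group(1)


def downloads_match(run_file, deb_files):
    """True when every cuDNN package targets the toolkit's major.minor version."""
    toolkit, _driver = parse_cuda_run(run_file)
    major_minor = ".".join(toolkit.split(".")[:2])
    return all(cudnn_deb_cuda_version(d) == major_minor for d in deb_files)


if __name__ == "__main__":
    debs = [
        "libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb",
        "libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb",
        "libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb",
    ]
    print(parse_cuda_run("cuda_9.0.176_384.81_linux.run"))
    print(downloads_match("cuda_9.0.176_384.81_linux.run", debs))
```

The second value returned by parse_cuda_run is the driver version bundled with the installer, which is the number to compare against when deciding whether an existing driver has to go.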
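Prepare the environment
Check the driver version bundled with the CUDA installer, here 384.81. If the driver already installed is older than that, uninstall it first; if it is 384.81 or newer, skip this step.
#Uninstalling with the run file is recommended, i.e. the Nvidia driver run file you downloaded earlier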
chmod +x *.run
./NVIDIA-Linux-x86_64-384.59.run --uninstall
#Not recommended, though I can't say why since I haven't tried it: apt-get remove --purge nvidia*
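The "older than 384.81" decision is a plain version comparison. Here is a small sketch of it, assuming nvidia-smi is on the PATH when a driver is installed; the helper names are hypothetical, and 384.81 is simply the version bundled with the run file used in this post.

```python
import subprocess


def version_tuple(text):
    """Turn '384.81' into (384, 81) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def must_uninstall_first(installed, bundled="384.81"):
    """True when the installed driver is older than the one bundled with CUDA."""
    return version_tuple(installed) < version_tuple(bundled)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it cannot be queried."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
            universal_newlines=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.split()
    return lines[0] if lines else None


if __name__ == "__main__":
    current = installed_driver_version()
    if current is None:
        print("no working Nvidia driver found, nothing to uninstall")
    else:
        print(current, "uninstall first:", must_uninstall_first(current))
```

Comparing tuples of integers rather than raw strings avoids surprises such as "384.9" sorting after "384.81".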
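Disable the stock nouveau driver. If you have already installed the Nvidia driver before, you can skip this step too.
vi /etc/modprobe.d/blacklist.conf
#add two lines
blacklist nouveau
options nouveau modeset=0
#apply the configuration
update-initramfs -u
#reboot; the screen resolution drops afterwards, since there is no graphics driver any more
reboot
#check that it took effect
lsmod | grep nouveau
#no output means nouveau was disabled successfully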
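The same check can be scripted. lsmod is a formatted view of /proc/modules, so the sketch below reads that file directly; the function names are illustrative only. It matches on the module-name column, which is slightly stricter than grep: grep also matches lines where nouveau only appears in another module's "used by" list.

```python
def module_loaded(modules_text, name="nouveau"):
    """True if `name` is a loaded module, i.e. the first column of some line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def nouveau_still_loaded(path="/proc/modules"):
    """Read the kernel's module list and report whether nouveau is in it."""
    with open(path) as f:
        return module_loaded(f.read())
```

Calling nouveau_still_loaded() after the reboot should return False if the blacklist took effect.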
Install the build tools, otherwise the graphics driver bundled with the installer cannot be installed
apt install gcc g++ make
For CUDA 9.0, GCC must be downgraded to gcc 5; I only found this out while installing CUDA
apt install gcc-5 g++-5
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
Install the CUDA® Toolkit
Be sure to install the CUDA version that matches your TensorFlow version: 1.9 goes with 9.0. I got burned by not reading carefully.
chmod +x cuda_9.0.176_384.81_linux.run
sh ./cuda_9.0.176_384.81_linux.run
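#The license text is shown first; read it if you want to, and press q once you've had enough of the pages or can't follow the legal terms
- If the installer reports a failure, follow the hint and check the log to troubleshoot
- Log of a successful installation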
Do you accept the previously read EULA?
accept/decline/quit: accept
You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: y
#384.81 here is the graphics driver version; if the driver already installed is newer, there is no need to install this one
#I answered no mainly because I had installed CUDA 9.2 during my earlier missteps, heh
#normally the answer should be yes
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
 [ default is /root ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so
Installing the CUDA Samples in /root ...
Copying samples to /root/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-9.0
Samples:  Installed in /root, but missing recommended libraries
Please make sure that
 -   PATH includes /usr/local/cuda-9.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_7657.log
/root/NVIDIA_CUDA-9.0_Samples
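Set environment variables
vi /etc/ld.so.conf.d/cuda.conf
#write two lines
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
#refresh the linker cache, as the installer message above asks
ldconfig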
vi /etc/profile
#add two lines
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
Reboot
reboot
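After the reboot it is worth confirming that the settings actually took effect before building the samples. A rough sketch, assuming the /usr/local/cuda symlink and the cuda.conf file created above; the helper names are hypothetical.

```python
import os
import shutil


def path_contains(path_value, directory):
    """True if `directory` is one of the entries of a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.normpath(p) == wanted for p in entries)


def conf_lists(conf_text, directory):
    """True if ld.so.conf-style text lists `directory` on a non-comment line."""
    entries = [line.split("#", 1)[0].strip() for line in conf_text.splitlines()]
    return directory in entries


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    print("cuda bin on PATH:", path_contains(path_value, "/usr/local/cuda/bin"))
    # shutil.which returns None when nvcc cannot be found on PATH
    print("nvcc found at:", shutil.which("nvcc"))
```

If nvcc is not found in a fresh login shell, /etc/profile was probably not re-read, or the export lines contain a typo.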
Test the installation
- No errors means the installation succeeded
cd /root/NVIDIA_CUDA-9.0_Samples/samples/1_Utilities/deviceQuery
make
./deviceQuery
#Result = PASS means success
cd ../bandwidthTest
make
./bandwidthTest
#Result = PASS means success
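Looking for "Result = PASS" by eye is easy to automate when both samples are run repeatedly, for example after every reinstall. A minimal sketch, assuming the sample binaries were already built with make as above; the function names are made up for illustration.

```python
import subprocess


def sample_passed(output):
    """True if a CUDA sample's output contains a 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_sample(binary_path):
    """Run one sample binary; True only if it exits cleanly and reports PASS."""
    result = subprocess.run(
        [binary_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0 and sample_passed(result.stdout)
```

For example, run_sample("./deviceQuery") from inside the deviceQuery directory returns True on a working installation.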
Install cuDNN
NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks.
#cuDNN v7.1.4 Runtime Library for Ubuntu16.04 (Deb)
dpkg -i libcudnn7_7.1.4.18-1+cuda9.0_amd64.deb
#cuDNN v7.1.4 Developer Library for Ubuntu16.04 (Deb)
dpkg -i libcudnn7-dev_7.1.4.18-1+cuda9.0_amd64.deb
#cuDNN v7.1.4 Code Samples and User Guide for Ubuntu16.04 (Deb)
dpkg -i libcudnn7-doc_7.1.4.18-1+cuda9.0_amd64.deb
#hold the version so an automatic update doesn't break the environment
apt-mark hold libcudnn7 libcudnn7-dev
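To confirm which cuDNN version ended up installed, the version macros in the header shipped by the developer package can be read back. This is only a sketch: it assumes the header is reachable as /usr/include/cudnn.h, which may differ on your system, and the function name is hypothetical.

```python
import re


def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the CUDNN_* #define lines of cudnn.h."""
    found = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+CUDNN_" + key + r"\s+(\d+)"
        m = re.search(pattern, header_text, re.MULTILINE)
        if m is None:
            raise ValueError("CUDNN_" + key + " not found")
        found[key] = int(m.group(1))
    return found["MAJOR"], found["MINOR"], found["PATCHLEVEL"]


def installed_cudnn_version(path="/usr/include/cudnn.h"):
    """Read the header at `path` (an assumed location) and parse its version."""
    with open(path) as f:
        return cudnn_version_from_header(f.read())
```

With the packages from this post, the parsed version should start with 7.1.4.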
Test
#Copy the cuDNN sample to a writable path.
cp -r /usr/src/cudnn_samples_v7/ $HOME
#Go to the writable path.
cd $HOME/cudnn_samples_v7/mnistCUDNN
#Compile the mnistCUDNN sample.
make clean && make
#Run the mnistCUDNN sample.
./mnistCUDNN
#If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:
#Test passed!
Install tensorflow-gpu, using python3 as the example
sudo apt-get install python3-pip python3-dev
#pin 1.9 so that it matches CUDA 9.0
pip3 install tensorflow-gpu==1.9.0
Test the installation
#test code; save it as, say, test.py
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
#run it
python3 test.py
#the first run is a bit slow
#no errors, GPU info in the output, and b'Hello, TensorFlow!' means success
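The hello-world test only shows the GPU in its log output. To check explicitly that TensorFlow sees the card, the local device list can be inspected. A small sketch for TensorFlow 1.x, using device_lib.list_local_devices(); the two helper functions are my own illustration, and TensorFlow is imported inside the function so the filter can be used without it.

```python
def gpu_names(devices):
    """Keep the names of entries whose type is 'GPU' from (name, type) pairs."""
    return [name for name, device_type in devices if device_type == "GPU"]


def list_tensorflow_gpus():
    """List the GPUs visible to TensorFlow 1.x on this machine."""
    # Imported lazily: this needs tensorflow-gpu installed as described above.
    from tensorflow.python.client import device_lib

    local = device_lib.list_local_devices()
    return gpu_names((d.name, d.device_type) for d in local)
```

Running print(list_tensorflow_gpus()) with python3 prints an empty list if TensorFlow fell back to the CPU, which usually points at a CUDA or cuDNN version mismatch.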