富乎 · 地问

辗转探寻为富乎？《天问》无解向地问！

<<< 返回主页

Ubuntu 16.04 安装 C++ 版 TensorFlow 1.4.1

下载安装包

浏览器进入其GitHub发布页：https://github.com/tensorflow/tensorflow/releases
翻页，找到1.4.1版本，下载其压缩包，并放到合适的目录。本文下载的是tensorflow-1.4.1.zip，放到~/src目录。

编译及安装

安装依赖项

安装JDK 8（若已装则略过）

sudo apt-get install openjdk-8-jdk

安装bazel（至关重要！！！）

以下deb包安装方式不推荐，请勿使用，其内容仅保留用作反面教程！！ 简易的安装方式应该是下载并直接执行bazel-0.5.4-installer-linux-x86_64.sh。

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

sudo apt-get update

sudo apt-get install bazel

需要说明的是，如果编译器的功能、接口、语法等已趋于稳定，用以上方法是最方便的。然而现实并没那么美好，经本人亲测，这样安装得到的编译器用来编译低版本的TensorFlow，会报一大堆错。所以，必须安装对应版本的编译器才行（TensorFlow与bazel的版本对应关系见第6个参考链接）。

直接用apt来装低版本的0.5.4极有可能失败，因为有一大坨依赖关系没法满足，所以需要考虑直接下载其Shell脚本、deb包或者源码。尽管直接从官网下载可能会比较慢，这里还是贴出 bazel官网下载地址，推荐在早上下载。如果出现龟速，可直接用度娘来搜，一般度娘网盘、SDN、浪网盘之流还是能找到资源的，你缺的可能就是一个账号或几个积分……废话不多说，本人下的是bazel_0.5.4-linux-x86_64.deb包，放到并进入~/src目录，然后执行以下命令即可安装：

sudo dpkg -i bazel_0.5.4-linux-x86_64.deb

然而，不保证这种方法对所有人所有环境都行得通。如果行不通，请自行尝试Shell脚本和源码的安装方式，后面的参考链接里有详细的说明，这里不再展开。

如果想固定bazel的版本，防止apt自动更新，可以删除它的源：

sudo rm /etc/apt/sources.list.d/bazel.list

sudo apt-get update

安装Python依赖项

sudo apt-get install python-numpy python-dev python-pip python-wheel

安装CUDA和cuDNN（GPU版本必需，纯CPU版本也必需！）

可参考：Ubuntu 16.04 安装 CUDA 8.0 和 cuDNN 7

配置TensorFlow的编译条件

cd ~/src

unzip tensorflow-1.4.1.zip

cd tensorflow-1.4.1

# 执行配置脚本并按提示进行配置，以下示例仅供参考：

$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.2 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: 
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: 
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: 
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: 
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]: 
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 


Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/x86_64-linux-gnu              


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]


Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished

编译TensorFlow

执行：

# 带GPU支持的版本
bazel build -c opt --config=cuda --verbose_failures //tensorflow:libtensorflow_cc.so

# 或：

# 纯CPU版本
bazel build -c opt --verbose_failures //tensorflow:libtensorflow_cc.so

注意最好要指定--verbose_failures，这样当出错时（极大几率会出错）会打印详细信息。

话没说完就报了个bazel版本“过低”的错误，如下：

ERROR: /home/foo/src/tensorflow-1.4.1/WORKSPACE:15:1: Traceback (most recent call last):
	File "/home/foo/src/tensorflow-1.4.1/WORKSPACE", line 15
		closure_repositories()
	File "/home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/io_bazel_rules_closure/closure/repositories.bzl", line 69, in closure_repositories
		_check_bazel_version("Closure Rules", "0.4.5")
	File "/home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/io_bazel_rules_closure/closure/repositories.bzl", line 172, in _check_bazel_version
		fail(("%s requires Bazel >=%s but was...)))
Closure Rules requires Bazel >=0.4.5 but was 0.15.2
ERROR: Error evaluating WORKSPACE file
Closure Rules requires Bazel >=0.4.5 but was 0.15.2
ERROR: Error evaluating WORKSPACE file
ERROR: /home/foo/src/tensorflow-1.4.1/WORKSPACE:41:1: Traceback (most recent call last):
	File "/home/foo/src/tensorflow-1.4.1/WORKSPACE", line 41
		tf_workspace()
	File "/home/foo/src/tensorflow-1.4.1/tensorflow/workspace.bzl", line 146, in tf_workspace
		check_version("0.5.4")
	File "/home/foo/src/tensorflow-1.4.1/tensorflow/workspace.bzl", line 56, in check_version
		fail("\nCurrent Bazel version is {}, ...))

Current Bazel version is 0.15.2, expected at least 0.5.4

事实上并非bazel版本过低，而是编译脚本的逻辑缺陷，即简单地进行纯数学的小数比较，导致得出0.5.4 > 0.15.2的结论。解决方法是打开上述的workspace.bzl文件，定位到报错的那一行（上面为第146行），将check_version("0.5.4")改为check_version("0.0.0")，再用同样的方法修改repositories.bzl文件，最后重新调用bazel进行编译即可。

如果出现如下标签（label）错误：

ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/local_config_sycl/sycl/BUILD:4:1: First argument of 'load' must be a label and start with either '//', ':', or '@'.
ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/local_config_sycl/sycl/BUILD:6:1: First argument of 'load' must be a label and start with either '//', ':', or '@'.
ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/local_config_sycl/sycl/BUILD:4:1: file 'platform' was not correctly loaded. Make sure the 'load' statement appears in the global scope in your file
ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/local_config_sycl/sycl/BUILD:6:1: file 'platform' was not correctly loaded. Make sure the 'load' statement appears in the global scope in your file

可在~/.cache/bazel目录下搜索名称带platform的bzl文件，如下：

$ find /home/foo/.cache/bazel -name "*platform*"
# 省略一部分输出
/home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/local_config_sycl/sycl/platform.bzl

找到后就打开以上BUILD文件，将出错的load("platform", ...)语句改为load("@local_config_sycl//sycl:platform.bzl", ...)。bazel语法详见第8个参考链接。

若出现某个依赖项有多个匹配结果时，如下：

ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/jpeg/BUILD:122:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.

可打开BUILD文件，定位到出错行附近，如下：

   deps = select({
       ":k8": [":simd_x86_64"],
       ":armeabi-v7a": [":simd_armv7a"],
       ":arm64-v8a": [":simd_armv8a"],
       "//conditions:default": [":simd_none"],
   }), 
)

将armeabi-v7a所在行删除。

如无意外，此时应进入真正的编译操作。视bazel的版本而定，如果不是0.5.4（若是用apt来安装，默认安装最新，即以上的0.15.2，指定版本号来安装则可能出现依赖条件不满足而安装不了），还有可能出幺蛾子，例如：

ERROR: /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/external/jpeg/BUILD:40:1: C++ compilation of rule '@jpeg//:jpeg' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/foo/.cache/bazel/_bazel_foo/8afcca331dcb8a9f02f9a9565832584e/execroot/org_tensorflow && \
  exec env - \
    LD_LIBRARY_PATH=/home/foo/lib:/usr/local/lib:/usr/local/cuda/lib64 \
    PATH=/home/foo/git/lazy_script/details/private:/home/foo/git/lazy_script/details/shortcuts:/home/foo/git/lazy_script/details:/home/foo/bin:/home/foo/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/foo/linux/bin:/usr/local/cuda/bin \
    PWD=/proc/self/cwd \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -MD -MF bazel-out/host/bin/external/jpeg/_objs/jpeg/external/jpeg/jcapimin.pic.d -fPIC -iquote external/jpeg -iquote bazel-out/host/genfiles/external/jpeg -iquote external/bazel_tools -iquote bazel-out/host/genfiles/external/bazel_tools -g0 -O3 -w -D__ARM_NEON__ '-march=armv7-a' '-mfloat-abi=softfp' -fprefetch-loop-arrays -no-canonical-prefixes -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fno-canonical-system-headers -c external/jpeg/jcapimin.c -o bazel-out/host/bin/external/jpeg/_objs/jpeg/external/jpeg/jcapimin.pic.o)
gcc: error: unrecognized command line option '-mfloat-abi=softfp'

看crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc、-march=armv7-a、-mfloat-abi=softfp几个地方，觉得是交叉编译出了问题，但之前执行configure脚本时，明明使用了默认的-march=native选项，这就蛋疼了……于是，生平第一次怀疑编译器，而且还真怀疑对了。首先bazel在解析其.bzl文件时不应该报那么多错误，而且看起来是高版本与低版本严重不兼容；其次编译时的参数貌似没有完全按照之前configure的要求来做。种种现象只能归结为要么是bazel编译器的问题，要么是.bzl编译脚本的写法有问题，尽管编译器版本不是公开可用的1.0以上的版本，但作为一个大厂的出品搞出这么多蛋疼的问题，还是着实令人费解。后来和别人交流一下，发现人家同样是TensorFlow 1.4.1，编译的时候顺风顺水，再看他系统装的bazel版本，赫然是0.5.4！果断卸载原来的bazel，重新装上0.5.4的版本（见前面），最后编译的结果是一路绿灯（当然，一些编译警告还是有的，没有警告的代码是不存在的，这辈子都不可能看到，除非是Hello World）！

手动拷贝头文件和库文件

拷贝库文件：

sudo cp bazel-bin/tensorflow/libtensorflow_cc.so /usr/local/lib/
sudo cp bazel-bin/tensorflow/libtensorflow_framework.so /usr/local/lib/

下载缺失的依赖包并安装：

# 直接运行脚本即可下载依赖包

./tensorflow/contrib/makefile/download_dependencies.sh

# 但只需安装eigen

cd tensorflow/contrib/makefile/downloads/eigen
mkdir .build
cd .build
cmake ..
make
sudo make install
cd ~/src/tensorflow-1.4.1/

# 以及protobuf（可选）。如果你的系统已经装有，并且刚好是3.4.0，
# 恭喜你，这一步可以省略，否则仍然需要折腾一番。
# 还要注意的是，为了避免与系统已有的protobuf产生冲突，tensorflow
# 依赖的protobuf最好不要装在系统目录，以下操作就选择装在用户家目录。
# 当然，也可以选择其它目录，只需记住在写Makefile或CMakeLists.txt时
# 将其头文件和库文件目录指定好就行了。

cd tensorflow/contrib/makefile/downloads/protobuf
./autogen.sh
./configure --prefix=$HOME
make
make install
make clean
cd -

还需要说明的是，这些依赖包的缺失不影响tensorflow库的生成，但会影响其使用，因为在包含tensorflow库头文件时，不可避免要包含这些依赖包的头文件。

拷贝头文件：

sudo mkdir /usr/include/tensorflow
sudo cp -L -r bazel-genfiles /usr/include/tensorflow/
sudo rm -rf /usr/include/tensorflow/bazel-genfiles/external/local_config_cuda
sudo cp -r tensorflow /usr/include/tensorflow/
sudo cp -r third_party /usr/include/tensorflow/
find /usr/include/tensorflow/ -name "*.o" -o -name "*.cc" | xargs sudo rm
sudo rm -rf /usr/include/tensorflow/tensorflow/contrib/makefile/downloads/eigen
sudo rm -rf /usr/include/tensorflow/tensorflow/contrib/makefile/downloads/protobuf
sudo rm -rf /usr/include/tensorflow/tensorflow/contrib/makefile/downloads/gemmlowp
sudo rm -rf /usr/include/tensorflow/tensorflow/contrib/makefile/downloads/googletest

# 其实还可以清除更多与C++头文件无关的东西，但太费事，清除以上内容就可以释放很多空间了。
# 强迫症患者可以继续人肉找出其它多余的东西，这里不再详述。

嗯……以上操作是不是很操蛋、很反人类？

验证安装情况

需要自己写小程序来测，可以参考后面参考链接中一个叫zwx1995zwx的博客。本文前面“下载缺失依赖包”和“eigen安装”的内容就是参考这个网友的博客的，在此表示感谢！

另外，需要注意应用程序的Makefile至少需要添加以下几个头文件目录：

/usr/include/tensorflow
/usr/include/tensorflow/bazel-genfiles
/usr/include/tensorflow/tensorflow/contrib/makefile/downloads/nsync/public
/usr/local/include/eigen3

参考（部分链接可能需要翻墙）

https://docs.bazel.build/versions/master/install.html

https://docs.bazel.build/versions/master/install-ubuntu.html

https://docs.bazel.build/versions/master/install-compile-source.html

https://www.tensorflow.org/install/

https://www.tensorflow.org/install/install_linux#determine_which_tensorflow_to_install

https://www.tensorflow.org/install/install_sources

https://github.com/bazelbuild/bazel/issues/4834

https://blog.csdn.net/u013510838/article/details/80102438

https://blog.csdn.net/zwx1995zwx/article/details/79064064

https://blog.csdn.net/wzz18191171661/article/details/70153526

备注

本文针对的是C++版本的TensorFlow，另有Python版本（使用最广泛）和C版本（偏重于简洁性和一致性，安装最简单，但调用时不太方便），详见官方说明文档。
操作之前必须仔细阅读官方说明文档，所用的依赖库和编译工具的版本尽量与之一致，这样能踩少很多坑。如果还有坑，就需要分析报错信息，以及自行搜索官方文档未及之处的资料。