[Tensorflow] Practice - The Tensorflow Way

该系列主要是《Tensorflow 实战Google深度学习框架》阅读笔记；有了Cookbook的热身后，以这本书作为基础形成个人知识体系。

Ref: [Tensorflow] Cookbook - The Tensorflow Way

第一章，简介（略）

第二章，安装（仅记录个别要点）

Protocol buffer

Bazel, similar with Makefile for complile.

Install steps:

　　(1) Docker

　　(2) Tensorflow

Source code --> pip install package --> pip install.

第三章，入门

[Tensorflow] Practice - The Tensorflow Way

计算图

[Tensorflow] Practice - The Tensorflow Way

1. 定义计算

2. 执行计算

In [1]: import tensorflow as tf

In [2]: a = tf.constant([1.0, 2.0], name = "a")

In [3]: b = tf.constant([2.0, 3.0], name = "b")

In [4]: result = a+b

# 必须sess才能执行，这里只是定义

In [5]: result

Out[5]: <tf.Tensor 'add:0' shape=(2,) dtype=float32>

系统默认了一个计算图：

In [6]: print(a.graph is tf.get_default_graph())

True

In [7]: print(b.graph is tf.get_default_graph())

True

两个图，两个name = 'v'的variable；但这里不冲突。

import tensorflow as tf

g1 = tf.Graph()　　#自定了一个图

with g1.as_default():　　#设置为当前要操作的

    v = tf.get_variable("v", [1])

g2 = tf.Graph()

with g2.as_default():

    v = tf.get_variable("v", [1])

# 定义结构图
# 执行结构图

with tf.Session(graph = g1) as sess:　　# 执行图g1

    tf.global_variables_initializer().run()
    with tf.variable_scope("", reuse=True):

        print(sess.run(tf.get_variable("v")))

with tf.Session(graph = g2) as sess:　　# 执行图g2

    tf.global_variables_initializer().run()

    with tf.variable_scope("", reuse=True):

        print(sess.run(tf.get_variable("v")))

通过图，指定运行图的设备
g = tf.Graph()
with g.device('/gpu:0'):
　　result = a + b

集合

　　-- 将资源加入集合

[Tensorflow] Practice - The Tensorflow Way

张量

　　-- 仅保存了如何得到这些数字的计算过程

import tensorflow as tf

a = tf.constant([1.0, 2.0], name="a")

b = tf.constant([2.0, 3.0], name="b")

result = a + b

print(result)

sess = tf.InteractiveSession()

print(result.eval())

sess.close()

得到的是：对结果的一个引用。【一个张量的结构】

【add:0 表示result这个张量是计算节点“add"输出的第一个结果】

【2, 表示是一维数组，长度为2】

Tensor("add:0", shape=(2,), dtype=float32)

[ 3.  5.]

基本概念：

零阶张量：scalar

一阶张量：vector

二阶张量：matrix

三阶张量：super matrix :-p

会话

将所有计算放在“with"的内部：

with tf.Session() as sess:

    print(sess.run(result))

NB: Graph有默认的，自动生成；但session没有！The sess you create will be added autometically into this default Graph.

设置默认会话：【sess过程中有一次with就可以了】

sess = tf.Session()

with sess.as_default():

     print(result.eval())

Output:
[ 3.  7.]

指定为默认会话的意义是什么？获取张量的取值更加方便。

sess = tf.Session()

with sess.as_default():　　# 注册的过程

     print(result.eval())

通过InteractiveSession自动将会话注册为默认会话。

sess = tf.InteractiveSession ()　　# create session即同时注册

print(result.eval())

sess.close()　　# 但岂不是多了一行代码？方便在了哪里，不解

会话配置的修改

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)

sess1  = tf.InteractiveSession(config=config)

sess2  = tf.Session(config=config)

[Tensorflow] Practice - The Tensorflow Way

矩阵计算

 a = tf.matmul(x, w1)　　# 已经默认考虑了转置问题，故比较方便

变量

【cookbook有详细实例】

w1= tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))

NB：seed的意义在于：保证每次运行得到的结果是一样的。

获得shape：

w1.get_shape()

Out[51]: TensorShape([Dimension(2), Dimension(3)])

w1.get_shape()[0]

Out[52]: Dimension(2)

w1.get_shape()[1]

Out[53]: Dimension(3)

通过“拷贝”初始化

w2 = tf.Variable(w1.initialized_value())　　# 直接拷贝别人家的初始值

w3 = tf.Variable(w1.initialized_value() * 2.0)

通过"随机数"初始化

[Tensorflow] Practice - The Tensorflow Way

通过"常数"初始化

[Tensorflow] Practice - The Tensorflow Way

变量初始化的执行

通过 tf.global_variables_initializer() 真正执行对变量初始化的设定。

w1= tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))

w2= tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

x = tf.placeholder(tf.float32, shape=(1, 2), name="input")　　// 没有初始值，但最好给出自身“容器”的大小，将来给feed瞧

a = tf.matmul(x, w1)

y = tf.matmul(a, w2)

sess = tf.Session()

init_op = tf.global_variables_initializer()

sess.run(init_op)

print(sess.run(y, feed_dict={x: [[0.7,0.9]]}))

例如：w₁在Graph中的解析

[Tensorflow] Practice - The Tensorflow Way

Assign

变量维度的改变，但基本不用，也不会给自己找麻烦。

tf.assign( w1, w2, validate_shape=False )

第四章，深层神经网络

激活函数让神经网络不再线性化。

[Tensorflow] Practice - The Tensorflow Way

实现代码，可见极其简洁：

a = tf.nn.relu(tf.matmul(x, w1) + biases1)

y = tf.nn.relu(tf.matmul(a, w2) + biases2)

Cross-entropy

[Tensorflow] Practice - The Tensorflow Way

避免log值过小的方式：clip_by_value

cross_entropy = -tf.reduce_mean(t * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))

Before cross-entropy, we always use softmax: X * W --> softmax --> cross-entropy

softmax_cross_entropy_with_logits(

    _sentinel=None,

    labels   =None,

    logits   =None,

    dim      =-1,

    name     =None

)

sparse_softmax_cross_entropy_with_logits(

    _sentinel=None,

    labels   =None,

    logits   =None,

    name     =None

)

如果只是关心前向传播的预测值，那么其实只关心logits部分，然后需要取出最大概率的那个label。

MSE - L2 loss

NB: Classification by xentropy; For regression, we use MSE as following:

mse = tf.reduce_mean(tf.square(y_ - y))

Loss最终的归宿：

train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

tensorflow api for LOSS:

absolute_difference(...): Adds an Absolute Difference loss to the training procedure.

add_loss(...): Adds a externally defined loss to the collection of losses.

compute_weighted_loss(...): Computes the weighted loss.

cosine_distance(...): Adds a cosine-distance loss to the training procedure.

get_losses(...): Gets the list of losses from the loss_collection.

get_regularization_loss(...): Gets the total regularization loss.

get_regularization_losses(...): Gets the list of regularization losses.

get_total_loss(...): Returns a tensor whose value represents the total loss.

hinge_loss(...): Adds a hinge loss to the training procedure.

huber_loss(...): Adds a Huber Loss term to the training procedure.

log_loss(...): Adds a Log Loss term to the training procedure.

mean_pairwise_squared_error(...): Adds a pairwise-errors-squared loss to the training procedure.

mean_squared_error(...): Adds a Sum-of-Squares loss to the training procedure.

sigmoid_cross_entropy(...): Creates a cross-entropy loss using tf.nn.sigmoid_cross_entropy_with_logits.

softmax_cross_entropy(...): Creates a cross-entropy loss using tf.nn.softmax_cross_entropy_with_logits.

sparse_softmax_cross_entropy(...): Cross-entropy loss using tf.nn.sparse_softmax_cross_entropy_with_logits.

高级点的问题

CNN--两个Loss层计算的数值问题 (overflow...)

　　From: https://zhuanlan.zhihu.com/p/22260935

在计算Loss部分是可能出现的一些小问题以及现在的解决方法。

其实也是仔细阅读下Caffe代码中有关Softmax loss和sigmoid cross entropy loss两个部分的真实计算方法。

exp这个函数实在是有毒

指数函数是一个很容易让数值爆炸的函数，那么输入大概到多少会溢出呢？蛋疼的我还是做了一个实验：

np.exp()

8.2184074615549724e+307

出现如下问题：

def naive_softmax(x):

    y = np.exp(x)

    return y / np.sum(y)

#b取值很大，部分值大于了709

b = np.random.rand(10) * 1000

print b

print naive_softmax(b)

[ 497.46732916  227.75385779  537.82669096  787.54950048  663.13861524

  224.69389572  958.39441314  139.09633232  381.35034548  604.08586655]

[  0.   0.   0.  nan   0.   0.  nan   0.   0.   0.]

那么如何解决呢？我们只要给每个数字除以一个大数，保证它不溢出，问题不就解决了？

老司机给出的方案是找出输入数据中最大的数，然后除以e的最大数次幂，相当于下面的代码：

def high_level_softmax(x):

    max_val = np.max(x)

    x -= max_val

    return naive_softmax(x)

However，scale太大，个别值太小了！

b = np.random.rand(10) * 1000

print b

print high_level_softmax(b)

[ 903.27437996  260.68316085   22.31677464  544.80611744  506.26848644

  698.38019158  833.72024087  200.55675076  924.07740602  909.39841128]

[  9.23337324e-010   7.79004225e-289   0.00000000e+000

   1.92562645e-165   3.53094986e-182   9.57072864e-099

   5.73299537e-040   6.01134555e-315   9.99999577e-001

   4.21690097e-007]

使用一点平滑的小技巧还是很有必要的，于是代码又变成：

def practical_softmax(x):

    max_val = np.max(x)

    x -= max_val

    y = np.exp(x)

    y[y < 1e-20] = 1e-20

    return y / np.sum(y)

Result: 相当于加了个下限

[  9.23337325e-10   9.99999577e-21   9.99999577e-21   9.99999577e-21

   9.99999577e-21   9.99999577e-21   9.99999577e-21   9.99999577e-21

   9.99999577e-01   4.21690096e-07]

【但，貌似一个简单的封装好的 preds = tf.nn.softmax(z)，即可解决这个问题】

sigmoid也是中毒专业户

因为其中包含了exp，*_*b

def naive_sigmoid_loss(x, t):

    y = 1 / (1 + np.exp(-x))

    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y)) / y.shape[0]

a = np.random.rand(10)* 1000

b = a > 500

print a

print b

print naive_sigmoid_loss(a,b)

[  63.20798359  958.94378279  250.75385942  895.49371345  965.62635077

   81.1217712   423.36466749  532.20604694  333.45425951  185.72621262]

[False  True False  True  True False False  True False False]

nan

改进方法：

[Tensorflow] Practice - The Tensorflow Way

对应代码：

def high_level_sigmoid_loss(x, t):

    first = (t - (x > 0)) * x

    second = np.log(1 + np.exp(x - 2 * x * (x > 0)))

    return -np.sum(first - second) / x.shape[0]

a = np.random.rand(10)* 1000 - 500

b = a > 0

print a

print b

print high_level_sigmoid_loss(a,b)

[-173.48716596  462.06216262 -417.78666769    6.10480948  340.13986055

   23.64615392  256.33358957 -332.46689674  416.88593348 -246.51402684]

[False  True False  True  True  True  True False  True False]

0.000222961919658

NN的进一步优化问题

学习率的设置

没有label，求得的值 y = x² 就直接是lost function。

对于learning_rate = 1的理解：

导数是2x，故w变化是10，这就是震荡的原因。

import tensorflow as tf

TRAINING_STEPS =

LEARNING_RATE  = 1 　　#尝试改变学习率，查看收敛效果

# x here denotes w
x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")

y = tf.square(x)　　# y = x²train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(y)

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(TRAINING_STEPS):

        sess.run(train_op)

        x_value = sess.run(x)

        print "After %s iteration(s): x%s is %f."% (i+1, i+1, x_value)

指数递减学习率

TRAINING_STEPS = 100

global_step = tf.Variable(0)
LEARNING_RATE = tf.train.exponential_decay(0.1, global_step, 1, 0.96, staircase=True)

# 初始学习率
# 没1次训练学习率衰减为原来的0.96

x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")

y = tf.square(x)

train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(y, global_step=global_step)

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    for i in range(TRAINING_STEPS):

        sess.run(train_op)

        if i % 10 == 0:

            LEARNING_RATE_value = sess.run(LEARNING_RATE)

            x_value = sess.run(x)

            print("After %s iteration(s): x%s is %f, learning rate is %f."% (i+1, i+1, x_value, LEARNING_RATE_value))

[Tensorflow] Practice - The Tensorflow Way

过拟合问题

[Tensorflow] Practice - The Tensorflow Way

画出这两个图，感觉很好玩的样子，怎么画呢？

import tensorflow as tf

import matplotlib.pyplot as plt

import numpy as np

data  = []
label = []

np.random.seed(0)

# 以原点为圆心，半径为1的圆把散点划分成红蓝两部分，并加入随机噪音。

剩下就是给data, label对儿不断添加一对对儿数据的过程。

for i in range(150):

    x1 = np.random.uniform(-1,1)

    x2 = np.random.uniform(0,2)

    if x1**2 + x2**2 <= 1:

        data.append([np.random.normal(x1, 0.1),np.random.normal(x2,0.1)])

        label.append(0)

    else:

        data.append([np.random.normal(x1, 0.1), np.random.normal(x2, 0.1)])

        label.append(1)

data  = np.hstack(data ).reshape(-1,2)　　# 这里的2对应了二维空间的x,y两个坐标值

label = np.hstack(label).reshape(-1,1)

plt.scatter(data[:,0], data[:,1], c=label,

           cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")

plt.show()

np.hstack 用法

>>> a = np.array((1,2,3))

>>> b = np.array((2,3,4))  

>>> np.hstack((a,b))

array([1, 2, 3, 2, 3, 4])  

>>> a = np.array([[1],[2],[3]])

>>> b = np.array([[2],[3],[4]])  

>>> np.hstack((a,b))

array([[1, 2],

       [2, 3],

       [3, 4]])

np.reshape 用法

a=array([[1,2,3],[4,5,6]])

reshape(a, 6) 

  Out[202]: 
  array([1, 2, 3, 4, 5, 6])

NB：这里的 ‘-1’

reshape(a, (3, -1)) #为指定的值将被推断出为2

Out[204]:
array([[1, 2],
[3, 4],
[5, 6]])

循环生成网络结构，好巧妙的技巧！

x  = tf.placeholder(tf.float32, shape=(None, 2))

y_ = tf.placeholder(tf.float32, shape=(None, 1))

sample_size = len(data)

# 每层节点的个数：比较有意思的构建网络方法

layer_dimension = [2,10,5,3,1]

n_layers = len(layer_dimension)

cur_layer = x

# 循环生成网络结构

for i in range(1, n_layers):　　# NB：这是是从2nd layer开始，也就是第一个out_layer

    in_dimension  = layer_dimension[i-1]
    out_dimension = layer_dimension[i]

    weight    = get_weight([in_dimension, out_dimension], 0.003)　　# 正则参数 ---->

    bias      = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
cur_layer = tf.nn.elu(tf.matmul(cur_layer, weight) + bias)    

y= cur_layer

# 损失函数的定义。
# 这里只需要计算"刻画模型在训练数据集上的表现"的损失函数
mse_loss = tf.reduce_sum(tf.pow(y_ - y, 2)) / sample_size

tf.add_to_collection('losses', mse_loss)　　# 还没有正则的loss

# 得到了最终的损失函数 - 同时也结合了get_weight中的add_to_collection
loss = tf.add_n(tf.get_collection('losses'))

tf.get_collection('losses') 的内容如下：

[<tf.Tensor 'l2_regularizer:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_1:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_2:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_3:0' shape=() dtype=float32>, <tf.Tensor 'truediv:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_4:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_5:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_6:0' shape=() dtype=float32>, <tf.Tensor 'l2_regularizer_7:0' shape=() dtype=float32>, <tf.Tensor 'truediv_1:0' shape=() dtype=float32>]

将“L2正则后的权重变量var”加入到集合中：tf.add_to_collecdtion。

def get_weight(shape, lambda1):
var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)

    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))

    return var

训练不带正则项的损失函数mse_loss

# 定义训练的目标函数mse_loss，训练次数及训练模型

train_op = tf.train.AdamOptimizer(0.001).minimize(mse_loss)

TRAINING_STEPS = 40000

with tf.Session() as sess:

    tf.global_variables_initializer().run()

    for i in range(TRAINING_STEPS):

        sess.run(train_op, feed_dict={x: data, y_: label})

        if i % 2000 == 0:

            print("After %d steps, mse_loss: %f" % (i,sess.run(mse_loss, feed_dict={x: data, y_: label})))


    # 画出训练后的分割曲线 - 很有意思！  
    # 1. 画网格

    xx, yy = np.mgrid[-1.2:1.2:.01, -0.2:2.2:.01]
grid = np.c_[xx.ravel(), yy.ravel()]
    # 2.

    probs = sess.run(y, feed_dict={x:grid})　　# y在这里代表了最后一层
probs = probs.reshape(xx.shape)

plt.scatter(data[:,0], data[:,1], c=label, cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")

plt.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.1)

plt.show()

Ref: http://blog.****.net/u013534498/article/details/51399035

这篇博文我喜欢，数据表现也需要开专题学习。

np.mgrid用法

np.mgrid[-1.2:1.2:.01, -0.2:2.2:.01]
参数格式：行，列，间隙

Out[217]:

array([[[-1.2 , -1.2 , -1.2 , ..., -1.2 , -1.2 , -1.2 ],

        [-1.19, -1.19, -1.19, ..., -1.19, -1.19, -1.19],

        [-1.18, -1.18, -1.18, ..., -1.18, -1.18, -1.18],

        ...,

        [ 1.17,  1.17,  1.17, ...,  1.17,  1.17,  1.17],

        [ 1.18,  1.18,  1.18, ...,  1.18,  1.18,  1.18],

        [ 1.19,  1.19,  1.19, ...,  1.19,  1.19,  1.19]],

       [[-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ],

        [-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ],

        [-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ],

        ...,

        [-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ],

        [-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ],

        [-0.2 , -0.19, -0.18, ...,  2.18,  2.19,  2.2 ]]])

滑动平均模型

衰减率：模型更新的速度

变量 --> 影子变量 (share init)

影子变量 = 衰减率*影子变量+(1-衰减率)*变量

衰减率越大，变量更新越快！

decay总体上不希望更新太快，但前期希望更新快些的衰减率设置办法：

[Tensorflow] Practice - The Tensorflow Way

查看不同迭代中变量取值的变化

import tensorflow as tf

v1   = tf.Variable(0, dtype=tf.float32)
step = tf.Variable(0, trainable=False)
ema  = tf.train.ExponentialMovingAverage(0.99, step)   # step：控制衰减率的变量
maintain_averages_op = ema.apply([v1])                 # 更新列表中的变量

with tf.Session() as sess:

    # 初始化

    init_op = tf.global_variables_initializer()

    sess.run(init_op)

    print(sess.run([v1, ema.average(v1)]))

     [0.0, 0.0]

    # 更新变量v1的取值

    sess.run(tf.assign(v1, 5))

    sess.run(maintain_averages_op)

    print(sess.run([v1, ema.average(v1)]))

     [5.0, 4.5]

    # 更新step和v1的取值

    sess.run(tf.assign(step, 10000))  

    sess.run(tf.assign(v1, 10))

    sess.run(maintain_averages_op)

    print(sess.run([v1, ema.average(v1)]))

     [10.0, 4.5549998]

    # 更新一次v1的滑动平均值

    sess.run(maintain_averages_op)

    print(sess.run([v1, ema.average(v1)]))

     [10.0, 4.6094499]

还是不太了解其目的：难道就是为了ema.average(v1) 这个返回结果？

疑难杂症

版本查看：

python  -c 'import tensorflow as tf; print(tf.__version__)'  # for Python 2

python3 -c 'import tensorflow as tf; print(tf.__version__)'  # for Python 3

安装升级：

unsw@unsw-UX303UB$ pip3 install --upgrade tensorflow

Requirement already up-to-date: tensorflow in /usr/local/anaconda3/lib/python3.5/site-packages

Requirement already up-to-date: six>=1.10.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)

Requirement already up-to-date: tensorflow-tensorboard<0.2.0,>=0.1.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)

Requirement already up-to-date: wheel>=0.26 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)

Requirement already up-to-date: protobuf>=3.3.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)

Requirement already up-to-date: numpy>=1.11.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)

Requirement already up-to-date: werkzeug>=0.11.10 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)

Requirement already up-to-date: markdown>=2.6.8 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)

Requirement already up-to-date: bleach==1.5.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)

Requirement already up-to-date: html5lib==0.9999999 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)

Requirement already up-to-date: setuptools in /usr/local/anaconda3/lib/python3.5/site-packages (from protobuf>=3.3.0->tensorflow)

unsw@unsw-UX303UB$ python3 -c 'import tensorflow as tf; print(tf.__version__)'

1.3.0

忽略警告：https://github.com/tensorflow/tensorflow/issues/7778

import os

os.environ['TF_CPP_MIN_LOG_LEVEL']=''

秒客网

[Tensorflow] Practice - The Tensorflow Way

计算图

集合

张量

基本概念：

会话

会话配置的修改

矩阵计算

变量

变量初始化的执行

Assign

Cross-entropy

MSE - L2 loss

高级点的问题

CNN--两个Loss层计算的数值问题 (overflow...)

NN的进一步优化问题

学习率的设置

过拟合问题

训练不带正则项的损失函数mse_loss

np.mgrid用法

滑动平均模型

查看不同迭代中变量取值的变化

疑难杂症

相关文章