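1 The three schools of artificial intelligence

Behaviorism: cybernetics
Symbolism: expert systems
Connectionism: neurons (neural networks)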

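2 CUDA and cuDNN

CUDA (Compute Unified Device Architecture) acts as the platform.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64

Add these directories to the environment variables so that the system can find the corresponding dynamic libraries and programs.

cuDNN (CUDA Deep Neural Network library) acts as a plug-in for the CUDA platform: it is a GPU-accelerated library for deep neural networks. That is why, after downloading cuDNN, you copy the files in its bin, include, and lib directories into the corresponding bin, include, and lib directories of the CUDA installation. (Think carefully about why this works, and what else you could do instead.)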
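The point of the environment variable is that the loader searches each listed directory in turn for a library file. The following is a minimal sketch of that lookup using only the Python standard library; the helper name `find_on_path` and the file name `example.dll` are made up for illustration and are not part of CUDA or TensorFlow.

```python
import os
import tempfile

def find_on_path(filename, path_value=None):
    """Return every directory in a PATH-style string that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits

# Demo: a throwaway directory stands in for a CUDA bin directory (hypothetical).
with tempfile.TemporaryDirectory() as fake_bin:
    open(os.path.join(fake_bin, "example.dll"), "w").close()
    demo_path = os.pathsep.join([fake_bin, os.path.join(fake_bin, "missing")])
    demo_hits = find_on_path("example.dll", demo_path)
    demo_misses = find_on_path("not_there.dll", demo_path)

print(demo_hits, demo_misses)
```

On a real machine you would call `find_on_path` with the actual library file name from your CUDA or cuDNN installation. An empty result suggests the directory has not been added to the environment variable yet.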

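3 Activation functions

Criteria for judging an activation function:

Does it cause vanishing gradients?
How fast does training converge?
How fast is it to compute?
Sigmoid
tf.nn.sigmoid(x)
- Prone to vanishing gradients
- Output is not zero-mean, so convergence is slow
- Involves exponentiation, so training takes longer
tanh
tf.math.tanh(x)
- Prone to vanishing gradients
- Output is zero-mean, so it converges faster than sigmoid
- Involves exponentiation, so training takes longer
ReLU
tf.nn.relu(x)
Advantages:
- Solves the vanishing gradient problem (in the positive region)
- Only needs a comparison with 0, so it is fast to compute
- Converges much faster than the previous two
Disadvantages:
- Output is not zero-mean, which slows convergence
- Dead ReLU problem: when the value fed into the activation is negative, both the output and the gradient are 0, so some neurons are never activated and their parameters are never updated. (Remedy: set a smaller learning rate to avoid large shifts in the parameter distribution.)
Leaky ReLU
tf.nn.leaky_relu(x)
- Has all the advantages of ReLU
- In practice it is not necessarily better than ReLU

Recommendations:

Use ReLU as the first choice
Set the learning rate to a fairly small value
Standardize the input features to zero mean and unit standard deviation
Center the initial parameters: draw them from a normal distribution with mean 0 and standard deviation $\sqrt{2 / n}$, where $n$ is the number of input features of the current layer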
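To make the vanishing-gradient comparison concrete, here is a small sketch that evaluates the derivative of each activation at a few inputs. It uses only the standard library rather than TensorFlow; the function names are illustrative, and `alpha=0.2` is assumed to match the default of `tf.nn.leaky_relu`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    # Derivative of sigmoid: s * (1 - s); at most 0.25, tiny for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    # Derivative of tanh: 1 - tanh(x)^2; also saturates for large |x|.
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    # Gradient is 1 in the positive region and 0 otherwise (source of dead ReLU).
    return 1.0 if x > 0 else 0.0

def d_leaky_relu(x, alpha=0.2):
    # A small slope alpha in the negative region keeps the gradient non-zero.
    return 1.0 if x > 0 else alpha

for x in (-5.0, 0.0, 5.0):
    print(x, d_sigmoid(x), d_tanh(x), d_relu(x), d_leaky_relu(x))
```

The sigmoid and tanh derivatives shrink toward 0 at both ends of the input range. ReLU keeps a gradient of 1 for positive inputs but gives exactly 0 for negative ones.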

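4 Underfitting, overfitting, and regularization

Underfitting
- Add more input features
- Add more network parameters
- Reduce the regularization parameter
Overfitting
- Clean the data to reduce noise
- Enlarge the training set
- Apply regularization
- Increase the regularization parameter

The file dot.csv contains: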

x1,x2,y_c
-0.416757847,-0.056266827,1
-2.136196096,1.640270808,0
-1.793435585,-0.841747366,0
0.502881417,-1.245288087,1
-1.057952219,-0.909007615,1
0.551454045,2.292208013,0
0.041539393,-1.117925445,1
0.539058321,-0.5961597,1
-0.019130497,1.17500122,1
-0.747870949,0.009025251,1
-0.878107893,-0.15643417,1
0.256570452,-0.988779049,1
-0.338821966,-0.236184031,1
-0.637655012,-1.187612286,1
-1.421217227,-0.153495196,0
-0.26905696,2.231366789,0
-2.434767577,0.112726505,0
0.370444537,1.359633863,1
0.501857207,-0.844213704,1
9.76E-06,0.542352572,1
-0.313508197,0.771011738,1
-1.868090655,1.731184666,0
1.467678011,-0.335677339,0
0.61134078,0.047970592,1
-0.829135289,0.087710218,1
1.000365887,-0.381092518,1
-0.375669423,-0.074470763,1
0.43349633,1.27837923,1
-0.634679305,0.508396243,1
0.216116006,-1.858612386,0
-0.419316482,-0.132328898,1
-0.03957024,0.326003433,1
-2.040323049,0.046255523,0
-0.677675577,-1.439439027,0
0.52429643,0.735279576,1
-0.653250268,0.842456282,1
-0.381516482,0.066489009,1
-1.098738947,1.584487056,0
-2.659449456,-0.091452623,0
0.695119605,-2.033466546,0
-0.189469265,-0.077218665,1
0.824703005,1.248212921,0
-0.403892269,-1.384518667,0
1.367235424,1.217885633,0
-0.462005348,0.350888494,1
0.381866234,0.566275441,1
0.204207979,1.406696242,0
-1.737959504,1.040823953,0
0.38047197,-0.217135269,1
1.173531498,-2.343603191,0
1.161521491,0.386078048,1
-1.133133274,0.433092555,1
-0.304086439,2.585294868,0
1.835332723,0.440689872,0
-0.719253841,-0.583414595,1
-0.325049628,-0.560234506,1
-0.902246068,-0.590972275,1
-0.276179492,-0.516883894,1
-0.69858995,-0.928891925,1
2.550438236,-1.473173248,0
-1.021414731,0.432395701,1
-0.32358007,0.423824708,1
0.799179995,1.262613663,0
0.751964849,-0.993760983,1
1.109143281,-1.764917728,0
-0.114421297,-0.498174194,1
-1.060799036,0.591666521,1
-0.183256574,1.019854729,1
-1.482465478,0.846311892,0
0.497940148,0.126504175,1
-1.418810551,-0.251774118,0
-1.546674611,-2.082651936,0
3.279745401,0.97086132,0
1.792592852,-0.429013319,0
0.69619798,0.697416272,1
0.601515814,0.003659491,1
-0.228247558,-2.069612263,0
0.610144086,0.4234969,1
1.117886733,-0.274242089,1
1.741812188,-0.447500876,0
-1.255427218,0.938163671,0
-0.46834626,-1.254720307,1
0.124823646,0.756502143,1
0.241439629,0.497425649,1
4.108692624,0.821120877,0
1.531760316,-1.985845774,0
0.365053516,0.774082033,1
-0.364479092,-0.875979478,1
0.396520159,-0.314617436,1
-0.593755583,1.149500568,1
1.335566168,0.302629336,1
-0.454227855,0.514370717,1
0.829458431,0.630621967,1
-1.45336435,-0.338017777,0
0.359133332,0.622220414,1
0.960781945,0.758370347,1
-1.134318483,-0.707420888,1
-1.221429165,1.804476642,0
0.180409807,0.553164274,1
1.033029066,-0.329002435,1
-1.151002944,-0.426522471,1
-0.148147191,1.501436915,0
0.869598198,-1.087090575,1
0.664221413,0.734884668,1
-1.061365744,-0.108516824,1
-1.850403974,0.330488064,0
-0.31569321,-1.350002103,1
-0.698170998,0.239951198,1
-0.55294944,0.299526813,1
0.552663696,-0.840443012,1
-0.31227067,2.144678089,0
0.121105582,-0.846828752,1
0.060462449,-1.33858888,1
1.132746076,0.370304843,1
1.085806404,0.902179395,1
0.39029645,0.975509412,1
0.191573647,-0.662209012,1
-1.023514985,-0.448174823,1
-2.505458132,1.825994457,0
-1.714067411,-0.076639564,0
-1.31756727,-2.025593592,0
-0.082245375,-0.304666585,1
-0.15972413,0.54894656,1
-0.618375485,0.378794466,1
0.513251444,-0.334844125,1
-0.283519516,0.538424263,1
0.057250947,0.159088487,1
-2.374402684,0.058519935,0
0.376545911,-0.135479764,1
0.335908395,1.904375909,0
0.085364433,0.665334278,1
-0.849995503,-0.852341797,1
-0.479985112,-1.019649099,1
-0.007601138,-0.933830661,1
-0.174996844,-1.437143432,0
-1.652200291,-0.675661789,0
-1.067067124,-0.652931145,1
-0.61209475,-0.351262461,1
1.045477988,1.369016024,0
0.725353259,-0.359474459,1
1.49695179,-1.531111108,0
-2.023363939,0.267972576,0
-0.002206445,-0.139291883,1
0.032565469,-1.640560225,0
-1.156699171,1.234034681,0
1.028184899,-0.721879726,1
1.933156966,-1.070796326,0
-0.571381608,0.292432067,1
-1.194999895,-0.487930544,1
-0.173071165,-0.395346401,1
0.870840765,0.592806797,1
-1.099297309,-0.681530644,1
0.180066685,-0.066931044,1
-0.78774954,0.424753672,1
0.819885117,-0.631118683,1
0.789059649,-1.621673803,0
-1.610499259,0.499939764,0
-0.834515207,-0.996959687,1
-0.263388077,-0.677360492,1
0.327067038,-1.455359445,0
-0.371519124,3.16096597,0
0.109951013,-1.913523218,0
0.599820429,0.549384465,1
1.383781035,0.148349243,1
-0.653541444,1.408833984,0
0.712061227,-1.800716041,0
0.747598942,-0.232897001,1
1.11064528,-0.373338813,1
0.78614607,0.194168696,1
0.586204098,-0.020387292,1
-0.414408598,0.067313412,1
0.631798924,0.417592731,1
1.615176269,0.425606211,0
0.635363758,2.102229267,0
0.066126417,0.535558351,1
-0.603140792,0.041957629,1
1.641914637,0.311697707,0
1.4511699,-1.06492788,0
-1.400845455,0.307525527,0
-1.369638673,2.670337245,0
1.248450298,-1.245726553,0
-0.167168774,-0.57661093,1
0.416021749,-0.057847263,1
0.931887358,1.468332133,0
-0.221320943,-1.173155621,1
0.562669078,-0.164515057,1
1.144855376,-0.152117687,1
0.829789046,0.336065952,1
-0.189044051,-0.449328601,1
0.713524448,2.529734874,0
0.837615794,-0.131682403,1
0.707592866,0.114053878,1
-1.280895178,0.309846277,1
1.548290694,-0.315828043,0
-1.125903781,0.488496666,1
1.830946657,0.940175993,0
1.018717047,2.302378289,0
1.621092978,0.712683273,0
-0.208703629,0.137617991,1
-0.103352168,0.848350567,1
-0.883125561,1.545386826,0
0.145840073,-0.400106056,1
0.815206041,-2.074922365,0
-0.834437391,-0.657718447,1
0.820564332,-0.489157001,1
1.424967034,-0.446857897,0
0.521109431,-0.70819438,1
1.15553059,-0.254530459,1
0.518924924,-0.492994911,1
-1.086548153,-0.230917497,1
1.098010039,-1.01787805,0
-1.529391355,-0.307987737,0
0.780754356,-1.055839639,1
-0.543883381,0.184301739,1
-0.330675843,0.287208202,1
1.189528137,0.021201548,1
-0.06540968,0.766115904,1
-0.061635085,-0.952897152,1
-1.014463064,-1.115263963,0
1.912600678,-0.045263203,0
0.576909718,0.717805695,1
-0.938998998,0.628775807,1
-0.564493432,-2.087807462,0
-0.215050132,-1.075028564,1
-0.337972149,0.343212732,1
2.28253964,-0.495778848,0
-0.163962832,0.371622161,1
0.18652152,-0.158429224,1
-1.082929557,-0.95662552,0
-0.183376735,-1.159806896,1
-0.657768362,-1.251448406,1
1.124482861,-1.497839806,0
1.902017223,-0.580383038,0
-1.054915674,-1.182757204,0
0.779480054,1.026597951,1
-0.848666001,0.331539648,1
-0.149591353,-0.2424406,1
0.151197175,0.765069481,1
-1.916630519,-2.227341292,0
0.206689897,-0.070876356,1
0.684759969,-1.707539051,0
-0.986569665,1.543536339,0
-1.310270529,0.363433972,1
-0.794872445,-0.405286267,1
-1.377757931,1.186048676,0
-1.903821143,-1.198140378,0
-0.910065643,1.176454193,0
0.29921067,0.679267178,1
-0.01766068,0.236040923,1
0.494035871,1.546277646,0
0.246857508,-1.468775799,0
1.147099942,0.095556985,1
-1.107438726,-0.176286141,1
-0.982755667,2.086682727,0
-0.344623671,-2.002079233,0
0.303234433,-0.829874845,1
1.288769407,0.134925462,1
-1.778600641,-0.50079149,0
-1.088161569,-0.757855553,1
-0.6437449,-2.008784527,0
0.196262894,-0.87589637,1
-0.893609209,0.751902355,1
1.896932244,-0.629079151,0
1.812085527,-2.056265741,0
0.562704887,-0.582070757,1
-0.074002975,-0.986496364,1
-0.594722499,-0.314811843,1
-0.346940532,0.411443516,1
2.326390901,-0.634053128,0
-0.154409962,-1.749288804,0
-2.519579296,1.391162427,0
-1.329346443,-0.745596414,0
0.02126085,0.910917515,1
0.315276082,1.866208205,0
-0.182497623,-1.82826634,0
0.138955717,0.119450165,1
-0.8188992,-0.332639265,1
-0.586387955,1.734516344,0
-0.612751558,-1.393442017,0
0.279433757,-1.822231268,0
0.427017458,0.406987749,1
-0.844308241,-0.559820113,1
-0.600520405,1.614873237,0
0.39495322,-1.203813469,1
-1.247472432,-0.07754625,1
-0.013339751,-0.76832325,1
0.29123401,-0.197330948,1
1.07682965,0.437410232,1
-0.093197866,0.135631416,1
-0.882708822,0.884744194,1
0.383204463,-0.416994149,1
0.11779655,-0.536685309,1
2.487184575,-0.451361054,0
0.518836127,0.364448005,1
-0.798348729,0.005657797,1
-0.320934708,0.24951355,1
0.256308392,0.767625083,1
0.783020087,-0.407063047,1
-0.524891667,-0.589808683,1
-0.862531086,-1.742872904,0

Without regularization

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data and labels to build x_train and y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = np.vstack(x_data).reshape(-1,2)
y_train = np.vstack(y_data).reshape(-1,1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast to float32; otherwise the matrix multiplication below fails with a dtype error
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

# from_tensor_slices slices the tensors along the first dimension, pairing each feature row with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Network parameters: 2 input neurons, one hidden layer of 11 neurons, 1 output neuron
# tf.Variable() marks the parameters as trainable
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.01  # learning rate
epoch = 400  # number of epochs

# Training
for epoch in range(epoch):
    for step, (x_train, y_train) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for the gradient computation

            h1 = tf.matmul(x_train, w1) + b1  # multiply-add of the hidden layer
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # Mean squared error loss: mse = mean((y_train - y)^2)
            loss = tf.reduce_mean(tf.square(y_train - y))

        # Gradients of loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # Gradient update
        # w1 = w1 - lr * w1_grad; tape.gradient returns the gradients in the same order as [w1, b1, w2, b2], i.e. at indices 0, 1, 2, 3
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # Print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

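# Prediction
print("*******predict*******")
# xx and yy each run from -3 to 3 with step 0.1, forming a grid of points
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
# Flatten xx and yy and pair them column-wise into a 2-D array of grid coordinates
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# Feed the grid points through the network; probs collects the outputs
probs = []
for x_test in grid:
    # Predict with the trained parameters
    h1 = tf.matmul([x_test], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

# Column 0 is x1, column 1 is x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# Reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))  # squeeze removes the size-1 dimension, turning [['red'], ['blue']] into ['red', 'blue']
# Pass xx, yy and the corresponding probs to contour, which draws the line through all points where probs equals 0.5; after plt.show() this appears as the boundary between the red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Reads the red and blue points and draws the decision boundary, without regularization
# When unsure what a variable holds, print it to inspect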

With regularization

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data and labels to build x_train and y_train
df = pd.read_csv('dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = x_data
y_train = y_data.reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast to float32; otherwise the matrix multiplication below fails with a dtype error
x_train = tf.cast(x_train, tf.float32)
y_train = tf.cast(y_train, tf.float32)

# from_tensor_slices slices the tensors along the first dimension, pairing each feature row with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Network parameters: 2 input neurons, one hidden layer of 11 neurons, 1 output neuron
# tf.Variable() marks the parameters as trainable
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.01  # learning rate
epoch = 400  # number of epochs

# Training
for epoch in range(epoch):
    for step, (x_train, y_train) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for the gradient computation

            h1 = tf.matmul(x_train, w1) + b1  # multiply-add of the hidden layer
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # Mean squared error loss: mse = mean((y_train - y)^2)
            loss_mse = tf.reduce_mean(tf.square(y_train - y))
            # Add L2 regularization
            loss_regularization = []
            # tf.nn.l2_loss(w) = sum(w ** 2) / 2
            loss_regularization.append(tf.nn.l2_loss(w1))
            loss_regularization.append(tf.nn.l2_loss(w2))
            # Sum the penalties
            # Example: x = tf.constant(([1,1,1],[1,1,1]))
            #   tf.reduce_sum(x)
            # >>>6
            # loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))
            loss_regularization = tf.reduce_sum(loss_regularization)
            loss = loss_mse + 0.03 * loss_regularization  # REGULARIZER = 0.03

        # Gradients of loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # Gradient update
        # w1 = w1 - lr * w1_grad
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # Print the loss every 20 epochs
    if epoch % 20 == 0:
        print('epoch:', epoch, 'loss:', float(loss))

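# Prediction
print("*******predict*******")
# xx and yy each run from -3 to 3 with step 0.1, forming a grid of points
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]
# Flatten xx and yy and pair them column-wise into a 2-D array of grid coordinates
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# Feed the grid points through the network; probs collects the outputs
probs = []
for x_predict in grid:
    # Predict with the trained parameters
    h1 = tf.matmul([x_predict], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

# Column 0 is x1, column 1 is x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# Reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))
# Pass xx, yy and the corresponding probs to contour, which draws the line through all points where probs equals 0.5; after plt.show() this appears as the boundary between the red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# Reads the red and blue points and draws the decision boundary, with regularization
# When unsure what a variable holds, print it to inspect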
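The only difference between the two scripts is the penalty term added to the loss. As a rough sketch of that arithmetic, the following plain-Python snippet mirrors `tf.nn.l2_loss(w) = sum(w ** 2) / 2` and the 0.03 coefficient from the script; the weight values and the data-fit loss are made-up numbers for illustration.

```python
def l2_loss(weights):
    """Half the sum of squares, mirroring tf.nn.l2_loss(w) = sum(w ** 2) / 2."""
    return sum(w * w for row in weights for w in row) / 2.0

REGULARIZER = 0.03  # same coefficient as in the regularized script

w1 = [[1.0, -2.0], [0.5, 0.0]]   # hypothetical 2x2 weight matrix
w2 = [[3.0], [-1.0]]             # hypothetical 2x1 weight matrix
loss_mse = 0.10                  # hypothetical data-fit term

penalty = l2_loss(w1) + l2_loss(w2)
loss = loss_mse + REGULARIZER * penalty
print(penalty, loss)
```

Because the penalty grows with the squared weights, minimizing the total loss pushes the weights toward smaller values. That tends to give a smoother decision boundary.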

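5 Finding the minimum of $loss=(w+1)^2$ with TensorFlow

import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))  # tf.Variable marks w as trainable
lr = 0.2
epoch = 40
print(epoch)

# The outer for loop iterates over the dataset epoch times; here the "dataset" is just the single parameter w, initialized to 5 by constant, and the loop runs 40 iterations.
for epoch in range(epoch):
    with tf.GradientTape() as tape:  # the with block records the computation needed for the gradient
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)  # .gradient specifies what is differentiated with respect to what

    # .assign_sub decrements the variable in place: w -= lr * grads, i.e. w = w - lr * grads
    w.assign_sub(lr * grads)
    print("After %s epoch,w is %f,loss is %f" % (epoch, w.numpy(), loss))

# Initial lr: 0.2. Try changing the learning rate to 0.001 or 0.999 and watch how convergence changes
# Goal: find the parameter w that minimizes loss, i.e. w = -1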
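The effect of the learning rate can be worked out by hand. The gradient of $(w+1)^2$ is $2(w+1)$, so each update multiplies the distance $w+1$ by $(1 - 2 \cdot lr)$. The sketch below reproduces that recursion in plain Python (no TensorFlow); the helper name `descend` is illustrative.

```python
def descend(w, lr, steps):
    """Plain gradient descent on loss = (w + 1) ** 2, whose gradient is 2 * (w + 1)."""
    for _ in range(steps):
        grad = 2.0 * (w + 1.0)
        w -= lr * grad
    return w

for lr in (0.2, 0.001, 0.999):
    w_end = descend(5.0, lr, 40)
    # The distance to the optimum shrinks by the factor |1 - 2 * lr| per step.
    print(lr, w_end, abs(1 - 2 * lr))
```

With lr = 0.2 the factor is 0.6, so 40 steps are plenty. With lr = 0.001 the factor is 0.998 and w barely moves. With lr = 0.999 the factor is -0.998, so w jumps back and forth across -1 and closes in very slowly.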

6 Tensors

6.1 Creating a Tensor

import tensorflow as tf

a = tf.constant([1, 5], dtype=tf.int64)
print("a:", a)
print("a.dtype:", a.dtype)
print("a.shape:", a.shape)

# On this machine the default is tf.int32; remove dtype and run again to check the default yourself

import tensorflow as tf

a = tf.zeros([2, 3])
b = tf.ones(4)
c = tf.fill([2, 2], 9)
print("a:", a)
print("b:", b)
print("c:", c)

import tensorflow as tf

d = tf.random.normal([2, 2], mean=0.5, stddev=1)
print("d:", d)
e = tf.random.truncated_normal([2, 2], mean=0.5, stddev=1)
print("e:", e)

import tensorflow as tf

f = tf.random.uniform([2, 2], minval=0, maxval=1)
print("f:", f)

6.2 Converting between NumPy arrays and tensors

import tensorflow as tf
import numpy as np

a = np.arange(0, 5)
b = tf.convert_to_tensor(a, dtype=tf.int64)  # b.numpy()
print("a:", a)
print("b:", b)
print("b:", b.numpy())  # tensor -> NumPy array

7 Commonly used TensorFlow functions

import tensorflow as tf

x1 = tf.constant([1., 2., 3.], dtype=tf.float64)
print("x1:", x1)
x2 = tf.cast(x1, tf.int32)
print("x2", x2)
print("minimum of x2:", tf.reduce_min(x2))
print("maximum of x2:", tf.reduce_max(x2))

import tensorflow as tf

x = tf.constant([[1, 2, 3], [2, 2, 3]])
print("x:", x)
print("mean of x:", tf.reduce_mean(x))  # mean of all elements in x
print("sum of x:", tf.reduce_sum(x, axis=1))  # sum of each row

import tensorflow as tf

a = tf.ones([1, 3])
b = tf.fill([1, 3], 3.)
print("a:", a)
print("b:", b)
print("a+b:", tf.add(a, b))
print("a-b:", tf.subtract(a, b))
print("a*b:", tf.multiply(a, b))
print("b/a:", tf.divide(b, a))

import tensorflow as tf

a = tf.fill([1, 2], 3.)
print("a:", a)
print("a cubed:", tf.pow(a, 3))
print("a squared:", tf.square(a))
print("square root of a:", tf.sqrt(a))

import tensorflow as tf

a = tf.ones([3, 2])
b = tf.fill([2, 3], 3.)
print("a:", a)
print("b:", b)
print("a*b:", tf.matmul(a, b))

import tensorflow as tf

features = tf.constant([12, 23, 10, 17])
labels = tf.constant([0, 1, 1, 0])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
for element in dataset:
    print(element)

import tensorflow as tf

with tf.GradientTape() as tape:
    x = tf.Variable(tf.constant(3.0))
    y = tf.pow(x, 2)
grad = tape.gradient(y, x)
print(grad)

seq = ['one', 'two', 'three']
for i, element in enumerate(seq):
    print(i, element)

import tensorflow as tf

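classes = 3  # 3-class classification
labels = tf.constant([1, 0, 2])  # the input values range from 0 to 2
output = tf.one_hot(labels, depth=classes)
print("output:\n", output)

When the values in the output y are all very large, the exponentiation may overflow. A good remedy is to find the maximum value in y, subtract it from every element of y, and then pass the shifted y to the softmax function.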

import tensorflow as tf

y = tf.constant([1.01, 2.01, -0.66])
y_pro = tf.nn.softmax(y)

print("After softmax, y_pro is:", y_pro)  # y_pro is a valid probability distribution

print("The sum of y_pro:", tf.reduce_sum(y_pro))  # after softmax, the probabilities sum to 1
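The max-subtraction trick described before the softmax snippet can be sketched in plain Python as follows. This is an illustrative re-implementation, not TensorFlow's internal code. Shifting every logit by the same constant does not change the result, because the common factor cancels between numerator and denominator.

```python
import math

def softmax(logits):
    """Softmax that subtracts the maximum logit before exponentiating."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]  # every exponent is <= 0, so no overflow
    total = sum(exps)
    return [e / total for e in exps]

small = softmax([1.01, 2.01, -0.66])
# Same logits shifted up by 1000; math.exp(1002.01) on its own would raise OverflowError.
large = softmax([1001.01, 1002.01, 999.34])
print(small)
print(large)
```

Both calls give the same probabilities (up to floating-point rounding), since the second input is just the first one shifted by 1000.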

import tensorflow as tf

x = tf.Variable(4)
x.assign_sub(1)
print("x:", x)  # 4-1=3

import numpy as np
import tensorflow as tf

test = np.array([[1, 2, 3], [2, 3, 4], [5, 4, 3], [8, 7, 2]])
print("test:\n", test)
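print("index of the maximum in each column:", tf.argmax(test, axis=0))  # index of the largest value in each column
print("index of the maximum in each row:", tf.argmax(test, axis=1))  # index of the largest value in each row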

import tensorflow as tf

a = tf.constant([1, 2, 3, 1, 1])
b = tf.constant([0, 1, 3, 4, 5])
c = tf.where(tf.greater(a, b), a, b)  # where a > b take the element of a at that position, otherwise take the element of b
print("c:", c)

import numpy as np

rdm = np.random.RandomState(seed=1)
a = rdm.rand()
b = rdm.rand(2, 3)    # note: this is not rdm.rand([2, 3])
print("a:", a)
print("b:", b)

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.vstack((a, b))
print("c:\n", c)

import numpy as np
import tensorflow as tf

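# Generate evenly spaced grid points
x, y = np.mgrid[1:3:1, 2:4:0.5]    # note the square brackets here
# Flatten x and y and pair them column-wise into a 2-D array of grid coordinates
grid = np.c_[x.ravel(), y.ravel()]      # note the square brackets here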
print("x:\n", x)
print("y:\n", y)
print("x.ravel():\n", x.ravel())
print("y.ravel():\n", y.ravel())
print('grid:\n', grid)

8 Loading the iris dataset

from sklearn import datasets
from pandas import DataFrame
import pandas as pd

x_data = datasets.load_iris().data  # .data returns all input features of the iris dataset
y_data = datasets.load_iris().target  # .target returns all labels of the iris dataset
print("x_data from datasets: \n", x_data)
print("y_data from datasets: \n", y_data)

# Add a row index (left) and column labels (top) to the table
# Column labels: sepal length, sepal width, petal length, petal width
x_data = DataFrame(x_data, columns=['花萼长度', '花萼宽度', '花瓣长度', '花瓣宽度'])
pd.set_option('display.unicode.east_asian_width', True)  # align the East Asian column names
print("x_data add index: \n", x_data)

x_data['类别'] = y_data  # add a new column labeled '类别' (class) holding y_data
print("x_data add a column: \n", x_data)

# When unsure about a type or shape, print it to confirm

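9 Iris classification with a neural network

# -*- coding: UTF-8 -*-
# Use the iris dataset to implement forward and backward propagation and plot the loss curve

# Import the required modules
import tensorflow as tf
from sklearn import datasets
from matplotlib import pyplot as plt
import numpy as np

# Load the data: input features and labels, 150 rows in total
x_data = datasets.load_iris().data
y_data = datasets.load_iris().target

# Shuffle the data (the raw data is ordered by class; leaving it unshuffled hurts accuracy)
# seed: an integer random seed; once set, the same random numbers are generated on every run (for teaching, so every student gets the same result)
np.random.seed(116)  # use the same seed for both shuffles so features and labels stay paired
np.random.shuffle(x_data)
np.random.seed(116)
np.random.shuffle(y_data)
tf.random.set_seed(116)

# Split the shuffled dataset: the first 120 rows for training, the last 30 rows for testing
x_train = x_data[:-30]
y_train = y_data[:-30]
x_test = x_data[-30:]
y_test = y_data[-30:]

# Cast x to float32; otherwise the matrix multiplication below fails with a dtype mismatch
x_train = tf.cast(x_train, tf.float32)
x_test = tf.cast(x_test, tf.float32)

# from_tensor_slices pairs each feature row with its label; batch(32) groups the dataset into batches of 32 samples
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

# Network parameters: 4 input features, so 4 input nodes; 3 classes, so 3 output neurons
# tf.Variable() marks the parameters as trainable
# seed makes the random initialization reproducible (for teaching; omit it in real use)
w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))

lr = 0.1  # learning rate
train_loss_results = []  # loss of each epoch, used later to plot the loss curve
test_acc = []  # accuracy of each epoch, used later to plot the accuracy curve
epoch = 500  # number of epochs
loss_all = 0  # each epoch has 4 steps; loss_all accumulates the 4 step losses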

# Training
for epoch in range(epoch):  # dataset-level loop: one pass over the dataset per epoch
    for step, (x_train, y_train) in enumerate(train_db):  # batch-level loop: one batch per step
        with tf.GradientTape() as tape:  # the with block records operations for the gradient
            y = tf.matmul(x_train, w1) + b1  # multiply-add of the network
            y = tf.nn.softmax(y)  # make y a probability distribution (same scale as the one-hot labels, so they can be subtracted)
            y_ = tf.one_hot(y_train, depth=3)  # convert labels to one-hot form for computing loss and accuracy
            # e.g. y_train=1 becomes [0, 1, 0] after tf.one_hot
            # Mean squared error loss: mse = mean((y_ - y)^2)
            loss = tf.reduce_mean(tf.square(y_ - y))
            loss_all += loss.numpy()  # accumulate the loss of every step so the epoch average can be reported
        # Gradients of loss with respect to each parameter
        grads = tape.gradient(loss, [w1, b1])

        # Gradient update: w1 = w1 - lr * w1_grad    b1 = b1 - lr * b1_grad
        w1.assign_sub(lr * grads[0])  # update w1 in place
        b1.assign_sub(lr * grads[1])  # update b1 in place

    # Print the loss after every epoch
    print("Epoch {}, loss: {}".format(epoch, loss_all/4))
    train_loss_results.append(loss_all / 4)  # record the average loss over the 4 steps
    loss_all = 0  # reset loss_all for the next epoch

    # Testing
    # total_correct counts correctly predicted samples and total_number counts all test samples; both start at 0
    total_correct, total_number = 0, 0
    for x_test, y_test in test_db:
        # Predict with the updated parameters
        y = tf.matmul(x_test, w1) + b1
        y = tf.nn.softmax(y)
        pred = tf.argmax(y, axis=1)  # index of the largest value in y, i.e. the predicted class
        # Cast pred to the dtype of y_test
        pred = tf.cast(pred, dtype=y_test.dtype)
        # correct is 1 where the prediction is right and 0 otherwise; cast the bool result to int
        correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
        # Sum the correct predictions within this batch
        correct = tf.reduce_sum(correct)
        # Sum the correct predictions over all batches
        total_correct += int(correct)
        # total_number is the number of test samples, i.e. the number of rows of x_test (shape[0])
        total_number += x_test.shape[0]
    # Overall accuracy is total_correct / total_number
    acc = total_correct / total_number
    test_acc.append(acc)
    print("Test_acc:", acc)
    print("--------------------------")

# Plot the loss curve
plt.title('Loss Function Curve')  # figure title
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Loss')  # y-axis label
# Plot train_loss_results point by point and connect them; the legend entry is Loss
plt.plot(train_loss_results, label="$Loss$")
plt.legend()  # draw the legend
plt.show()  # show the figure

# Plot the accuracy curve
plt.title('Acc Curve')  # figure title
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Acc')  # y-axis label
plt.plot(test_acc, label="$Accuracy$")  # plot test_acc point by point and connect them; the legend entry is Accuracy
plt.legend()
plt.show()

10 Neural network complexity

11 Exponentially decaying learning rate

import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))

epoch = 40
LR_BASE = 0.2  # initial learning rate
LR_DECAY = 0.99  # learning rate decay factor
LR_STEP = 1  # number of batches fed between learning rate updates

for epoch in range(epoch):  # the outer loop iterates over the dataset epoch times; here the "dataset" is just the single parameter w, initialized to 5, and the loop runs 40 iterations
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    with tf.GradientTape() as tape:  # the with block records the computation needed for the gradient
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)  # .gradient specifies what is differentiated with respect to what

    w.assign_sub(lr * grads)  # .assign_sub decrements the variable in place: w -= lr * grads, i.e. w = w - lr * grads
    print("After %s epoch,w is %f,loss is %f,lr is %f" % (epoch, w.numpy(), loss, lr))
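12 Loss functions

Predict the yogurt sales volume $y$, where $x_1$ and $x_2$ are factors that influence sales.

Before modeling, collect a dataset of the daily $x_1$, $x_2$ and sales $y$ (that is, the known answers; the best case is production equal to sales).

Synthetic dataset $X, Y\_$:

$y\_ = x_1 + x_2$

Noise: $-0.05 \sim +0.05$

Requirements:

Fit a function that can predict sales.

Use the $MSE$ loss function.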

import tensorflow as tf
import numpy as np

SEED = 23455

rdm = np.random.RandomState(seed=SEED)  # generates random numbers in [0, 1)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0,1)/10 = [0,0.1); [0,0.1) - 0.05 = [-0.05,0.05)
x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))

epoch = 15000
lr = 0.002

for epoch in range(epoch):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss_mse = tf.reduce_mean(tf.square(y_ - y))

    grads = tape.gradient(loss_mse, w1)
    w1.assign_sub(lr * grads)

    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(), "\n")
print("Final w1 is: ", w1.numpy())

Custom loss function

import tensorflow as tf
import numpy as np

SEED = 23455
COST = 1
PROFIT = 99

rdm = np.random.RandomState(SEED)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0,1)/10 = [0,0.1); [0,0.1) - 0.05 = [-0.05,0.05)
x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))

epoch = 10000
lr = 0.002

for epoch in range(epoch):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * COST, (y_ - y) * PROFIT))

    grads = tape.gradient(loss, w1)
    w1.assign_sub(lr * grads)

    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(), "\n")
print("Final w1 is: ", w1.numpy())

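# Custom loss function
# Yogurt cost is 1 yuan, yogurt profit is 99 yuan
# Cost is low and profit is high, so over-predicting is preferred: the learned coefficients come out greater than 1, biasing predictions upward


# COST = 99
# PROFIT = 1     biases predictions downward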
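The asymmetry of this loss is easier to see on a couple of hand-picked numbers. The sketch below re-implements the same piecewise rule in plain Python (the helper name `custom_loss` and the sample values are illustrative). Over-prediction is charged at COST per unit and under-prediction at PROFIT per unit.

```python
COST = 1      # loss per unit produced but not sold (over-prediction)
PROFIT = 99   # loss per unit of missed sales (under-prediction)

def custom_loss(y_pred, y_true, cost=COST, profit=PROFIT):
    """Sum of (y - y_) * cost where y > y_, and (y_ - y) * profit otherwise."""
    total = 0.0
    for y, y_ in zip(y_pred, y_true):
        if y > y_:
            total += (y - y_) * cost
        else:
            total += (y_ - y) * profit
    return total

over = custom_loss([2.0], [1.0])    # predicted one unit too many
under = custom_loss([1.0], [2.0])   # predicted one unit too few
print(over, under)
```

Under-predicting by one unit costs 99 times as much as over-predicting by one unit, which is why training with this loss pushes the coefficients above 1.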

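13 Cross-entropy loss function

Cross-entropy is a concept from information theory, originally used to estimate average coding length. Applied to probability distributions: given two distributions p and q, the cross-entropy of p expressed through q is:

$H(p, q) = -\sum_x p(x) \log q(x)$

Cross-entropy characterizes the distance between two probability distributions. Here p is the correct answer and q is the prediction; the smaller the cross-entropy, the closer the two distributions and the lower the loss. For multi-class problems in machine learning, cross-entropy is commonly used as the loss function.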

import tensorflow as tf

loss_ce1 = tf.losses.categorical_crossentropy([1, 0], [0.6, 0.4])
loss_ce2 = tf.losses.categorical_crossentropy([1, 0], [0.8, 0.2])  # this prediction is closer to the correct answer
print("loss_ce1:", loss_ce1)
print("loss_ce2:", loss_ce2)

# Cross-entropy loss function
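To check these two values by hand, the definition $-\sum p \ln q$ can be evaluated directly. The following is a plain-Python sketch of that arithmetic for a single sample; it is an illustration of the formula, not TensorFlow's implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """-sum(y_true * ln(y_pred)) for one sample; terms with y_true == 0 contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

loss_ce1 = categorical_crossentropy([1, 0], [0.6, 0.4])  # -ln(0.6)
loss_ce2 = categorical_crossentropy([1, 0], [0.8, 0.2])  # -ln(0.8)
print(loss_ce1, loss_ce2)
```

Only the probability assigned to the true class matters here, so the prediction that puts 0.8 on the correct class gets the smaller loss.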

# Combining softmax with the cross-entropy loss
import tensorflow as tf
import numpy as np

y_ = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
y = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])
y_pro = tf.nn.softmax(y)
loss_ce1 = tf.losses.categorical_crossentropy(y_, y_pro)
loss_ce2 = tf.nn.softmax_cross_entropy_with_logits(y_, y)

print('Result of the step-by-step computation:\n', loss_ce1)
print('Result of the combined computation:\n', loss_ce2)


# The two outputs are the same

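14 Neural network optimizers: guiding the parameter updates

Different optimizers differ only in how they define the first-order momentum $m_t$ and the second-order momentum $V_t$.

14.1 SGD (Stochastic Gradient Descent)

w1.assign_sub(lr * grads[0])  # update w1 in place
b1.assign_sub(lr * grads[1])  # update b1 in place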

14.2 SGDM

m_w, m_b = 0, 0
beta = 0.9

# sgd-momentum
m_w = beta * m_w + (1 - beta) * grads[0]
m_b = beta * m_b + (1 - beta) * grads[1]
w1.assign_sub(lr * m_w)
b1.assign_sub(lr * m_b)

14.3 Adagrad

v_w, v_b = 0, 0

# adagrad
v_w += tf.square(grads[0])
v_b += tf.square(grads[1])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))

14.4 RMSProp

v_w, v_b = 0, 0
beta = 0.9

# rmsprop
v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))

14.5 Adam

m_w, m_b = 0, 0
v_w, v_b = 0, 0
beta1, beta2 = 0.9, 0.999
delta_w, delta_b = 0, 0
global_step = 0

# adam
# global_step counts the batches processed so far; increment it before the bias
# correction, otherwise 1 - beta ** 0 == 0 and the correction divides by zero
global_step += 1
m_w = beta1 * m_w + (1 - beta1) * grads[0]
m_b = beta1 * m_b + (1 - beta1) * grads[1]
v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])

m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))

w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))
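To see the Adam update run end to end, here is a self-contained sketch on the one-parameter problem $loss=(w+1)^2$ from section 5, written in plain Python. The helper name `adam_minimize`, the step count, and the small `eps` added to the denominator (to guard against division by zero) are assumptions of this sketch, not part of the snippet above.

```python
import math

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a 1-D function with Adam, given its gradient function."""
    m, v = 0.0, 0.0
    for step in range(1, steps + 1):            # step starts at 1, so 1 - beta ** step is never 0
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g          # first-order momentum
        v = beta2 * v + (1 - beta2) * g * g      # second-order momentum
        m_hat = m / (1 - beta1 ** step)          # bias correction
        v_hat = v / (1 - beta2 ** step)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Gradient of (w + 1) ** 2 is 2 * (w + 1); start from w = 5 as in section 5.
w_final = adam_minimize(lambda w: 2.0 * (w + 1.0), w=5.0)
print(w_final)
```

The step counter plays the role of `global_step`. Because it starts at 1, the bias-correction denominators are well defined from the first update.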

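15 Building networks with tf.keras

The six-step method for building a network with the TensorFlow API tf.keras:

import
train, test
model = tf.keras.models.Sequential: describe the network structure
model.compile: configure the training method (optimizer, loss function, metrics)
model.fit: run training, specifying the batch size and the number of epochs
model.summary: print the network structure and parameter statistics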

# tf.keras.models.Sequential([network structure])
# Examples of network layers
# Flatten layer: tf.keras.layers.Flatten()

# Fully connected layer: tf.keras.layers.Dense(number of neurons,
#   activation='activation function',
#   kernel_regularizer=which regularizer
# )
# activation (given as a string) options: relu  softmax  sigmoid  tanh
# kernel_regularizer options: tf.keras.regularizers.l1()  tf.keras.regularizers.l2()

# Convolutional layer: tf.keras.layers.Conv2D(
#   filters = number of kernels,
#   kernel_size = kernel size,  # an integer side length for a square kernel, or (height h, width w)
#   strides = stride,  # an integer if equal in both directions, or (vertical stride h, horizontal stride w); default 1
#   padding = "same" or "valid",  # "same" pads with zeros, "valid" (default) does not
#   activation = "relu" or "sigmoid" or "tanh" or "softmax", etc.,  # omit here if followed by BN
#   input_shape = (height, width, channels)  # input feature dimensions, optional
# )
# e.g.:
#   model = tf.keras.models.Sequential([
#   Conv2D(6, 5, padding='valid', activation='sigmoid'),
#   MaxPool2D(2, 2),
#   Conv2D(6, (5, 5), padding='valid', activation='sigmoid'),
#   MaxPool2D(2, (2,2)),
#   Conv2D(filters=6, kernel_size=(5, 5), padding='valid', activation='sigmoid'),
#   MaxPool2D(pool_size=(2, 2), strides=2),
#   Flatten(),
#   Dense(10, activation='softmax')
# ])

# Pooling layers: tf.keras.layers.MaxPool2D    tf.keras.layers.AveragePooling2D
# tf.keras.layers.MaxPool2D(
#   pool_size=pooling window size,  # an integer side length for a square window, or (height h, width w)
#   strides=pooling stride,  # an integer, or (vertical stride h, horizontal stride w); defaults to pool_size
#   padding='valid' or 'same'  # "same" pads with zeros, "valid" (default) does not
# )

# Dropout
# tf.keras.layers.Dropout(drop probability)

# Convolution block recipe: C B A P D (Conv, BatchNorm, Activation, Pool, Dropout)
# model = tf.keras.models.Sequential([
#   Conv2D(filters=6, kernel_size=(5, 5), padding='same'),
#   BatchNormalization(),
#   Activation('relu'),
#   MaxPool2D(pool_size=(2,2), strides=2, padding='same'),
#   Dropout(0.2)
# ])

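# model.compile(optimizer=optimizer,
#   loss=loss function,
#   metrics=["accuracy metric"]
# )
# optimizer options:
#   'sgd'  or   tf.keras.optimizers.SGD(learning_rate=learning rate, momentum=momentum)
#   'adagrad' or    tf.keras.optimizers.Adagrad(learning_rate=learning rate)
#   'adadelta' or   tf.keras.optimizers.Adadelta(learning_rate=learning rate)
#   'adam' or   tf.keras.optimizers.Adam(learning_rate=learning rate, beta_1=0.9, beta_2=0.999)

# loss options:
# 'mse' or  tf.keras.losses.MeanSquaredError()
# 'sparse_categorical_crossentropy' or  tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
# If the network output has already been turned into a probability distribution (e.g. by softmax), from_logits is False

# metrics options:
# 'accuracy': y_ and y are both numeric labels, e.g. y_=[1] y=[1]
# 'categorical_accuracy': y_ and y are both one-hot vectors (probability distributions), e.g. y_=[0,1,0] y=[0.256,0.695,0.048]
# 'sparse_categorical_accuracy': y_ is a numeric label and y is a one-hot vector (probability distribution), e.g. y_=[1] y=[0.256,0.695,0.048]

# model.fit(training features, training labels,
#   batch_size= , epochs= ,
#   validation_data=(test features, test labels),   # use either this or validation_split, not both
#   validation_split=fraction of the training set to hold out for validation,
#   validation_freq=run validation once every this many epochs
# )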

16 Iris classification with tf.keras

import tensorflow as tf
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)


model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(3, activation='softmax',
                          kernel_regularizer=tf.keras.regularizers.l2())
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=500,
          validation_split=0.2, validation_freq=20)

model.summary()

17 Iris classification with a tf.keras Model subclass

import tensorflow as tf
#######################################
from tensorflow.keras.layers import Dense
from tensorflow.keras import Model
#######################################
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)


class IrisModel(Model):
    def __init__(self):
        super(IrisModel, self).__init__()
        # Define the building blocks of the network
        self.d1 = Dense(3, activation='softmax',
                        kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, x):
        # Call the building blocks to implement the forward pass
        y = self.d1(x)
        return y


model = IrisModel()

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=500,
          validation_split=0.2, validation_freq=20)
model.summary()

18 The MNIST dataset

import tensorflow as tf
from matplotlib import pyplot as plt

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

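# Visualize the first element of the training features
plt.imshow(x_train[0], cmap='gray')  # draw as a grayscale image
plt.show()

# Print the first element of the training features
print("x_train[0]:\n", x_train[0])
# Print the first element of the training labels
print("y_train[0]:\n", y_train[0])

# Print the shape of the whole training feature array
print("x_train.shape:\n", x_train.shape)
# Print the shape of the whole training label array
print("y_train.shape:\n", y_train.shape)
# Print the shape of the whole test feature array
print("x_test.shape:\n", x_test.shape)
# Print the shape of the whole test label array
print("y_test.shape:\n", y_test.shape)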

18.1 Handwritten digit recognition with tf.keras

import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Normalize the features from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),   # flatten each input image into a 1-D array first
    tf.keras.layers.Dense(128, activation='relu'),  # first layer: 128 neurons
    tf.keras.layers.Dense(10, activation='softmax')  # output layer: one probability per digit
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()

18.2 Handwritten digit recognition with a tf.keras Model subclass

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


class MnistModel(Model):
    def __init__(self):
        super(MnistModel, self).__init__()
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        y = self.d2(x)
        return y


model = MnistModel()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()

19 The Fashion-MNIST dataset

19.1 Training a network on Fashion-MNIST with tf.keras

import tensorflow as tf

fashion = tf.keras.datasets.fashion_mnist
(x_train, y_train),(x_test, y_test) = fashion.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()

19.2 Training a network on Fashion-MNIST with a tf.keras Model subclass

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

fashion = tf.keras.datasets.fashion_mnist
(x_train, y_train),(x_test, y_test) = fashion.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


class MnistModel(Model):
    def __init__(self):
        super(MnistModel, self).__init__()
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        y = self.d2(x)
        return y


model = MnistModel()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()

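20 Extending networks built with tf.keras

20.1 Building your own dataset for domain-specific applications

So far the data has come from a ready-made loader:

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

To use your own data, write a function that replaces load_data():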

import os

import numpy as np
from PIL import Image

# Directory with the training images
train_path = './mnist_image_label/mnist_train_jpg_60000/'
# Label file for the training set
train_txt = './mnist_image_label/mnist_train_jpg_60000.txt'
# File that caches the training features
x_train_savepath = './mnist_image_label/mnist_x_train.npy'
# File that caches the training labels
y_train_savepath = './mnist_image_label/mnist_y_train.npy'

# Directory with the test images
test_path = './mnist_image_label/mnist_test_jpg_10000/'
# Label file for the test set
test_txt = './mnist_image_label/mnist_test_jpg_10000.txt'
# File that caches the test features
x_test_savepath = './mnist_image_label/mnist_x_test.npy'
# File that caches the test labels
y_test_savepath = './mnist_image_label/mnist_y_test.npy'

def generateds(path, txt):
    with open(txt, 'r') as f:  # open the label file read-only
        contents = f.readlines()  # read every line
    x, y_ = [], []  # empty lists for features and labels
    for content in contents:  # one line per image
        value = content.split()  # split on whitespace: value[0] is the image file name, value[1] is the label
        img_path = path + value[0]  # build the full image path
        img = Image.open(img_path)  # load the image
        img = np.array(img.convert('L'))  # convert to an 8-bit grayscale np.array
        img = img / 255.  # normalize to [0, 1] (preprocessing)
        x.append(img)  # append the normalized image to x
        y_.append(value[1])  # append the label to y_
        print('loading : ' + content)  # progress message

    x = np.array(x)  # convert to np.array
    y_ = np.array(y_)  # convert to np.array
    y_ = y_.astype(np.int64)  # convert to 64-bit integers
    return x, y_  # return the features x and the labels y_
    
    
if os.path.exists(x_train_savepath) and os.path.exists(y_train_savepath) and os.path.exists(
        x_test_savepath) and os.path.exists(y_test_savepath):
    print('-------------Load Datasets-----------------')
    x_train_save = np.load(x_train_savepath)
    y_train = np.load(y_train_savepath)
    x_test_save = np.load(x_test_savepath)
    y_test = np.load(y_test_savepath)
    x_train = np.reshape(x_train_save, (len(x_train_save), 28, 28))
    x_test = np.reshape(x_test_save, (len(x_test_save), 28, 28))
else:
    print('-------------Generate Datasets-----------------')
    x_train, y_train = generateds(train_path, train_txt)
    x_test, y_test = generateds(test_path, test_txt)

    print('-------------Save Datasets-----------------')
    x_train_save = np.reshape(x_train, (len(x_train), -1))
    x_test_save = np.reshape(x_test, (len(x_test), -1))
    np.save(x_train_savepath, x_train_save)
    np.save(y_train_savepath, y_train)
    np.save(x_test_savepath, x_test_save)
    np.save(y_test_savepath, y_test)

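20.2 Data augmentation: enlarging the dataset to improve generalization

Data augmentation generates modified copies of the training images (shifted, rotated, zoomed and so on) so that the network sees more variation. Training then reads batches from the generator instead of from the raw arrays. Here image_gen_train is assumed to be an already configured tf.keras.preprocessing.image.ImageDataGenerator, whose flow() method yields augmented batches:

model.fit(image_gen_train.flow(x_train, y_train, batch_size=32), epochs=5, validation_data=(x_test, y_test), validation_freq=1)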
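
To make the idea of augmentation concrete without depending on a particular generator class, here is a minimal NumPy sketch of one transformation, a random translation with zero fill. The helpers shift_image and augment_batch are hypothetical names for this illustration and are not part of tf.keras.

```python
import numpy as np


def shift_image(img, dy, dx):
    """Translate a 2-D image by (dy, dx) pixels; vacated pixels are filled with 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment_batch(images, max_shift, rng):
    """Shift every image in the batch by an independent random offset."""
    shifted = np.empty_like(images)
    for k, img in enumerate(images):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted[k] = shift_image(img, int(dy), int(dx))
    return shifted


rng = np.random.default_rng(0)
batch = np.zeros((4, 28, 28))
batch[:, 14, 14] = 1.0  # one bright pixel in the centre of each image
augmented = augment_batch(batch, max_shift=3, rng=rng)
print(augmented.shape, augmented.sum(axis=(1, 2)))
```

Each augmented image still contains exactly one bright pixel, only at a different position, so the label stays valid while the input varies.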

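20.3 Resuming training from checkpoints: saving the best model and loading it back

Pass callbacks=[cp_callback] to model.fit to save checkpoints during training, and load the saved weights before training starts to resume from them.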

checkpoint_save_path = "./checkpoint/mnist.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True)

history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])

20.4 Extracting parameters and writing them to a text file

# model.trainable_variables returns the trainable parameters of the model
print(model.trainable_variables)
file = open('./weights.txt', 'w')
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

20.5 Visualizing acc/loss to watch the model improve

history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])

# Plot the acc and loss curves for the training and validation sets; history.history is a dict
acc = history.history['sparse_categorical_accuracy']
val_acc = history.history['val_sparse_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

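20.6 Application: recognizing what is in a given image

At this stage, real-world input images usually need preprocessing. For example, MNIST digits are white on a black background, so a real-world image must be converted to white-on-black before it is fed to the network. Without this preprocessing the predictions are poor.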

from PIL import Image
import numpy as np
import tensorflow as tf

model_save_path = './checkpoint/mnist.ckpt'

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])

model.load_weights(model_save_path)

preNum = int(input("input the number of test pictures:"))

for i in range(preNum):
    image_path = input("the path of test picture:")
    img = Image.open(image_path)
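    img = img.resize((28, 28), Image.LANCZOS)  # high-quality downsampling filter
    img_arr = np.array(img.convert('L'))

    # Binarize and invert: dark strokes become white (255), light background becomes black (0)
    for row in range(28):
        for col in range(28):
            if img_arr[row][col] < 200:
                img_arr[row][col] = 255
            else:
                img_arr[row][col] = 0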

    img_arr = img_arr / 255.0
    x_predict = img_arr[tf.newaxis, ...]
    result = model.predict(x_predict)

    pred = tf.argmax(result, axis=1)

    print('\n')
    tf.print(pred)
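
The pixel-by-pixel thresholding in the listing above can also be written as a single vectorized NumPy expression. A minimal sketch, assuming the same threshold of 200; the helper name binarize_invert is invented for this example:

```python
import numpy as np


def binarize_invert(img_arr, threshold=200):
    """Pixels darker than the threshold become 255 (stroke), all others become 0 (background)."""
    return np.where(img_arr < threshold, 255, 0).astype(np.uint8)


sample = np.array([[10, 199],
                   [200, 255]], dtype=np.uint8)
print(binarize_invert(sample))
```

The result has the same values as the nested loops, without the 28 x 28 Python-level iterations.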

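21 Convolutional neural networks (CNN)

A convolutional neural network is essentially a feature extractor built from C B A P D blocks: convolution, batch normalization, activation, pooling, dropout.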

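21.1 Receptive field

Two stacked 3x3 kernels have the same receptive field (5) as a single 5x5 kernel, and therefore comparable feature-extraction ability. Which is the better choice?

To decide, compare the cost: after the convolution, how many output pixels are there, and how many multiply-add operations does each output pixel need?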
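
The two questions above can be answered by counting. A minimal sketch, assuming a square single-channel input of width x, stride 1, no padding and no bias (these assumptions are mine, not stated in the notes):

```python
def mult_adds_single_5x5(x):
    """One 5x5 convolution: (x - 4)^2 output pixels, 25 multiply-adds each."""
    out = x - 4
    return out * out * 25


def mult_adds_two_3x3(x):
    """Two stacked 3x3 convolutions: (x - 2)^2 then (x - 4)^2 output pixels, 9 multiply-adds each."""
    first = (x - 2) ** 2 * 9
    second = (x - 4) ** 2 * 9
    return first + second


for x in (8, 10, 32):
    print(x, mult_adds_single_5x5(x), mult_adds_two_3x3(x))
```

Under these assumptions the two costs are equal at x = 10, and for any wider input the two 3x3 kernels need fewer multiply-adds. They also need fewer weights: 2 x 9 = 18 instead of 25.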

21.2 Padding
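
The notes leave this section empty. As a hedged sketch of what the padding argument does to the output size, the helper below follows the usual TensorFlow rule for padding='same' and padding='valid'; the function name conv_output_size is hypothetical.

```python
import math


def conv_output_size(in_size, kernel, stride=1, padding='valid'):
    """Output width of a convolution or pooling layer along one spatial axis."""
    if padding == 'same':
        # zero padding is added so that only the stride shrinks the output
        return math.ceil(in_size / stride)
    if padding == 'valid':
        # no padding: the kernel must fit entirely inside the input
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


print(conv_output_size(32, 5, padding='same'))    # 5x5 kernel, input size preserved
print(conv_output_size(32, 5, padding='valid'))   # 5x5 kernel, border lost
print(conv_output_size(28, 2, stride=2, padding='valid'))
```

With stride 1, 'same' keeps the width and 'valid' loses kernel - 1 pixels.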

21.3 Describing a convolutional layer in TF

21.4 Batch Normalization

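Why is BN needed?

After passing through many layers, the data distribution drifts.

BN pulls the data back to N(0, 1), so that the values entering the activation function fall in its linear region. Small changes in the data then show up more clearly in the activation output, and the activation function discriminates better between inputs.

The scale factor and the offset factor are trainable parameters; they preserve the network's capacity for nonlinear expression.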
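
The training-time computation described above can be sketched in NumPy for a batch of plain feature vectors. This is an illustration, not the Keras layer: the function name is invented, eps=1e-3 is an assumed value, and for convolutional feature maps the statistics would be taken per channel instead.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch, then apply the trainable scale and offset."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # pulled back to roughly N(0, 1)
    return gamma * x_hat + beta              # gamma: scale factor, beta: offset factor


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # drifted data: mean 5, std 3
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))
```

With gamma = 1 and beta = 0 the output has mean close to 0 and standard deviation close to 1 per feature; during training the network is free to learn other values for both.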

21.5 Pooling
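
The notes leave this section empty. As a hedged illustration of what a layer such as MaxPool2D(pool_size=(2, 2), strides=2) computes on one feature map, here is a minimal NumPy sketch; the function name max_pool_2x2 is invented and only the unpadded case is covered.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single 2-D feature map (odd edges are dropped)."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)  # axes: (row block, row in block, col block, col in block)
    return blocks.max(axis=(1, 3))


fm = np.array([[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 10, 13, 14],
               [11, 12, 15, 16]])
print(max_pool_2x2(fm))
```

Each output value is the largest entry of one non-overlapping 2x2 window, so the width and height are halved.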

21.6 Dropout
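
The notes leave this section empty. The sketch below shows the common "inverted dropout" formulation in NumPy: during training a fraction rate of the activations is zeroed and the survivors are scaled up, while at inference the input passes through unchanged. The function name and signature are invented for illustration.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x                       # inference: nothing is dropped
    keep = rng.random(x.shape) >= rate  # True for the units that survive
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((100, 100))
y = dropout(x, rate=0.2, rng=rng)
print(np.unique(y), y.mean())
```

Because of the rescaling, the expected value of each activation is the same in training and inference.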

22 The Cifar10 dataset

23 Example: building a convolutional neural network

import tensorflow as tf
import os
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Conv2D, BatchNormalization, Activation, MaxPool2D, Dropout, Flatten, Dense
from tensorflow.keras import Model

np.set_printoptions(threshold=np.inf)

cifar10 = tf.keras.datasets.cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


class Baseline(Model):
    def __init__(self):
        super(Baseline, self).__init__()
        self.c1 = Conv2D(filters=6, kernel_size=(5, 5), padding='same')  # convolution layer
        self.b1 = BatchNormalization()  # BN layer
        self.a1 = Activation('relu')  # activation layer
        self.p1 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')  # pooling layer
        self.d1 = Dropout(0.2)  # dropout layer

        self.flatten = Flatten()
        self.f1 = Dense(128, activation='relu')
        self.d2 = Dropout(0.2)
        self.f2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.b1(x)
        x = self.a1(x)
        x = self.p1(x)
        x = self.d1(x)

        x = self.flatten(x)
        x = self.f1(x)
        x = self.d2(x)
        y = self.f2(x)
        return y


model = Baseline()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/Baseline.ckpt"
if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True)

history = model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1,
                    callbacks=[cp_callback])
model.summary()

# print(model.trainable_variables)
file = open('./weights.txt', 'w')
for v in model.trainable_variables:
    file.write(str(v.name) + '\n')
    file.write(str(v.shape) + '\n')
    file.write(str(v.numpy()) + '\n')
file.close()

###############################################    show   ###############################################

# Plot the acc and loss curves for the training and validation sets
acc = history.history['sparse_categorical_accuracy']
val_acc = history.history['val_sparse_categorical_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()

24 Classic convolutional neural networks

24.1 LeNet

class LeNet5(Model):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.c1 = Conv2D(filters=6, kernel_size=(5, 5),
                         activation='sigmoid')
        self.p1 = MaxPool2D(pool_size=(2, 2), strides=2)

        self.c2 = Conv2D(filters=16, kernel_size=(5, 5),
                         activation='sigmoid')
        self.p2 = MaxPool2D(pool_size=(2, 2), strides=2)

        self.flatten = Flatten()
        self.f1 = Dense(120, activation='sigmoid')
        self.f2 = Dense(84, activation='sigmoid')
        self.f3 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.p1(x)

        x = self.c2(x)
        x = self.p2(x)

        x = self.flatten(x)
        x = self.f1(x)
        x = self.f2(x)
        y = self.f3(x)
        return y
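
To see what the Flatten layer of LeNet5 receives, the feature-map width can be traced layer by layer. This sketch assumes 32x32 Cifar10 inputs (the dataset used in section 23) and relies on the Keras default padding='valid' for both Conv2D and MaxPool2D; the helper name is invented.

```python
def valid_output_size(in_size, window, stride=1):
    """Output width of an unpadded ('valid') convolution or pooling layer."""
    return (in_size - window) // stride + 1


size = 32  # assumed input: 32x32 Cifar10 images
trace = [("input", size)]
for name, window, stride in [("c1", 5, 1), ("p1", 2, 2), ("c2", 5, 1), ("p2", 2, 2)]:
    size = valid_output_size(size, window, stride)
    trace.append((name, size))

flattened = size * size * 16  # c2 has 16 filters
print(trace)
print(flattened)
```

Under these assumptions the first Dense layer sees 400 features per image.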

24.2 AlexNet

class AlexNet8(Model):
    def __init__(self):
        super(AlexNet8, self).__init__()
        self.c1 = Conv2D(filters=96, kernel_size=(3, 3))
        self.b1 = BatchNormalization()
        self.a1 = Activation('relu')
        self.p1 = MaxPool2D(pool_size=(3, 3), strides=2)

        self.c2 = Conv2D(filters=256, kernel_size=(3, 3))
        self.b2 = BatchNormalization()
        self.a2 = Activation('relu')
        self.p2 = MaxPool2D(pool_size=(3, 3), strides=2)

        self.c3 = Conv2D(filters=384, kernel_size=(3, 3), padding='same',
                         activation='relu')
                         
        self.c4 = Conv2D(filters=384, kernel_size=(3, 3), padding='same',
                         activation='relu')
                         
        self.c5 = Conv2D(filters=256, kernel_size=(3, 3), padding='same',
                         activation='relu')
        self.p3 = MaxPool2D(pool_size=(3, 3), strides=2)

        self.flatten = Flatten()
        self.f1 = Dense(2048, activation='relu')
        self.d1 = Dropout(0.5)
        self.f2 = Dense(2048, activation='relu')
        self.d2 = Dropout(0.5)
        self.f3 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.b1(x)
        x = self.a1(x)
        x = self.p1(x)

        x = self.c2(x)
        x = self.b2(x)
        x = self.a2(x)
        x = self.p2(x)

        x = self.c3(x)

        x = self.c4(x)

        x = self.c5(x)
        x = self.p3(x)

        x = self.flatten(x)
        x = self.f1(x)
        x = self.d1(x)
        x = self.f2(x)
        x = self.d2(x)
        y = self.f3(x)
        return y

24.3 VGGNet

class VGG16(Model):
    def __init__(self):
        super(VGG16, self).__init__()
        # Block 1: two 64-filter convolutions (each followed by BN and ReLU), then pooling and dropout
        self.c1 = Conv2D(filters=64, kernel_size=(3, 3), padding='same')
        self.b1 = BatchNormalization()
        self.a1 = Activation('relu')
        self.c2 = Conv2D(filters=64, kernel_size=(3, 3), padding='same')
        self.b2 = BatchNormalization()
        self.a2 = Activation('relu')
        self.p1 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d1 = Dropout(0.2)

        # Block 2: two 128-filter convolutions
        self.c3 = Conv2D(filters=128, kernel_size=(3, 3), padding='same')
        self.b3 = BatchNormalization()
        self.a3 = Activation('relu')
        self.c4 = Conv2D(filters=128, kernel_size=(3, 3), padding='same')
        self.b4 = BatchNormalization()
        self.a4 = Activation('relu')
        self.p2 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d2 = Dropout(0.2)

        # Block 3: three 256-filter convolutions
        self.c5 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b5 = BatchNormalization()
        self.a5 = Activation('relu')
        self.c6 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b6 = BatchNormalization()
        self.a6 = Activation('relu')
        self.c7 = Conv2D(filters=256, kernel_size=(3, 3), padding='same')
        self.b7 = BatchNormalization()
        self.a7 = Activation('relu')
        self.p3 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d3 = Dropout(0.2)

        # Block 4: three 512-filter convolutions
        self.c8 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b8 = BatchNormalization()
        self.a8 = Activation('relu')
        self.c9 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b9 = BatchNormalization()
        self.a9 = Activation('relu')
        self.c10 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b10 = BatchNormalization()
        self.a10 = Activation('relu')
        self.p4 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d4 = Dropout(0.2)

        # Block 5: three 512-filter convolutions
        self.c11 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b11 = BatchNormalization()
        self.a11 = Activation('relu')
        self.c12 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b12 = BatchNormalization()
        self.a12 = Activation('relu')
        self.c13 = Conv2D(filters=512, kernel_size=(3, 3), padding='same')
        self.b13 = BatchNormalization()
        self.a13 = Activation('relu')
        self.p5 = MaxPool2D(pool_size=(2, 2), strides=2, padding='same')
        self.d5 = Dropout(0.2)

        self.flatten = Flatten()
        self.f1 = Dense(512, activation='relu')
        self.d6 = Dropout(0.2)
        self.f2 = Dense(512, activation='relu')
        self.d7 = Dropout(0.2)
        self.f3 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.b1(x)
        x = self.a1(x)
        x = self.c2(x)
        x = self.b2(x)
        x = self.a2(x)
        x = self.p1(x)
        x = self.d1(x)

        x = self.c3(x)
        x = self.b3(x)
        x = self.a3(x)
        x = self.c4(x)
        x = self.b4(x)
        x = self.a4(x)
        x = self.p2(x)
        x = self.d2(x)

        x = self.c5(x)
        x = self.b5(x)
        x = self.a5(x)
        x = self.c6(x)
        x = self.b6(x)
        x = self.a6(x)
        x = self.c7(x)
        x = self.b7(x)
        x = self.a7(x)
        x = self.p3(x)
        x = self.d3(x)

        x = self.c8(x)
        x = self.b8(x)
        x = self.a8(x)
        x = self.c9(x)
        x = self.b9(x)
        x = self.a9(x)
        x = self.c10(x)
        x = self.b10(x)
        x = self.a10(x)
        x = self.p4(x)
        x = self.d4(x)

        x = self.c11(x)
        x = self.b11(x)
        x = self.a11(x)
        x = self.c12(x)
        x = self.b12(x)
        x = self.a12(x)
        x = self.c13(x)
        x = self.b13(x)
        x = self.a13(x)
        x = self.p5(x)
        x = self.d5(x)

        x = self.flatten(x)
        x = self.f1(x)
        x = self.d6(x)
        x = self.f2(x)
        x = self.d7(x)
        y = self.f3(x)
        return y

24.4 InceptionNet

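Introduces the Inception block.
Uses convolution kernels of different sizes within the same layer, which improves the model's perceptual ability.
Uses batch normalization, which mitigates vanishing gradients.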

class ConvBNRelu(Model):
    def __init__(self, ch, kernelsz=3, strides=1, padding='same'):
        super(ConvBNRelu, self).__init__()
        self.model = tf.keras.models.Sequential([
            Conv2D(ch, kernelsz, strides=strides, padding=padding),
            BatchNormalization(),
            Activation('relu')
        ])

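    def call(self, x, training=None):
        # training=True: BN normalizes with the mean and variance of the current batch.
        # training=False: BN normalizes with the moving averages accumulated during training,
        # which is the mode inference should use. Passing the flag through lets Keras choose
        # the right mode for fit() and for predict().
        x = self.model(x, training=training)
        return x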


class InceptionBlk(Model):
    def __init__(self, ch, strides=1):
        super(InceptionBlk, self).__init__()
        self.ch = ch
        self.strides = strides
        self.c1 = ConvBNRelu(ch, kernelsz=1, strides=strides)
        self.c2_1 = ConvBNRelu(ch, kernelsz=1, strides=strides)
        self.c2_2 = ConvBNRelu(ch, kernelsz=3, strides=1)
        self.c3_1 = ConvBNRelu(ch, kernelsz=1, strides=strides)
        self.c3_2 = ConvBNRelu(ch, kernelsz=5, strides=1)
        self.p4_1 = MaxPool2D(3, strides=1, padding='same')
        self.c4_2 = ConvBNRelu(ch, kernelsz=1, strides=strides)

    def call(self, x):
        x1 = self.c1(x)
        x2_1 = self.c2_1(x)
        x2_2 = self.c2_2(x2_1)
        x3_1 = self.c3_1(x)
        x3_2 = self.c3_2(x3_1)
        x4_1 = self.p4_1(x)
        x4_2 = self.c4_2(x4_1)
        # concat along axis=channel
        x = tf.concat([x1, x2_2, x3_2, x4_2], axis=3)
        return x


class Inception10(Model):
    def __init__(self, num_blocks, num_classes, init_ch=16, **kwargs):
        super(Inception10, self).__init__(**kwargs)
        self.in_channels = init_ch
        self.out_channels = init_ch
        self.num_blocks = num_blocks
        self.init_ch = init_ch
        self.c1 = ConvBNRelu(init_ch)
        self.blocks = tf.keras.models.Sequential()
        for block_id in range(num_blocks):
            for layer_id in range(2):
                if layer_id == 0:
                    block = InceptionBlk(self.out_channels, strides=2)
                else:
                    block = InceptionBlk(self.out_channels, strides=1)
                self.blocks.add(block)
            # double out_channels after each block
            self.out_channels *= 2
        self.p1 = tf.keras.layers.GlobalAveragePooling2D()
        self.f1 = Dense(num_classes, activation='softmax')

    def call(self, x):
        x = self.c1(x)
        x = self.blocks(x)
        x = self.p1(x)
        y = self.f1(x)
        return y

24.5 ResNet

Residual skip connections between layers bring earlier information forward and mitigate vanishing gradients.

class ResnetBlock(Model):

    def __init__(self, filters, strides=1, residual_path=False):
        super(ResnetBlock, self).__init__()
        self.filters = filters
        self.strides = strides
        self.residual_path = residual_path

        self.c1 = Conv2D(filters, (3, 3), strides=strides, padding='same', use_bias=False)
        self.b1 = BatchNormalization()
        self.a1 = Activation('relu')

        self.c2 = Conv2D(filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b2 = BatchNormalization()

        # When residual_path is True, downsample the input with a 1x1 convolution so that x has the same shape as F(x) and the two can be added
        if residual_path:
            self.down_c1 = Conv2D(filters, (1, 1), strides=strides, padding='same', use_bias=False)
            self.down_b1 = BatchNormalization()
        
        self.a2 = Activation('relu')

    def call(self, inputs):
        residual = inputs  # the shortcut starts as the input itself, i.e. residual = x
        # Pass the input through the convolution, BN and activation layers to compute F(x)
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)

        x = self.c2(x)
        y = self.b2(x)

        if self.residual_path:
            residual = self.down_c1(inputs)
            residual = self.down_b1(residual)

        out = self.a2(y + residual)  # the output is the sum F(x)+x or F(x)+Wx, passed through the activation
        return out


class ResNet18(Model):

    def __init__(self, block_list, initial_filters=64):  # block_list[i]: number of ResnetBlocks in stage i
        super(ResNet18, self).__init__()
        self.num_blocks = len(block_list)  # number of stages
        self.block_list = block_list
        self.out_filters = initial_filters
        self.c1 = Conv2D(self.out_filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b1 = BatchNormalization()
        self.a1 = Activation('relu')
        self.blocks = tf.keras.models.Sequential()
        # Build the ResNet structure
        for block_id in range(len(block_list)):  # index of the stage
            for layer_id in range(block_list[block_id]):  # index of the ResnetBlock within the stage

                if block_id != 0 and layer_id == 0:  # downsample at the first block of every stage except the first
                    block = ResnetBlock(self.out_filters, strides=2, residual_path=True)
                else:
                    block = ResnetBlock(self.out_filters, residual_path=False)
                self.blocks.add(block)  # append the block to the network
            self.out_filters *= 2  # each stage uses twice as many filters as the previous one
        self.p1 = tf.keras.layers.GlobalAveragePooling2D()
        self.f1 = tf.keras.layers.Dense(10, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, inputs):
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)
        x = self.blocks(x)
        x = self.p1(x)
        y = self.f1(x)
        return y
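
The class name suggests 18 weighted layers. The sketch below counts them for an assumed configuration block_list=[2, 2, 2, 2], which is not given in the notes; it counts the stem convolution, the two 3x3 convolutions on the main path of each ResnetBlock and the final Dense layer, and leaves out the 1x1 shortcut convolutions. Both helper names are invented.

```python
def count_weighted_layers(block_list):
    """Stem conv + two 3x3 convs per ResnetBlock + final Dense layer."""
    convs_in_blocks = 2 * sum(block_list)
    return 1 + convs_in_blocks + 1


def stage_filters(block_list, initial_filters=64):
    """Filter count used in each stage: it doubles from one stage to the next."""
    return [initial_filters * 2 ** stage for stage in range(len(block_list))]


block_list = [2, 2, 2, 2]  # assumed configuration, not taken from the notes
print(count_weighted_layers(block_list))
print(stage_filters(block_list))
```

With this configuration the model would be built as ResNet18([2, 2, 2, 2]).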