PyTorch Notes: An Introduction to the torch.nn Module

2023-12-21 00:12 | Source: compiled from the web

Contents: torch.nn, before, nn.functional, nn.Module & nn.Parameter, nn.Linear, torch.optim, DataLoader, Add Validation, nn.Sequential, Using GPU


torch.nn

    import torch.nn as nn

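These notes are based on a translation of "What is torch.nn really?". They give a brief introduction to each module of the PyTorch framework and can serve as introductory PyTorch notes. They assume some familiarity with the basics of neural networks (or experience implementing gradient descent for machine learning).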


before

PyTorch operates on torch.Tensor objects, so the data has to be converted first.

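    import torch

    # x_train, y_train, x_valid, y_valid are assumed to be loaded already (e.g. as NumPy arrays)
    x_train, y_train, x_valid, y_valid = map(
        torch.tensor, (x_train, y_train, x_valid, y_valid)
    )
    print(x_train.shape)
    print(x_train.min(), x_train.max())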

Note: map(function, iterable, …) applies function to every item of iterable and, in Python 3, returns a lazy iterator.
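As a small illustration of this conversion step, the sketch below uses made-up NumPy arrays in place of the real dataset. The shapes, values and the names x_np and y_np are assumptions for illustration, not the data used in these notes. It shows that map is lazy and that tuple unpacking consumes it.

```python
import numpy as np
import torch

# Hypothetical stand-in data: 8 flattened 28x28 "images" and integer labels.
x_np = np.random.rand(8, 784).astype(np.float32)
y_np = (np.arange(8) % 10).astype(np.int64)

converted = map(torch.tensor, (x_np, y_np))
# In Python 3, map returns a lazy iterator rather than a list.
print(type(converted).__name__)

# Tuple unpacking consumes the iterator and yields the tensors.
x_train, y_train = converted
print(x_train.shape, x_train.dtype)
print(y_train.shape, y_train.dtype)
```

torch.tensor copies the array and keeps its dtype, so float32 inputs stay float32 and int64 labels stay int64.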


nn.functional

torch.nn.functional, conventionally imported as F, contains all the functions of the torch.nn library, including a large number of loss and activation functions.

    import torch.nn.functional as F

    loss_func = F.cross_entropy
    loss = loss_func(model(x), y)
    loss.backward()

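Here loss.backward() computes the gradients of the loss with respect to the model parameters (weights and bias) and accumulates them in each parameter's .grad attribute. It does not change the parameter values themselves.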
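A minimal sketch of what backward() does, assuming a hypothetical linear model with hand-made weights and bias tensors and random data (none of these values come from the notes). It shows that gradients appear only after backward() and that a second call adds to them.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Hypothetical tiny linear model: 4 input features, 3 classes.
weights = torch.randn(4, 3, requires_grad=True)
bias = torch.zeros(3, requires_grad=True)

x = torch.randn(5, 4)
y = torch.tensor([0, 2, 1, 1, 0])

loss = F.cross_entropy(x @ weights + bias, y)
grad_before = weights.grad  # None: nothing has been computed yet

loss.backward()
# backward() fills .grad; the parameter values are left unchanged.
print(weights.grad.shape, bias.grad.shape)

# A second backward pass adds to the gradients already stored.
first = weights.grad.clone()
F.cross_entropy(x @ weights + bias, y).backward()
accumulated = torch.allclose(weights.grad, 2 * first)
print(accumulated)
```

This accumulation is the reason gradients have to be cleared between batches.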

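What is the difference between nn and nn.functional in PyTorch? nn.functional.xxx is the function interface, while nn.Xxx is a class wrapper around nn.functional.xxx, and every nn.Xxx inherits from a common ancestor, nn.Module. Besides the functionality of nn.functional.xxx, nn.Xxx therefore also carries the attributes and methods of nn.Module, e.g. train(), eval(), load_state_dict() and state_dict().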


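The two are called differently. nn.Xxx: instantiate the class first, then call the instance like a function and pass in the input data.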

    inputs = torch.rand(64, 3, 28, 28)
    conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
    out = conv(inputs)

nn.functional.xxx: pass in the input data together with weight, bias and any other parameters.

    weight = torch.rand(64, 3, 3, 3)
    bias = torch.rand(64)
    out = nn.functional.conv2d(inputs, weight, bias, padding=1)

Can it be combined with nn.Sequential?

nn.Xxx inherits from nn.Module, so it combines well with nn.Sequential.

    fm_layer = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(num_features=64),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
        nn.Dropout(0.2)
    )

nn.functional.xxx, by contrast, cannot be placed in nn.Sequential directly.
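The restriction can be seen in a short experiment. This is a sketch with hypothetical layer sizes; it relies on nn.Sequential registering each argument as a submodule, which, as far as I know, raises a TypeError for anything that is not an nn.Module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A bare function is not an nn.Module, so nn.Sequential rejects it.
try:
    nn.Sequential(nn.Linear(4, 4), F.relu)
    rejected = False
except TypeError as err:
    rejected = True
    print("rejected:", err)

# The class wrapper nn.ReLU can be used as a layer.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU())
out = model(torch.randn(2, 4))
print(out.shape, bool((out >= 0).all()))
```

A function can still take part in nn.Sequential if it is wrapped in a small nn.Module of its own.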

Do weight, bias and other parameters have to be defined and managed by hand?

With nn.Xxx there is no need to define or manage the weights yourself:

    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()
            # kernel_size=5 so that a 1x28x28 input ends up as 32x4x4 before the linear layer
            self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, padding=0)
            self.relu1 = nn.ReLU()
            self.maxpool1 = nn.MaxPool2d(kernel_size=2)

            self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding=0)
            self.relu2 = nn.ReLU()
            self.maxpool2 = nn.MaxPool2d(kernel_size=2)

            self.linear1 = nn.Linear(4 * 4 * 32, 10)

        def forward(self, x):
            # x is expected to have shape (N, 1, 28, 28)
            out = self.maxpool1(self.relu1(self.cnn1(x)))
            out = self.maxpool2(self.relu2(self.cnn2(out)))
            # flatten to (N, 4 * 4 * 32) before the linear layer
            out = self.linear1(out.view(x.size(0), -1))
            return out

With nn.functional.xxx you have to define the weights yourself and pass them in manually on every call, which makes the code harder to reuse:

    class CNN(nn.Module):
        """CNN built from nn.functional calls with hand-managed parameters."""
        def __init__(self):
            super(CNN, self).__init__()
            self.cnn1_weight = nn.Parameter(torch.rand(16, 1, 5, 5))
            self.bias1_weight = nn.Parameter(torch.rand(16))

            self.cnn2_weight = nn.Parameter(torch.rand(32, 16, 5, 5))
            self.bias2_weight = nn.Parameter(torch.rand(32))

            # F.linear expects a weight of shape (out_features, in_features)
            self.linear1_weight = nn.Parameter(torch.rand(10, 4 * 4 * 32))
            self.bias3_weight = nn.Parameter(torch.rand(10))

        def forward(self, x):
            # x is expected to have shape (N, 1, 28, 28)
            out = F.conv2d(x, self.cnn1_weight, self.bias1_weight)
            out = F.relu(out)
            out = F.max_pool2d(out, kernel_size=2)

            out = F.conv2d(out, self.cnn2_weight, self.bias2_weight)
            out = F.relu(out)
            out = F.max_pool2d(out, kernel_size=2)

            # flatten to (N, 4 * 4 * 32) before the linear layer
            out = out.view(x.size(0), -1)
            out = F.linear(out, self.linear1_weight, self.bias3_weight)
            return out
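The relationship between the class style and the functional style can be checked on a single layer. The following is a minimal sketch with a small hypothetical input: the functional call reuses the parameters that nn.Conv2d created, so both paths should produce the same output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
inputs = torch.rand(2, 3, 8, 8)  # small hypothetical batch

# Class style: the layer creates and stores its own weight and bias.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
out_module = conv(inputs)

# Functional style: the same parameters are passed in explicitly.
out_functional = F.conv2d(inputs, conv.weight, conv.bias, padding=1)

same = torch.allclose(out_module, out_functional)
param_names = [name for name, _ in conv.named_parameters()]
print(same, param_names)
```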

Both definitions produce a CNN with the same functionality. PyTorch's official recommendation is:

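- For layers with learnable parameters (e.g. conv2d, linear, batch_norm), use nn.Xxx.
- For operations without learnable parameters (e.g. maxpool, loss functions, activation functions), choose nn.functional.xxx or nn.Xxx according to personal preference.
- For dropout, nn.Xxx is strongly recommended, because dropout is normally applied only during training and not during evaluation. When dropout is defined with nn.Xxx, calling model.eval() switches off every dropout layer in the model. When it is applied through nn.functional.dropout, calling model.eval() does not switch it off; the training flag has to be passed to F.dropout explicitly.

nn.Module & nn.Parameter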

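Subclass nn.Module to build a class that holds the weights and bias and implements the forward pass (the forward method). nn.Module also provides many attributes and methods, e.g. .parameters() and .zero_grad().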
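A minimal sketch of such a class, assuming flattened 28x28 inputs and 10 classes. The class name MnistLogistic and the initialisation are illustrative choices, not something these notes prescribe.

```python
import math
import torch
import torch.nn as nn

class MnistLogistic(nn.Module):
    """Logistic regression on flattened 28x28 inputs (hypothetical example)."""

    def __init__(self):
        super().__init__()
        # nn.Parameter registers the tensors so that .parameters() finds them.
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

model = MnistLogistic()
names = [(name, tuple(p.shape)) for name, p in model.named_parameters()]
print(names)

loss = model(torch.randn(4, 784)).sum()
loss.backward()
model.zero_grad()  # clears the gradient of every registered parameter
cleared = all(p.grad is None or not p.grad.any() for p in model.parameters())
print(cleared)
```

Replacing the two hand-written parameters with a single nn.Linear(784, 10) layer gives a layer that creates and registers its own weight and bias.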

nn.Linear

torch.optim

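torch.optim provides a range of optimization algorithms. The optimizer's step() method performs the parameter update, so the parameters no longer have to be updated one by one by hand.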

    opt.step()
    opt.zero_grad()

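opt.zero_grad() clears the gradients of all parameters. It must be called before the gradients of the next batch are computed, because backward() otherwise adds to the existing values.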
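The sketch below compares opt.step() with the manual update it replaces. The get_model helper, the learning rate and the layer sizes are assumptions made for illustration; the notes call get_model() later without showing its definition.

```python
import torch
from torch import nn, optim

lr = 0.1  # hypothetical learning rate

def get_model():
    # Assumed helper: a single linear layer plus a plain SGD optimizer.
    model = nn.Linear(784, 10)
    return model, optim.SGD(model.parameters(), lr=lr)

torch.manual_seed(0)
model, opt = get_model()
xb, yb = torch.randn(16, 784), torch.randint(0, 10, (16,))

loss = nn.functional.cross_entropy(model(xb), yb)
loss.backward()

# For plain SGD, step() applies p <- p - lr * p.grad to every parameter.
expected = [p.detach() - lr * p.grad for p in model.parameters()]
opt.step()
opt.zero_grad()

matches = all(torch.allclose(p, e) for p, e in zip(model.parameters(), expected))
print(matches)
```

Other optimizers in torch.optim keep the same step() and zero_grad() interface but use different update rules.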


DataLoader

TensorDataset is a Dataset that wraps tensors.

    from torch.utils.data import TensorDataset

    train_ds = TensorDataset(x_train, y_train)

DataLoader manages the batches and makes them easy to iterate over.

    from torch.utils.data import DataLoader

    train_dl = DataLoader(train_ds, batch_size=32)
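To make the batching concrete, here is a small sketch with random stand-in data (100 hypothetical samples, not the dataset from these notes). It shows what indexing a TensorDataset returns and how a DataLoader splits the samples into batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples with 784 features each.
x_train = torch.randn(100, 784)
y_train = torch.randint(0, 10, (100,))

train_ds = TensorDataset(x_train, y_train)
xb0, yb0 = train_ds[0]  # indexing returns one (x, y) pair
print(len(train_ds), xb0.shape)

train_dl = DataLoader(train_ds, batch_size=32)
# Without drop_last, the final batch holds the remaining samples.
batch_sizes = [len(xb) for xb, yb in train_dl]
print(batch_sizes)
```

Since 100 = 3 * 32 + 4, the last batch holds only 4 samples. Batches of unequal size matter when per-batch losses are averaged.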


    model, opt = get_model()

    for epoch in range(epochs):
        for xb, yb in train_dl:
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            opt.step()
            opt.zero_grad()

    print(loss_func(model(xb), yb))

Add Validation

During training, compute and print the validation loss at the end of every epoch.

    model, opt = get_model()

    for epoch in range(epochs):
        # before training: switch to training mode
        model.train()
        for xb, yb in train_dl:
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            opt.step()
            opt.zero_grad()

        # after training, before validation: switch to evaluation mode so that
        # nn.BatchNorm2d and nn.Dropout behave appropriately (dropout is turned off)
        model.eval()
        with torch.no_grad():
            valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

        print(epoch, valid_loss / len(valid_dl))
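The model.train() and model.eval() calls in the loop above matter for layers such as dropout, and they explain the dropout recommendation given earlier in these notes. The sketch below uses a hypothetical Net class with made-up sizes to compare nn.Dropout with F.dropout, with and without the training flag.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Hypothetical module comparing three ways of applying dropout."""

    def __init__(self):
        super().__init__()
        self.drop = nn.Dropout(p=0.5)

    def forward(self, x):
        a = self.drop(x)                                 # follows train()/eval()
        b = F.dropout(x, p=0.5)                          # training=True by default
        c = F.dropout(x, p=0.5, training=self.training)  # follows train()/eval()
        return a, b, c

torch.manual_seed(0)
x = torch.ones(1000)
model = Net()

model.eval()
with torch.no_grad():
    a, b, c = model(x)

# In eval mode, a and c pass the input through unchanged; b is still dropped.
print(torch.equal(a, x), torch.equal(c, x), torch.equal(b, x))
```

Passing training=self.training is what ties the functional call to the module's mode.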

To simplify the code and improve readability, the logic can be wrapped in fit() and get_data() functions.

    import numpy as np


    def get_data(train_ds, valid_ds, bs):
        return (
            DataLoader(train_ds, batch_size=bs, shuffle=True),
            # no gradients are stored during validation, so a larger batch fits
            DataLoader(valid_ds, batch_size=bs * 2)
        )


    def loss_batch(model, loss_func, xb, yb, opt=None):
        loss = loss_func(model(xb), yb)

        # only update the parameters when an optimizer is given (training)
        if opt is not None:
            loss.backward()
            opt.step()
            opt.zero_grad()

        return loss.item(), len(xb)


    def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
        for epoch in range(epochs):
            model.train()
            # iterate over the mini-batches of the training set
            for xb, yb in train_dl:
                loss_batch(model, loss_func, xb, yb, opt)

            model.eval()
            with torch.no_grad():
                losses, nums = zip(*[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl])
            # average of the batch losses, weighted by batch size
            val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

            print(epoch, val_loss)
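The validation loss in fit() is weighted by batch size because the last batch may be smaller than the others. The sketch below uses made-up per-batch numbers to show how a plain mean of batch means differs from the size-weighted mean.

```python
# Hypothetical per-batch results: (mean loss of the batch, batch size).
batch_results = [(0.50, 64), (0.40, 64), (1.00, 8)]

losses, nums = zip(*batch_results)

# Plain mean of the batch means: the small last batch counts as much as a full one.
naive = sum(losses) / len(losses)

# Size-weighted mean, as in fit(): equal to the mean over all individual samples.
weighted = sum(loss * n for loss, n in zip(losses, nums)) / sum(nums)

print(round(naive, 4), round(weighted, 4))
```

With these numbers the small, high-loss batch pulls the plain mean upward, while the weighted mean reflects that it contributes only 8 of the 136 samples.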


    train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
    model, opt = get_model()
    fit(epochs, model, loss_func, opt, train_dl, valid_dl)

nn.Sequential

This is comparable to the Sequential model in Keras.

    model = nn.Sequential(
        Lambda(preprocess),
        nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.AvgPool2d(4),
        Lambda(lambda x: x.view(x.size(0), -1))
    )
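The spatial sizes flowing through this model can be worked out with the usual convolution output-size formula, floor((n + 2p - k) / s) + 1. The helper conv_out below is a hypothetical name introduced for this sketch, and the 28x28 input size is an assumption based on the notes' preprocess function.

```python
def conv_out(size, kernel_size, stride, padding):
    # Output length of a convolution along one spatial dimension.
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # inputs are viewed as 1 x 28 x 28 images
sizes = [size]
for _ in range(3):  # three Conv2d layers with kernel 3, stride 2, padding 1
    size = conv_out(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

pooled = size // 4  # nn.AvgPool2d(4) applied to a 4 x 4 feature map
print(sizes, pooled)  # [28, 14, 7, 4] 1
```

After pooling, each sample is a 10 x 1 x 1 tensor, and the final view step flattens it into 10 class scores.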

PyTorch does not provide a view layer, so one has to be built by hand: this is the Lambda used in the nn.Sequential model.

    class Lambda(nn.Module):
        def __init__(self, func):
            super(Lambda, self).__init__()
            self.func = func

        def forward(self, x):
            return self.func(x)


    def preprocess(x):
        return x.view(-1, 1, 28, 28)

Using GPU

A model trained on a GPU is loaded differently from one trained on a CPU, so the loading parameters have to be set accordingly.
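The notes do not say which parameter is meant. One likely candidate is the map_location argument of torch.load, which remaps stored tensors to a chosen device. The sketch below rests on that assumption and uses a hypothetical model, with an in-memory buffer standing in for a checkpoint file.

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for a trained model

# Save only the parameters; the buffer stands in for a file on disk.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps the stored tensors, e.g. GPU-saved weights onto the CPU.
state = torch.load(buffer, map_location=torch.device("cpu"))

restored = nn.Linear(4, 2)
restored.load_state_dict(state)

# Afterwards the model can be moved to whatever device is available.
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
restored.to(dev)
print(all(p.device.type == dev.type for p in restored.parameters()))
```

Loading onto the CPU first and moving the model afterwards lets the same checkpoint work on machines with and without a GPU.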

First, check whether a GPU is available.


Then select the device to use (the GPU if one is available, otherwise the CPU):

    dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

Move each batch of data to the GPU with .to(torch.device("cuda")) or .cuda(). Both return a new tensor, so the result has to be assigned back:

    # xb = xb.cuda()
    # yb = yb.cuda()

Finally, the model itself has to be moved to the GPU:

    # model.cuda()
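Put together, a device-agnostic version of these steps might look like the following sketch. The layer sizes and random data are hypothetical, and .to(dev) is used in place of .cuda() so that the code also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model = nn.Linear(784, 10)
model.to(dev)  # nn.Module.to moves the parameters of the module in place

xb = torch.randn(32, 784)
yb = torch.randint(0, 10, (32,))
# Tensor.to returns a tensor on the target device, so the result is reassigned.
xb, yb = xb.to(dev), yb.to(dev)

pred = model(xb)
print(pred.device.type == dev.type, pred.shape)
```

Model and data must be on the same device; otherwise the forward pass raises a runtime error.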






