[Deep Learning]

2023-08-21 11:19 | Source: compiled from the web

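Preface

This article has two parts:

Part 1 gives a brief introduction to how VGG works.
Part 2 shows in detail how to implement VGG in PyTorch and train it on your own dataset for image classification.

Readers who only want the code can skip straight to Part 2.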

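Part 1: A brief introduction to VGG

1. What VGG set out to show

The VGG paper was published in 2014. Its main contribution was to show that increasing the depth of a network can, up to a point, improve its final performance. The two most commonly used configurations are VGG16 and VGG19. They do not differ in any essential way, only in depth.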

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition (Simonyan and Zisserman, 2014)

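2. Main improvements in VGG

The classic network of the previous generation was AlexNet. VGG's biggest change is to replace the larger kernels of earlier networks (11x11 and 5x5 in AlexNet, 7x7 in ZFNet) with several consecutive 3x3 kernels. For a given receptive field, a stack of small kernels is preferable to one large kernel: the extra nonlinear layers make the network deeper, so it can learn more complex patterns, and the stack needs fewer parameters.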

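Put simply, VGG uses three 3x3 kernels in place of one 7x7 kernel, and two 3x3 kernels in place of one 5x5 kernel. The aim is to make the network deeper while keeping the receptive field the same, which improves accuracy to some extent.

For example, three stacked 3x3 convolutions with stride 1 have a 7x7 receptive field, so they cover the same input region as a single 7x7 convolution. The stack has 3 x (9 x C^2) = 27 x C^2 weights, whereas a single 7x7 convolution has 49 x C^2, where C is the number of input and output channels. The parameter count is clearly lower, and 3x3 kernels also help preserve image properties.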
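The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper names `stacked_receptive_field` and `conv_weights` are made up for illustration, biases are ignored as in the text, and C = 64 is an arbitrary example value.

```python
def stacked_receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field of `num_layers` stacked convolutions with the same kernel and stride."""
    field, jump = 1, 1
    for _ in range(num_layers):
        field += (kernel - 1) * jump  # each layer widens the field by (kernel - 1) input steps
        jump *= stride                # distance between neighbouring outputs, in input pixels
    return field


def conv_weights(kernel, channels, num_layers=1):
    """Weights (no biases) of stacked kernel x kernel convolutions with C input and C output channels."""
    return num_layers * kernel * kernel * channels * channels


C = 64  # example channel count, chosen only for illustration

print(stacked_receptive_field(2))        # 5
print(stacked_receptive_field(3))        # 7
print(conv_weights(3, C, num_layers=3))  # 110592 (27 * C^2)
print(conv_weights(7, C))                # 200704 (49 * C^2)
```

With the same 7x7 receptive field, the stack of three 3x3 layers needs 27/49, or roughly 55%, of the weights of the single large kernel.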

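3. How two 3x3 kernels replace one 5x5 kernel

[Figure: a 5x5 feature map processed by one 5x5 kernel and by two stacked 3x3 kernels]

As the figure shows, for the 5x5 feature map at the bottom:

Convolving with one 5x5 kernel yields a single feature point.
Convolving with two 3x3 kernels in succession also yields a single feature point.

Both have the same 5x5 receptive field, but the two 3x3 kernels use fewer parameters (2 x 9 = 18 weights instead of 25 for a single channel). The stack also contains several nonlinear activation layers instead of one, which increases its discriminative power.

In the same way, three 3x3 kernels replace one 7x7 kernel.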
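To see the shapes concretely, here is a small pure-Python sketch that runs a 5x5 patch through one 5x5 kernel and through two 3x3 kernels. The functions `conv2d_valid` and `relu` are hypothetical helpers written for this example, and the averaging kernels are arbitrary; the point is only the sizes, not the values.

```python
def conv2d_valid(image, kernel):
    """Naive 2D convolution on nested lists: no padding, stride 1."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def relu(feature_map):
    """Element-wise nonlinearity applied between the two small convolutions."""
    return [[max(0.0, v) for v in row] for row in feature_map]


patch = [[float(r * 5 + c) for c in range(5)] for r in range(5)]  # 5x5 input
k3 = [[1.0 / 9] * 3 for _ in range(3)]                            # arbitrary 3x3 kernel
k5 = [[1.0 / 25] * 5 for _ in range(5)]                           # arbitrary 5x5 kernel

one_step = conv2d_valid(patch, k5)       # 5x5 -> 1x1
hidden = relu(conv2d_valid(patch, k3))   # 5x5 -> 3x3, followed by a nonlinearity
two_steps = conv2d_valid(hidden, k3)     # 3x3 -> 1x1

print(len(one_step), len(one_step[0]))    # 1 1
print(len(hidden), len(hidden[0]))        # 3 3
print(len(two_steps), len(two_steps[0]))  # 1 1
```

Both paths reduce the 5x5 patch to one feature point, but the second path passes through an extra nonlinearity on the way.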

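4. VGG network structure

[Figure: VGG network configurations]

VGG16 has 16 weight layers and VGG19 has 19. All VGG variants share exactly the same three fully connected layers at the end, and all consist of five groups of convolutional layers, with a MaxPool after each group. What differs is the number of convolutional layers stacked inside the five groups, which grows from one variant to the next.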
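The structure described above can be written down as a short configuration list. The sketch below counts the weight layers and parameters of VGG16 and VGG19; the names `VGG_CONFIGS`, `FC_SIZES`, `conv_out` and `summarize` are made up for this example, and the layout assumes a 224 x 224 x 3 input and 1000 output classes.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max-pool with stride 2.
VGG_CONFIGS = {
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
FC_SIZES = [4096, 4096, 1000]  # the three fully connected layers shared by all variants


def conv_out(size, kernel, stride, pad):
    """Output size = (input - kernel + 2 * pad) / stride + 1."""
    return (size - kernel + 2 * pad) // stride + 1


def summarize(name, size=224, channels=3):
    """Return (weight layers, final feature-map size, weights, biases) for one variant."""
    weights, biases, weight_layers = 0, 0, 0
    for item in VGG_CONFIGS[name]:
        if item == "M":
            size = conv_out(size, kernel=2, stride=2, pad=0)  # pooling has no parameters
        else:
            size = conv_out(size, kernel=3, stride=1, pad=1)  # 3x3 conv keeps the size
            weights += 3 * 3 * channels * item
            biases += item
            channels = item
            weight_layers += 1
    features = size * size * channels  # flattened input of the first FC layer
    for units in FC_SIZES:
        weights += features * units
        biases += units
        features = units
        weight_layers += 1
    return weight_layers, size, weights, biases


for name in VGG_CONFIGS:
    layers, final_size, weights, biases = summarize(name)
    print(name, layers, final_size, weights, weights + biases)
# VGG16 16 7 138344128 138357544
# VGG19 19 7 143649408 143667240
```

Counting only convolutional and fully connected layers gives 13 + 3 = 16 and 16 + 3 = 19, which is where the two names come from.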

Step by step, here is the size calculation for each layer. Input: 224 x 224 x 3

Output size = (input - kernel + 2 x pad) / stride + 1. In a name such as conv3-64, 3 is the kernel size and 64 is the number of kernels. Weight counts exclude biases.

Layer     | Kernel / stride / pad | Size calculation                | Output size    | Weights
----------|-----------------------|---------------------------------|----------------|--------------------------------
conv3-64  | 3 / 1 / 1             | (224 - 3 + 2 * 1) / 1 + 1 = 224 | 224 x 224 x 64 | (3 * 3 * 3) * 64 = 1728
conv3-64  | 3 / 1 / 1             | (224 - 3 + 2 * 1) / 1 + 1 = 224 | 224 x 224 x 64 | (3 * 3 * 64) * 64 = 36864
pool2     | 2 / 2 / 0             | (224 - 2) / 2 + 1 = 112         | 112 x 112 x 64 | 0
conv3-128 | 3 / 1 / 1             | (112 - 3 + 2 * 1) / 1 + 1 = 112 | 112 x 112 x 128 | (3 * 3 * 64) * 128 = 73728
conv3-128 | 3 / 1 / 1             | (112 - 3 + 2 * 1) / 1 + 1 = 112 | 112 x 112 x 128 | (3 * 3 * 128) * 128 = 147456
pool2     | 2 / 2 / 0             | (112 - 2) / 2 + 1 = 56          | 56 x 56 x 128  | 0
conv3-256 | 3 / 1 / 1             | (56 - 3 + 2 * 1) / 1 + 1 = 56   | 56 x 56 x 256  | (3 * 3 * 128) * 256 = 294912
conv3-256 | 3 / 1 / 1             | (56 - 3 + 2 * 1) / 1 + 1 = 56   | 56 x 56 x 256  | (3 * 3 * 256) * 256 = 589824
conv3-256 | 3 / 1 / 1             | (56 - 3 + 2 * 1) / 1 + 1 = 56   | 56 x 56 x 256  | (3 * 3 * 256) * 256 = 589824
pool2     | 2 / 2 / 0             | (56 - 2) / 2 + 1 = 28           | 28 x 28 x 256  | 0
conv3-512 | 3 / 1 / 1             | (28 - 3 + 2 * 1) / 1 + 1 = 28   | 28 x 28 x 512  | (3 * 3 * 256) * 512 = 1179648
conv3-512 | 3 / 1 / 1             | (28 - 3 + 2 * 1) / 1 + 1 = 28   | 28 x 28 x 512  | (3 * 3 * 512) * 512 = 2359296
conv3-512 | 3 / 1 / 1             | (28 - 3 + 2 * 1) / 1 + 1 = 28   | 28 x 28 x 512  | (3 * 3 * 512) * 512 = 2359296
pool2     | 2 / 2 / 0             | (28 - 2) / 2 + 1 = 14           | 14 x 14 x 512  | 0
conv3-512 | 3 / 1 / 1             | (14 - 3 + 2 * 1) / 1 + 1 = 14   | 14 x 14 x 512  | (3 * 3 * 512) * 512 = 2359296
conv3-512 | 3 / 1 / 1             | (14 - 3 + 2 * 1) / 1 + 1 = 14   | 14 x 14 x 512  | (3 * 3 * 512) * 512 = 2359296
conv3-512 | 3 / 1 / 1             | (14 - 3 + 2 * 1) / 1 + 1 = 14   | 14 x 14 x 512  | (3 * 3 * 512) * 512 = 2359296
pool2     | 2 / 2 / 0             | (14 - 2) / 2 + 1 = 7            | 7 x 7 x 512    | 0
FC        | 4096 neurons          | -                               | 1 x 1 x 4096   | 7 * 7 * 512 * 4096 = 102760448
FC        | 4096 neurons          | -                               | 1 x 1 x 4096   | 4096 * 4096 = 16777216
FC        | 1000 neurons          | -                               | 1 x 1 x 1000   | 4096 * 1000 = 4096000

Part 2: Training VGG16 on your own dataset for image classification with PyTorch

1. Dataset loading

getdata.py: this part defines a class that loads the dataset.
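getdata.py relies on two conventions: one sub-folder per class under the data root, and an 80/20 split of a shuffled file list. The standalone sketch below shows just that indexing logic without PyTorch. The helpers `index_images` and `split_80_20` are hypothetical names for this example, it skips the csv cache, and the demo uses empty placeholder files in a temporary folder rather than real images.

```python
import os
import random
import tempfile


def index_images(root, name2label, extensions=(".png", ".jpg", ".jpeg")):
    """Collect (path, label) pairs from root/<class name>/<image file>."""
    samples = []
    for name, label in name2label.items():
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue
        for filename in sorted(os.listdir(class_dir)):
            if filename.lower().endswith(extensions):
                samples.append((os.path.join(class_dir, filename), label))
    return samples


def split_80_20(samples, seed=0):
    """Shuffle once with a fixed seed, then cut into 80% train and 20% test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Demo on a throwaway folder with five placeholder files per class.
with tempfile.TemporaryDirectory() as root:
    for name in ("empty", "occupied"):
        os.makedirs(os.path.join(root, name))
        for i in range(5):
            open(os.path.join(root, name, f"spot{i}.jpg"), "w").close()
    samples = index_images(root, {"empty": 0, "occupied": 1})
    train, test = split_80_20(samples)

print(len(samples), len(train), len(test))  # 10 8 2
```

Shuffling before the cut matters: without it, the test slice would contain only the last class in the listing.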

    import os
    import glob
    import csv
    import random

    import torch
    from torch.utils.data import Dataset
    from torchvision.transforms import transforms
    from PIL import Image


    class GetData(Dataset):
        def __init__(self, root, resize, mode):
            super(GetData, self).__init__()
            self.root = root
            self.resize = resize
            self.mode = mode
            # "class name": id. Define your own classes here.
            self.name2label = {'empty': 0, 'occupied': 1}
            for name in sorted(os.listdir(root)):
                # Only sub-directories count as classes
                if not os.path.isdir(os.path.join(root, name)):
                    continue
                # Folders not listed above get the next free id;
                # remember to match num_class in the model if you add classes
                self.name2label.setdefault(name, len(self.name2label))

            # Split into images and labels; an existing csv file is read directly
            self.images, self.labels = self.load_csv('images.csv')
            split = int(0.8 * len(self.images))
            if mode == 'train':
                # The first 80% of the csv is the training set
                self.images, self.labels = self.images[:split], self.labels[:split]
            else:
                # The remaining 20% is the test set
                self.images, self.labels = self.images[split:], self.labels[split:]

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            img, label = self.images[idx], self.labels[idx]
            # Preprocess to the 224 x 224 x 3 input that VGG16 expects
            steps = [
                lambda x: Image.open(x).convert('RGB'),  # path string -> image data
                transforms.Resize((int(self.resize * 1.25), int(self.resize * 1.25))),
            ]
            if self.mode == 'train':
                # Random augmentation is applied to training samples only
                steps.append(transforms.RandomRotation(15))
            steps += [
                transforms.CenterCrop(self.resize),  # removes the black borders left by rotation
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225]),
            ]
            tf = transforms.Compose(steps)
            img = tf(img)
            label = torch.tensor(label)  # convert to tensor
            return img, label  # the sample and its label

        def load_csv(self, filename):
            # Writes every image path under each class folder to a csv file
            # together with its label, then reads the csv back.
            # If there is no csv file yet, create one
            if not os.path.exists(os.path.join(self.root, filename)):
                images = []
                for name in self.name2label.keys():
                    # Collect every supported image type in the class folder
                    images += glob.glob(os.path.join(self.root, name, '*.png'))
                    images += glob.glob(os.path.join(self.root, name, '*.jpg'))
                    images += glob.glob(os.path.join(self.root, name, '*.jpeg'))
                random.shuffle(images)  # shuffle once, before the train/test split
                with open(os.path.join(self.root, filename), mode='w', newline='') as f:
                    writer = csv.writer(f)
                    for img in images:  # e.g. './data/class1/spot429.jpg'
                        name = img.split(os.sep)[-2]  # class name = parent folder
                        label = self.name2label[name]
                        writer.writerow([img, label])

            # Read the csv file
            images, labels = [], []
            with open(os.path.join(self.root, filename)) as f:
                reader = csv.reader(f)
                for row in reader:
                    img, label = row
                    images.append(img)
                    labels.append(int(label))
            assert len(images) == len(labels)
            return images, labels

2. Training

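train.py: this part uses transfer learning. It loads the pretrained VGG16 model, freezes the parameters of the feature-extraction layers, and fine-tunes only the final classifier. The data is loaded through the class written in step 1.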
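Freezing works by setting `requires_grad` to `False` on each parameter. The attribute name matters: a misspelling such as `required_grad` only attaches an unused Python attribute and freezes nothing. The sketch below shows both cases on a tiny stand-in network, not on VGG16 itself; the layer sizes are arbitrary and nothing is downloaded.

```python
import torch.nn as nn

# Tiny stand-in for a pretrained network (hypothetical, for illustration only).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # plays the role of the feature extractor
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 2),                    # plays the role of the classifier
)
features, classifier = model[0], model[3]

# Correct spelling: autograd reads this flag, so these weights stop receiving gradients.
for param in features.parameters():
    param.requires_grad = False

# Misspelled attribute: accepted silently, but the layer stays trainable.
classifier.weight.required_grad = False

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print(trainable)  # ['3.weight', '3.bias']
print(frozen)     # ['0.weight', '0.bias']
```

Listing the trainable parameter names like this is a cheap check to run once before training starts.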

    import os

    import torch
    import torch.nn as nn
    from torchvision import models
    from torchsummary import summary
    from getdata import GetData
    from torch.utils.data import DataLoader
    import matplotlib.pyplot as plt


    # Freeze the parameters of a set of layers
    def set_parameter_requires_grad(model, feature_extracting):
        if feature_extracting:
            for param in model.parameters():
                param.requires_grad = False


    # VGG network definition
    class VGG16net(nn.Module):
        def __init__(self, feature_extract=True, num_class=2):
            super(VGG16net, self).__init__()
            # torchvision 0.13+ prefers weights=models.VGG16_Weights.DEFAULT
            model = models.vgg16(pretrained=True)
            self.features = model.features
            set_parameter_requires_grad(self.features, feature_extract)
            self.avgpool = model.avgpool
            self.classifier = nn.Sequential(
                # A smaller head was left disabled here:
                # Linear(512*7*7, 512) -> ReLU -> Dropout -> Linear(512, 128)
                # -> ReLU -> Dropout -> Linear(128, num_class)
                nn.Linear(512 * 7 * 7, 1024),
                nn.ReLU(),
                nn.Linear(1024, 1024),
                nn.ReLU(),
                nn.Linear(1024, num_class)
            )

        def forward(self, x):
            x = self.features(x)
            x = self.avgpool(x)
            # Flatten each sample to one dimension
            x = x.view(x.size(0), -1)
            out = self.classifier(x)
            return out


    # Plotting helper
    def plt_image(x_input, y_input, title, xlabel, ylabel):
        plt.plot(x_input, y_input, linewidth=2)
        plt.title(title)
        plt.xlabel(xlabel)
        plt.ylabel(ylabel)
        plt.show()


    def main():
        # Use the GPU when one is available
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model = VGG16net().to(device)

        # Key hyperparameters
        learning_rate = 0.001
        num_epochs = 1
        train_batch_size = 16
        test_batch_size = 16

        # Loss and optimizer; only the classifier is optimized
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.classifier.parameters(), lr=learning_rate)

        # Load the dataset
        train_dataset = GetData('./data', 224, 'train')
        test_dataset = GetData('./data', 224, 'test')
        train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=test_batch_size, shuffle=False)

        # Values collected for plotting
        epochs = []
        evaloss = []
        acc = []

        # Print the model structure
        summary(model, (3, 224, 224))

        for epoch in range(num_epochs):
            epochs.append(epoch + 1)

            # Training phase
            model.train()  # switch back from eval mode set at the end of the previous epoch
            total_step = len(train_loader)
            train_epoch_loss = 0
            for i, (images, labels) in enumerate(train_loader):
                # Reset gradients
                optimizer.zero_grad()
                # Move the batch to the device
                images = images.to(device)
                labels = labels.to(device)
                # Forward pass
                output = model(images)
                loss = criterion(output, labels)
                # Backward pass and optimization step
                loss.backward()
                optimizer.step()
                # Accumulate the loss over all steps of this epoch
                train_epoch_loss += loss.item()
                # Print intermediate results
                if (i + 1) % 2 == 0:
                    print('Epoch [{}/{}], Step [{}/{}], Loss: {:.5f}'
                          .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))
                if (i + 1) == total_step:
                    epoch_eva_loss = train_epoch_loss / total_step  # average loss of the epoch
                    evaloss.append(epoch_eva_loss)
                    print('Epoch_eva loss is : {:.5f}'.format(epoch_eva_loss))

            # Test phase
            model.eval()
            with torch.no_grad():
                correct = 0
                total = 0
                for images, labels in test_loader:
                    images = images.to(device)
                    labels = labels.to(device)
                    output = model(images)
                    _, predicted = torch.max(output.data, 1)
                    print(predicted)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()
                acc.append(100 * (correct / total))
                print('Test Accuracy {} %'.format(100 * (correct / total)))

        # Save the weights; create the output folder if it does not exist
        os.makedirs('model', exist_ok=True)
        torch.save(obj=model.state_dict(), f='model/model.pth')

        # Plot after training
        plt_image(epochs, evaloss, 'loss', 'Epochs', 'EvaLoss')
        plt_image(epochs, acc, 'ACC', 'Epochs', 'EvaAcc')


    if __name__ == "__main__":
        main()

3. Inference

This part loads the weights of the trained model, classifies an image that is not in the dataset, and prints the result.

    import time

    import torch
    from PIL import Image
    from torchvision import transforms
    from train import VGG16net

    name2label = {'empty': 0, 'occupied': 1}
    label2name = {label: name for name, label in name2label.items()}
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    resize = 224

    # Same preprocessing as training, minus the random rotation,
    # so that the same image always gives the same prediction
    tf = transforms.Compose([
        lambda x: Image.open(x).convert('RGB'),  # path string -> image data
        transforms.Resize((int(resize * 1.25), int(resize * 1.25))),
        transforms.CenterCrop(resize),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])


    def predict(img_path):
        model = VGG16net().to(device)
        # map_location lets weights saved on a GPU load on a CPU-only machine
        model.load_state_dict(torch.load('model/model.pth', map_location=device))
        model.eval()
        with torch.no_grad():
            img = tf(img_path).unsqueeze(0)  # add the batch dimension
            img_ = img.to(device)
            start = time.time()
            outputs = model(img_)
            _, predicted = torch.max(outputs, 1)
            predicted_number = predicted[0].item()
            end = time.time()
        print('this picture maybe :', label2name[predicted_number])
        print('FPS:', 1 / (end - start))


    if __name__ == '__main__':
        predict('./pre/empty/spot524.jpg')
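The inference script prints only the winning class. If a confidence value is also wanted, the raw outputs can be passed through a softmax first. The following is a minimal pure-Python sketch of that decoding step; `softmax` and `decode` are helper names invented here, and the logits are made-up numbers rather than real model output.

```python
import math

name2label = {"empty": 0, "occupied": 1}
label2name = {label: name for name, label in name2label.items()}  # id -> class name


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the predicted class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return label2name[best], probs[best]


name, confidence = decode([0.3, 2.1])  # made-up logits for illustration
print(name, round(confidence, 3))      # occupied 0.858
```

In the PyTorch script the same idea would apply to `outputs` before taking the maximum.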
