Text Generation with LSTM in PyTorch


2024-07-16 14:01 | Source: compiled from the web

✍ Intended audience: software engineers, architects, IT professionals, designers, and others

✍ Column: AI Tools in Practice

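Overview

What Is a Generative Model

Getting Text Data

A Small LSTM Network to Predict the Next Character

Generating Text with an LSTM Model

Using a Larger LSTM Network

Faster Training with GPU

Further Reading

Articles

Papers

Summary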

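Recurrent neural networks can be used for time series prediction, in which case a regression network is built. They can also be used as generative models, which are usually classification networks. A generative model learns some pattern from data so that, when it is given a prompt, it can create a complete output in the same style as the learned pattern.

In this post, you will discover how to build a generative model for text using LSTM recurrent neural networks in PyTorch. After finishing this post, you will know:

- Where to download a free corpus of text that you can use to train text generative models
- How to frame the problem of text sequences as a recurrent neural network generative model
- How to develop an LSTM to generate plausible text sequences for a given problem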

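Overview

This post is divided into six parts; they are:

- What Is a Generative Model
- Getting Text Data
- A Small LSTM Network to Predict the Next Character
- Generating Text with an LSTM Model
- Using a Larger LSTM Network
- Faster Training with GPU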

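What Is a Generative Model

A generative model is simply another machine learning model, one that is able to create new things. Generative adversarial networks (GANs) are a class of their own. Transformer models that use the attention mechanism have also been found useful for generating passages of text.

It is just a machine learning model because it is trained with existing data so that it learns something from it. Depending on how they are trained, such models can work quite differently. In this post, a character-based generative model is created. This means training a model that takes a sequence of characters (letters and punctuation) as input and the immediately following character as the target. As long as it can predict the next character from the ones that precede it, you can run the model in a loop to generate a long piece of text.

This model is probably the simplest one. However, human language is complex. You should not expect it to produce very high-quality output. Even so, you need a lot of data and must train the model for a long time before you see sensible results.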

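Getting Text Data

Good-quality data is important for a successful generative model. Fortunately, many classic texts are no longer protected by copyright. This means you can download the full text of these books for free and use them in experiments, such as creating generative models. Perhaps the best place to get free books that are out of copyright is Project Gutenberg.

In this post, you will use a favorite book from childhood as the dataset, Alice's Adventures in Wonderland by Lewis Carroll:

Alice's Adventures in Wonderland by Lewis Carroll | Project Gutenberg

Your model will learn the dependencies between characters and the conditional probabilities of characters in sequences, so that you can in turn generate wholly new and original sequences of characters. This is a lot of fun, and repeating these experiments with other books from Project Gutenberg is recommended. The experiments are not limited to text; you can also try other ASCII data, such as computer source code or marked-up documents in LaTeX, HTML, or Markdown.

You can download the complete text of the book in ASCII format (Plain Text UTF-8) for free and place it in your working directory with the filename wonderland.txt. Now you need to prepare the dataset for modeling. Project Gutenberg adds a standard header and footer to each book, and these are not part of the original text. Open the file in a text editor and delete the header and footer. The header is obvious and ends with the text:

    *** START OF THIS PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***

The footer is all of the text after the line that says:

    THE END

You should be left with a text file of roughly 3,400 lines.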
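If you would rather not edit the file by hand, the same trimming can be scripted. The following is a minimal sketch, not part of the original workflow: the function name `strip_gutenberg` and its default marker strings are assumptions for illustration, and the exact marker wording can differ between Project Gutenberg editions, so check the result against your own download.

```python
def strip_gutenberg(text, start_marker="*** START OF", end_marker="THE END"):
    """Return the book body between the header marker line and the end marker.

    Both marker strings are assumptions; adjust them to match your file.
    """
    start = text.find(start_marker)
    if start != -1:
        # drop everything up to and including the rest of the marker line
        newline = text.find("\n", start)
        text = text[newline + 1:] if newline != -1 else ""
    # keep the last occurrence of the end marker and discard what follows it
    end = text.rfind(end_marker)
    if end != -1:
        text = text[:end + len(end_marker)]
    return text.strip()


# a tiny made-up file standing in for the real download
sample = (
    "Project Gutenberg header\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***\n"
    "CHAPTER I.\nDown the Rabbit-Hole\n"
    "THE END\n"
    "footer license text\n"
)
body = strip_gutenberg(sample)
print(body)
```

Using `rfind` for the end marker is a design choice that guards against the phrase appearing earlier in the story; if the footer itself happened to contain the phrase, the cut would land too late, which is another reason to inspect the output.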

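A Small LSTM Network to Predict the Next Character

First, you need to do some preprocessing on the data before you can build a model. Neural network models can only work with numbers, not text, so you need to convert the characters into numbers. To make the problem simpler, you also want to convert all uppercase letters to lowercase.

Below, you open the text file, convert all letters to lowercase, and create a Python dictionary char_to_int that maps each character to a distinct integer. For example, the sorted list of unique lowercase characters in the book is as follows:

    ['\n', '\r', ' ', '!', '"', "'", '(', ')', '*', ',', '-', '.', ':', ';', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\xbb', '\xbf', '\xef']

Since this problem is character-based, the "vocabulary" is the set of distinct characters ever used in the text.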

    import numpy as np

    # load ascii text and convert to lowercase
    filename = "wonderland.txt"
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()

    # create mapping of unique chars to integers
    chars = sorted(list(set(raw_text)))
    char_to_int = dict((c, i) for i, c in enumerate(chars))

    # summarize the loaded data
    n_chars = len(raw_text)
    n_vocab = len(chars)
    print("Total Characters: ", n_chars)
    print("Total Vocab: ", n_vocab)

This should print:

    Total Characters: 144574
    Total Vocab: 50

You can see that the book has just under 150,000 characters and that, once converted to lowercase, there are only 50 distinct characters in the vocabulary for the network to learn, many more than the 26 letters of the alphabet.
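To make the mapping concrete, here is a small sketch of the same encoding applied to a short made-up string, together with the reverse mapping that is needed later to turn predicted integers back into characters. The helper name `build_vocab` and the sample string are assumptions for illustration only.

```python
def build_vocab(text):
    """Build character-to-integer and integer-to-character lookup tables."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    int_to_char = {i: c for c, i in char_to_int.items()}
    return char_to_int, int_to_char


sample = "alice was beginning"          # made-up stand-in for the book text
char_to_int, int_to_char = build_vocab(sample)

# encode the string as integers, then decode it back
encoded = [char_to_int[c] for c in sample]
decoded = "".join(int_to_char[i] for i in encoded)

print(encoded)
print(decoded)
```

Because the characters are sorted before numbering, the mapping is reproducible from the same text, which matters when a saved model is reloaded later.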

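Next, you need to separate the text into inputs and targets. A window of 100 characters is used here. That is, with characters 1 to 100 as input, your model predicts character 101. If a window of 5 were used, the word "chapter" would become two data samples:

    chapt -> e
    hapte -> r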
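The windowing idea can be sketched in a few lines of plain Python. The helper name `make_windows` is an assumption for illustration; it mirrors the loop used in this post but works on raw characters rather than integers.

```python
def make_windows(text, seq_length):
    """Slide a window over text, pairing each window with the character after it."""
    pairs = []
    for i in range(len(text) - seq_length):
        seq_in = text[i:i + seq_length]      # the input window
        seq_out = text[i + seq_length]       # the single target character
        pairs.append((seq_in, seq_out))
    return pairs


for seq_in, seq_out in make_windows("chapter", 5):
    print(seq_in, "->", seq_out)
```

A text of n characters yields n minus seq_length samples, which is why a 144,574-character book with a window of 100 gives 144,474 samples.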

In a long text like this one, a great many windows can be created, which produces a dataset with a large number of samples:

    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = raw_text[i:i + seq_length]
        seq_out = raw_text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print("Total Patterns: ", n_patterns)

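Running the code above, you can see that a total of 144,474 samples are created. Each sample is now in integer form, converted using the char_to_int mapping. However, a PyTorch model prefers to see the data as floating-point tensors, so you should convert the samples into PyTorch tensors. LSTM layers will be used in the model, so the input tensor should have the dimensions (samples, time steps, features). To help training, it is also a good idea to normalize the input to the range 0 to 1. You therefore have the following:

    import torch
    import torch.nn as nn
    import torch.optim as optim

    # reshape X to be [samples, time steps, features]
    X = torch.tensor(dataX, dtype=torch.float32).reshape(n_patterns, seq_length, 1)
    X = X / float(n_vocab)
    y = torch.tensor(dataY)
    print(X.shape, y.shape)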

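You can now define your LSTM model. Here, you define a single hidden LSTM layer with 256 hidden units. The input is a single feature (i.e., one integer for one character). A dropout layer with probability 0.2 is added after the LSTM layer. The output of the LSTM layer is a tuple whose first element is the hidden state of the LSTM cell at each time step. It is a history of how the hidden state evolved as the LSTM cell accepted each time step of input. Presumably, the last hidden state contains the most information, so only the last hidden state is passed on to the output layer. The output layer is a fully connected layer that produces logits for the 50 vocabulary entries. The logits can be converted into probability-like predictions using a softmax function.

    import torch.nn as nn
    import torch.optim as optim
    import torch.utils.data as data

    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=1, batch_first=True)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x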
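The model returns raw logits rather than probabilities. As a rough illustration of what the softmax step does with them, and of the cross-entropy loss that is used for training in the next step, here is a sketch in plain Python. The three logit values are made up, and the helper functions are hypothetical stand-ins written for this example rather than the PyTorch implementations.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(softmax(logits)[target])


logits = [2.0, 1.0, 0.1]                  # made-up scores for a 3-character vocabulary
probs = softmax(logits)
loss_if_first = cross_entropy(logits, target=0)
loss_if_last = cross_entropy(logits, target=2)

print(probs)
print(loss_if_first, loss_if_last)
```

The largest logit always receives the largest probability, so taking the argmax of the logits picks the same character as taking the argmax of the softmax output. The loss is small when the correct character received a high probability and large when it did not.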

This is a single-character classification model over 50 classes, so cross-entropy loss should be used. It is optimized with the Adam optimizer. The training loop is as follows. For simplicity, no test set is created; instead, the model is evaluated on the training set once more at the end of each epoch to keep track of progress.

This program can run for a long time, especially on a CPU! To preserve the fruit of your work, the best model found is saved for future reuse.

    import copy

    n_epochs = 40
    batch_size = 128
    model = CharModel()

    optimizer = optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    loader = data.DataLoader(data.TensorDataset(X, y), shuffle=True, batch_size=batch_size)

    best_model = None
    best_loss = np.inf
    for epoch in range(n_epochs):
        model.train()
        for X_batch, y_batch in loader:
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Validation
        model.eval()
        loss = 0
        with torch.no_grad():
            for X_batch, y_batch in loader:
                y_pred = model(X_batch)
                loss += loss_fn(y_pred, y_batch)
            if loss < best_loss:
                best_loss = loss
                # copy the weights: state_dict() returns references that
                # later training steps would otherwise overwrite in place
                best_model = copy.deepcopy(model.state_dict())
            print("Epoch %d: Cross-entropy: %.4f" % (epoch, loss))

    torch.save([best_model, char_to_int], "single-char.pth")

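Running the above may produce the following:

    ...
    Epoch 35: Cross-entropy: 245745.2500
    Epoch 36: Cross-entropy: 243908.7031
    Epoch 37: Cross-entropy: 238833.5000
    Epoch 38: Cross-entropy: 239069.0000
    Epoch 39: Cross-entropy: 234176.2812

The cross-entropy decreases in almost every epoch. This suggests the model has not fully converged and that you could train it for more epochs. When the training loop completes, you should have the file single-char.pth, which contains the best model weights found so far as well as the character-to-integer mapping used by this model.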
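The reported numbers are large because the loss is summed over every training sample rather than averaged. To put them on a more interpretable scale, the following arithmetic sketch divides the epoch 39 figure by the 144,474 samples and converts the result into a perplexity. It assumes the loss was accumulated over exactly one full pass of the training set with natural logarithms, which is how the loop in this post is written.

```python
import math

total_loss = 234176.2812     # summed cross-entropy reported for epoch 39
n_samples = 144474           # number of training windows reported earlier
n_vocab = 50                 # vocabulary size reported earlier

avg_loss = total_loss / n_samples        # nats per predicted character
perplexity = math.exp(avg_loss)          # effective number of equally likely choices
uniform_loss = math.log(n_vocab)         # loss of guessing uniformly at random

print("Average loss per character: %.3f" % avg_loss)
print("Perplexity: %.2f" % perplexity)
print("Uniform-guess baseline: %.3f" % uniform_loss)
```

Comparing the average loss with the uniform-guess baseline gives a quick sanity check that the model has learned something, and the perplexity can be read as roughly how many characters the model is still undecided between at each step.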

For completeness, everything above is tied together into one script below:

    import copy

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.utils.data as data

    # load ascii text and convert to lowercase
    filename = "wonderland.txt"
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()

    # create mapping of unique chars to integers
    chars = sorted(list(set(raw_text)))
    char_to_int = dict((c, i) for i, c in enumerate(chars))

    # summarize the loaded data
    n_chars = len(raw_text)
    n_vocab = len(chars)
    print("Total Characters: ", n_chars)
    print("Total Vocab: ", n_vocab)

    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = raw_text[i:i + seq_length]
        seq_out = raw_text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print("Total Patterns: ", n_patterns)

    # reshape X to be [samples, time steps, features]
    X = torch.tensor(dataX, dtype=torch.float32).reshape(n_patterns, seq_length, 1)
    X = X / float(n_vocab)
    y = torch.tensor(dataY)

    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=1, batch_first=True)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x

    n_epochs = 40
    batch_size = 128
    model = CharModel()

    optimizer = optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    loader = data.DataLoader(data.TensorDataset(X, y), shuffle=True, batch_size=batch_size)

    best_model = None
    best_loss = np.inf
    for epoch in range(n_epochs):
        model.train()
        for X_batch, y_batch in loader:
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Validation
        model.eval()
        loss = 0
        with torch.no_grad():
            for X_batch, y_batch in loader:
                y_pred = model(X_batch)
                loss += loss_fn(y_pred, y_batch)
            if loss < best_loss:
                best_loss = loss
                # copy the weights: state_dict() returns references that
                # later training steps would otherwise overwrite in place
                best_model = copy.deepcopy(model.state_dict())
            print("Epoch %d: Cross-entropy: %.4f" % (epoch, loss))

    torch.save([best_model, char_to_int], "single-char.pth")

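Generating Text with an LSTM Model

Provided the model is well trained, generating text with the trained LSTM network is relatively straightforward. First, you need to recreate the network and load the trained model weights from the saved checkpoint. Then you need to create a prompt for the model to start from. The prompt can be anything the model can understand. It is a seed sequence that is given to the model to obtain one generated character. The generated character is then appended to the end of this sequence, and the first character is trimmed off to keep the length constant. This process is repeated for as long as you want to predict new characters (e.g., a sequence 1,000 characters in length). You can pick a random input pattern as your seed sequence, then print the characters as they are generated.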
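The predict, append, and trim cycle described above can be sketched independently of any neural network. In the sketch below, `generate` and `toy_predictor` are hypothetical names introduced for illustration; the toy predictor is a stand-in for the trained model and simply returns the last value plus one, modulo 5.

```python
def generate(predict_next, seed, n_steps):
    """Autoregressive loop: predict one item, append it, drop the oldest item."""
    pattern = list(seed)
    output = []
    for _ in range(n_steps):
        nxt = predict_next(pattern)
        output.append(nxt)
        pattern.append(nxt)        # the prediction becomes part of the next input
        pattern = pattern[1:]      # trim the front so the window length stays fixed
    return output, pattern


def toy_predictor(pattern):
    """Hypothetical stand-in for the trained network."""
    return (pattern[-1] + 1) % 5


generated, final_window = generate(toy_predictor, seed=[0, 1, 2], n_steps=4)
print(generated)
print(final_window)
```

Because each prediction is fed back in as input, an early mistake influences everything that follows, which is one reason generated text can drift or fall into loops.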

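A simple way to generate a prompt is to pick a random sample from the original dataset. For example, with the raw_text obtained in the previous section, a prompt can be created as:

    seq_length = 100
    start = np.random.randint(0, len(raw_text)-seq_length)
    prompt = raw_text[start:start+seq_length]

Be reminded, though, that you need to transform it, since this prompt is a string while the model expects a vector of integers.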

The entire code is as follows:

    import numpy as np
    import torch
    import torch.nn as nn

    best_model, char_to_int = torch.load("single-char.pth")
    n_vocab = len(char_to_int)
    int_to_char = dict((i, c) for c, i in char_to_int.items())

    # reload the model
    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=1, batch_first=True)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x
    model = CharModel()
    model.load_state_dict(best_model)

    # randomly generate a prompt
    filename = "wonderland.txt"
    seq_length = 100
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()
    start = np.random.randint(0, len(raw_text)-seq_length)
    prompt = raw_text[start:start+seq_length]
    pattern = [char_to_int[c] for c in prompt]

    model.eval()
    print('Prompt: "%s"' % prompt)
    with torch.no_grad():
        for i in range(1000):
            # format input array of int into PyTorch tensor
            x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
            x = torch.tensor(x, dtype=torch.float32)
            # generate logits as output from the model
            prediction = model(x)
            # convert logits into one character
            index = int(prediction.argmax())
            result = int_to_char[index]
            print(result, end="")
            # append the new character into the prompt for the next iteration
            pattern.append(index)
            pattern = pattern[1:]
    print()
    print("Done.")

Running this example first outputs the prompt used, then each character as it is generated. For example, below are the results from one run of this text generator. The prompt was:

    Prompt: "nother rush at the stick, and tumbled head over heels in its hurry to get hold of it; then alice, th"

The generated text was:

    e was qot a litule soteet of thet was sh the thiee harden an the courd, and was tuitk a little toaee th thite ththe and said to the suher, and the whrtght the pacbit sese tha woode of the soeee, and the white rabbit ses ani thr gort to the thite rabbit, and then she was aoiinnene th the three baaed of the sueen and saed “ota turpe ”hun mot,” “i don’t know the ter ano _enend to mere,” said the maccht ar a sore of great roaee. “ie you don’t teink if thet soued to soeed to the boeie the mooer, io you bane thing it wo tou het bn the crur, “h whsh you cen not,” said the manch hare. “wes, it aadi,” said the manch hare. “weat you tail to merer ae in an a gens if gre” ”he were thing,” said the maccht ar a sore of geeaghen asd tothe to the thieg harden an the could. “h dan tor toe taie thing,” said the manch hare. “wes, it aadi,” said the manch hare. “weat you tail to merer ae in an a gens if gre” ”he were thing,” said the maccht ar a sore of geeaghen asd tothe to the thieg harden an t

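Let's note some observations about the generated text.

- It can emit line breaks. The original text limited line width to 80 characters, and the generative model attempted to replicate this pattern.
- The characters are separated into word-like groups. Some groups are actual English words (e.g., "the", "said", and "rabbit"), but many are not (e.g., "thite", "soteet", and "tha").
- Some of the words in sequence make sense (e.g., "i don't know"), but many do not (e.g., "he were thing").

The fact that this character-based model of the book produces output like this is quite impressive. It gives you a sense of the learning capabilities of LSTM networks. However, the results are not perfect. In the next section, you will look at improving the quality of the results by developing a larger LSTM network.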

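Using a Larger LSTM Network

Recall that LSTM is a recurrent neural network. It takes a sequence as input, and at each step of the sequence the input is mixed with its internal state to produce an output. Hence the output of an LSTM is also a sequence. In the model above, the output from the last time step is used for further processing in the neural network, while the outputs from earlier steps are discarded. However, this need not be the case. You can treat the sequence output of one LSTM layer as the input to another LSTM layer. You are then building a larger network.

As with convolutional neural networks, a stacked LSTM network is supposed to have the earlier LSTM layers learn low-level features and the later LSTM layers learn high-level features. It may not always be useful, but you can try it out to see whether the model produces a better result.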
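To get a feel for how much larger a stacked network is, the trainable parameters can be counted by hand. The sketch below assumes the parameter layout documented for PyTorch's nn.LSTM (per layer, four gates with input weights, recurrent weights, and two bias vectors) and the sizes used in this post: one input feature, 256 hidden units, and a 50-way output layer. The function names are hypothetical.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters of one LSTM layer: 4 gates x (W_ih + W_hh + b_ih + b_hh)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def char_model_params(num_layers, hidden_size=256, n_vocab=50):
    """Total parameters of a stacked LSTM followed by a linear output layer."""
    total = lstm_layer_params(1, hidden_size)                           # first layer sees 1 feature
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    total += hidden_size * n_vocab + n_vocab                            # linear weights and biases
    return total


small = char_model_params(num_layers=1)
large = char_model_params(num_layers=2)
print(small, large)
```

Under these assumptions, most of the extra capacity comes from the second layer, whose input is the 256-dimensional hidden state of the first layer rather than a single feature. Dropout layers add no parameters.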

In PyTorch, making a stacked LSTM layer is easy. Let's modify the above model into the following:

    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=2, batch_first=True, dropout=0.2)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x

The only change is in the arguments to nn.LSTM(): you set num_layers=2 instead of 1 to add another LSTM layer. Between the two LSTM layers, you also add a dropout layer through the argument dropout=0.2. Replacing the previous model with this one is all the change you need to make. Rerun the training and you should see the following:

    ...
    Epoch 34: Cross-entropy: 203763.0312
    Epoch 35: Cross-entropy: 204002.5938
    Epoch 36: Cross-entropy: 210636.5625
    Epoch 37: Cross-entropy: 199619.6875
    Epoch 38: Cross-entropy: 199240.2969
    Epoch 39: Cross-entropy: 196966.1250

You should see that the cross-entropy here is lower than in the previous section, which means this model fits the data better. Indeed, with this model, the generated text looks more sensible:

    Prompt: "ll say that ‘i see what i eat’ is the same thing as ‘i eat what i see’!” “you might just as well sa"
    y it to sea,” she katter said to the jury. and the thoee hardeners vhine she was seady to alice the was a long tay of the sooe of the court, and she was seady to and taid to the coor and the court. “well you see what you see, the mookee of the soog of the season of the shase of the court!” “i don’t know the rame thing is it?” said the caterpillar. “the cormous was it makes he it was it taie the reason of the shall bbout it, you know.” “i don’t know the rame thing i can’t gelp the sea,” the hatter went on, “i don’t know the peally was in the shall sereat it would be a teally. the mookee of the court ” “i don’t know the rame thing is it?” said the caterpillar. “the cormous was it makes he it was it taie the reason of the shall bbout it, you know.” “i don’t know the rame thing i can’t gelp the sea,” the hatter went on, “i don’t know the peally was in the shall sereat it would be a teally. the mookee of the court ” “i don’t know the rame thing is it?” said the caterpillar. “the
    Done.

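More of the words are spelled correctly, and the text reads more like English. Since the cross-entropy loss is still decreasing as the model trains, you can assume the model has not converged yet. You can expect to make the model better by increasing the number of training epochs.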

For completeness, below is the complete code using this new model, including training and text generation.

    import copy

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.utils.data as data

    # load ascii text and convert to lowercase
    filename = "wonderland.txt"
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()

    # create mapping of unique chars to integers
    chars = sorted(list(set(raw_text)))
    char_to_int = dict((c, i) for i, c in enumerate(chars))

    # summarize the loaded data
    n_chars = len(raw_text)
    n_vocab = len(chars)
    print("Total Characters: ", n_chars)
    print("Total Vocab: ", n_vocab)

    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = raw_text[i:i + seq_length]
        seq_out = raw_text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print("Total Patterns: ", n_patterns)

    # reshape X to be [samples, time steps, features]
    X = torch.tensor(dataX, dtype=torch.float32).reshape(n_patterns, seq_length, 1)
    X = X / float(n_vocab)
    y = torch.tensor(dataY)

    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=2, batch_first=True, dropout=0.2)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x

    n_epochs = 40
    batch_size = 128
    model = CharModel()

    optimizer = optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    loader = data.DataLoader(data.TensorDataset(X, y), shuffle=True, batch_size=batch_size)

    best_model = None
    best_loss = np.inf
    for epoch in range(n_epochs):
        model.train()
        for X_batch, y_batch in loader:
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Validation
        model.eval()
        loss = 0
        with torch.no_grad():
            for X_batch, y_batch in loader:
                y_pred = model(X_batch)
                loss += loss_fn(y_pred, y_batch)
            if loss < best_loss:
                best_loss = loss
                # copy the weights: state_dict() returns references that
                # later training steps would otherwise overwrite in place
                best_model = copy.deepcopy(model.state_dict())
            print("Epoch %d: Cross-entropy: %.4f" % (epoch, loss))

    torch.save([best_model, char_to_int], "single-char.pth")

    # Generation using the trained model
    best_model, char_to_int = torch.load("single-char.pth")
    n_vocab = len(char_to_int)
    int_to_char = dict((i, c) for c, i in char_to_int.items())
    model.load_state_dict(best_model)

    # randomly generate a prompt
    filename = "wonderland.txt"
    seq_length = 100
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()
    start = np.random.randint(0, len(raw_text)-seq_length)
    prompt = raw_text[start:start+seq_length]
    pattern = [char_to_int[c] for c in prompt]

    model.eval()
    print('Prompt: "%s"' % prompt)
    with torch.no_grad():
        for i in range(1000):
            # format input array of int into PyTorch tensor
            x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
            x = torch.tensor(x, dtype=torch.float32)
            # generate logits as output from the model
            prediction = model(x)
            # convert logits into one character
            index = int(prediction.argmax())
            result = int_to_char[index]
            print(result, end="")
            # append the new character into the prompt for the next iteration
            pattern.append(index)
            pattern = pattern[1:]
    print()
    print("Done.")

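Faster Training with GPU

Running the programs from this post can be painfully slow. Even if you have a GPU, you will not see an immediate improvement. This is because, by design, PyTorch may not use your GPU automatically. However, if you have a CUDA-capable GPU, you can improve performance considerably by carefully moving the heavy computation away from the CPU.

A PyTorch model is a program of tensor calculations. Tensors can be stored on the GPU or the CPU. An operation can be carried out as long as all of its operands are on the same device. In this particular example, the model weights (i.e., those of the LSTM layers and the fully connected layer) can be moved to the GPU. Once that is done, the input should also be moved to the GPU before execution, and the output will also be stored on the GPU unless you move it back.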

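In PyTorch, you can check whether you have a CUDA-capable GPU using the following function:

    torch.cuda.is_available()

It returns a Boolean that indicates whether you can use the GPU, which in turn depends on the hardware model you have, whether your operating system has the appropriate libraries installed, and whether your PyTorch is compiled with the corresponding GPU support. If everything works in concert, you can create a device and assign your model to it:

    device = torch.device("cuda:0")
    model.to(device)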

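If your model is running on a CUDA device but your input tensor is not, you will see PyTorch complain about it and fail to proceed. To move your tensor to the CUDA device, you should run something like the following:

    y_pred = model(X_batch.to(device))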

The .to(device) part does the magic. But remember that y_pred produced above will also be on the CUDA device. Hence you should do the same when you run the loss function. Modifying the above program to make it run on a GPU gives the following:

    import copy

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.utils.data as data

    # load ascii text and convert to lowercase
    filename = "wonderland.txt"
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()

    # create mapping of unique chars to integers
    chars = sorted(list(set(raw_text)))
    char_to_int = dict((c, i) for i, c in enumerate(chars))

    # summarize the loaded data
    n_chars = len(raw_text)
    n_vocab = len(chars)
    print("Total Characters: ", n_chars)
    print("Total Vocab: ", n_vocab)

    # prepare the dataset of input to output pairs encoded as integers
    seq_length = 100
    dataX = []
    dataY = []
    for i in range(0, n_chars - seq_length, 1):
        seq_in = raw_text[i:i + seq_length]
        seq_out = raw_text[i + seq_length]
        dataX.append([char_to_int[char] for char in seq_in])
        dataY.append(char_to_int[seq_out])
    n_patterns = len(dataX)
    print("Total Patterns: ", n_patterns)

    # reshape X to be [samples, time steps, features]
    X = torch.tensor(dataX, dtype=torch.float32).reshape(n_patterns, seq_length, 1)
    X = X / float(n_vocab)
    y = torch.tensor(dataY)

    class CharModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=2, batch_first=True, dropout=0.2)
            self.dropout = nn.Dropout(0.2)
            self.linear = nn.Linear(256, n_vocab)
        def forward(self, x):
            x, _ = self.lstm(x)
            # take only the last output
            x = x[:, -1, :]
            # produce output
            x = self.linear(self.dropout(x))
            return x

    n_epochs = 40
    batch_size = 128
    model = CharModel()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)

    optimizer = optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss(reduction="sum")
    loader = data.DataLoader(data.TensorDataset(X, y), shuffle=True, batch_size=batch_size)

    best_model = None
    best_loss = np.inf
    for epoch in range(n_epochs):
        model.train()
        for X_batch, y_batch in loader:
            y_pred = model(X_batch.to(device))
            loss = loss_fn(y_pred, y_batch.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Validation
        model.eval()
        loss = 0
        with torch.no_grad():
            for X_batch, y_batch in loader:
                y_pred = model(X_batch.to(device))
                loss += loss_fn(y_pred, y_batch.to(device))
            if loss < best_loss:
                best_loss = loss
                # copy the weights: state_dict() returns references that
                # later training steps would otherwise overwrite in place
                best_model = copy.deepcopy(model.state_dict())
            print("Epoch %d: Cross-entropy: %.4f" % (epoch, loss))

    torch.save([best_model, char_to_int], "single-char.pth")

    # Generation using the trained model
    best_model, char_to_int = torch.load("single-char.pth")
    n_vocab = len(char_to_int)
    int_to_char = dict((i, c) for c, i in char_to_int.items())
    model.load_state_dict(best_model)

    # randomly generate a prompt
    filename = "wonderland.txt"
    seq_length = 100
    raw_text = open(filename, 'r', encoding='utf-8').read()
    raw_text = raw_text.lower()
    start = np.random.randint(0, len(raw_text)-seq_length)
    prompt = raw_text[start:start+seq_length]
    pattern = [char_to_int[c] for c in prompt]

    model.eval()
    print('Prompt: "%s"' % prompt)
    with torch.no_grad():
        for i in range(1000):
            # format input array of int into PyTorch tensor
            x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
            x = torch.tensor(x, dtype=torch.float32)
            # generate logits as output from the model
            prediction = model(x.to(device))
            # convert logits into one character
            index = int(prediction.argmax())
            result = int_to_char[index]
            print(result, end="")
            # append the new character into the prompt for the next iteration
            pattern.append(index)
            pattern = pattern[1:]
    print()
    print("Done.")

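Compared with the code in the previous section, you should see that they are essentially the same, except that the CUDA device is detected with the line:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

This will be your GPU, or it will fall back to the CPU if no CUDA device is found. Afterward, .to(device) is added in several strategic places to move the computation to the GPU.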
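One practical consequence of training on a GPU is that a saved checkpoint may hold CUDA tensors, which can matter if it is later loaded on a machine without a GPU. A possible way to handle this, sketched below, is the map_location argument of torch.load. The tiny nn.Linear model and the two-entry mapping are hypothetical stand-ins for CharModel and char_to_int, and an in-memory buffer replaces the checkpoint file so that the sketch is self-contained.

```python
import io

import torch
import torch.nn as nn

# a tiny stand-in model; the post's CharModel would be handled the same way
model = nn.Linear(4, 2)
char_to_int = {"a": 0, "b": 1}          # hypothetical mapping for illustration

# save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.save([model.state_dict(), char_to_int], buffer)
buffer.seek(0)

# map_location remaps the stored tensors onto the requested device while loading
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
state, mapping = torch.load(buffer, map_location=device)

# rebuild the model, restore the weights, and place it on the same device
restored = nn.Linear(4, 2)
restored.load_state_dict(state)
restored.to(device)
print(mapping, next(restored.parameters()).device)
```

With a real checkpoint, the file name would take the place of the buffer, and the restored model can then be used for generation on whichever device is available.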

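Further Reading

This kind of character-level text model is a popular way of generating text with recurrent neural networks. Below are some more resources and tutorials on the topic if you are interested in going deeper.

Articles

- Andrej Karpathy. The Unreasonable Effectiveness of Recurrent Neural Networks. May 2015.
- Lars Eidnes. Auto-Generating Clickbait With Recurrent Neural Networks. 2015.
- PyTorch tutorial. Sequence Models and Long Short-Term Memory Networks

Papers

- Ilya Sutskever, James Martens, and Geoffrey Hinton. "Generating Text with Recurrent Neural Networks". In: Proceedings of the 28th International Conference on Machine Learning. Bellevue, WA, USA, 2011.

APIs

- nn.LSTM in the PyTorch documentation

Summary

In this post, you discovered how to develop an LSTM recurrent neural network for text generation in PyTorch. After completing this post, you know:

- How to find the text of classic books for free as a dataset for your machine learning model
- How to train an LSTM network on text sequences
- How to use an LSTM network to generate text sequences
- How to speed up deep learning training in PyTorch using CUDA devices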
