Natural Language Processing (NLP) Star Sky Intelligent Dialogue Robot Series: Transformer Model for Language Understanding


2024-02-25 08:55 | Source: compiled from the web

Tags: NLP, Transformer, datasets, Star Sky, robot, tokenizer, tensorflow


This article is an advanced example of translating Portuguese into English with a Transformer.

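Contents
Installing TensorFlow
Setting up the input pipeline
Creating a custom subword tokenizer from the training dataset
If a word is not in the dictionary, the tokenizer encodes the string by breaking the word into subwords
Adding start and end tokens to the input and target
Dropping examples longer than 40 tokens to keep the example small and relatively fast
Appendix: final run results
References
Star Sky Intelligent Dialogue Robot series blog posts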

Installing TensorFlow

import tensorflow_datasets as tfds
import tensorflow as tf

import time
import numpy as np
import matplotlib.pyplot as plt

Running this cell raises an error:

ModuleNotFoundError                       Traceback (most recent call last)
----> 1 import tensorflow_datasets as tfds
      2 import tensorflow as tf
      3 
      4 import time
      5 import numpy as np

ModuleNotFoundError: No module named 'tensorflow_datasets'
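The error means the package is not installed in the active environment. The snippet below is a small convenience sketch and is not part of the original tutorial. It uses the standard-library `importlib.util.find_spec` to report which of the setup cell's modules are missing before the imports run. The `REQUIRED` mapping and the `missing_modules` helper are hypothetical names chosen for illustration.

```python
import importlib.util

# Import names used by the setup cell, mapped to the pip package that provides them.
# The import name and the pip name differ for tensorflow_datasets.
REQUIRED = {
    "tensorflow_datasets": "tensorflow-datasets",
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
}


def missing_modules(required):
    """Return the pip names of top-level modules that cannot be found here."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        # Suggest the install command for whatever is missing.
        print("pip install " + " ".join(missing))
    else:
        print("All required modules are importable.")
```

`find_spec` only locates a top-level module without importing it, so the check stays cheap even for large packages such as TensorFlow.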

Install tensorflow_datasets (the pip package name is tensorflow-datasets):

(base) C:\Users\admin>activate my_star_space
(my_star_space) C:\Users\admin>pip install tensorflow-datasets
Collecting tensorflow-datasets
  Using cached tensorflow_datasets-4.4.0-py3-none-any.whl (4.0 MB)
Requirement already satisfied: dill in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.3.4)
Collecting tensorflow-metadata
  Downloading tensorflow_metadata-1.2.0-py3-none-any.whl (48 kB)
     |████████████████████████████████| 48 kB 21 kB/s
Requirement already satisfied: dataclasses in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.8)
Requirement already satisfied: importlib-resources in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (5.2.2)
Requirement already satisfied: promise in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (2.3)
Requirement already satisfied: tqdm in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (4.62.2)
Requirement already satisfied: attrs>=18.1.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (21.2.0)
Requirement already satisfied: requests>=2.19.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (2.26.0)
Requirement already satisfied: six in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.16.0)
Requirement already satisfied: future in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.18.2)
Requirement already satisfied: numpy in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.19.5)
Requirement already satisfied: absl-py in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.13.0)
Requirement already satisfied: typing-extensions in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (3.7.4.3)
Requirement already satisfied: protobuf>=3.12.2 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (3.17.3)
Requirement already satisfied: termcolor in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.1.0)
Requirement already satisfied: certifi>=2017.4.17 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2021.5.30)
Requirement already satisfied: idna=2.5 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (3.2)
Requirement already satisfied: urllib3=1.21.1 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (1.25.11)
Requirement already satisfied: charset-normalizer~=2.0.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2.0.4)
Requirement already satisfied: zipp>=3.1.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from importlib-resources->tensorflow-datasets) (3.5.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-metadata->tensorflow-datasets) (1.53.0)
Collecting absl-py
  Downloading absl_py-0.12.0-py3-none-any.whl (129 kB)
     |████████████████████████████████| 129 kB 14 kB/s
Requirement already satisfied: colorama in e:\anaconda3\envs\my_star_space\lib\site-packages (from tqdm->tensorflow-datasets) (0.4.4)
Installing collected packages: absl-py, tensorflow-metadata, tensorflow-datasets
  Attempting uninstall: absl-py
    Found existing installation: absl-py 0.13.0
    Uninstalling absl-py-0.13.0:
      Successfully uninstalled absl-py-0.13.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.6.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
Successfully installed absl-py-0.12.0 tensorflow-datasets-4.4.0 tensorflow-metadata-1.2.0
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the 'e:\anaconda3\envs\my_star_space\python.exe -m pip install --upgrade pip' command.
(my_star_space) C:\Users\admin>pip install tensorflow-datasets
Requirement already satisfied: tensorflow-datasets in e:\anaconda3\envs\my_star_space\lib\site-packages (4.4.0)
Requirement already satisfied: promise in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (2.3)
Requirement already satisfied: future in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.18.2)
Requirement already satisfied: numpy in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.19.5)
Requirement already satisfied: absl-py in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.12.0)
Requirement already satisfied: termcolor in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.1.0)
Requirement already satisfied: six in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.16.0)
Requirement already satisfied: tensorflow-metadata in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (1.2.0)
Requirement already satisfied: dataclasses in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.8)
Requirement already satisfied: requests>=2.19.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (2.26.0)
Requirement already satisfied: importlib-resources in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (5.2.2)
Requirement already satisfied: typing-extensions in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (3.7.4.3)
Requirement already satisfied: protobuf>=3.12.2 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (3.17.3)
Requirement already satisfied: tqdm in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (4.62.2)
Requirement already satisfied: dill in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (0.3.4)
Requirement already satisfied: attrs>=18.1.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-datasets) (21.2.0)
Requirement already satisfied: certifi>=2017.4.17 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2021.5.30)
Requirement already satisfied: charset-normalizer~=2.0.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (2.0.4)
Requirement already satisfied: urllib3=1.21.1 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (1.25.11)
Requirement already satisfied: idna=2.5 in e:\anaconda3\envs\my_star_space\lib\site-packages (from requests>=2.19.0->tensorflow-datasets) (3.2)
Requirement already satisfied: zipp>=3.1.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from importlib-resources->tensorflow-datasets) (3.5.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in e:\anaconda3\envs\my_star_space\lib\site-packages (from tensorflow-metadata->tensorflow-datasets) (1.53.0)
Requirement already satisfied: colorama in e:\anaconda3\envs\my_star_space\lib\site-packages (from tqdm->tensorflow-datasets) (0.4.4)
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the 'e:\anaconda3\envs\my_star_space\python.exe -m pip install --upgrade pip' command.
(my_star_space) C:\Users\admin>

Setting up the input pipeline

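Use TFDS to load the Portuguese-English translation dataset, which comes from the TED Talks Open Translation Project. The dataset contains approximately 50,000 training examples, 1,100 validation examples, and 2,000 test examples.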

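# Load the TED Talks Portuguese-to-English dataset. with_info=True also returns the
# dataset metadata; as_supervised=True yields (Portuguese, English) sentence pairs.
examples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,
                               as_supervised=True)
train_examples, val_examples = examples['train'], examples['validation']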

The download takes a long time. The output is as follows:

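Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\admin\tensorflow_datasets\ted_hrlr_translate\pt_to_en\1.0.0...
Dl Completed...: 100% 1/1 [2:57:36

T
1248 ----> ran
7946 ----> s
7194 ----> former
13 ----> is
2799 ----> awesome
7877 ----> .

Adding start and end tokens to the input and target

BUFFER_SIZE = 20000
BATCH_SIZE = 64

def encode(lang1, lang2):
    # tokenizer_pt / tokenizer_en are the subword tokenizers built from the training set.
    # The start token ID is vocab_size and the end token ID is vocab_size + 1,
    # so neither collides with a real subword ID.
    # .numpy() only works on eager tensors, so this function is meant to be wrapped
    # with tf.py_function when it is mapped over a tf.data.Dataset.
    lang1 = [tokenizer_pt.vocab_size] + tokenizer_pt.encode(
        lang1.numpy()) + [tokenizer_pt.vocab_size + 1]
    lang2 = [tokenizer_en.vocab_size] + tokenizer_en.encode(
        lang2.numpy()) + [tokenizer_en.vocab_size + 1]
    return lang1, lang2

Dropping examples longer than 40 tokens to keep the example small and relatively fast

MAX_LENGTH = 40

def filter_max_length(x, y, max_length=MAX_LENGTH):
    # Keep a pair only if both the source and the target fit within max_length tokens.
    return tf.logical_and(tf.size(x) <= max_length,
                          tf.size(y) <= max_length)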
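The encoding and filtering steps depend on the TFDS subword tokenizers. The plain-Python sketch below illustrates the same ideas without TensorFlow. It makes several assumptions that differ from the real setup:

- The vocabulary is a tiny hypothetical list (`TOY_VOCAB`), not a learned one.
- Segmentation is a simplified greedy longest-match. It is not the exact algorithm of the TFDS subword encoder, which also handles word boundaries and unknown characters differently.
- The helper names `encode_word`, `encode_sentence`, `add_start_end` and `keep_example` are illustrative.

The sketch does follow the article's conventions: the start ID is `vocab_size`, the end ID is `vocab_size + 1`, and pairs longer than `MAX_LENGTH = 40` tokens are dropped.

```python
# Toy subword vocabulary (hypothetical). A real tokenizer learns thousands of
# subwords from the training corpus; index 0 is reserved for unknown characters.
TOY_VOCAB = ["<unk>", "T", "ran", "s", "former", "is", "awesome", "."]
SUBWORD_TO_ID = {piece: i for i, piece in enumerate(TOY_VOCAB)}
UNK_ID = 0
VOCAB_SIZE = len(TOY_VOCAB)

# Same convention as encode() in the article: two IDs just past the vocabulary.
START_ID = VOCAB_SIZE       # start-of-sequence token
END_ID = VOCAB_SIZE + 1     # end-of-sequence token
MAX_LENGTH = 40


def encode_word(word):
    """Split one word into subword IDs, always taking the longest matching prefix."""
    ids = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece != "<unk>" and piece in SUBWORD_TO_ID:
                ids.append(SUBWORD_TO_ID[piece])
                start = end
                break
        else:
            # No vocabulary entry starts here: emit <unk> for this character.
            ids.append(UNK_ID)
            start += 1
    return ids


def encode_sentence(text):
    """Encode a whitespace-separated sentence (word boundaries are not preserved)."""
    ids = []
    for word in text.split():
        ids.extend(encode_word(word))
    return ids


def add_start_end(ids):
    """Wrap a sequence with the start and end token IDs."""
    return [START_ID] + ids + [END_ID]


def keep_example(x_ids, y_ids, max_length=MAX_LENGTH):
    """Keep a pair only if both sequences have at most max_length tokens."""
    return len(x_ids) <= max_length and len(y_ids) <= max_length


if __name__ == "__main__":
    ids = encode_sentence("Transformer is awesome.")
    print([TOY_VOCAB[i] for i in ids])   # ['T', 'ran', 's', 'former', 'is', 'awesome', '.']
    wrapped = add_start_end(ids)
    print(wrapped)                       # [8, 1, 2, 3, 4, 5, 6, 7, 9]
    print(keep_example(wrapped, wrapped))          # True
    print(keep_example(list(range(41)), wrapped))  # False
```

With this toy vocabulary, "Transformer" splits into T / ran / s / former, the same shape as the tokenizer output shown earlier. The IDs here are toy values, not the real tokenizer's.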

