Performance Testing of Two Open-Source Chatbots (Part 1)



2023-01-05 09:35 | Source: compiled from the web

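I have recently been studying natural language processing, and Xiaoice (小冰) on QQ has been getting a lot of attention again, so I tried out two open-source chatbots. Here I share a few small lessons from that, in the hope that they are of some use to people with similar interests.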

I tested two chatbots: ChatterBot, and a TensorFlow-based chatbot. Let's start with ChatterBot.

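ChatterBot is a machine-learning-based conversational dialog engine written in Python (it is a third-party library, not part of Python itself). It generates responses based on a collection of known conversations, and its language-independent design means it can be trained to converse in any language. The project's source code: https://github.com/gunthercox/ChatterBot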
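To make "generates responses based on known conversations" concrete, here is a toy retrieval sketch in Python. It is not ChatterBot's code: it assumes a tiny hard-coded conversation, treats each line as the reply to the line before it, and uses difflib's SequenceMatcher as a stand-in similarity measure. Roughly speaking, ChatterBot's default BestMatch logic adapter works along similar lines: find the known statement closest to the input and return a response that followed it.

```python
from difflib import SequenceMatcher

# Toy conversation (illustrative data): each item is treated as the reply
# to the item before it.
CONVERSATION = [
    "What is your name?",
    "My name is ChatterBot.",
    "Nice weather today",
    "Yes, perfect weather for going out.",
]

# (statement, response) pairs built from consecutive items.
PAIRS = list(zip(CONVERSATION, CONVERSATION[1:]))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib stands in for ChatterBot's own comparators."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def get_response(text: str) -> str:
    """Return the reply paired with the known statement most similar to `text`."""
    _, best_reply = max(PAIRS, key=lambda pair: similarity(text, pair[0]))
    return best_reply


if __name__ == "__main__":
    print(get_response("what's your name"))
    print(get_response("nice weather"))
```

Because the list is a chain, "My name is ChatterBot." is also stored as a prompt whose reply is "Nice weather today"; this is one way a purely retrieval-based bot can end up producing non sequiturs like those in the transcripts below.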

Corpus: [corpus link]

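I tested the chatbot by conversing with it on three types of topics: small talk, task-oriented requests, and knowledge questions. Now for the tests themselves.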

My test environment was Ubuntu 14.04 (64-bit) with PyCharm Edu 4.0. The environment for the TensorFlow-based chatbot was essentially the same.

The steps are as follows:

1. Install: on the command line, run sudo pip install chatterbot

If pip3 is installed, I recommend sudo pip3 install chatterbot instead; that way you can skip step 2 and go straight to step 3 (testing).

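2. By default, chatterbot is installed under Python 2, but training on the Chinese corpus only works under Python 3; under Python 2 it raises a UnicodeDecodeError. Here I logged in as root and copied the freshly installed chatterbot and its dependencies from the Python 2 directory (/usr/local/lib/python2.7/dist-packages/) to the Python 3 directory (/usr/local/lib/python3.4/dist-packages/). The folders in question are shown in the figure below: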

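(If these folders are hard to find, look at the "Modified" time column on the right: the most recently modified ones are the ones you want.)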
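The UnicodeDecodeError most likely comes from Python 2 decoding the UTF-8 corpus bytes with its default ASCII codec; the post does not show the traceback, so treat this as an assumption. The small Python 3 sketch below shows that UTF-8 Chinese text fails to decode as ASCII but decodes fine as UTF-8, and also checks whether the interpreter running it can import chatterbot, a quick way to confirm the packages (copied, or installed with pip3) ended up where that interpreter looks. The helper names are illustrative only.

```python
import importlib.util
import sys


def decodes_with(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes without error using `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


def chatterbot_visible() -> bool:
    """Return True if the interpreter running this script can import chatterbot."""
    return importlib.util.find_spec("chatterbot") is not None


if __name__ == "__main__":
    sample = "你好".encode("utf-8")  # UTF-8 bytes, as stored in a Chinese corpus file
    print("Python", sys.version.split()[0])
    print("decodes as ascii:", decodes_with(sample, "ascii"))
    print("decodes as utf-8:", decodes_with(sample, "utf-8"))
    print("chatterbot importable:", chatterbot_visible())
```

Run it with the same interpreter PyCharm is configured to use, so the import check reflects the project setup.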

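After copying the folders, open PyCharm and create a new project named MyChatterBot. Go to File >> Settings, expand Project: MyChatterBot, click Project Interpreter, pick python3.4 from the Project Interpreter drop-down at the top right of the window, and click Apply, as shown in the figure: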

3. Performance testing

First, small talk.

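Before the formal test I had tried out its training feature, so it had learned a few extra question-answer pairs; all of the tests below build on this simple training. Here is my training code: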

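#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ChatterBot 0.x API, as used at the time; from 1.0 on, set_trainer()/train()
# are gone and you write ListTrainer(my_bot).train([...]) instead.
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

my_bot = ChatBot("Training demo")
my_bot.set_trainer(ListTrainer)
# Each item is learned as a response to the item before it
my_bot.train([
    "你叫什么名字？",                    # What's your name?
    "我叫ChatterBot。",                  # My name is ChatterBot.
    "今天天气真好",                      # The weather is really nice today
    "是啊，这种天气出去玩再好不过了。",  # Yes, this weather is perfect for going out.
    "那你有没有想去玩的地方？",          # So is there anywhere you'd like to go?
    "我想去有山有水的地方。你呢？",      # Somewhere with mountains and water. You?
    "没钱哪都不去",                      # I'm not going anywhere without money
    "哈哈，这就比较尴尬了",              # Haha, that's a bit awkward
])

while True:
    print(my_bot.get_response(input("user:")))

Now for the actual chat.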

Here "user" marks my input; unlabeled lines are the bot's replies.

The test code is as follows:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot("myBot")
chatbot.set_trainer(ChatterBotCorpusTrainer)

# Train it on the Chinese corpus
chatbot.train("chatterbot.corpus.chinese")

lineCounter = 1
# Start the conversation
while True:
    print(chatbot.get_response(input("(" + str(lineCounter) + ") user:")))
    lineCounter += 1

The results are as follows:

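(1) user:Hello

Hello

(2) user:What's your name

My name is ChatterBot.

(3) user:How old are you

Hello

(4) user:Are you male or female

What is it like?

(5) user:Rectangular, and full of text inside

I'm doing fine too

(6) user:What's fine?

That's good.

(7) user:What's that

What would you like to know about?

(8) user:About computers

What is it like?

(9) user:I don't know what it's like either

What would you like to know about?

(10) user:About programmers

What language do you use?

(11) user:C++ and python

I don't much like Java.

(12) user:I didn't ask whether you like it. Are you a programmer?

I am a programmer

(13) user:What language do you use?

I often use Python, Java and C++.

(14) user:I don't much like Java

What's bothering you?

(15) user:I'm not bothered, I'm just copying what you said

I'm fine too.

(16) user:Fine what?

That's good.

(17) user:You're so dumb

Hello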

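(18) user:Fine. Have you heard of the Turing test?

My feelings for you are the unique trust and friendship between a human and a bot; you could call it love.

(19) user:We've barely talked and you're already talking about feelings

Anything else delicious?

(20) user:No, I've already eaten it all

Something else is fine too

(21) user:I have a piece of candy, do you want it?

My feelings for you are the unique trust and friendship between a human and a bot; you could call it love.

(22) user:Is something wrong with your head?

What's wrong?

(23) user:You're a bit slow on the uptake

Thanks

(24) user:I'm speechless

Can I help you with anything?

(25) user:Please answer a bit more normally

Eating, drinking, sleeping and traveling. Do you like traveling?

(26) user:Sure. What about you?

Great, let's find a time to travel around the world together!

(27) user:Easy to say. Do you have money?

My feelings for you are the unique trust and friendship between a human and a bot; you could call it love.

(28) user:OK, OK, I get it. Let's talk about something else

Easy to say. Do you have money?

(29) user:I don't have money. Do you?

I'm doing fine too

(30) user:Where do you want to go?

I want to go somewhere with mountains and water. What about you?

(31) user:Same as you. Can you recommend a place?

I want to go somewhere with mountains and water. What about you?

(32) user:I also want to go somewhere with mountains and water. Where do you want to go?

I'm not going anywhere without money

(33) user:Fine, fine. Do you like watching movies?

Alice in Wonderland

(34) user:What's that movie like?

Nothing much.

(35) user:Come on, tell me

I don't have money. Do you?

(36) user:Stop talking about money; it hurts the friendship

I'm doing fine too

(37) user:Enough, I don't want to chat with you anymore. Bye

So is there anywhere you'd like to go?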

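Accuracy on small talk: I consider (1), (2), (10), (11), (12), (13), (20), (22), (26), (30), (32), (33) and (34) correct, i.e. 13 of 37 turns, or about 35.14%. Of these, (2), (30) and (32) were only answered correctly thanks to my simple training; excluding them, the accuracy with the original corpus from GitHub is about 27.03% (10 of 37).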
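Both percentages follow from simple counting. This Python snippet reproduces them from the turn numbers listed above; the function and variable names are illustrative.

```python
def accuracy(correct_turns, total_turns):
    """Share of turns judged correct, as a percentage rounded to 2 decimals."""
    return round(100 * len(set(correct_turns)) / total_turns, 2)


# Turn numbers judged correct in the 37-turn small-talk test
judged_correct = [1, 2, 10, 11, 12, 13, 20, 22, 26, 30, 32, 33, 34]
# Turns answered correctly only because of the extra ListTrainer training
learned_from_training = {2, 30, 32}
total_turns = 37

print(accuracy(judged_correct, total_turns))  # 35.14
baseline = [t for t in judged_correct if t not in learned_from_training]
print(accuracy(baseline, total_turns))  # 27.03
```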

Task-oriented questions

The test code is the same as above. Results:

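(1) user:Look up today's train tickets from Wuhan to Beijing for me

Yes, this kind of weather is perfect for going out.

(2) user:Look up hotels near Wuhan Railway Station for me

That's good.

(3) user:Tomorrow's weather

Yes, this kind of weather is perfect for going out.

(4) user:Check the remaining tickets for Shanghai Disneyland tomorrow

Thank you very much.

(5) user:Find a route from Wuhan Hankou Station to the Light of the Future in Optics Valley

What is it like?

(6) user:Find where the folder named ChatterBot is on my computer

The weather is really nice today

(7) user:Check this machine's operating system type

The Andromeda Galaxy.

(8) user:Go online and find the name of Taylor Swift's latest song

My name is ChatterBot.

(9) user:Count the number of characters in everything I've asked you so far

Go ahead, ask.

(10) user:Check which films are showing at Hengdian World Studios tomorrow afternoon

Could I borrow a cup of sugar?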

Clearly, the accuracy is 0.

Knowledge questions

The test code is the same as above. Results:

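(1) user:How big is China?

Stop talking about money; it hurts the friendship

(2) user:Where is the capital of China?

Eating, drinking, sleeping and traveling. Do you like traveling?

(3) user:What is the population of China?

I don't have money. Do you?

(4) user:What is China's largest island?

The United Kingdom of Great Britain

(5) user:In which year was the People's Republic of China founded?

1963

(6) user:What is the capital city of Anhui Province?

The United Kingdom of Great Britain

(7) user:What parts make up the United Kingdom?

My feelings for you are the unique trust and friendship between a human and a bot; you could call it love.

(8) user:What was the name of the first human to walk on the Moon?

My name is ChatterBot.

(9) user:What is the Turing test

Love, as I understand it, is accepting, tolerating and understanding others. It is also accepting, tolerating and understanding this world and oneself.

(10) user:What does "winter" mean in Chinese

Eating, drinking, sleeping and traveling. Do you like traveling?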

Clearly, the accuracy is again 0.

Math and time questions

Here is the test code:

# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot(
    "Math & Time Bot",
    # Logic adapters for handling math and time questions
    logic_adapters=[
        "chatterbot.logic.MathematicalEvaluation",
        "chatterbot.logic.TimeLogicAdapter"
    ],
    # Input and output adapters
    input_adapter="chatterbot.input.VariableInputTypeAdapter",
    output_adapter="chatterbot.output.OutputAdapter"
)
chatbot.set_trainer(ChatterBotCorpusTrainer)

# Train it on the Chinese corpus
chatbot.train("chatterbot.corpus.chinese")

lineCounter = 1
# One loop iteration per user input
while True:
    try:
        print(chatbot.get_response(input("(" + str(lineCounter) + ") user:")))
        lineCounter += 1
    # Press Ctrl-C or Ctrl-D to exit
    except (KeyboardInterrupt, EOFError, SystemExit):
        break

Here are the results:

(1) user:4+9=?

4 + 9 = 13

(2) user:四加九等于多少 ("what is four plus nine", written out in Chinese)

The current time is 09:32 PM

(3) user:现在什么时间 ("what time is it now")

The current time is 09:33 PM

(4) user:1-1/2

( 0.5 ) = 0.5

(5) user:(1-1)/2

0.0 = 0.0

(6) user:3乘以5= ("3 times 5 =")

The current time is 09:33 PM

(7) user:What time is it?

The current time is 09:33 PM

(8) user:小明以1m/s的速度跑10秒，距离为多少米？ ("Xiao Ming runs at 1 m/s for 10 seconds; how many meters does he cover?")

The current time is 09:35 PM

(9) user:2^8

( 10 ) = 10

(10) user:2^3=

As you can see, (1), (3), (4), (7) and (10) are correct, for an accuracy of 50%.
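A math adapter essentially has to turn the input into an arithmetic expression and evaluate it. The Python sketch below does this safely with the standard ast module, assuming plain Python operator semantics; it is not ChatterBot's MathematicalEvaluation code. Under those semantics `^` is bitwise XOR, so `2^8` evaluates to 10, the same value as response (9); whether that is really what happened inside ChatterBot is not confirmed by the test. Inputs written in Chinese words, such as (2) and (6), are rejected here, which is plausibly why those questions fell through to the time adapter.

```python
import ast
import operator

# Binary operators with Python semantics; note that ^ is bitwise XOR here.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.BitXor: operator.xor,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(question: str):
    """Evaluate a bare arithmetic expression such as '4+9=?' or '(1-1)/2'.

    A trailing '?' and '=' are stripped first. Input that is not purely
    numbers and the operators above (e.g. '四加九等于多少') raises ValueError.
    """
    expression = question.strip().rstrip("?").rstrip("=").strip()
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not an arithmetic expression: {question!r}") from exc

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Names (including Chinese words), calls, etc. are not arithmetic
        raise ValueError(f"not an arithmetic expression: {question!r}")

    return walk(tree)


if __name__ == "__main__":
    for q in ["4+9=?", "1-1/2", "(1-1)/2", "2^8", "四加九等于多少", "3乘以5="]:
        try:
            print(q, "->", evaluate(q))
        except ValueError as err:
            print(q, "->", err)
```

Exponentiation would need `**` (ast.Pow) added to the operator table; it is left out to mirror the `^` behaviour discussed above.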

Overall, the results so far are not good, but this may partly be because the Chinese corpus is so small. So let's test English as well.

English small talk

Test code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

chatbot = ChatBot("myBot")
chatbot.set_trainer(ChatterBotCorpusTrainer)

# Train it on the English corpus
chatbot.train("chatterbot.corpus.english")

lineCounter = 1
# Start the conversation
while True:
    try:
        print(chatbot.get_response(input("(" + str(lineCounter) + ") user:")))
        lineCounter += 1
    # Press Ctrl-C or Ctrl-D to exit
    except (KeyboardInterrupt, EOFError, SystemExit):
        break

Results:

(1) user:hello

(2) user:What's your name?

I am still young

(3) user:How old are you?

Who? Who is but a form following the function of what

(4) user:Where are you come from?

A man in a mask.

(5) user:Are you a boy?

Yes I am.

(6) user:Do you have a girlfriend?

i certainly have lost many friends over the years.

(7) user:OK.Do you like dogs?

not especially. i am not into violence.

(8) user:Do you like sports?

not especially. i am not into violence.

(9) user:you are handsome

you are crazy

(10) user:It's a nice day today

only to other robots.

Judging by these results, only (1) and (5) received appropriate answers, an accuracy of 20%.

English task-oriented questions:

Tested with the same code as above.

Results:

(1) user:Inquire about the weather tomorrow

do you think the south was right?

(2) user:Please book me a ticket from Hefei to Wuhai tomorrow

i prefer to not hurt your feelings.

(3) user:Check the hotel near Hefei South Station

Relationships are simple for me.

(4) user:Inquirey Disney Shanghai single day tickets

That's my name.

(5) user:Check whether the Shanghai Metro Line 1 has Nanjing West Road

Europe

(6) user:How many characters does "Iine" have?

Apple makes hardware and softwares to run on them. Microsft only makes OS. HP makes only computers. These are just few names among hundred others.

(7) user:Help order a Xiaomi phone from Jingdong

my favorite movie is lord of the rings

(8) user:execute the command sudo reboot in the console

No.

(9) user:Print the file on the desktop print.txt

andrew jackson.

(10) user:Play the music "Heal the world"

Too much.

Unfortunately, the accuracy is 0 once again.

English knowledge questions

Same test code as above.

Results:

(1) user:How large is China's land area?

vineland is a novel by thomas pynchon.

(2) user:Which state is Britain in?

an invitation to a burial

(3) user:What are the parts of England?

i'm not a physicist, but i think this has something to do with heat, entropy, and conservation of energy, right?

(4) user:How many bits of an integet data in Python?

If the implementation is hard to explain, it's a bad idea.

(5) user:Which year did the second world war break out?

its hard to say but The ENIAC is regarded as the first computer. It was developed at University of Pennsylvania in 1946.

ChatterBot loses again: the accuracy is 0.

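Taking all of these tests together, ChatterBot's accuracy is far below 50%, and 0 in many categories. It gets a few small-talk questions right, but it knows nothing about task-oriented or knowledge questions. This is presumably because the open-source project has no modules for such tasks, its corpus is too small, and it cannot look things up online on its own. Getting its answers to reliably match the questions would take a great deal of work. Its speed is also not ideal: sometimes there is a slight lag.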
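The conclusion attributes the zero scores partly to missing task modules. As a hedged illustration of what such a module would minimally involve, here is a keyword-based intent router in Python: a request is matched against ordered patterns and dispatched to a handler, with a fallback to small-talk retrieval when nothing matches. The rules and handlers (handle_weather, handle_train_ticket, handle_hotel) are hypothetical placeholders; they call no real service and are not part of ChatterBot.

```python
import re
from typing import Optional


# Hypothetical handlers: a real system would query a weather API, a ticketing
# service, etc. Here they only report which intent was recognised.
def handle_weather(text: str) -> str:
    return "intent=weather"


def handle_train_ticket(text: str) -> str:
    return "intent=train_ticket"


def handle_hotel(text: str) -> str:
    return "intent=hotel"


# Ordered (pattern, handler) rules covering Chinese and English keywords;
# the first matching rule wins.
INTENT_RULES = [
    (re.compile(r"天气|weather", re.IGNORECASE), handle_weather),
    (re.compile(r"火车票|ticket", re.IGNORECASE), handle_train_ticket),
    (re.compile(r"酒店|hotel", re.IGNORECASE), handle_hotel),
]


def route(text: str) -> Optional[str]:
    """Dispatch `text` to the first matching task handler, or return None."""
    for pattern, handler in INTENT_RULES:
        if pattern.search(text):
            return handler(text)
    return None  # no task module matched: fall back to small-talk retrieval


if __name__ == "__main__":
    for request in ["明天天气", "帮我查询武汉到北京今天的火车票",
                    "Check the hotel near Hefei South Station", "你好"]:
        print(request, "->", route(request))
```

Keyword rules like these are brittle; they only illustrate the dispatch step that the tested configuration lacks.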
