Transformer Course: Business Dialogue Bot Rasa 3.x, Testing Your Assistant

Date: 2023-03-18 11:16:20


Rasa official website

https://rasa.com/

Testing Your Assistant

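Rasa Open Source lets you validate and test dialogues end-to-end by running through test stories. In addition, you can test the dialogue management and the message processing (NLU) separately.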

Validating Data and Stories

Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU data, or story data. To validate your data, run:

rasa data validate

If you pass a max_history value to one or more policies in your config.yml file, provide the smallest of those values to the validator, as follows:

rasa data validate --max-history <max_history>
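As a rough illustration of "provide the smallest of those values", the sketch below picks the minimum max_history from a list of policy entries. The policies list is a hypothetical in-memory stand-in for the policies section of config.yml, and smallest_max_history is an illustrative helper, not part of Rasa.

```python
from typing import Dict, List, Optional


def smallest_max_history(policies: List[Dict]) -> Optional[int]:
    """Return the smallest max_history set on any policy, or None if no policy sets it."""
    values = [p["max_history"] for p in policies if p.get("max_history") is not None]
    return min(values) if values else None


# Hypothetical policy entries, shaped like the `policies` list in config.yml.
policies = [
    {"name": "MemoizationPolicy", "max_history": 3},
    {"name": "TEDPolicy", "max_history": 5, "epochs": 100},
    {"name": "RulePolicy"},  # no max_history configured
]

value = smallest_max_history(policies)
if value is not None:
    # Show the validator command that matches this configuration.
    print(f"rasa data validate --max-history {value}")
```

With these example entries the helper selects 3, the smaller of the two configured values.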

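If data validation results in errors, training a model can also fail or yield poor performance, so it is always good to run this check before training a model. By including the --fail-on-warnings flag, this step will also fail on warnings that indicate more minor issues.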
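One way to run this check before every training job is to wrap the command in a small script. The sketch below only assembles the command line from the two flags mentioned above. The helper names are hypothetical, and treating a zero exit status as "validation passed" is an assumption to confirm against your Rasa version.

```python
import subprocess
from typing import List, Optional


def build_validate_command(max_history: Optional[int] = None,
                           fail_on_warnings: bool = False) -> List[str]:
    """Assemble the `rasa data validate` command line as an argument list."""
    command = ["rasa", "data", "validate"]
    if max_history is not None:
        command += ["--max-history", str(max_history)]
    if fail_on_warnings:
        command.append("--fail-on-warnings")
    return command


def validation_passes(**options) -> bool:
    """Run the validator; assumes exit status 0 means the data passed."""
    return subprocess.run(build_validate_command(**options)).returncode == 0


# Only print the command here; call validation_passes() where rasa is installed.
print(" ".join(build_validate_command(max_history=3, fail_on_warnings=True)))
```

In a CI job, a script could call validation_passes(fail_on_warnings=True) and skip training when it returns False.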

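Running rasa data validate does not test whether your rules are consistent with your stories. However, during training, the RulePolicy checks for conflicts between rules and stories. Any such conflict will abort training.

To read more about the validator and all of the available options, see the documentation for rasa data validate.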

Writing Test Stories

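Testing your trained model on test stories is the best way to have confidence in how your assistant will act in certain situations. Written in a modified story format, test stories allow you to provide entire conversations and test that, given certain user input, your model will behave in the expected manner. This is especially important as you start introducing more complicated stories from user conversations.

Test stories are like the stories in your training data, but include the user message as well.
Here are some examples: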

  • Basics
    tests/test_stories.yml

    stories:
    - story: A basic story test
      steps:
      - user: |
          hello
        intent: greet
      - action: utter_ask_howcanhelp
      - user: |
          show me [chinese]{"entity": "cuisine"} restaurants
        intent: inform
      - action: utter_ask_location
      - user: |
          in [Paris]{"entity": "location"}
        intent: inform
      - action: utter_ask_price
  • Custom Actions
    tests/test_stories.yml

    stories:
    - story: A test where a custom action returns events
      steps:
      - user: |
          hey
        intent: greet
      - action: my_custom_action
      - slot_was_set:
        - my_slot: "value added by custom action"
      - action: utter_ask_age
      - user: |
          thanks
        intent: thankyou
      - action: utter_no_worries
  • Forms Happy Path
    tests/test_stories.yml

    stories:
    - story: A test story with a form
      steps:
      - user: |
          hi
        intent: greet
      - action: utter_greet
      - user: |
          im looking for a restaurant
        intent: request_restaurant
      - action: restaurant_form
      - active_loop: restaurant_form
      - user: |
          [afghan](cuisine) food
        intent: inform
      - action: restaurant_form
      - active_loop: null
      - action: utter_slots_values
      - user: |
          thanks
        intent: thankyou
      - action: utter_no_worries
  • Forms Unhappy Path
    tests/test_stories.yml

    stories:
    - story: A test story with unexpected input during a form
      steps:
      - user: |
          hi
        intent: greet
      - action: utter_greet
      - user: |
          im looking for a restaurant
        intent: request_restaurant
      - action: restaurant_form
      - active_loop: restaurant_form
      - user: |
          How's the weather?
        intent: chitchat
      - action: utter_chitchat
      - action: restaurant_form
      - active_loop: null
      - action: utter_slots_values
      - user: |
          thanks
        intent: thankyou
      - action: utter_no_worries

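By default, the command runs tests on stories from any files whose names start with test_. You can also provide a specific test stories file or directory with the --stories argument. You can test your assistant against the test stories by running: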

rasa test
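To see which files the test_ naming rule would pick up in a project, a small script can list them. This is only an approximation of the rule described above: the text does not state which directories or file extensions Rasa actually scans, so the recursive search and the .yml/.yaml filter here are assumptions, and find_test_story_files is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List


def find_test_story_files(root: str) -> List[Path]:
    """List YAML files under root whose file name starts with test_."""
    return sorted(
        path for path in Path(root).rglob("test_*")
        if path.is_file() and path.suffix in {".yml", ".yaml"}
    )


# Demonstrate on a throwaway directory with one matching and one non-matching file.
with tempfile.TemporaryDirectory() as tmp:
    tests_dir = Path(tmp) / "tests"
    tests_dir.mkdir()
    (tests_dir / "test_stories.yml").write_text("stories: []\n")
    (tests_dir / "stories.yml").write_text("stories: []\n")
    print([path.name for path in find_test_story_files(tmp)])
```

Only the file whose name begins with test_ is listed; stories.yml is ignored.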

test_stories.yml

#### This file contains tests to evaluate that your bot behaves as expected.
#### If you want to learn more, please see the docs: https://rasa.com/docs/rasa/testing-your-assistant

stories:
- story: happy path 1
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy

- story: happy path 2
  steps:
  - user: |
      hello there!
    intent: greet
  - action: utter_greet
  - user: |
      amazing
    intent: mood_great
  - action: utter_happy
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: sad path 1
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      yes
    intent: affirm
  - action: utter_happy

- story: sad path 2
  steps:
  - user: |
      hello
    intent: greet
  - action: utter_greet
  - user: |
      not good
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      not really
    intent: deny
  - action: utter_goodbye

- story: sad path 3
  steps:
  - user: |
      hi
    intent: greet
  - action: utter_greet
  - user: |
      very terrible
    intent: mood_unhappy
  - action: utter_cheer_up
  - action: utter_did_that_help
  - user: |
      no
    intent: deny
  - action: utter_goodbye

- story: say goodbye
  steps:
  - user: |
      bye-bye!
    intent: goodbye
  - action: utter_goodbye

- story: bot challenge
  steps:
  - user: |
      are you a bot?
    intent: bot_challenge
  - action: utter_iamabot

intent_report.json

{
  "deny": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 7, "confused_with": {}},
  "mood_unhappy": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 14, "confused_with": {}},
  "greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 13, "confused_with": {}},
  "mood_great": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 14, "confused_with": {}},
  "bot_challenge": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 4, "confused_with": {}},
  "affirm": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 6, "confused_with": {}},
  "goodbye": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 10, "confused_with": {}},
  "accuracy": 1.0,
  "macro avg": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 68},
  "weighted avg": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 68}
}
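A report like this can be checked automatically instead of being read by eye. The sketch below flags intents whose F1 score falls under a threshold. It relies on the observation that per-intent entries in the report above carry a confused_with field while the summary rows do not. The sample numbers, the 0.9 threshold, and the weak_intents helper are all hypothetical.

```python
import json
from typing import Dict, List


def weak_intents(report: Dict, min_f1: float = 0.9) -> List[str]:
    """Return the intents whose f1-score is below min_f1, skipping summary rows."""
    return sorted(
        name for name, scores in report.items()
        if isinstance(scores, dict)
        and "confused_with" in scores  # only per-intent entries have this key
        and scores["f1-score"] < min_f1
    )


# Made-up report in the same shape as intent_report.json, with one weak intent.
sample = json.loads("""
{
  "greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0,
            "support": 13, "confused_with": {}},
  "deny": {"precision": 0.75, "recall": 0.6, "f1-score": 0.667,
           "support": 5, "confused_with": {"mood_unhappy": 2}},
  "accuracy": 0.89,
  "macro avg": {"precision": 0.875, "recall": 0.8, "f1-score": 0.833, "support": 18}
}
""")

print(weak_intents(sample))
```

For a real run, load the intent_report.json written by rasa test with json.load and pass the resulting dictionary to the same function.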

story_report.json

{
  "goodbye": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2},
  "utter_cheer_up": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3},
  "utter_did_that_help": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3},
  "utter_iamabot": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1},
  "mood_unhappy": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3},
  "greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 5},
  "utter_greet": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 5},
  "mood_great": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2},
  "deny": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 2},
  "bot_challenge": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1},
  "utter_happy": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3},
  "utter_goodbye": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 4},
  "affirm": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 1},
  "action_listen": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 16},
  "accuracy": 1.0,
  "macro avg": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 51},
  "weighted avg": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 51},
  "conversation_accuracy": {"accuracy": 1.0, "correct": 7, "with_warnings": 0, "total": 7}
}
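The conversation_accuracy entry is a convenient single number to gate a build on. The following sketch recomputes the ratio from correct and total and compares it with a minimum. The helper name and the thresholds are assumptions, and the first dictionary simply copies the values from the report above.

```python
from typing import Dict


def conversation_accuracy_ok(report: Dict, minimum: float = 1.0) -> bool:
    """Check that the share of fully correct test conversations reaches minimum."""
    summary = report["conversation_accuracy"]
    if summary["total"] == 0:
        # No test stories were run, so there is nothing to trust.
        return False
    return summary["correct"] / summary["total"] >= minimum


# Values copied from the story_report.json shown above: 7 of 7 stories correct.
from_report = {"conversation_accuracy": {"accuracy": 1.0, "correct": 7,
                                         "with_warnings": 0, "total": 7}}
# Hypothetical failing run: 5 of 7 stories correct.
failing = {"conversation_accuracy": {"accuracy": 0.714, "correct": 5,
                                     "with_warnings": 0, "total": 7}}

print(conversation_accuracy_ok(from_report))
print(conversation_accuracy_ok(failing, minimum=0.8))
```

Requiring every story to pass (minimum 1.0) is strict. A lower threshold may suit a larger test set, at the cost of letting some regressions through.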

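Conversation tests are only as accurate as the test cases you include, so you should keep growing your set of test stories as you improve your assistant. A good rule of thumb is that your test stories should be representative of the true distribution of real conversations. Rasa X makes it easy to add test conversations based on real conversations.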

For more configuration options, see the CLI documentation on rasa test.

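Testing Custom Actions

Custom actions are not executed as part of test stories. If your custom actions append any events to the conversation, this has to be reflected in your test story (for example, by adding slot_was_set events to your test story).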

To test the code of your custom actions, you should write unit tests for them and include these tests in your CI/CD pipeline.
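One approach that keeps such unit tests simple is to move the decision logic of an action into a plain function and test that function directly. In the sketch below, slot_events_for_action is a hypothetical helper standing in for the core of my_custom_action. The dictionary shape is meant to resemble a slot event but is an assumption, so compare it with what your action server actually returns.

```python
from typing import Dict, List


def slot_events_for_action(user_text: str) -> List[Dict]:
    """Hypothetical core of a custom action: decide which slot events to emit."""
    value = user_text.strip().lower()
    if not value:
        return []  # nothing usable in the message, so no slot is set
    return [{"event": "slot", "name": "my_slot", "value": value}]


# Plain test functions with asserts; a runner such as pytest can collect them.
def test_sets_slot_from_text():
    events = slot_events_for_action("  Hey ")
    assert events == [{"event": "slot", "name": "my_slot", "value": "hey"}]


def test_blank_text_sets_nothing():
    assert slot_events_for_action("   ") == []


test_sets_slot_from_text()
test_blank_text_sets_nothing()
```

Whatever events the real action emits should then appear as slot_was_set steps in the matching test story, since the action itself is not run there.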

Evaluating an NLU Model

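In addition to testing stories, you can also test the natural language understanding (NLU) model separately. Once your assistant is deployed in the real world, it will process messages that it has not seen in the training data. To simulate this, you should always set aside some part of your data for testing. You can split your NLU data into train and test sets using: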

rasa data split nlu
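Conceptually, a split like this shuffles the examples of each intent and holds some of them back. The sketch below illustrates that idea only and is not Rasa's implementation: the 0.8 training fraction, the fixed seed, and the split_examples helper are assumptions chosen for the example.

```python
import random
from typing import Dict, List, Tuple

Examples = Dict[str, List[str]]


def split_examples(examples_by_intent: Examples,
                   training_fraction: float = 0.8,
                   seed: int = 42) -> Tuple[Examples, Examples]:
    """Split every intent's examples into a training part and a held-out part."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: Examples = {}
    test: Examples = {}
    for intent, examples in examples_by_intent.items():
        shuffled = list(examples)
        rng.shuffle(shuffled)
        # Keep at least one example of every intent on the training side.
        cut = max(1, round(len(shuffled) * training_fraction))
        train[intent] = shuffled[:cut]
        test[intent] = shuffled[cut:]
    return train, test


data = {
    "greet": ["hey", "hello", "hi", "moin", "hey there",
              "good morning", "good evening", "hey dude", "let's go", "hello there"],
    "deny": ["no", "n", "never", "no way", "not really"],
}
train, test = split_examples(data)
print({intent: (len(train[intent]), len(test[intent])) for intent in data})
```

Splitting per intent keeps every intent represented in the training part, which matters for small intents such as bot_challenge.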

test_data.yml

version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a bot?
- intent: affirm
  examples: |
    - of course
- intent: deny
  examples: |
    - no way
    - n
- intent: goodbye
  examples: |
    - have a nice day
    - cu
- intent: greet
  examples: |
    - let's go
    - hi
- intent: mood_great
  examples: |
    - so perfect
    - great
    - wonderful
- intent: mood_unhappy
  examples: |
    - I'm so sad
    - very sad
    - so saad

training_data.yml

version: "3.0"
nlu:
- intent: bot_challenge
  examples: |
    - are you a human?
    - am I talking to a human?
    - am I talking to a bot?
- intent: affirm
  examples: |
    - correct
    - indeed
    - y
    - that sounds good
    - yes
- intent: deny
  examples: |
    - never
    - I don't think so
    - not really
    - no
    - don't like that
- intent: goodbye
  examples: |
    - cee you later
    - good night
    - good by
    - goodbye
    - bye bye
    - see you around
    - see you later
    - bye
- intent: greet
  examples: |
    - hello there
    - good afternoon
    - good morning
    - goodevening
    - hey there
    - goodmorning
    - hey
    - hey dude
    - moin
    - hello
    - good evening
- intent: mood_great
  examples: |
    - super stoked
    - I am going to save the world
    - I am feeling very good
    - so good
    - I am great
    - perfect
    - extremely good
    - amazing
    - so so perfect
    - I am amazing
    - feeling like a king
- intent: mood_unhappy
  examples: |
    - my day was horrible
    - I am disappointed
    - not very good
    - not good
    - so sad
    - unhappy
    - I am sad
    - I don't feel very well
    - super sad
    - extremly sad
    - sad

Next, you can see how well your trained NLU model predicts the data from the test set you generated, using:

rasa test nlu --nlu train_test_split/test_data.yml
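The precision, recall, f1-score, and support fields in the intent report follow the usual classification definitions. As a small worked example, the sketch below computes them from a list of (true intent, predicted intent) pairs. The pairs are made up and intent_scores is a hypothetical helper, not a Rasa function.

```python
from collections import Counter
from typing import Dict, List, Tuple


def intent_scores(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Compute per-intent precision, recall, F1 and support from (true, predicted) pairs."""
    true_counts = Counter(true for true, _ in pairs)
    predicted_counts = Counter(predicted for _, predicted in pairs)
    correct = Counter(true for true, predicted in pairs if true == predicted)
    scores = {}
    for intent, support in true_counts.items():
        hits = correct[intent]
        precision = hits / predicted_counts[intent] if predicted_counts[intent] else 0.0
        recall = hits / support
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[intent] = {"precision": precision, "recall": recall,
                          "f1-score": f1, "support": support}
    return scores


# Made-up predictions: one "deny" message is mistaken for "mood_unhappy".
pairs = [
    ("greet", "greet"), ("greet", "greet"),
    ("deny", "deny"), ("deny", "mood_unhappy"),
    ("mood_unhappy", "mood_unhappy"),
]
for intent, values in intent_scores(pairs).items():
    print(intent, values)
```

The single confusion lowers recall for deny and precision for mood_unhappy, which is the kind of pairing the confused_with field of the report records.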

To test your model more extensively, use cross-validation, which automatically creates multiple train/test splits:

rasa test nlu --nlu data/nlu.yml --cross-validation
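To make "multiple train/test splits" concrete, the sketch below builds k folds so that every example is held out exactly once. It is a plain k-fold illustration under assumptions: the fold count of 5 and the k_fold_splits helper are hypothetical, and Rasa's own procedure may shuffle or balance intents differently.

```python
from typing import List, Tuple


def k_fold_splits(items: List[str], folds: int = 5) -> List[Tuple[List[str], List[str]]]:
    """Return (train, test) pairs in which each item is held out exactly once."""
    if not 2 <= folds <= len(items):
        raise ValueError("folds must be between 2 and the number of items")
    splits = []
    for k in range(folds):
        # Every folds-th item, starting at offset k, forms the held-out part.
        test = items[k::folds]
        train = [item for index, item in enumerate(items) if index % folds != k]
        splits.append((train, test))
    return splits


examples = [f"example {i}" for i in range(10)]
for train, test in k_fold_splits(examples, folds=5):
    print(len(train), "train /", len(test), "test:", test)
```

Averaging the scores over all folds gives a steadier estimate than a single split, which helps when the NLU data set is small.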
