Functionally testing chatbots (part 1)

This is the first post in a series where I will talk about my experiences trying to functionally test chatbots built with the Bot Framework. This first post covers the roadblocks that I encountered when trying to create functional tests that were easily reproducible and automatable. The rest of the series will look at a framework for testing Bot Framework chatbots that I have recently been developing.

When I first started thinking about how to test the chatbots that I've written, I had the following thought:

Although the Bot Framework is very new, it should be straightforward to write functional tests because interacting with a bot is just sending HTTP requests to a web API.

Yes, you do interact with the bot via a web API over HTTP, which already has proven methods of functional testing; but testing a chatbot is very different. Anything more than the most basic of chatbots will have conversation state and conversation flow to worry about. There is also the fact that a chatbot can respond multiple times to a single user message, and that after sending a message you don't immediately get a response: you have to ask for any new responses since the last time you checked.
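To make that interaction model concrete, here's a minimal sketch from a test's point of view. `IBotClient`, `SendMessageAsync` and `GetNewRepliesAsync` are hypothetical names, but the shape is the important part: sending and receiving are separate operations, and one message can produce zero, one, or many replies.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Bot.Connector;

// Hypothetical client interface for talking to a bot under test.
public interface IBotClient
{
    // Posting a message returns nothing; replies arrive separately.
    Task SendMessageAsync(string text);

    // Ask for anything the bot has said since we last checked. This may
    // well be empty if the bot hasn't responded yet.
    Task<IList<Activity>> GetNewRepliesAsync(string watermark);
}
```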

I initially started writing unit tests using my usual trio of NUnit, NSubstitute and Fluent Assertions for testing, mocking and asserting respectively. These tests quickly became unwieldy and involved more setup than testing.

Mocking all of the dialog dependencies, as well as the factory that creates dialogs and everything that the DialogContext does, quickly makes tests look and feel very bloated. Also, due to the way that the MessagesController (example of this in the second code block here) uses the Conversation static object, unit testing our controller is tricky and requires what I would consider more effort than it's worth.
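To make the bloat concrete, here's a rough sketch of what a single dialog unit test tends to look like with NUnit and NSubstitute. OrderDialog, ISandwichService and the public MessageReceivedAsync method are hypothetical stand-ins for a real dialog and its dependencies; the arrangement-to-assertion ratio is the point.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Connector;
using NSubstitute;
using NUnit.Framework;
// (IAwaitable/IAwaiter live in the Bot Builder internals; the exact
// namespace varies between SDK versions.)

[Test]
public async Task Dialog_Replies_WhenMessageReceived()
{
    // Substitute the DialogContext and every collaborator the dialog touches.
    var context = Substitute.For<IDialogContext>();
    var service = Substitute.For<ISandwichService>();

    // The incoming message arrives wrapped in an IAwaitable, whose awaiter
    // chain has to be faked before the dialog can even await it.
    var message = Substitute.For<IMessageActivity>();
    message.Text.Returns("I'd like a sandwich");

    var awaiter = Substitute.For<IAwaiter<IMessageActivity>>();
    awaiter.IsCompleted.Returns(true);
    awaiter.GetResult().Returns(message);

    var awaitable = Substitute.For<IAwaitable<IMessageActivity>>();
    awaitable.GetAwaiter().Returns(awaiter);

    var dialog = new OrderDialog(service);

    // Act: one line. Everything above existed just to make this possible.
    await dialog.MessageReceivedAsync(context, awaitable);

    // Assert that the dialog posted something back through the context.
    await context.Received().PostAsync(Arg.Any<IMessageActivity>(), Arg.Any<CancellationToken>());
}
```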

In bots that I have written, my approach has been to treat dialogs like MVC/Web API controllers. What this means is that I try to keep them as thin as possible so that they only handle Bot Framework specific actions like responding to the user. Everything else is pushed down into service classes where I can unit test to my heart's content! Coupling this approach with the difficulty of unit testing dialogs and the solitary controller responsible for calling them, I have opted to cover them only with functional and end-to-end tests.
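As a sketch of that "thin dialog" approach (ISandwichService and its PlaceOrderAsync method are hypothetical), the dialog only translates between the conversation and a service, and the service holds all the logic that gets unit tested in isolation:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Connector;

[Serializable]
public class OrderDialog : IDialog<object>
{
    private readonly ISandwichService service;  // hypothetical service interface

    public OrderDialog(ISandwichService service)
    {
        this.service = service;
    }

    public Task StartAsync(IDialogContext context)
    {
        context.Wait(MessageReceivedAsync);
        return Task.CompletedTask;
    }

    private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<IMessageActivity> argument)
    {
        var message = await argument;

        // Everything interesting happens in the service...
        var confirmation = await service.PlaceOrderAsync(message.Text);

        // ...and the dialog's only job is to relay the result to the user.
        await context.PostAsync(confirmation);
        context.Wait(MessageReceivedAsync);
    }
}
```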

One advantage of testing dialogs at a higher level is that a BDD style approach lends itself really nicely to conversation flow. Conversations can easily be expressed using a "Given, When, Then" syntax, which keeps our tests easy to understand at a glance while also covering large portions of our conversation flow in one test.
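For example, a fragment of a sandwich order conversation might read something like the following sketch (the exact step wording is made up):

```gherkin
Scenario: Ordering a simple sandwich
  Given a new conversation with the bot
  When I send the message "I'd like to order a sandwich"
  Then the bot should ask "What kind of sandwich would you like?"
  When I send the message "BLT"
  Then the bot should ask "What size of sandwich do you want?"
```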

Knowing that I wanted to use a BDD approach, I instantly added SpecFlow to a bot project and set about working out how to write steps, but then I discovered that SpecFlow currently doesn't fully support async/await. Since the Bot Framework relies heavily on async/await, SpecFlow was no longer an option.

Much like the earlier unit tests with their masses of setup, even if SpecFlow had the async support that I needed, actually writing the tests in a succinct and clear way would still be difficult. Let's take the Sandwich Bot sample as an example. This bot uses FormFlow to create a dialog that can build a sandwich order through a predefined set of steps. If we were to write an end-to-end test for a simple order in BDD style, it would have around 20 steps. Each of those steps would either send a message or check incoming messages for the content that we expect. We might receive multiple messages and have to check them all. Each of those messages might have multiple attachments in the form of choice buttons, and every one of those buttons would have to be checked for the specific text that we want to assert on in our test.
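As a sketch of the kind of checking code each of those steps ends up needing (the helper name is mine; HeroCard and CardAction are Bot Framework types), every new activity, every attachment and every button has to be scanned for a single piece of expected text:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Bot.Connector;

// Scan every new activity, every attachment on it, and every button on each
// card for one expected piece of text. (Over the wire the attachment content
// may arrive as JSON and need deserializing to a HeroCard first.)
public static bool AnyReplyHasButton(IEnumerable<Activity> replies, string expectedText)
{
    foreach (var reply in replies)
    {
        foreach (var attachment in reply.Attachments ?? new List<Attachment>())
        {
            var card = attachment.Content as HeroCard;

            if (card?.Buttons != null && card.Buttons.Any(b => b.Title == expectedText))
            {
                return true;
            }
        }
    }

    return false;
}
```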

To me this seemed like something was missing. I don't want to keep writing code that checks every new message from my bot for an attachment that has a button with text matching a pattern. Also, what do we do about the fact that a reply may not be instantly available after we send our message? Do we retry? Do we fail the test?
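One possible answer to the retry question, reusing the hypothetical IBotClient sketched earlier, is to poll until a reply appears or a timeout expires, and only fail the test once the timeout is exhausted:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Bot.Connector;

public static async Task<IList<Activity>> WaitForRepliesAsync(
    IBotClient client, string watermark, TimeSpan timeout)
{
    var stopwatch = Stopwatch.StartNew();

    while (stopwatch.Elapsed < timeout)
    {
        var replies = await client.GetNewRepliesAsync(watermark);

        if (replies.Count > 0)
        {
            return replies;
        }

        // Nothing yet; back off briefly before polling again.
        await Task.Delay(TimeSpan.FromMilliseconds(500));
    }

    throw new TimeoutException("The bot did not reply within the allowed time.");
}
```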

I’ve tried to take these questions and formulate them into a library that will ease the burden of testing chatbot conversation flow. It’s still a work in progress with lots of work still to be done but I believe that it can be useful for anyone writing chatbots with the intention of deploying them in enterprise where generally verification of new code is of high import.

My library can be found on GitHub but isn't yet available as a NuGet package as it's not complete enough to publish (plus the name will probably change because the current one kinda sucks). The remaining posts in this series will look at how I have built this library, the reasons behind certain architectural decisions and, hopefully, how anyone building chatbots with the Bot Framework can use it to ensure that their bot is still doing what it's supposed to.
