Over the past weeks, I’ve spent quite a bit of time exploring the new Microsoft Bot Framework. During that exploration, what becomes clear immediately is that the resulting application is somewhat rigid (like most applications). The conversation of the bot is coded directly into the bot, which means that a change to the conversation requires redeploying the bot. I suppose that’s to be expected when we consider that the underlying functionality of the bot is really just a web service. Anyone who has done web development work knows that when web services need to be changed, they need to be redeployed. Those folks also know how much of a challenge it can be to deploy apps inside of a corporate or enterprise environment because of the approvals and verifications that are often required.
I began to wonder if there were a way that we could build a bot that didn’t require recoding or redeployment. Could the Bot Framework be used as a foundation to create something that was dynamic and could be modified on the fly? What might that look like? If that could be done, certainly the bot could be updated more rapidly. For example, when there’s a weekend coupon at the pizza or sandwich shop, is it reasonable to think that the bot should be recoded and redeployed to accommodate that coupon? Is there an alternative?
As I started down this road, I began to think about what happens during a conversation. We ask questions, we interpret data, we make decisions, and we respond. Depending on the conversation, we also might incorporate external data. For example,
Person 1: “What’s the weather like outside?”
Person 2: “I don’t know. Should we look outside?”
Person 1: “Yes, I think so.”
Person 2: <opens window> “Oh, I see that it’s raining and windy.”
How to represent a conversation?
Ultimately, a conversation is a series of questions and responses that are linked together until the conversation is over. What if each question or response were represented as a step that was linked to another step, until there were no more steps? Some of those steps would need more information than others. For example, a statement would need to know the previous response, while a question might need to know acceptable or permissible answers from the other person in the conversation.
What about when we need external data? While there are a LOT of possibilities for external data, for my exploration, I decided to limit that external data to just web services. Web services are incredibly flexible, and by implementing one source of external data, I knew other sources would be relatively straightforward.
Finally, if there were a way to drive our conversation based on data, rather than based on some kind of hard coded processes, an administrator could have an editor that would allow them to update the conversation on the fly.
The editor
Although far from the meat and potatoes of the data driven bot idea, one of the most useful features is the ability to change the details of the conversation in real time. For this particular implementation, I leaned heavily on a demo version of GoJS Interactive Diagrams (http://www.gojs.net). These folks have a silly number of diagrams that can be used or customized to meet any needs.
I started my diagram with the IVR Tree (http://gojs.net/latest/samples/IVRtree.html) and then added in a palette and info pane (http://gojs.net/latest/samples/htmlInteraction.html). The result is an interactive diagram of the conversation where steps and details can be modified. Hosted in a standard ASP.NET MVC application, the editor looks like this:
Each of the nodes and links is clickable/draggable. When I click on a particular node, the info pane on the right is updated for that node. The details can be changed as necessary. Links can be dragged and dropped to change the order of the conversation. Finally, the Save button writes the changes back to the database to be used by the bot. The Payload/PayloadProperty functionality wasn’t added to this version, but would be easy enough to load in some kind of separate window/pane.
The data model
It took a couple iterations, but I finally settled on a model that looks like this:
ConversationTemplate: This identifies the conversation. The idea is that multiple conversations, or multiple versions of a conversation, can all be stored using the same model. When a user hits the entry point for the bot, some kind of property indicates which ConversationTemplate to use.
Step: This is where all of the heavy lifting occurs. Each Step is just as described above. Question, Display, ServiceCall, and Condition are all subtypes of Step that contain additional data about each particular step. Also note that the Step table has a relationship to itself. This is the mechanism by which the bot moves from one step to the next.
ResponseType: When asking a question, response validation is useful. This table maps that option to the Question. For this concept, the possible responses were limited to Text, Numbers, Choice (a list of options), or Boolean. However, I think this could absolutely be extended to include things like LUIS responses to make the conversation feel a bit more natural.
ResponseOption: When a question is asked with a ResponseType of Choice, navigation within the conversation may very well occur based on that choice. This table allows the flow of the conversation to be altered (moved to a different step) based on the option that was selected.
PayloadProperty: Throughout the conversation, it’s helpful to carry a data set along for the duration. This data might include input parameters from the user or the results of a web service call. We can then use this payload information as parameters or string values in our various steps.
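Given the DbContext-style code later in the post, the model above might be sketched as Entity Framework code-first entities along these lines. This is my approximation for illustration; the property names and types are guesses, not the exact schema from the repo:

```csharp
using System.Collections.Generic;

// Approximate EF code-first sketch of the entities described above.
// Property names and types are illustrative, not the repo's exact schema.
public class ConversationTemplate
{
    public int ID { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Step> Steps { get; set; }
}

public abstract class Step
{
    public int ID { get; set; }
    public bool StartingStep { get; set; }
    public virtual Step NextStep { get; set; }                    // self-reference drives the flow
    public virtual PayloadProperty PayloadProperty { get; set; }  // where to store this step's result
}

public class Question : Step
{
    public string Prompt { get; set; }
    public string ResponseType { get; set; }                      // Text, Number, Choice, or Boolean
    public virtual ICollection<ResponseOption> ResponseOptions { get; set; }
}

public class ResponseOption
{
    public int ID { get; set; }
    public string Value { get; set; }
    public virtual Step NextStep { get; set; }                    // branch target when this option is chosen
}

public class PayloadProperty
{
    public int ID { get; set; }
    public string Name { get; set; }
}
```

The self-referencing NextStep on Step is what lets the editor re-wire the conversation without touching code: dragging a link in the diagram just changes a foreign key.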
Querying the tables with the sample data from the diagram above returns results like this:
The logic
The logic is really quite simple. It starts with a ConversationTemplate that identifies the first Step in the conversation. When a user initiates a conversation, the conversation starts and the bot executes the first Step. That code looks like this:
public async Task StartConversation(IDialogContext context, IAwaitable<object> argument)
{
    ConversationDataModel db = new ConversationDataModel();

    // Which conversation to run is configurable, so the bot itself never changes.
    int conversationStartID = Int32.Parse(ConfigurationManager.AppSettings["ConversationStartID"]);
    ConversationTemplate template = db.ConversationTemplates.Find(conversationStartID);

    // Find the entry point, then remember where we are along with an empty payload.
    Step startingStep = template.Steps.First(s => s.StartingStep == true);
    context.PrivateConversationData.SetValue("CurrentStepID", startingStep.ID);
    context.PrivateConversationData.SetValue("Properties", new Dictionary<string, string>());

    await startingStep.Execute(context);
}
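Since StartConversation reads the template ID from appSettings, pointing the bot at a different conversation (or a new version of one) is just a configuration change. A minimal web.config entry might look like this; the value shown is illustrative:

```xml
<configuration>
  <appSettings>
    <!-- ID of the ConversationTemplate row the bot should start with -->
    <add key="ConversationStartID" value="1" />
  </appSettings>
</configuration>
```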
Execute is a virtual function that is overridden by each Step. The Step can control how it interacts with the user. For example, the ServiceCall Execute override looks like this:
public override async Task Execute(IDialogContext context)
{
    // Pull the payload gathered so far and substitute its values into the URL.
    Dictionary<string, string> properties = null;
    context.PrivateConversationData.TryGetValue("Properties", out properties);
    string updatedURL = InjectPropertyValues(URL, properties);

    // Call the external web service and hand its response to the shared pipeline.
    HttpWebRequest webRequest = HttpWebRequest.CreateHttp(updatedURL);
    using (HttpWebResponse response = await webRequest.GetResponseAsync() as HttpWebResponse)
    {
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string content = reader.ReadToEnd();
            await ProcessResponse(context, content);
        }
    }
}
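InjectPropertyValues isn't shown above. A plausible implementation (my sketch, not necessarily the repo's exact code) replaces {Name} tokens in the URL template with values from the payload dictionary:

```csharp
using System.Collections.Generic;

public static class TemplateHelper
{
    // Replaces {PropertyName} tokens in the template with payload values.
    // Sketch only; the actual InjectPropertyValues in the repo may differ.
    public static string InjectPropertyValues(string template, Dictionary<string, string> properties)
    {
        if (template == null || properties == null)
            return template;

        foreach (KeyValuePair<string, string> pair in properties)
            template = template.Replace("{" + pair.Key + "}", pair.Value);

        return template;
    }
}
```

This is also why carrying the payload along matters: a ServiceCall step's URL can reference anything the user said earlier, such as http://.../lookup?city={City}&state={State}.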
Once the step has completed, it calls into the ProcessResponse function, which stores the PayloadProperty if necessary, determines the next step, and then executes that next step. If there is no next step, then the conversation ends. That function looks like this:
protected async Task ProcessResponse(IDialogContext context, string result)
{
    ConversationDataModel db = new ConversationDataModel();

    // Look up the step we just finished executing.
    int currentStepID;
    context.PrivateConversationData.TryGetValue("CurrentStepID", out currentStepID);
    Step currentStep = db.Steps.Find(currentStepID);

    // If this step stores its result, add it to the payload.
    Dictionary<string, string> properties = null;
    context.PrivateConversationData.TryGetValue("Properties", out properties);
    if (currentStep.PayloadProperty != null)
        properties[currentStep.PayloadProperty.Name] = result;

    // Determine where to go next; no next step means the conversation is over.
    Step nextStep = currentStep.GetNextStep(result, properties);
    if (nextStep == null)
        context.Done<object>(null);
    else
    {
        context.PrivateConversationData.SetValue("CurrentStepID", nextStep.ID);
        context.PrivateConversationData.SetValue("Properties", properties);
        await nextStep.Execute(context);
    }
}
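GetNextStep is also referenced without being shown. Here is a sketch of how it might resolve the branching described earlier, where a Choice question's ResponseOption can redirect the flow. The classes are redeclared minimally here so the sketch stands alone; names are mine, not necessarily the repo's:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal standalone versions of the model classes, just for this sketch.
public class Step
{
    public int ID { get; set; }
    public Step NextStep { get; set; }

    // Default behavior: follow the step's own self-referencing link.
    public virtual Step GetNextStep(string result, Dictionary<string, string> properties)
        => NextStep;
}

public class ResponseOption
{
    public string Value { get; set; }
    public Step NextStep { get; set; }   // branch target for this choice
}

public class Question : Step
{
    public List<ResponseOption> ResponseOptions { get; set; } = new List<ResponseOption>();

    // Choice questions: a matching option can redirect the conversation;
    // otherwise fall back to the step's default link.
    public override Step GetNextStep(string result, Dictionary<string, string> properties)
    {
        ResponseOption match = ResponseOptions
            .FirstOrDefault(o => string.Equals(o.Value, result, StringComparison.OrdinalIgnoreCase));
        return (match != null && match.NextStep != null) ? match.NextStep : NextStep;
    }
}
```

A Condition subtype would presumably override GetNextStep the same way, evaluating the stored payload properties instead of the user's literal response.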
Demo
After creating a simple iframed web chat on an HTML page and deploying out to Azure (along with a bit of debugging), the bot works! It starts with Step ID 2 (from the data set displayed earlier), asking if I know the zip code. I tell it No, so it stores the answer and moves to Step ID 3, where it asks me the City. I provide the City information, which it stores, and then asks me the State per Step ID 4. Again, it stores the State information, and then checks to see if the user knew the zip code per Step ID 150. Then, since I did not know the zip code, it makes a call out to the city/state web service defined in Step ID 151. Finally, it moves to Step ID 7, parses the results and displays the information. Step ID 7 doesn’t have any further steps, so the conversation ends.
If I follow the other flow, where I do know the zip code, the conversation looks like this:
Source Code
The full source code is available on GitHub, https://github.com/JonathanHuss/DataDrivenBot. Feel free to browse or use it however you like.
Thoughts
What other types of steps could be utilized? Certainly we could do things like database steps, where the query is stored and executed. Depending on where the bot service is hosted, we might also be able to do things like file based steps, where we read JSON, XML, or some other content from a flat file. To incorporate a more natural conversation, we could utilize LUIS based steps or response options, where users wouldn’t have to provide exactly the correct response, but could just talk with the bot. What about a recommendation step, where the bot uses some kind of machine learning or data processing to make a recommendation to the user (“Similar users have found X, Y, and Z interesting. Would you like to see?”). What else can you think of?
By using a data-driven approach, do we gain capabilities or take on new limits? For example, it seems like it’s quite a bit easier to modify the conversation. However, it also means that any time a new step type is needed, someone has to build out the infrastructure for that step type, rather than just writing the code directly. Is one better than the other?
What about multi-language support? In a traditional web app, we’d put all of our user-facing text in resource files and then just tell the app which language to use. In this case, we’d need to move all of that text into a database and then reference the appropriate language. Is that an acceptable change?
Anything else that stands out as either an improvement or a challenge? Could this approach be modified to be more helpful? What do you think? I’d love to hear your opinions and feedback!
Thank you for this, I have been thinking about the same things regarding my bots and your solution seems on the right track. Your article seems to clarify what has been troubling me in that we need proper tools to visualize the construction of Bot code. Looking at normal C# code is fine for web sites and Windows Forms, but it fails us when trying to model conversations.
The visualizations you created using gojs.net seem to be the answer.
Love the approach. This is very similar to the JSON Schema driven FormFlow that is in the Bot Framework. http://docs.botframework.com/en-us/csharp/builder/sdkreference/forms.html#jsonForms Initially it was purely data driven, but once people really started using it they wanted to be able to plug-in code as well for things like validation or dynamically defining a field. The pure data driven approach included things like regex for validation, but that is not always enough. This was solved by allowing the data to include code snippets that are dynamically compiled using Roslyn. This keeps the advantage of no redeploy and definition through a data file, but still allows the flexibility of arbitrary code. (Although the coding of the snippets is awkward because they are just strings in the data file.)