Unless you’ve been on another planet for the last year or so, you’ll almost certainly have noticed that chatbots (and conversational agents in general) became quite popular during the course of 2016. It seemed that every day a new startup or bot framework was launched, no doubt fuelled at least in part by growth in the application of data science to language data, combined with a growing awareness of machine learning and AI techniques more generally. So it’s not surprising that we now see on a daily basis all manner of commentary on various aspects of chatbots, from marketing to design, development, commercialisation, etc.
But one topic that doesn’t seem to have received quite as much attention is that of evaluation. It seems that in our collective haste to join the chatbot party, we risk overlooking a key question: how do we know when the efforts we have invested in design and development have actually succeeded? What kind of metrics should be applied, and what constitutes success for a chatbot anyway?