Earlier this week I gave a talk called “Introduction to NLP” as part of a class I am currently teaching at the University of Notre Dame. This is an update of a talk I originally gave in 2010, whilst working for Endeca. I had intended to make a wholesale update to all the slides, but noticed that one of them was worth keeping verbatim: a snapshot of the state of the art back then (see slide 38). Less than a decade has passed since then (that’s a short time to me) but there are some interesting and noticeable changes. For example, there is no word2vec, GloVe or fastText, or any of the neurally-inspired distributed representations and frameworks that are now so popular (let alone BERT, ELMo & the latest wave). Also no mention of sentiment analysis: maybe that was an oversight on my part, but I rather think that what we perceive as a commodity technology now was just not sufficiently mainstream back then.
Posts Tagged ‘Text analytics’
Introduction to Natural Language Processing (slideshow)
Posted in Text analytics, tagged natural language processing, NLP, Text analytics, text mining on January 22, 2019| Leave a Comment »
Book review: Deep Text by Tom Reamy
Posted in Information architecture, Search, Text analytics, tagged natural language processing, NLP, Text analytics, text mining on April 4, 2017| Leave a Comment »
When I started the London Text Analytics meetup group some seven years ago, “text analytics” was a term used by few, and understood by even fewer. Apart from a handful of enthusiasts and academics (who preferred the label of “natural language processing” anyway), the field was either overlooked or ignored by most people. Even the advent of “big data” – of which the vast majority was unstructured – did little to change perceptions.
But now, in these days of chatbot-fuelled AI mania, it seems everyone wants to be part of the action. The commercialisation and democratisation of hitherto academic subjects such as AI and machine learning have highlighted a need for practical skills that focus explicitly on the management of unstructured data. Career opportunities have inevitably followed, with job adverts now calling directly for skills in natural language processing and text mining. So the publication of Tom Reamy’s book “Deep Text: Using Text Analytics to Conquer Information Overload, Get Real Value from Social Media, and Add Bigger Text to Big Data” is indeed well timed.
Extracting sentiment from healthcare survey data
Posted in Text analytics, tagged machine learning, natural language processing, sentiment analysis, Text analytics on January 20, 2015| Leave a Comment »
A short while ago I posted the slides to Despo Georgiou’s talk at the London Text Analytics meetup on Sentiment analysis: a comparison of four tools. Despo completed an internship at UXLabs in 2013-4, and I’m pleased to say that the paper we wrote documenting that work is due to be presented and published at the Science and Information Conference 2015, in London. The paper is co-authored with my IRSG colleague Andy MacFarlane and is available as a pdf, with the abstract appended below.
As always, comments and feedback welcome.
ABSTRACT
Sentiment analysis is an emerging discipline with many analytical tools available. This project aimed to examine a number of tools regarding their suitability for healthcare data. A comparison between commercial and non-commercial tools was made using responses from an online survey which evaluated design changes made to a clinical information service. The commercial tools were Semantria and TheySay and the non-commercial tools were WEKA and Google Prediction API. Different approaches were followed for each tool to determine the polarity of each response (i.e. positive, negative or neutral). Overall, the non-commercial tools outperformed their commercial counterparts. However, due to the different features offered by the tools, specific recommendations are made for each. In addition, single-sentence responses were tested in isolation to determine the extent to which they more clearly express a single polarity. Further work can be done to establish the relationship between single-sentence responses and the sentiment they express.
MeetUp review: AnnoMarket – text analytics in the cloud
Posted in Events, Text analytics, tagged cloud computing, information extraction, natural language processing, Text analytics, text mining on February 13, 2014| 1 Comment »

Valentin Tablan kicks things off (photo: Hercules Fisherman)
After a brief hiatus I’m pleased to say the London Text Analytics meetup resumed last night with an excellent set of talks from the participants in the AnnoMarket project. For those of you unfamiliar, this project is concerned with creating a cloud-based, open market for text analytics applications: a kind of NLP “app store”, if you will. The caveat is that each app must be implemented as a GATE pipeline and conform to their packaging constraints, but as we’ve discussed before, GATE is a pretty flexible platform that integrates well with 3rd party applications and services.
Sentiment analysis tools for non-coders?
Posted in Text analytics, tagged natural language processing, sentiment analysis, Text analytics, text mining on June 11, 2013| 7 Comments »
I have an intern who will shortly be starting a project to extract sentiment from free text survey responses from the healthcare domain. She doesn’t have much programming experience, so ideally she is looking for a toolkit/platform that will allow her to experiment with various approaches with minimal coding (e.g. perhaps just some elementary scripting).
Free is best, although a commercial product on a trial basis might work. Any suggestions?
How do you measure the impact of tagging on retrieval?
Posted in Search, Text analytics, tagged evaluation, Information Retrieval, natural language processing, Precision and recall, Text analytics on May 28, 2012| 7 Comments »
A client of mine wants to measure the difference between manual tagging and auto-classification on unstructured documents, focusing in particular on its impact on retrieval (i.e. relevance ranking). At the moment they are considering two contrasting approaches:
How do you compare two text classifiers?
Posted in Text analytics, tagged natural language processing, NLP, Text analytics, text classifiers, text mining on April 27, 2012| 9 Comments »
I need to compare two text classifiers – one human, one machine. They are assigning multiple tags from an ontology. We have an initial corpus of ~700 records tagged by both classifiers. The goal is to measure the ‘value added’ by the human. However, we don’t yet have any ground truth data (i.e. agreed annotations).
Any ideas on how best to approach this problem in a commercial environment (i.e. quickly, simply, with minimum fuss), or indeed what’s possible?
I thought of measuring the absolute delta between the two profiles (regardless of polarity) to give a ceiling on the value added, and/or comparing the profile of tags added by each human coder against the centroid to give a crude measure of inter-coder agreement (and hence difficulty of the task). But neither really measures the ‘value added’ that I’m looking for, so I’m sure there must be better solutions.
Suggestions, anyone? Or is this as far as we can go without ground truth data?
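To make the first idea concrete, here is a minimal sketch in Python of the “absolute delta between the two profiles”, alongside a simple per-record overlap score. It assumes each classifier’s output is held as a dict mapping a record ID to a set of tags; the function names, record IDs and tags are all made up for illustration, and Jaccard overlap is my own stand-in for an agreement measure rather than anything the client uses.

```python
from collections import Counter


def tag_profile(annotations):
    """Count how often each tag is assigned across all records."""
    return Counter(tag for tags in annotations.values() for tag in tags)


def profile_delta(human, machine):
    """Sum of absolute per-tag count differences between two classifiers."""
    h, m = tag_profile(human), tag_profile(machine)
    return sum(abs(h[t] - m[t]) for t in set(h) | set(m))


def mean_jaccard(human, machine):
    """Average per-record Jaccard overlap between the two tag sets."""
    scores = []
    for rec in human.keys() & machine.keys():
        a, b = set(human[rec]), set(machine[rec])
        union = a | b
        # Two empty tag sets are treated as full agreement.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: three records tagged by both classifiers.
human = {"r1": {"cardiology", "drug"}, "r2": {"oncology"}, "r3": {"drug"}}
machine = {"r1": {"cardiology"}, "r2": {"oncology"}, "r3": {"surgery"}}

print(profile_delta(human, machine))  # 3
print(mean_jaccard(human, machine))   # 0.5
```

Note that the profile delta only compares aggregate tag counts, so two classifiers could score zero while disagreeing on every record; the per-record overlap catches that case. Neither tells you which classifier is right, which is exactly the gap left by having no ground truth.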
Text Analytics Summit Europe – highlights and reflections
Posted in Events, Text analytics, tagged Information Retrieval, natural language processing, NLP, sentiment analysis, Text analytics, text mining, User research on April 26, 2012| 4 Comments »
Earlier this week I had the privilege of attending the Text Analytics Summit Europe at the Royal Garden Hotel in Kensington. Some of you may of course recognise this hotel as the base for Justin Bieber’s recent visit to London, but sadly (or is that fortunately?) he didn’t join us. Next time, maybe…
Still, the event was highly enjoyable, and served as visible testament of increasing maturity in the industry. When I did my PhD in natural language processing some *cough* years ago there really wasn’t a lot happening outside of academia – the best you’d get in mentioning “NLP” to someone was an assumption that you’d fallen victim to some new age psychobabble. So it’s great to see the discipline finally ‘going mainstream’ and enjoying attention from a healthy cross section of society. Sadly I wasn’t able to attend the whole event, but here’s a few of the standouts for me:
Text Analytics meetup
Posted in Text analytics, tagged natural language processing, NLP, Text analytics on February 29, 2012| Leave a Comment »
Interested in text analytics / natural language processing?
Then come to the networking event at the Goat Tavern in London on April 23. This event is co-located with the London Text Analytics Summit – so if you can’t make it to the summit itself, join us in the evening for a drink or two and a chance to network with your peers in the NLP community. As with all meetings of the London Text Analytics group, attendance is free of charge – just sign up on the event page. And if that wasn’t incentive enough, I’ve included the press release from our good friends at Text Analytics News below.
See you there!