
Archive for the ‘Text analytics’ Category

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 26,000 times in 2014. If it were a concert at the Sydney Opera House, it would take about 10 sold-out performances for that many people to see it.
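The comparison is simply the view count divided by the hall's capacity, rounded to whole concerts:

```latex
\frac{26{,}000 \text{ views}}{2{,}700 \text{ seats per concert}} \approx 9.6 \;\Rightarrow\; \text{about 10 sold-out performances}
```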




Diana Maynard entertains the troops

Last week I had the privilege of organising the 13th meeting of the London Text Analytics group, which featured two excellent speakers: Despo Georgiou of Atos SE and Diana Maynard of Sheffield University. Despo’s talk described her internship at UXLabs, where she compared a number of tools for analysing free-text survey responses (namely TheySay, Semantria, Google Prediction API and Weka). Diana’s talk focused on sentiment analysis applied to social media, and she entertained the 70+ audience with all manner of insights drawn from her long experience: she has worked on the topic for longer than just about anyone I know. Well done to both speakers!


Expectation Maximization applied to a new sample of 100,000 sessions

In a previous post I discussed some initial investigations into the use of unsupervised learning techniques (i.e. clustering) to identify usage patterns in web search logs. As you may recall, we had some early success in finding interesting patterns of user behaviour in the AOL log, but when we tried to extend this work and replicate a previous study of the Excite log, things started to go somewhat awry. In this post, we investigate these issues, present the results of a revised procedure, and reflect on what they tell us about searcher behaviour.
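The post doesn't say which EM implementation or which session features were used, so what follows is only a minimal sketch of the general technique: Expectation Maximization for a Gaussian mixture with diagonal covariances, written in plain Python. It assumes each session has already been reduced to a numeric feature vector; the function name `em_diagonal_gmm`, the farthest-first initialisation and the toy data are illustrative assumptions, not the procedure used in this study.

```python
import math
from typing import List

Vector = List[float]


def em_diagonal_gmm(data: List[Vector], k: int, n_iter: int = 50):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    data: one numeric feature vector per session (hypothetical features).
    Returns (weights, means, variances, responsibilities).
    """
    n, d = len(data), len(data[0])
    # Farthest-first initialisation (an assumption, chosen to be deterministic):
    # start at data[0], then repeatedly add the point farthest from all chosen means.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    var_floor, tiny = 1e-6, 1e-12  # guard against collapsing variances / empty components

    resp: List[Vector] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each component generated each point,
        # computed in log space and normalised with the log-sum-exp trick.
        resp = []
        for x in data:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for xi, mu, var in zip(x, means[j], variances[j]):
                    lp -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
                logs.append(lp)
            top = max(logs)
            probs = [math.exp(lp - top) for lp in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])

        # M-step: re-estimate each component's weight, mean and variance
        # as responsibility-weighted averages over the data.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), tiny)
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[f] for r, x in zip(resp, data)) / nj for f in range(d)]
            variances[j] = [
                max(var_floor, sum(r[j] * (x[f] - means[j][f]) ** 2 for r, x in zip(resp, data)) / nj)
                for f in range(d)
            ]
    return weights, means, variances, resp


if __name__ == "__main__":
    # Toy 2-feature "sessions" (invented data): two well-separated groups.
    sessions = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [1.1, 2.2],
                [9.0, 15.0], [9.5, 14.0], [10.2, 16.0], [8.8, 15.5]]
    weights, means, variances, resp = em_diagonal_gmm(sessions, k=2)
    labels = [max(range(2), key=lambda j: r[j]) for r in resp]
    print(labels, [round(w, 3) for w in weights])
```

Each session ends up with a soft assignment (`resp`) to every cluster; taking the most probable component gives a hard label. Real search-log features would typically need scaling first, and the result depends on the choice of `k` and the initialisation, which is one reason clustering studies can be hard to replicate.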


EM, 7 features

As I mentioned in a previous post, I’ve recently been looking into the challenges of search log analysis and in particular the prospects for deriving a ‘taxonomy of search sessions’. The idea is that if we can find distinct, repeatable patterns of behaviour in search logs, then we can use these to better understand user needs and therefore deliver a more effective user experience.

We’re not the first to attempt this, of course: the whole area of search log analysis has an academic literature that extends back at least a couple of decades. It is also quite topical right now, with both Elasticsearch and LucidWorks releasing their own log file analysis tools (ELK and SiLK respectively). So in this post I’ll discuss some of the challenges in our own work and share some initial findings.


Over the last few months I have been working with Paul Clough and Elaine Toms of Sheffield University on a Google-funded project called ‘A Taxonomy of Search Sessions’. A session, in case you’re wondering, is a period of continuous interaction between a user and a search application. So if you spend a while Googling for holiday destinations, that’s a session. Sessions are interesting because they form a convenient unit of interaction for studying usage patterns, and those patterns can provide insights that drive improved design and functionality.
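The definition above leaves open how ‘continuous interaction’ is detected in a raw log. A common heuristic, assumed here rather than taken from the project, is to start a new session whenever the gap between a user’s consecutive events exceeds an inactivity timeout. The sketch below uses 30 minutes, an arbitrary choice; the event layout `(user_id, timestamp, query)` and the function name `split_sessions` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A log event: (user_id, timestamp in seconds, query string). Hypothetical schema.
Event = Tuple[str, float, str]


def split_sessions(events: List[Event], timeout: float = 30 * 60) -> List[List[Event]]:
    """Group events into sessions: same user, no gap longer than `timeout` seconds."""
    by_user: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    sessions: List[List[Event]] = []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[1])  # chronological order per user
        current = [user_events[0]]
        for prev, event in zip(user_events, user_events[1:]):
            if event[1] - prev[1] > timeout:
                sessions.append(current)  # gap too long: close the current session
                current = []
            current.append(event)
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    log = [("u1", 0.0, "cheap flights"), ("u1", 120.0, "cheap flights rome"),
           ("u2", 60.0, "weather"), ("u1", 4000.0, "rome hotels")]
    for session in split_sessions(log):
        print([query for _, _, query in session])
```

Here user `u1` gets two sessions, because more than 30 minutes pass between the second and third queries. The timeout is a tunable assumption, and different values can noticeably change how many sessions a log appears to contain.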


Valentin Tablan kicks things off (photo: Hercules Fisherman)

After a brief hiatus, I’m pleased to say the London Text Analytics meetup resumed last night with an excellent set of talks from the participants in the AnnoMarket project. For those unfamiliar with it, the project is concerned with creating a cloud-based, open market for text analytics applications: a kind of NLP ‘app store’, if you will. The caveat is that each app must be implemented as a GATE pipeline and conform to the project’s packaging constraints, but as we’ve discussed before, GATE is a pretty flexible platform that integrates well with third-party applications and services.

