I’ve just received my complimentary copy of “Information Retrieval: Searching in the 21st Century”. This book is co-written with my colleagues on the BCS IRSG committee, and edited by Ayse Goker and John Davies. It has been several years in the making, so my thanks and congratulations go to Ayse and John for doing an excellent job in seeing this through from initial idea to finished volume.
My own chapter, co-written with my ex-Reuters colleague Mark Stevenson, is “The Role of Natural Language Processing in Information Retrieval: Search for Meaning and Structure”. The chapter is perhaps a little more academic in style than the material I would normally write, but hopefully in keeping with the scholarly traditions of the BCS and also consistent with other contributions in this volume.
The book is available from Amazon, and I hope to be able to offer a copy for review in Informer. In the meantime, I’ve appended Stephen Robertson’s foreword below. If you’d like to volunteer to review the book for Informer, let me know.
In the forty years since I started working in the field, and indeed for some years before that (almost since Calvin Mooers coined the term information storage and retrieval in the 1950s), there have been a significant number of books on information retrieval. Even if we ignore the more specialist research monographs and the ‘readers’ of previously published papers, I can find on my shelves or in my mental library many books that attempt (probably with the IR student in mind) to construct a coherent and systematic way of defining and presenting information retrieval as a field of study and of application.
Often such a book is the work of a single author, or perhaps a pair working together. Such works can clearly have an advantage in respect of coherence; the field is necessarily presented from a single viewpoint. On the other hand, they can also suffer for the same reason. The IR field is rich (more so now than it has ever been), and it is difficult within a single viewpoint to do justice to this richness. Readers, on the other hand, have to be constructed out of the materials to hand: the published papers, each of which has taken its own view, probably with a much narrower field of vision, and different from that of the other chosen papers.
The present book attempts the tricky task of combining the breadth of vision of multiple authors with the coherence of a single integrated work. The richness of the field is apparent in the range of chapters: from formal mathematical modelling to user context, from parallel computation to semantic search.
The topics covered also vary greatly in their historical association with the field. Categorisation, for example, has been around as an IR technique for quite a long time – though Stuart Watt brings a new perspective. Mobile search (David Mountain, Hans Myrhaug and Ayşe Göker), however, is a relatively recent development. The use of formal models (information retrieval models, Djoerd Hiemstra) goes back almost to the beginning, as does experimental evaluation (user-centred evaluation of information retrieval systems, Pia Borlund), though in both cases there have been huge changes in the past decade.
This same decade has witnessed the huge growth of the World Wide Web, and the developing dominance of web search engines (web information retrieval, Nick Craswell and David Hawking) as the glue which holds the web together. For many people today, IR is web search. It is true that there has been a huge amount of influence in both directions: search engines are largely based on techniques from both the IR research community and from previous operational systems, while IR research and practice in other environments has learnt a great deal from the forcing-house that is the web search space. This dominance of the web as the domain of interest is well reflected in many of the chapters in the present volume.
It is important, however, to remember that IR is not all about web search, and that the web space presents both problems and opportunities which differ from those in other domains. The desktop, the enterprise, specialist collections such as scientific papers are all examples of different domains for which search functionality is a fundamental requirement. There are references to several of these throughout the book, but specific domains with their own chapters are multimedia resource discovery (Stefan Rüger) and image users’ needs and searching behaviour (Stina Westman). The user theme is taken further in the context and information retrieval (Ayşe Göker, Hans Myrhaug and Ralf Bierig). More generic problem areas are addressed in cross-language information retrieval (Daqing He and Jianqiang Wang), in semantic search (John Davies, Alistair Duke and Atanas Kiryakov) and in the chapter on natural language processing (Tony Russell-Rose and Mark Stevenson). Finally, a chapter on performance issues and parallelism (Andrew MacFarlane) addresses more technical computing concerns.
Information retrieval, from being the rather arcane subject in which I did my master’s degree forty years ago, has become one of the defining technologies of the twenty-first century. I believe the present book does justice to this status.
Stephen Robertson, 2008