In case you missed it, here are details of the latest issue of Informer, which just came out this week. This is again one of our bigger issues, with over twenty articles including conference reviews, feature articles, news and updates from the world of search and information retrieval. Of particular note is the call for a new editor – this is a great opportunity so if you’re interested or would like to know more, drop us a line. For further details see the Informer website. If you fancy becoming a contributor, get in touch!
Informer: Spring 2023 Issue Out Now
In the April issue
To start with IRSG affairs, our Chairman Udo Kruschwitz comments on some aspects of the very successful ECIR 2023 conference, and our Secretary, Steven Zimmerman, has an update on Committee Membership following the 2022 AGM, held in November after the Search Solutions 2022 event. There are excellent reviews of the ECIR conference itself and of the Industry Day, which is a popular feature of ECIR. ECIR is also the venue for the presentation of the Karen Spärck Jones Award, and I am grateful to William Wang for allowing me to interview him (sadly remotely!) about the way that his career has developed. Clearly a very worthy recipient of the Award. The role of managing the Awards process for the KSJ Award 2023 has now passed from Professor Jochen Leidner to Dr Haiming Liu.
Another Award that IRSG is associated with is the Tony Kent Strix Award, managed by the UKeiG. In 2022 there were, exceptionally, two winners, Professor Iadh Ounis and Dr Ryen White, and Graham McDonald has written a report on their lectures.
On the subject of conferences, IRSG has published an invitation to run one-day events for the Group, details of which can be found on the IRSG web site.
Andy MacFarlane provides a list of forthcoming events (with my thanks for meeting a difficult deadline). Among them are Search Solutions 2023 and (though not an IRSG event) CIKM 2023 in Birmingham in October.
The last few months have seen a torrent of announcements and pontifications about the evolution of LLMs and associated applications. I’ve included a report of an Alan Turing Institute conference I attended in late February as a way of demonstrating the rate of evolution of these technologies. In my tail-end slot I offer some reflections on the speed of development of AIGC applications against the speed of research and research publication. Editors always have an in-tray and I’ve included a few items from mine that you might find of interest.
As a demonstration of ChatGPT in operation, I had invited Steve Zimmerman to write about his transition from academic to practitioner and in this issue you will find both Steve’s original text and the ChatGPT summary version.
Another perspective on the work of search practitioners is a fascinating account by Christoffer Stjernlöf of IR software development for e-commerce search, where high relevance is absolutely essential and yet very challenging to achieve. To complement a feature on e-commerce search I have written a review of the first English edition of Understanding Search Engines by Professor Dirk Lewandowski, amazingly the first book on this topic to be published, 28 years after the launch of Alta Vista. But well worth waiting for!!
In the October issue…
The next issue of Informer will close for copy contributions on 13 October and be published towards the end of the month with full details of the Search Solutions 2023 programme.
It will also definitely and absolutely be my last issue! So if you have always looked at Informer and thought you could do a better job than me (very likely!) this is your opportunity to demonstrate your skills. As regular readers are aware, my take on IR life is very much from a practitioner perspective, so perhaps it might be a good time to pass the baton to someone with an alternative perspective.
We had hoped to introduce a new web platform for Informer this year but for a range of reasons this has not happened. It would be ideal to have the 2024 Editor in place by early September at the latest so that we could jointly work on the options for the future based on how the October issue is put together even if there is not time to make a change for the October issue itself. If you would like to chat about what the Editorship entails then contact me on martin.white@intranetfocus.com
The Editor’s in-tray
A small miscellany of search-related items that have arrived in my in-tray recently that you might be interested in
Microsoft Search Hero Mastermind Group
This is a new and very enterprising training course for Microsoft search managers that has a mix of tuition and mentoring spread out over a three-month period. The course has been developed by Agnes Molnar of Search Explained.
What Is ChatGPT Doing … and Why Does It Work?
Stephen Wolfram has written a very detailed account of the mathematics and algorithms behind LLMs which is so long that it is now available as a paperback book. It is very well written and although some of the sections slightly passed over my head it provides a level of detail that is missing from the many (many!) academic papers that have been published over the last few months which assume a PhD-level of knowledge. Details of the book are on the web site.
IBM videos on LLMs and Machine Learning
I have been both educated and fascinated by some recent short (around 10 minutes) videos on the science behind language models. The fascination is that the IBM presenters write backwards on a transparent sheet. You have to watch the videos to see what I mean, but they are a model of clarity and communication.
Can You Trust Large Language Models
Foundation Models and Fair Use
This arXiv paper from a team at the Center for Research on Foundation Models is a paper you should at least be aware of even if you currently don’t have the time to work through 61 pages and a great many citations. This paper is just on fair use under US copyright law, which is somewhat different in detail from other jurisdictions. Nevertheless it raises some very important issues that arise from the content-related aspects of LLMs rather than the technology. I can do no better than to reproduce the conclusions to the paper.
“We reviewed U.S. fair use standards and analyzed the risks of foundation models when evaluated against those standards in a number of concrete scenarios with real model artifacts. Additionally, we also discussed mitigation strategies and their respective strengths and limitations. As the law is murky and evolving, our goal is to delineate the legal landscape and present an exciting research agenda that will improve model quality overall, further our understanding of foundation models, and help make models more in line with fair use doctrine. By pursuing mitigation strategies that can respect the ethics and legal standards of intellectual property law, machine learning researchers can help shape the law going forward. But we emphasize that even if fair use is met to the fullest, the impacts to some data creators will be large. We suggest that further work is needed to identify policies that can effectively manage and mitigate these impacts, where the technical mitigation strategies we propose here will fundamentally fall short. We hope that this guide will be useful to machine learning researchers and practitioners, as well as lawyers, judges, and policymakers thinking about these issues.”
CIKM 2023 Birmingham 21-25 October 2023
The 32nd ACM International Conference on Information and Knowledge Management (CIKM) will be held in Birmingham on 21-25 October. The last time this conference was held in Europe was 2018! It is not a BCS event but many members of IRSG are involved in it.
The General Chairs are
Ingo Frommholz, University of Wolverhampton, UK
Frank Hopfgartner, University of Koblenz, Germany
Mark Lee, University of Birmingham, UK
Michael Oakes, Independent Researcher, UK
The Informer Editor is acting as Chair of the Sponsorship Committee. Rumours that I get a percentage from the income I generate are sadly a hallucination.
Search Solutions 2023 London, 21/22 November
A mark-your-diary item for the annual Search Solutions 2023 conference. It will be held at the BCS London office in Moorgate and will be on-site only. Tuesday 21 will be Tutorials Day and Wednesday 22 will be Conference Day. There will be a call for papers and tutorials in early May and this will be posted on the BCS IRSG web site. In association with the Conference there will also be the Search Industry Awards.
Karen Spärck Jones (KSJ) Award – 2023 timeline for nominations
[Author Haiming Liu]
Professor Jochen Leidner, the current chair of Karen Spärck Jones (KSJ) Award, presented the trophy to the 2022 KSJ award winner, Professor William Wang, who delivered his keynote presentation at ECIR 2023.
Since Jochen has chaired three years of the KSJ Award, and 2022 is the last year of his term I am now the Chair of the Awards Committee for the 2023-2025 period. You can contact me at h.liu@soton.ac.uk
The call for nominations of KSJ Award 2023 will be out soon. Detailed information about the award and the nomination can also be found on the BCS IRSG KSJ Award page: https://www.bcs.org/membership-and-registrations/member-communities/information-retrieval-specialist-group/awards/karen-spaerck-jones-award/
IRSG 2022-2023 AGM and Committee Elections
The IRSG Annual General Meeting (AGM) took place immediately after Search Solutions 2022, where the committee election results were announced. The draft 2021 minutes can be found here, with 2021 AGM confirmed minutes found here. The full list of current committee members is now available on the IRSG governance page. All committee positions were filled unopposed.
Newly appointed ordinary committee members include Monica Paramita (Sheffield University), Sean MacAvaney (Glasgow University), and David Rau (University of Amsterdam). Reappointed committee members are Udo Kruschwitz (Chair), Ingo Frommholz (Treasurer), and Haiming Liu (Membership Secretary). We wish to thank outgoing committee member Krisztian Balog (ECIR 2022 Committee Member) for their service. Full details of the BCS IRSG committee are provided on our governance page.
A reminder that our elections take place every Autumn, so please watch our governance page and the IR listserv, to which you can subscribe for future election announcements. We are very keen to have new members on our committee.
Steve Zimmerman (IRSG Secretary)
Academia and the Enterprise – Steve Zimmerman
Academia and the Enterprise
It is an honour to be asked by a highly respected contributor to the enterprise search community to share my journey from academia into the enterprise. Admittedly, it has been an unusual journey, so perhaps it’s best to say a bit about where things are at the exact moment before diving into the details.
Now
Currently, I am a Senior Data Scientist in the NLP team at a large multinational, and there has never been a more interesting time to work in search and NLP. This is a strong statement given that my journey into search and NLP, which began 10 years ago, has always been fascinating. So what makes this journey even more fascinating now? Probably not surprisingly to you, it is the latest generation of large language models (LLMs) that has made the work even more interesting. A former colleague told me about ChatGPT on December 1st and said it would be as big as Google.
Alan Turing Institute Conference on LLMs February 2023 – a historical perspective?
[Author Martin White]
This symposium, organized by the Alan Turing Institute, was held at the IET, Savoy Place on 23 February and attracted around 350 delegates, including what seemed to be the entire UK machine learning research community. There were seven presentations and a panel session that I was not able to stay for.
(Note from the Editor. Two months seems to be a lifetime in LLM world and I thought twice about including this! However it does highlight the very proactive role that the Alan Turing Institute is taking on behalf of the UK AI community)
I came away with pages of notes made at the symposium but as I have worked through them I have decided not to report on a paper-by-paper basis but instead to synthesize what to me were some (certainly not all!) of the take-aways of the day.
ECIR 2023 Industry Day
[Author Michael Upshall]
The European Conference on Information Retrieval (ECIR) seems to have emerged from the pandemic stronger than ever. This year, in Dublin, saw the highest number of attendees ever, I’m told, at over 380 in-person and virtual attendees. I’m not a computing academic, so I don’t sit through the academic presentations, but the last day of the conference is termed the Industry Day, and is designed to straddle the divide between the academy and the real world. This year, there were around 60 stalwarts who stayed the course for a very full day of no fewer than 13 presentations. What makes the industry day exceptional is the range and number of questions asked by the audience: there is an informal air to the event that, I think, encourages discussion. This was a conference where the questions were not to display the questioner’s knowledge, but to provide a vital reality check. Have you tried this with users? What do you do about fake news? Is there a feedback loop once it goes live?
Call for proposals for IRSG one-day events in 2023
The Information Retrieval Specialist Group (IRSG) of the BCS invites proposals for the organisation of one day events supported by BCS. Proposals will be evaluated based on the organisational and financial plans and benefits to the Information Retrieval community.
Important dates
* Submission deadline for this round: 19-May-2023
* Notification: 02-Jun-2023
Relevance under uncertainty – the commercial realities of IR development
[Author Christoffer Stjernlöf]
Christoffer Stjernlöf is a software engineer and site reliability engineering manager at FactFinder (previously Loop54), which provides on-site search, navigation, and 1:1 personalisation for e-commerce businesses. Christoffer has studied information technology and computer science at the Royal Institute of Technology in Stockholm. He likes applying wisdom across domains and reusing existing solutions to tough problems in unexpected ways.
Relevance Under Uncertainty – How Loop54 does software engineering to advance relevance
Loop54 (on the market under the name FactFinder Infinity) is a technology that integrates with e-commerce stores and determines based on visitor interactions, in real time, which the most relevant products are for each individual user at every moment. It attempts to perform the function a really good salesperson would if you step into a brick-and-mortar store: figure out as quickly as possible exactly what you are interested in and guide you directly to that. Just as with a really good salesperson, the visitor is not meant to notice that anything out of the ordinary happened. This is not the business of definitive rights and wrongs, but ever so many shades of roughly correct.
John Carmack put it fairly well when he said about neural networks that “It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.”
Tony Kent Strix Award 2022 lectures
[Author Graham McDonald]
The 2022 annual memorial lecture for the International Tony Kent Strix Award was held on Thursday 23 February 2023. The lecture was hosted by the UK electronic information Group (UKeiG) in partnership with the International Society for Knowledge Organisation UK (ISKO UK), the Royal Society of Chemistry Chemical Information and Computer Applications Group (RSC CICAG) and the British Computer Society Information Retrieval Specialist Group (BCS IRSG), with Dion Lindsay as host.
The award is given in recognition of outstanding practical innovation or achievements in the field of information retrieval. This year, in a break from tradition and accounting for the fact that there was no award given in 2021, there were two recipients of the award: Professor Iadh Ounis (Professor of Information Retrieval at the University of Glasgow) and Dr Ryen White (General Manager and Partner Research Director at Microsoft Research), who is also an alumnus of the University of Glasgow Information Retrieval Group. The online event was very well attended and regarded as a great success by the organisers.
Professor Ounis’ talk, titled ‘Perspectives on Experimentation and Reproducibility in Information Retrieval: Then and Now’, discussed the challenges of (and the real need for) reproducing experimental findings in the modern neural information retrieval era. The talk provided great insights into the complex information retrieval pipelines and the long dependency chains that exist between artifacts of modern information retrieval systems. Professor Ounis particularly noted the need, in this modern age, for more granular reproduction methods that can fully replicate all of the ingredients that contribute to the core advancements in information retrieval systems. The talk also provided an overview of how the PyTerrier information retrieval platform, developed at the University of Glasgow, can simplify the construction and replication of modern neural retrieval architectures by letting researchers build complex IR pipelines from modular system components using standard Python operators and expressions.
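For readers who have not seen this style of pipeline construction, the idea of combining retrieval components with ordinary Python operators can be illustrated with a toy sketch. This is not PyTerrier code and does not use its API: the `Stage` class, the three-document corpus and both scoring functions are hypothetical, chosen only to show how operator overloading lets a pipeline be written as a single expression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Ranking = Dict[str, float]  # document id -> score

# Hypothetical toy corpus, for illustration only.
DOCS = {
    "d1": "neural retrieval ranking with transformers",
    "d2": "classic retrieval ranking",
    "d3": "cooking with neural networks",
}


@dataclass
class Stage:
    """A pipeline component: takes a query and a ranking, returns a ranking."""
    fn: Callable[[str, Ranking], Ranking]

    def __call__(self, query: str, ranking: Optional[Ranking] = None) -> Ranking:
        return self.fn(query, dict(ranking or {}))

    def __rshift__(self, other: "Stage") -> "Stage":
        # a >> b : feed the output of a into b
        return Stage(lambda q, r: other(q, self(q, r)))

    def __mod__(self, k: int) -> "Stage":
        # a % k : keep only the k highest-scoring documents from a
        def cut(q: str, r: Ranking) -> Ranking:
            scored = self(q, r)
            top = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]
            return dict(top)
        return Stage(cut)


def term_overlap(query: str, _ranking: Ranking) -> Ranking:
    """First stage: score each document by the number of shared query terms."""
    terms = set(query.split())
    scores = {d: float(len(terms & set(text.split()))) for d, text in DOCS.items()}
    return {d: s for d, s in scores.items() if s > 0}


def prefer_short(_query: str, ranking: Ranking) -> Ranking:
    """Second stage: a stand-in re-ranker that slightly favours shorter documents."""
    return {d: s + 1.0 / len(DOCS[d].split()) for d, s in ranking.items()}


# The whole pipeline is one expression built from standard operators.
pipeline = (Stage(term_overlap) % 2) >> Stage(prefer_short)
results = pipeline("neural retrieval ranking")
```

Because each stage is an ordinary object, swapping the re-ranker or changing the cutoff means editing one expression, which is the property that makes a pipeline easy to share and re-run.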
Dr White’s talk, titled ‘Intelligent Futures in Task Assistance’, provided an overview of Dr White’s contributions to understanding user interactions in search systems and his goal of providing a better experience for search engine users. In particular, Dr White discussed the importance of productivity assistance and task-driven information retrieval in today’s era of digital assistants. Dr White provided insights into the need for systems to decompose complex tasks, automatically identify and prioritise microtasks, and schedule activities through task duration estimation. The talk discussed the main lessons that have been learned from research on task intelligence and provided numerous insights into the potential future directions of artificially intelligent digital assistants.
Both of the talks were packed with many insights and interesting reflections on the developments that have led to today’s intelligent information age, and what is in store for the future. The slides from the talks are available from Professor Ounis’ and Dr White’s websites and the video recording of the talks will be made available from UKeiG. I highly recommend checking them out.
[Graham McDonald is a lecturer in Information Retrieval at the School of Computing Science, University of Glasgow. His research interests include responsible and fair information retrieval, sensitivity-aware search, and active-learning strategies in decision support systems for document review tasks.]
Understanding Search Engines – Dirk Lewandowski
[Author Martin White]
It is remarkable that this book is unique in its coverage of the development of the technology, business and impact of web search. The web search engines play such an important role in our lives and our business activities that we take them for granted. It’s not just a lack of books but a lack of research papers as well, other than those that look at elements of the search process. Last year I published a history of enterprise search, and trying to confirm dates, vendors and technical developments was a far from easy exercise. The development of online retrieval services was expertly documented by Charles Bourne and Trudi Bellardo Hahn in 2002, and Stephen Arnold’s interviews with the pioneers of online and web search give a sense of the pace of change from 2008 to 2012, but the focus is primarily on enterprise search.
The arrival of this English edition of a book originally published in German by an author with an outstanding reputation in the science and the use of web search is very timely given the launch of so many AI-Generated Content (AIGC) applications, all of them offering search-by-prompt rather than search-by-query. Dirk Lewandowski is a Professor in the Department of Information, Hamburg University of Applied Sciences, and from 2013 to 2020 was Editor-in-Chief of the Aslib Journal of Information Management.
Events Spring 2023
One Day Events
Search Solutions 2023: The group’s annual industry-focused event, including a tutorial day. 21-22 November 2023.
ECIR 2023 Dublin – conference report
[Author Gregor Donabauer]
[Gregor Donabauer is a PhD student and research assistant at the chair of Information Science at the University of Regensburg (Germany). Before that he completed his Bachelor degree in Information Science/Business Information Systems and his Master degree in Information Science, both at the University of Regensburg. His main interests are in Natural Language Processing and Machine Learning.]
Last week I attended the 45th edition of the European Conference on Information Retrieval (ECIR), which was hosted in Dublin, Ireland. Given that this was only the second in-person conference I have ever attended, it was great to bump into several familiar faces throughout the week. One reason for that could be that this year’s ECIR was the biggest ECIR of all time, with a record-breaking number of more than 400 registered attendees. The high level of interest was evident on-site, as more than 300 people attended the keynote talks and receptions.
Karen Spärck Jones Award 2023 – Timeline for nominations
Professor Jochen Leidner has been the Chair of the KSJ Awards panel since 2019. As his term of office has now come to a conclusion, Dr Haiming Liu (University of Southampton) is taking on the role for the next three years. The Award now alternates between ECIR and the Annual Conference of the European Chapter of the Association for Computational Linguistics (EACL) to promote integration between the IR and NLP communities. Karen Spärck Jones was an active member of both communities.
The call for nominations of KSJ Award 2023 will be out soon. Detailed information about the award and the nomination can also be found on the BCS IRSG KSJ Award page. The 2023 Award lecture will take place at EACL 2024, the location of which has not yet been confirmed.
Timeline for the 2023 Award:
- 1 September 2023 — closing date for nominations;
- 8 September 2023 — deadline for support letters;
- 8 December 2023 — notification of the prize recipient;
- April 2024 — recipient presents keynote at EACL 2024.
IRSG would like to thank Microsoft Research Cambridge, the generous sponsors of the Award.
For further information please contact Dr Haiming Liu (KSJ Award Chair 2023-2025), h.liu@soton.ac.uk
An interview with William Wang – KSJ Award Winner 2022
[Author Martin White]
I asked William Wang, who at ECIR 2023 was presented with the Karen Spärck Jones Award for 2022, if he could respond to a series of questions about his background and career. I am most grateful to William for the care he put into his replies.
What were your aspirations at high school?
I have been interested in computers since my father bought me an Intel-586 desktop in elementary school. During my junior high and high school years, I was passionate about writing HTML, PHP, and ASP for building websites that provide knowledge for online games. Creating online resources for gamers was a great way to share knowledge and expertise, and it can help fellow gamers improve their skills and enjoy the game even more. It requires a lot of hard work and dedication. Since then, I have become interested in building better technology to provide people with better access to knowledge…
From Udo Kruschwitz – BCS IRSG Chair
Welcome to our Spring 2023 edition of Informer! Did we see you in Dublin? Not inconceivable given it was the biggest ECIR ever in many respects, including number of attendees (more than 400 registrations!). It felt as if everybody was there … Well, in case you missed the conference there is plenty of reading material in this newsletter, including a conference report by Gregor Donabauer as well as a review of Industry Day by Michael Upshall, not to forget the interview with William Wang, the winner of the Karen Spärck Jones Award 2022 and a keynote speaker at this year’s ECIR.
In fact, Martin has worked his magic once again to compile a new edition of Informer that feels bigger and more varied than ever. I will let you explore it rather than pointing out each contribution I particularly like (of which there are many).
Let me just highlight one item of interest to our group in particular. The CORE conference rankings are currently being updated and it is very reassuring that the compilation of material to support the case for ECIR to remain an A-ranked conference is in the safe hands of Sean MacAvaney, Lecturer at the University of Glasgow and newly elected IRSG committee member. This task is not an easy one and involves much manual work such as the collection of statistics ranging from citation counts to identifying key members of the IR community and comparison with competitor conferences.
The last month has however also been a very sad one as the wider AI / NLP / IR community lost three key members, Chris Cieri, Dragomir Radev and Yorick Wilks. Their contributions and good spirit will not be forgotten but in each individual case it feels like the end of an era.
Search Solutions 2022 Conference report
[Author Martin White]
Search Solutions is managed by the Information Retrieval Specialist Group of the British Computer Society and is the only broad-spectrum search event outside of the USA. The conference was held at the BCS London office on 23 November, preceded by a day of Tutorials, and was the first on-site Search Solutions event since 2019!
This is a brief summary of the presentations, with a link to the author and also links to research papers and web sites mentioned by the authors in the course of their presentation. Heavy note taking!
The conference opened with a presentation by Natasha den Dekker on the approach being taken by LexisNexis to understand the expectations of users and the extent to which the search applications meet them. In the process Natasha gave a very good introduction to user research, describing the differences between behavioural and attitudinal techniques, with an emphasis on the benefits and challenges of A/B testing. She also highlighted the importance of diary studies, which take a lot of effort to set up and execute but bring substantial rewards in understanding the day-by-day use of a search application. (See also https://www.nngroup.com/articles/guide-ux-research-methods/)
The next paper was presented by Amy Walduck over a Zoom link from Brisbane, Australia. Amy started with a moving acknowledgement of the debt that Queensland owed to its antecedents. Amy described a topographical approach to understanding large-scale user logs of over 8 million searches a year on the Library Catalogue, all based on open source software and open data that had been redacted to remove any personal information. Amy remarked that there had been a steady trend over the last few years of queries being framed as questions, in particular ‘How’ and ‘What’ question formats.
After a break Brammert Ottens (Spotify) outlined the search strategy that had been adopted by the company, supporting both text and voice search. He framed his presentation around Mindsets (Focused, Open and Exploratory) and Intents (Listen, Organise, Share and Fact Check). Spotify are fortunate in being able to follow the history of a search as it has data on what the user then listened to and for how long, making it easier (but still very challenging at scale) to optimize the search experience. (See also https://dl.acm.org/doi/10.1145/3290605.3300529)
Another large scale search implementation was described by Mohamed Yahya from Bloomberg. He focused on recent efforts to develop question answering functionality, with the criterion that the outcome has to be correct at the time of presentation and explainable. The target was high precision rather than high recall. The system took a view on whether the question was answerable, given the scope of the repository, and if there was not adequate confidence the response was presented as a display of results rather than a narrative text response.
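The decision rule described in this paragraph, answer only when the question is in scope and confidence is high, otherwise fall back to a results list, can be sketched in a few lines. This is a minimal illustration of the general pattern and not Bloomberg's implementation: the `Candidate` type, the `respond` function and the 0.9 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """A proposed answer with the evidence needed to explain it (hypothetical)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]
    evidence: str      # identifier of the supporting source


def respond(
    in_scope: bool,
    candidate: Optional[Candidate],
    ranked_results: List[str],
    threshold: float = 0.9,  # assumed value; a high bar favours precision over recall
) -> Dict[str, object]:
    """Return a narrative answer only when it is safe to do so."""
    if in_scope and candidate is not None and candidate.confidence >= threshold:
        # Confident and answerable: give the answer together with its evidence.
        return {"mode": "answer", "text": candidate.text, "evidence": candidate.evidence}
    # Otherwise fall back to a conventional display of search results.
    return {"mode": "results", "items": list(ranked_results)}


confident = respond(True, Candidate("example answer", 0.97, "doc-a"), ["doc-a", "doc-b"])
unsure = respond(True, Candidate("example answer", 0.55, "doc-a"), ["doc-a", "doc-b"])
```

Raising the threshold trades fewer narrative answers for fewer wrong ones, which matches the stated preference for high precision over high recall.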
Of course, when it comes to scale Google takes the accolade. Filip Radlinski talked about the increasingly blurred boundary between search and recommendation, focusing on the challenges of searching for film information based on soft attributes, such as scary, uplifting and boring. This comes down to the issue of subjectivity, which Filip discussed in terms of degree, semantic and compositional. Filip reflected on a number of overarching issues in his paper, including transparency (data, model and algorithm) and the lack of an adequate range of corpora to work on natural language search. (See also https://arxiv.org/abs/2205.09403)
After lunch Farhad Shokraneh gave a quite impassioned paper about the problems that systematic searching gives rise to, entitled ‘Futures of Systematic Searching’, in which the plural was not a spelling mistake! Farhad started out describing the process of setting up a systematic review and the challenges of coping with a situation where the review process was in effect invalidated because of one or more research papers being published since the original scope of the review had been finalized. He emphasized that it was not just a matter of rerunning the search, as more recent research might require the scope and strategy to be reconsidered. Another issue he mentioned arose when a machine learning routine decided to downgrade the relevance of papers that did not have an abstract. Farhad concluded by presenting four versions of the future of systematic reviews. (See also https://www.sciencedirect.com/science/article/pii/S266730532200031X)
Gavin Moore (University Hospitals Coventry & Warwickshire NHS Trust) continued the healthcare theme with an application that he and Andrew Doyle had developed to be able to store and search clinical guidelines. I know from a project I carried out a few years ago for a major hospital that this is far from a trivial challenge as there are both Trust and NHS wide guidelines which up until March 2022 were maintained by NICE. The solution was based on the Google app and was an excellent example of how a very effective search solution could be developed with very limited resources.
The final session of the day was on enterprise search, which started out with Cedric Ulmer and Julien Massiera giving a demonstration of integrating spaCy into the Datafari open source application to give an enhanced semantic search capability, including entity extraction and refinement. (See also https://irsg.bcs.org/informer/2022/11/the-evolution-of-datafari-a-european-open-source-enterprise-search-application-cedric-ulmer-ceo/)
This was followed by Phil Lewis describing a project that he and his colleagues at Pureinsights were working on at the Publications Office of the European Commission. Currently this is working in just two languages (English and French) but in time will be expanded to most, if not all, of the official EU languages. What was notable about this implementation was the use of a knowledge graph developed out of the Oracle RDF repository, together with a quite complex content processing stack to deliver a very high-quality search experience. Both this presentation and the previous one from Datafari highlighted the move towards hybrid search applications built on a stack of individual components.
The conference concluded with a number of lightning presentations, each lasting five minutes, from Andy Neill and Richard Giazzi (the Thomson Reuters HighQ deal support application), René Kriegler (OpenSource Connections) on the effective management of e-commerce search, and Sean MacAvaney (University of Glasgow) on rethinking reranking. Cedric Ulmer reminded everyone of the four freedoms of open source software, namely the freedom to use, the freedom to distribute, the freedom to modify and the freedom to understand (exemplary documentation).
Next up were the Search Industry Awards, managed by Tony Russell-Rose.
The winners were:
Best Search User Experience – Reza Rawassizadeh and Yi Rong, working on ODSearch at Boston University https://paperswithcode.com/paper/odsearch-a-fast-and-resource-efficient-on
Most Promising Startup – Giotto (Matteo Caorsi, Chief Technology Officer) https://compliance.giotto.ai/
Search Professional of the Year – Adam Tocock, The Hillingdon Hospitals Library Services
Best paper at Search Solutions 2022 (voted by the audience) – Filip Radlinski, with Farhad Shokraneh and Phil Lewis tied for second place
ChatGPT’s take on Academia and the Enterprise
The following is the version authored by ChatGPT of Steven Zimmerman’s article titled ‘Academia and the Enterprise’. This version is notably shorter than the original article, and the facts that remain are all correct. However, ChatGPT has removed many anecdotes, including the mention of SkyNet (should we be concerned?). Which version do you prefer?
Academia and the Enterprise
It is an honour to be asked by a highly respected contributor to the enterprise search community to share my journey from academia to the enterprise. Admittedly, my journey has been unusual, so perhaps it’s best to provide some context before diving into the details.
Now
I am currently a Senior Data Scientist in the NLP team at a large multinational, and I can confidently say that there has never been a more interesting time to work in search and NLP. My journey into this field began 10 years ago, and it has always been fascinating, but the latest generation of large language models (LLMs) has made the work even more interesting.
A former colleague introduced me to ChatGPT on December 1st and claimed that it would be as big as Google. Now, just over four months later, I tend to agree with this assessment. The initial impact of ChatGPT is so significant that even South Park recently aired an entire episode about its powers and related dangers, co-written with ChatGPT nonetheless. It’s noteworthy that there is yet to be an episode devoted to the release of Google.
Admittedly, there is nothing that new with respect to ChatGPT as it builds upon an existing body of research in the space of generative AI. While there has been buzz around models like DALLE and deep fakes in recent years, ChatGPT is the first generative LLM that has garnered mass attention and permitted easy interaction.
Personally, I was blown away by ChatGPT as it was the first AI-based interactive dialogue system that had a feeling of being “real”. However, I quickly realised that there were big holes in many of the legitimate-sounding responses it gave, which those in the business of AI and NLP refer to as “hallucination”. This raises questions about whether we should place so much belief in a capability whose designers caution us that it will “hallucinate” from time to time.
For me, this question ties directly back to my academic research in the recently emerged field of interactive information retrieval (IIR), which focused on risk mitigation of harms on the Web. Due to this latest technology, there has never been a greater potential for harm, and paradoxically there has never been a greater potential for benefit. It turns out that there has never been a more important time for IIR to play a role in the development of methods and evaluation approaches for the safe usage of this capability. ChatGPT opens up many new research avenues to explore, and the research possibilities on the Web and in the Enterprise are not only massive but also highly important.
Before Now
It may interest some of you to know that I come from a family of computer scientists who have worked for large tech companies. However, I was initially hesitant to follow in their footsteps due to their gruelling work hours. Nevertheless, I found myself working in computing after finishing undergrad when job opportunities were scarce. While working as a contractor in various menial jobs, I took a few computing courses at Northeastern in Boston and soon found myself working full-time as a programmer at a large financial company.
After five years in technology, I took a break to explore the possibility of pursuing graduate studies in atmospheric physics at Cornell. After a couple of years of studying the fundamentals of atmospheric science, I realised that I was more interested in the computing aspects and less interested in deriving the fluid dynamics of the atmosphere. Though I developed my abilities to solve difficult problems independently while at Cornell, I no longer felt excited about an academic career in atmospheric sciences.
Around 2013, I first heard about NLP and the emerging field of data science through a well-known article on the topic, which sparked a flame in me. A well-timed life event led me to relocate to England, and I had the opportunity to join a newly created MSc programme that focused on NLP and search. At the London Text Analytics meetup, which was co-run by Udo Kruschwitz and Tony Russell-Rose, I connected with many companies that were hiring, including the small startup in a garage in Belsize Park that I interned at between my first and second year of my MSc. That startup has now grown into a much larger company called Signal AI.
After completing my MSc, I found full-time work in the data science team of a large newspaper, where I developed document classification pipelines and prototype recommender engines. Timing played an important role here too; Udo Kruschwitz contacted me about an ESRC-funded research grant that looked at human rights in the digital age, which aligned with my concerns about online misinformation campaigns. Specifically, I was very concerned about the false claims surrounding the Brexit referendum. This led me to focus my PhD research on harm mitigation on the Web, initially on hate speech mitigation but then pivoting towards the consideration of the human in the system.
Around the time I submitted my paper on this topic to LREC for review, I attended the Autumn School for Information Retrieval and Foraging (ASIRF) at Dagstuhl and read Daniel Kahneman’s “Thinking, Fast and Slow”, lent to me by a fellow PhD student in the Psychology department who researched judgement and decision making in medicine. Attendance at ASIRF introduced me to many great researchers, most notably David Elsweiler, who lectured on the fundamentals of IIR studies. The book and the Autumn School were the foundation for a rapid update to my PhD research plan to include the consideration of the human in the system. This shift in research led to co-authored papers with David Elsweiler and the aforementioned PhD student (Alistair Thorpe).
Concurrently with my PhD research, my advisor encouraged me to explore avenues in the private sector. He connected me with an enterprise search expert at a large energy company in London, which led to an internship that took place during my PhD. This internship transitioned to my current full-time role as a search and NLP researcher in the private sector. My research is predominantly in the private sector and heavily focused on enterprise search. Applications of NLP and search have interested me from the first day I set foot in the field.
I close with some key learnings from my experience.
When considering an advanced degree in Search/NLP
- It’s beneficial to take an interdisciplinary approach to your research. While my core research was in computer science, it also considered a broad set of fields. In today’s world, we can’t afford to take a narrow view of the problems we face.
- Pursuing a PhD is a massive commitment, and I strongly advise against self-funding.
- While ideology can be a great motivator for research, it’s important to be prepared to let it go. My experiences with hate speech research taught me a lot about this matter.
For those pursuing or recently enrolled in a PhD programme, here are some helpful tips:
- Dive into hands-on work early on in your PhD. Start building experiments and aim to publish your findings as soon as possible.
- Consider applying for a doctoral consortium, such as the one offered by SIGIR. This is a fantastic opportunity to connect with other researchers in your field and gain valuable experience.
- Attend summer schools to expand your knowledge base and build connections with potential co-authors. For example, both the ASIRF and the summer school for Bounded Rationality at the Max Planck Institute for Human Development are great options.
- Consider doing an internship or placement at a company to get a sense of whether academia, the private sector, or a combination of the two is the right fit for you.
When it comes to choosing between academia and industry, it’s important to understand that it’s a spectrum, and you need to find what’s right for you after your PhD. There are several considerations and possibilities to keep in mind:
- Evaluation is much more straightforward in academia than in the private sector. Academia offers greater experimental control, while industry has many moving parts and people to work with.
- Pure industry jobs tend to pay more, but pure academia offers more freedom (although this freedom has eroded in recent years).
- Industry also offers the opportunity to investigate interesting research problems in search and NLP, but the problem is typically business-driven, making it easier to define.
- Some private sector companies offer research positions that allocate some time for academic work outside of the company.
- It’s common for individuals with full-time academic appointments to do side research in the private sector.
- It’s possible to work in the private sector and still maintain an academic affiliation to conduct research on the side.
- If you’re interested in a full-time academic appointment, it’s important to talk to people in that field and fully understand the responsibilities involved, which are quite different from a PhD or post-doc. You’ll also have to create course syllabuses and teaching slides, grade assignments, and do administrative work.
And finally – from the Editor
I’ve included a conference report in this issue of Informer on the Alan Turing Institute conference on LLMs (held in London on 23 February) to give an indication of the speed of development of LLMs and related applications (ChatGPT, Microsoft Copilot and oh so many more!) over the last few months.
The opportunities for research into the performance and possibilities of LLMs (I’m using this in a very generic way) are both colossal and essential if we are to get the best from this technology and avoid the worst it has to offer. It has struck me that the publication of this research is not keeping up with the speed of development. Even in journals that pride themselves on early publication, the papers have a historical perspective which is interesting but of questionable long-term value. There is also the challenge of finding peer reviewers who have an appropriate level of expertise in the topics.
_____________________________________________________________________
Opportunities for Authors
If you are an expert in information retrieval or any aspect of search who has strong writing skills, we invite you to contribute to Informer. Please send an article proposal to us at: irsg@bcs.org.
For more information about the BCS IRSG, please go to: http://irsg.bcs.org/about.php
About Informer
Informer is the quarterly newsletter of the BCS Information Retrieval Specialist Group (IRSG). Its aim is to provide insights and inspiration to researchers and professionals working in all aspects of search and information retrieval. Our articles provide accessible and timely coverage of important topics, ranging from focused, practical advice, to concise overviews of broader topics, and to deeper, research-oriented articles and opinion pieces.
The IRSG is a Specialist Group of BCS. Its mission is to provide a focus for the European IR community, facilitate communication between researchers and practitioners and promote the adoption of IR research within industry. We host a major European conference (ECIR) and provide an associated programme of workshops, seminars and events. The IRSG is free to join via the BCS website, which provides access to further IR articles, events and resources.
BCS is the industry body for IT professionals. With members in over 100 countries around the world, BCS is the leading professional and learned society in the field of computers and information systems.
_____________________________________________________________________
Visit Informer at http://irsg.bcs.org/informer/
If you have comments, questions, or suggestions for Informer, please contact us at irsg@bcs.org.uk