I am recruiting sponsored or self-funded PhD students who wish to undertake projects in natural language processing and UX with a focus on information retrieval, including the projects listed below.
Note that these topics are based on MSc level project proposals, but most have the scope and ambition to be scalable to PhD level work. Moreover, they are merely ideas at this stage, so feel free to adapt / enhance them to accommodate your own ideas and interests. Note also that this list is not exhaustive: we have other project ideas and proposals which aren’t quite ready for public dissemination.
If you are a self-funded student considering a PhD in any of the topics below, please take a look at the further information and/or email me to discuss.
1. NLP projects
1.1 Automated search strategy generation for recruiters
Finding the right information at the right time is a constant challenge. Sometimes, a few keywords in a search box are good enough. But there are times when a more rigorous, precise approach is needed. Up to now, the traditional solution has been to use ‘advanced search’ or specialist ‘line-by-line’ query builders. However, these require the use of complex Boolean expressions and offer limited support for error checking or optimization.
The goal of the project is to investigate ways in which automated support might be provided for generating search strategies from human-centric media. For example, given a natural language job description, can we automatically generate an effective search strategy that finds social profiles of suitable candidates? What kinds of terms should be extracted, and how should they be structured? How might we evaluate its performance? Further details:
- T.G. Russell-Rose & J. Chamberlain “Searching for talent: The information retrieval challenges of recruitment professionals”, Business Information Review, Vol. 33(1) 40–48, March 2016.
- https://insights.dice.com/report/build-better-boolean-searches-strings/
- https://www.2dsearch.com/news/2018/9/11/a-new-way-to-view-your-old-searches
1.2 Automated search strategy generation for students/researchers
The goal of the project is to investigate ways in which automated support might be provided to generate a search strategy from a research question. The sub-elements of this problem include:
- Filtering out research questions that are not in some sense (TBD) well-formed
- Parsing the research question into the key facets (e.g. term extraction with some sort of thresholding applied?)
- Generating candidate terms for each facet
- Creating a conjunction of the disjunctions (we assume fairly simple structures here, i.e. no deep nesting or negation)
- Rendering the above search strategy on the canvas, with an appropriate visualization
- Selecting the appropriate database(s) and executing the search
The deliverable should be a working prototype and some evaluation of its performance (test data can be provided).
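As an illustration of the conjunction-of-disjunctions step listed above, here is a minimal Python sketch. The facet names, candidate terms and the function name `build_strategy` are hypothetical, and the output is a generic Boolean string rather than the syntax of any particular database.

```python
def build_strategy(facets):
    """Combine facets into a conjunction (AND) of disjunctions (OR).

    `facets` maps a facet name to its list of candidate terms.
    Multi-word terms are quoted as phrases; empty facets are skipped.
    """
    groups = []
    for terms in facets.values():
        if not terms:
            continue
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical facets, as might be extracted from a research question
example_facets = {
    "population": ["children", "adolescents"],
    "intervention": ["cognitive behavioural therapy", "CBT"],
    "outcome": ["anxiety"],
}
strategy = build_strategy(example_facets)
print(strategy)
```

A flat structure like this matches the assumption of no deep nesting or negation; the harder research questions lie in the earlier steps (facet extraction and term generation), which the sketch takes as given.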
1.3 Generating query suggestions using Linked Open Data
One form of support for complex search tasks is through the use of automated query suggestions. You can see an example of this in practice at https://2dsearch.com/. However, the current infrastructure would be significantly more valuable if it included resources such as:
- MeSH Entry Terms (https://id.nlm.nih.gov/mesh/)
- Wikidata (https://www.wikidata.org/wiki/Wikidata:Data_access#SPARQL_endpoints)
- ConceptNet (http://conceptnet.io/)
MeSH and Wikidata are publicly accessible as SPARQL endpoints, and ConceptNet through a web API. The goal of this project is to investigate a variety of such resources, extract query suggestions from them, and then evaluate them to find an optimal combination. We have plenty of test data for this. This would be a great project for someone who knows Python, data science and knowledge representation. Further details:
- Tony Russell-Rose, Phil Gooch and Udo Kruschwitz, “Interactive query expansion for professional search applications”. Business Information Review, 02663821211034079 (2021).
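Taking Wikidata as one example, the sketch below shows how alternative labels for a term might be requested as query suggestions. It only builds the SPARQL query text and parses a response in the standard SPARQL JSON results format; no network call is made. The function names are hypothetical and the sample response is hand-written for illustration, not real Wikidata output.

```python
import json


def build_alt_label_query(term, lang="en", limit=20):
    """Build a SPARQL query asking for alternative labels of `term`.

    Relies on the rdfs: and skos: prefixes, which the Wikidata query
    service predefines.
    """
    safe = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?altLabel WHERE {\n"
        f'  ?item rdfs:label "{safe}"@{lang} .\n'
        "  ?item skos:altLabel ?altLabel .\n"
        f'  FILTER(LANG(?altLabel) = "{lang}")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def parse_suggestions(response_text):
    """Extract label strings from a SPARQL JSON results document."""
    data = json.loads(response_text)
    return [b["altLabel"]["value"] for b in data["results"]["bindings"]]


# Hand-written sample in SPARQL JSON results format (illustrative only)
sample_response = json.dumps({
    "head": {"vars": ["altLabel"]},
    "results": {"bindings": [
        {"altLabel": {"type": "literal", "xml:lang": "en", "value": "heart attack"}},
        {"altLabel": {"type": "literal", "xml:lang": "en", "value": "MI"}},
    ]},
})

query = build_alt_label_query("myocardial infarction")
suggestions = parse_suggestions(sample_response)
print(query)
print(suggestions)
```

In a real pipeline the query would be sent to the endpoint and the suggestions from several resources would then be merged and evaluated against test data.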
1.4 Combining set retrieval with ranked retrieval
Visual searching has the potential to transform the way users think about query formulation and how set retrieval itself should work. Crucially, it also has the ability to incorporate aspects of ranked retrieval into the framework. For example, the use of a two-dimensional canvas allows users to create queries that embody a combination of both approaches: a clear semantics represented by the visual layout, and a relevance ranking as represented by the spatial layout.
However, this has never been tested in practice. The goal of this project is to investigate ways in which set retrieval might be combined with ranked retrieval, at both an analytical level and at an empirical level, by integrating this functionality with a specific search engine that supports both operations (test data can be provided). Further details:
- MacFarlane, Andrew, Tony Russell-Rose, and Farhad Shokraneh. “Search Strategy Formulation for Systematic Reviews: issues, challenges and opportunities.” arXiv preprint arXiv:2112.09424 (2021).
- A. MacFarlane and T.G. Russell-Rose, “Search Strategy Formulation: A Framework for Learning”, Proceedings of 4th Spanish Conference in Information Retrieval, Granada, 14-16 June 2016.
- Tony Russell-Rose and Jon Chamberlain. “An Open-Access Platform for Transparent and Reproducible Structured Searching”. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’19). ACM, New York, NY, USA, 1293-1296. DOI: https://doi.org/10.1145/3331184.3331394
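One simple way to picture the combination of set retrieval and ranked retrieval is a two-stage process: a Boolean filter decides which documents are in the result set, and a scoring function orders them. The sketch below is a toy illustration under that assumption. The documents, the term weights and the function names are all hypothetical; in the visual setting the weights might, for instance, be derived from the canvas layout, which is exactly the kind of question the project would investigate.

```python
def matches(doc_tokens, strategy):
    """Set retrieval: every OR-group must contribute at least one term."""
    return all(any(t in doc_tokens for t in group) for group in strategy)


def score(doc_tokens, weights):
    """Ranked retrieval: sum the weight of each query-term occurrence."""
    return sum(weights.get(tok, 0.0) for tok in doc_tokens)


def search(docs, strategy, weights):
    """Filter documents by the Boolean strategy, then rank the survivors."""
    hits = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        if matches(tokens, strategy):
            hits.append((doc_id, score(tokens, weights)))
    # Highest score first; ties broken by document id
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Toy collection and query (all values are illustrative)
docs = {
    "d1": "statin therapy reduces stroke risk",
    "d2": "statin statin dosage and stroke stroke outcomes",
    "d3": "aspirin and stroke prevention",
}
strategy = [["statin", "statins"], ["stroke"]]  # (statin OR statins) AND stroke
weights = {"statin": 2.0, "stroke": 1.0}

results = search(docs, strategy, weights)
print(results)
```

Here `d3` is excluded by the Boolean filter regardless of its score, while `d1` and `d2` are both retrieved and then ordered by weight.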
1.5 Contextual language modelling for query suggestions
Word embedding methods provide a simple and practical representation of lexical semantics that can be learned in an unsupervised fashion from unannotated corpora. The fact that these methods discover meaning representations automatically makes them attractive as a research tool where our objective is to generate related terms or search query suggestions.
The goal of this project is to investigate the state of the art in distributional models, use them to generate query suggestions, and then evaluate them to find an optimal combination. We have plenty of test data for this. This would be a great project for someone with an interest in Python, data science and NLP. Further details:
- Tony Russell-Rose, Phil Gooch and Udo Kruschwitz, “Interactive query expansion for professional search applications”. Business Information Review, 02663821211034079 (2021).
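To make the idea of embedding-based suggestions concrete, the sketch below ranks candidate terms by cosine similarity to a query term. The three-dimensional vectors are invented toy values, not learned embeddings, and the function names are hypothetical; a real system would load vectors from a trained distributional model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(term, embeddings, k=2):
    """Return the k terms whose vectors are closest to that of `term`."""
    target = embeddings[term]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


# Toy vectors chosen by hand for illustration only
toy_embeddings = {
    "cancer": [0.9, 0.1, 0.0],
    "tumour": [0.8, 0.2, 0.1],
    "neoplasm": [0.7, 0.3, 0.0],
    "guitar": [0.0, 0.1, 0.9],
}

suggestions = suggest("cancer", toy_embeddings)
print(suggestions)
```

With static vectors like these, each term has a single representation; a contextual model would instead produce a vector that depends on the surrounding query, which is one of the directions the project could explore.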
2. UX projects
2.1 Designing an ‘iPlayer’ for search strategies
Complex search tasks rarely start from a blank canvas. In fact, many search problems are variations on an existing theme. In healthcare, for example, information specialists draw on repositories such as the ISSG Search Filters as a source of best practice and predefined templates (or ‘search strategies’). In cases such as these, it is possible to parse such templates and render them directly on a visual canvas. But by rendering them instantaneously we miss a crucial opportunity to help the user understand and interact with that content.
The goal of this project is to design an ‘iPlayer’ for search strategies, allowing them to be downloaded and ‘played’ on demand as executable specifications that ‘build’ in front of the user, from a blank canvas to a completed search strategy. Such a player could provide vital insight into the efficacy and semantics of legacy search strategies, showing in real time how they were constructed, allowing the user to pause and interrogate them at various stages, and, crucially, highlighting the various ways in which they could be optimised, extended, or re-used (test data can be provided). Further details:
- Tony Russell-Rose and Farhad Shokraneh. “Designing the Structured Search Experience: Rethinking the Query-Builder Paradigm.” Weave: Journal of Library User Experience 3.1 (2020).
- Tony Russell-Rose and Andrew MacFarlane, “Towards Explainability in Professional Search”. Proceedings of the 3rd International Workshop on ExplainAble Recommendation and Search (EARS 2020), in the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20). ACM, July 2020, Xi’an, China.
2.2 A universal language for search
It is common practice in many professions (such as healthcare and legal research) to search across multiple databases in parallel. However, each database has its own user interface and query syntax to learn, and this constitutes a significant source of inefficiency and error. In cases such as these, the ability to apply a common approach (or ‘universal search strategy’) and have it automatically ‘translated’ for each database would be highly attractive. However, it does raise the philosophical question: what exactly might a universal language for information needs look like? What kinds of conceptual structures must it accommodate? Is it simply an aggregation of formalisms for the databases it serves, or is there some parsimonious subset that offers the necessary expressive power without superfluous syntactic baggage? The project would suit a student who likes to think big about universals and combine this with a programme of empirical work investigating and reviewing existing representations for search strategies and information needs. Further details:
- T.G. Russell-Rose, J. Chamberlain and F. Shokraneh “A visual approach to query formulation for systematic search”. Proceedings of the 2019 Conference on Human Information Interaction & Retrieval. ACM, Glasgow, UK, March 10-14.
- Tony Russell-Rose, Jon Chamberlain & Leif Azzopardi, “Information retrieval in the workplace: A comparison of professional search practices”. Information Processing & Management Volume 54, Issue 6, November 2018, Pages 1042-1057. https://doi.org/10.1016/j.ipm.2018.07.003
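As a rough illustration of what ‘translating’ a universal search strategy could mean, the sketch below holds a strategy as a small tree and renders it in two different styles. The tree representation, the dialect tables and the function name are hypothetical, and the two output styles are simplified approximations of PubMed-like and Ovid-like title/abstract searching rather than complete or authoritative translations.

```python
def render(node, dialect):
    """Recursively render a strategy tree in the given dialect.

    A node is either a term (a string) or a pair (operator, children).
    """
    if isinstance(node, str):
        return dialect["term"](node)
    op, children = node
    joined = f" {dialect[op]} ".join(render(c, dialect) for c in children)
    return f"({joined})"


# Simplified, illustrative dialect definitions (assumptions, not full syntaxes)
PUBMED_STYLE = {"AND": "AND", "OR": "OR", "term": lambda t: f'"{t}"[tiab]'}
OVID_STYLE = {"AND": "and", "OR": "or", "term": lambda t: f"{t}.ti,ab."}

# One database-neutral strategy: (heart attack OR myocardial infarction) AND aspirin
strategy = ("AND", [("OR", ["heart attack", "myocardial infarction"]), "aspirin"])

pubmed = render(strategy, PUBMED_STYLE)
ovid = render(strategy, OVID_STYLE)
print(pubmed)
print(ovid)
```

Even this tiny example hints at the research question: operators and simple terms map across easily, whereas features such as proximity operators, truncation or controlled vocabulary do not have obvious common denominators.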
2.3 Evaluating novel approaches to structured search
The traditional approach to structured searching is to use form-based query builders, such as Clinical Trials Search Portal (http://apps.who.int/trialsearch/AdvSearch.aspx) or PubMed (https://www.ncbi.nlm.nih.gov/pubmed/advanced). However, these require the use of complex Boolean expressions and their output is often compromised by errors and inefficiencies.
Visual searching offers a radical alternative: instead of entering Boolean strings into one-dimensional search boxes, queries can be formulated by combining objects on a visual canvas. This eliminates syntactic errors, makes the semantics more transparent, and offers new ways to share templates and best practices. However, it is vital that such approaches be formally evaluated, so that end users can have confidence in the efficacy of the product and the approach. The goal of this project is to formally evaluate visual searching approaches from a user-centric perspective. We would aim to measure the usability and accuracy of the approach and compare the results with those of a traditional form-based query builder. Further details:
- T.G. Russell-Rose, J. Chamberlain and U. Kruschwitz “Rethinking advanced search: a new approach to complex query formulation”. Proceedings of the 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14-18.
- Tony Russell-Rose and Farhad Shokraneh. “Designing the Structured Search Experience: Rethinking the Query-Builder Paradigm.” Weave: Journal of Library User Experience 3.1 (2020).
2.4 Evaluating interactive query suggestions
Knowledge workers such as patent agents, legal researchers and information professionals undertake work tasks in which search forms a core part of their duties. These search tasks are often complex and time-consuming and require specialist expertise to formulate accurate search strategies. Interactive features such as query expansion can play a key part in supporting these tasks. However, we currently have no human-centric baseline against which to measure the performance of such approaches.
The aim of this project is to evaluate the performance of query suggestion algorithms by means of controlled experiments. This will require recruitment of users with varying backgrounds and varying degrees of experience. We would aim to measure outcomes using a variety of metrics, both human-centric and system-centric. Further details:
- Tony Russell-Rose, Phil Gooch and Udo Kruschwitz, “Interactive query expansion for professional search applications”. Business Information Review, 02663821211034079 (2021).
- Tony Russell-Rose, Jon Chamberlain & Leif Azzopardi, “Information retrieval in the workplace: A comparison of professional search practices”. Information Processing & Management Volume 54, Issue 6, November 2018, Pages 1042-1057. https://doi.org/10.1016/j.ipm.2018.07.003