Regular readers of this blog will know that over the last few months I’ve been looking in detail at the process of search strategy formulation, i.e. the various ways in which professionals go about resolving complex information needs.
Some professions (e.g. recruitment professionals) employ complex search queries to address sourcing needs, generating queries such as this:
(“business analyst” or “systems analyst” or “system analyst” or “data analyst” or “requirements analyst” or “functional analyst”) and crystal and report* and analy* and data near analy* and not inventory and not retail and not (ecommerce or “e-commerce” or b2b or b2c)
This particular query is designed to retrieve candidates who match a typical client brief. As you can see, it’s essentially a complex Boolean expression, and the challenge of creating and optimising such expressions is the subject of a number of social media forums.
Other professions adopt a different approach. Healthcare professionals, particularly those who are involved in the creation of systematic (literature) reviews, tend to adopt a line-by-line approach such as this (the published Medline strategy for Oral protein calorie supplementation for children with chronic disease):
1. randomized controlled trial.pt.
2. controlled clinical trial.pt.
3. randomized.ab.
4. placebo.ab.
5. clinical trials as topic.sh.
6. randomly.ab.
7. trial.ti.
8. 1 or 2 or 3 or 4 or 5 or 6 or 7
9. (animals not (humans and animals)).sh.
10. 8 not 9
11. exp Child/
12. ADOLESCENT/
13. exp infant/
14. child hospitalized/
15. adolescent hospitalized/
16. (child$ or infant$ or toddler$ or adolescen$ or teenage$).tw.
17. or/11-16
18. Child Nutrition Sciences/
19. exp Dietary Proteins/
20. Dietary Supplements/
21. Dietetics/
22. or/18-21
23. exp Infant, Newborn/
24. exp Overweight/
25. exp Eating Disorders/
26. Athletes/
27. exp Sports/
28. exp Pregnancy/
29. exp Viruses/
30. (newborn$ or obes$ or “eating disorder$” or pregnan$ or childbirth or virus$ or influenza).tw.
31. or/23-30
32. 10 and 17 and 22
33. 32 not 31
In this type of formalism, the search strategy is built up incrementally, as a set of discrete expressions which are referred to by line number and combined using various operators. This type of procedural approach has the advantage that strategies can be built up using techniques such as successive fractions, building blocks, and so on. It also allows the searcher to review the number of results returned at each step, and refine the expression accordingly.
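To make the mechanics concrete, here is a minimal Python sketch of how references between lines might be resolved into a single Boolean expression. It is only an illustration under simplifying assumptions: the shortened `strategy` list and the `expand` function are hypothetical, and the two patterns it recognises (`or/1-3` and `4 and 5`) cover only a fraction of the syntax that a real search platform accepts.

```python
import re

# A shortened, hypothetical strategy in the same line-by-line style.
strategy = [
    "exp Child/",             # line 1
    "ADOLESCENT/",            # line 2
    "exp infant/",            # line 3
    "or/1-3",                 # line 4
    "exp Dietary Proteins/",  # line 5
    "4 and 5",                # line 6
]


def expand(strategy, n):
    """Return line n (1-indexed) with every line reference inlined."""
    line = strategy[n - 1]

    # Range shorthand such as "or/1-3": combine a run of earlier lines.
    m = re.fullmatch(r"(or|and)/(\d+)-(\d+)", line)
    if m:
        op, first, last = m.group(1), int(m.group(2)), int(m.group(3))
        parts = [expand(strategy, i) for i in range(first, last + 1)]
        return "(" + f" {op} ".join(parts) + ")"

    # Explicit combination such as "4 and 5": numbers joined by operators.
    if re.fullmatch(r"\d+( (and|or|not) \d+)+", line):
        tokens = line.split()
        resolved = [expand(strategy, int(t)) if t.isdigit() else t for t in tokens]
        return "(" + " ".join(resolved) + ")"

    # Anything else is treated as a plain search term.
    return line


print(expand(strategy, 6))
# ((exp Child/ or ADOLESCENT/ or exp infant/) and exp Dietary Proteins/)
```

Seen this way, the line numbers act as nothing more than positional variable names, which is why the final expression only makes sense once every reference has been traced back to its source.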
Over the last few months I’ve got used to seeing some quite complex search strategies, often extending over a hundred lines or more. However, a few things about the formalism still strike me as being a bit odd.
Firstly, the use of logical statements connected via numbered lines above does rather remind me of first generation BASIC. I’m not saying that the language didn’t have its place, but several decades on we’d like to think we now have recourse to rather more structured approaches. But more to the point, what’s happening with all those line numbers – are they really the best way to organize a collection of logical expressions? Just when we most need a principled mechanism for structuring our approach, it seems we are forced to rely on something as arbitrary as a line number. As any undergraduate computer scientist will tell you, the liberal use of such ‘goto’ statements is indeed considered harmful.
Secondly, and continuing the programming language metaphor, I wonder just how much support there is for constructing expressions that are syntactically correct and semantically transparent. A well-designed (programming) language, for example, should support concepts such as:
- Encapsulation: the concept whereby data and functions are packed into a single component. To a degree, this is true of the line by line approach above, but it is compromised by the lack of facility for naming and invoking discrete elements of computation (other than by an arbitrary number).
- Abstraction: the ability to generalize from a set of behaviours, e.g. the use of a template which can be populated for a given instance. In the example above we can see that lines 11 to 17 are probably intended to express the population element of the PICO process. So why not abstract this component out? That way, when we need to (re)use it, it could be instantiated on a case by case basis, e.g. male adults in strategy X, female infants in strategy Y, and so on. (OK, I know that some people equate abstraction with hiding implementation details, but I think the generalization sense is more pertinent here).
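As a rough illustration of what naming and abstraction could look like, the Python sketch below replaces numbered lines with named blocks and a reusable population template. Everything here is an assumption made for the sake of the example: the helper names (`any_of`, `all_of`, `population`) are hypothetical, the subject headings are merely illustrative, and no existing search platform is implied to work this way.

```python
def any_of(*terms):
    """OR a group of terms together into one parenthesised block."""
    return "(" + " or ".join(terms) + ")"


def all_of(*blocks):
    """AND a group of blocks together into one parenthesised block."""
    return "(" + " and ".join(blocks) + ")"


def population(headings, stems):
    """Hypothetical template for the population element of PICO.

    headings: controlled vocabulary terms, e.g. "exp Child/"
    stems:    truncated free-text terms searched as text words (.tw.)
    """
    terms = list(headings)
    if stems:
        terms.append("(" + " or ".join(stems) + ").tw.")
    return any_of(*terms)


# The same template instantiated for two different strategies.
children = population(
    headings=["exp Child/", "Adolescent/", "exp Infant/"],
    stems=["child$", "infant$", "adolescen$"],
)
older_adults = population(
    headings=["exp Aged/"],
    stems=["elderly", "geriatric$"],
)

# Blocks are combined by name rather than by line number.
intervention = any_of("exp Dietary Proteins/", "Dietary Supplements/")
strategy = all_of(children, intervention)

print(strategy)
```

The point of the sketch is not the string handling but the fact that `children` and `intervention` say what they are for, whereas "17" and "22" do not.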
Likewise, I can imagine cases where we would want our search strategy to encompass other concepts such as inheritance, modularity, etc. So I am left thinking: why has the design of search strategies apparently changed so little when programming languages have changed so rapidly?
Of course, if you’re writing the control software for an Airbus 320 you might argue that you need tools and approaches that deal with a few orders of complexity more than your ‘average’ search strategy. But both endeavours are trying to find elegant and parsimonious ways to express complex logical constructs, both are concerned with syntactic correctness, and both need semantic transparency and pragmatic effectiveness. I wonder – is this formalism a bit like the QWERTY keyboard – a flawed and outdated design, but one that is ubiquitous by little more than convention?
Nice piece Tony – thanks!
As to really harmful search strategies, I kept in my files for many years a search command that would tilt the Dialog online information service. I discovered it as a student (having free access to try out the otherwise very expensive Dialog service). Basically, the set of commands generated a set with so many postings that Dialog tilted, came out with an error message and logged me off. The nice thing was that they apologized and said that they would not charge for all of that session. A very useful set of commands to have around in case you’d just spent a lot of credit with Dialog.
Best
– Birger
Interesting thoughts Tony! I wonder whether we should move away from following textual programming language conventions for specifying and manipulating search strategies and towards following visual programming language conventions? Hopefully we’d still be able to make it easy (or easier at least) for searchers to specify and re-use their query terms. But, this way, we might finally acknowledge that while it’s possible to learn to ‘think in Boolean,’ this shouldn’t really be necessary for constructing search strategies. There are existing visual search tools out there. But have we really tried to create ones that explicitly support search strategy formation and editing?
Thanks for stopping by, Stephann. I’d agree totally about the need to re-think how we articulate and represent search strategies (watch this space for a future post on this topic 🙂). Like the QWERTY keyboard though, I think people just get used to it and it’s probably not in most vendors’ interests to change that.
What kind of search tools did you have in mind when you say there are already ones out there?