A short while ago I undertook a card sorting exercise for a major eCommerce retailer, involving some 200+ items in their product catalogue. Now, there’s been a lot written about card sorting recently, including some excellent overview articles, so in this post I shall focus solely on the experiences that were unique to our exercise.
In particular, we had to deal with a product taxonomy that:
- contained over 200 product items, arranged across 3-4 levels;
- used 10 top-level categories that were relatively ‘fixed’;
- was organised mainly by product type, but in some cases by product location (typically the room in which the product would be used);
- was polyhierarchical, in that some content items appeared in multiple places.
Our intention was to run a card sort with ~15 participants (which is considered by some to be the minimum number), with each 90 minute session organised as follows:
- Perform a brief inverse card sort or ‘findability test’;
- Perform a closed card sort of the content items using the 10 top-level categories;
- Elicit feedback from the participants on their ‘Top 10’ most sought-after items (which could be used as probes for a subsequent findability test).

Card sorting (http://www.usability.gov/design/cardsort.html)
Anyway, here is the twist. A key part of our brief was to review the whole taxonomy, i.e. all 200+ items. For this reason, using traditional card sorting techniques wasn’t going to be practical (at least, not in a regular 60-90 minute session). Instead, we decided to use a relatively new technique known as the Delphi method. In this approach, the first participant creates an initial sort, and then the remaining participants review and modify that sort. This imposes a much lower mental load on the remaining participants, rendering a sort of 200+ items a much more tractable proposition (in theory). The other part of the brief was to work within the constraints of the existing top-level categorisation, and for this reason we used a closed sort.
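For those who like to see the mechanics spelled out, here is a rough sketch (in Python, purely illustrative – the categories, items and moves are all invented) of how a Delphi-style sort accumulates changes: the first participant produces a complete sort, and each subsequent participant records only their modifications to the previous state, which is what keeps the mental load down.

```python
# Illustrative sketch of a Delphi-style card sort: participant 1 supplies a
# full sort, and each later participant only records moves against the
# previous state. All names and moves below are made up for the example.

from copy import deepcopy

# Participant 1: initial closed sort (top-level category -> cards)
initial_sort = {
    "Kitchen": ["kettle", "toaster"],
    "Living Room": ["sofa", "lamp"],
    "Unsorted": ["doorbell"],
}

# Later participants propose moves only: (card, from_category, to_category)
moves_by_participant = [
    [("lamp", "Living Room", "Kitchen")],       # participant 2
    [("lamp", "Kitchen", "Living Room")],       # participant 3 moves it back
    [("doorbell", "Unsorted", "Living Room")],  # participant 4
]

def apply_moves(sort, moves):
    """Return a new sort with each (card, src, dst) move applied."""
    new_sort = deepcopy(sort)
    for card, src, dst in moves:
        new_sort[src].remove(card)
        new_sort[dst].append(card)
    return new_sort

# Replay the sessions: each participant starts from the previous result.
history = [initial_sort]
for moves in moves_by_participant:
    history.append(apply_moves(history[-1], moves))

print(history[-1])  # the sort as it stands after the final participant
```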
The other twist was that we took the opportunity to use both group-based and individual sorting. Now, there are various pros and cons to group vs. individual sorting: groups will give you richer qualitative feedback (think-aloud commentary, etc.), whereas individuals will give you more data points (which, depending on your analysis method, can give you greater confidence in the output). But our primary reason for using groups was that the success of the Delphi method is heavily dependent on the outcome of the initial sort, so running the first few sessions with groups of three would reduce the likelihood of an outlier initial sort propagating through the remaining sessions.
So, that was the plan. We ran the card sort itself in a fairly traditional fashion, i.e. content items printed on cards, with participants being allowed to create new cards, set aside existing ones, highlight problematic ones, create new sub-groups etc. What follows below are my reflections on the experience; in particular what we learnt and what we’d change were we to repeat the exercise:
- The more I think about it, the more I am inclined to agree with Donna Spencer that closed sorts are really only appropriate for narrowly focussed content categorisation tasks. Evidently in our case the client wanted us to analyse all 200+ items, including their current top-level categorisation, but my response now would be “Is your primary goal to develop an optimal IA, or to validate an existing one?” If the former, an open sort is the way to go; if the latter, a full-scale findability test seems most appropriate. But in neither case does a closed sort seem the right approach.
- Related to the point above, I think it is so important to begin with the end in mind. I made great use of Donna’s Excel template for card sorting analysis – it not only produces great results but also forces you to think more deeply about your overall methodology. It does have a limit of 200 cards, which I had to work around creatively in this case, but that limit is a fair reflection of the maximum you should attempt to sort in a typical 60-90 minute session (plus some headroom). In hindsight I think a better approach is to find some way of aggregating items so that the overall number becomes more tractable. (For the curious, there is a rough sketch of what this kind of analysis involves at the end of this list.)
- Did the Delphi method work, in the sense of providing a better IA using fewer resources? I think the jury is still out on this. Evidently, I can only generalise so far from just one example of its use. What I would say is that the initial sort is crucial, so it makes sense to use a group for that. We also witnessed a fair degree of oscillation, whereby one participant would suggest a certain modification, then the next would remove it, only for the following participant to put it back. But overall I’m not convinced that by the end we had elicited the same depth of analysis that you’d expect from a true open sort. It seemed that the participants spent more time reviewing the existing structure than thinking about how they would ideally (re)create it. So my instinct is that an aggregate model generated from a large number of iterative open sorts is more likely to produce a representative IA than the Delphi-inspired incremental approach. That said, Celeste Lyn Paul’s paper presents a much more scientific analysis of this issue than my anecdotal assessment, so I think you should read her evaluation before forming any judgement. Bear in mind also that we should differentiate between the approach and the execution – I freely admit that I would do things differently next time around, so we shouldn’t infer from my experience an unduly negative assessment of the technique itself.
- I mentioned at the start that this taxonomy was polyhierarchical, in the sense that the same content item appears in multiple places. What is the best way of dealing with this? If “product X” appears in four places in the current taxonomy, should I create four cards, thus reflecting the true taxonomy, or just one, and let the statistical analysis show where (and how many times) it should appear? My instinct was to do the former, but I’d like to understand the rationale for choosing one approach over the other. (There is a sketch of the ‘just one card’ analysis at the end of this list.)
- Don’t forget to number the cards! It’ll make your subsequent analysis so much easier. Use pencil and write lightly on the back, so that participants won’t notice. And write the label in the middle at the top, so that cards can be nested easily with their labels still visible.
- Draft appropriate templates in advance for your data collection. With multiple participants you need to automate this as much as possible, so spend some time creating whatever proformas or templates you think are going to be needed. I created templates for recording the outputs of both the findability test and the ‘Top 10’ exercise (in addition to using Donna’s spreadsheet for the card sort).
- Alternatively, consider using one of a number of online tools for card sorting. Initially I was a little sceptical of these, as most are chargeable services, and with remote testing you’re going to miss out on so much qualitative feedback. But I have been reasonably impressed with Websort – it gets down to business very quickly and greatly simplifies the remote card sorting experience. We’ll be using this tool for our IA work on the Ergonomics Society website.
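As promised in the point about Donna’s template, here is a rough, purely illustrative sketch of the kind of calculation a card sort analysis boils down to: counting how often each pair of cards is grouped together across participants. To be clear, this is an assumption about the general approach rather than a description of how Donna’s spreadsheet works internally, and the card names are invented.

```python
# Minimal sketch of a standard card sort analysis: count how often each pair
# of cards ends up in the same group across participants. This is a generic
# illustration, not Donna's template; card names are invented.

from collections import Counter
from itertools import combinations

# Each participant's sort: a list of groups, each group a list of card names.
sorts = [
    [["kettle", "toaster"], ["sofa", "lamp", "doorbell"]],
    [["kettle", "toaster", "doorbell"], ["sofa", "lamp"]],
    [["kettle", "toaster"], ["sofa", "lamp"], ["doorbell"]],
]

co_occurrence = Counter()
for sort in sorts:
    for group in sort:
        for a, b in combinations(sorted(group), 2):
            co_occurrence[(a, b)] += 1

# Pairs grouped together most often are the strongest candidates to sit
# under the same category in the final IA.
for pair, count in co_occurrence.most_common():
    print(f"{pair}: grouped together by {count} of {len(sorts)} participants")
```

With 200+ cards the same counting logic applies unchanged; the practical limit is the participants’ stamina rather than the arithmetic.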
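And, relating to the polyhierarchy question above, here is a sketch of the ‘just one card’ approach: each item appears on a single card, and the analysis counts how often participants place it under each top-level category, treating every category chosen by more than some threshold of participants as a candidate home. The threshold, items and categories are all invented for the example.

```python
# Sketch of the 'one card, let the data decide' approach to polyhierarchy:
# each item appears on a single card, and we count how often participants
# place it under each top-level category. Names and threshold are invented.

from collections import defaultdict, Counter

# Each participant's closed sort, recorded as item -> chosen category.
placements = [
    {"doorbell": "Security",   "lamp": "Living Room"},
    {"doorbell": "Electrical", "lamp": "Living Room"},
    {"doorbell": "Security",   "lamp": "Lighting"},
    {"doorbell": "Security",   "lamp": "Lighting"},
]

votes = defaultdict(Counter)
for sort in placements:
    for item, category in sort.items():
        votes[item][category] += 1

THRESHOLD = 0.25  # cross-list an item in every category chosen by >25% of participants
for item, counter in votes.items():
    total = sum(counter.values())
    homes = [cat for cat, count in counter.items() if count / total > THRESHOLD]
    print(f"{item}: candidate categories -> {homes}")
```

On this (made-up) run ‘lamp’ would be cross-listed under both Living Room and Lighting, while ‘doorbell’ stays in a single place – which is the kind of evidence the single-card approach promises to surface.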
Thanks for an interesting read. I was particularly interested to hear that participants spent more time reviewing than recreating the sort.
We faced (and continue to face) the exact same challenges with clients. So much so that we built our own IA validation tool for the “findability” test you described above. (http://www.optimalworkshop.com/treejack.htm if interested).
Sam