18 May 2006


Bias in indexing

The greatest advantage that human indexing processes have over automated (computer-only) processes is the human component. Of course, as someone who has worked with humans before, you probably recognize there can be imperfections.

I was reading Struck by Lightning: The Curious World of Probabilities earlier this week, in which the author writes of biases in scientific studies. I realized that these same biases occur with indexes and indexers as well, and I wondered if I could list them all.

(The biggest bias in indexing isn't one of the index at all, but rather the limitations on what the authors write. For example, if a book on art history didn't include information about Vincent van Gogh, I would expect van Gogh to be missing from the index; this absence might be caused by an author bias. However, I am going to focus on biases that affect indexing decisions themselves.)

Inclusion bias. Indexers may demonstrate a bias by including more entries related to subjects that appear more interesting or important to that indexer. For example, I live in Boston, and so I might consider Boston-related topics to be less trivial (more important) than the average indexer would; consequently, documentation that includes information about Boston is more likely to appear in my index. I imagine inclusion bias is a common phenomenon in documentation that includes information about contentious social issues -- immigration, tobacco legislation, energy policy -- because the drive to communicate one's ideas on these issues is stronger. I also believe that inclusion bias is not entirely subconscious, and that indexers may purposefully choose to declare their ideas with asymmetric inclusion. It should be noted, however, that biased inclusion would not necessarily provide insight into the indexer's opinion on the subject; creating an entry like "death penalty morality" does not clearly demonstrate whether the indexer actually disagrees with capital punishment.

Noninclusion bias. As with inclusion bias, indexers might feel that certain mentions in the text are not worth including in the index because of their personal interests or beliefs. Unlike inclusion bias, however, I suspect noninclusion bias does not appear with regard to contentious issues; conflict is going to be indexed as long as the indexer recognizes that the conflict has value. Instead, an indexer is likely to exclude things that "seem obvious"; rarely are these tidbits of information controversial. For example, an indexer who is very familiar with computers is likely to exclude "obvious computer things," subjectively speaking; you probably won't find "keyboard, definition of" in such a book.

Familiarity (unfamiliarity) bias. When an indexer is particularly interested in or knowledgeable about a subject, the indexer is likely to create more entry points for the same content than another indexer might. For example, an indexer who is familiar with "Rollerblading" might realize that Rollerblade is a brand name, and that the actual items are called inline skates. This indexer is more likely to include "inline skates" as an entry. Unfamiliarity bias would be the opposite, in that multiple entry points are not provided because the indexer doesn’t think of them, or perhaps doesn’t know they exist.

Positive value bias. An indexer who has reason to make certain content more accessible for readers to find (and read) is likely to create more entry points for that idea. At the extreme, the indexer will overload access by using multiple categorical and overlapping subtopics, where those subcategories are at a higher granularity than the information itself. For the generic topic of "immigration," for example, an indexer might include categorical entries like "Hispanic immigrants," "European immigrants," and "Asian immigrants," as well as overlapping topics like "Asian immigrants," "Chinese immigrants," and "Taiwanese immigrants," with all of them pointing to "immigration" in general.

There are three types of positive value bias. Personal positive value bias is demonstrated when the indexer himself believes that the information is of greater-than-average value. Environment-based positive value bias is demonstrated when the indexer is swayed by environmental forces, such as social pressures, political pressures, pre-existing media bias, and so on. Finally, other-based positive value bias is demonstrated when the indexer bows to pressures imposed by the author, client, manager, or sales market (i.e., the person paying the indexer for the job). Although it can be argued that this last type of bias is not, strictly speaking, the indexer's bias, the indexer can choose to fight any bias forced upon him. For example, an indexer whose client instructs him to "index all the names in this book" might interpret this instruction as some kind of market bias, and thus refuse to follow this guideline. In reality, however, most indexers do accept the pressures placed upon them by the work environment, and thus in my opinion take on the responsibility and ethical consequences of this choice.

Negative value bias. It's possible for an indexer to provide fewer entry points for content that he feels is not of great importance to readers -- the direct opposite of positive value bias -- but the reasons for limiting access to that content are probably not related to the indexer's perceived value of that content for readers. Instead, indexers are likely to limit access to content when there is a significant amount of similar content in the book, such that providing access to those ideas would either bulk up the index unnecessarily or waste a lot of the indexer's time. For example, if an indexer were faced with a 40-page table of computer terms, it's unlikely that each term would be heavily indexed, even if such indexing were possible and even helpful to readers.

For this reason, I believe that there are three kinds of negative value bias: time-based negative value bias, in which the indexer skimps on providing access in an effort to save time; financially motivated negative value bias, in which the indexer skimps on providing access in an effort to earn or save money; and logistical negative value bias, in which the indexer skimps on providing access in response to logistical issues like software limitations, file size requirements, page count requirements, controlled vocabulary limitations, and the like.

Topic combination (lumper's) bias. This bias is exhibited by indexers who are likely to combine otherwise dissimilar ideas because they find this "lumping together" of ideas to be aesthetically pleasing or especially useful. This kind of bias is visible in the ratio between locators (page numbers) and subentries, in that entries are more likely to have multiple locators than multiple subentries, on average. For example, an entry like "death penalty, 35, 65, 95" shows that the indexer believes the content on these three pages is similar enough that subentries are not required or useful. Topics that start with the same words might also be merged into a more general topic (such as combining "school lunches" and "school cafeterias" into a single "school meals"). It is worth noting that some kinds of audiences or documentation subjects may tend toward topic combination bias; for this reason, it may be difficult to recognize lumper's bias.

Topic separation (splitter's) bias. This bias is exhibited by indexers who are likely to separate otherwise similar ideas because they find this "splitting apart" of ideas to be aesthetically pleasing or especially useful. As with lumper's bias, splitter's bias is represented by the ratio of locators to subentries throughout an index, in that splitters are likely to create more subentries than would other indexers, on average. It is worth noting that some kinds of audiences or documentation subjects may tend toward topic separation bias; for this reason, it may be difficult to recognize splitter's bias.
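
For the curious, here is a rough sketch in Python of how the locator-to-subentry ratio described for lumpers and splitters might be tallied. Everything in it is my own assumption for illustration: the Entry structure, the function name, the page numbers other than 35, 65, and 95, and the subentry wording are all hypothetical, and none of it comes from a real indexing tool. Because some audiences and subjects naturally lean one way or the other, the number is only meaningful when comparing indexes of similar material; there is no threshold that says "lumper" or "splitter."

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One main heading in an index (hypothetical structure)."""
    heading: str
    # Page numbers attached directly to the main heading.
    locators: List[int] = field(default_factory=list)
    # Subentry wording mapped to that subentry's page numbers.
    subentries: Dict[str, List[int]] = field(default_factory=dict)


def locators_per_subentry(entries: List[Entry]) -> float:
    """Main-heading locators divided by subentries, across the whole index.

    Higher values lean toward lumping; lower values lean toward splitting.
    Returns infinity when the index has no subentries at all.
    """
    total_locators = sum(len(e.locators) for e in entries)
    total_subentries = sum(len(e.subentries) for e in entries)
    if total_subentries == 0:
        return float("inf")
    return total_locators / total_subentries


# The same content indexed two ways (subentry wording is invented).
lumped_index = [
    Entry("death penalty", [35, 65, 95]),
    Entry("school meals", [12, 40], {"funding": [41]}),
]
split_index = [
    Entry("death penalty", [], {
        "deterrence": [35],
        "history": [65],
        "morality": [95],
    }),
    Entry("school cafeterias", [12]),
    Entry("school lunches", [40], {"funding": [41]}),
]

print(locators_per_subentry(lumped_index))
print(locators_per_subentry(split_index))
```

The lumped version of the index yields the higher ratio, which matches the idea that lumpers stack locators on a heading where splitters would break them out into subentries.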

These are all the biases I've found or experienced. If you think there's another kind of bias that indexers exhibit, let me know.

The remaining question is this: Is it wrong for an indexer to have bias? That is, should indexers study their own tendencies and work to avoid them? I don't think it's that simple. The artistry that an indexer can demonstrate is fueled by these biases -- experiences, opinions, backgrounds, interpretations -- and perhaps should even be encouraged. An indexer's strengths come not just from his understanding of the material, but also from his perceptions of the audience, the publication environment, and the audience's environments. Further, indexers who know and love certain subjects are going to be drawn to them, just as many readers are; these biases aren't handicaps so much as commonalities shared between indexers and readers. Biases will hurt indexers working on unfamiliar materials in unfamiliar media, but under those conditions the biases are the least of our worries; when the indexer is working without proper knowledge, the higher possibility of bad judgment or error is a much greater concern.

If anything, indexers should be aware of their biases because they can serve as strengths -- especially in comparison to what computers attempt to do.