Seth Maislin's Indexing Blog: The detailed analysis of indexing mistakes

04 July 2006

The detailed analysis of indexing mistakes

In linguistics, the analysis of error is one means of learning how we cognitively process language. For example, when someone accidentally misspeaks "unplugged the phone" as "unphugged the plone," we discover both that the speaker is a visual learner (because he switched the P blends in the phrase, despite their different sounds) and that the speaker processes language in its component sounds. By contrast, a speaker who says "unphoned my plug" processes language in morphemes (e.g., root words), and a speaker who says "unplugged my feet" is an aural learner (because phone and feet start with the same f sound). There seems to be an infinite variety of possible spoken-language errors, including absences, duplications, inclusions, misalignments, substitutions, and transpositions of letters, sounds, morphemes, words, and phrases.

When I evaluate an index, my job is to look for mistakes. As a now-experienced indexer who himself has made mistakes, I know that I can learn much about how an indexer thinks (or doesn't think) by analyzing her errors and accidents. And as with speech, there are innumerable kinds of mistakes available for the unwary indexer: absences, duplications, inclusions, misalignments, misrepresentations, and missortings of page numbers, letters, words, structures, and ideas.

Consider the incorrect page number, such as when content on page 42 is indexed as if it were on page 44. This kind of error tells us that the indexer did not attend properly to detail, perhaps because the working environment (deadlines, tools, etc.) was less than ideal. When a page range appears simplified to a single number, such as when 42-45 appears simply as 42, I am more likely to consider the indexer lazy rather than scatterbrained, though again it is also possible to blame the working environment (including client demands).

Entries that appear in an index but have no value to readers (e.g., the inclusion of passing mentions and other trivia) demonstrate the indexer's ignorance of the audience, or of the indexing process itself. Entries that fail to appear in an index but should (e.g., the under-indexing of a concept) demonstrate the indexer's ignorance of the audience, the indexer's ignorance of the subject content, or a sloppy or otherwise rushed working process.

Awkward categorizations, such as entries that are mistakenly combined or that don't relate well to their subentries, are a clear sign that the indexer misunderstands the content or is too new to indexing to understand how structure is supposed to work. For example, an indexer who creates

American
....Idol (television program), 56
....Red Cross (organization), 341

doesn't think of indexing as a practice of making ideas accessible, but rather as a concordance of words without meaning. Under no circumstances should American Idol or American Red Cross have been broken into halves, let alone combined. Since categorization can be subtle, however, evaluators can learn something interesting about indexers by looking closely at their choices:

writing
....as artistic skill, 84
....fiction vs. nonfiction, 62

In this example, the first subentry defines writing as a trade; it's clear the indexer is comfortable with the idea of a writer. The second subentry defines writing as a process, with a start and finish, such that the process (or journey) of writing could be different when you're writing fiction instead of nonfiction. Analysis of this entry tells us that the indexer doesn't recognize or appreciate the difference between writing (trade) and writing (process). Is the indexer revealing her inner disdain for writers, does she believe that all writers are the same no matter what they produce, or does she simply know nothing about the writing life?

One of the big challenges for indexers is to provide the language that readers will need to find the content they're looking for. When an indexer either offers language that no one will look up or omits the terms that readers prefer, she is demonstrating an ignorance of the audience or of the content, or hinting that the overall indexing process or environment is inadequate. Further, when the indexer fails to provide access from an already existing category entry (for example, if the index has an entry for "writing, fiction vs. nonfiction" but fails to provide the cross reference "See also author" when there are author entries), she tells us clearly that she is unfamiliar with the material. No other combination of errors speaks of subject ignorance as clearly; by failing to connect existing concepts, the indexer shows us gaps in her knowledge of the information map.

There are several kinds of text errors. Misspellings and other typographical errors are a sign of carelessness or insufficient tools. Accidental missortings are a sign of ignorance, poor tools, accelerated schedules, or a failure of communication among publication staff. Ambiguous terms that aren't clarified are caused by indexers who are too limited in their thinking or their assumptions about the audience, indexers who don't know the material, or authors who failed to communicate the ideas clearly enough for the indexer to understand. Finally, odd grammatical choices usually signal either a poor production process, such as when two indexes are combined automatically with insufficient editing effort, or a brand new indexer with no formal training.

Before concluding, I would be remiss to ignore errors of formatting. A failure to use consistent styles signals a deficit in tools or attention, whereas awkward or unreadable decisions regarding indentations, margins, and column widths are a big sign that the index designer (who is not necessarily the indexer) has no clear idea whatsoever how indexes work. Missing continued lines communicate the same thing. (On the other hand, exceptional use of formatting, such as the isolated use of italics within a textual label, is a clear sign that the indexer really does understand both the audience and how they approach the index.)

Ignorance, sloppiness, indifference, and confusion: these are shortcomings even a professionally trained, experienced indexer might have, but thankfully they often manifest as isolated exceptions in her practice of creating quality work. But when a single kind of mistake appears multiple times throughout an index -- numerous misspellings, huge inconsistencies of language, globally insufficient access, awkward structures -- we need to be concerned. When we see these, we have an obligation to analyze the indexer. By properly arming ourselves with this knowledge, we can determine for ourselves if the indexer was the wrong choice for a particular project, struggled with the challenges of inferior tools, or simply had a bad day.

Meanwhile, if indexes written by different indexers are plagued by the exact same problem, it's unmistakably clear that the problem lies in a systemically faulty publication process: ridiculous deadlines, uncooperative authors, uncaring editors, poor style guides, and so on. In other words, you shouldn't evaluate indexes in isolation. Instead, look at the work of other indexers for the same publisher, as well as the same indexer's work for other publishers.
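For anyone who keeps evaluation notes in a spreadsheet, the comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `attribute_problems`, the `(publisher, indexer, error_kind)` tuple format, and the `min_repeats` threshold separating an isolated slip from a recurring pattern are all hypothetical, not part of any indexing tool or standard.

```python
from collections import defaultdict

def attribute_problems(findings, min_repeats=3):
    """Suggest where to look for the cause of recurring index errors.

    findings: iterable of (publisher, indexer, error_kind) tuples,
              one tuple per mistake an evaluator spotted.
    min_repeats: hypothetical threshold; fewer occurrences than this
                 are treated as isolated slips and ignored.
    """
    # Count how often each indexer made each kind of error for each publisher.
    counts = defaultdict(int)
    for publisher, indexer, kind in findings:
        counts[(publisher, indexer, kind)] += 1

    # For each publisher and error kind, collect the indexers who repeat it.
    recurring = defaultdict(set)
    for (publisher, indexer, kind), n in counts.items():
        if n >= min_repeats:
            recurring[(publisher, kind)].add(indexer)

    # Same recurring problem from several indexers points at the process;
    # a recurring problem from one indexer points at that indexer.
    verdicts = {}
    for (publisher, kind), indexers in recurring.items():
        if len(indexers) > 1:
            verdicts[(publisher, kind)] = "look at the publication process"
        else:
            verdicts[(publisher, kind)] = "look at the indexer: " + sorted(indexers)[0]
    return verdicts
```

Fed a list of findings from several indexes, the function returns a dictionary keyed by publisher and error kind. It only says where to start asking questions; it cannot tell a bad day from a bad deadline.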

Okay, but what if the index is essentially perfect, with no errors at all? Can we still learn something? Yes, we can. The absence of all error tells us something very important about the indexer: She's being underpaid.

Labels: human factors, indexing process, misspellings and other errors

# posted by taxonomist @ 11:04 AM
