10 July 2006



Many indexers in the business are frustrated that search engines are getting so much publicity and credit, when in fact indexes are, well, more effective!

Usability testing has shown that people prefer indexes over search when it comes to accuracy and comprehensiveness, and yet the same tests show that people prefer search as a technology. In other words, people prefer search engines even though they unanimously agree that indexes are more accurate. This is not that different from the person who insists on lifting the heavy box himself, even though someone stronger and better able has volunteered. "No, no, I'll do it myself," says the searcher.

And then he injures himself. Silly, silly person.

What is at stake, apparently, is something more psychological or emotional. Search engines may offer users a sense of power and control, or a sense of speed, that indexes don't. Further, indexes seem so much more complicated when you glance at them -- words, words everywhere -- and in comparison search is so simple: an empty box. Just type a word and bingo! If you were to stop and look at this behavior, you'd realize that there's something subconscious going on; rationality is losing to some deeper sense of emotionality and self. Search simply feels right in a way that using an index does not, at least not instinctively.

Some indexers take this news with a strong sense of pessimism, seeing this "shift toward the emotional" as paralleling our current lifestyle of sensationalist news and entertainment. They believe that indexes will become extinct in most practical circumstances, because search engines are psychologically preferred -- not to mention faster, cheaper, online, and scalable.

These pessimists aren't wrong.

However, I contend that the pessimists are also looking at the situation completely upside-down. Ask yourself what makes a search engine effective or likeable at all -- that is, what Google has that draws a majority of Web users to the Google.com website and also leads many of them to license Google technology for their own sites -- and you'll realize that there's indexing on the back end. People don't necessarily call it "indexing," but the intellectual, rational processes that comprise indexing are still taking place.

The difference, however, is that a search company like Google doesn't really look at the individual words and their instances. Instead, the designers of Google search (and other tools) are looking at how people respond to these words. They are looking at behavioral patterns, and using those patterns to do the indexing for them.

My brother and I used to play a game at ballparks. One of us, when it was his turn, would attempt to turn as many heads as possible without speaking. My brother would turn his head and look over his shoulder casually, then allow his eyes to lock on something imaginary but far behind all the people sitting behind us. He'd tap me on the shoulder and get me to look; I'd play along. He'd point. I'd point, and he'd correct me. Then he'd stand up. And so on. After a while, some of the people who could see us directly in front of them would get curious about what we were looking at, and they'd turn their heads to see. This would inspire other people to turn their heads, and so on. If we'd done our job well -- it was a game of timing as well as body language -- we could get hundreds of people to look behind them, at nothing.

This kind of behavior explains the popularity of some really stupid websites. Get enough people to visit your website, and Google will acknowledge that there's something about this website worth looking at. Then more people will look at it. Internet-based fads occur weekly, from paparazzi photos to cool advertisements.

If a human were indexing this, the indexer might think, "This isn't so important that it needs to be found a million times." That human is right. But the meta-human looks at what all the humans are already doing and thinks, "There is a cultural need for this content."

For a back-of-the-book indexer to break into the world of mass search, he'll have to give up the words and instead figure out the rules -- linguistic as well as social -- behind how these words are being used. Those rules, which govern how we find things (and not what we find), don't describe the indexing we know at all.

If indexing as we indexers know it is going to survive, we'll have to find that nifty middle ground between the words and the people. It should be easy, given that we already do this, but so far we haven't managed to break into this field at all. Hopefully we'll evolve.

In the next generation, we'll index the indexes.

(Brian Pinkerton developed the first full-text retrieval search engine back in 1994. "Picture this," he explained. "A customer walks into a huge travel outfitters store, with every type of item, for vacations anywhere in the world, looks at the guy who works there, and blurts out, 'Travel.' Now where's that sales clerk supposed to begin?")


04 July 2006


The detailed analysis of indexing mistakes

In linguistics, the analysis of error is one means of learning how we cognitively process language. For example, when someone accidentally misspeaks "unplugged the phone" as "unphugged the plone," we discover both that the speaker is a visual learner (because he switched the P blends in the phrase, despite their different sounds) and that he processes language in its component sounds. By contrast, a speaker who says "unphoned my plug" processes language in morphemes (e.g., root words), and a speaker who says "unplugged my feet" is an aural learner (because phone and feet start with the same f sound). There seems to be an infinity of possible spoken-language errors, including absences, duplications, inclusions, misalignments, substitutions, and transpositions of letters, sounds, morphemes, words, and phrases.

When I evaluate an index, my job is to look for mistakes. As a now-experienced indexer who himself has made mistakes, I know that I can learn much about how an indexer thinks (or doesn't think) by analyzing her errors and accidents. And as with speech, there are innumerable kinds of mistakes available for the unwary indexer: absences, duplications, inclusions, misalignments, misrepresentations, and missortings of page numbers, letters, words, structures, and ideas.

Consider the incorrect page number, such as when content on page 42 is indexed as if it were on page 44. This kind of error tells us that the indexer did not attend properly to detail, perhaps because the working environment (deadlines, tools, etc.) was less than ideal. When a page range appears simplified to a single number, such as when 42-45 appears simply as 42, I am more likely to consider the indexer lazy instead of scatterbrained, though again it is also possible to blame the working environment (including client demands).

Entries that appear in an index but have no value to readers (e.g., the inclusion of passing mentions and other trivia) demonstrate the indexer's ignorance of the audience, or of the indexing process itself. Entries that fail to appear in an index but should (e.g., the under-indexing of a concept) demonstrate either the indexer's ignorance of the audience, the indexer's ignorance of the subject content, or a sloppy or otherwise rushed working process.

Awkward categorizations, such as entries that are mistakenly combined or that don't relate well to their subentries, are a clear sign that the indexer misunderstands the content or is too new to indexing to understand how structure is supposed to work. For example, an indexer who creates

American
....Idol (television program), 56
....Red Cross (organization), 341

doesn't think of indexing as a practice of making ideas accessible, but rather as a concordance of words without meaning. Under no circumstances should American Idol or American Red Cross have been broken into halves, let alone combined. Since categorization can be subtle, however, evaluators can learn something interesting about indexers by looking closely at their choices:

writing
....as artistic skill, 84
....fiction vs. nonfiction, 62

In this example, the first subentry defines writing as a trade; it's clear the indexer is comfortable with the idea of a writer. The second subentry defines writing as a process, with a start and finish, such that the process (or journey) of writing could be different when you're writing fiction instead of nonfiction. Analysis of this entry tells us that the indexer doesn't recognize or appreciate the difference between writing (trade) and writing (process). Is the indexer revealing her inner disdain for writers, does she believe that all writers are the same no matter what they produce, or does she simply know nothing about the writing life?

One of the big challenges for indexers is to provide the language that readers will need to find the content they're looking for. When an indexer either offers language that no one will look up or omits the terms that readers prefer, she is demonstrating an ignorance of the audience or of the content, or hinting that the overall indexing process or environment is inadequate. Further, when the indexer fails to provide access from an already existing category entry (for example, if the index has an entry for "writing, fiction vs. nonfiction" but fails to provide the cross reference "See also author" when there are author entries), she tells us clearly that she is unfamiliar with the material. No other combination of errors speaks of subject ignorance as clearly; by failing to connect existing concepts, the indexer shows us gaps in her knowledge of the information map.

There are several kinds of text errors. Misspellings and other typographical errors are a sign of carelessness or insufficient tools. Accidental missortings are a sign of ignorance, poor tools, accelerated schedules, or a failure of communication among publication staff. Ambiguous terms that aren't clarified are caused by indexers who are too limited in their thinking or their assumptions about the audience, by indexers who don't know the material, and by authors who failed to communicate the ideas clearly enough for the indexer to understand. Finally, odd grammatical choices usually signal either a poor production process, such as when two indexes are combined automatically with insufficient editing effort, or a brand-new indexer with no formal training.

Before concluding, I would be remiss to ignore errors of formatting. A failure to use consistent styles signals a deficit in tools or attention, whereas awkward or unreadable decisions regarding indentations, margins, and column widths are a big sign that the index designer (who is not necessarily the indexer) has no clear idea whatsoever how indexes work. Missing continued lines communicate the same thing. (On the other hand, exceptional use of formatting, such as the isolated use of italics within a textual label, is a clear sign that the indexer really does understand both the audience and how they approach the index.)

Ignorance, sloppiness, indifference, and confusion: these are shortcomings even a professionally trained, experienced indexer might have, but thankfully they often manifest as isolated exceptions in her practice of creating quality work. But when a single kind of mistake appears multiple times throughout an index -- numerous misspellings, huge inconsistencies of language, globally insufficient access, awkward structures -- we need to be concerned. When we see these, we have an obligation to analyze the indexer. By properly arming ourselves with this knowledge, we can determine for ourselves if the indexer was the wrong choice for a particular project, struggled with the challenges of inferior tools, or simply had a bad day.

Meanwhile, if indexes written by different indexers are plagued by the same exact problem, it's unmistakably clear that the problem is in the systemically faulty publication process: ridiculous deadlines, uncooperative authors, uncaring editors, poor style guides, and so on. In other words, you shouldn't evaluate indexes in isolation. Instead, look at the work of other indexers for the same publisher, as well as the work of other publishers by the same indexer.

Okay, but what if the index is essentially perfect, with no errors at all? Can we still learn something? Yes, we can. The absence of all error tells us something very important about the indexer: She's being underpaid.

