Seth Maislin's Indexing Blog

02 August 2006

We're lost without an information education

A few years ago, one chapter of the American Society of Indexers created a bumper sticker: "If you don't teach your children about indexing, who will?" Now that my daughter is old enough to repeat words if I ask her to -- "Say cucumber." "Kuku." -- I tried one of my job words on her.

"Say index," I said.
"Icks," she replied.

Given how my colleague Rachel taught her three-year-old to recite how bad a book is if it doesn't have an index, it seems I have some work to do. My daughter shouldn't respond to indexing with icks.

In the United States, children learn about indexes when they are old enough to visit the school library and get instruction on how to use its resources. And while many of the printed card catalogs of my youth have been replaced with computer systems, students are still taught how to use the indexes in the backs of some books. After that, their indexing education is complete. They probably never talk about indexing with the librarian again.

Though brief, even this index education is extremely important. Instinctively, children unfamiliar with indexes will look up information just as adults use a dictionary to look up spellings. For example, if you think deceive is spelled decieve, you'll go to the dictionary to look up decieve. Not finding it, you'll look for a neighboring word that looks somewhat similar, and discover the correct "deceive." In other words, you'll enter the dictionary looking for one word, but be satisfied with another. This is how children use indexes, too. They'll look up "Civil War," not find it, and be satisfied with "civil engineering." Then, of course, they'll fail.

(It is worth noting that adults demonstrate this behavior with indexes, too. I might attempt to look up "potatoes" in a cookbook, yet be satisfied with a result of "potatoes and yams.")
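For the programmers in the audience, the "settle for a neighbor" habit can be sketched in a few lines of Python. This is only an illustration, not how any real index or dictionary works: the function name `look_up` and the four-entry sample index are my own assumptions, and the entries are assumed to be already sorted alphabetically without regard to case.

```python
import bisect

def look_up(sorted_entries, query):
    """Return (exact_match, (entry_before, entry_after)).

    Assumes `sorted_entries` is sorted alphabetically, ignoring case.
    When there is no exact match, the neighbors are the entries a reader's
    eye would land on at the spot where the query *would* have been.
    """
    keys = [entry.lower() for entry in sorted_entries]
    target = query.lower()
    pos = bisect.bisect_left(keys, target)

    # Exact hit: the reader found what they came for.
    if pos < len(keys) and keys[pos] == target:
        return sorted_entries[pos], (None, None)

    # Miss: report the alphabetical neighbors on either side.
    before = sorted_entries[pos - 1] if pos > 0 else None
    after = sorted_entries[pos] if pos < len(keys) else None
    return None, (before, after)

# Hypothetical sample index, for illustration only.
sample_index = ["Chicago", "civil engineering", "Congress", "Constitution"]
match, neighbors = look_up(sample_index, "Civil War")
```

Here `match` is `None` and the nearest entry before the gap is "civil engineering", which is exactly the entry an untrained reader settles for. In a dictionary the neighbor is a correction; in an index it is a different subject.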

Meanwhile, adults don't instinctively understand the metaphor of things inside things inside things. The well-known matryoshka dolls, in which a large bowling-pin-shaped doll holds a smaller doll that holds another doll, and so on, are endlessly fascinating for children. As adults, we're fascinated by the plots of suspense novels. Each step along our way -- an uncovered doll, a turned page -- is built upon the past in a linear way. We follow events, from first to last, in linear sequence, and we succeed.

Hierarchical organization, in contrast, has no obvious place in human existence. To survive, it's enough to separate things into only two groups at a time: dangerous vs. safe, edible vs. inedible, alive vs. dead, something we like vs. something we don't like, family vs. nonfamily. As intelligent creatures we might create a few more categories at a time -- family, co-workers, non-work friends, acquaintances, strangers -- but rarely do we construct them into layers like "people I know > people I like > people I like to work with." Layering is completely unnecessary in our daily lives. Perhaps it is for this reason that human beings cannot instinctively organize things in a hierarchical way -- in the same way we can't tell the (very big) spatial difference between one million miles and one billion miles. To do these things, we need training.

You know, we don't do math naturally, either. Our instincts tell us the difference between one item, two items, a few items, many items, and very many items, but that's it. We also understand more and fewer. But we don't have an instinct that tells us how to add or multiply, let alone solve calculus problems. (If you don't believe me, then I dare you to cut a pizza or a cake into five equal slices without making a mistake.)

Today, we have math classes. Before math was taught as its own course, certain elements of math were taught within specific subjects. Shipbuilders and shoemakers learned enough math to do their jobs, and that was it. The idea of teaching math independent of application must have seemed very strange. What good is shipbuilders' math to shoemakers? But eventually, the math-proficient individuals in each field spoke to one another and discovered exactly what they had in common: a need to add numbers together, a need to calculate weight, and a need for geometry. Now math is an integral part of standardized testing, which means students aren't allowed to graduate from school without proving themselves in basic math skills, separate from their application.

So why aren't we teaching information the way we teach math? Information classification exists in every field of human exploration, from literature (divisions of author style or message) to sales (styles of negotiation), and from biology (life classifications) to auto mechanics (systems of function). If a student is going to learn anything about anything, he should learn a little something about how information itself fits together.

The impact a basic, application-independent information education can have is astounding. As an example, consider driving directions. In general, we give directions to people in a linear order, something that makes sense given how we travel. Here is how you can get to the post office near my home: "(1) Take route 95 until exit 26. (2) Take route 2 East until exit 59. (3) Take route 60 into Arlington Centre. (4) Turn left onto Massachusetts Avenue. (5) After three blocks, turn right onto Court Street. (6) The post office is on your left at the end of the street." As I said before, you don't need information hierarchy to survive; following these linear directions is quite easy. But suppose you make a wrong turn, or miss your exit? To find your way back to the path I provided, you need to know something about the geographic layers that make up these regions: "greater Boston > north Boston suburbs > town of Arlington > Arlington Centre area > Court Street." You need a hierarchical knowledge of the area! Put another way, what many of us refer to as "a great sense of direction" is actually "a deep understanding of relevant geographical hierarchies." That's why someone who knows their way around New York City will get lost in the woods: they learned how NYC streets fit together (NYC > Manhattan > Upper East Side > etc.) but learned nothing about forests. Get my point? Sense of direction is taught and learned.
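To make the contrast concrete, here is a rough Python sketch of the geographic layers above as a set of "is inside" links. It is an assumption-laden toy: the names `PARENT`, `path_from_root`, and `shared_region` are mine, and "a neighboring town" is a hypothetical placeholder for wherever a wrong turn happens to leave you.

```python
# Each region points to the region that contains it. The place names come
# from the directions above, except "a neighboring town", which is a
# hypothetical stand-in for a wrong turn.
PARENT = {
    "north Boston suburbs": "greater Boston",
    "town of Arlington": "north Boston suburbs",
    "Arlington Centre area": "town of Arlington",
    "Court Street": "Arlington Centre area",
    "a neighboring town": "north Boston suburbs",
}

def path_from_root(place):
    """List the layers from the outermost region down to `place`."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))

def shared_region(here, destination):
    """Return the deepest region that contains both places, or None."""
    shared = None
    for a, b in zip(path_from_root(here), path_from_root(destination)):
        if a != b:
            break
        shared = a
    return shared

layers = " > ".join(path_from_root("Court Street"))
common = shared_region("a neighboring town", "Court Street")
```

The numbered directions are a flat list that is useless once you leave it. The hierarchy tells a lost driver which layer they still share with the destination (here, "north Boston suburbs") and therefore how far up they have to climb before heading back down.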

It's time for us to start teaching information construction in schools. We're lost without it.

Labels: hierarchical organization, human factors, training in indexing

# posted by taxonomist @ 3:44 PM 0 comments

18 May 2006

Bias in indexing

The greatest advantage that human indexing processes have over automated (computer-only) processes is the human component. Of course, as someone who has worked with humans before, you probably recognize there can be imperfections.

I was reading Struck by Lightning: The Curious World of Probabilities earlier this week, in which the author writes of biases in scientific studies. I realized that these same biases occur with indexes and indexers as well, and I wondered if I could list them all.

(The biggest bias in indexing isn't one of the index at all, but rather the limitations on what the authors write. For example, if a book on art history didn't include information about Vincent van Gogh, I would expect van Gogh to be missing from the index; this absence might be caused by an author bias. However, I am going to focus on biases that affect indexing decisions themselves.)

Inclusion bias. Indexers may demonstrate a bias by including more entries related to subjects that appear more interesting or important to that indexer. For example, I live in Boston, and so I might consider Boston-related topics to be less trivial (more important) than the average indexer; consequently, documentation that includes information about Boston is more likely to appear in my index. I imagine inclusion bias is a common phenomenon in documentation that includes information about contentious social issues -- immigration, tobacco legislation, energy policy -- because the drive to communicate one's ideas on these issues is stronger. I also believe that inclusion bias is not entirely subconscious, and that indexers may purposefully choose to declare their ideas with asymmetric inclusion. It should be noted, however, that biased inclusion would not necessarily provide insight into the indexer's opinion on the subject; creating an entry like "death penalty morality" does not clearly demonstrate whether the indexer actually disagrees with capital punishment.

Noninclusion bias. Similar to inclusion bias, indexers might feel that certain mentions in the text are not worth including in the index because of their personal interests or beliefs. Unlike inclusion bias, however, I suspect noninclusion bias does not appear with regard to contentious issues; conflict is going to be indexed as long as the indexer recognizes that the conflict has value. Instead, an indexer is likely to exclude things that "seem obvious"; rarely are these tidbits of information controversial. For example, an indexer who is very familiar with computers is likely to exclude "obvious computer things," subjectively speaking; you probably won't find "keyboard, definition of" in such a book.

Familiarity (unfamiliarity) bias. When an indexer is particularly interested in or knowledgeable about a subject, the indexer is likely to create more entry points for the same content than another indexer might. For example, an indexer who is familiar with "Rollerblading" might realize that Rollerblade is a brand name, and that the actual items are called inline skates. This indexer is more likely to include "inline skates" as an entry. Unfamiliarity bias would be the opposite, in that multiple entry points are not provided because the indexer doesn't think of them, or perhaps doesn't know they exist.

Positive value bias. An indexer who has reason to make certain content more accessible for readers to find (and read) is likely to create more entry points for that idea. At the extreme, the indexer will overload access by using multiple categorical and overlapping subtopics, where those subcategories are at a higher granularity than the information itself. For the generic topic of "immigration," for example, an indexer might include categorical entries like "Hispanic immigrants," "European immigrants," and "Asian immigrants," as well as overlapping topics like "Asian immigrants," "Chinese immigrants," and "Taiwanese immigrants," with all of them pointing to "immigration" in general.

There are three types of positive value bias. Personal positive value bias is demonstrated when the indexer himself believes that the information is of greater-than-average value. Environment-based positive value bias is demonstrated when the indexer is swayed by environmental forces, such as social pressures, political pressures, pre-existing media bias, and so on. Finally, other-based positive value bias is demonstrated when the indexer bows to pressures imposed by the author, client, manager, or sales market (i.e., the person paying the indexer for the job). Although it can be argued that this last type of bias is not the indexer's bias, strictly speaking the indexer can choose to fight any bias forced upon him. For example, an indexer instructed by a client to "index all the names in this book" might interpret this instruction as some kind of market bias, and thus refuse to follow it. In reality, however, most indexers do accept the pressures placed upon them by the work environment, and thus in my opinion take on the responsibility and ethical consequences of this choice.

Negative value bias. It's possible for an indexer to provide fewer entry points for content that he feels is not of great importance to readers -- the direct opposite of positive value bias -- but the reasons for limiting access to that content are probably not related to the indexer's perceived value of that content for readers. Instead, indexers are likely to limit access to content when there is a significant amount of similar content in the book, and as such including access to those ideas would either bulk up the index unnecessarily or waste a lot of the indexer's time. For example, if an indexer were faced with a 40-page table of computer terms, it's unlikely that each term would be heavily indexed, even if such indexing were possible and even helpful to readers.

For this reason, I believe that there are three kinds of negative value bias: time-based negative value bias, in which the indexer skimps on providing access in an effort to save time; financially motivated negative value bias, in which the indexer skimps on providing access in an effort to earn or save money; and logistical negative value bias, in which the indexer skimps on providing access in response to logistical issues like software limitations, file size requirements, page count requirements, controlled vocabulary limitations, and the like.

Topic combination (lumper's) bias. This bias is exhibited by indexers who are likely to combine otherwise dissimilar ideas because they find this "lumping together" of ideas to be aesthetically pleasing or especially useful. This kind of bias is visible in the ratio between locators (page numbers) and subentries, in that entries are more likely to have multiple locators than multiple subentries, on average. For example, an entry like "death penalty, 35, 65, 95" shows that the indexer believes the content on these three pages is similar enough that subentries are not required or useful. Topics that start with the same words might also be combined in a more general topic (such as combining "school lunches" and "school cafeterias" into a combined "school meals"). It is worth noting that some kinds of audiences or documentation subjects may tend toward topic combination bias; for this reason, it may be difficult to recognize lumper's bias.

Topic separation (splitter's) bias. This bias is exhibited by indexers who are likely to separate otherwise similar ideas because they find this "splitting apart" of ideas to be aesthetically pleasing or especially useful. As with lumper's bias, splitter's bias is represented by the ratio of locators to subentries throughout an index, in that splitters are likely to create more subentries than would other indexers, on average. It is worth noting that some kinds of audiences or documentation subjects may tend toward topic separation bias; for this reason, it may be difficult to recognize splitter's bias.
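If you wanted to measure the locator-to-subentry ratio mentioned in the last two paragraphs, a crude first pass might look like the Python sketch below. Everything here is an assumption for illustration: the dictionary layout, the function name `lumping_profile`, and the three subentry headings in the "split" version are hypothetical, and no threshold is implied for what counts as a lumper or a splitter.

```python
def lumping_profile(index):
    """Return (average undifferentiated locators, average subentries) per main entry.

    `index` maps each main heading to a dict with two keys:
      "locators":   page numbers attached directly to the main heading
      "subentries": a dict mapping each subentry heading to its page numbers
    """
    if not index:
        return 0.0, 0.0
    total_locators = sum(len(entry["locators"]) for entry in index.values())
    total_subentries = sum(len(entry["subentries"]) for entry in index.values())
    return total_locators / len(index), total_subentries / len(index)

# The lumper's version of the entry quoted above: "death penalty, 35, 65, 95".
lumped = {
    "death penalty": {"locators": [35, 65, 95], "subentries": {}},
}

# A splitter's version of the same three pages (subentry wording is invented).
split = {
    "death penalty": {
        "locators": [],
        "subentries": {
            "deterrence studies": [35],
            "legal appeals": [65],
            "public opinion": [95],
        },
    },
}

lumped_profile = lumping_profile(lumped)
split_profile = lumping_profile(split)
```

The lumped entry averages three bare locators and no subentries; the split entry is the reverse. Comparing two numbers like these across whole indexes would only hint at a tendency, since, as noted above, the audience or the subject matter may itself push an index in one direction.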

These are all the biases I've found or experienced. If you think there's another kind of bias that indexers exhibit, let me know.

The remaining question is this: Is it wrong for an indexer to have bias? That is, should indexers study their own tendencies and work to avoid them? I don't think it's that simple. The artistry that an indexer can demonstrate is fueled by these biases -- experiences, opinions, backgrounds, interpretations -- and perhaps should even be encouraged. An indexer's strengths come from his understanding of not just the material, but also his perceptions of the audience, the publication environment, and the audience's environments. Further, indexers who know and love certain subjects are going to be drawn to them, just as many readers are; these biases aren't handicaps so much as commonalities shared between indexers and readers. Biases will hurt indexers working on unfamiliar materials in unfamiliar media, but under those conditions the biases are the least of our worries; when the indexer is working without proper knowledge, the higher possibility of bad judgment or error is a much greater concern.

If anything, indexers should be aware of their biases because they can serve as strengths -- especially in comparison to what computers attempt to do.

Labels: hierarchical organization, human factors, indexing process, misspellings and other errors

# posted by taxonomist @ 9:55 PM 0 comments
