17 March 2007

 

Notes on automatic indexing

"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement. I'm a firm believer that the technology doesn't exist, and that a human being is required to write an index. Thus I don't use the software, and I also don't recommend it.

There are those who advocate it, arguing that it's "not as bad as an indexer would have you think." These people are often coming from the standpoint that automatic software is faster and cheaper, and they're right. Thus the issue surrounds quality.

I believe that good automatic indexes will exist once there's good artificial intelligence, something that presently doesn't exist. In very limited circumstances, however, it does; a machine can easily cull capitalized words from a textbook to create an approximation of an index of names -- although, again, the machine isn't going to differentiate between names like "David Kelley" and places like "San Francisco," since they are both of the same format and used the same way. It also won't know that "Bill Clinton" is also "William Jefferson Clinton." And certainly it can't tell when the name is being mentioned in an unuseful and trivial way, as are the names in this paragraph! So imagine the problems trying to get a machine to parse full sentences of ideas and recognizing the core ideas, the important terms, and the relationships between related concepts throughout the entire text.

FYI, those who advocate automatic software, however, would argue that the machine gets "close enough" so that a human being can edit the resulting product. However, expert evaluators unanimously agree that the software fails; those who disagree are likely those who are sufficiently ignorant of indexing in the first place such that they are unable to determine the quality differences.

Oh, I should mention that there are software programs that human indexers use to simplify and speed up the mechanics of the index process. For example, it would be silly to disallow a computer to alphabetize the entries, reformat the index, and manipulate page numbers. There are a few software packages that do this exclusively, which are considered top of the line; other applications that have indexing capabilities, such as Microsoft Word and Adobe FrameMaker, have some of these capabilities, with notable limitations.

For information on the various software available, see http://www.asindexing.org/site/software.shtml. If you have feedback, especially differing opinions, I'd love to hear them. Write me at seth@maislin.com.

(This article was originally published in 2002 and 2004 -- and it's still 100% accurate.)

Labels: ,


Comments: Post a Comment



<< Home

This page is powered by Blogger. Isn't yours?