24 March 2006
The granularity of an online "page number"
When writing a hyperlinked index (where hyperlinks are used instead of page numbers), to what should those links point?
Some people think they should point to the section title in which the information is provided; other people like to point right to the specific word used in the index. The "answer," obviously, is that hyperlinked index entries should take the readers to where the information is, right? The problem -- the reason there's this question of "where do I point my entries" in the first place -- is that readers of hypertext might find themselves bounced somewhere they don't understand. How many times have you followed a link, only to find yourself fiddling around with the scroll bar to figure out where you ended up? Following a hyperlink is like being blindfolded and transported to an unknown destination.
What you may not know is that book indexes aren't much different. :-)
Think about how book indexes actually work, and you realize that they direct readers to the page on which the information starts. An entry like "buoyancy, 164" tells the reader to look somewhere on page 164; an entry like "global harmony, 164-167" tells the reader to start looking somewhere on page 164. The granularity of an index is defined as the smallest unit of area that can be pointed to. For printed indexes, this area is the page. Rarely will you find locators that use fractional or qualified page numbers like 164-1/2 or 164top. (There are such things as qualified locators, like 164f, which might point to the footnote on page 164, but even in the books in which they're used they make up only a small share of all locators.)
If you follow the standards of the industry, then, the granularity of a printed index is one physical page. For this reason, books that have lots of words on a page -- big pages, narrow margins, tiny print -- are less friendly to book indexers. It's like telling someone that there's a needle in that 164th haystack over there. Maybe we should count our blessings that someone bothered to number the haystacks, but ideally this is where the book designer starts earning her salary. Book pages don't have to look like haystacks -- more accurately, wordstacks -- if the book has legible headings and subheadings. Books can be written with quickly visible landmarks within the pages, like italics and boldface, larger and smaller font sizes, headings and callouts, footnotes, and so on. Going back to the blindfolded analogy, there's no reason we have to drop our readers into deserts of information, when we can drop them in a place surrounded by location clues and navigational signs, like at a train station.
On the Web, however, there is no such thing as a printed page. Web pages can be any length, from tiny pop-up windows with only a sentence fragment of information within, to long scrolls of endless paragraphs and images. Additionally, you don't have to direct the reader to just the page any more, but rather you can deposit him anywhere within the page. The granularity of a Web page is a word! You can send someone into the middle of a paragraph.
When you have tiny little windows of information, using that window as a destination is a no-brainer: the reader arrives at a single sentence of information, which is what he needs. It doesn't matter if you point him to the beginning, middle, or end of that sentence, because it's all he gets to read. Pointing someone to an isolated window of information -- what Web authors call "chunks" -- is as easy as looking into a food pantry that contains only a single can. But when you have longer pages, and you have the ability to point someone to any spot within those longer pages, you have a decision to make. And it's a decision that didn't exist in the printed world, with its larger granularity.
The solution is to connect the text of the index entry with the text of the documentation. Not the meaning, but the actual words. If the index entries are written to almost identically match those of the documentation, then the reader won't mind as much because it won't look like a desert. They'll have exactly the landmark they need right in front of them. The entry "cancer, prevention of," for example, could point directly to this line without a problem:
... cessation of smoking. In fact, many physicians are well aware that one way to prevent cancer is to quit ...
That's because the words of your index entry, which are cancer and prevention, appear almost verbatim in that line of text. And if this information were part of a section titled "Using Peer Pressure to Help Patients Quit Smoking," then you really wouldn't want to point to the heading for context. That's because it's unclear to the reader that you're actually directing him to information about cancer or prevention. You're making them work at it.
And then there's the other situation. Using the same sentence and heading as above, where should the indexer point readers who look up the entry "smoking, how to quit"? Clearly they should go right to the heading. If they went to the line that talked about physicians, they wouldn't know where they are.
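For the programmers in the audience, here is a rough sketch of that decision, not a finished tool. It assumes you already have the candidate destinations (the heading and the sentence) as plain strings; the names key_words and best_target, the short stopword list, and the crude five-letter truncation that lets "prevention" match "prevent" are all made up for this illustration. Note that for "smoking, how to quit" both candidates share the same two key words, so the sketch breaks the tie in favor of whichever candidate is listed first, and the heading is listed first on purpose.

```python
import re

# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"a", "an", "and", "are", "how", "in", "is", "of", "one", "that", "the", "to"}


def key_words(text):
    """Lowercase words minus stopwords, truncated to five letters
    so that 'prevention' and 'prevent' compare as equal."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:5] for w in words if w not in STOPWORDS}


def best_target(entry, candidates):
    """Return the label of the candidate sharing the most key words with the entry.

    candidates is a list of (label, text) pairs; on a tie the earliest
    candidate wins, so list the heading first to prefer it.
    """
    entry_words = key_words(entry)
    best_label, best_score = None, -1
    for label, text in candidates:
        score = len(entry_words & key_words(text))
        if score > best_score:  # strict '>' keeps the earlier candidate on ties
            best_label, best_score = label, score
    return best_label


candidates = [
    ("heading", "Using Peer Pressure to Help Patients Quit Smoking"),
    ("sentence", "... cessation of smoking. In fact, many physicians are well aware "
                 "that one way to prevent cancer is to quit ..."),
]

print(best_target("cancer, prevention of", candidates))
print(best_target("smoking, how to quit", candidates))
```

The first lookup lands on the sentence, because only the sentence contains the entry's words. The second lands on the heading, but only through the tie-break, which is a reminder that word overlap alone doesn't settle every case and the indexer's judgment still matters.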
Our original question here was this: When writing a hyperlinked index (where hyperlinks are used instead of page numbers), to what should those links point? Clearly the only way to answer this question comprehensively is to suggest that the language of hyperlinked indexes has two contexts: the index entry itself and the destination location. These two contexts need to work together. And as we saw, the same is true with the printed book: having arrived at page 164, how quickly can you find the idea you were looking for?
Looking this closely at hyperlinked indexes only emphasizes something we need for all indexing: use index entries that match the documentation text. If you have to write a slightly longer entry, that's okay. Instead of "cigarettes," use "cigarette smoking, quitting." Instead of "social networks," use "social networks and peer pressure." The people who work with search engines and Internet marketing are familiar with the term trigger words, which refers to visible language that matches the mental language of the searcher. If you're thinking of the words "white elephant," then a result of "pale pachyderm" doesn't work because it doesn't trigger your sense of recognition.
So the next time there's a white elephant in haystack 164, be sure to tell someone as explicitly as possible.
Labels: books, indexing process, keywording, pages and page ranges, web indexing