<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-22121434</id><updated>2011-07-07T18:32:42.792-05:00</updated><category term='indexing tools'/><category term='cross references'/><category term='misspellings and other errors'/><category term='indexing humor'/><category term='American Society of Indexing'/><category term='search engines'/><category term='findability'/><category term='books'/><category term='Microsoft Word indexing'/><category term='social algorithms'/><category term='hierarchical organization'/><category term='privacy'/><category term='spamming and similar behaviors'/><category term='business of indexing'/><category term='fun with indexing'/><category term='Google'/><category term='pages and page ranges'/><category term='human factors'/><category term='information architecture process'/><category term='keywording'/><category term='future of indexing'/><category term='power of information'/><category term='content management'/><category term='training in indexing'/><category term='indexing process'/><category term='embedded indexing'/><category term='cataloguing'/><category term='web indexing'/><title type='text'>Seth Maislin's Indexing Blog</title><subtitle type='html'>A public exploration of indexing and information storage, retrieval, naming, and categorization. A spontaneous collection of ideas about ideas. 
Yet another website by Seth Maislin (like http://taxonomist.tripod.com).</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>52</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-22121434.post-487793039172570554</id><published>2007-10-14T20:58:00.000-05:00</published><updated>2007-10-14T21:10:37.624-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='future of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='indexing tools'/><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>Human-computer hybrids, in indexing</title><content type='html'>I recently completed an (as-of-yet unreviewed) article for &lt;em&gt;Information - Wissenschaft &amp;amp; Praxis &lt;/em&gt;(IWP), the premier German journal on information science. 
The topic was the intersection of computer-based indexing and human indexing, and how these two approaches to indexing are unequal but in many ways compatible.&lt;br /&gt;&lt;br /&gt;The biggest challenge in writing the article comes from the simple fact that I'm a human indexer, and that I believe that automatic indexing fails every important quality test. On the other hand, since it's unlikely we're going to have people typing away to index the World Wide Web (see my entry &lt;a href="http://maislin.blogspot.com/2006/12/needle-in-haystack-with-100000000.html"&gt;"A needle in a haystack with 100,000,000 blades"&lt;/a&gt;), it seems we're going to need something faster than human fingers and brains to get the job done.&lt;br /&gt;&lt;br /&gt;I'm not going to repeat the article's ideas here, except to say that I tried to give an even-handed view of automatic indexing -- even as I tend to rip it to shreds in this blog when I can. The distinction is that people constantly overestimate when it's necessary. 
Automatic indexing is overused and misused.&lt;br /&gt;&lt;br /&gt;Still, I thought my loyal readers might like knowing that even on this, there are two valid opinions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-487793039172570554?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/487793039172570554/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=487793039172570554&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/487793039172570554'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/487793039172570554'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/10/human-computer-hybrids-in-indexing.html' title='Human-computer hybrids, in indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-7791023395235051827</id><published>2007-09-11T11:33:00.000-05:00</published><updated>2007-09-11T11:48:46.119-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><title type='text'>External deadline forces</title><content type='html'>&lt;p&gt;"I need this book in hand by Friday because..."&lt;/p&gt;&lt;p&gt;Outside of the natural production process there are, definitely, many different kinds of external circumstances that impact the timing and schedules of indexing. 
Many are sector- or medium-specific, but of course there are indexers who work among several of these and thus feel the impact all year. Here are some examples that I know: &lt;/p&gt;&lt;ul&gt;&lt;li&gt;Textbooks that are used in American public schools tend to appear in time for the Texas and California state adoption processes. If a book isn't published on time to be reviewed by the school officials in these states, it's unlikely that the book will be used in public schools at all.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;College textbooks need to be on the shelves in time for traditional semester beginnings, in September and January.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Books that are budgeted for one year are pushed to get finished during that budget (fiscal) year, to avoid (a) losing the opportunity to spend money already allocated for the publishing process, and (b) spending money needed in the next year. This impacts the indexers around U.S. Thanksgiving.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Software books targeted toward the general public need to be first to market to catch the wave of early sales; these schedules are irregular but can be predicted by looking at the various technologies that are coming out. For example, we're still near the beginning of the Windows Vista wave, since the new operating system was only recently released.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;General-readership books based on cultural events (news items, holidays, anniversaries) are similar to software books, in that being first to market matters equally.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Politics is a special kind of cultural event in that it's ongoing. Books on politics tend to appear in advance of events that are potentially influential in the political world. 
For example, books about presidential candidates tend to appear in parallel with their campaigns: early books to define the brand, later books to strengthen the message, and post-election books to analyze the results and consequences. Elections aside, the same goes for books related to policy making, international relations, and larger political issues (like national security and environmental conservation). Corporate politics can fall into this category as well, though these publications may double as marketing and promotion documents.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Professional conferences occur in clusters (lots in the summer, for example), and so publications that are relevant to conference events tend to get published (and re-published) in clusters.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;New printing and publishing technologies, which the layperson doesn't hear about, can drive new publications in a way similar to first-to-market publishing. For example, when CD envelopes were first made available in books, there was a market-driven desire to include CDs with more books. Most printing technologies are small variations on what exists today, but when a new possibility exists, it's a trend that some publishers chase right away. For example, if the quality of color rendering took a small leap forward, books where color is particularly critical (art, medical imaging, etc.) 
would appear more frequently for a while.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;If you have more to suggest, let me know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-7791023395235051827?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/7791023395235051827/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=7791023395235051827&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7791023395235051827'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7791023395235051827'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/09/external-deadline-forces.html' title='External deadline forces'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-591218288468026181</id><published>2007-07-24T23:48:00.001-05:00</published><updated>2007-07-25T00:40:38.428-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fun with indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><title type='text'>Books I really, really, really want to index</title><content type='html'>Perhaps because most of my indexing work is on books that are rather typical for nonfiction reference books -- technical titles like &lt;em&gt;Measurement, Analysis, and Control Using JMP&lt;/em&gt; and resource guides like &lt;em&gt;Dx/Rx: Colorectal 
Cancer &lt;/em&gt;-- I jump for joy when I get something so off the beaten path that I renew my love for this job. For example, I recently completed the index for &lt;em&gt;First Position, &lt;/em&gt;a collection of biographies of ballet dancers; more recently, I indexed &lt;em&gt;Sensual Knits &lt;/em&gt;and &lt;em&gt;Sensual Crochet,&lt;/em&gt; both beautifully photographed books of designs and patterns.&lt;br /&gt;&lt;br /&gt;But now, working in the wee hours of the night, I find myself fantasizing about the books that I really, really, really want to index, books that are just asking to be written so that I, Seth Maislin, can be assigned their indexes. So here's my wish list:&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;u&gt;Chihuahuas for Dummies&lt;/u&gt;&lt;/em&gt;&lt;br /&gt;Don't laugh. You probably have no idea just how far-reaching &lt;a href="http://www.dummies.com/"&gt;the Dummies series&lt;/a&gt; has become since its long-ago inception as a series for computer use. There's &lt;em&gt;Fantasy Football for Dummies, &lt;/em&gt;a book about imaginary sports playing; &lt;em&gt;Stretching for Dummies, &lt;/em&gt;a book about limbering up, perhaps in advance of reading &lt;em&gt;Sex for Dummies&lt;/em&gt;; &lt;em&gt;Guitar for Dummies, Bass Guitar for Dummies, &lt;/em&gt;and the upcoming &lt;em&gt;Rock Guitar for Dummies, &lt;/em&gt;which I have to believe compete with each other somehow; and &lt;em&gt;Jewish Cooking for Dummies, &lt;/em&gt;a book that, dare I say it, would make me feel guilty to own. Nevertheless, let me make myself clear here. 
&lt;a href="http://www.dummies.com/WileyCDA/DummiesTitle/productCd-0764552848,subcat-PETS.html"&gt;&lt;em&gt;Chihuahuas for Dummies&lt;/em&gt;&lt;/a&gt;&lt;em&gt; &lt;/em&gt;&lt;strong&gt;is a real book.&lt;/strong&gt; I want to index the next edition, you see, because I'm dying to see what changes.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;u&gt;10.9 Seconds: The Joey Chestnut Story&lt;/u&gt;&lt;/em&gt;&lt;br /&gt;(see &lt;a href="http://origin.mercurynews.com/valley/ci_6297731"&gt;http://origin.mercurynews.com/valley/ci_6297731&lt;/a&gt; to get the joke) Yes, this book is my own invention, but the fun part about indexing sports books is that they are so completely self-reverential. (Yes, &lt;em&gt;reverential,&lt;/em&gt; not &lt;em&gt;referential.&lt;/em&gt;) Written by sports geeks for sports geeks, the authors' language captures the awe-hubris-humor combination achieved by fans and record-breakers when it comes to the sport that is most of their life. It doesn't matter what the sport is, either, so I'm all for those esoteric things like Ultimate Frisbee (I was offered such a book once) and so on. I recently indexed the comprehensive &lt;em&gt;Chasing the Hunter's Dream, &lt;/em&gt;a directory of hunting opportunities around the world. This book included both descriptions of "dream hunts" -- think lion hunts in Africa -- and an entire section in the back dedicated to recipes, including a few meals for squirrels -- I mean, &lt;em&gt;of &lt;/em&gt;squirrels. And I mentioned &lt;em&gt;First Position &lt;/em&gt;in my intro, where at times I felt like I was reading an artist's diary.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;u&gt;How to Work My Body: A Manual&lt;/u&gt;&lt;/em&gt;&lt;br /&gt;There are a number of sex books out there -- including one for Dummies -- and most of them have indexes. I just finished indexing &lt;em&gt;Him &lt;/em&gt;and &lt;em&gt;Her, &lt;/em&gt;short and photograph-filled manuals of the sexes, along with instructions to make them work. 
And I do mean "work": the book about men attempts to explain why they tend not to do chores around the house. (Oh come on, you didn't think I'd use an erotic example of "work", did you? :-) These books, produced by the same group of people who made &lt;em&gt;Sensual Crochet,&lt;/em&gt; were a joy of sex to index, especially once I realized that most of the anatomy-filled books that I index are about abnormal anatomy: prostate disorders (&lt;em&gt;100 Questions and Answers About Prostate Diseases&lt;/em&gt;), gunshot wounds (&lt;em&gt;Criminal Investigation, 2nd edition&lt;/em&gt;), and the like. And unlike the traditionally polite sex-instruction book, &lt;em&gt;Him &lt;/em&gt;and &lt;em&gt;Her &lt;/em&gt;are more about the art than the words -- something that, for eunuchs at least, would make the indexing go much faster.&lt;br /&gt;&lt;br /&gt;&lt;u&gt;The user manual to anything only cool people own&lt;/u&gt;&lt;br /&gt;I had the honor of indexing the user manual to the Class E series Mercedes-Benz automobile. This full-color production was totally awesome; I spent a lot of time trying to convince myself that reading the manual long before the car's official release was as envy-worthy as owning the car itself. (For many months my friends and family joked that I should be paid in cars instead of dollars.) I've indexed the manuals to software applications before, but I have more memories from editing the user guide to a long-since-extinct universal remote control ... and I'm talking back when these things were large control panels. So what other cutting-edge production is taking place? I missed indexing the iPhone manual, but maybe someday I'll get to index the field guide for a nasty-looking military weapon.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&lt;u&gt;Instructions to the 1040 Form&lt;/u&gt;&lt;/em&gt;&lt;br /&gt;Indexing gets so little press, but that doesn't stop me from wanting to index something that's so popular or high-profile that I can't help but feel proud. 
I'll never be a household name, but if I had landed that one magical indexing project with the U.S. Internal Revenue Service, my work might have reached every household. They really were looking for someone, at least for a little while. Even the newest Harry Potter book isn't as popular. Which reminds me: is someone out there indexing Rowling's books? If not, someone ought to be. &lt;em&gt;The Unauthorized Index of Harry Potter &lt;/em&gt;would be a big seller ... despite use of the word &lt;em&gt;index&lt;/em&gt; in the title. Move over, back-of-the-book indexing. We're on the cover now.&lt;br /&gt;&lt;br /&gt;Okay, I'm starting to drool.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-591218288468026181?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/591218288468026181/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=591218288468026181&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/591218288468026181'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/591218288468026181'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/07/books-i-really-really-really-want-to.html' title='Books I really, really, really want to index'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-6537869959092364704</id><published>2007-07-09T16:28:00.000-05:00</published><updated>2007-07-09T16:31:21.377-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft Word indexing'/><title type='text'>Printing Word documents with XE fields visible</title><content type='html'>&lt;span style="font-size:78%;"&gt;(This is taken from my &lt;a href="http://taxonomist.tripod.com/indexing/wordproblems.html"&gt;Word Indexing FAQ&lt;/a&gt;.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Can I Print My Documents with the XE Fields Visible?&lt;br /&gt;&lt;/em&gt;&lt;br /&gt;Yes, you can. Microsoft Word can make all hidden-text codes visible, whether they're for indexing or not. Go into Page Setup (available from the File menu) and look for something that says "print display codes" or "print hidden text" or something like that. Until you uncheck that box in the future, all of your codes will show up in your printouts.&lt;br /&gt;&lt;br /&gt;Be aware that printing with your indexing fields visible will affect the pagination. Don't write an index using your hard copy this way.&lt;br /&gt;&lt;br /&gt;On a related note, remember that you can track changes when you work. Every time you insert, edit, or delete an XE field, you'll get a note in the margins. These marginal callouts can speed up your ability to find your XEs, although they might also clutter up your work. Use Tools &gt; Track Changes to turn that feature on. Additionally, these changes can be made visible when printing as well, using a process similar to the one described above. Keeping XEs invisible but marginal notes visible allows you to see the index pointers without messing up the pagination. Be warned, however: if changes are already being tracked, don't turn that feature off! You could lose that information for good. 
Instead, use the View menu to make those changes visible.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-6537869959092364704?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/6537869959092364704/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=6537869959092364704&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6537869959092364704'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6537869959092364704'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/07/printing-word-documents-with-xe-fields.html' title='Printing Word documents with XE fields visible'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-8906507508503122907</id><published>2007-06-27T09:33:00.000-05:00</published><updated>2007-06-27T09:43:23.817-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='fun with indexing'/><title type='text'>Technology that ignores indexing</title><content type='html'>It's true, I'm including a link to this video because, frankly, it made me laugh. Witness a &lt;a href="http://www.tubearoo.com/articles/87148/Microsoft_Surface_Parody.html"&gt;satire of Microsoft Surface&lt;/a&gt;. 
Yes, it's funny, but I found myself thinking about how technologies are so often designed to create needs, not to meet existing needs. I mean, it's fascinating to imagine a table that has a computer screen as a surface, but what about building height adjusters into the legs so they don't wobble? And as I continued to think about this, I discovered I have two reasons to put this video in my blog.&lt;br /&gt;&lt;br /&gt;First, there's the opening sequence in which someone is looking at digital photos and videos scattered across the tabletop. With his fingers, the Surface user can move them around, open them, and even view the videos. In other words, the engineers of this expensive table have managed to reproduce the &lt;em&gt;worst part of photographs:&lt;/em&gt; the pile of undifferentiated images. If someone came to you and dumped a box of photographs on your table, would you be happy? Now, what if all those photographs were digital? This is technology that completely ignores indexing. Compared to tools like Picasa, which puts &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;metadata&lt;/span&gt; to work, this product does its best to create an interface option where &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;metadata&lt;/span&gt; is ignored. And if you're one of those "old-timers" who longs for the physical-contact nostalgia of long-ago days of printed photographs, such that you might think shuffling through a pile of photographs would be fun, think again. Remember, these are &lt;em&gt;digital &lt;/em&gt;photographs. They have no width and no weight.&lt;br /&gt;&lt;br /&gt;Second, there's the reality that in real life, we use table tops as horizontal storage surfaces. 
Whether you're a neat freak who has only a magazine or a coaster rack on top, or you're more like me and live with your tables essentially &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_2"&gt;camouflaged&lt;/span&gt; by life's detritus, either way you've essentially buried your workspace. In other words, this tool seems to forget the environment in which we look things up. The &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_3"&gt;voiceover&lt;/span&gt; in the ad jokes about the convenience of a handheld machine in comparison to this table, but I'd like to suggest that this table would make more sense as a vertical hang-on-the-wall &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_4"&gt;flat-screen TV&lt;/span&gt;. Take a lesson from the many-years-old television industry: there is no market for a horizontal television.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-8906507508503122907?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/8906507508503122907/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=8906507508503122907&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/8906507508503122907'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/8906507508503122907'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/06/technology-that-ignores-indexing.html' title='Technology that ignores indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' 
height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-904273756011483380</id><published>2007-06-20T10:11:00.000-05:00</published><updated>2007-06-20T10:37:24.531-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fun with indexing'/><title type='text'>Indexing has at least one fan</title><content type='html'>It seems &lt;a href="http://www.soltys.ca/coredump/2007/06/indexing-blog.html"&gt;my blog got noticed&lt;/a&gt; just the other day, which is pretty neat. It seems I have at least one fan ... other than myself, of course.&lt;br /&gt;&lt;br /&gt;Why does the field of indexing have so few fans outside of the profession itself? I heard many stories from and about thankful authors who swear by the quality of the indexes written by professionals. My favorite is something like this: "Until I looked at that index, I didn't even know I &lt;em&gt;wrote &lt;/em&gt;all that!" I'm talking about something more general.&lt;br /&gt;&lt;br /&gt;No, we're not firefighters, bursting through burning walls to save people we've never met, but I like to think that we make the world a better place anyway. We're the traffic cops of information, tour guides for books, instructors and librarians, a taut rope in the rough seas of data storms....&lt;br /&gt;&lt;br /&gt;If you know anything about indexing -- not index&lt;em&gt;es,&lt;/em&gt; but index&lt;em&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;ing&lt;/span&gt; &lt;/em&gt;-- then you know it's not a boring profession. Think about the public perception of lawyers, and how we don't consider that profession boring, and yet the reality of law is lots of books, lots of reading, lots of research. 
Those "exciting moments" brought to you on the television, along with the anxiety and intrigue of any moral or ethical battles regarding the implementation of law, represent only a small piece of the whole system. There is a lot of boredom in &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;lawyering&lt;/span&gt;. No, the part of the law that brings so many students into the law schools (other than the potential for income, perhaps) is the idea that law governs our everyday lives, the sociological analogue of science.&lt;br /&gt;&lt;br /&gt;Indexing is the process of analyzing and re-representing information, the lifeblood of everything we do. It's the &lt;em&gt;Matrix; &lt;/em&gt;we hold the &lt;em&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Da&lt;/span&gt; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3"&gt;Vinci&lt;/span&gt; Code. &lt;/em&gt;We are responsible for getting data from one place to another in an efficient format, so that we can actually talk. Whenever you get frustrated by a failure to communicate, remember that an indexer can change that.&lt;br /&gt;&lt;br /&gt;Maybe the reason indexing seems so boring is that the word is so inextricably tied to that alphabetized, indented thing you see in the back of books. Despite the applications indexing has for the Web, in search, and with taxonomy, people associate what we do with good old-fashioned paper. And gosh, indexes have been around, like, forever, so of course they're as boring as dirt -- note that dirt is not boring to some people -- and a whole lot less inspiring of nostalgia. Such a shame.&lt;br /&gt;&lt;br /&gt;Maybe we need a movie, the way &lt;em&gt;Top Gun &lt;/em&gt;got people signing up for the U.S. Navy. Here's one. It's called &lt;em&gt;Cross.&lt;/em&gt; Jack Hannah, an Agency "prep consultant" who can find out anything about anyone, is double-crossed when the agent at the center of a routine inside investigation turns out to have the exact same life Jack has. 
Part &lt;em&gt;No Way Out &lt;/em&gt;and&lt;em&gt; &lt;/em&gt;part &lt;em&gt;Blow Up, Cross&lt;/em&gt; follows Jack on a dangerous journey into government archives to answer what should have been a simple question: Who is the real Jack Hannah?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-904273756011483380?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/904273756011483380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=904273756011483380&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/904273756011483380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/904273756011483380'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/06/indexing-has-at-least-one-fan.html' title='Indexing has at least one fan'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-8963768725914588437</id><published>2007-06-09T13:40:00.000-05:00</published><updated>2007-06-09T13:58:53.828-05:00</updated><title type='text'>Exemplary indexes</title><content type='html'>Historically, the &lt;a href="http://www.asindexing.org/site/WilsonAward.shtml"&gt;H. W. Wilson Award&lt;/a&gt; has been given to indexers of scholarly books, often because of the very complications and challenges you're talking about. 
What makes Do Mi &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Stauber&lt;/span&gt;, who won &lt;a href="http://www.asindexing.org/site/PR20070531.shtml"&gt;this year's award&lt;/a&gt;, so great at her work is that she has a knack for doing this without slowing down very much. I like to think I have the same knack when it comes to technical and reference books.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.asindexing.org/site/sigs.shtml#scholar"&gt;Scholarly indexing&lt;/a&gt; is WAY hard. I recently accepted a book I didn't realize was scholarly, tried to index it myself, and realized almost immediately that I was in way over my head. (Note that I'm not talking about that irrational fear an indexer experiences at the start of every project, but rather something quite objective: an inability to understand the sentences and paragraphs well enough to parse them into &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;indexable&lt;/span&gt; ideas.) I subcontracted the index to another indexer, someone who specializes in (or at least doesn't mind) scholarly works, and the result was great.&lt;br /&gt;&lt;br /&gt;By the way, you need to see the &lt;a href="http://www.amazon.com/gp/sitbv3/reader/002-4081374-7163235?ie=UTF8&amp;p=S0JH&amp;amp;asin=0231137486"&gt;award-winning book's index&lt;/a&gt; to really understand what I'm talking about.&lt;br /&gt;&lt;br /&gt;Scholarly works are exceptionally difficult, even if you know the basic subject matter, because of how they are written. Many scholarly publishers underpay their indexers, too, because scholarly books rarely have large audiences: they're library-books-to-be, really, put there for students and faculty. Given that a book won't sell well, publishers are often reluctant to put more money into the production process. 
However, for the kind of book that &lt;a href="http://www.domistauberindexing.com/"&gt;Do Mi&lt;/a&gt; indexed -- and even the one I gave to someone else -- the indexer had better be making closer to $6/page (U.S.). In comparison, I think $4/page is reasonable for the average technical book, like a book on mathematics or computer programming. See, a technical book requires expertise in or a strongly sympathetic understanding of the subject, whereas scholarly books require a tremendous amount of time spent synthesizing what's in there. Think of poetry and "Shakespeare," not of prose and "John &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Grisham&lt;/span&gt;." :-)&lt;br /&gt;&lt;br /&gt;But the H. W. Wilson Award can be given to indexers of other kinds of books, including technical. What makes the award possible is an exemplary show of knowledge and cunning, something that many technical books don't allow for. You also need the kind of working environment in which a publisher won't chop your index down to size, use a lousy design, or force you to complete the job too quickly to produce an exemplary product -- the kinds of things that are more likely to happen in technical fields than in scholarly ones, in fact. But even a coffee table book can win the award, if the index shows that extra something special. :-)&lt;br /&gt;&lt;br /&gt;Given the kinds of things I index -- and the circumstances in which I index them -- I often think the only way I would win the Wilson Award is if I wrote the book myself, specifically for the purpose of making an awesome index. For example, maybe I would write a book that would require me to use &lt;a href="http://taxonomist.tripod.com/indexing/liungman.html"&gt;symbols as entries&lt;/a&gt;. :-) Then again, I'm still trying to write a &lt;a href="http://maislin.blogspot.com/search?q=mysterious"&gt;mystery index&lt;/a&gt;, too. 
I wonder if &lt;em&gt;that &lt;/em&gt;would win a Wilson...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-8963768725914588437?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/8963768725914588437/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=8963768725914588437&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/8963768725914588437'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/8963768725914588437'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/06/exemplary-indexes.html' title='Exemplary indexes'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-5232678195205015342</id><published>2007-05-29T20:10:00.000-05:00</published><updated>2007-05-29T20:18:28.956-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><title type='text'>Throughput in indexing</title><content type='html'>&lt;p&gt;I gave a presentation last year (at the &lt;a href="http://www.asindexing.org/site/conferences/conf2006/index.shtml"&gt;Toronto conference of the American Society of Indexers&lt;/a&gt;) about money. 
I synthesized some statistics to come up with something I hadn't seen expressed before.&lt;/p&gt;&lt;p&gt;According to the 2004 survey of ASI members (all numbers in U.S. dollars):&lt;/p&gt;&lt;ul&gt;&lt;li&gt;median per-page rate: $3.26 to $3.50 &lt;/li&gt;&lt;li&gt;median hourly rate: $30 to $40 &lt;/li&gt;&lt;li&gt;median per-entry rate: $0.70 to $0.79&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Note that MEDIAN rates do not necessarily match an indexer's lifestyle, workload, or typical projects. For example, some indexers work exclusively on the kinds of projects that earn more (or less) than the median. In other words, these are NOT target numbers; rather, they are reflective of the variety of everything that indexers do.&lt;/p&gt;&lt;p&gt;Synthesizing these numbers:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;typical indexing speed: 10 pages per hour &lt;em&gt;or &lt;/em&gt;45 entries per hour&lt;/li&gt;&lt;li&gt;typical index density: 4.5 entries per page&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;From the survey:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;average annual income: $33,325 &lt;/li&gt;&lt;li&gt;part-timers (&amp;lt;32 h/wk) in survey: …&lt;/li&gt;&lt;li&gt;full-timers (40+ h/wk) in survey: 12%&lt;/li&gt;&lt;li&gt;median income for full-timers: $45k-$49k [from 2000 survey]&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Synthesis:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;To make $45k/year at $4/page = 11,250 indexable pages per year&lt;br /&gt;= about thirty-seven 300-page books per year&lt;br /&gt;&lt;/li&gt;&lt;li&gt;At 10 pages/hour, you must index for about 188 six-hour days/year &lt;/li&gt;&lt;li&gt;At 20 pages/hour, you must index for about 94 six-hour days/year&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If you want to make more money, focus on throughput: projects that are easier for you, more effective indexing tools, improved marketing, stronger client relationships, etc. In fact, the reason that advanced indexers tend to make more money is that they have been given the opportunity to build these skills: speed, marketing, relationships. 
For example, indexing a single book for a single author might be short-term lucrative, but building relationships with the author's institution is more lucrative in the long term. Also, experience clearly counts toward speed, too, while short-cutting quality can seriously damage relationships.&lt;/p&gt;&lt;p&gt;If you think of your career in terms of throughput, you might think about your day-to-day tasks differently. For example, there have been debates among indexers regarding the sharing of book mistakes caught (like misspellings); when thinking about throughput, sending such mistakes (a) slows you down, but (b) improves repeat business. On the other hand, when you've got a client who provides you with only one book a year, it's all loss, and no trade-off, in terms of income.&lt;/p&gt;&lt;p&gt;Finally, when I gave this presentation I made it clear that income isn't the only reason we're doing what we do. After all, there are more lucrative professions out there in the world. If you're earning a ton of money but destroying your health, sacrificing your happiness, hurting your family, or failing yourself in some other important way, then please reconsider your priorities.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-5232678195205015342?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/5232678195205015342/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=5232678195205015342&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5232678195205015342'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5232678195205015342'/><link rel='alternate' 
type='text/html' href='http://maislin.blogspot.com/2007/05/throughput-in-indexing.html' title='Throughput in indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-7945768904002359716</id><published>2007-03-25T18:38:00.000-05:00</published><updated>2007-03-25T19:04:22.867-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='future of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><title type='text'>Indexers indexing infinitely ... like monkeys</title><content type='html'>Three ideas have merged.&lt;br /&gt;&lt;br /&gt;First, there's the idea I published last December as &lt;a href="http://maislin.blogspot.com/2006_12_01_archive.html#7116072763820819135"&gt;"A needle in a haystack with 100,000,000 blades,"&lt;/a&gt; where I argued how the Web, or an approximation thereof, could be indexed by humans for a reasonable amount of money.&lt;br /&gt;&lt;br /&gt;Second, there's &lt;em&gt;The New York Times&lt;/em&gt; article &lt;a href="http://www.nytimes.com/2007/03/25/business/yourmoney/25Stream.html"&gt;"Artificial Intelligence, With Help From the Humans,"&lt;/a&gt; in which we learn that the Amazon Mechanical Turk service subcontracts human workers to perform tasks that are especially challenging for computers to accomplish, such as matching images to textual descriptions. 
For some jobs, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;Turkworkers&lt;/span&gt;&lt;/span&gt; might make one penny per transaction.&lt;br /&gt;&lt;br /&gt;And finally, there's the &lt;a href="http://en.wikipedia.org/wiki/Infinite_monkey_theorem"&gt;infinite monkey theorem&lt;/a&gt;, which states that a monkey hitting keys at a typewriter for an &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;infinite&lt;/span&gt; amount of time will "almost surely" type the complete works of Shakespeare, or something similar. I first heard this idea as "a million monkeys and a million years," but I bet the math's a bit different. After all, "infinite" is much bigger than a million million.&lt;br /&gt;&lt;br /&gt;Putting these ideas together seems to provide a rather obvious solution: third-world indexers. After all, if it costs only a nickel to get someone to write a few keywords for something, we can get a lot of indexing done very cheaply; I say "third world" because no indexer I've ever known is willing to work for a penny per word.&lt;br /&gt;&lt;br /&gt;The indexing industry is facing the very real possibility that our workload will be taken from us and delivered to those in economies that allow lower prices. But what if we went a step further and, instead of looking for less expensive indexers with good qualifications, we decided to look for dirt cheap indexers with no qualification other than time to waste? What if, I ask, we asked monkeys to pound away at their keyboards?&lt;br /&gt;&lt;br /&gt;I find the idea amusing but too close to the truth. After all, the intelligence behind Google is the social intelligence, the uneven and culturally biased workings of millions of Internet users plugging away at their disparate tasks. What Mechanical Turk has going for it, then, is the human decision making at the back end. 
Whereas most search engines look for better and greater stores of &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;metadata&lt;/span&gt;&lt;/span&gt; with which to judge content, one man in a back room can make smarter decisions upon command. No, the real problem is that today's human intelligence is worth only pennies per word. Computers do their best, and humans sweep up afterwards. Our natural intelligence isn't worth a whole lot, I guess.&lt;br /&gt;&lt;br /&gt;That's how we know computers are smart. Computers own us monkeys.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-7945768904002359716?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/7945768904002359716/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=7945768904002359716&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7945768904002359716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7945768904002359716'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/03/indexers-indexing-infinitely-like.html' title='Indexers indexing infinitely ... 
like monkeys'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-3987181895920351972</id><published>2007-03-24T10:29:00.000-05:00</published><updated>2007-03-24T12:34:02.779-05:00</updated><title type='text'>The passive-aggressive bullies of the information world</title><content type='html'>&lt;p&gt;An indexer, while building an index of historical documents for a small township on Cape Cod, Massachusetts, came across an old diary written during the American Civil War. She scanned the pages, filled with small and semi-illegibly handwritten words, and realized that nothing important had been written.&lt;br /&gt;&lt;br /&gt;The diary went &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;unindexed&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;This anecdote, shared by indexer &lt;a href="http://www.marisol.com/rowland.htm"&gt;Marilyn Rowland&lt;/a&gt; at the March 24 (2007) meeting of the &lt;a href="http://www.newenglandindexers.org/"&gt;New England Chapter of the American Society of Indexers&lt;/a&gt;, struck me as surprisingly uncomfortable. Certainly I agree that when something seems unimportant to the indexer, it should not be indexed; in fact, I've claimed many times within this blog that one of the biggest failings of computer-generated lists and search engine algorithms is that they cannot identify the true value (or correctness) of content, even when using social algorithms.&lt;br /&gt;&lt;br /&gt;Still, not indexing &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1"&gt;someone's&lt;/span&gt; diary? This sounds passive-aggressive. So does this instruction: "Don't index the names of everyone in that photograph. 
Mention these two important people, and don't bother with the rest."&lt;br /&gt;&lt;br /&gt;Just as scientists are often accused of sacrificing ethics and social responsibility in favor of "pure scientific exploration" (the temptation &lt;a href="http://www.puaf.umd.edu/IPPP/Fall97Report/cloning.htm"&gt;to clone human beings&lt;/a&gt; is a fun example), so might indexers be accused of excessive marginalization or trivialization of content. It may be human nature to filter out everything we don't need to survive or enjoy ourselves in our lives, but it is an indexer's nature to impose these filters upon future users. In other words, indexers are responsible -- on a daily basis -- for rewriting history.&lt;br /&gt;&lt;br /&gt;Everything we create in our lives -- email messages to diaries, family snapshots to oil paintings, back-of-the-napkin notations to dissertations -- is subjected not just to the entropy of time but also to the red pen of the indexers. We may speak about the value of individuals, but in reality it's just a big game of &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Survivor_(TV_series)"&gt;Survivor&lt;/a&gt;,&lt;/em&gt; where the indexers are the ones to vote our creativity out of existence.&lt;/p&gt;&lt;p&gt;There is no good way to remove indexers from the equation, of course. If nothing were indexed, and no content were ever deemed to be more valuable (worth finding) than something else, content would be lost in the same way a paper cup with a lipstick stain inevitably disappears into a landfill. But who would have believed that &lt;em&gt;indexers &lt;/em&gt;are the ones in control, that &lt;em&gt;indexers &lt;/em&gt;are &lt;a href="http://en.wikipedia.org/wiki/The_Langoliers"&gt;&lt;em&gt;the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2"&gt;Langoliers&lt;/span&gt;&lt;/em&gt;&lt;/a&gt;, who, like the big kids in school, get to decide who gets picked first for the schoolyard team, and who doesn't get picked at all? 
We are, let's face it, the bullies of the information world.&lt;/p&gt;&lt;p&gt;Don't mess with me. I'll erase you.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-3987181895920351972?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/3987181895920351972/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=3987181895920351972&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/3987181895920351972'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/3987181895920351972'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/03/passive-aggressive-bullies-of.html' title='The passive-aggressive bullies of the information world'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-4078694221105907494</id><published>2007-03-17T22:11:00.000-05:00</published><updated>2007-03-17T22:13:02.918-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='indexing tools'/><title type='text'>Notes on automatic indexing</title><content type='html'>"Automated indexing software" is, according to the common definition, software that analyzes text and produces an index without human involvement. 
I'm a firm believer that the technology doesn't exist, and that a human being is required to write an index. Thus I don't use the software, and I also don't recommend it.&lt;br /&gt;&lt;br /&gt;There are those who advocate it, arguing that it's "not as bad as an indexer would have you think." These people are often coming from the standpoint that automatic software is faster and cheaper, and they're right. Thus the issue is quality.&lt;br /&gt;&lt;br /&gt;I believe that good automatic indexes will exist once there's good artificial intelligence, something that presently doesn't exist. In very limited circumstances, however, automation does work; a machine can easily cull capitalized words from a textbook to create an approximation of an index of names -- although, again, the machine isn't going to differentiate between names like "David Kelley" and places like "San Francisco," since they are both of the same format and used the same way. It also won't know that "Bill Clinton" is also "William Jefferson Clinton." And certainly it can't tell when the name is being mentioned in an unuseful and trivial way, as are the names in this paragraph! So imagine the problems trying to get a machine to parse full sentences of ideas and recognize the core ideas, the important terms, and the relationships between related concepts throughout the entire text.&lt;br /&gt;&lt;br /&gt;FYI, those who advocate automatic software would argue that the machine gets "close enough" so that a human being can edit the resulting product. However, expert evaluators unanimously agree that the software fails; those who disagree are likely so ignorant of indexing in the first place that they are unable to judge the differences in quality.&lt;br /&gt;&lt;br /&gt;Oh, I should mention that there are software programs that human indexers use to simplify and speed up the mechanics of the index process. 
For example, it would be silly to disallow a computer to alphabetize the entries, reformat the index, and manipulate page numbers. There are a few software packages that do this exclusively, which are considered top of the line; other applications that have indexing capabilities, such as Microsoft Word and Adobe FrameMaker, have some of these capabilities, with notable limitations.&lt;br /&gt;&lt;br /&gt;For information on the various software available, see &lt;a href="http://www.asindexing.org/site/software.shtml"&gt;http://www.asindexing.org/site/software.shtml&lt;/a&gt;. If you have feedback, especially differing opinions, I'd love to hear them. Write me at &lt;a href="mailto:seth@maislin.com"&gt;seth@maislin.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(This article was originally published in 2002 and 2004 -- and it's still 100% accurate.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-4078694221105907494?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/4078694221105907494/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=4078694221105907494&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/4078694221105907494'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/4078694221105907494'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/03/notes-on-automatic-indexing.html' title='Notes on automatic indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-5117309795001232016</id><published>2007-03-05T00:12:00.000-05:00</published><updated>2007-03-05T00:27:04.728-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft Word indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>Interpretation, not computation</title><content type='html'>After explaining the limitations of Microsoft Word's auto-indexing feature to one of the many people who write me asking for indexing advice, I got an interesting response. Clearly frustrated by the nonexistence of computer tools to do something as simple as generate a name index, he wrote:&lt;br /&gt;&lt;br /&gt;&gt; I'm amazed at the poor development of the science of indexing for printed matter such as books.&lt;br /&gt;&lt;br /&gt;I wrote back, "You misunderstand!"&lt;br /&gt;&lt;br /&gt;The science of indexing is quite broad, given that it has a history in long-ago library science. What seems undeveloped in this case are the tools, but that's a misunderstanding of what indexing is. Indexing is an editorial field, not an automatic one. You might say it's a lot like writing, in that the writer must decide what their readers want to read, and then the writer must communicate those ideas in an organized and approachable way. Indexing is the same: analysis of text to discover what readers might find interesting, and then multiply labeling and organizing those ideas so people can find them.&lt;br /&gt;&lt;br /&gt;Computers will never be able to write indexes because they can't (a) interpret importance of a concept, (b) understand concepts over simple words, and (c) connect ideas in contextually relevant ways. 
As much as I admire the Google.com search engine for what it can do, once again I will demonstrate what it &lt;em&gt;can't &lt;/em&gt;do. Google finds 10,000,000 things when we really only want 3 (or 10 or 20). It finds what we type, but it doesn't find synonyms. And there's no guarantee that Google is searching everything that's out there, though it appears to come close; in book indexing, however, there's a human to make sure every page was considered.&lt;br /&gt;&lt;br /&gt;How often has Microsoft Word attempted to auto-correct you in a completely inaccurate way? Spell-check? Auto-format? Auto-complete? Half-intelligent humans don't make the kinds of mistakes that these tools do.&lt;br /&gt;&lt;br /&gt;Here's what I wish he had written:&lt;br /&gt;&lt;br /&gt;&gt; I'm amazed that people who know full well that computers could never write newspaper articles still believe computers can write indexes.&lt;br /&gt;&lt;br /&gt;Another problem, of course, is that indexes aren't respected in the industry. The reason Microsoft Word even &lt;em&gt;has &lt;/em&gt;an automatic indexing feature is that the people who wrote that software have no idea of the damage such a tool does. That Word's {XE} functionality is so miserable is even further proof. There's a nasty cycle: people use inferior tools, quality indexing grows less likely, and inferior tools become the standard.&lt;br /&gt;&lt;br /&gt;Indexing is an editorial process, just like writing and editing. Indexing requires interpretation, not computation.&lt;br /&gt;&lt;br /&gt;Computers will not and &lt;em&gt;should not &lt;/em&gt;be used as indexers. 
If my job ever dies because computer programmers have found a way to make me obsolete, at least I know I'll be in the enlightening company of human writers and artists.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-5117309795001232016?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/5117309795001232016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=5117309795001232016&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5117309795001232016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5117309795001232016'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/03/interpretation-not-computation.html' title='Interpretation, not computation'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-1299669133903723321</id><published>2007-02-07T22:13:00.000-05:00</published><updated>2007-02-06T11:30:50.672-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='future of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='power of information'/><title type='text'>Indexes are the speed limit on the information highway</title><content type='html'>The growing demand for indexes that point to online, changing, and custom content is 
forcing a huge gap into the indexing industry, and that gap is physical.&lt;br /&gt;&lt;br /&gt;With traditional publishing, if I wanted a reader to find content on page 114, I would simply have the number 114 show up in my index: "credit card fraud, 114." To make this work, however, that page number must be immutable across the lifetime of my index. Should the content be republished in a different format, layout, or language, or with significant edits within the first 100-plus pages, my index could be rendered inaccurate. In other words, if I type "114" in my index, that content had better be on page 114.&lt;br /&gt;&lt;br /&gt;The appeal of fluid content, however, is slowly making traditional information delivery obsolete. Not only are books republished for lots of "traditional reasons" (e.g., updated editions, new languages, different book and print sizes), but technology is enabling books to be published without a single physical page. With the possible exception of the Adobe PDF format (which purposefully preserves the overall book-like format in an electronic file), page breaks are optional and subjective. A Web page or HTML document can have a scroll bar, such that there are no pages; an e-book intended for a handheld reader is paged according to the size of the reader; a news or magazine article of any length can be broken in two or three simply to increase ad sales; and some electronic documents can be edited by the readers such that anything goes.&lt;br /&gt;&lt;br /&gt;Ah, how I miss the days when 114 meant 114.&lt;br /&gt;&lt;br /&gt;Indexing content that changes is going to be hard, but the fundamental challenge isn't about keeping up with what was newly published today, or even in the last twenty minutes. It's about content ownership. 
When content is moving around all the time, indexers don't have a good way to track where that content is going.&lt;br /&gt;&lt;br /&gt;As an analogy, consider a classroom filled with thirty students, with one student at each desk. If you have a photograph of where every student is sitting, you could leave the room and generate a spreadsheet that lists each person's name and seat location. But what happens when the students are playing musical chairs? Every photograph you take is outdated almost immediately; even staying in the room wouldn't be good enough, because your typing speed will never match the speed of thirty kids jumping around. In fact, the only way you could manage a spreadsheet that shows where each student is sitting at all moments is if that spreadsheet operated in real time, by reference. In other words, if all thirty students carried GPS locator chips in their pockets, you could track the chips -- and thus the students -- by satellite. Your map could be as dynamic as what it is you're mapping.&lt;br /&gt;&lt;br /&gt;Embedded indexing, or indexing by reference, is a rudimentary and imperfect example of this process. With embedded indexing, I can have some kind of information inserted into the content -- like the GPS chip in the student's pocket -- and then I can generate an index based on where that information is at any one time. This blog entry, for example, has keywords attached to it; the website where my blog is published can, at any time, generate a list of all entries with that keyword. This kind of dynamic indexing is not uncommon these days; website content is served according to a number of immediate rules, and the result can be as simple as a website that publishes "Hello Seth Maislin" on my page but no one else's, or as complicated as an online stock trading program that keeps track of millions of private transactions.&lt;br /&gt;&lt;br /&gt;I say this is rudimentary, however, because it's still a snapshot. 
Perhaps it's convenient to have that snapshot taken at the moment I arrive at a website, but if I leave my browser at a website and walk away for 20 minutes, the picture doesn't have to change. The "Hello Seth Maislin" greeting made sense when I was sitting at the computer, but if I walk away and my wife sits down, it's now wrong. The snapshot is old. Google search results can change from one minute to the next. Even stock trading programs sport copious warnings that despite the best efforts of the website, the price you &lt;em&gt;think&lt;/em&gt; you're getting may not be the &lt;em&gt;actual&lt;/em&gt; price when you complete a transaction; the delay between your clicking the mouse and the machines at the other end doing something is a legitimate and unavoidable delay. Some websites attempt to minimize this by taking a snapshot every fraction of a second, as if you were watching what was happening "live." In reality, there's still a delay, and there's still no way to truly synchronize everyone's machine.&lt;br /&gt;&lt;br /&gt;My point is that indexes to changing documentation must live apart from the documentation. If they really lived completely together, the content and the index would be essentially the same thing, just as the GPS chip and the student are really one merged object. But because indexes are &lt;em&gt;interpretations&lt;/em&gt; of content, there is always going to be a gap. The generation of the index must be removed from the content that is being indexed, in order for that interpretation to take place.&lt;br /&gt;&lt;br /&gt;The only way for indexing to survive, I think, is for content to slow down. 
And because I believe indexing -- interpretation -- is critical for learning, the only logical conclusion is that content &lt;em&gt;will&lt;/em&gt; slow down.&lt;br /&gt;&lt;br /&gt;The need for an index is the logical limit of just how fast data can travel.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-1299669133903723321?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/1299669133903723321/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=1299669133903723321&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/1299669133903723321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/1299669133903723321'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/02/indexes-are-speed-limit-on-information.html' title='Indexes are the speed limit on the information highway'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-6265700651245355259</id><published>2007-02-03T13:12:00.000-05:00</published><updated>2007-02-03T13:24:44.474-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><title type='text'>TAA Conference in Buffalo, June 22-23</title><content type='html'>I have been scheduled to present twice at the &lt;a href="http://www.taaonline.net/TAAConference/index.html"&gt;2007 conference for the 
Text and Academic Authors Association (TAA)&lt;/a&gt;. I'm excited about these presentations -- actually, one of them is a roundtable -- because this will be perhaps the first time when my audience is predominantly authors. Although I have taught indexing to numerous technical writers over the years, the nature of their writing is significantly different to that of other authors. &lt;a href="http://www.taaonline.net/"&gt;TAA&lt;/a&gt; members tend to write journal articles and textbooks; they are writing because they &lt;em&gt;want &lt;/em&gt;to share information (whether driven by a simple desire to share knowledge or by more complicated goals like industry prestige, peer respect, or job security), whereas technical writers are obligated to write documentation as part of a larger project.&lt;br /&gt;&lt;br /&gt;In some ways, having this opportunity to reach out to the authoring community represents a longer reach than usual, in that most indexers ply their trade among the publishers themselves, who manage the book production but don't do any of the writing. Although any business benefits I receive from these talks won't be as lucrative as the others -- convincing one author to hire me for the job isn't as valuable as convincing one publisher to hire me for &lt;em&gt;several &lt;/em&gt;jobs -- the advocacy benefits are likely bigger but unknown. I often think that the indexing process is hidden from authors, despite their desire to see quality indexes appended to their work.&lt;br /&gt;&lt;br /&gt;If the indexing industry is going to grow, it won't be because the indexers have advocated for themselves. 
No, indexers will be prominent only when others -- like writers -- advocate for them and their products.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-6265700651245355259?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/6265700651245355259/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=6265700651245355259&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6265700651245355259'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6265700651245355259'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/02/taa-conference-in-buffalo-june-22-23.html' title='TAA Conference in Buffalo, June 22-23'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-6711994005113238974</id><published>2007-01-20T11:15:00.000-05:00</published><updated>2007-02-06T11:30:50.713-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='web indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><title type='text'>Foreword to Heather Hedden's upcoming book</title><content type='html'>&lt;div align="left"&gt;I was asked to write the foreword to Heather Hedden's upcoming &lt;em&gt;Indexing Specialties: Web Sites,&lt;/em&gt; to be published in 2007 by ITI. 
Given the importance of this book in the indexing industry, I am reprinting that foreword here. For more information on the book itself (not yet available), visit either &lt;a href="http://www.asindexing.org/site/asipub.shtml"&gt;ASI's publications page&lt;/a&gt; or a list of &lt;a href="http://books.infotoday.com/books/index.shtml#index"&gt;ITI's indexing publications&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;- - - - - -&lt;br /&gt;&lt;span style="font-family:times new roman;"&gt;&lt;strong&gt;Foreword&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Indexing is not a popular profession by any stretch of the imagination. Not only is it almost completely unknown in lay circles, but let's be honest: writing indexes sounds about as exciting as cleaning the house, but a hundred times harder. Also, if you were born in any year before 1990, the idea of Web indexing sounds like cleaning a house in outer space. I mean, there are no houses in outer space.&lt;br /&gt;&lt;br /&gt;The Internet and the Web -- this monstrously huge and growing system of sharing data -- desperately need more information sorcerers like Heather Hedden. Not only does Heather have the talent to recognize when knowledge is missing, but she also has the ability to make that knowledge visible. She starts by learning for herself, and then she loves to share.&lt;br /&gt;&lt;br /&gt;Heather and I first crossed paths in my classroom, where I taught a course called "Writing Indexes for Books and Websites." My course was written to explore the questions and theories of indexing, and so couldn't be limited to just books. Heather’s interest went much further, and since then she has explored writing web indexes as a singular discipline. For me, Heather has been a student, an apprentice, and a role model. She's someone I count on to get things done. 
She has vaulted across the lines from library science to book indexing to web indexing, each time with surprising success, and has since become a renowned and respected expert in the web indexing community.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Indexing Specialties: Web Sites &lt;/em&gt;is a book filled with honest, get-it-done advice. Heather is not afraid to talk about the code and the tools, because she has faith in her readers. In her hands, the complicated stuff looks straightforward. Besides, when the technical lessons are over, Heather shows readers how to think about web indexing as well: as a process and as a business. Until now, if book indexers wanted to graduate to the Internet frontier, they had no unified place of reference, no single source of everything they'd want to know. In fact, some of the tools Heather includes in this book were almost completely unknown to indexers until now.&lt;br /&gt;&lt;br /&gt;I am excited and pleased to see Heather compiling this knowledge in a book. She has put into print an indexer's Rosetta Stone, which will lead book indexers toward other information management topics like taxonomies, information architecture, and search tools. It's not about complicated coding practices and computer programs, but about the guidelines to getting that A-to-Z index published on the Internet, and doing it right.&lt;br /&gt;&lt;br /&gt;She begins by exploring the boundaries of web site indexing, clarifying what kinds of sites need indexing, how they should look, and how they should work. Then she immediately provides the HTML building blocks to making your indexes appear on the Web, the surprisingly simple code you'd need to create index pages, index entries, indentations, hyperlinks, and cross-reference links. 
If you've never programmed on the Web before and are afraid it's over your head, you’ll be kicking yourself once you see how easy Heather makes it.&lt;br /&gt;&lt;br /&gt;Once you're armed with the grammar, you next need the tools to actually write. Heather gives you the detail about the tools (&lt;/span&gt;&lt;a href="http://indexres.com/home.php"&gt;CINDEX&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;, &lt;/span&gt;&lt;a href="http://www.html-indexer.com/"&gt;&lt;span style="font-family:times new roman;"&gt;HTML Indexer&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;, &lt;/span&gt;&lt;a href="http://www.levtechinc.com/ProdServ/LTUtils/HTMLPrep.htm"&gt;&lt;span style="font-family:times new roman;"&gt;HTML/Prep&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;, &lt;/span&gt;&lt;a href="http://www.macrex.com/"&gt;Macrex&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;, &lt;/span&gt;&lt;a href="http://www.sky-software.com/"&gt;&lt;span style="font-family:times new roman;"&gt;SKY Index Professional&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;, and &lt;/span&gt;&lt;a href="http://publish.uwo.ca/~craven/xrefhtju.htm"&gt;XRefHT&lt;/a&gt;&lt;span style="font-family:times new roman;"&gt;) to create or generate indexes that are ready for web publication. She takes more time exploring the specialized tools of XRefHT and HTML Indexer, two stand-alone web indexing applications, and shows how you can use their features with agility.&lt;br /&gt;&lt;br /&gt;The last third of the book is dedicated to the "mindspace" of web indexing. There's more to indexing than just the tools, and so Heather writes carefully about how indexers should approach the job. She addresses the challenges of working out of order, adding anchors, indexing periodicals, and knowing which pages and at what level of detail you should index. 
She deals in detail with cross-references, language, subentry structure, and format. Finally, Heather dives into the nitty-gritty of the web indexing marketplace, including how to market yourself as a web site indexer.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Web Sites &lt;/em&gt;is going to satisfy you immediately and in the long term. On behalf of the American Society of Indexers -- and myself, personally -- I am honored to welcome Heather as an esteemed author in our community.&lt;br /&gt;&lt;br /&gt;Seth Maislin&lt;br /&gt;President of the American Society of Indexers (2006-2007)&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-6711994005113238974?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/6711994005113238974/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=6711994005113238974&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6711994005113238974'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6711994005113238974'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/01/foreword-to-heather-heddens-upcoming.html' title='Foreword to Heather Hedden&apos;s upcoming book'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-121955941814763775</id><published>2007-01-09T21:26:00.000-05:00</published><updated>2007-06-09T13:57:47.718-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fun with indexing'/><title type='text'>Constructing a mysterious index</title><content type='html'>[NOTE: Edited on 6-9-07 to fix missing indentations for index entries.  -SM]&lt;br /&gt;&lt;br /&gt;For years I have puzzled over the possibility of writing an index, to an imaginary book, in which a mystery is revealed and potentially solved. The book itself would not have to be a mystery, but there would have to be some kind of secret.&lt;br /&gt;&lt;br /&gt;For example, suppose you had these entries:&lt;br /&gt;&lt;br /&gt;La Traviata, clandestine meeting at, 145&lt;br /&gt;Marters, Francine&lt;br /&gt;meeting at La Traviata, 145&lt;br /&gt;&lt;br /&gt;From these entries you would learn that Francine Marters met someone at La Traviata surreptitiously. An additional entry&lt;br /&gt;&lt;br /&gt;Rapiere, Evan&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;accidental discovery of La Traviata matches, 166&lt;br /&gt;&lt;br /&gt;implies not only that Evan was not the person at the restaurant, but also that Evan might have been the reason the secret was necessary. 
A final entry,&lt;br /&gt;&lt;br /&gt;Pfiser, Victor&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;confronted by Evan Rapiere, 231&lt;br /&gt;&lt;br /&gt;allows for the possibility that Victor was the other person with Francine (on page 145), such that Evan's discovery of the matches led to this confrontation.&lt;br /&gt;&lt;br /&gt;What would make an index like this potentially interesting as a puzzle would be (a) the randomization of information, caused by the alphabetization of entries; (b) the summary-style labels in the index, which must naturally leave out much of the story; (c) the creativity of the labels, which can emphasize or omit interesting facts without destroying the quality of the index itself; and (d) the ability to tell many overlapping and long stories across just a few pages.&lt;br /&gt;&lt;br /&gt;On the other hand, what makes an index puzzle challenging -- and the reason I've had no success so far -- is that the index must articulate all the facts; an index can have no secrets if it's going to work as a puzzle. For example, if this were a murder mystery, wouldn't the murder have to be indexed? If the index is going to be a good one (and that's a requirement for me, because otherwise it would seem too contrived), you'd have to have entries like these:&lt;br /&gt;&lt;br /&gt;Rapiere, Evan&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;murder of, 235&lt;br /&gt;Marters, Francine&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;guilty confession of, 469&lt;br /&gt;&lt;br /&gt;Is there any way that a legitimate index could &lt;em&gt;obfuscate &lt;/em&gt;information sufficiently to leave some mystery? In a way, an index must be too "honest" to allow for secrets.&lt;br /&gt;&lt;br /&gt;The other problem, opposite to the honesty problem described above, is that if an index isn't specific enough, it's impossible to put the facts together in the first place. 
For example, if I changed the &lt;em&gt;matches &lt;/em&gt;entry above to this:&lt;br /&gt;&lt;br /&gt;Rapiere, Evan&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;accidental discovery of match book, 166&lt;br /&gt;&lt;br /&gt;there's no way to connect this to La Traviata without help. Similarly, if I change the murder entry to simply this:&lt;br /&gt;&lt;br /&gt;Rapiere, Evan, 235&lt;br /&gt;&lt;br /&gt;or&lt;br /&gt;&lt;br /&gt;Rapiere, Evan&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;surprised by an intruder, 235&lt;br /&gt;&lt;br /&gt;then there is nothing in the index to clarify that Evan actually died.&lt;br /&gt;&lt;br /&gt;I'm looking for ideas on how to get around these challenges. How much integrity can the index maintain without either giving too much away or leaving too many holes in the story?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-121955941814763775?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/121955941814763775/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=121955941814763775&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/121955941814763775'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/121955941814763775'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/01/constructing-mysterious-index.html' title='Constructing a mysterious index'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-6969070531079308532</id><published>2007-01-03T22:46:00.000-05:00</published><updated>2007-01-03T22:53:33.384-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><title type='text'>My waitress, my audience</title><content type='html'>For some reason I felt like hugging my waitress this morning.&lt;br /&gt;&lt;br /&gt;There’s a diner in my hometown that I visit up to three times weekly, while my daughter is at school. They know me here. I have my special table, my eminent domain, where I plug in my computer. Kathleen, my regular waitress, brings me coffee even as I sit down and tries to guess what I want for breakfast, with some accuracy. She knows my name, too, which I feel is the best part. So in this new year, after I happily kissed 2006 good-bye, I was inspired to welcome my diner lifestyle in true living style. “Kathleen, can I give you a hug?” I asked, and she said yes.&lt;br /&gt;&lt;br /&gt;Then we get to talking. Her age, my age. Her career path, my career path. The kinds of things people always talk about at the start of a new year. To sum up her half of the conversation in just one sentence, I’d write this: She came to the realization that at the age of 54, if she had taken her father’s advice and gotten a job at the phone company when she was 20, she would be retired instead of working at the diner. (I have to add that she’s never had a vacation in her entire life, except for two weeks when this diner was closed temporarily, and that her job pays no benefits.)&lt;br /&gt;&lt;br /&gt;As a self-employed indexer working voluntarily where she works, what can I tell her?&lt;br /&gt;&lt;br /&gt;First, I tell her that my profession is dominated by women over the age of 45. 
Many of these women raised their children and wanted to do something different with their lives. Many felt oppressed by their imaginings of the traditional workplace, or were afraid of having insufficient skill to reenter the job market. Others no longer had the financial support of a spouse and simply needed to find work to stay comfortable, or even solvent. I see these women at every indexing meeting and in my classrooms. Almost all had never heard of indexing before, and they’re giving it a try. After all, being self-employed and reading books sounds a lot better than being buried in a cubicle.&lt;br /&gt;&lt;br /&gt;Then I tell her that once you’re truly self-employed, making enough money to pay monthly bills and quarterly taxes, in your life you will never experience unemployment again. You can’t be fired, you can’t be laid off, and you can’t be transferred to another division, location, building, team, or employer—not without first choosing it for yourself. As long as you have your core skills, you’re not much different from a child who can entertain himself with sticks, sand, and even an empty parking lot: the world is your workplace.&lt;br /&gt;&lt;br /&gt;Finally, I tell her that in my profession, everything I have ever learned (and continue to learn) has a direct application to my job performance. My grandfather says “no knowledge is ever wasted,” and in my case he’s literally correct. Every conversation I have gives me better background on people and information, and every experience provides me with a potential story to share with my students.&lt;br /&gt;&lt;br /&gt;It’s not hard for me to be positive about what I do, because I love what I do. There are a lot of perks, too, from being my own boss to sitting at my favorite table in the diner, where I am now. Kathleen, on the other hand, is pretty grumpy about her job. She gripes about a lack of benefits, a dearth of Social Security earnings, a regular 5:30am wake-up call. 
I have no doubt that I have it better than she does, at least in these ways.&lt;br /&gt;&lt;br /&gt;If there’s a professional lesson for me here, it’s that Kathleen is my audience. She is the person who reads the books I index, the person who might sit in my classroom. I have to remember that if I do my job correctly, I am building a bridge from me, the college-educated small business owner who works off his laptop, to people like her. The gap in our lives is the challenge I face every time I sit down to invent a keyword, an index entry, or a label for a hyperlink.&lt;br /&gt;&lt;br /&gt;In the world of indexing, these life differences are surmountable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-6969070531079308532?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/6969070531079308532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=6969070531079308532&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6969070531079308532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/6969070531079308532'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2007/01/my-waitress-my-audience.html' title='My waitress, my audience'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-2700829619447235045</id><published>2006-12-28T10:49:00.000-05:00</published><updated>2006-12-28T10:58:58.539-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='spamming and similar behaviors'/><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><category scheme='http://www.blogger.com/atom/ns#' term='web indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='social algorithms'/><title type='text'>Eighteen million people can't be wrong</title><content type='html'>&lt;p&gt;No matter how much you and I might like Google, the fact is that Google has some very serious problems when it comes to finding content. More specifically, if you're looking for the "right" answer, or if you're attempting to do any serious research, Google is likely to fail you miserably.&lt;/p&gt;&lt;p&gt;The flaw lies in &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0" onclick="BLOG_clickHandler(this)"&gt;Google's&lt;/span&gt; strength: &lt;em&gt;social algorithms.&lt;/em&gt; Social algorithms are processes in which decisions are made by watching and following the majority of people in a community. If blogger.com tends to be the place people go to create blogs, then a social algorithm will see blogger.com as "better." When a search engine is managed by a social algorithm, a website might appear first in search results not because of the quality of the site, but rather because a larger number of people treated the site as if it were of higher quality. In other words, social algorithms equate "majority" with "best," something that often looks right but actually is patently untrue. 
&lt;/p&gt;&lt;p&gt;When you perform a search at Google.com, your results are sorted based on majority behavior and little else. For simple questions about anything -- as well as complex questions about cultural issues, for which "lots of people" is critical -- frequently the majority opinion is rather close to what you want -- which is why Google is so successful. But the gap between "close to what you want" and "accurate" is an invisible one, and that makes it insidious and dangerous.&lt;/p&gt;&lt;p&gt;For example, search for "Seth &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1" onclick="BLOG_clickHandler(this)"&gt;Maislin&lt;/span&gt;." The first hit is &lt;a href="http://taxonomist.tripod.com/"&gt;my website&lt;/a&gt;. The second hit is &lt;a href="http://maislin.blogspot.com/"&gt;this blog&lt;/a&gt;. The third hit is &lt;a href="http://www.oreilly.com/news/seth_0799.html"&gt;an interview I did for &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_2" onclick="BLOG_clickHandler(this)"&gt;O'Reilly&lt;/span&gt; &amp; Associates in July 1999&lt;/a&gt;. An investigation of why these are the top three sites is rather interesting. First of all, these are the only results in which my name actually appears in the title; the fourth link and beyond have my name in the document, but not the title. Second, my website appears at the top not because it's the definitive website about "Seth &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_3" onclick="BLOG_clickHandler(this)"&gt;Maislin&lt;/span&gt;," but because Google knows of 24 people linking to it. In comparison, the only person who ever created a link to this blog is me -- a number far less than 24! The same goes for the &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_4" onclick="BLOG_clickHandler(this)"&gt;O'Reilly&lt;/span&gt; interview, except that the single linker isn't even a valid site any more: it's broken. 
The popularity of my home page (in comparison to this blog, for example) is why it's a better hit for my name. But if you folks out there started to actually link to this blog, that would change.&lt;/p&gt;&lt;p&gt;You should look into the search results for the word "Jew." A website known as &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_5" onclick="BLOG_clickHandler(this)"&gt;JewWatch&lt;/span&gt;.com, an offensive and inflammatory collection of antisemitic content, had appeared as the number-one result at Google.com for this one-word query. This happened because a large number of supporters of this site tended to build links to it; then, those who were outraged or amused also linked to it within their protestations. In the end, the social algorithms at Google recognized how popular (i.e., "linked to") this site was, and in response rated it very highly -- in fact, rated it first -- compared to all other websites with the word "Jew" in the title. Eventually, those who were enraged by this content fought back by asking as many people as possible to link somewhere else -- specifically, the &lt;a href="http://en.wikipedia.org/wiki/Jew"&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_6" onclick="BLOG_clickHandler(this)"&gt;Wikipedia&lt;/span&gt; definition of Jew&lt;/a&gt; -- just as I have here. Over time, more people linked to &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_7" onclick="BLOG_clickHandler(this)"&gt;Wikipedia&lt;/span&gt; than to &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_8" onclick="BLOG_clickHandler(this)"&gt;JewWatch&lt;/span&gt;, and so the latter dropped into second place at Google. This process of building networks of links in order to influence &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_9" onclick="BLOG_clickHandler(this)"&gt;Google's&lt;/span&gt; social algorithm is called "Google bombing." 
In other words, when the people who hated the site acted together in a large group, &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_10" onclick="BLOG_clickHandler(this)"&gt;Google's&lt;/span&gt; social algorithms responded.&lt;/p&gt;&lt;p&gt;(By the way, you'll notice that I do not create a link to the offensive site. I see no reason to contribute to its success.)&lt;/p&gt;&lt;p&gt;Do you see the problem? The success of Google bombing is analogous to the squeaky-wheel proverb: the loudest complainer gets the best service. Social algorithms reward the most popular, regardless of whether they deserve it. &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_11" onclick="BLOG_clickHandler(this)"&gt;JewWatch&lt;/span&gt; made it to the top because it was popular first; &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_12" onclick="BLOG_clickHandler(this)"&gt;Wikipedia's&lt;/span&gt; definition moved to the top because those offended banded together to demonstrate even more loudly. And in the end, there's no reason for me to think either of these links is best.&lt;/p&gt;&lt;p&gt;Whether popularity is a good thing or a bad thing is often subjective. In language, some people lament the existence of the word &lt;em&gt;ain't,&lt;/em&gt; while others applaud its existence as an inevitable sign of change; either way, the word is showing up in our dictionaries because more and more people are using it. But I'm not talking about language; I'm talking about truth. &lt;/p&gt;&lt;p&gt;Do you think vitamin C is good at preventing colds? Well, it isn't; there have been no studies demonstrating its effectiveness, but there have been studies that show it makes no real difference. (It's believed that vitamin C will shorten the length of a cold, but studies are still inconclusive.) 
But after a doctor popularized the idea of vitamin &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_13" onclick="BLOG_clickHandler(this)"&gt;megadosing&lt;/span&gt;, our entire culture suddenly believes taking the vitamin will keep you extra healthy. Untrue.&lt;/p&gt;&lt;p&gt;Do you know why "ham and eggs" is considered a typical American breakfast? Because an advertising executive in the pork industry used Freudian psychology to convince people to eat ham for breakfast. He did it by asking American doctors if they thought hearty breakfasts were a good thing (which they did); the ad-man then asked if ham were a hearty food. Voila: ham, sausage, and bacon are American breakfast staples, and the continental breakfast vanished from our culture.&lt;/p&gt;&lt;p&gt;In both of these examples, majority belief trumps the truth. And look at the arguments about global warming! I won't repeat the arguments laid out by Al Gore in &lt;em&gt;&lt;a href="http://www.climatecrisis.net/"&gt;An Inconvenient Truth&lt;/a&gt;,&lt;/em&gt; but his argument is that as long as enough people insist that global warming isn't true, its dangers will remain unheeded. In fact, I'm not even going to argue here whether global warming is a real thing or not; it doesn't matter what I believe. What matters is that the debate over global warming isn't a fight over the facts. Instead, it's a shouting match, in which the majority wins. Right now, so many influential people have argued that it doesn't exist (or isn't such a big deal) that very little has been done in this country in response to its possible existence. But as more and more people start to believe it's at least possible, it's becoming a reality. Doesn't that just drive you nuts? Why are the facts behind global warming driven by democracy? 
&lt;em&gt;Can't something be true even if no one believes in it?&lt;/em&gt;&lt;/p&gt;&lt;p&gt;One last look at this "majority rules" concept, only this time let's avoid politics and focus on simple word spelling. If you search for the word &lt;em&gt;millennium,&lt;/em&gt; correctly spelled with two &lt;em&gt;L&lt;/em&gt;s and two &lt;em&gt;N&lt;/em&gt;s, you'll get about 54 million hits at Google (English-language pages only). If you search for the word &lt;em&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_14" onclick="BLOG_clickHandler(this)"&gt;millenium&lt;/span&gt;,&lt;/em&gt; misspelled with two &lt;em&gt;L&lt;/em&gt;s and only one &lt;em&gt;N,&lt;/em&gt; you'll get 18 million hits. That's 18 million out of roughly 72 million hits: twenty-five percent of the pages that use this word have it misspelled! For content that's published, that by its very nature is biased toward having only correct spellings, this error rate is monstrous! But does Google let you know that &lt;em&gt;&lt;span class="blsp-spelling-error" id="SPELLING_ERROR_15" onclick="BLOG_clickHandler(this)"&gt;millenium&lt;/span&gt; &lt;/em&gt;is misspelled? Does it ask you if you "meant to type &lt;em&gt;millennium&lt;/em&gt;?" No! 
After all, Google considers the misspelled word correct.&lt;/p&gt;&lt;p&gt;I mean, eighteen million people can't be wrong, right?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-2700829619447235045?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/2700829619447235045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=2700829619447235045&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/2700829619447235045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/2700829619447235045'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/eighteen-million-people-cant-be-wrong_28.html' title='Eighteen million people can&apos;t be wrong'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-5936954812765437386</id><published>2006-12-24T15:33:00.000-05:00</published><updated>2008-01-02T20:11:15.962-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>I met a famous indexer the other day</title><content type='html'>In my &lt;a href="http://maislin.blogspot.com/2006/03/frustrated-by-lack-of-meaning.html"&gt;March 20 post ("Frustrated by a lack of meaning")&lt;/a&gt;, I made reference to a Microsoft clip art mess 
that was quite public. The story is that the keyword "monkey bars" caused certain images to appear when someone searched for the word "monkey," and that these results were misinterpreted in a strongly negative way.&lt;br /&gt;&lt;br /&gt;Well, I met the indexer who actually wrote those keywords -- someone I've known for a long time -- and I have to say, there's something really cool about realizing that one of your good colleagues was behind that story. I also find it reassuring that the indexer is someone who really knows what she's doing, because it emphasizes just how far apart good indexing is from good search: smart people, dumb tools.&lt;br /&gt;&lt;br /&gt;For more on this subject, I recommend reading &lt;a href="http://www.amazon.com/Inmates-Are-Running-Asylum-Products/dp/0672326140/sr=8-1/qid=1166420052/ref=pd_bbs_sr_1/102-4093742-0908149?ie=UTF8&amp;amp;s=books"&gt;The Inmates Are Running the Asylum&lt;/a&gt;. The book is about computer programming in general, but the sentiment is dead on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-5936954812765437386?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/5936954812765437386/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=5936954812765437386&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5936954812765437386'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/5936954812765437386'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/i-met-famous-indexer-other-day.html' title='I met a famous indexer the other 
day'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-7116072763820819135</id><published>2006-12-24T13:58:00.000-05:00</published><updated>2006-12-24T14:04:23.914-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='spamming and similar behaviors'/><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='web indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>A needle in a haystack with 100,000,000 blades</title><content type='html'>The Internet has more than 100 million websites, according to the &lt;a href="http://news.netcraft.com/archives/2006/11/01/november_2006_web_server_survey.html"&gt;November Netcraft survey&lt;/a&gt;. If you were standing on top of the growth curve, by now your stomach would have nothing left to vomit up.&lt;br /&gt;&lt;br /&gt;I did some math, and I've figured out a way to make sure that all of these websites are indexed. Here's what I discovered.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Between October and November 2006, approximately 3.5 million sites were created. Assuming that my team would be responsible for inventing a set of keywords for the whole site -- and not for individual pages or parts of pages -- we would have to build 3.5 million keyword sets.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Let's further assume that on average, every website would have four keywords or key phrases. 
For example, this blog would get the keywords "Seth Maislin," "indexing," "blog," and perhaps my company name, "Focus Information Services." Ideally we'd have the time to invent many more, since it's our goal to help the website perform well at the various search engines, but this team simply can't give everyone special attention. So I'm making the executive decision to limit ourselves to creating four terms each for 3.5 million sites.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Assuming that we can invent and type one keyword every two seconds -- a conservative estimate, given that my company name takes me a minimum of two seconds to type -- we'll need 28 million seconds to get the job done.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Now remember, we're just talking about the new sites created in October 2006. Consequently, we have only a month to get the job done before we have to start indexing the November 2006 sites. For this reason, I'm going to build a team of several people, with each one putting in eight hours per day, twenty days each month. That's 576,000 seconds per person per month.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Dividing 28,000,000 seconds per month by 576,000 seconds per person per month gives me 48.6 people, which I'll round to a nice 50 people. That means I need a team of just 50 people to get the job done.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;So there you go: a team of 50 people can index the Internet. That doesn't sound nearly as bad as I thought. Of course, everyone will have to type rather quickly, and we'll need a system in place to prevent us from accidentally indexing any one website more than once, but that shouldn't be too bad. 
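&lt;br /&gt;&lt;br /&gt;If you'd like to check my arithmetic, here's a rough sketch in Python. The function and variable names are my own invention for illustration, and the inputs are just the assumptions from the list above: 3.5 million new sites a month, four keywords per site, two seconds per keyword, and indexers working eight hours a day, twenty days a month.&lt;br /&gt;&lt;br /&gt;

```python
import math

# Assumptions taken from the list above; all names here are illustrative.
NEW_SITES_PER_MONTH = 3_500_000
KEYWORDS_PER_SITE = 4
SECONDS_PER_KEYWORD = 2

HOURS_PER_DAY = 8
WORKDAYS_PER_MONTH = 20
SECONDS_PER_HOUR = 3600


def staffing_estimate(sites, keywords_per_site, seconds_per_keyword):
    """Return (total work in seconds, one indexer's monthly capacity, indexers needed)."""
    total_seconds = sites * keywords_per_site * seconds_per_keyword
    capacity_seconds = HOURS_PER_DAY * WORKDAYS_PER_MONTH * SECONDS_PER_HOUR
    return total_seconds, capacity_seconds, total_seconds / capacity_seconds


total, capacity, people = staffing_estimate(
    NEW_SITES_PER_MONTH, KEYWORDS_PER_SITE, SECONDS_PER_KEYWORD
)

print(total)     # 28000000 seconds of keywording per month
print(capacity)  # 576000 seconds per indexer per month
print(f"{people:.1f} indexers, rounded up to {math.ceil(people)}")  # 48.6 indexers, rounded up to 49
```

&lt;br /&gt;&lt;br /&gt;Change any of the constants (say, eight keywords instead of four) and the headcount scales in direct proportion, which is why a nice round 50 leaves a little slack.&lt;br /&gt;&lt;br /&gt;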
And yes, I'm assuming that all of these websites are in English, but most of them are; I'll bring a few translators to work on the few remaining.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;At U.S.$50,000 per year per indexer, which is quite modest for a highly intense round-the-clock job like this, plus $100,000 for me as manager, I could probably put together a bid of about $2.6 million/year to get the job done. Given how many billions of dollars are spent or exchanged over the Internet today, that seems quite reasonable, too. Heck, I should triple the whole thing, since we'd have to re-index the old sites every once in a while. Maybe I should double it again, too, so we'd be allowed to use eight keywords instead of four.&lt;br /&gt;&lt;br /&gt;So let's see, that brings the total bill to $15.6 million. Gosh, that isn't bad at all, is it? I mean, we all agree that indexing the Internet is at least a fifteen-million-dollar-per-year business, right?&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Except it's not. &lt;/strong&gt;Indexing the Internet is a &lt;strong&gt;zero&lt;/strong&gt;-dollar-per-year business. No one is doing it. Just about no one seems to care about quality keywords. In fact, there are only two industries that exist around keyword creation. One of them is misnamed "search optimization," which is about spamming the heck out of the Web. Optimize, I think not: this is the &lt;em&gt;opposite &lt;/em&gt;of the intelligent product my team would build. The other business is the search business itself, companies springing up around those fancy algorithms that Google, Yahoo, Lycos, Ask Jeeves, and the rest use. The thing is, those algorithms are just word-matching machines. These engines are looking for keywords, but none of them is actually writing any. So you see, no one with indexing training is writing any keywords. 
The inexpensive market for human indexers is being completely overlooked.&lt;br /&gt;&lt;br /&gt;Guess it's not worth the fifteen million.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-7116072763820819135?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/7116072763820819135/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=7116072763820819135&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7116072763820819135'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/7116072763820819135'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/needle-in-haystack-with-100000000.html' title='A needle in a haystack with 100,000,000 blades'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-1101083982270892672</id><published>2006-12-18T15:31:00.000-05:00</published><updated>2006-12-18T15:35:11.841-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing tools'/><title type='text'>"Can I Delete All My ___ Entries in MS Word?"</title><content type='html'>(This is the newest entry in my &lt;a href="http://taxonomist.tripod.com/indexing/wordproblems.html"&gt;MS Word Indexing FAQ&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Every now and then, there's nothing you want to do more than globally delete a bunch of entries. 
The problem is how this is supposed to happen. For example, suppose you have a common main entry for "publicity," when you decide that you're better off with a cross reference like "publicity. &lt;em&gt;See&lt;/em&gt; marketing." In addition to creating this cross reference, you need to &lt;em&gt;remove&lt;/em&gt; all of your original &lt;em&gt;publicity&lt;/em&gt; entries. Although you can search for marker text, you can't search for whole markers. In other words, you can search for the word "publicity" when it's used within index markers (look for hidden text), but you can't search for a whole marker like {XE "publicity"} or {XE "publicity:methods for"}. For this reason, you can't simply search globally and delete.&lt;br /&gt;&lt;br /&gt;The easiest approach to deleting all &lt;em&gt;publicity&lt;/em&gt; entries is the manual approach: generate your index, then delete everything that starts with the word &lt;em&gt;publicity.&lt;/em&gt; Unfortunately, manual edits will be undone as soon as you generate the index again; you'll have to remember to make these manual changes every time you create a new version of the index. To help you remember, I recommend changing the format and/or language for the word &lt;em&gt;publicity&lt;/em&gt; to make sure it jumps out at you. Search for &lt;span style="font-family:courier new;"&gt;XE "publicity&lt;/span&gt;, the unique text for all publicity entries, and reformat it in boldface, all caps, and a shocking color like red. I also recommend that you replace the word &lt;em&gt;publicity&lt;/em&gt; with something that will sort at the very beginning of your index, such as &lt;em&gt;aaa DELETE ME.&lt;/em&gt; Now, when you generate your index, you'll see a red, boldface, all-caps reminder at the top of your index file. 
Hopefully this will be enough for you to remember to delete your entries.&lt;br /&gt;&lt;br /&gt;Another approach, and by far the one I prefer, is to replace the marker syntax with something that Word can't interpret. Instead of using the letters XE in your marker, use something like DELETE_ME. In other words, globally change &lt;span style="font-family:courier new;"&gt;XE "publicity&lt;/span&gt; to &lt;span style="font-family:courier new;"&gt;DELETE_ME "publicity&lt;/span&gt;. Since markers are hidden text, your DELETE_ME markers will remain hidden from publications; further, they'll fail to become index entries since Word won't interpret them as XE markers. The biggest advantage to this method is that it works globally, and you only have to make these changes once. Another advantage is that you aren't actually deleting the entry, just rewriting it; if for any reason you need to reconstruct entries, you can always change DELETE_ME back to XE. (This is a kludgy way of creating conditional text, but it might be just what you need.) The disadvantage is that you're not actually deleting anything, so the leftover markers can clutter your documentation.&lt;br /&gt;&lt;br /&gt;As a side note, whenever you remove an entry from your index, remember that you also have to rewrite or delete any cross references that target those now-removed entries. For example, if you replace your &lt;em&gt;publicity&lt;/em&gt; entries with "publicity. &lt;em&gt;See&lt;/em&gt; marketing," you'll need to rewrite or delete entries like "public relations. 
&lt;em&gt;See also&lt;/em&gt; publicity."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-1101083982270892672?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/1101083982270892672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=1101083982270892672&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/1101083982270892672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/1101083982270892672'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/can-i-delete-all-my-entries-in-ms-word.html' title='&quot;Can I Delete All My ___ Entries in MS Word?&quot;'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-4697293818207474156</id><published>2006-12-16T21:27:00.000-05:00</published><updated>2006-12-16T22:23:27.361-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='future of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='American Society of Indexing'/><title type='text'>ASI President's Letter (December 2006)</title><content type='html'>Below are the first few paragraphs of my letter as ASI president, published in &lt;em&gt;Key Words.&lt;/em&gt; The full letter appears in the December 2006 issue of the bulletin, which is available to ASI members at the &lt;a href="http://www.asindexing.org"&gt;ASI 
website&lt;/a&gt;, as well as &lt;a href="mailto:president@asindexing.org"&gt;by request&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;font-size:85%;"&gt;&lt;strong&gt;ASI: Prospective and Retrospective, a Presidential Perspective&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;At the end of this year, I’ll write a letter to “Seth of 2008.”&lt;br /&gt;&lt;br /&gt;For a number of years I’ve been sending snapshots of my life to future “selves,” capturing a year’s events, achievements, and desires onto a couple of pages. Even though I’m writing to myself, however, I’m trying to communicate with versions of me that don’t yet exist. Who will I be in 2008? Why will I want to know about today’s “me”? What about the Seth of 2014? So, after warming up my pen with details about family, house, job, art, and health, I inevitably get to the tough stuff: ambitions, anxieties, hopes, and disappointments. There’s an irony to the whole thing, knowing I’ll be reading the letter with perfect hindsight. It’s an incentive to improve every year.&lt;br /&gt;&lt;br /&gt;ASI’s strategic plan is just such a letter. With its many strategies and priorities, we’re informing our future society about some critical information. Our members have shared with us a vision in which indexing will be recognized and respected more; to reach this vision we’ll have to look critically at who we are, now and soon. With the hindsight we’ll have in 2008 (and 2010 and 2014), I don’t want us to feel nostalgic when we look back. I want us to feel successful. I want us to be glad that we live in better times.&lt;br /&gt;&lt;br /&gt;The conflict between the needs of the immediate and our goals for the future is real. To function as a society, we need people in charge of what’s happening right now, as well as people in charge of what’s happening in the future. 
Week to week, ASI manages a long stream of important details: chapter name changes, SIG formations, PR construction, training course materials, administrative shifts, the Philadelphia conference, membership drives, and so on. The board gets a few dozen reports from committees, fourteen chapters, SIGs, and task forces. This is the “ASI of 2006,” focused on bylaws and meetings and content development.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-4697293818207474156?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/4697293818207474156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=4697293818207474156&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/4697293818207474156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/4697293818207474156'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/asi-presidents-letter-december-2006.html' title='ASI President&apos;s Letter (December 2006)'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-3212611517647692189</id><published>2006-12-16T21:17:00.000-05:00</published><updated>2006-12-18T00:07:18.470-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='information architecture process'/><title type='text'>Seth, a *different* enabler</title><content 
type='html'>Most coincidentally, taxonomist Seth &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0" onclick="BLOG_clickHandler(this)"&gt;Earley&lt;/span&gt; wrote about the enabling process in his own blog, "Not Otherwise Categorized...." Of course, his entry is less comic, more professional, and thus more meaningful than mine (12 Dec 2006), so it would be a shame for me not to reference it.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sethearley.wordpress.com/2006/11/07/just-tell-me-the-answer-the-challenge-of-stakeholder-engagement/"&gt;"Just tell me the answer" (blog entry, 7 Nov 2006)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There are more enabling &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_1" onclick="BLOG_clickHandler(this)"&gt;Seths&lt;/span&gt; out there than you might have noticed at first.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-3212611517647692189?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/3212611517647692189/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=3212611517647692189&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/3212611517647692189'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/3212611517647692189'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/seth-different-enabler.html' title='Seth, a *different* enabler'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-116598657752435047</id><published>2006-12-12T23:56:00.000-05:00</published><updated>2006-12-18T00:07:45.994-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='information architecture process'/><title type='text'>Seth, the enabler</title><content type='html'>I rediscovered the two roles that a consultant can play in business. He can step forward, propose and perhaps implement a solution all by his lonesome, and then walk away; or he can listen quietly to what everyone has to say, push and prod in strategic ways, and get everyone else to do the work for him.&lt;br /&gt;&lt;br /&gt;Here's an analogy. Suppose a friend who needs a resume approaches you for help. "Would you write me a resume?" she asks. One approach is to say "yes," ask a couple of questions, and then crank out a complete resume. Handing it to her you say, "Go ahead and make some edits, if you want." There are some wonderful advantages to this process: you get to work on your own, on your own terms, and for a very short period of time. On the other hand, what you're really supposed to do is sit down with your friend and say, "Well, what have you got so far?" Then you ask all sorts of clever questions like, "What do you think you do best?" and "What kind of job do you think you want?" She answers these questions, and as you nod wisely, you tell her to write all that stuff down.&lt;br /&gt;&lt;br /&gt;The greatest part about being an enabler is that you never have to make a decision at all. You're a Freudian psychologist asking all sorts of provocative questions, getting paid by the hour to watch someone else do all the work. 
The better they do, the better you look.&lt;br /&gt;&lt;br /&gt;I've discovered that being an enabler is the smartest, most lucrative, and most effective way to be a consultant -- but the fact that I never have to make a decision is very interesting. "What do you think? How would you do this? Do you think this would work? Before tomorrow, see if Joe agrees." I'm amazed at the power these kinds of questions have.&lt;br /&gt;&lt;br /&gt;Ask yourself how much enabling you do in your job. I'm starting to realize that helping people do things on their own is more rewarding than doing it myself. Frankly I'm unnerved by this. This wasn't at all what I learned in engineering school.&lt;br /&gt;&lt;br /&gt;But everything I've read says that this is now the right way to do this. Decisions made by people who don't actually use the system are less likely to succeed. Evidence-based practice is about moving forward not on what you think, but on what you know, such as from testing. So yes, it's about asking the right questions, and not about what you know. In fact, psychologists who &lt;em&gt;do &lt;/em&gt;know the answer have to play dumb if they're to succeed.&lt;br /&gt;&lt;br /&gt;If you had told me years ago that the subject matter experts are far less valuable than the subject matter dunces, I'd have said you were full of... what's the word? 
(I trust your opinion.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-116598657752435047?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/116598657752435047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=116598657752435047&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/116598657752435047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/116598657752435047'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/seth-enabler.html' title='Seth, the enabler'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-116598577231448937</id><published>2006-12-12T23:41:00.000-05:00</published><updated>2006-12-18T00:09:15.000-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='cataloguing'/><category scheme='http://www.blogger.com/atom/ns#' term='embedded indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>Indexing moving content</title><content type='html'>Has it been three months? Almost!&lt;br /&gt;&lt;br /&gt;Fact is, the world has a way of throwing curve balls on a regular basis. 
For me, those curve balls include a family-wide influenza epidemic, teething babies, travel plans, and the like. Trying to keep a grip on life is like trying to catch fish with your hands.&lt;br /&gt;&lt;br /&gt;Tonight I give a presentation about trying to index moving targets. I was surprised to discover that of all the presentations I've ever given, this was absolutely the hardest to write. In fact, I just finished a few minutes ago. I've taught three-day classes, with eight hours of material on each day, but this 45-minute presentation really stymied me. There are two reasons for this.&lt;br /&gt;&lt;br /&gt;First, trying to index moving content is, no matter what, a mess. The simplest example of a problem is creating an index entry like "software development, 111-121," and then finding out that pages 111 and 121 have moved respectively to pages 113 and 123. With standalone indexing (where you type in the page numbers), the only real way to fix this is manually: go back and rewrite all your page numbers. It's a MESS. So here I am, hoping to provide some tips to indexers and technical writers, something to help them avoid these kinds of corrections -- only to realize that there's no good answer. (A bad answer is to not index at all. :-)&lt;br /&gt;&lt;br /&gt;The second problem is that even if I did have a list of useful tools, they don't make for interesting presentation materials. The first draft of my presentation would have resembled a public reading of the weather report for every American city, in alphabetical order: if you're lucky, you're interested in Albuquerque and Atlanta and can walk out early.&lt;br /&gt;&lt;br /&gt;The fact is, our growing reliance on live and custom information is wreaking havoc on the indexing world. It's becoming harder and harder to collate information in relevant chunks. 
Search will never do it; even if there were human beings out there developing controlled vocabularies, full-text search still retrieves a tremendous amount of flotsam. But creating keywords for something that won't live an hour seems kind of pointless, too. We're all just pounding sand.&lt;br /&gt;&lt;br /&gt;I'm looking forward to what the participants have to say. Must we accept the false imprisonment of uncatalogued real-time information flow, or will writers finally catch on that indexers have an important role on the creation side as well?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-116598577231448937?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/116598577231448937/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=116598577231448937&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/116598577231448937'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/116598577231448937'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/12/indexing-moving-content.html' title='Indexing moving content'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115773555289482757</id><published>2006-09-08T12:06:00.000-05:00</published><updated>2006-12-18T00:13:01.271-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><category scheme='http://www.blogger.com/atom/ns#' term='findability'/><category scheme='http://www.blogger.com/atom/ns#' term='privacy'/><title type='text'>Unfindable (a virtue)</title><content type='html'>Indexers work to make things findable, but there's another side to this coin. Indexers also work to make things &lt;b&gt;un&lt;/b&gt;findable.&lt;br /&gt;&lt;br /&gt;An important and often overlooked consequence of the culling process that indexers hone when deciding what should be indexed or labeled, and how, is that every decision an indexer does &lt;em&gt;not&lt;/em&gt; make makes something that much more unfindable. In fact, just as there are an infinite number of misspellings for any one word, there are an infinite number of indexing choices that an indexer can make. But unlike misspelled words, which by definition are "mistakes," every indexing choice and every keyword is a good keyword in the right context. The information space is too big for mistakes; for each conscious and unconscious inaction, the best we can hope for is "highly unlikely." If we don't do it, perhaps no one will need it.&lt;br /&gt;&lt;br /&gt;There's a sign in a general store in &lt;a href="http://www.lakegeorge.com/"&gt;Lake George, New York&lt;/a&gt;: "If you don't see it, you don't need it." This is the inadvertent motto of all indexers. We can only pray that every concept we leave unindexed, every word we don't choose, and every relationship we don't articulate is unneeded. Then again, there are an infinity of choices we never even see, aren't there?&lt;br /&gt;&lt;br /&gt;Unfindability is a pandemic, a glorious desert that stretches beyond our senses and imaginings. 
In today's world of &lt;a href="http://en.wikipedia.org/wiki/RFID"&gt;RFID technology&lt;/a&gt;, in which every object and person (and object-person combination) can potentially be mapped and tracked over an arbitrary length of time, the vast wasteland of unfindability starts to rank up there with a good vacation.&lt;br /&gt;&lt;br /&gt;Let's turn the indexing process around. As indexers, what do we want to make lost?&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;u&gt;Systems of belief that destructively conflict with our own.&lt;/u&gt; Even if you believe in an open exchange of ideas, perhaps the close-minded ideas and people could disappear to make your world a better place. Some indexers face this challenge regularly, being asked to index materials that they disagree with on political or religious grounds, or to use indexing methods that conflict with their professional ethics.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;u&gt;Falsehoods.&lt;/u&gt; Wisdom is knowing what you don't know, but what if you believe something that is completely untrue? Certainly there are layers of truth, and ignorance isn't inherently bad (e.g., it drives scientific research and discovery), but what of the urban legends and purposeful deceit that litter our information space? If they can't be labeled as more than 90% false, shouldn't they just disappear?&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;u&gt;Hurtful knowledge from which there is no clear benefit.&lt;/u&gt; Constructive truths hurt but spur growth; destructive truths are better left unsaid. We've all had the experience in which someone commented on our selves, perhaps even politely and with good intention; I know a woman who was told she could never excel at tennis without surgery, because her body shape interfered with a good backhand. I'm sure the instructor was trying to be helpful, but she's never enjoyed tennis since. 
If these comments can't be left unsaid, unfound is the next best option.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;u&gt;Private information, and the deep past.&lt;/u&gt; Is my street address on the World Wide Web? What about my childhood photos? Medical and financial records? Candid post-mortem comments about my behavior in past relationships? Or should I accept that my life is an open and Google-accessible book? With millions of blog pages being written by today's schoolchildren, what will happen when someday they run for office and the world (re)discovers their underage exploits? There is no expiration date on personal content, something we sometimes regret.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And finally,&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;u&gt;Stuff no one needs.&lt;/u&gt; How do indexers judge what nobody needs now, or in the future, under any circumstances? And yet we do it every day at work. We tell ourselves that we can take out our biases, ignoring our beliefs and lives and emotions and ethics and needs, just as so many others become detached emotionally when performing their jobs. But unfindability is inevitable; sooner or later (and usually sooner), our choices will have a direct effect on the abilities of others to find something they want, from historical knowledge to insider secrets, from biographical summaries to photographed nudity. I say that it is our &lt;em&gt;responsibility&lt;/em&gt; to make these decisions, to apply our beliefs and biases as well as our knowledge, to unmap the information space we want left behind.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;As indexers, we shape the worlds that no one sees. 
Now let's do it on purpose.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115773555289482757?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115773555289482757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115773555289482757&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115773555289482757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115773555289482757'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/09/unfindable-virtue.html' title='Unfindable (a virtue)'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115751179608047285</id><published>2006-09-05T22:00:00.000-05:00</published><updated>2006-12-18T00:09:41.089-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft Word indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='indexing tools'/><title type='text'>Troubleshooting Microsoft Word indexes</title><content type='html'>I've been archiving various questions people have asked me about Microsoft Word indexes. It's amazing how many things can go wrong, and then can't be easily solved. In response, I've started building a page for troubleshooting Word indexes. 
You can access that page here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://taxonomist.tripod.com/indexing/wordproblems.html"&gt;http://taxonomist.tripod.com/indexing/wordproblems.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you're having a problem with Word, first know that you're in good company, and then &lt;a href="mailto:seth@maislin.com"&gt;write me&lt;/a&gt; about it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115751179608047285?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115751179608047285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115751179608047285&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115751179608047285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115751179608047285'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/09/troubleshooting-microsoft-word-indexes.html' title='Troubleshooting Microsoft Word indexes'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115721984867126438</id><published>2006-09-02T12:34:00.000-05:00</published><updated>2006-12-18T00:10:02.693-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing humor'/><category scheme='http://www.blogger.com/atom/ns#' term='cross references'/><title type='text'>The salesman's cross 
reference</title><content type='html'>My wife and I burst out laughing.&lt;br /&gt;&lt;br /&gt;We have a short book called &lt;em&gt;Baby Animals&lt;/em&gt;. (Apparently there are a &lt;a href="http://a9.com/baby%20animals?a=sbooks" target="_blank"&gt;lot of books called Baby Animals&lt;/a&gt;.) It's filled with page-sized photographs of very cute creatures, like these pictures of &lt;a href="http://www.sfondideldesktop.com/Images-Nature/Animals/Baby-Animal/Baby-Animal-0001/Baby-Animal-0001.jpg" target="_blank"&gt;chicks&lt;/a&gt; or &lt;a href="http://www.wildlife-sanctuary.co.uk/baby_rabbits.jpg"&gt;bunnies&lt;/a&gt; or &lt;a href="http://www.joanocean.com/DR05-2/huggin_baby.jpg"&gt;whales&lt;/a&gt;. In fact, there's a cute &lt;a href="http://www.seymoursimon.com/images/Baby_animals_lg.jpg"&gt;baby lion on the cover of our book&lt;/a&gt;. I'm telling you, these baby animals are CUTE!&lt;br /&gt;&lt;br /&gt;After reading cool facts about these babies -- dogs and cats are born blind, chicks eat small stones because they have no teeth, whales drink 100 gallons of milk daily -- we get to the inside back cover, where the publisher writes the following:&lt;br /&gt;&lt;br /&gt;&lt;blockquote align="center"&gt;If you liked learning about &lt;strong&gt;Baby Animals&lt;/strong&gt;, you will also enjoy&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Fighting Fires&lt;br /&gt;Giant Machines&lt;br /&gt;Killer Whales&lt;br /&gt;Planets Around the Sun&lt;br /&gt;Wild Bears&lt;/strong&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;What am I missing?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115721984867126438?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115721984867126438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115721984867126438&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115721984867126438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115721984867126438'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/09/salesmans-cross-reference.html' title='The salesman&apos;s cross reference'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115673505841384193</id><published>2006-08-27T20:38:00.000-05:00</published><updated>2006-12-18T00:23:49.611-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='content management'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='power of information'/><title type='text'>Information is owned by the few</title><content type='html'>Consider the history of manufactured items. At one time in United States history, the manufacturer in your neighborhood was the primary source of whatever it made. If &lt;a href="http://www.maytag.com" target="_blank"&gt;Maytag&lt;/a&gt; had a plant in your city, you bought Maytag. There was almost no question of buying a competitor product in a distant city, with reasons ranging from the practical (delivery requirements) to the social (your family was employed there) to the unconscious (you heard about this company every day in the news). 
Manufacturers were king: they made, you bought (assuming you could afford it), and no questions were asked.&lt;br /&gt;&lt;br /&gt;Then came the middlemen, resellers like &lt;a href="http://www.sears.com" target="_blank"&gt;Sears&lt;/a&gt;, who discovered that if you brought a number of competing products into the same show room, customers came to that show room to make an educated decision. No longer convinced to buy from one manufacturer, you could shop among several models. This was how resellers made their money: providing you a service you'd pay extra money for. Products and manufacturers that failed to compete well in side-by-side arrangements were abolished in the face of consumer choice.&lt;br /&gt;&lt;br /&gt;And finally came the Internet. The World Wide Web provided you with not only all the same information the resellers had, but much more: professional and amateur reviews, community-level and industry-specific emails filled with recommendations and warnings, and manufacturers' contact information in case you had questions. Now you could shop intelligently around the world. Much of the resale industry was demolished, now that its services paled in comparison to what consumers could do themselves. Look at the fate of independent bookstores, which all but vanished in a wired world where consumers read reviews and compare prices among &lt;a href="http://www.asindexing.org/amazon" target="_blank"&gt;Amazon.com&lt;/a&gt;, &lt;a href="http://www.bn.com" target="_blank"&gt;BN.com&lt;/a&gt;, and &lt;a href="http://www.borders.com" target="_blank"&gt;Borders.com&lt;/a&gt;, only to buy the book from an Internet-based reseller with massively discounted prices. 
Travel agents, too, disappeared in the face of &lt;a href="http://www.expedia.com" target="_blank"&gt;Expedia.com&lt;/a&gt; and &lt;a href="http://www.travelocity.com" target="_blank"&gt;Travelocity.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It is thus believed that the Internet has empowered the individual.&lt;br /&gt;&lt;br /&gt;Not true. I'm sorry to say that it's all an illusion.&lt;br /&gt;&lt;br /&gt;First of all, the online resellers are no better than the brick-and-mortar resellers. After browsing the options available at an online travel agency, it's often cheaper to then go to the airline site itself to buy your tickets. For example, if I want to fly from Boston to San Francisco, I'll plug my dates into a search engine at Expedia (and later, Travelocity), find the cheapest option at the best times, and then buy my ticket at &lt;a href="http://www.delta.com" target="_blank"&gt;Delta.com&lt;/a&gt;, &lt;a href="http://www.iflyswa.com" target="_blank"&gt;Southwest.com&lt;/a&gt;, or another airline, or perhaps call a human travel agent after all at &lt;a href="http://www.aaa.com" target="_blank"&gt;AAA&lt;/a&gt; and start again. As long as I have access to the source of that service or product -- the manufacturer, the service provider, etc. -- the reseller is a source of information without sale.&lt;br /&gt;&lt;br /&gt;Second, the online resellers are limited in scope. Thanks to partnerships and other marketing choices, not all of my options are provided. For example, both Expedia and Travelocity tend to overlook small, unaffiliated airlines. Additionally, at one time (and perhaps still today) Expedia charged extra money if I wanted to buy a ticket for &lt;a href="http://www.usair.com" target="_blank"&gt;USAir&lt;/a&gt; flights, without telling me. 
The bottom line is that going through an online reseller is not necessarily more comprehensive or cheaper than my other options.&lt;br /&gt;&lt;br /&gt;Biggest of all, however, is that for me to perform ANY search these days, I'm going to have to use a search engine, like &lt;a href="http://www.google.com" target="_blank"&gt;Google&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Without even getting into problems with spam, search engines are responsible for providing me with the information I'll need to do &lt;em&gt;anything&lt;/em&gt; on the Web, if I don't already know precisely how and with whom to do it myself. Google is the next Sears. If I want to find some good choices for a boy's name, Google will provide me with so many choices that I'll inevitably stop after the first twenty (and more likely, stop after three). Google is filtering my search, valuing some choices above others just as my supermarket creates end-of-aisle displays to sell me things. The only difference is that I &lt;em&gt;know&lt;/em&gt; the supermarket makes money from the sale. With search engines, you have no way of guaranteeing you're not clicking on a link the search engine company prefers.&lt;br /&gt;&lt;br /&gt;Consider the &lt;a href="http://images.military.com/Data/MIB/Finance/Pics/car_salesman.jpg" target="_blank"&gt;unscrupulous used car salesman&lt;/a&gt;. Let's step through the process.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;I approach the salesman with a simple request: "I want a reliable automobile for a good price."&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;The salesman immediately points out a few models. The first one he shows me is way too expensive. The second one is terrible. In comparison, the third one he shows me seems wonderful at first glance, but then I ask more questions.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;The salesman doesn't give me precisely the information I want. Some of his answers sound ridiculous. 
He's reluctant to show me any more cars. But when I keep pushing, he finally gives in and shows me a fourth car, without much enthusiasm.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Finally, I ask for specific kinds of cars, things I've heard rumors about. "What about a &lt;a href="http://www.edmunds.com/media/seo/500/2006.toyota.sienna.jpg" target="_blank"&gt;Toyota Sienna&lt;/a&gt;? Is there a good &lt;a href="http://images.consumerguide.com/autoreview/400x266/2006-Ford-Freestar-06114541990001.JPG" target="_blank"&gt;Ford minivan&lt;/a&gt;?" The salesman is completely unhelpful. Clearly this was a terrible place to come shopping. Maybe I'll visit some dealers, or talk to my neighbor.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;Let's compare this to a Google search for boys' names. I choose Google here because it's currently a very popular search engine that, people seem to believe, does an honest job in helping people search both online and offline content.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;br /&gt;&lt;li&gt;I start with a simple request: "I want to find a good boy's name." My query is "boys names."&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Google gives me some immediate results. Some of them are immediately terrible and can be skipped over, but it doesn't take long to find something promising. I visit the website and, although it looks like what I want might be there, I have a hard time using it. I decide to give up and return to Google and its search result list.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;I try a second website, but I've lost confidence. Maybe it's not Google's fault in any obvious way, but none of these websites is helping me in the way I want to be helped.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;I decide to try some new queries. Maybe "boy names"? Do I need an apostrophe? 
Or perhaps, because I'm interested in a boy's name that isn't too ethnically different from the names I know in the United States, I should try a search like "American boy names." Unfortunately, my search choices are even worse. I give up. The Web is a terrible place to search for boy names. I'll try the bookstore.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;You see? No practical difference.&lt;br /&gt;&lt;br /&gt;You might think this exercise was a bit silly, but I'm not wrong. The people, companies, or machines that control what you want are the same entities that control the process. The car salesman controls which cars you buy; even if you trust him, the process is &lt;em&gt;his&lt;/em&gt;, not yours. He's just nice about it. The same is true with Google. Sure, we all tend to trust Google -- and what's not to trust or like -- but we do &lt;strong&gt;not&lt;/strong&gt; own the information-seeking process. Google owns it. Here's why:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Not only doesn't Google find everything, it doesn't tell you there are things missing.&lt;/strong&gt; The Google database isn't as up-to-date as the Web. Your search words don't match every relevant result in every relevant language. Sure, it looks as if there are 2,600,000 hits for your search, but that doesn't mean it found everything. What's more, you couldn't even see all 2,600,000 hits if you wanted to! Google shuts you out after only a few hundred.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;Google doesn't explain what it's doing, or why.&lt;/strong&gt; The search algorithm is never explained; it's a trade secret. We know what kinds of ingredients go into the mix, but we don't know the precise details. 
And although sponsored links appear separate from search results -- something not all search engines do -- we have no certainty that there aren't other sponsorships happening in there.&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;&lt;strong&gt;If Google is biased, we have no way of knowing.&lt;/strong&gt; I guarantee Google is biased, because its algorithm is based on how people use the Web. Google News collects stories more often from the &lt;a href="http://www.ap.org" target="_blank"&gt;AP Wire&lt;/a&gt; than the &lt;em&gt;&lt;a href="http://www.boston.com" target="_blank"&gt;Boston Globe&lt;/a&gt;&lt;/em&gt;, and more often from the &lt;em&gt;Globe&lt;/em&gt; than the &lt;em&gt;Arlington Tab&lt;/em&gt;. That's well-intentioned bias. There are less favorable biases, too, like social biases. Because there are fewer computer users who are poor or homeless, the websites of interest to these people never show up at the tops of lists. Because the Google default language in the United States is English, U.S.-based news articles are far favored over newspapers in other countries, even when the news takes place in those countries. And because most people have heard of large companies like Amazon.com, smaller companies like independent booksellers are pushed into obscurity. There are also language-based biases. It's easier to find websites related to &lt;em&gt;money&lt;/em&gt; because this word is both singular and plural, whereas &lt;em&gt;finance&lt;/em&gt; has a plural form. It's easier to search for words like &lt;em&gt;mistress&lt;/em&gt; and &lt;em&gt;misogyny&lt;/em&gt;, which exist, than for the nonexistent gender-opposite versions. And it's nearly impossible to find a company that sells windows because your search results will be overwhelmed by companies that sell [Microsoft] Windows.&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;But we don't have a choice. There is too much information in the world. 
We must go through an information repackager if we're not going to do the work ourselves. (Librarians do the work themselves; the results are of excellent quality, of limited quantity, and of almost negligible relevance for our day-to-day needs of airline tickets and boys' names. Libraries have some excellent information with which we can arm ourselves -- like using &lt;em&gt;&lt;a href="http://www.consumerreports.org" target="_blank"&gt;Consumer Reports&lt;/a&gt;&lt;/em&gt; to choose a quality used car -- but in general we still have to take the final steps on our own.)&lt;br /&gt;&lt;br /&gt;Regardless of their motives, search engines OWN the information access. Maybe that's good enough. Maybe you're comfortable performing your searches in ignorance of the engine's inner workings, generally satisfied with the results most of the time. But &lt;em&gt;please&lt;/em&gt;, that doesn't make it a good thing. What if Google started charging you for some of your searches? What if Google integrated its sponsored links into the search engine (as &lt;a href="http://www.lycos.com" target="_blank"&gt;other engines&lt;/a&gt; did or do)?&lt;br /&gt;&lt;br /&gt;Here's a real-life, immediate example. Search for &lt;em&gt;Pluto&lt;/em&gt;. There has been a ton of recent press regarding Pluto's demotion as a planet in our solar system. &lt;em&gt;Where is all that news in the search results page?&lt;/em&gt; There's just a tiny news area that most people won't see because it looks different, and then there's a bunch of sponsored links. This is a branding decision; Google thinks "&lt;a href="http://news.google.com" target="_blank"&gt;news&lt;/a&gt;" and "sites" are very different things and doesn't even combine their results.&lt;br /&gt;&lt;br /&gt;Don't kid yourself. 
The power of the Internet has moved, but not to you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115673505841384193?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115673505841384193/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115673505841384193&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115673505841384193'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115673505841384193'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/08/information-is-owned-by-few.html' title='Information is owned by the few'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115455170348654721</id><published>2006-08-02T15:44:00.000-05:00</published><updated>2006-12-18T00:11:52.545-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='hierarchical organization'/><category scheme='http://www.blogger.com/atom/ns#' term='training in indexing'/><title type='text'>We're lost without an information education</title><content type='html'>A few years ago, one chapter of the &lt;a href="http://www.asindexing.org"&gt;American Society of Indexers&lt;/a&gt; created a bumper sticker: "If you don't teach your children about indexing, who will?" 
Now that my daughter is old enough to repeat words if I ask her to -- "Say &lt;i&gt;cucumber&lt;/i&gt;." "&lt;i&gt;Kuku&lt;/i&gt;." -- I tried one of my job words on her.&lt;br /&gt;&lt;br /&gt;"Say &lt;i&gt;index&lt;/i&gt;," I said.&lt;br /&gt;"&lt;i&gt;Icks&lt;/i&gt;," she replied.&lt;br /&gt;&lt;br /&gt;Given how my colleague Rachel taught her three-year-old to recite how bad a book is if it doesn't have an index, it seems I have some work to do. My daughter shouldn't respond to indexing with &lt;i&gt;icks&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;In the United States, children learn about indexes when they are old enough to visit the school library and get instruction on how to use its resources. And while many of the printed card catalogs of my youth have been replaced with computer systems, students are still taught how to use the indexes in the backs of some books. After that, their indexing education is complete. They probably never talk about indexing with the librarian again.&lt;br /&gt;&lt;br /&gt;Though brief, even this index education is extremely important. Instinctively, children unfamiliar with indexes will look up information just as adults use a dictionary to look up spellings. For example, if you think &lt;i&gt;deceive&lt;/i&gt; is spelled &lt;i&gt;decieve&lt;/i&gt;, you'll go to the dictionary to look up &lt;i&gt;decieve&lt;/i&gt;. Not finding it, you'll look for a neighboring word that looks somewhat similar, and discover the correct "deceive." In other words, you'll enter the dictionary looking for one word, but be satisfied with another. This is how children use indexes, too. They'll look up "Civil War," not find it, and be satisfied with "civil engineering." Then, of course, they'll fail.&lt;br /&gt;&lt;br /&gt;(It is worth noting that adults demonstrate this behavior with indexes, too. 
I might attempt to look up "potatoes" in a cookbook, yet be satisfied with a result of "&lt;a href="http://www.foodreference.com/html/art-sweet-potato-yam.html" target="_blank"&gt;potatoes and yams&lt;/a&gt;.")&lt;br /&gt;&lt;br /&gt;Meanwhile, &lt;em&gt;adults&lt;/em&gt; don't instinctively understand the metaphor of things inside things inside things. The well-known &lt;a href="http://www.shene.richmond.sch.uk/Images/Russian%20Taster%20037.jpg" target="_blank"&gt;matryoshka dolls&lt;/a&gt;, in which a large bowling-pin-shaped doll holds a smaller doll that holds another doll, and so on, are endlessly fascinating for children. As adults, we're fascinated by the plots to suspense novels. Each step along our way -- an uncovered doll, a turned page -- is built upon the past in a linear way. We follow events, from first to last, in linear sequence, and we succeed.&lt;br /&gt;&lt;br /&gt;Hierarchical organization, in contrast, has no obvious place in human existence. To survive, it's enough to separate things into only two groups at a time: dangerous vs. safe, edible vs. inedible, alive vs. dead, something we like vs. something we don't like, family vs. nonfamily. As intelligent creatures we might create a few more categories at a time -- family, co-workers, non-work friends, acquaintances, strangers -- but rarely do we construct them into layers like "people I know &gt; people I like &gt; people I like to work with." Layering is completely unnecessary in our daily lives. Perhaps it is for this reason that human beings cannot instinctively organize things in a hierarchical way -- in the same way we can't tell the (very big) spatial difference between one million miles and one billion miles. To do these things, we need training.&lt;br /&gt;&lt;br /&gt;You know, we don't do math naturally, either. Our instincts tell us the difference between one item, two items, a few items, many items, and very many items, but that's it. We also understand more and fewer. 
But we don't have an instinct that tells us how to add or multiply, let alone solve calculus problems. (If you don't believe me, then I dare you to cut a pizza or a cake into five equal slices without making a mistake.)&lt;br /&gt;&lt;br /&gt;Today, we have math classes. Before math was taught as its own course, certain elements of math were taught within specific subjects. Shipbuilders and shoemakers learned enough math to do their jobs, and that was it. The idea of teaching math independent of application must have seemed very strange. What good is shipbuilders' math to shoemakers? But eventually, the math-proficient individuals in each field spoke to one another and discovered exactly what they had in common: a need to add numbers together, a need to calculate weight, and a need for geometry. Now math is an integral part of standardized testing, which means students aren't allowed to graduate from school without proving themselves in basic math skills, separate from their application.&lt;br /&gt;&lt;br /&gt;So why aren't we teaching information the way we teach math? Information classification exists in every field of human exploration, from literature (divisions of author style or message) to sales (styles of negotiation), and from biology (life classifications) to auto mechanics (systems of function). If a student is going to learn anything about anything, he should learn a little something about how information itself fits together.&lt;br /&gt;&lt;br /&gt;The impact a basic, application-independent information education can have is astounding. As an example, consider driving directions. In general, we give directions to people in a linear order, something that makes sense given how we travel. Here is how you can get to the post office near my home: "(1) Take route 95 until exit 26. (2) Take route 2 East until exit 59. (3) Take route 60 into Arlington Centre. (4) Turn left onto Massachusetts Avenue. (5) After three blocks, turn right onto Court Street. 
(6) The post office is on your left at the end of the street." As I said before, you don't need information hierarchy to survive; following these linear directions is quite easy. But suppose you make a wrong turn, or miss your exit? To find your way back to the path I provided, you need to know something about the geographic layers that make up these regions: "greater Boston &gt; north Boston suburbs &gt; town of Arlington &gt; Arlington Centre area &gt; Court Street." You need a hierarchical knowledge of the area! Put another way, what many of us refer to as "a great sense of direction" is actually "a deep understanding of relevant geographical hierarchies." That's why someone who knows their way around New York City will get lost in the woods: they learned how NYC streets fit together (NYC &gt; Manhattan &gt; Upper East Side &gt; etc.) but learned nothing about forests. Get my point? Sense of direction is taught and learned.&lt;br /&gt;&lt;br /&gt;It's time for us to start teaching information construction in schools. 
We're lost without it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115455170348654721?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115455170348654721/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115455170348654721&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115455170348654721'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115455170348654721'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/08/were-lost-without-information.html' title='We&apos;re lost without an information education'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115257798851182298</id><published>2006-07-10T19:16:00.000-05:00</published><updated>2006-12-18T00:19:56.906-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><title type='text'>Meta-indexing</title><content type='html'>Many indexers in the business are frustrated that search engines are getting so much publicity and credit, when in fact indexes are, well, more effective!&lt;br /&gt;&lt;br /&gt;Usability testing has shown that people prefer indexes over search when it comes to 
accuracy and comprehensiveness, and yet the same tests show that people prefer search as a technology. In other words, people prefer search engines even though they unanimously agree that indexes are more accurate. This is not that different from the person who insists on lifting the heavy box himself, even though someone stronger and better able has volunteered. "No, no, I'll do it myself," says the searcher.&lt;br /&gt;&lt;br /&gt;And then he injures himself. Silly, silly person.&lt;br /&gt;&lt;br /&gt;What is at stake, apparently, is something more psychological or emotional. Search engines may offer users a sense of power and control, or a sense of speed, that indexes don't. Further, indexes seem so much more complicated when you glance at them -- words, words everywhere -- and in comparison search is so simple: an empty box. Just type a word and bingo! If you were to stop and look at this behavior you'd realize that there's something subconscious going on; rationality is losing to some deeper sense of emotionality and self. Search simply &lt;em&gt;feels &lt;/em&gt;right in a way that using an index does not, at least not instinctively.&lt;br /&gt;&lt;br /&gt;Some indexers take this news with a strong sense of pessimism, seeing this "shift toward the emotional" as paralleling our current lifestyle of sensationalist news and entertainment. They believe that indexes will become extinct in most practical circumstances, because search engines are psychologically preferred -- not to mention faster, cheaper, online, and scalable.&lt;br /&gt;&lt;br /&gt;These pessimists aren't wrong.&lt;br /&gt;&lt;br /&gt;However, I contend that the pessimists are also looking at the situation completely upside-down. 
Ask yourself what makes a search engine effective or likeable at all -- that is, what does Google have that seems to draw a majority of Web users not only to the Google.com website but also to license Google technology at their own sites -- and you'll realize that there's indexing on the back end. People don't call it "indexing," necessarily, but the intellectual, rational processes that comprise indexing are still taking place.&lt;br /&gt;&lt;br /&gt;The difference, however, is that a search company like Google doesn't really look at the individual words and their instances. Instead, the designers of Google search (and other tools) are looking at how people respond to these words. They are looking at behavioral patterns, and using those patterns to do the indexing for them.&lt;br /&gt;&lt;br /&gt;My brother and I used to play a game at ballparks. One of us, when it was his turn, would attempt to turn as many heads as possible without speaking. My brother would turn his head and look over his shoulder casually, then allow his eyes to lock on something imaginary but far behind all the people sitting behind us. He'd tap me on the shoulder and get me to look; I'd play along. He'd point. I'd point, and he'd correct me. Then he'd stand up. And so on. After a while, some of those people who could see us directly in front of them would be curious to know what we were looking at, and they'd turn their heads to see. This would inspire other people to turn their heads, and so on. If we'd done our job well -- it was a game of timing as well as body language -- we could get hundreds of people to look behind them, at nothing.&lt;br /&gt;&lt;br /&gt;This kind of behavior explains the popularity of some really stupid websites. Get enough people to visit your website, and Google will acknowledge that there's something about this website worth looking at. Then more people will look at it. 
Internet-based fads occur weekly, from paparazzi photos to cool advertisements.&lt;br /&gt;&lt;br /&gt;If a human were indexing this, the indexer might think, "This isn't so important that it needs to be found a million times." That human is right. But the meta-human looks at what all the humans are already doing and thinks, "There is a cultural need for this content."&lt;br /&gt;&lt;br /&gt;For a back-of-the-book indexer to break into the world of mass search, he'll have to give up the words and instead figure out the rules -- linguistic as well as social -- behind how these words are being used. Those rules, which govern &lt;em&gt;how &lt;/em&gt;we find things (and not &lt;em&gt;what &lt;/em&gt;we find), don't describe the indexing we know at all.&lt;br /&gt;&lt;br /&gt;If indexing as we indexers know it is going to survive, we'll have to find that nifty middle ground between the words and the people. It should be easy, given that we already do this, but so far we haven't managed to break into this field at all. Hopefully we'll evolve.&lt;br /&gt;&lt;br /&gt;In the next generation, we'll index the indexes.&lt;br /&gt;&lt;br /&gt;(Brian Pinkerton developed the first full-text retrieval search engine back in 1994. "Picture this," he explained. "A customer walks into a huge travel outfitters store, with every type of item, for vacations anywhere in the world, looks at the guy who works there, and blurts out, 'Travel.' 
Now where's that sales clerk supposed to begin?")&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115257798851182298?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115257798851182298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115257798851182298&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115257798851182298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115257798851182298'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/07/meta-indexing.html' title='Meta-indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115203610333495651</id><published>2006-07-04T11:04:00.000-05:00</published><updated>2006-12-18T00:12:27.398-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><title type='text'>The detailed analysis of indexing mistakes</title><content type='html'>In linguistics, the analysis of error is one means of learning how we cognitively process language. 
For example, when someone accidentally misspeaks "unplugged the phone" as "un&lt;strong&gt;ph&lt;/strong&gt;ugged the &lt;strong&gt;pl&lt;/strong&gt;one," we discover both that the speaker is a visual learner (because he switched the &lt;em&gt;P &lt;/em&gt;blends in the phrase, despite their different sounds) and that the speaker processes language in its component sounds. In contrast, a speaker who says "un&lt;strong&gt;phone&lt;/strong&gt;d my &lt;strong&gt;plug&lt;/strong&gt;" processes language in morphemes (&lt;em&gt;e.g., &lt;/em&gt;root words), and a speaker who says "unplugged my &lt;strong&gt;feet&lt;/strong&gt;" is an aural learner (because &lt;em&gt;phone &lt;/em&gt;and &lt;em&gt;feet &lt;/em&gt;start with the same &lt;em&gt;f&lt;/em&gt; sound). There seems to be an infinity of spoken-language errors possible, including absences, duplications, inclusions, misalignments, substitutions, and transpositions of letters, sounds, morphemes, words, and phrases.&lt;br /&gt;&lt;br /&gt;When I evaluate an index, my job is to look for mistakes. As a now-experienced indexer who himself has made mistakes, I know that I can learn much about how an indexer thinks (or doesn't think) by analyzing her errors and accidents. And as with speech, there are innumerable kinds of mistakes available for the unwary indexer: absences, duplications, inclusions, misalignments, misrepresentations, and missortings of page numbers, letters, words, structures, and ideas.&lt;br /&gt;&lt;br /&gt;Consider the incorrect page number, such as when content on page 42 is indexed as if it were on page 44. This kind of error tells us that the indexer did not attend properly to detail, perhaps because the working environment (deadlines, tools, etc.) was less than ideal. 
When a page range appears simplified to a single number, such as when &lt;em&gt;42-45 &lt;/em&gt;appears simply as &lt;em&gt;42,&lt;/em&gt; I am more likely to consider the indexer lazy instead of scatterbrained, though again it is also possible to blame the working environment (including client demands).&lt;br /&gt;&lt;br /&gt;Entries that appear in an index but have no value to readers (&lt;em&gt;e.g., &lt;/em&gt;the inclusion of passing mentions and other trivia) demonstrate the indexer's ignorance of the audience, or of the indexing process itself. Entries that fail to appear in an index but should (&lt;em&gt;e.g., &lt;/em&gt;the under-indexing of a concept) demonstrate either the indexer's ignorance of the audience, the indexer's ignorance of the subject content, or a sloppy or otherwise rushed working process.&lt;br /&gt;&lt;br /&gt;Awkward categorizations, such as entries that are mistakenly combined or that don't relate well to their subentries, are a clear sign that the indexer misunderstands the content or is too new to indexing to understand how structure is supposed to work. For example, an indexer who creates&lt;br /&gt;&lt;br /&gt;American&lt;br /&gt;....&lt;em&gt;Idol &lt;/em&gt;(television program), 56&lt;br /&gt;....Red Cross (organization), 341&lt;br /&gt;&lt;br /&gt;doesn't think of indexing as a practice of making ideas accessible, but rather as a concordance of words without meaning. Under no circumstances should &lt;em&gt;American Idol &lt;/em&gt;or American Red Cross have been broken into halves, let alone combined. Since categorization can be subtle, however, evaluators can learn something interesting about indexers by looking closely at their choices:&lt;br /&gt;&lt;br /&gt;writing&lt;br /&gt;....as artistic skill, 84&lt;br /&gt;....fiction vs. nonfiction, 62&lt;br /&gt;&lt;br /&gt;In this example, the first subentry defines &lt;em&gt;writing &lt;/em&gt;as a trade; it's clear the indexer is comfortable with the idea of a writer. 
The second subentry defines &lt;em&gt;writing &lt;/em&gt;as a process, with a start and finish, such that the process (or journey) of writing could be different when you're writing fiction instead of nonfiction. Analysis of this entry tells us that the indexer doesn't recognize or appreciate the difference between &lt;em&gt;writing (trade) &lt;/em&gt;and &lt;em&gt;writing (process). &lt;/em&gt;Is the indexer revealing her inner disdain for writers, does she believe that all writers are the same no matter what they produce, or does she simply know nothing about the writing life?&lt;br /&gt;&lt;br /&gt;One of the big challenges for indexers is to provide the language that readers will need to find the content they're looking for. When an indexer either offers language that no one will look up or omits the terms that readers prefer, she is demonstrating an ignorance of the audience or of the content, or hinting that the overall indexing process or environment is inadequate. Further, when the indexer fails to provide access from an already existing category entry (for example, if the index has an entry for "writing, fiction vs. nonfiction" but fails to provide the cross reference "&lt;em&gt;See also &lt;/em&gt;author" when there are &lt;em&gt;author &lt;/em&gt;entries), she tells us clearly that she is unfamiliar with the material. No other combination of errors speaks of subject ignorance as clearly; by failing to connect existing concepts, the indexer shows us gaps in her knowledge of the information map.&lt;br /&gt;&lt;br /&gt;There are several kinds of text errors. Misspellings and other typographical errors are a sign of carelessness or insufficient tools. Accidental missortings are a sign of ignorance, poor tools, accelerated schedules, or a failure of communication among publication staff. 
Ambiguous terms that aren't clarified are caused by indexers who are too limited in their thinking or their assumptions about the audience, indexers who don't know the material, and authors who failed to communicate the ideas clearly enough for the indexer to understand. Finally, odd grammatical choices usually signal a poor production process, such as when two indexes are combined automatically with insufficient editing effort, or a brand new indexer with no formal training.&lt;br /&gt;&lt;br /&gt;Before concluding, I would be remiss to ignore errors of formatting. A failure to use consistent styles signals a deficit in tools or attention, whereas awkward or unreadable decisions regarding indentations, margins, and column widths are a big sign that the index designer (who is not necessarily the indexer) has no clear idea whatsoever how indexes work. Missing continued lines communicate the same thing. (On the other hand, exceptional use of formatting, such as the isolated use of italics within a textual label, is a clear sign that the indexer really does understand both the audience and how they approach the index.)&lt;br /&gt;&lt;br /&gt;Ignorance, sloppiness, indifference, and confusion: these are shortcomings even a professionally trained, experienced indexer might have, but thankfully they often manifest as isolated exceptions in her practice of creating quality work. But when a single kind of mistake appears multiple times throughout an index -- numerous misspellings, huge inconsistencies of language, globally insufficient access, awkward structures -- we need to be concerned. When we see these, we have an obligation to analyze the indexer. 
By properly arming ourselves with this knowledge, we can determine for ourselves if the indexer was the wrong choice for a particular project, struggled with the challenges of inferior tools, or simply had a bad day.&lt;br /&gt;&lt;br /&gt;Meanwhile, if indexes written by different indexers are plagued by the same exact problem, it's unmistakably clear that the problem is in the systemically faulty publication process: ridiculous deadlines, uncooperative authors, uncaring editors, poor style guides, and so on. In other words, you shouldn't evaluate indexes in isolation. Instead, look at the work of other indexers for the same publisher, as well as the work of other publishers by the same indexer.&lt;br /&gt;&lt;br /&gt;Okay, but what if the index is essentially perfect, with no errors at all? Can we still learn something? Yes, we can. The absence of all error tells us something &lt;em&gt;very &lt;/em&gt;important about the indexer: She's being underpaid.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115203610333495651?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115203610333495651/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115203610333495651&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115203610333495651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115203610333495651'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/07/detailed-analysis-of-indexing-mistakes.html' title='The detailed analysis of indexing 
mistakes'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-115132889996615918</id><published>2006-06-26T08:31:00.000-05:00</published><updated>2006-12-18T00:17:50.997-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='books'/><title type='text'>With respect to our loved ones</title><content type='html'>There are many who believe that books -- and by this I refer to the &lt;a href="http://www.everypoet.com/images/old-book.jpg" target="_blank"&gt;traditional object of bound paper and print&lt;/a&gt; -- are sacred objects. &lt;a href="http://occr.ucdavis.edu/html/author.html"&gt;Anne Fadiman&lt;/a&gt; wrote in &lt;em&gt;&lt;a href="http://www.amazon.com/gp/product/0374527229/qid=1151328593/sr=2-1/ref=pd_bbs_b_2_1/104-0428854-4355134?s=books&amp;v=glance&amp;amp;n=283155"&gt;Ex Libris&lt;/a&gt;&lt;/em&gt; that books are often marketed as if they were toasters, and yet remembered as if they were friends. Certainly a block of paper, ink, and glue would not hold such an esteemed place in our hearts if there were nothing transcendent about it. The way in which we archive old books on our shelves because of the memories they inspired in us, whether as a favorite book from early childhood or an intellectual realization from our older years, is not unlike the way a museum places found bones under glass. 
Unlike the skeletal remains of an ancient animal, however, our old books are neither unique nor unused nor truly old.&lt;br /&gt;&lt;br /&gt;The loss of such a book -- from spilled juice, from a disrespectful borrower, from a forgetful moment on a bus -- can be &lt;a target="_blank" href="http://www.bodley.ox.ac.uk/dept/preservation/information/emcp/disaster.jpg"&gt;devastating&lt;/a&gt;. Almost all titles can be repurchased, in some cases with benefits like a new introduction by the author, an improved detail of scholarly footnotes, or a respectful commentary written with the benefit of time. Rarely, though, is it the corporeal book itself that brings us such pleasure. Had the book been empty, like those ubiquitous writing books and diaries sold at the checkout displays of almost every bookstore, its loss would have gone mostly unnoticed, valued at approximately the retail cost printed on its back cover. No, it is the content that brings us pleasure, with its memories of having been explored.&lt;br /&gt;&lt;br /&gt;Content is what makes our books cherished items. Porcelain figures, music boxes, ticket stubs, and toy animals have memories but no inherent content, whereas books can be reopened, reread, and rediscovered. Even when the words don't change -- and they rarely do -- the experience of seeing them with changed eyes and minds is different each time.&lt;br /&gt;&lt;br /&gt;Listen, friends, for this is the romantic side to indexing.&lt;br /&gt;&lt;br /&gt;As readers -- gentle, voracious, impulsive, or any other adjective that best defines the nature of your reading relationships -- we have an obligation to provide access to these memories, past and future. Even if we do not write, we must endow the writings of others with every tool at our disposal. We cannot guarantee that any single book won't get lost in the attic or destroyed by fire, but we can, as indexers, guarantee that every important sentence within is flagged with accuracy and passion. 
We can, in the end, turn the writings of others into useful thoughts, moments of learning, and renewable tools for discovery and self-discovery.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-115132889996615918?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/115132889996615918/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=115132889996615918&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115132889996615918'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/115132889996615918'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/06/with-respect-to-our-loved-_115132889996615918.html' title='With respect to our loved ones'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114800758233252241</id><published>2006-05-18T21:55:00.000-05:00</published><updated>2006-12-18T00:18:50.159-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='hierarchical organization'/><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><title type='text'>Bias in indexing</title><content type='html'>The greatest advantage that indexing processes 
have over automated (computer-only) processes is the human component. Of course, as someone who has worked with humans before, you probably recognize there can be imperfections.&lt;br /&gt;&lt;br /&gt;I was reading &lt;a href="http://www.amazon.com/gp/product/0002007916/qid=1147443044/sr=2-2/ref=pd_bbs_b_2_2/104-0428854-4355134"&gt;Struck by Lightning: The Curious World of Probabilities&lt;/a&gt; earlier this week, in which the author writes of biases in scientific studies. I realized that these same biases occur with indexes and indexers as well, and I wondered if I can list them all.&lt;br /&gt;&lt;br /&gt;(The biggest bias in indexing isn't one of the index at all, but rather the limitations on what the authors write. For example, if a book on art history didn't include information about &lt;a href="http://a9.com/Vincent+van+Gogh?factid=34345"&gt;Vincent van Gogh&lt;/a&gt;, I would expect van Gogh to be missing from the index; this absence might be caused by an author bias. However, I am going to focus on biases that affect indexing decisions themselves.)&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Inclusion bias. &lt;/em&gt;Indexers may demonstrate a bias by including more entries related to subjects that appear more interesting or important to that indexer. For example, I live in &lt;a href="http://www.cityofboston.gov/"&gt;Boston&lt;/a&gt;, and so I might consider Boston-related topics to be less trivial (more important) than the average indexer; consequently, documentation that includes information about Boston is more likely to appear in my index. 
I imagine inclusion bias is a common phenomenon in documentation that includes information about contentious social issues -- &lt;a href="http://news.google.com/news?hl=en&amp;lr=&amp;amp;c2coff=1&amp;rls=GGLD%2CGGLD%3A2004-27%2CGGLD%3Aen&amp;amp;amp;amp;amp;ct=title&amp;ie=UTF-8&amp;amp;q=immigration"&gt;immigration&lt;/a&gt;, &lt;a href="http://news.google.com/news?hl=en&amp;lr=&amp;amp;c2coff=1&amp;rls=GGLD%2CGGLD%3A2004-27%2CGGLD%3Aen&amp;amp;amp;amp;amp;ct=title&amp;ie=UTF-8&amp;amp;q=tobacco"&gt;tobacco legislation&lt;/a&gt;, &lt;a href="http://news.google.com/news?hl=en&amp;lr=&amp;amp;c2coff=1&amp;rls=GGLD%2CGGLD%3A2004-27%2CGGLD%3Aen&amp;amp;amp;amp;amp;ct=title&amp;ie=UTF-8&amp;amp;q=energy+policy"&gt;energy policy&lt;/a&gt; -- because the drive to communicate one's ideas on these issues is stronger. I also believe that inclusion bias is not entirely subconscious, and that indexers may purposefully choose to declare their ideas with asymmetric inclusion. It should be noted, however, that biased inclusion would not necessarily provide insight into the indexer's opinion on the subject; creating an entry like "death penalty morality" does not clearly demonstrate whether the indexer actually disagrees with capital punishment.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Noninclusion bias. &lt;/em&gt;Similar to inclusion bias, indexers might feel that certain mentions in the text are not worth including in the index because of their personal interests or beliefs. Unlike inclusion bias, however, I suspect noninclusion bias does not appear with regard to contentious issues; conflict is going to be indexed as long as the indexer recognizes the conflict has value. Instead, an indexer is likely to exclude things that "seem obvious"; rarely are these tidbits of information controversial. 
For example, an indexer who is very familiar with computers is likely to exclude "obvious computer things," subjectively speaking; you probably won't find "keyboard, definition of" in such a book.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Familiarity (unfamiliarity) bias. &lt;/em&gt;When an indexer is particularly interested in or knowledgeable about a subject, the indexer is likely to create more entry points for the same content than another indexer might. For example, an indexer who is familiar with "Rollerblading" might realize that &lt;a href="http://www.rollerblade.com/"&gt;Rollerblade&lt;/a&gt; is a brand name, and that the actual items are called inline skates. This indexer is more likely to include "inline skates" as an entry. &lt;em&gt;Unfamiliarity bias &lt;/em&gt;would be the opposite, in that multiple entry points are not provided because the indexer doesn't think of them, or perhaps doesn’t know they exist.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Positive value bias. &lt;/em&gt;An indexer who has reason to make certain content easier for readers to find (and read) is likely to create more entry points for that idea. At the extreme, the indexer will overload access by using multiple categorical and overlapping subtopics, where those subcategories are at a finer granularity than the information itself. For the generic topic of "immigration," for example, an indexer might include categorical entries like "Hispanic immigrants," "European immigrants," and "Asian immigrants," as well as overlapping topics like "Asian immigrants," "Chinese immigrants," and "Taiwanese immigrants," with all of them pointing to "immigration" in general.&lt;br /&gt;&lt;br /&gt;There are three types of positive value bias. &lt;em&gt;Personal positive value bias &lt;/em&gt;is demonstrated when the indexer himself believes that the information is of greater-than-average value. 
&lt;em&gt;Environment-based positive value bias &lt;/em&gt;is demonstrated when the indexer is swayed by environmental forces, such as social pressures, political pressures, pre-existing media bias, and so on. Finally, &lt;em&gt;other-based positive value bias&lt;/em&gt; is demonstrated when the indexer bows to pressures imposed by the author, client, manager, or sales market (i.e., the person paying the indexer for the job). Although it can be argued that this last type of bias is not the indexer's bias, strictly speaking the indexer can choose to fight any bias forced upon him. For example, an indexer whose client instructs him to "index all the names in this book" might interpret this instruction as some kind of market bias, and thus refuse to follow this guideline. In reality, however, most indexers do accept the pressures placed upon them by the work environment, and thus in my opinion take on the responsibility and ethical consequences of this choice.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Negative value bias. &lt;/em&gt;It's possible for an indexer to provide fewer entry points for content that he feels is not of great importance to readers -- the direct opposite of positive value bias -- but the reasons for limiting access to that content are probably not related to the indexer's perceived value of that content for readers. Instead, indexers are likely to limit access to content when there is a significant amount of similar content in the book, and as such including access to those ideas would either bulk up the index unnecessarily or waste a lot of the indexer's time. 
For example, if an indexer were faced with a 40-page table of computer terms, it's unlikely that each term would be heavily indexed, even if such indexing were possible and even helpful to readers.&lt;br /&gt;&lt;br /&gt;For this reason, I believe that there are three kinds of negative value bias: &lt;em&gt;time-based negative value bias, &lt;/em&gt;in which the indexer skimps on providing access in an effort to save time; &lt;em&gt;financially motivated negative value bias, &lt;/em&gt;in which the indexer skimps on providing access in an effort to earn or save money; and &lt;em&gt;logistical negative value bias, &lt;/em&gt;in which the indexer skimps on providing access in response to logistical issues like software limitations, file size requirements, page count requirements, controlled vocabulary limitations, and the like.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Topic combination (lumper's) bias. &lt;/em&gt;This bias is exhibited by indexers who are likely to combine otherwise dissimilar ideas because they find this "lumping together" of ideas to be aesthetically pleasing or especially useful. This kind of bias is visible in the ratio between locators (page numbers) and subentries, in that entries are more likely to have multiple locators than multiple subentries, on average. For example, an entry like "death penalty, 35, 65, 95" shows that the indexer believes the content on these three pages is similar enough that subentries are not required or useful. Topics that start with the same words might also be combined in a more general topic (such as combining "school lunches" and "school cafeterias" into a combined "school meals"). It is worth noting that some kinds of audiences or documentation subjects may tend toward topic combination bias; for this reason, it may be difficult to recognize lumper's bias.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Topic separation (splitter's) bias. 
&lt;/em&gt;This bias is exhibited by indexers who are likely to separate otherwise similar ideas because they find this "splitting apart" of ideas to be aesthetically pleasing or especially useful. As with lumper's bias, splitter's bias is represented by the ratio of locators to subentries throughout an index, in that splitters are likely to create more subentries than would other indexers, on average. It is worth noting that some kinds of audiences or documentation subjects may tend toward topic separation bias; for this reason, it may be difficult to recognize splitter's bias.&lt;br /&gt;&lt;br /&gt;These are all the biases I've found or experienced. If you think there's another kind of bias that indexers exhibit, &lt;a href="mailto:seth@maislin.com"&gt;let me know&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The remaining question is this: Is it wrong for an indexer to have bias? That is, should indexers study their own tendencies and work to avoid them? I don't think it's that simple. The artistry that an indexer can demonstrate is fueled by these biases -- experiences, opinions, backgrounds, interpretations -- and perhaps should even be encouraged. An indexer's strengths come from his understanding of not just the material, but also his perceptions of the audience, the publication environment, and the audience's environments. Further, indexers who know and love certain subjects are going to be drawn to them, just as many readers are; these biases aren't handicaps so much as commonalities shared between indexers and readers. 
Biases will hurt indexers working on unfamiliar materials in unfamiliar media, but under those conditions the biases are the least of our worries; when the indexer is working without proper knowledge, the higher possibility of bad judgment or error is a much greater concern.&lt;br /&gt;&lt;br /&gt;If anything, indexers should be aware of their biases because they can serve as strengths -- especially in comparison to what computers attempt to do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114800758233252241?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114800758233252241/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114800758233252241&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114800758233252241'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114800758233252241'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/05/bias-in-indexing_18.html' title='Bias in indexing'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114765112447080273</id><published>2006-05-14T18:17:00.000-05:00</published><updated>2006-12-18T00:21:11.965-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='power of information'/><title type='text'>Demands for quantity are misplaced</title><content type='html'>There is a continuing trend in search engines: more, more, more.&lt;br /&gt;&lt;br /&gt;Press releases from &lt;a href="http://www.google.com"&gt;Google&lt;/a&gt;, like “Google Checks Out Library Books” [&lt;a href="http://www.google.com/intl/en/press/pressrel/print_library.html"&gt;December 14, 2004&lt;/a&gt;] and “Google Tunes Into TV” [&lt;a href="http://www.google.com/intl/en/press/pressrel/video.html"&gt;January 25, 2005&lt;/a&gt;], hit the media waves in a grand style. For the first time, entire libraries of books, from &lt;a href="http://www.harvard.edu"&gt;Harvard&lt;/a&gt; and &lt;a href="http://www.stanford.edu"&gt;Stanford&lt;/a&gt; Universities to the Universities of &lt;a href="http://www.umich.edu"&gt;Michigan&lt;/a&gt; and &lt;a href="http://www.ox.ac.uk"&gt;Oxford&lt;/a&gt;, and soon the &lt;a href="http://www.nypl.org"&gt;New York City Public Library&lt;/a&gt;, will be available from Google’s website. Within limits of copyright, the words of entire books can be searched. Information philosophers are all over this story, essentially declaring that Google will become the public library of the next generation, excited about how the very nature of libraries might change, and scratching their heads over how the book publishing industry is going to survive yet another hit in the market.&lt;br /&gt;&lt;br /&gt;In the second, Google (as well as &lt;a href="http://www.yahoo.com"&gt;Yahoo&lt;/a&gt;!) applauds themselves for once again providing access to a greater diversity of the world’s information, because television’s closed captioning content has been indexed into a &lt;a href="http://video.google.com/"&gt;Google Video&lt;/a&gt; database. Viewers of public broadcasting and basketball are early adopters, and why not? 
Finally, all those oh-so-deprived sports consumers can satisfy themselves on more than just the videos, statistics databases, press releases, articles, blogs, commentaries, and (don’t forget) live games themselves. Because now they can search among the announcers’ words.&lt;br /&gt;&lt;br /&gt;Now when I search for &lt;a href="http://www.nba.com/sixers/"&gt;76ers&lt;/a&gt;, instead of getting 1.61 million hits, I’ll get 1.62 million. Phooey.&lt;br /&gt;&lt;br /&gt;Our instinctive reaction is to be impressed. I’m thinking of all those Ph.D. theses gathering dust in the &lt;a href="http://www.lib.rochester.edu/index.cfm?PAGE=162"&gt;Physics-Optics-Astronomy Library at my alma mater&lt;/a&gt;, 150-page books without indexes. I’m thinking about &lt;a href="http://www.redsox.com"&gt;Red Sox&lt;/a&gt; fans who, for the first time in a very long time, are interested in the World Series.&lt;br /&gt;&lt;br /&gt;But I’m also thinking about the catalog search system at &lt;a href="http://www.mln.lib.ma.us"&gt;my public library&lt;/a&gt;, which won’t improve with Google’s additions. For a book already in the catalog, adding its content doesn’t help at all. Instead, we’d be cluttering up the database with a few trillion new words.&lt;br /&gt;&lt;br /&gt;So our instincts are wrong.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Why Quantity Hurts&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Every few years, search engine companies are finding new ways to promote themselves by bragging about how much they can find. In the early 1990s, &lt;a href="http://www.northernlight.com"&gt;Northern Light&lt;/a&gt; was independently rated top among competitors because they searched the largest percentage of the World Wide Web: sixteen percent. 
&lt;a href="http://www.time.com"&gt;Time&lt;/a&gt; and &lt;a href="http://www.msnbc.msn.com/id/3032542/site/newsweek"&gt;Newsweek&lt;/a&gt; contributors warned, “We’re not finding everything.”&lt;br /&gt;&lt;br /&gt;In the late 1990s, articles about the “&lt;a href="http://en.wikipedia.org/wiki/Deep+web"&gt;invisible web&lt;/a&gt;” appeared in popular magazines and newspapers, explaining how search engines cataloged only text, image, and sound files, thus skipping over the good content stored as spreadsheets, databases, and fonts. Again came the cry, “We’re not finding everything!”&lt;br /&gt;&lt;br /&gt;And now, in just two months, Google and Yahoo have added more to the huge pile of information: library books and &lt;a href="http://www.fcc.gov/cgb/consumerfacts/closedcaption.html"&gt;closed captioning&lt;/a&gt; data. No longer are our searches limited to the billions of files already on the Web. Yay!&lt;br /&gt;&lt;br /&gt;It’s all about quantity. Nobody seems to care about quality any more.&lt;br /&gt;&lt;br /&gt;Libraries have been struggling to redefine themselves ever since the Internet (and more, the &lt;a href="http://www.ideafinder.com/history/inventions/story069.htm"&gt;World Wide Web&lt;/a&gt;) reached people’s homes. &lt;a href="http://www.publishamerica.com"&gt;Book publishers&lt;/a&gt; also have suffered. The failure isn’t that of these institutions and industries, however, but of the public. The public seems unaware of the &lt;a href="http://en.wikipedia.org/wiki/Observation"&gt;natural filtering process inherent in human behavior&lt;/a&gt;. Publishers choose which titles to publish, and libraries choose which titles to add to their catalogs. 
You might not agree with their reasoning or results, but you do have to admit that there are human beings at the helm.&lt;br /&gt;&lt;br /&gt;(By the way, &lt;a href="http://publishing-industry.net/modules.php?name=News&amp;file=article&amp;amp;sid=231"&gt;Google's library program was put on hold&lt;/a&gt; because of criticism. This isn't because it's a bad idea or anything, but rather because the traditional publishing industries got scared they'd lose money. It's an I-was-here-first money-by-copyright battle.)&lt;br /&gt;&lt;br /&gt;For many, this filtering-by-design is a major disappointment, which explains the astounding popularity of the World Wide Web. The Web allows everyone to speak up: to post pictures of their pets and babies, their ideas about government, their &lt;a href="http://www.fanfiction.net/l/224/3/0/1/1/0/0/0/0/0/1/"&gt;Harry Potter fan fiction&lt;/a&gt;. But in a room where everyone is shouting, nothing gets heard. Northern Light and Yahoo earned their money offering a way through the noise. Other companies, calling themselves search optimization experts, profit by offering their clients the means to be noticed by these search engines, the hypertext equivalent of megaphones.&lt;br /&gt;&lt;br /&gt;The filtering process is missing.&lt;br /&gt;&lt;br /&gt;Okay, yes, adding content to search databases is a good thing. I might make fun of &lt;a href="http://mikesalsbury.com/mambo/content/view/362/"&gt;sport fanaticism&lt;/a&gt;, but the desire to retrieve information of choice is a valuable privilege of the individual. I might not care about the Red Sox, but I respect that there are others who do. I also feel extremely happy for the researchers who now have access to volumes of scientific research. In many ways, adding content to a database is like translating content into new languages. 
Really, these are not bad things.&lt;br /&gt;&lt;br /&gt;Even so, we are only adding to the number of &lt;a href="http://www.exodus.co.uk/pictures/a04hp52c.jpg"&gt;people shouting in a room&lt;/a&gt;. Google and Yahoo are definitely improving the scope of what we can find, but they are not improving our ability to find. I might proudly accumulate more and more in &lt;a href="http://www.vrplumber.com/portfoli/full/clutter.jpg"&gt;my attic&lt;/a&gt;, while simultaneously making it harder and harder to retrieve anything.&lt;br /&gt;&lt;br /&gt;Here’s a real example. My wife and I had been expecting our first child (15 months ago). We were struggling in our decision of her middle name. When we searched for “baby names” on the Web, with quotation marks around the phrase, we found 2.5 million sites at Google, 0.9 million at Yahoo, 0.7 million at MSN. Now, imagine that all the contents of library books and scientific articles and sports broadcasters are added to the Web. Although there may exist a few anthropology articles that would have helped us choose a name, I sincerely believe that over 99% of this new content would have proven unhelpful. I also believe that some of this unhelpful content includes the unusual phrase “baby names.” For example, consider this sentence, which appeared on the Web in November 2004: “&lt;a href="http://en.wikipedia.org/wiki/Julia+Roberts"&gt;Julia Roberts&lt;/a&gt; now joins the list of celebrities who have jumped on the Hollywood bandwagon, which gives license to choosing &lt;a href="http://www.keepersoflists.org/index.php?lid=5552"&gt;odd baby names&lt;/a&gt;.”&lt;br /&gt;&lt;br /&gt;After my wife and I decide on a candidate name, we search for that name online, looking for its meaning. The query “Ryan meaning” (without quotes) for the boy’s name Ryan gets 1.2 million hits at Google. (By the way, Google suppresses near-duplicated content and never displays beyond the first 1000 results, so the “true” result set is quite inaccessible.) 
Because Ryan is a common name, it likely appears numerous times within bibliographies. The word meaning is also extremely common among scholarly articles. If Google indeed adds university libraries to its already large database, 1.2 million will become a very small number. As the scientists benefit from a library search, my wife and I will find it that much harder to learn about a particular name using Google.&lt;br /&gt;&lt;br /&gt;It is common knowledge among library scientists and search engine experts that you cannot improve the accuracy of a search at the same time you improve its comprehensiveness. Either you get perfect relevance but miss something useful, or you get everything you want along with content that you don’t. As search engines trend toward larger and larger databases, results pages grow more cluttered.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Please, Sir, May I Have Some Less?&lt;br /&gt;&lt;/strong&gt;&lt;br /&gt;Google’s popularity as a search engine has nothing to do with the numbers of results. When I ask people why they like Google (or whatever search engine they prefer), they answer, “Because what I want is usually within the first few results.” I’ve also gotten the answer, “Because it thinks the way that I do.” Most people don’t want &lt;a href="http://www.usamega.com"&gt;millions of results&lt;/a&gt;. They want three. Three quality results.&lt;br /&gt;&lt;br /&gt;I can’t remember the last time I heard about a search engine &lt;a href="http://www.onlamp.com/pub/a/onlamp/2003/08/21/better_search_engine.html"&gt;improving its algorithms&lt;/a&gt;. Perhaps they do this all the time in secret, inventing features behind the scenes. I do know that if a search engine started regularly serving up &lt;a href="http://nonsense.sourceforge.net"&gt;nonsense&lt;/a&gt;, it would go out of business.&lt;br /&gt;&lt;br /&gt;So why are these efforts at improving quality so unpublicized? 
Did you know that while Google pays attention to quotation marks, &lt;a href="http://www.lycos.com"&gt;Lycos&lt;/a&gt; doesn’t? That you can type a &lt;a href="http://www.usps.com/zip4"&gt;zip code&lt;/a&gt; into Google to get a map? Many of Google’s best features are published in books like &lt;em&gt;&lt;a href="http://www.amazon.com/gp/product/0596008570/qid=1147650596/sr=2-1/ref=pd_bbs_b_2_1/104-0428854-4355134?s=books&amp;v=glance&amp;amp;n=283155"&gt;Google Hacks&lt;/a&gt;&lt;/em&gt; and even &lt;em&gt;&lt;a href="http://www.amazon.com/exec/obidos/tg/detail/-/0596101619/qid=1147650635/sr=2-3/ref=pd_bbs_b_2_3/104-0428854-4355134?v=glance&amp;s=books"&gt;Google Maps Hacks&lt;/a&gt;&lt;/em&gt; (O’Reilly &amp;amp; Associates, 2004 and 2006 respectively), where few people are going to look for them. Either nobody cares, or nobody knows the difference.&lt;br /&gt;&lt;br /&gt;But when Google adds sports commentary to its search engine, watch out! The story appears in all the major newspapers.&lt;br /&gt;&lt;br /&gt;At times like these, I get rather discouraged. I feel as though I am trying to &lt;a href="http://3lotus.com/images/Bible/RedSeaCrossing.jpg"&gt;hold back the ocean&lt;/a&gt;. It takes me more than a week to index a single, average book. The information world is growing at such an insane pace, my job seems absurd.&lt;br /&gt;&lt;br /&gt;At times like these, I have to remind myself of two perspectives. First, context. When I write the index for a book with 350 pages, it doesn’t matter that the “book of Google” has over 8 billion. Someone decided that these 350 pages needed to be written, and it’s my job to make them accessible. My work improves this book. No, I haven’t changed the world, but I have made a difference within the context of this one book, in a segment of this one industry, to a small set of readers. 
For me, indexing is like the civic duty of &lt;a href="http://www.census.gov/population/www/socdemo/voting.html"&gt;voting&lt;/a&gt;: few win by one vote, and yet every vote counts. It’s also contagious, because voting begets voting. And indexing does beget indexing, because 5% of the people I talk to about my job want to know more, offer me work, or express a desire to become an indexer themselves.&lt;br /&gt;&lt;br /&gt;The second perspective is one of application. I don’t have to index books. If I wanted to make a difference at the source, there are many other applications of my skills.&lt;br /&gt;&lt;br /&gt;The key to these perspectives is a willingness to become activists. We are environmentalists in an information world. Just as scientists show concern over a one-degree rise in ocean temperature, so should we show concern over a one-percent increase in information dissemination. Bulk up our search engines? This is not an environmentally friendly choice.&lt;br /&gt;&lt;br /&gt;I want a search engine—it doesn’t even have to be Google—to announce that they’ve found a way to help me filter out the pages I couldn’t possibly want. The important word here is &lt;em&gt;announce.&lt;/em&gt; The modus operandi of these companies is to bring more &lt;a href="http://scream.deprogramming.us/"&gt;shouting people&lt;/a&gt; into the room, and then publicize this with pride. No wonder the libraries and publishers are in trouble: they’re not being praised for what they do. Neither are the indexers. In the public media, quality gets a whole lot less attention than quantity.&lt;br /&gt;&lt;br /&gt;If the search engine is being improved, they’re not telling anyone. Apparently it’s a secret. I don’t want to hear that someone has added billions of pages to the database, unless I hear also about a system that filters billions of pages away.&lt;br /&gt;&lt;br /&gt;I think it’s wonderful that more esoteric content is being added to the database. 
I applaud search engine companies who continue to improve their algorithms. What drives me crazy is that everyone is talking about the first, but not the second. The publicity is lopsided. Why won’t anyone talk about quality any more?&lt;br /&gt;&lt;br /&gt;When we asked search engine companies for more, that’s what we got. And we lost precision. Maybe it’s time for us to ask for less.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114765112447080273?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114765112447080273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114765112447080273&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114765112447080273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114765112447080273'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/05/demands-for-quantity-are-misplaced.html' title='Demands for quantity are misplaced'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114744127162257863</id><published>2006-05-12T08:34:00.000-05:00</published><updated>2006-05-12T08:44:17.023-05:00</updated><title type='text'>Library of Congress Control Number sj 96004876 ("Babies")</title><content type='html'>If any of my faithful readers were growing concerned about my e-silence, I'm pleased to say 
that my reason for muteness is wonderful: &lt;a href="http://taxonomist.tripod.com/fun/oren.jpg"&gt;Oren Harold Maislin&lt;/a&gt;, born April 24, 2006. Perhaps coincidentally, this is the 206th anniversary of the establishment of the &lt;a href="http://www.loc.gov"&gt;United States Library of Congress&lt;/a&gt; by &lt;a href="http://www.whitehouse.gov/history/presidents/ja2.html"&gt;President John Adams&lt;/a&gt;. For a full story, visit &lt;a href="http://www.historychannel.com/tdih/tdih.jsp?category=general&amp;month=10272956&amp;amp;day=10272989"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114744127162257863?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114744127162257863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114744127162257863&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114744127162257863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114744127162257863'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/05/library-of-congress-control-number-sj.html' title='Library of Congress Control Number sj 96004876 (&quot;Babies&quot;)'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114420269766541125</id><published>2006-04-04T20:44:00.000-05:00</published><updated>2006-12-18T00:20:18.535-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='misspellings and other errors'/><title type='text'>Whatever happened to "indices"?</title><content type='html'>My &lt;a href="http://socrates.uhwo.hawaii.edu/Humanities/Chapin/default.html"&gt;uncle&lt;/a&gt; (among others) asked me, "Is the word &lt;em&gt;indices&lt;/em&gt; no good anymore? It seems that &lt;em&gt;indexes&lt;/em&gt; has won."&lt;br /&gt;&lt;br /&gt;I didn't know those words were contesting, but yes. If a U.S. winner were to be declared today, I'd have to go with &lt;em&gt;indexes.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Although strictly speaking the correct term is &lt;em&gt;indices,&lt;/em&gt; I think in common speech a distinction has been made for an item that is rarely plural. For example, when referring to an appendix in the back of a book, often there is more than one: &lt;em&gt;appendices. &lt;/em&gt;However, when referring to the &lt;a href="http://en.wikipedia.org/wiki/Vermiform_appendix"&gt;(vermiform) appendix in the human body&lt;/a&gt;, rarely do you talk about more than one at a time, and thus "&lt;em&gt;appendixes.&lt;/em&gt;" (And then there's the acronym, &lt;a href="http://www.answers.com/topic/appendix-acronym?method=22"&gt;APPENDIX&lt;/a&gt;, which simply doesn't count.)&lt;br /&gt;&lt;br /&gt;With book indexes, there is &lt;a href="http://rather.jp/p/index.html"&gt;rarely more than one&lt;/a&gt; -- although you can certainly talk about the indexes across the books, as I do. But in &lt;a href="http://www.bhcc.mass.edu/AR/ProgramsOfStudy/Programs2005.php?programID=15"&gt;database programming&lt;/a&gt; and similar constructions, often each line in the database (each record) has its own index. 
And so you can have thousands of indices. When you're working with indices (as opposed to indexes), you're working with large quantities of small bits of information.&lt;br /&gt;&lt;br /&gt;To me, this logic is what's also behind such oddities as the words "persons" and "peoples." These terms, though related, are attempts at showing quantity in environments where quantity is &lt;a href="http://www.oneinamillionprayer.com/site/pp.asp?c=frKPITNKF&amp;amp;b=12829"&gt;much less likely&lt;/a&gt;. Said another way, these words are attempting to emphasize the value of the singular, even while referring to more than one. Thus "persons" is used in legal contexts where the individual is important, "people" is referring to a group of beings in which &lt;a href="http://www.uh.edu/engines/romanticism/blakeessay2.html"&gt;individuality is not important&lt;/a&gt;, and "peoples" is referring to a collection of groups of beings in which the nature of each group remains important.&lt;br /&gt;&lt;br /&gt;How's THAT for an answer?&lt;br /&gt;&lt;br /&gt;Of course, the only real test is if there are other words that seem to follow the same pattern. So far, I can think of only &lt;em&gt;index &lt;/em&gt;and &lt;em&gt;appendix. 
&lt;/em&gt;Others?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114420269766541125?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114420269766541125/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114420269766541125&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114420269766541125'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114420269766541125'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/04/whatever-happened-to-indices.html' title='Whatever happened to &quot;indices&quot;?'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114403353966306599</id><published>2006-04-02T21:28:00.000-05:00</published><updated>2006-12-18T00:25:40.892-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='business of indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='American Society of Indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='training in indexing'/><title type='text'>Standards in the indexing community</title><content type='html'>The &lt;a href="http://www.asindexing.org"&gt;American Society of Indexers&lt;/a&gt; has been deluged with opinions regarding its venture into credentialing 
individual indexers. Leaving aside all legitimate concerns about the implementation of credentials, there are still many strongly voiced opinions about whether &lt;a href="http://en.wikipedia.org/wiki/Credentialing"&gt;credentialing&lt;/a&gt; is a good idea in the first place.&lt;br /&gt;&lt;br /&gt;One thing about indexing certification and index standards building that I keep getting stuck on is the question of who they're really for. There are indexers who believe credentialing will impact all indexers in a negative way, or just ASI members. There are many who suspect &lt;a href="http://www.asindexing.org/site/indfaq.shtml#FAQ1012"&gt;new indexers&lt;/a&gt; will benefit, and that everyone else will get hurt.&lt;br /&gt;&lt;br /&gt;But I don't believe the truest benefits to standards building are about the individual indexer. There will be effects, and certainly those effects will be different for different people, but the whole reason standards are created is to improve the industry as a whole. At least, that's how I understand it.&lt;br /&gt;&lt;br /&gt;There is a standard for indexing already out there. It's the &lt;a href="http://www.asindexing.org/site/bibliog.shtml#internationalA"&gt;ISO 999 standard&lt;/a&gt;. Does it benefit you? If you were indexing when it was updated in 1996, did you feel any repercussions in your business? Gosh, it sounds as if I'm joking.&lt;br /&gt;&lt;br /&gt;Recently I learned that &lt;a href="http://www.boston.com/news/local/massachusetts/articles/2006/03/30/carbon_monoxide_detectors_are_hot_items/"&gt;Massachusetts now requires carbon monoxide detectors&lt;/a&gt; on every floor. That means I need to install two more in the house. If I don't install them, I won't have a problem until I attempt to sell the house. 
And truthfully, I'm kind of annoyed that safety regulations are being pushed onto me -- the vehicle &lt;a href="http://www.snopes.com/autos/accident/seatbelt.asp"&gt;seat belt law&lt;/a&gt;, the &lt;a href="http://www.helmets.org/mandator.htm"&gt;bicycle helmet law&lt;/a&gt;, and now this. I'm not saying that seat belts and helmets are stupid things, but I don't like feeling forced into wearing them.&lt;br /&gt;&lt;br /&gt;ASI is attempting to create a standard -- certainly one that is harder to define than "wearing a seat belt or not," I'll freely admit -- and so yes, it is a bit disconcerting to be on the receiving end. I'm doing just fine without my carbon monoxide detectors, and I'll do just fine without professional credentialing.&lt;br /&gt;&lt;br /&gt;But ask yourself if you believe in the ideal here. Do you believe that standards *should* exist? You don't have to.&lt;br /&gt;&lt;br /&gt;I do.&lt;br /&gt;&lt;br /&gt;There are many industries that survive very well without some kind of license, certificate, or degree, and some might think that indexing is one of them. But no matter how it affects me personally, I really believe indexers should be a part of a larger entity, something more important than a simple &lt;a href="http://www.artshopgallery.com/Lavallee%20coffee_klatch.jpg"&gt;networking community&lt;/a&gt;. Do we -- and I mean ALL of us who write indexes -- have anything in common? Do we have anything to fight for as a group (other than higher rates)? Nah. As an industry, we really don't stand for anything without a standard. We're just a bunch of word &lt;a href="http://www.invent.org/"&gt;inventors&lt;/a&gt; -- &lt;a href="http://en.wikipedia.org/wiki/Tinkerer"&gt;tinkerers&lt;/a&gt; -- working alone in our attics.&lt;br /&gt;&lt;br /&gt;Most days, I'm fine standing for nothing. 
I like my &lt;a href="http://www.asindexing.org/site/indfaq.shtml#FAQ1011"&gt;income&lt;/a&gt;, I like most of my clients and projects, and I absolutely love being able to work at home and raise a &lt;a href="http://taxonomist.tripod.com/fun/flowers.avi"&gt;daughter who stops to smell the flowers&lt;/a&gt;. Some people like &lt;a href="http://www.sudoku.com/"&gt;Sudoku&lt;/a&gt; and &lt;a href="http://puzzles.usatoday.com/"&gt;crossword puzzles&lt;/a&gt;; I like indexing and teaching.&lt;br /&gt;&lt;br /&gt;Other days, I feel like someone &lt;a href="http://www.sstg.org/images/cartoon-bailing-out.jpg"&gt;bailing out a rowboat with a hole in it&lt;/a&gt;, with indexes as &lt;a href="http://weibel-lines.typepad.com/photos/slw/buckets.jpg"&gt;buckets&lt;/a&gt;. I am frustrated by a &lt;a href="http://www.polymercentre.org.uk/community/features/emulsion.php"&gt;complete lack of growth&lt;/a&gt; in the industry, by repeating myself to every new production editor I meet, by fighting for the right to use &lt;a href="http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B119861"&gt;page ranges&lt;/a&gt; or &lt;a href="http://www.asindexing.org/site/software.shtml#dedicated"&gt;decent indexing software&lt;/a&gt;, and again and again by having to justify my very reasonable rates.&lt;br /&gt;&lt;br /&gt;The deeper meaning that can come from credentialing also comes from other things: professional development, education, and research -- including all those great index usability research ideas. Credentialing isn't the only pursuit of this association, nor would it truly succeed in isolation. 
But I need to be a part of an association that advocates not just for its individual members, but for the meaning of the industry itself.&lt;br /&gt;&lt;br /&gt;Credentialing can be part of the solution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114403353966306599?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114403353966306599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114403353966306599&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114403353966306599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114403353966306599'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/04/standards-in-indexing-community.html' title='Standards in the indexing community'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114321288237008514</id><published>2006-03-24T09:23:00.000-05:00</published><updated>2006-12-18T00:24:54.835-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing process'/><category scheme='http://www.blogger.com/atom/ns#' term='web indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='pages and page ranges'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>The 
granularity of an online "page number"</title><content type='html'>When writing a hyperlinked index (where hyperlinks are used instead of page numbers), to what should those links point?&lt;br /&gt;&lt;br /&gt;Some people think they should point to the section title in which the information is provided; other people like to point right to the specific word used in the index. The "answer," obviously, is that hyperlinked index entries should take the readers to &lt;a href="http://www.infoplease.com/"&gt;where the information is&lt;/a&gt;, right? The problem -- the reason there's this question of "where do I point my entries" in the first place -- is that readers of hypertext might find themselves bounced somewhere they don't understand. How many times have you followed a link, only to find yourself fiddling around with the scroll bar to figure out where you ended up? Following a hyperlink is like being blindfolded and transported to an unknown destination.&lt;br /&gt;&lt;br /&gt;What you may not know is that book indexes aren't much different. :-)&lt;br /&gt;&lt;br /&gt;Think about how book indexes actually work, and you realize that they direct readers to the page on which the information starts. An entry like "buoyancy, 164" tells the reader to look &lt;em&gt;somewhere on page 164&lt;/em&gt;; an entry like "global harmony, 164-167" tells the reader to start looking &lt;em&gt;somewhere on page 164.&lt;/em&gt; The &lt;strong&gt;&lt;a href="http://www.webopedia.com/TERM/G/granularity.html"&gt;granularity&lt;/a&gt; of an index&lt;/strong&gt; is defined as the smallest unit of area that can be pointed to. For printed indexes, this area is the page number. Rarely will you find locators that use fractional or qualified page numbers like 164-1/2 or 164&lt;em&gt;top. 
&lt;/em&gt;(There are such things as qualified locators, like &lt;em&gt;164f,&lt;/em&gt; which might &lt;a href="http://72.14.203.104/search?q=cache:p8Rh4O-iu5kJ:www.psupress.org/author/indexing_guidelines.pdf+indexing+footnotes&amp;hl=en&amp;amp;gl=us&amp;ct=clnk&amp;amp;cd=2"&gt;point to the footnote&lt;/a&gt; on page 164, but even in the books in which they're used they comprise only a small number of all locators used.)&lt;br /&gt;&lt;br /&gt;If you follow the standards of the industry, then, the granularity of a printed index is one physical page. For this reason, books that have lots of words on a page -- big pages, narrow margins, tiny print -- are less friendly to book indexers. It's like telling someone that there's a &lt;a href="http://en.wikipedia.org/wiki/Space_needle"&gt;needle&lt;/a&gt; in that 164th &lt;a href="http://www.ibiblio.org/wm/paint/auth/monet/haystacks/"&gt;haystack&lt;/a&gt; over there. Maybe we should count our blessings that someone bothered to number the haystacks, but ideally this is where the book designer starts earning her salary. Book pages don't have to look like &lt;a href="http://southernfood.about.com/od/candyrecipes/r/bl30427n.htm"&gt;haystacks&lt;/a&gt; -- more accurately, wordstacks -- if the book has legible headings and subheadings. Books can be written with quickly visible landmarks within the pages, like italics and boldface, larger and smaller font sizes, headings and callouts, footnotes, and so on. 
Going back to the blindfolded analogy, there's no reason we have to drop our readers into deserts of information, when we can drop them in a place surrounded by location clues and navigational signs, like at a &lt;a href="http://www.google.com/local?hl=en&amp;lr=&amp;amp;c2coff=1&amp;safe=off&amp;amp;rls=GGLD,GGLD:2004-27,GGLD:en&amp;q=train+stations&amp;amp;near=Boston,+MA&amp;sa=X&amp;amp;amp;oi=local&amp;amp;ct=title"&gt;train station&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On the Web, however, there is no such thing as a printed page. Web pages can be any length, from tiny pop-up windows with only a sentence fragment of information within, to long scrolls of endless paragraphs and images. Additionally, you don't have to direct the reader to just the page any more, but rather you can deposit him anywhere within the page. The granularity of a Web page is a word! You can send someone into the middle of a paragraph.&lt;br /&gt;&lt;br /&gt;When you have tiny little windows of information, using that window as a destination is a no-brainer: the reader arrives at a single sentence of information, which is what he needs. It doesn't matter if you point him to the beginning, middle, or end of that sentence, because it's all they get to read. Pointing someone to an isolated window of information -- what Web authors call "&lt;a href="http://www.macloo.com/webwriting/chunks.htm"&gt;chunks&lt;/a&gt;" -- is as easy as looking into a food pantry that contains only a single can. But when you have longer pages, and you have the ability to point someone to any spot within those longer pages, you have a decision to make. And it's a decision that didn't exist in the printed world, with its larger granularity.&lt;br /&gt;&lt;br /&gt;The solution is to connect the text of the index entry with the text of the documentation. Not the meaning, but the actual &lt;em&gt;words. 
&lt;/em&gt;If the index entries are written to almost identically match those of the documentation, then the reader won't mind as much because it won't look like a desert. They'll have exactly the landmark they need right in front of them. The entry "cancer, prevention of," for example, could point directly to this line without a problem:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;color:#006600;"&gt;... cessation of smoking. In fact, many physicians are well aware that one way to prevent cancer is to quit ...&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;That's because the words of your index entry, which are &lt;em&gt;cancer&lt;/em&gt; and &lt;em&gt;prevention,&lt;/em&gt; appear almost verbatim in that line of text. And if this information were part of a section titled "Using Peer Pressure to Help Patients Quit Smoking," then you really wouldn't want to point to the heading for context. That's because it's unclear to the reader that you're actually directing him to information about cancer or prevention. You're making them work at it.&lt;br /&gt;&lt;br /&gt;And then there's the other situation. Using the same sentence and heading as above, where should the indexer point readers who look up the entry "smoking, how to quit"? Clearly they should go right to the heading. If they went to the line that talked about physicians, they wouldn't know where they are.&lt;br /&gt;&lt;br /&gt;Our original question here was this: When writing a hyperlinked index (where hyperlinks are used instead of page numbers), to what should those links point? Clearly the only way to answer this question comprehensively is to suggest that the language of hyperlink indexes has two contexts: the index entry itself and the destination location. These two contexts need to work together. 
And as we saw, the same is true with the printed book: having arrived at page 164, how quickly can you find the idea you were looking for?&lt;br /&gt;&lt;br /&gt;Looking this closely at hyperlinked indexes only emphasizes something we need for all indexing: use index entries that match the documentation text. If you have to write a slightly longer entry, that's okay. Instead of "cigarettes," use "cigarette smoking, quitting." Instead of "social networks," use "social networks and peer pressure." The people who work with &lt;a href="http://www.searchenginewatch.com"&gt;search engines&lt;/a&gt; and &lt;a href="http://www.marketingtips.com/"&gt;Internet marketing&lt;/a&gt; are familiar with the term &lt;em&gt;&lt;a href="http://www.internet-marketing-dictionary.com/trigger-words.html"&gt;trigger words&lt;/a&gt;,&lt;/em&gt; which refers to visible language that matches the mental language of the searcher. If you're thinking of the words "&lt;a href="http://www.crystalinks.com/whitelephants.html"&gt;white elephant&lt;/a&gt;," then a result of "pale &lt;a href="http://www.tiscali.co.uk/reference/dictionaries/difficultwords/data/d0009466.html"&gt;pachyderm&lt;/a&gt;" doesn't work because it doesn't trigger your sense of recognition.&lt;br /&gt;&lt;br /&gt;So the next time there's a white elephant in haystack 164, be sure to tell someone as explicitly as possible.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114321288237008514?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114321288237008514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114321288237008514&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/22121434/posts/default/114321288237008514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114321288237008514'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/03/granularity-of-online-page-number.html' title='The granularity of an online &quot;page number&quot;'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114291245144607273</id><published>2006-03-20T21:49:00.000-05:00</published><updated>2006-12-18T00:26:25.970-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='search engines'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>Frustrated by a lack of meaning</title><content type='html'>Not for the first time, search engines have been wrongly criticized for the politics of their results.&lt;br /&gt;As reported by &lt;em&gt;&lt;a href="http://www.nytimes.com"&gt;The New York Times&lt;/a&gt;&lt;/em&gt; today ("&lt;a href="http://www.nytimes.com/2006/03/20/technology/20amazon.html?_r=1&amp;th&amp;amp;emc=th&amp;oref=slogin"&gt;Amazon Says Technology, Not Ideology, Skewed Results&lt;/a&gt;," March 20, 2006), an abortion-rights organization discovered and reported the appearance of biased results in its search engine. Apparently books with anti-abortion leanings appeared as more relevant on &lt;a href="http://www.amazon.com"&gt;Amazon&lt;/a&gt;'s search results pages. I am not taking sides on this highly charged issue; I am taking offense at the ignorance demonstrated by people who don't seem to understand how search works. 
(And I'm not singling out this issue either, as you'll see from my later examples.)&lt;br /&gt;&lt;br /&gt;See, there isn't a search engine on the &lt;a href="http://earth.google.com/"&gt;planet&lt;/a&gt; that can cull &lt;em&gt;actual meaning &lt;/em&gt;from its databases. They can only look at the words themselves. Even search engines that analyze &lt;a href="http://cepa.newschool.edu/~quigleyt/vcs/psychoanalysis.html"&gt;the behavior of their users&lt;/a&gt; still look at words and numbers, without interpretation.&lt;br /&gt;&lt;br /&gt;Let me explain what really happened with Amazon, and why Amazon is not automatically in the wrong. Someone went to the search engine and typed in the word &lt;em&gt;abortion.&lt;/em&gt; Now imagine that you're the search engine, and you have two results to give back. Result one is a book whose title is simply &lt;em&gt;Abortion.&lt;/em&gt; The second is a book whose title is &lt;em&gt;Understanding Abortion.&lt;/em&gt; Tell me: which result is more relevant? Answer: you have no clue.&lt;br /&gt;&lt;br /&gt;When faced with this impossible question, the search engines at Amazon and elsewhere attempt to apply certain generalizations that might work in other situations, but simply don't work here. For example, there might exist a rule that puts &lt;em&gt;Abortion &lt;/em&gt;ahead of &lt;em&gt;Understanding Abortion&lt;/em&gt; because the title of the first book matches the query exactly, whereas the second title is only "&lt;a href="http://www.poweroptimism.com/"&gt;half right&lt;/a&gt;." Or perhaps one of the books is 500 pages long, but the other is 200 pages long, and Amazon favors longer books. Maybe Amazon is interested in selling you the more expensive book, the book more recently published, or the book that gets a higher rating from all the people visiting the website. 
In the end, however, all of this analysis fails -- completely and utterly fails -- to answer a very simple question: which of these books is &lt;em&gt;against &lt;/em&gt;abortion? Heck, even I don't know, and I invented them!&lt;br /&gt;&lt;br /&gt;&lt;em&gt;With search, meaning is irrelevant. &lt;/em&gt;Search engines can look only quantitatively at the &lt;a href="http://www.pacificnet.net/~cmoore/alphabet/"&gt;letters&lt;/a&gt; of the words, and at innumerable statistics (e.g., &lt;a href="http://www.amazingcounters.com"&gt;number of Web views&lt;/a&gt;) that have at best a tangential relationship with meaning.&lt;br /&gt;&lt;br /&gt;Before we look at another example, let me also talk about another thing that Amazon did at one time. If you searched for the word abortion, in addition to your results you received what should be interpreted as a helpful search hint: "Did you mean &lt;em&gt;adoption&lt;/em&gt;?" This might sound political, but the logic of this lies in the similar spelling of the words adoption and abortion. Given that there are many more books about adoption than abortion at Amazon, the search engine guessed that someone typing the word &lt;em&gt;abortion &lt;/em&gt;might have misspelled something; the computer offered what it considered a &lt;a href="http://acronyms.thefreedictionary.com/Reasonable+Alternative"&gt;reasonable alternative&lt;/a&gt;. Had that suggested word been something different -- "Did you mean &lt;em&gt;apportion&lt;/em&gt;?" -- no one would have cared.&lt;br /&gt;&lt;br /&gt;By the way, I will admit that it is always possible that a company, like Amazon, could consciously manipulate its search results to accomplish some kind of selfish ends. &lt;a href="http://www.lycos.com"&gt;Lycos&lt;/a&gt; puts sponsored links at the top; &lt;a href="http://www.yahoo.com"&gt;Yahoo&lt;/a&gt; promotes its internal products over those of others; Amazon presents the products of its more lucrative partners over all others. 
It is not far-fetched to imagine a company exercising editorial control for political or religious purposes, especially in today's age. The problem is that some issues are perceived as so volatile that no one is willing to consider coincidence of language as just that, a coincidence. &lt;a href="http://www.mtoomey.com/poweroflanguage.html"&gt;Language is powerful&lt;/a&gt; stuff; spelling &lt;em&gt;women &lt;/em&gt;as &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Womyn"&gt;womyn&lt;/a&gt; &lt;/em&gt;to avoid the "men" letter subset is a powerful choice, whether you agree or not.&lt;br /&gt;&lt;br /&gt;Here's another story, from the late &lt;a href="http://en.wikipedia.org/wiki/1980s"&gt;1980s&lt;/a&gt;. A search for the word &lt;em&gt;monkey &lt;/em&gt;within a &lt;a href="http://office.microsoft.com/clipart/default.aspx?lc=en-us"&gt;database of clip art provided by Microsoft&lt;/a&gt; produced a seemingly offensive result: a picture of African-American children. There was an uproar, and although &lt;a href="http://www.microsoft.com"&gt;Microsoft&lt;/a&gt; denied that it had done anything intentionally racist, it quickly removed the image from the database. The real problem, however, is that the children in the image were playing on &lt;em&gt;&lt;a href="http://a9.com/monkey%20bars?a=simage"&gt;monkey bars&lt;/a&gt;.&lt;/em&gt; Interestingly, if you stop to think about it, the only racism in this example is caused by the person who performed the search! That's the person who actually connected the word &lt;em&gt;&lt;a href="http://a9.com/monkey?a=simage"&gt;monkey&lt;/a&gt; &lt;/em&gt;with the children (and not the playground equipment); no one at Microsoft did. In this example, the giant void where meaning should have been was automatically filled in by the searcher, by association and as a reflex.&lt;br /&gt;&lt;br /&gt;Here's another story, from last year. 
An article (I can't remember where) expressed how a bad critical review of a specific performer appeared more relevant in a &lt;a href="http://www.google.com"&gt;Google&lt;/a&gt; search than the good reviews -- or the performer's own website. This would be equivalent to searching for me ("Seth Maislin") and getting a top result of "Seth Maislin Has Bad Teeth" instead of &lt;a href="http://maislin.blogspot.com"&gt;this blog&lt;/a&gt;, &lt;a href="http://taxonomist.tripod.com"&gt;my website&lt;/a&gt;, or one of my &lt;a href="http://www.oreilly.com/news/seth_0799.html"&gt;interviews&lt;/a&gt; at &lt;a href="http://www.oreilly.com"&gt;O'Reilly &amp;amp; Associates&lt;/a&gt;. In this case, Google isn't passing judgment, but it certainly feels like it! Instead, it's looking at how popular that Bad Teeth article might be, or its host (for example, it might be a &lt;a href="http://online.wsj.com/public/us"&gt;Wall Street Journal&lt;/a&gt; or &lt;a href="http://people.aol.com/"&gt;People Magazine&lt;/a&gt; article, periodicals that have readerships thousands of times larger than anything I've ever done), and using that popularity to push the article to the top. It's assuming -- wrongly, in this case -- that people looking for me are less interested in my website than in what &lt;em&gt;&lt;a href="http://www.teenpeople.com/"&gt;Teen People&lt;/a&gt;&lt;/em&gt; or the &lt;em&gt;&lt;a href="http://www.wsj.com/"&gt;WSJ&lt;/a&gt;&lt;/em&gt; has to say.&lt;br /&gt;&lt;br /&gt;Search just doesn't care. 
If you're looking for meaning, don't ask a search engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114291245144607273?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114291245144607273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114291245144607273&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114291245144607273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114291245144607273'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/03/frustrated-by-lack-of-meaning.html' title='Frustrated by a lack of meaning'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114247872503814710</id><published>2006-03-15T21:15:00.000-05:00</published><updated>2006-12-18T00:22:46.946-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='human factors'/><category scheme='http://www.blogger.com/atom/ns#' term='keywording'/><title type='text'>Bikini emergencies</title><content type='html'>Index entries can be divided into two categories: the things people look up, and the things they don't. The first category is straightforward, but the second category is inspiring.&lt;br /&gt;&lt;br /&gt;I think it's fair to say that index entries that no one looks up are, well, better left out of the index in the first place. 
If you're reading a book about the cardiovascular system, there's little point in including the index entry "jack-o-lanterns."&lt;br /&gt;&lt;br /&gt;It's at this point that I'd love to say, "Enough said," but I'd be wrong. That's because having entries that no one looks up isn't really a problem, as long as you don't have too many of them. Ignoring the costs of the physical space they require, or the indexers' and editors' resources in making them actually appear in that physical space, unused index entries can exist without anyone really caring, like pennies in a penny jar. As long as the jar isn't full, no one cares.&lt;br /&gt;&lt;br /&gt;Why was the entry there in the first place, though? There are some legitimate reasons, of course, with the &lt;em&gt;most &lt;/em&gt;legitimate being a reflection of an author's non sequitur. It's not too far-fetched to imagine a cardiovascular surgeon authoring a textbook about the cardiovascular system, taking a moment to wax poetic in a footnote about how surgically cutting into the chambers of the human heart always reminds him of pumpkin carving on Halloween. If he takes even a half-sentence to explain &lt;em&gt;why&lt;/em&gt; heart surgery has something in common with ritualistic gourd mutilation, the indexer will notice this and create that silly entry for "jack-o-lanterns." No one will look it up, but that's not the indexer's fault, is it? She's just doing her thorough-as-usual job.&lt;br /&gt;&lt;br /&gt;Then there are the indexers who include things without realizing who the audience is. They include ideas that are too esoteric, off-topic, general, or &lt;a href="http://www.lssu.edu/banished/"&gt;inappropriate&lt;/a&gt; for people to look up. For example, in a book about pet care, they might include an entry for "&lt;a href="http://www.cartoonstock.com/directory/s/schnauser.asp"&gt;schnauzers&lt;/a&gt;. &lt;em&gt;See &lt;/em&gt;dogs." 
Alternatively, they index the &lt;em&gt;ideas &lt;/em&gt;people will look up, but under labels that no one would look up, like using the term "&lt;a href="http://en.wikipedia.org/wiki/Octothorpe"&gt;octothorpe&lt;/a&gt;" as a name for the # symbol.&lt;br /&gt;&lt;br /&gt;And of course there are always the honest mistakes: spelling or typographical errors, document file anomalies, outdated indexes for new materials, and so on.&lt;br /&gt;&lt;br /&gt;What most indexers seem to forget, however, is that just as these odd index entries might be created by accident -- author silliness, indexer ignorance, production oversight -- those very same index entries might be discovered by accident, too: reader serendipity. One of the great advantages of printed indexes over search results (or search-accessed indexes) is the serendipity that results from browsing. Just as we're likely to discover interesting words in the dictionary while trying to look up something else -- look up "&lt;a href="http://www.whitehouse.gov/history/presidents/tj3.html"&gt;Jefferson, Thomas&lt;/a&gt;" and find yourself reading about &lt;em&gt;jeffing&lt;/em&gt; and &lt;em&gt;jeffus&lt;/em&gt; -- so we might discover interesting things in those indexes.&lt;br /&gt;&lt;br /&gt;If you found "jack-o-lanterns" in a cardiology textbook, wouldn't &lt;em&gt;you &lt;/em&gt;follow the entry? I know I would. 
And that's why there's one last reason for including these unlikely-to-be-used entries in indexes: sheer joy.&lt;br /&gt;&lt;br /&gt;Did someone say, "&lt;a href="http://www.lindisima.com/en/bikini.htm"&gt;bikini emergency&lt;/a&gt;"?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114247872503814710?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114247872503814710/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114247872503814710&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114247872503814710'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114247872503814710'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/03/bikini-emergencies.html' title='Bikini emergencies'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114239594165965051</id><published>2006-03-14T21:45:00.000-05:00</published><updated>2006-03-14T23:14:46.346-05:00</updated><title type='text'>Fixing the books that leak</title><content type='html'>The more that I think about &lt;a href="http://taxonomist.tripod.com/schedule.html#neasi0406"&gt;the presentation I am giving&lt;/a&gt; in April to the &lt;a href="http://www.newenglandindexers.org/"&gt;ASI New England Chapter&lt;/a&gt;, the more I find myself contemplating the environment in which indexers 
work. Why do so many indexers have trouble bidding for work?&lt;br /&gt;&lt;br /&gt;Because clients don't grasp what this industry is about.&lt;br /&gt;&lt;br /&gt;I'll make the analogy between a book without an index and a leaky faucet. The client recognizes that his sink faucet is malfunctioning. He is unable or unwilling to &lt;a href="http://doityourself.com/baths/leakingfaucet.htm"&gt;fix the faucet&lt;/a&gt; himself. He investigates the services of a professional faucet fixer.&lt;br /&gt;&lt;br /&gt;Taking the three most common bidding environments for indexers, we get these three models for an owner of a leaky faucet.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Model 1: The Drowning Homeowners at Midnight. &lt;/strong&gt;Having waited too long, or perhaps because another faucet-fixer failed to complete the job, these clients need help NOW. They're willing to pay any amount of money, but their standards are very low. "Please," they say, "just stop the leak." If they happen to know how to reach a specific faucet-fixer at midnight, that faucet-fixer is going to make a &lt;a href="http://www.texasmoving.com/images/Main/PileMoney1.jpg"&gt;lot of money&lt;/a&gt;. But if there's no one they can call at midnight, they're going to call everyone, leave a lot of messages, and wade knee-deep in hundreds of early-morning responses. It's a bidding war, and the least expensive person wins.&lt;br /&gt;&lt;br /&gt;For indexers to make this model work, they need to have their names right there on the clients' desks, and they need to be prepared to sacrifice quality for the sake of an important deadline. However, this model is terrible for the industry, because it leads to bidding wars where indexers underbid each other for the privilege of doing a lousy job.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Model 2: The Paranoid Homeowners. 
&lt;/strong&gt;Here are clients who have worked with so many bad contractors over the years that &lt;a href="http://www.psychologytoday.com/articles/pto-20020301-000002.html"&gt;they simply don't trust anyone&lt;/a&gt;. They're looking for someone who will &lt;a href="http://www.diynetwork.com/diy/diy_kits/article/0,2019,DIY_13787_2754599,00.html"&gt;fix the faucet&lt;/a&gt; according to ridiculously robust specifications, and who will allow them to stand over their shoulders, watch the fixer's every move, and offer suggestions and instructions all along the way. It's obvious this homeowner would &lt;a href="http://doityourself.com"&gt;do the work himself&lt;/a&gt; if he could -- which means all of those specifications, suggestions, and instructions are coming from a place of very hostile ignorance. With this model, indexers will find themselves &lt;a href="http://en.wikipedia.org/wiki/Insult"&gt;burned&lt;/a&gt; almost every time.&lt;br /&gt;&lt;br /&gt;This model doesn't help the individual indexers or the indexing industry, yet the model exists only because indexers as a whole failed to uphold any kind of standards, or to educate their clients about indexing. It's not uncommon for a more experienced indexer to clean up the mess left behind by someone without appropriate experience or sufficient resources to get the job done right the first time. When faced with this model, the indexer must be prepared to uphold his own principles, explain his choices, demonstrate precedent, and remain friendly throughout. I suppose it's like trying to talk about love to someone who &lt;a href="http://love.ivillage.com/lnsproblems/lnsdivorce/0,,nx2c,00.html"&gt;just got divorced&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Model 3: The Proud Fixer-Uppers. 
&lt;/strong&gt;Then there are the folks who would be happy to &lt;a href="http://www.ehow.com/how_15854_fix-leaky-faucet.html"&gt;fix the faucet&lt;/a&gt; themselves but don't have the patience, the know-how, or the resources. They want help, but on some level they resent needing it. These are the people who, when presented with a faucet-fixing cost of $80, ask, "Will you do it for $40?" If you say no, they'll say, "What if you just put the pieces in a line and I'll wrench them together myself?" Say hello to the &lt;a href="http://www.islandnet.com/~luree/silly.html"&gt;silly goons&lt;/a&gt; who would rather let their faucets leak than admit they need help worth paying for.&lt;br /&gt;&lt;br /&gt;This model exists because the industry is underrespected. If someone had an overflowing toilet, do you really think they'd try to haggle with the plumber? But with indexing, this happens all the time. "Just index the headings" and "Do the best you can in only three days" are all too common in an industry where people aren't aware of the advantages of a good index. The individual indexers perpetuate this problem by accepting these underpriced, undervalued jobs and creating mediocre products.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Yes, there is a fourth model, and that's the ideal situation: someone who understands that indexing is a specialized trade that requires a professional with the appropriate education, background, and resources to get the job done. If this model were the most common, though, would indexers still be afraid of bidding? Nope.&lt;br /&gt;&lt;br /&gt;What would it take to convert the clients of the three models above into the clients of a more reasonable, &lt;em&gt;respectful &lt;/em&gt;model? I've heard a lot of suggestions -- education, indexing standards, indexer credentialing -- but the problem runs a lot deeper than most people think. 
There are two fundamental issues here: knowing what a good index is, and knowing that an indexing profession exists.&lt;br /&gt;&lt;br /&gt;With a faucet, all we need is water on the floor to know it's broken, but you can't "replace a washer" to make a bad index into a good index. And while most home dwellers have heard of plumbers, few of the world's literate population have heard of indexers! Trying to educate the world about indexing is like trying to explain the &lt;a href="http://en.wikipedia.org/wiki/Ideal_gas_law"&gt;ideal gas law&lt;/a&gt; to toddlers: they may breathe the air, but that doesn't mean they know how it works. (And don't ask their parents, because they don't know either!)&lt;br /&gt;&lt;br /&gt;The law that mandated the wearing of &lt;a href="http://www.buckleupamerica.org"&gt;seat belts&lt;/a&gt; in motor vehicles raised awareness of their value. Would they work for indexers?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114239594165965051?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114239594165965051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114239594165965051&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114239594165965051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114239594165965051'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/03/fixing-books-that-leak.html' title='Fixing the books that leak'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114125233156515740</id><published>2006-03-01T12:09:00.000-05:00</published><updated>2006-12-18T00:24:05.692-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='content management'/><title type='text'>College content management</title><content type='html'>If you thought online education was a growing industry, you underestimated it. It's about to explode. That's because Congress is now allowing federal financial aid to students of colleges that teach more than half of their courses off-campus, including over the World Wide Web. (See &lt;a href="http://www.nytimes.com/2006/03/01/national/01educ.html"&gt;"Online Colleges Receive a Boost From Congress," The New York Times, March 1 2006&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Let me rephrase that. &lt;em&gt;Colleges &lt;/em&gt;are going to explode. Like scrambling an egg.&lt;br /&gt;&lt;br /&gt;There are two types of courses that are likely to appear online at exponential speeds. First, you have the low-end requirement courses, like &lt;a href="http://www.google.com/search?q=calculus+101"&gt;Calculus I&lt;/a&gt; and &lt;a href="http://www.google.com/search?&amp;q=english+101"&gt;English 101&lt;/a&gt;, which are courses everyone has to take to earn a bachelor's degree. These courses are regularly populated with large numbers of students, and yet they are introductory-level, tried-and-true courses that are relatively simple to teach and grade. (In fact, graduate students usually teach these courses.) Instead of wasting valuable classroom space and valuable instructor time, these courses will appear online. 
For large universities, there's an even greater advantage: hundreds of students can be taught at once without that "giant lecture hall" atmosphere, and without having to create numerous sections (class segments) for advising lessons or grading purposes.&lt;br /&gt;&lt;br /&gt;The other type of course that is going to end up online is the kind that professors are just climbing all over themselves to teach, either because it's a cutting-edge idea, the product of a personal info-lust or -peeve, or simply because it's so esoteric that their departments have never granted them the opportunity to teach something that almost no one will attend. These kinds of courses will prosper online because students from all over the world can attend. That sociology elective called "What License Plates Tell Us About Our Culture" won't get just one student any more, but tens of students, and that makes professors and universities happy. I'm an online instructor of a rather esoteric subject -- &lt;a href="http://www.asindexing.org"&gt;indexing&lt;/a&gt; -- and so being able to offer &lt;a href="http://www.middlesex.mass.edu/CareerTraining/WritingIndexesforBooksandWebsites.htm"&gt;my indexing course&lt;/a&gt; over the Internet has enabled me to reach dozens of indexing professionals and enthusiasts from around the globe every year. Had I continued to teach only in person, the course might have been cancelled for lack of interest.&lt;br /&gt;&lt;br /&gt;So suppose &lt;a href="http://dir.yahoo.com/Education/Higher_Education/Colleges_and_Universities/United_States/"&gt;every college in the United States&lt;/a&gt; retools that Calculus I class into an online course. It's not easy &lt;a href="http://www.trainingcafe.com/members/coursesite/"&gt;building an online course&lt;/a&gt;, but talk about unnecessary redundancy! No, if department heads are smart, they'll team with the department heads from other colleges and share a course. 
For example, I can imagine a single English 101 course offered to every college student in &lt;a href="http://www.mass.gov/portal/index.jsp"&gt;Massachusetts&lt;/a&gt;. I'm not saying this is a perfect idea, but then again, neither is having &lt;a href="http://dir.yahoo.com/Education/Higher_Education/Colleges_and_Universities/By_Region/"&gt;14538&lt;/a&gt; of them.&lt;br /&gt;&lt;br /&gt;This is a content management problem (which is why I'm blogging about this). &lt;a href="http://www.amazon.com/gp/product/0735713065/qid=1141251409/sr=1-3/ref=sr_1_3/002-5416310-4424857?s=books&amp;amp;v=glance&amp;n=283155"&gt;Content management&lt;/a&gt; is about many things, including (a) avoiding redundancy in communication; (b) avoiding the communication of inaccurate, outdated, or contradictory things; (c) providing the correct information for each audience subset, whether it's a single person (like "Welcome, &lt;a href="http://taxonomist.tripod.com"&gt;Seth&lt;/a&gt;!") or a large group (like English speakers); and (d) communicating everything that needs to be said. Content management is a huge issue in the distribution of information, one that became even more obvious with database use and the Internet.&lt;br /&gt;&lt;br /&gt;The magic question here is this: How many different Calculus I courses do we need? From a production standpoint, our goal is to have only one. 
In reality, however, there are many reasons a student might prefer one version of this course over another: the instructor (charisma, ability to teach, ability to communicate, educational background, current interests, track record in past courses, reputation in the industry, etc.); the course materials (ability to relate to examples, quantity of independent and group exercises, immediate relevance of exercises to students [e.g., local interest], ethical choices, strictness of prerequisite management, etc.); the course delivery (tool requirements, number of lectures, number of students, grading methods, student-to-teacher and student-to-student interactions, what percentage of the course is different from that in previous semesters); and the course environment (reputation of university, opportunity for real-time meetings [chat, voice, face-to-face], textbook requirements); and so on. As you can see, there are a lot of reasons one course might be "better" than another, for a particular student.&lt;br /&gt;&lt;br /&gt;And so you have a battle: competition for students vs. need to teach the basic materials. Colleges and universities are already doing this at a macro level -- compare &lt;a href="http://www.mit.edu/"&gt;MIT&lt;/a&gt; to &lt;a href="http://www.bhcc.edu/"&gt;Bunker Hill Community College&lt;/a&gt;, as in the film &lt;a href="http://www.imdb.com/title/tt0119217/"&gt;Good Will Hunting&lt;/a&gt; -- but the competition at the lower level is going to be very interesting.&lt;br /&gt;&lt;br /&gt;Further, if course development follows the path of computing, parts of courses are going to be delivered as if by subscription. Imagine some guy in his attic churning out math problems for fun. (Believe me, &lt;a href="http://calc101.com/"&gt;it's real&lt;/a&gt;.) Professors can subscribe to this guy in the same way you find those &lt;a href="http://www.sudoku.com/"&gt;Sudoku puzzles&lt;/a&gt;. 
In computer parlance this is called &lt;a href="http://en.wikipedia.org/wiki/Distributed_computing"&gt;distributed computing&lt;/a&gt;: where a bunch of computers are working together, but each is working on a separate problem. (It's sort of like wearing a wristwatch to tell the time, carrying an &lt;a href="http://www.apple.com/ipod/"&gt;iPod&lt;/a&gt; to listen to music, and attaching a &lt;a href="http://www.laserengravedkeychains.com/bevwrench.htm"&gt;bottle opener to your keychain&lt;/a&gt;, instead of investing in a WatchPod Opener.)&lt;br /&gt;&lt;br /&gt;Distributed education means the end of campus life as we know it! Professors moderate courses written by dozens of international specialists, students take courses moderated in other countries, and grades are applied to the diploma of your choice. Campuses become less about learning and more about community events, just as &lt;a href="http://www.publiclibraries.com/"&gt;public libraries&lt;/a&gt; and &lt;a href="http://www.mallofamerica.com/"&gt;shopping malls&lt;/a&gt; have been forced to evolve.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.congress.org/"&gt;Congress&lt;/a&gt; has made the right choice, but is this country truly ready for college management?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114125233156515740?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114125233156515740/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114125233156515740&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114125233156515740'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/22121434/posts/default/114125233156515740'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/03/college-content-management.html' title='College content management'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114089235143257003</id><published>2006-02-25T13:20:00.000-05:00</published><updated>2006-03-01T17:34:28.563-05:00</updated><title type='text'>Custom organization schemes</title><content type='html'>&lt;p&gt;In previous entries I've talked about exact and ambiguous schemes. Now I want to talk about "everything else that remains," the &lt;em&gt;custom organization scheme.&lt;/em&gt;&lt;/p&gt;&lt;p&gt;The three most common exact schemes -- alphabetical, chronological, and spatial -- aren't the only exact schemes in the world. There are also numeral (1, 12, 166, 2, 22, 266, 3, ...) and numerical/counting (1, 2, 3, 12, 22, 32, 166, 266, ...) schemes, and there are schemes where the sort style is based on some other intrinsic characteristic or standards, such as the periodic table of elements (numerical order by proton count) and the colors of the rainbow (numerical order by wavelength values). Although it's challenging to think up exact schemes that aren't based on numbers, characters, time, or space, it is incredibly easy to think up &lt;em&gt;subjective (ambiguous) &lt;/em&gt;ways of organizing information. Below are several suggestions. 
Although some of these possibilities seem quite similar to numerical order, numerical order never changes with time, whereas these can.&lt;br /&gt;&lt;br /&gt;by frequency of use&lt;br /&gt;by importance&lt;br /&gt;by logical complexity&lt;br /&gt;by sensory intensity (e.g., brightness)&lt;br /&gt;by mood&lt;br /&gt;by personal interest&lt;br /&gt;by profitability&lt;br /&gt;by likelihood to inspire controversy&lt;br /&gt;by necessity to avoid a lawsuit&lt;br /&gt;by the order in which you thought it up&lt;/p&gt;&lt;p&gt;The subjectivity of each of these goes even further, because you can choose your audience for these sorting schemes:&lt;br /&gt;&lt;br /&gt;by importance to the author&lt;br /&gt;by importance to the common user&lt;br /&gt;by importance to the experts&lt;/p&gt;&lt;p&gt;... and so on.&lt;/p&gt;&lt;p&gt;It's easy to get confused between complicated exact ordering schemes -- like the periodic table, which is in order of proton count -- and custom ambiguous schemes. In both cases, the sorting scheme may not be obvious, at which point it's easy to assume it's obtusely subjective or completely arbitrary. For example, the following are three ways to organize the English alphabet:&lt;br /&gt;&lt;br /&gt;a b c d e f ...&lt;br /&gt;q w e r t y ...&lt;br /&gt;e t a o i n ...&lt;br /&gt;&lt;br /&gt;The first is alphabetical order. The second is spatial order (letters on the standard keyboard, affectionately known as the QWERTY keyboard). The third is in order of frequency of use in the American English language (the letter E is the most commonly used letter in the lexicon). I would argue that for all practical purposes, these are all exact schemes. There is no subjectivity here. To sort the alphabet in a subjective way, I'd suggest by ease of pronunciation. :-)&lt;/p&gt;&lt;p&gt;My point is that in the end, most custom schemes are exact and not ambiguous. 
Instead, they are &lt;em&gt;translations &lt;/em&gt;of intrinsic orders into something exact (again, like the periodic table being in numeric order by proton count). Alternatively, they are exact or ambiguous schemes that are time-dependent, such that the scheme itself varies over time or application.&lt;/p&gt;&lt;p&gt;In summary, you should consider using a custom scheme when (a) you have an internal order that you want to obey, or (b) you have an order that changes over time. In both cases, however, it's important that your users understand that your schemes are not necessarily obvious. The periodic table requires training to understand and use; meanwhile, because the search results at Google.com tend to vary over time, some people are disturbed by getting different results for the same search they used yesterday. If you can't use an intuitively obvious scheme (like alphabetizing) or a subconsciously obvious scheme (like task or topic order), it's probably a good idea to find a way to impose a numeric scheme on top of your results. 
The periodic table has a key, and search engines often have a "relevance percentage."&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114089235143257003?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114089235143257003/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114089235143257003&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114089235143257003'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114089235143257003'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/custom-organization-schemes.html' title='Custom organization schemes'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114072698417326401</id><published>2006-02-23T15:33:00.000-05:00</published><updated>2006-02-23T15:36:24.190-05:00</updated><title type='text'>Toronto conference</title><content type='html'>The international indexing conference is scheduled for 15-17 June 2006, in Toronto, Ontario, Canada. It's my job, as president-elect of the &lt;a href="http://www.asindexing.org"&gt;American Society of Indexers&lt;/a&gt;, to plan this event. Boy, planning a conference can be a challenging thing sometimes.&lt;br /&gt;&lt;br /&gt;Finally, though, we have a schedule in place. 
The ASI website will have that information soon, but in the meantime you can join the &lt;a href="http://groups.yahoo.com/groups/asiconference"&gt;asiconference mailing list&lt;/a&gt; for current and archived information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114072698417326401?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114072698417326401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114072698417326401&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114072698417326401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114072698417326401'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/toronto-conference.html' title='Toronto conference'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114047306303078875</id><published>2006-02-20T16:42:00.000-05:00</published><updated>2006-02-20T17:04:24.063-05:00</updated><title type='text'>Even bad entries can be good, in context</title><content type='html'>In technical documentation, often I've seen main entries like "creating user accounts, 325" and "managing settings," where the first word is a gerund that has questionable value. When I'm asked why this isn't a valuable entry, I explain that the ideas of "creating" and "managing" are rather vague. 
I also explain that if someone wants to know how to create something, he is more likely to look up the &lt;em&gt;something.&lt;/em&gt; For example, if you were interested in how to bake a cake, would you look up "baking" or just "cakes"? I believe you'd look up "cakes" first.&lt;br /&gt;&lt;br /&gt;Strictly speaking, however, there is nothing wrong with entries like "creating accounts" and "baking cakes," even though &lt;em&gt;accounts &lt;/em&gt;and &lt;em&gt;cakes &lt;/em&gt;are the more specific and more likely targets of most users. Entries under these generic or vague terms (creating, managing, etc.) become valuable, however, when these terms have greater meaning within their context. In religious texts, the concept of &lt;em&gt;creation&lt;/em&gt; (as in "world creation," often written with a capital C) is worth indexing. In a business textbook, the concept of &lt;em&gt;management&lt;/em&gt; is worth indexing. And in an instructional cookbook that explains how ovens are used in the most general sense, an entry for "baking" might be appropriate.&lt;br /&gt;&lt;br /&gt;The guideline to avoid these general terms, declaring the resulting entries as "bad," ignores context. The argument that readers are more likely to look up the objects of these actions -- "messages, writing" instead of "writing messages" -- doesn't negate the value of having these gerunds as available access points. Actions are still concepts and deserve to be indexed; in technical documentation, task-oriented language has even greater value than average. Consider these entries:&lt;br /&gt;&lt;br /&gt;filtering email messages, 000&lt;br /&gt;spell-checking your document, 000&lt;br /&gt;upgrading applications, 000&lt;br /&gt;&lt;br /&gt;The reason these work so well is that the ideas of &lt;em&gt;filtering, spell-checking, &lt;/em&gt;and &lt;em&gt;upgrading &lt;/em&gt;are distinct ideas of importance to technology users. 
In fact, the above term &lt;em&gt;document &lt;/em&gt;is itself a bit vague (whereas &lt;em&gt;file &lt;/em&gt;is not), and beginners know &lt;em&gt;applications&lt;/em&gt; better as &lt;em&gt;programs. &lt;/em&gt;So you can see just how valuable using those gerunds can be. Dismiss them at your peril.&lt;br /&gt;&lt;br /&gt;By the way, if you need a practical clue regarding their use, take a look at the full set of documentation and ask yourself if you can use that gerund much more often than you already are. For example, you might have a section where "creating accounts" is obvious, but does that same book talk about the creation of other things, like passwords, files, security filters, network connections, and so on? Just because the word "creation" isn't used (passwords are &lt;em&gt;invented, &lt;/em&gt;filters are &lt;em&gt;applied,&lt;/em&gt; networks are &lt;em&gt;initiated, &lt;/em&gt;etc.) doesn't mean it's wrong. If after some serious thought you realize that you'd have dozens or more subentries under that gerund, consider getting rid of it as not specific enough. 
Other candidates for nonspecific gerunds in technical documentation include &lt;em&gt;installing, configuring, customizing, opening/closing, hiding/showing, deleting, starting/stopping/quitting, editing/modifying/altering, &lt;/em&gt;and &lt;em&gt;accessing.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114047306303078875?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114047306303078875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114047306303078875&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114047306303078875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114047306303078875'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/even-bad-entries-can-be-good-in.html' title='Even bad entries can be good, in context'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-114004548849374797</id><published>2006-02-15T17:50:00.000-05:00</published><updated>2006-02-19T18:29:39.523-05:00</updated><title type='text'>You have sixty days until Tax Day (U.S.)</title><content type='html'>I have two months to finish preparing my taxes. I know a lot of people out there who much prefer working with tax accountants and advisors, but I much prefer doing the work myself. 
All year long I throw anything related to my financial condition in a big red crate. Every year, usually in mid-February, I upend that crate and start putting everything into piles. After three days of labor, those piles of papers have been sorted, organized, reviewed, mined for valuable data, calculated, recalculated, tabulated, recorded, and sent in one big envelope each to the &lt;a href="http://www.irs.gov"&gt;Internal Revenue Service&lt;/a&gt; and the &lt;a href="http://www.mass.gov/portal/index.jsp"&gt;Commonwealth of Massachusetts&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Three days, yeah. You probably think I'm &lt;a href="http://cagle.msnbc.com/news/Taxes/Taxes%202004/bennett.jpg"&gt;crazy&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's why I go through that, without an accountant. First of all, the three days of sorting and tracking have to be done -- by someone at some time -- before &lt;a href="http://cagle.msnbc.com/news/Taxes/"&gt;Tax Day&lt;/a&gt;. I could do a little bit of that work every day using software like &lt;a href="http://www.microsoft.com/money/default.mspx"&gt;Microsoft Money&lt;/a&gt; or a spreadsheet, or even using a pencil and a ledger. Alternatively, I could create folders for myself and sort the paperwork as it arrives, exchanging that one big red crate for two dozen hanging file folders. And if I wanted, I could instead bring that big crate to the nice accountants and say, "Add this for me," and not worry about it at all. But because it has to be done, and because I'm much more qualified to know when a certain phone call is a business expense, when a certain receipt is a medical expense, or when a piece of paper fell into the crate by mistake, the person who should be doing most of that adding is, of course, me. 
And since it really is just addition, why should I pay an accountant when I am fully capable of using a calculator on my own?&lt;br /&gt;&lt;br /&gt;Second, the one arena where the ability of accountants completely overshadows my own is in their understanding of tax law, and how those laws apply to my earnings, debts, and expenses. But I don't like not knowing about the laws that affect me. With the same curiosity I feel about &lt;a href="http://www.consumer.gov/idtheft/"&gt;who in the world might have my social security number&lt;/a&gt;, I want to know how much of my money is going to fund my government. Going through the distress of all that mathematics pays off, because now I understand how my &lt;a href="http://www.ssa.gov/"&gt;Social Security&lt;/a&gt; payments are calculated, where the &lt;a href="http://72.14.207.104/search?q=cache:GQ4jcDR5sfIJ:www.irs.gov/pub/irs-pdf/p15.pdf+tax+tables&amp;hl=en&amp;amp;gl=us&amp;ct=clnk&amp;amp;cd=1"&gt;tax tables&lt;/a&gt; were derived, and how much of my money was actually spent on medical expenses, mortgage payments, and investment fees.&lt;br /&gt;&lt;br /&gt;And there's a corollary to all this, and it's a consequence of waiting until mid-February to look at this paperwork: Doing my taxes is when I figure out exactly how much I make! Sure, I &lt;a href="http://www.planabudget.com/check_balancing.htm"&gt;watch my checkbook balance&lt;/a&gt; go up and down, but I don't waste my time looking at those numbers on a daily, weekly, or even monthly basis. Once a year is good enough for me. February is therefore a big day of numerical realization. For example, this year I discovered that 40% of my gross indexing income came from a single client. 
Other discoveries include how much less interest I'm paying on my mortgage from last year; how much money I invested in the house (which is important because I work at home and can deduct some of it as an expense); how much money I spent on &lt;a href="http://www.usps.gov/"&gt;postage&lt;/a&gt; and &lt;a href="http://www.fedex.com"&gt;shipping&lt;/a&gt;; how often I used my car for business; and all the other little things like phone bills, photocopying, client and colleague lunches, medical expenses, bank interest, retirement investment, and so on. If ever I needed a reality check, this is it, and it's a lesson on a global scale. It's one thing to see how much money I spent on business travel, but quite another to see that number next to how much money I spent on business advertising.&lt;br /&gt;&lt;br /&gt;Not only do I learn about the tax-related information, however, but I really learn about everything related to money. Once a year I thoroughly read my credit card statements, line by line. Under the pretense of looking for those $4.95/mo payments for website hosting at &lt;a href="http://www.tripod.com"&gt;tripod.com&lt;/a&gt;, I have the delicious opportunity to reminisce about each year's events, like the birth of my daughter, that vacation with my wife, the day I bought &lt;a href="http://www.tivo.com/"&gt;TiVo&lt;/a&gt;, the huge party catered by &lt;a href="http://www.blueribbonbbq.com/default.htm"&gt;Blue Ribbon Barbecue&lt;/a&gt;, etc.&lt;br /&gt;&lt;br /&gt;And finally, if ever there's a reason to say "Yes, I do my own taxes," it's to impress everyone! It takes them a moment to cough and gasp with incredulity, and to tell me that I'm insane not to have an accountant because they and everyone they know has an accountant, but then the light dawns. "Wow," they think, "this guy must really ____."&lt;br /&gt;&lt;br /&gt;a) be smart?&lt;br /&gt;b) have a lot of patience?&lt;br /&gt;c) enjoy building character?&lt;br /&gt;&lt;br /&gt;Fill in your own blanks. 
That's what I do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-114004548849374797?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/114004548849374797/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=114004548849374797&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114004548849374797'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/114004548849374797'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/you-have-sixty-days-until-tax-day-us.html' title='You have sixty days until Tax Day (U.S.)'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-113969273955456862</id><published>2006-02-11T15:54:00.000-05:00</published><updated>2006-02-12T16:53:53.426-05:00</updated><title type='text'>Organizing ambiguously</title><content type='html'>In a continuation of my post from 8 Feb, I want to say a few words about ambiguous or subjective sorting. First of all, ambiguous sorting is &lt;em&gt;cool. &lt;/em&gt;Unlike the exact methods I wrote about, ambiguous sorting schemes actually pay attention to the meaning of what's being sorted.&lt;br /&gt;&lt;br /&gt;Topical schemes sort categories based on what they're about, like the organization of a textbook (simple to complicated). 
Task-oriented schemes are organized in "doing" order, such as the steps one takes to make a sandwich. Audience-oriented schemes separate items according to who wants them, like the split between members and nonmembers, or the MPAA categories for movie age appropriateness (e.g., PG-13).&lt;br /&gt;&lt;br /&gt;Consider how you might organize the steps involved in visiting a website. In task order they would appear as (1) turn on computer, (2) open the browser, (3) type in the website address, (4) read the page, (5) close the browser, and (6) turn off the computer. In topic order, the "turn on" and "turn off" items would be combined, since people pair these together; same with "open the browser" and "close the browser." In alphabetical order, on the other hand, the browser is closed before it's opened, and the computer is turned off before it's turned on. Using alphabetical order here would be pretty stupid, eh? (Did you ever notice that the options under the File menu aren't alphabetized?)&lt;br /&gt;&lt;br /&gt;The problem with ambiguous systems, of course, is that people don't necessarily categorize things in the same way. In fact, this is why people can't find things in other people's kitchens! (See my post from 7 Feb.) We don't put measuring spoons, soup spoons, and serving ladles in the same drawer, even though they're all spoons. Instead, we interpret categories according to how &lt;em&gt;we&lt;/em&gt; perceive these connections, subjectively.&lt;br /&gt;&lt;br /&gt;Think about how restaurant menus are organized. First they are organized in task order: appetizers in the front, entrees in the middle, and desserts near the back. Then they might be organized by ingredient (e.g., all the pasta dishes appear together), although it's unclear in what order these ingredients are listed. And within these categories, what's the order? 
Maybe it's profitability; maybe it's to show off the chef's skills or the breadth of available selections; maybe it's to put the most popular or intriguing items at the top.&lt;br /&gt;&lt;br /&gt;By the way, in my opinion audience-oriented categorization is the most powerful and useful of all sorting techniques, and yet it's woefully underutilized. Organizing items by popularity is simply a variation on this idea.&lt;br /&gt;&lt;br /&gt;As soon as you start really looking at how people use information, you will run away from exact schemes almost immediately, as much as possible. (Long flat lists, like the entries in an index, still demand some sort of umbrella sorting. Short lists, like the options in a computer menu, are fair game. Have you ever noticed that the choices under the File menu are generally in task order?)&lt;br /&gt;&lt;br /&gt;Nothing in life is exact, so why do we force it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-113969273955456862?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/113969273955456862/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=113969273955456862&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113969273955456862'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113969273955456862'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/organizing-ambiguously.html' title='Organizing ambiguously'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-113953227990805702</id><published>2006-02-09T19:41:00.000-05:00</published><updated>2006-02-11T15:53:46.796-05:00</updated><title type='text'>International indexing conference in Toronto, Ontario (15-17 June 2006)</title><content type='html'>If you're just tuning in, you may not realize that the international indexing conference, co-sponsored by the &lt;a href="http://www.asindexing.org"&gt;American Society of Indexers&lt;/a&gt; and the &lt;a href="http://www.indexingsociety.ca"&gt;Indexing and Abstracting Society of Canada&lt;/a&gt;, is coming up fast! Conference information is available at both websites.&lt;br /&gt;&lt;br /&gt;I'm the American in charge of the conference (and president-elect of ASI), and my Canadian counterpart is Ruth Pincoe. If you have any questions about the conference, please write &lt;a href="mailto:conference@asindexing.org"&gt;conference@asindexing.org&lt;/a&gt; and I'll be happy to provide an answer. A preliminary schedule of events is about to be released on the asiconference mailing list. 
(To subscribe, send a blank mail to &lt;a href="mailto:asiconference-subscribe@yahoogroups.com"&gt;asiconference-subscribe@yahoogroups.com&lt;/a&gt;.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-113953227990805702?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/113953227990805702/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=113953227990805702&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113953227990805702'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113953227990805702'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/international-indexing-conference-in.html' title='International indexing conference in Toronto, Ontario (15-17 June 2006)'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-113944296507445343</id><published>2006-02-08T18:36:00.000-05:00</published><updated>2006-02-11T15:53:23.773-05:00</updated><title type='text'>Organizing as exactly as possible</title><content type='html'>It should come as no surprise that information can be organized in many, many ways. 
According to &lt;a href="http://www.amazon.com/gp/product/0596000359/qid=1139441800/sr=1-1/ref=sr_1_1/002-5416310-4424857?s=books&amp;v=glance&amp;amp;n=283155"&gt;Rosenfeld &amp; Morville&lt;/a&gt;, there are two major types of sorting schemes: &lt;strong&gt;exact &lt;/strong&gt;(objective) and &lt;strong&gt;ambiguous &lt;/strong&gt;(subjective). Personally, I like to imagine a third category, &lt;strong&gt;custom, &lt;/strong&gt;that overlaps both and therefore deserves special treatment.&lt;br /&gt;&lt;br /&gt;Exact sorting schemes are things like alphabetically ordered, numerically ordered, chronologically ordered, and spatially ordered. These schemes require that you follow an accepted and hopefully well-known sequence, such as from A to Z, from 1 to 9, or from top to bottom. Exact schemes do allow you to reverse order -- my blog, for example, sorts each entry in reverse chronological order, with the newest entries at the top -- but they don't allow you to start mixing things up. In an alphabetically ordered list of words, words starting with C will always appear between the B-words and the D-words.&lt;br /&gt;&lt;br /&gt;There are three major problems with exact schemes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;They assume that your audience knows the scheme. In most situations this isn't hard, but it's a big assumption. There are many native English speakers who still have to sing "The Alphabet Song" when trying to remember whether J comes before K. For spatially ordered data, the audience must know the associated spatial map; in the United States, there aren't many people who can tell you exactly where each state is. (People in New England, where I am, tend to have problems with the block-shaped states in the middle of the country; people outside of New England can't remember which tiny state is New Hampshire, and which tiny state is Vermont.)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Your scheme is limited to its own items. Alphabetical order doesn't govern where numbers go. 
Some people sort numbers as if they were spelled out, putting "16 Candles" under S for sixteen. Other people will put all the numbers before the A-words, or after the Z-words. Punctuation also doesn't sort easily; which comes first, &lt;em&gt;it's&lt;/em&gt; or &lt;em&gt;its&lt;/em&gt;? And then there's the question of where spaces go (look up word-by-word sorting and letter-by-letter sorting) or if capitalization makes a difference (&lt;em&gt;windows&lt;/em&gt; vs. &lt;em&gt;Windows&lt;/em&gt;). I've given you all alphabetical examples, but the same is true in all other exact schemes. On a timeline, for example, how do you handle repeating items? On a calendar of days, how do you record something that lasts two weeks, or breaks for lunch hour? Do you record time ranges (like hour-long appointments) as different from single moments (like 5:35pm)? In spatial ordering, how do you identify things that constantly move, or places that can be identified in multiple ways?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Exact schemes have no meaning. It's true that A-words appear before B-words in the alphabet, but is that because B is a lesser letter? In a week that runs from Sunday to Saturday, why must an important Thursday event be buried in the middle? Do the units that make something sort a certain way -- the letters in a word's spelling -- have any meaning whatsoever? No, they don't. In fact, if you're going to sort something exactly, you have to completely ignore what it means and chop it into little meaningless or random components. If you have a friend whose name starts with Z, your friend knows what it's like to be treated as inferior for no valuable reason whatsoever.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Of course, these three reasons aren't enough to toss exact schemes into the rubbish bin completely. Exact schemes are &lt;em&gt;easy.&lt;/em&gt; You can get a computer to sort things almost instantly, and most audiences have no trouble using them despite their shortcomings. 
However, my last point -- that they're meaningless -- is why there are so many better options.&lt;/p&gt;&lt;p&gt;I'll talk about ambiguous and custom schemes in my next posting.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-113944296507445343?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/113944296507445343/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=113944296507445343&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113944296507445343'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113944296507445343'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/organizing-as-exactly-as-possible.html' title='Organizing as exactly as possible'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-113937403709296280</id><published>2006-02-07T23:16:00.000-05:00</published><updated>2006-02-11T15:52:22.476-05:00</updated><title type='text'>Organizing the kitchen</title><content type='html'>I've always been fascinated by how challenging it can be to make analogies between indexing and the "real world," when in fact we organize and retrieve things all the time. 
So I'm always looking at the kitchen as my model of information organization.&lt;br /&gt;&lt;br /&gt;First of all, why is it so hard to find things in other people's kitchens? Doesn't everybody keep the trash can under the sink? Isn't cutlery always in a waist-high drawer near the sink? Don't people keep their drinking glasses and coffee mugs in the same cupboard? Apparently not.&lt;br /&gt;&lt;br /&gt;We organize our kitchens for ourselves. If we are living alone, we only need to put things where we want them to be. If we are living with others, we do our best to compromise with our home-mates and protect our children. The things we rarely use go way up high; the things we don't want our kids to get are up high or behind a lockable door. Everything else goes where it fits, where we can reach, and next to the areas where we're most likely to use them. So for some people, coffee mugs and water glasses are stored together because they fit neatly beside each other (unlike glasses and bowls). For other people, the coffee mugs are stored closer to the coffee maker, in the same cabinet as the sugar bowl and the coffee filters. Both of these choices involve organizing by function -- the function of drinking, the function of enjoying coffee -- but the results are personal. The kitchen is &lt;em&gt;ours.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Indexing involves turning your kitchen into a place that other people can use just as easily. This means you have to organize your kitchen in such a way that people don't have to ask you where the spoons are, but instead could just walk in and find exactly the spoon or other object they need. Your personal guidance should become unnecessary, because the kitchen is intuitively and universally organized. No one will ever open the wrong drawer or door or canister again.&lt;br /&gt;&lt;br /&gt;Yeah, right.&lt;br /&gt;&lt;br /&gt;Basically, you have four choices. The first choice is to label everything. 
Every drawer, every cabinet, every appliance, and every countertop object should have a little piece of paper attached to it. The cutlery drawer might be labeled CUTLERY. The refrigerator might be labeled COLD FOOD. But this is not as easy as it sounds. What, other than cutlery, is in your cutlery drawer? A can opener? Twist ties? Napkin rings? Meanwhile, your refrigerator may contain cold food, but what kinds of food are kept cold? Are your apples in there, or are they in a bowl? Do you use fresh milk, or do you buy your milk in those boxes? You see, labeling is only as good as your labels. Don't you dare create a label for SPOONS, because you have teaspoons, dessert spoons, wooden spoons, slotted spoons, sugar spoons, serving spoons, antique decorative spoons, plastic spoons, and sporks in your kitchen.&lt;br /&gt;&lt;br /&gt;Clearly the problem is that your kitchen isn't perfectly organized. Why aren't all your spoons in one place? So pull everything out and lay it down on a freshly washed floor, and reorganize it. Put all of your spoons in one place. Everything you might call a plate or a platter goes together. Everything you eat goes against one wall, and everything you don't eat goes against the other walls. And finally, your labels make sense. Of course, you've sacrificed your kitchen for the sake of everyone else, but wasn't that the point? No! This is the problem with the Dewey Decimal System in some public libraries: nobody knows how to find anything except the librarians! But I'll tell you, if you want to learn about a topic, you might just discover that everything on that topic is in the same exact place.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Almost &lt;/em&gt;guarantee. 
Go to the library with an interest in World War II, and you'll find yourself in the history section to read about history, the romance section to read historical fiction, the fiction area to find some spy thrillers, the newspaper archive to read old news articles, the magazine section to read current articles, the science area to read about radar, the aeronautics section to read about the airplanes, the humor section to read those funny WWII joke books, and so on. The same is true with our kitchen, where the same knife can be used to cut food, spread jam, open envelopes, and even unclog the drain. Your kitchen objects, like words in the English language, are used in many different ways; categorizing them becomes rather subjective. So when that guest comes in looking for fruit, will he find it in the refrigerator, in a bowl, in a box or can, or in the compost bin? Yes, yes, yes, and yes. Wow, I guess we need a FRUIT category.&lt;br /&gt;&lt;br /&gt;The third approach, then, is to put everything everywhere! Put a teaspoon in every drawer, on every horizontal surface, in and next to every appliance, in each cabinet, and on every shelf. Now, when someone goes looking for a spoon, it doesn't matter where he &lt;em&gt;thinks&lt;/em&gt; the spoon is, because he's right! There's a spoon on top of the microwave, in the Crisper drawer of the refrigerator, and in the sink. Of course, not only are spoons everywhere, but so is everything else: can openers, slices of bread, blenders! One of everything, everywhere! (Of course, to be truly practical, you'd need more than one at every location, since sometimes people need more than one spoon at a time. :-) By the way, this is how people use search engines, like Google. We create a web page, and then we attach as many keywords as possible. We want to make sure that &lt;em&gt;everyone&lt;/em&gt; will find our stuff, no matter where they're looking. 
In fact, some people want their content discovered even when people &lt;em&gt;aren't&lt;/em&gt; looking -- stumbling over spoons everywhere.&lt;br /&gt;&lt;br /&gt;The final approach is some combination of all of these things: decent labels, better organization, and as much redundancy as the cabinets can stand. It won't be perfect for everyone all the time, but very few people are going to have to open more than one or two drawers until they find what they want, even if what they want is a tiny whisk or an egg timer. Everything is categorized, labeled, and multiply placed.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;That's &lt;/em&gt;indexing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-113937403709296280?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/113937403709296280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=113937403709296280&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113937403709296280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113937403709296280'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/organizing-kitchen.html' title='Organizing the kitchen'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' 
src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22121434.post-113937169098409583</id><published>2006-02-07T23:05:00.000-05:00</published><updated>2006-02-07T23:08:10.990-05:00</updated><title type='text'>Introducing...</title><content type='html'>I like to think the world needs yet another forum to talk about indexing, but perhaps it's just me. These days, my absolute favorite part of being an indexer, an information architect, a trainer and educator, and an "information guru" (not my words!) is simply talking about the possibilities that come with indexing.&lt;br /&gt;&lt;br /&gt;Let's explore together.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22121434-113937169098409583?l=maislin.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://maislin.blogspot.com/feeds/113937169098409583/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22121434&amp;postID=113937169098409583&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113937169098409583'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22121434/posts/default/113937169098409583'/><link rel='alternate' type='text/html' href='http://maislin.blogspot.com/2006/02/introducing.html' title='Introducing...'/><author><name>taxonomist</name><uri>http://www.blogger.com/profile/11832913832836400039</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://taxonomist.tripod.com/graphics/seth_photo.jpg'/></author><thr:total>0</thr:total></entry></feed>
