24 December 2006


A needle in a haystack with 100,000,000 blades

The Internet has more than 100 million websites, according to the November Netcraft survey. If you were standing on top of the growth curve, by now your stomach would have nothing left to vomit up.

I did some math, and I've figured out a way to make sure that all of these websites are indexed. Here's what I discovered.

So there you go: a team of 50 people can index the Internet. That doesn't sound nearly as bad as I thought. Of course, everyone will have to type rather quickly, and we'll need a system in place to prevent us from accidentally indexing any one website more than once, but that shouldn't be too bad. And yes, I'm assuming that all of these websites are in English, but most of them are; I'll bring a few translators to work on the few remaining.

At U.S.$50,000 per year per indexer, which is quite modest for a highly intense round-the-clock job like this, plus $100,000 for me as manager, I could probably put together a bid of about $350,000/year to get the job done. Given how many billions of dollars are spent or exchanged over the Internet today, that seems quite reasonable, too. Heck, I should triple the whole thing, since we'd have to re-index the old sites every once in a while. Maybe I should double it again, too, so we'd be allowed to use eight keywords instead of four.

So let's see, that brings the total bill to $2.1 million. Gosh, that isn't bad at all, is it? I mean, we all agree that indexing the Internet is at least a two-million-dollar-per-year business, right?
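
For anyone who wants to check the multiplication, here is a minimal sketch in Python. The variable and function names are invented purely for illustration; only the $350,000 starting bid and the two multipliers come from the figures above.

```python
# Figures come from the post; all names here are hypothetical.
BASE_BID = 350_000      # starting annual bid, in U.S. dollars
REINDEX_FACTOR = 3      # "triple the whole thing" to re-index old sites
KEYWORD_FACTOR = 2      # "double it again" for eight keywords instead of four


def annual_bill(base_bid: int, *factors: int) -> int:
    """Multiply the base bid by each adjustment factor in turn."""
    total = base_bid
    for factor in factors:
        total *= factor
    return total


total = annual_bill(BASE_BID, REINDEX_FACTOR, KEYWORD_FACTOR)
print(f"${total:,} per year")  # prints "$2,100,000 per year"
```

Tripling and then doubling is the same as multiplying by six, so the order of the two adjustments makes no difference to the final bill.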

Except it's not. Indexing the Internet is a zero-dollar-per-year business. No one is doing it. Just about no one seems to care about quality keywords. In fact, there are only two industries that exist around keyword creation. One of them is misnamed "search optimization," which is about spamming the heck out of the Web. Optimize? I think not: this is the opposite of the intelligent product my team would build. The other is the search business itself, the companies that have sprung up around those fancy algorithms that Google, Yahoo, Lycos, Ask Jeeves, and the rest use. The thing is, those algorithms are just word-matching machines. These engines are looking for keywords, but none of them is actually writing any. So you see, no one with indexing training is writing any keywords. The inexpensive market for human indexers is being completely overlooked.

Guess it's not worth the two million.

