20 March 2006
Frustrated by a lack of meaning
As reported by The New York Times today ("Amazon Says Technology, Not Ideology, Skewed Results," March 20, 2006), an abortion-rights organization discovered what appeared to be biased results in Amazon's search engine. Apparently books with anti-abortion leanings appeared as more relevant on Amazon's search results pages. I am not taking sides on this highly charged issue; I am taking offense at the ignorance demonstrated by people who don't seem to understand how search works. (And I'm not singling out this issue either, as you'll see from my later examples.)
See, there isn't a search engine on the planet that can glean actual meaning from its databases. They can only look at the words themselves. Even search engines that analyze the behavior of their users still look at words and numbers, without interpretation.
Let me explain what really happened with Amazon, and why Amazon is not automatically in the wrong. Someone went to the search engine and typed in the word abortion. Now imagine that you're the search engine, and you have two results to give back. Result one is a book whose title is simply Abortion. The second is a book whose title is Understanding Abortion. Tell me: which result is more relevant? Answer: you have no clue.
When faced with this impossible question, the search engines at Amazon and elsewhere attempt to apply certain generalizations that might work in other situations, but simply don't work here. For example, there might exist a rule that puts Abortion ahead of Understanding Abortion because the title of the first book matches the query exactly, whereas the second title is only "half right." Or perhaps one of the books is 500 pages long, but the other is 200 pages long, and Amazon favors longer books. Maybe Amazon is interested in selling you the more expensive book, the book more recently published, or the book that gets a higher rating from all the people visiting the website. In the end, however, all of this analysis fails -- completely and utterly fails -- to answer a very simple question: which of these books is against abortion? Heck, even I don't know, and I invented them!
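The kind of guesswork I'm describing can be sketched as a toy scoring function. To be clear, every signal and weight below is my own invention for illustration; Amazon's actual algorithm is unpublished, and nothing here reflects it:

```python
# A toy relevance scorer illustrating the kinds of heuristics described
# above: exact-title match, word match, page count, customer rating.
# All signals and weights are invented for illustration only.

def score(query, book):
    """Return a relevance score for a book, given a one-word query."""
    title_words = book["title"].lower().split()
    s = 0.0
    if query.lower() in title_words:
        s += 1.0                    # the query word appears in the title
    if [query.lower()] == title_words:
        s += 1.0                    # the title matches the query exactly
    s += book["pages"] / 1000       # maybe longer books rank higher?
    s += book["rating"] / 5         # maybe better-rated books rank higher?
    return s

books = [
    {"title": "Abortion", "pages": 200, "rating": 3.5},
    {"title": "Understanding Abortion", "pages": 500, "rating": 4.0},
]
ranked = sorted(books, key=lambda b: score("abortion", b), reverse=True)
```

Notice that every one of these signals is purely quantitative. You could add a hundred more, and not one of them would tell you which side of the issue either book takes.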
With search, meaning is irrelevant. Search engines can look only quantitatively at the letters of the words, and at innumerable statistics (e.g., number of Web views) that have at best a tangential relationship with meaning.
Before we look at another example, let me also talk about another thing that Amazon did at one time. If you searched for the word abortion, in addition to your results you received what should be interpreted as a helpful search hint: "Did you mean adoption?" This might sound political, but the logic of this lies in the similar spelling of the words adoption and abortion. Given that there are many more books about adoption than abortion at Amazon, the search engine guessed that someone typing the word abortion might have misspelled something; the computer offered what it considered a reasonable alternative. Had that suggested word been something different -- "Did you mean apportion?" -- no one would have cared.
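The "did you mean" behavior rests on nothing more than letter-by-letter similarity. Here is a standard edit-distance measure -- not Amazon's actual spell-check code, which isn't public, but the kind of arithmetic any spelling suggester is built on:

```python
# Levenshtein edit distance: the minimum number of single-letter
# insertions, deletions, or substitutions needed to turn one word into
# another. A spelling suggester pairs a measure like this with word
# frequency to guess what a searcher "meant."

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

print(edit_distance("abortion", "adoption"))   # 2: b->d, r->p
print(edit_distance("abortion", "apportion"))  # also 2
```

Abortion and adoption are only two letter-changes apart -- and so are abortion and apportion. The computer sees all three words as near neighbors; only the human reader attaches politics to one pair and shrugs at the other.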
By the way, I will admit that it is always possible that a company, like Amazon, could consciously manipulate its search results to accomplish some kind of selfish ends. Lycos puts sponsored links at the top; Yahoo promotes its internal products over those of others; Amazon presents the products of its more lucrative partners over all others. It is not far-fetched to imagine a company exercising editorial control for political or religious purposes, especially in today's age. The problem is that some issues are perceived as so volatile that no one is willing to consider coincidence of language as just that, a coincidence. Language is powerful stuff; spelling women as womyn to avoid the "men" letter subset is a powerful choice, whether you agree or not.
Here's another story, from the late 1980s. A search for the word monkey within a database of clip art provided by Microsoft produced a seemingly offensive result: a picture of African-American children. There was an uproar, and although Microsoft denied that it had done anything intentionally racist, it quickly removed the image from the database. The real problem, however, is that the children in the image were playing on monkey bars. Interestingly, if you stop to think about it, the only racism in this example is caused by the person who performed the search! That's the person who actually connected the word monkey with the children (and not the playground equipment); no one at Microsoft did. In this example, the giant void where meaning should have been was automatically filled in by the searcher, by association and as a reflex.
Here's another story, from last year. An article (I can't remember where) described how a bad critical review of a specific performer appeared more relevant in a Google search than the good reviews -- and even than the performer's own website. This would be equivalent to searching for me ("Seth Maislin") and getting a top result of "Seth Maislin Has Bad Teeth" instead of this blog, my website, or one of my interviews at O'Reilly & Associates. In this case, Google isn't passing judgment, but it certainly feels like it! Instead, it's looking at how popular that Bad Teeth article might be, or at its host (for example, it might be a Wall Street Journal or People Magazine article, periodicals with readerships thousands of times larger than anything I've ever done), and using that popularity to push the article to the top. It's assuming -- wrongly, in this case -- that people looking for me are less interested in my website than in what Teen People or the WSJ has to say.
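The popularity effect works roughly like this. The pages, the numbers, and the formula below are all invented -- Google's real ranking is proprietary and vastly more complicated -- but the shape of the problem is the same:

```python
# A toy of popularity-weighted ranking: among pages that match the
# query text equally well, a popularity signal (think inbound links or
# readership) decides the order. All data here is invented.

pages = [
    {"title": "Seth Maislin's website",
     "matches_query": True, "popularity": 10},
    {"title": "Seth Maislin Has Bad Teeth (major magazine)",
     "matches_query": True, "popularity": 10000},
]

def rank(pages):
    # Popularity alone breaks the tie -- with no regard for what the
    # page actually says, kind or cruel.
    return sorted(pages,
                  key=lambda p: p["matches_query"] * p["popularity"],
                  reverse=True)

top_result = rank(pages)[0]["title"]
```

The magazine article wins, not because anyone at Google thinks my teeth are bad, but because more people read the magazine than read me.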
Search just doesn't care. If you're looking for meaning, don't ask a search engine.
Whether or not the results simply reflect how the search engine works, rather than an actual political agenda, there's still a perception problem. Remember how, a while back, someone discovered that people searching for Martin Luther King's work on Amazon were also told that they might like the movie "Planet of the Apes"? Now, it's quite possible that a lot of customers who looked for the one actually were purchasing the other...but it still looks bad for Amazon.
The post hoc ergo propter hoc fallacy makes two events that happen in sequence appear causally linked, like how washing your car might appear to make it rain. But this is the same kind of thinking that gets people thrown in jail because they look guilty. (If you're giving a ride to someone who has marijuana in his pocket, there's marijuana in your car and you're breaking the law. This is the same logic behind the open container laws.) Applied to language, this is equivalent to someone's assumption that the word "artifice" has something to do with face painting.
I'm not saying that image isn't important, as certainly it is to large companies like Amazon. But there is enough hostile interpretation in the world already that we don't need search engine ignorance to cause more.