Why open-search?
The direct motivation for this project is the increasingly worrying censorship and manipulation by major multinational search engine companies. "Consider how often you rely not just on search engines to find information but also on blogs, online newspapers, and other intermediaries that point you in the direction of useful information. It is one thing for government to crack down openly on forbidden information. But it can be harder to notice that information has become more difficult to find. It is hard, in other words, to know what you don't know." (1)
Also, because all major search engines keep a central database, they can log every query and trace it back to a particular user. Yes, you!
"I worry about my child and the Internet all the time, even though she's too young to have logged on yet. Here's what I worry about. I worry that 10 or 15 years from now, she will come to me and say 'Daddy, where were you when they took freedom of the press away from the Internet?'" --Mike Godwin, Electronic Frontier Foundation
Some Illustrative Case Examples of Censorship, Manipulation, and Privacy Concerns
Censorship
- Yahoo:
- Yahoo!, along with Google China, Microsoft, Cisco, AOL, Skype, and others, has cooperated with the Chinese government in implementing a system of Internet censorship in mainland China.
- In April 2005, Shi Tao, a journalist working for a Chinese newspaper, was sentenced to 10 years in prison by the Changsha Intermediate People's Court of Hunan Province, China (first trial case no. 29), for "providing state secrets to foreign entities". The "secret", according to Shi Tao's family, was a brief list of censorship orders he sent from a Yahoo! Mail account to the Asia Democracy Forum before the anniversary of the Tiananmen Square Incident.
- Google:
- On October 22, 2002, a study reported that approximately 113 internet sites had been removed from the German and French versions of Google. There is no direct way to check whether a search has been affected in this way.
- Google's decision to adhere to the Internet censorship policy in mainland China, colloquially known as "The Great Firewall of China", has been controversial. Google.cn search results are filtered so as not to bring up any results concerning the Tiananmen Square protests of 1989, sites supporting the independence movements of Tibet and Taiwan or the Falun Gong movement, and other information perceived to be harmful to the People's Republic of China.
- In 2002, Google was found to have censored websites that provided critical information about Scientology, in compliance with the United States' DMCA legislation. For more information, see the New York Times article "A copyright dispute with the Church of Scientology is forcing Google to do some creative linking" (April 22, 2002).
- In early 2006, Google removed several news sites from its news search engine after receiving complaints about articles that were critical of Islam.
- Google DMCA Takedowns: A three-month view June 2, 2005
- Abstract: Google receives more than 30 copyright-based takedown demands each month invoking the Digital Millennium Copyright Act. A review of three months of notices shows they cluster in a few big categories: C&Ds from companies and individuals demanding removal of competitors’ sites; C&Ds demanding removal of “cracks” or material copied wholesale; and C&Ds demanding removal of criticism.
- Google censorship FAQ
- Baidu
- In compliance with the policies of Internet censorship in China, the Chinese language version of Baidu filters controversial material from its search results. Ironically, this does not apply to Baidu Japan, which drew over 60% of its traffic from within China before subsequently being blocked on the Mainland.
- Wikipedia entry on Baidu Censorship
Manipulation
- Yahoo:
- In March 2004, Yahoo! launched a paid inclusion program whereby commercial websites are guaranteed listings on the Yahoo search engine after payment.
- Google:
- In September 2007, 911truth.org was found to have suddenly disappeared from Google's results for the query "9/11", although the site had ranked among the top 5 results during the preceding six months.
- A Google bomb or Googlewash is Internet slang for a certain kind of attempt to influence the ranking of a given page in results returned by the Google search engine, often with humorous or political intentions.
- Googlewashing
- Baidu
Privacy
- The general problem is that every query sent to a search engine is logged by that search engine and can be traced back to your computer. The more other (non-search) services a search engine offers, the more personal information it can gather and combine. An illustrative case is the following:
- AOL search data scandal
- On August 4, 2006, AOL released a compressed text file on one of its websites containing twenty million search keywords for over 650,000 users over a 3-month period, intended for research purposes. AOL pulled the file from public access by the 7th, but not before it had been mirrored, P2P-shared and seeded via BitTorrent. News filtered down to the blogosphere and popular tech sites such as Digg and Wired News.
While none of the records in the file are personally identifiable per se, certain keywords contain personally identifiable information because users typed in their own name (ego-searching), their address, their social security number, or other identifying details. Each user is identified in the file by a unique sequential key, which makes it possible to compile that user's search history. In fact, in a test of whether this was possible, the New York Times was able to locate several individuals from the released and anonymized search records by cross-referencing them with phonebooks or other public records. Consequently, the ethical implications of using this data for research are under debate.
AOL acknowledged it was a mistake and removed the data, although the files can still be downloaded from mirror sites. Several searchable databases of the data also exist on the internet.
Although the searchers were only identified by a numeric ID, the New York Times successfully discovered the identity of several searchers and, with her permission, exposed searcher number 4417749 as Thelma Arnold, a 62-year-old widow from Georgia. This privacy breach was widely reported, and led to the resignation of AOL's CTO, Maureen Govern, on August 21, 2006. The media quoted an insider as saying that two employees had been fired: the researcher who released the data, and his immediate supervisor, who reported to Govern.
- The web interface to AOL's 500K user search logs
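The reason a numeric ID offers so little protection can be shown in a few lines. The following is only a minimal sketch in Python: the row layout, the user keys, and every query are invented for illustration and are not taken from the AOL file, and `build_histories` is a hypothetical helper name. It shows how a stable per-user key lets anyone regroup scattered log rows into one profile per user.

```python
from collections import defaultdict

# Hypothetical log rows in the shape (user_key, query). All values are
# invented for illustration; none of them come from the AOL release.
LOG_ROWS = [
    (101, "weather in springfield"),
    (202, "cheap flights"),
    (101, "jane doe springfield"),
    (101, "dentists near elm street"),
    (202, "hotel reviews"),
]


def build_histories(rows):
    """Group queries by the pseudonymous user key, preserving log order."""
    histories = defaultdict(list)
    for user_key, query in rows:
        histories[user_key].append(query)
    return dict(histories)


histories = build_histories(LOG_ROWS)
for user_key, queries in histories.items():
    print(user_key, queries)
```

In this made-up data, user 101's queries combine a name, a town, and a street. Clues of that kind are what allow a search history to be cross-referenced with public records, even though the key itself reveals nothing.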
Various
Tools that give you a glimpse of search engine censorship