r6 - 18 Mar 2010 - 10:40:57 - TommyKarter
This page contains the slides of our talk at the Forum on Quaero at the Jan van Eyck Academie in Maastricht, September 29 and 30, 2007. A detailed report on this forum can be found at http://www.open-search.net/Blog/BlogEntry25

Centralized Information Infrastructure

[Image: centralized_information_infrastructure.jpg]
  • There is growing concern about the world’s dependency on a few quasi-monopolistic search engines and their susceptibility to commercial interests, spam or distortion caused by anti-spam measures, biases in geographic and thematic coverage, or even censorship.
  • Problems inherent to a centralized search engine (short version of WhyOpenSearch):
    • Manipulation (e.g. paid ranking, paid inclusion, favoring one site over another)
    • Censorship (e.g. China)
    • Profiling:
      • the information search engines collect from their users is extremely valuable
      • "if Google decides that tracking and acting upon your private information is in its best interest, it can and it will." - John Battelle, The Search
  • A centralized infrastructure is vulnerable not only to censorship and manipulation but also to disasters and terrorism

Decentralized Information Infrastructure

[Image: decentralized_information_infrastructure.jpg]
  • A p2p network could dwarf server farms; it is the only way open source can counter Google's infrastructure and processing power
  • Data is highly distributed in the first place
  • Decentralized, in the original spirit of the web
  • Give power to the users

Open-Search framework

Designed for Privacy

  • No user logs / data collection
  • Anonymity layer (e.g. Tor)
  • Queries cannot be traced back to their origin
  • Plug-in infrastructure for crawling, ranking and the p2p layer
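A minimal Python sketch of what a ranking plug-in registry might look like. The registry, the `register_ranker` decorator and both example rankers are hypothetical illustrations, not the actual Open-Search plug-in API. The point is that rankers are interchangeable and run locally on the peer, with no query logging.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in signature: (query terms, document text) -> relevance score.
RankingPlugin = Callable[[List[str], str], float]

# Hypothetical in-process registry; a real framework would discover plug-ins dynamically.
RANKERS: Dict[str, RankingPlugin] = {}


def register_ranker(name: str) -> Callable[[RankingPlugin], RankingPlugin]:
    """Decorator that registers a ranking plug-in under the given name."""
    def decorator(fn: RankingPlugin) -> RankingPlugin:
        RANKERS[name] = fn
        return fn
    return decorator


@register_ranker("term_frequency")
def term_frequency(terms: List[str], text: str) -> float:
    """Score = total number of occurrences of the query terms in the document."""
    words = text.lower().split()
    return float(sum(words.count(t) for t in terms))


@register_ranker("prefer_short")
def prefer_short(terms: List[str], text: str) -> float:
    """Toy alternative: prefer short documents that contain at least one query term."""
    words = text.lower().split()
    if not any(t in words for t in terms):
        return 0.0
    return 1.0 / len(words)


def rank(query: str, documents: List[str], plugin: str) -> List[str]:
    """Rank documents locally with the chosen plug-in; nothing is logged."""
    terms = query.lower().split()
    scorer = RANKERS[plugin]
    return sorted(documents, key=lambda doc: scorer(terms, doc), reverse=True)
```

For example, with `docs = ["p2p search engine p2p", "central search", "cooking recipes"]`, `rank("p2p search", docs, "term_frequency")` puts the first document on top, while `"prefer_short"` promotes `"central search"`. The calling code does not change when the ranker is swapped.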

Features

  • Combining technologies of peer-to-peer file storage, distributed crawling and peer-to-peer searching
  • Client/server model to access a multitude of peer-to-peer networks.
  • Indexed metadata and content repositories for a given data provider (URI)
  • Each peer
    • is autonomous
    • has its own local search engine with a crawler and a corresponding local index.
  • Peers share their local indexes by posting metadata into the p2p network
  • Decentralized, largely self-organizing directory (DHT)
  • DHT distributes data deterministically over the network.
    • Metadata stored as a distributed inverted index
    • Content stored in a file-system structure.
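The DHT bullets above can be illustrated with a toy, single-process Python sketch of a distributed inverted index. It assumes Chord-style consistent hashing (SHA-1 positions on a ring, each term stored on the first peer at or after the term's position); the slides do not say which DHT Open-Search uses, and the `ToyDHT` class and its methods are hypothetical. Real peers would send these postings over the network instead of sharing one in-memory dictionary.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set


def ring_position(key: str) -> int:
    """Map a string to a position on a 2**160 identifier ring using SHA-1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class ToyDHT:
    """Hypothetical DHT: each term lives on the first peer clockwise from hash(term)."""

    def __init__(self, peer_ids: Iterable[str]):
        peer_ids = list(peer_ids)
        self.ring = sorted((ring_position(p), p) for p in peer_ids)
        self.positions = [pos for pos, _ in self.ring]
        # Per-peer share of the inverted index: term -> set of document URIs.
        self.storage: Dict[str, Dict[str, Set[str]]] = {p: defaultdict(set) for p in peer_ids}

    def responsible_peer(self, term: str) -> str:
        """Deterministic: every peer computes the same answer from the term alone."""
        i = bisect.bisect_left(self.positions, ring_position(term))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

    def publish(self, uri: str, text: str) -> None:
        """Post one document's terms to the peers responsible for them."""
        for term in set(text.lower().split()):
            self.storage[self.responsible_peer(term)][term].add(uri)

    def lookup(self, query: str) -> Set[str]:
        """Return URIs that contain all query terms (AND semantics)."""
        results = None
        for term in query.lower().split():
            uris = self.storage[self.responsible_peer(term)].get(term, set())
            results = set(uris) if results is None else results & uris
        return results or set()
```

Usage: after `dht = ToyDHT(["peer-a", "peer-b", "peer-c"])`, `dht.publish("http://example.org/1", "open p2p search")` and `dht.publish("http://example.org/2", "p2p file storage")`, the call `dht.lookup("p2p search")` returns `{"http://example.org/1"}`. Because the responsible peer is derived from the term itself, no central directory is needed to find the postings.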

Benefits

  • All peers equal
  • Functionality shared amongst all peers
  • Load evenly balanced
  • Scalable
  • Efficient
  • Resilience to failure
  • Benefit from users' intellectual input (bookmarks, Wikia, ...)
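One common way a DHT delivers the "load evenly balanced" and "resilience to failure" properties is consistent hashing with virtual nodes. This hypothetical Python sketch (not Open-Search code) shows both effects: keys spread over all peers, and when a peer leaves, only the keys it held are reassigned. The peer names, the 10,000 synthetic keys and the choice of 50 virtual nodes per peer are arbitrary assumptions.

```python
import bisect
import hashlib
from collections import Counter
from typing import Iterable


def ring_position(key: str) -> int:
    """Position of a key on a 2**160 identifier ring (SHA-1)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring in which every peer occupies several virtual positions."""

    def __init__(self, peers: Iterable[str], vnodes: int = 50):
        self.entries = sorted(
            (ring_position(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.entries]

    def owner(self, key: str) -> str:
        """The peer owning the first virtual position at or after the key's position."""
        i = bisect.bisect_left(self.positions, ring_position(key))
        return self.entries[i % len(self.entries)][1]


keys = [f"term-{n}" for n in range(10_000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c"])  # peer "d" has left the network

load = Counter(before.owner(k) for k in keys)
moved = [k for k in keys if before.owner(k) != after.owner(k)]

print("keys per peer:", dict(load))
print("keys reassigned after 'd' left:", len(moved))
```

Since the remaining peers keep their ring positions, every reassigned key is one that peer "d" used to hold; keys owned by "a", "b" and "c" stay where they are.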

Challenges

  • Efficiently selecting promising peers for particular information needs
  • Query propagation
    • sending every query to every host generates massive amounts of traffic
  • Spam
    • might be less of an issue if there are many different ranking algorithms / plugins
  • Users / testing environments
    • we need a community!
  • Time & money
    • Right now there are only 2 volunteers and no money left for a paid developer
  • Technology does not solve social problems
    • e.g. if you are not allowed to install the software
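A back-of-the-envelope Python sketch of the query propagation challenge, comparing Gnutella-style flooding with routing through a DHT. The neighbour degree, TTL, network size and the roughly log2(N) hops per lookup are illustrative assumptions (typical of Gnutella-like and Chord-like designs), not Open-Search measurements.

```python
import math


def flooding_messages(degree: int, ttl: int) -> int:
    """Upper bound on messages for one flooded query.

    The originator sends to `degree` neighbours; every peer that receives the query
    forwards it to its other `degree - 1` neighbours until the TTL runs out.
    Duplicate suppression and overlapping neighbourhoods would lower the real count.
    """
    return sum(degree * (degree - 1) ** hop for hop in range(ttl))


def dht_lookup_messages(num_peers: int, num_terms: int) -> int:
    """Rough cost of routing each query term to its responsible peer,
    assuming about log2(N) routing hops per term (Chord-like DHT)."""
    return num_terms * math.ceil(math.log2(num_peers))


print("flooding, degree 4, TTL 7:", flooding_messages(4, 7))
print("DHT, 1,000,000 peers, 2 terms:", dht_lookup_messages(1_000_000, 2))
```

Under these assumptions a single flooded query can cost up to 4,372 messages while still reaching only a small part of a large network, whereas routing 2 query terms through a DHT of a million peers takes on the order of 40 hops. This is why selecting promising peers matters more than asking everyone.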

Known p2p Search Projects

  • Freenet - A Distributed Anonymous Information Storage and Retrieval System.
  • Minerva - Research project for a p2p-based search engine (Max-Planck-Institut für Informatik)
  • Open-Search
  • YaCy - Active project; distributed search with an emphasis on censorship resistance (no anonymity)
Topic attachments
  • centralized_information_infrastructure.jpg (244.6 K, uploaded 03 Oct 2007 12:33 by ErikBorra)
  • decentralized_information_infrastructure.jpg (203.0 K, uploaded 03 Oct 2007 12:35 by ErikBorra)
 