r1 - 16 Feb 2007 - 13:53:15 - TWikiGuest

Report on the Forum on Quaero

9 months, 3 weeks ago by ErikBorra
Open-Search was invited to participate in the Forum on Quaero at the Jan van Eyck Academie in Maastricht on September 29 and 30, 2007. The purpose of the forum was to question and investigate the European intentions to build a search engine and, more broadly, to investigate the cultural, political, and philosophical issues related to information search and access. It turned out to be a critique of centralized search engines and a plea for systems like Open-Search: decentralized, open and privacy-respecting. What follows is an elaborate report on, and my impression of, the two-day forum on Quaero.

Quaero

Quaero is a [...] research and development program which has the goal of developing multimedia and multilingual indexing and management tools for professional and general public applications (such as search engines) (1). Initiated by the French, Quaero aimed to be a European-backed project, but in 2006 Hartmut Schauerte, a state secretary within the [German] Ministry for Economics and Labour, announced during the IT-Gipfel summit in Potsdam that a German consortium had put together a semantic search project called Theseus that would be distinct from Quaero (2). The main source of disagreement was the format of the search engine, with German engineers favoring a text-based search engine and the French engineers favoring a multimedia search engine. Many German engineers also balked at what they thought was becoming too much of an anti-Google project, rather than a project driven by its own ideals (3).

Eight privacy protecting demands

The first presentation at the public forum on Quaero was by Michael Zimmer, who did his PhD on The Quest for the Perfect Search Engine: Values, Technical Design, and the Flow of Personal Information in Spheres of Mobility. In his talk Michael explained the goals of Google by using some of their quotes: "process and understand all the information in the world" and "understand exactly what you mean and give back exactly what you want". In search engine talk this would be called 'perfect reach' for the former and 'perfect recall' for the latter. Google summarizes this when they state that they want to be "like the mind of God". Michael went on to say that personal data which was previously obscure is now searchable, and that Google collects as much information as possible about the searcher through server logs, cookies and user accounts. Moreover, Google offers not only a search engine but also a plenitude of other products and services, whose information can of course be combined with that gathered through search. The perfect search thus promises breadth, depth, efficiency, and relevancy, but you lose 'security via obscurity' and it involves the widespread capture of personal and intellectual information. Michael Zimmer calls this the 'Faustian Bargain'. Quaero's perfect search would involve and combine sound, image and video search. To show one of the dangers of perfect multimedia search, Michael showed a series of pictures with automagically identified faces. The most salient of those was one in which protesters at a surveillance protest were algorithmically identified. After showing us many of the threats of perfect search, Michael Zimmer argued for value-conscious design and stated eight privacy-protecting demands (4). He called upon all people from Quaero in the audience to really think about this. Unfortunately nobody from Quaero attended the conference …

The absence of Quaero

The conference booklet contained an email discussion between the organizers and the director of the French Quaero® program. The director tried to convince the organizers not to use the name Quaero because 'The event announced by Jan van Eyck mentions the Quaero program and uses the Quaero name without endorsement by the Quaero program partners'. 'Quaero aims – just – at a series of industrial and technological objectives', and therefore he was also not interested in the forum, because he has 'no authority on cultural, political and philosophical aspects of search engines'. He would 'rather leave these aspects to the people with strong convictions in these areas'. That was a pity, because according to the organizers 'One could see this Forum as the democratic counterpart to a strategy session', and for a democracy to work you have to hear different opinions.

Search engines, their borders, and appropriation of the image

Although the industrial and technological actors in Quaero were absent, we continued our democratic obligation to investigate the politics of search engines with a presentation by Florian Schneider (no, not the guy from Kraftwerk). Florian gave a very interesting though difficult talk. In the following paragraphs I have tried to reproduce Florian's arguments from my notes. I hope the idea gets through :)

Florian investigated the circularity between knowledge and power and analyzed the correspondences between migration machines (border control or management), as an ongoing attempt to control freedom of movement, on the one hand, and the management and emergence of new information regimes on the other.

Florian sees search engine technology as an attempt to image knowledge. In the digital world an image is understood as an exact replication of any kind of information. A search engine however is not a backup but a snapshot / image of some of the information. The question then is who decides when, how, and what kind of information is stored.

When using a search engine we experience it as something that produces an image. The result of a query is a meta design. The results are reduced to a matter of graphical design; it needs to be functional and aesthetic at the same time. The search engine has a clearly defined purpose with a manufacturing process that is hidden from the user. The crucial point is the algorithm (the manufacturing), which is hidden from the user and consists of the management of the algorithm.

The search engine is a sort of interface for what one would call a mirror, e.g. when googling yourself or doing search engine optimization. This constitutes the paradigm of imaginary order: a subject is permanently captured and formed by its own image. Search engines thus deal in imaginary property. But what does it mean to 'own' such an image? The question is not so much who owns the image but what it 'means' to own an image. This actually opens privacy up to theoretical reconsiderations of what an image is in an era of digital reproduction. In a purely Kantian sense: it's a matter of imagination.

The manufacturing (the algorithm) is involved in an act of determining space and time, as a rule of production. It is defined by the way in which the search engine is built and installed, and forms the objects that are supposed to be spidered, crawled, and stored. Search engines then appropriate this image as their property. The conservation of this image is subject to invariant operations – of the algorithm. The images might be results of structures for imaginations. Images then manage the violations rather than prevent them from the environment. The question of ownership has turned into executions in real-time. It's not only facial recognition – solutions are executed in real time.

A question which then pops up is: who has the right to read, write, and access this image / property? Google claims ownership but is not owning up. Now there is a quite dramatic shift in the concept of property.

Just as the migrant new economy was built on illegalized migrants, according to Florian Schneider, web 2.0 is based on imaginary revenue. This also means legalizing content. Valorization is built on immediate appropriation of content and building imaginary content from it. The arrival of search engines clearly corresponds with the migrant wave in the 20th century. The post-modern border machine appears as a system of control with constant modulation of rights that are transformed into a set of permissions. The border thus becomes a matter of performance.

The function of post-modern border management is to illegalize migration. Migrants are allowed in as long as they are a 'useful labor force'. Translated to search engines, these operate in a corresponding vein. The question is about imaginary property and its valorization. It is important to see their borders, e.g. Google only indexes 10 to 15% of the 'surface web'. But how can you border the WWW? The question of what is included / excluded remains. Is the WWW in principle limitless or do you want to border the territory so you can assure information is found? Behind the surface web there is a deep web, and before a query happens there is a certain quadrage of imaginary that defines territory or maps it out. Every search engine has to manage what is on and what is off. Censorship appears as a constant modulation in real time; this ranges from China to ignoring everything that is not text.

Searching thus is an act of imagination. Finding is not the opposite of missing. We rewrite imagination in the same way as we rewrite the future – when you actually find something. There is a delicate relationship between searching and measuring. Searching needs a measurement of what is accessible and therefore could be found. "I am not searching what I know that I need to find. I am searching for what I do not yet know that I want to find." The success of a search engine can be explained as an abstract machine which processes and colonizes (our power to) imagination.

We need to create new forms of autonomy in new forms of the society of control. Search engines are misfunctional; success is built on misuse; search is appropriating the right to misuse. p2p access to cultural heritage is already available and re-appropriated by the masses. The problem of Europe and the European search engine is the controlling and managing of what can be imagined as European.

Clearly, Florian Schneider's notions of the border, the indexable, and the imaginary are very important concepts in an attempt to understand the politics of search engines.

Richard Rogers asked how this image could be described. You have the original, the copy, and … the third object? How does one think about and deal with this 'third object'? How can you justify the ownership of it?

Quaero Uncorporate & Virtual Territories, Real Borders

Metahaven, designers by origin, showed various newly designed logos for Quaero, thereby wondering what the characteristics of a good search engine should be. The second part of the presentation was about a sketch of a layered 3D interface that visualized the isolation within search results. Unfortunately I was very tired and I did not pick up much from their presentation. My apologies.

Political Algorithms: Value-Sensitive Design

Tsila Hassine showed us the Image Tracer and Schmoogle.

The Image Tracer is a collaborative project between Tsila Hassine and De Geuzen. It evolved out of their interests in media images and the way their significance and presence fluctuate in the ecology of the world wide web. It is built on top of Google Images and grabs images each day for a particular query. Image density builds and wanes through time, reflecting how long or little, a jpeg, gif or png has been online and how its rank shifts over time. Opacity denotes the consistent presence and position of an image through multiple searches, while more transparent or murky images have appeared for less time or moved in rank. (5)
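
One way to picture the opacity idea is as the fraction of daily snapshots in which an image shows up. The sketch below is only my own illustration, not the Image Tracer's actual code: the function name `presence_opacity` and the snapshot format (one list of image URLs per day) are assumptions, and it ignores shifts in rank.

```python
from collections import Counter


def presence_opacity(daily_snapshots):
    """Map each image URL to the fraction of days on which it appeared.

    `daily_snapshots` is a hypothetical format: one list of image URLs
    per day, in the order returned for the query.
    """
    if not daily_snapshots:
        return {}
    counts = Counter()
    for snapshot in daily_snapshots:
        # Count each image at most once per day.
        counts.update(set(snapshot))
    total_days = len(daily_snapshots)
    return {url: seen / total_days for url, seen in counts.items()}


snapshots = [
    ["a.jpg", "b.png"],
    ["a.jpg", "c.gif"],
    ["a.jpg", "b.png"],
    ["a.jpg"],
]
opacity = presence_opacity(snapshots)
```

In this toy data, an image that is present every day would be drawn fully opaque, while one that appeared only once would be nearly transparent.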

Schmoogle also lives on top of Google. For each query it fetches all 1000 results which are returned by Google. Instead of displaying them in the order (with the rank) Google gives them, it randomly reorders the results. This way it questions and critiques the ranking algorithm of Google. What 'if the result you were looking for was hiding in page 53?'
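
The reordering itself is simple to sketch. The following is a minimal illustration of the idea, not Schmoogle's own code: the function name `schmoogle_order` and the placeholder result strings are assumptions, and fetching the results from Google is left out entirely.

```python
import random


def schmoogle_order(results, seed=None):
    """Return the results in random order, ignoring their original rank."""
    rng = random.Random(seed)
    shuffled = list(results)  # copy, so the ranked list is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Placeholder for the 1000 ranked results returned for one query.
ranked = [f"result-{rank}" for rank in range(1, 1001)]
reordered = schmoogle_order(ranked, seed=53)
```

The seed is only there to make the example repeatable; leaving it out gives a fresh order on every call.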

So what is a crawl / scrape / index?

At the end of the first day Ingmar Weber gave some hands-on demonstrations of various aspects of building a search engine. Using the free software program HTTrack, Ingmar started to crawl some websites, excluding the TLD .org. This way he showed us the importance of starting points, the design decisions of how and what to crawl (depth, outdegree, inclusion/exclusion), the resulting set of websites found, and how an inverted index is built – everybody has encountered one as the word/concept index in a book. Ingmar then went on to demonstrate other kinds of search engines, such as the p2p search engine YaCy and the human search engine chacha.com. The latter was very funny, as Joris van Hoboken told us that most of the operators actually use Google to find information.
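
For readers who have never seen one, the book-index idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not what Ingmar showed: the function names, the example.com pages and the simple tokenization are all made up, and the crawling step is replaced by a hard-coded dictionary of page texts.

```python
import re
from collections import defaultdict


def build_inverted_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical crawl result: URL -> page text.
pages = {
    "http://example.com/a": "Open search engines index the web",
    "http://example.com/b": "A crawler fetches pages from the web",
    "http://example.net/c": "Search without an index is slow",
}
index = build_inverted_index(pages)
```

A query such as `search(index, "search index")` then only has to intersect two small sets instead of scanning every page, which is the whole point of building the index up front.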

Maps

The next day started off with Bureau d'etudes, who presented a beautiful map on Quaero. Unfortunately the map is not online (yet?) so I will not discuss it now, as it would not make any sense.

Blogging and searching as Self-Management under Communicative Capitalism

On netzmedium.de a good summary of Jodi Dean's presentation has been written, so I will quote it here:

Jodi Dean presented an interesting comparison between blogs and search engines. From a psychoanalytical perspective, she interpreted both phenomena as an answer to anxieties associated with a chaotic and unstructured information space. Based on algorithmic ranking, search engines promise an objective ordering of this space and introduce an element of purity and immaculacy. Blogs, on the other hand, act as guides in the information space by presenting a strictly individual view. Dean portrays them as “technologies for managing distributed subjectivities”.

Do engines have politics? Do politics have engines?

Next up was Richard Rogers. He had 4 case studies at hand.

First was the demise of the expert libraries. Using the Wayback Machine, Richard found out that the directory had disappeared from the web. Google, in its early days, had one accessible on the front page. Through the years, however, you had to click through more and more until it finally disappeared.

Second was the disappearance of 911truth.org from Google's result page for the query '9/11', while it used to have a prominent place in the top 5 in the months before. Although there is no clear reason for the disappearance (is it a result of manual censorship? Is it a violation of Google's terms and conditions by e.g. SEO?), that does not make it less disturbing. Because Google's ranking is a secret, there is no way of finding out why it happened – which is one of the most problematic aspects of the current search engines in an era where access to information is largely controlled by one company.

After giving us an insight into the politics of search engines, Richard Rogers turned to the question whether politics have engines. To illustrate this he showed two examples, researched by the Digital Methods Initiative. Quoting from their source distance project:

The focus here is on the prominence of particular sources in different spheres (e.g. blogosphere, news sphere, images), according to different devices (e.g. Google, Technorati, del.icio.us). For example, how far are climate change skeptics from the top of the news? For comparison sake, how far are they from the top of search engine returns? The answer to this and similar 'cross-spherical' inquiries goes a way towards answering the question about the quality of old versus new media.

By searching climate change skeptics in the top 100 Google sites for the query 'Climate Change', Richard Rogers noted that "There is distance between the skeptics and the top of the search engine returns. [...] few skeptics appear on the Websites of the top ten results in Google. When they do appear (Patrick Michaels, Steven Milloy) their resonance is not particularly resounding. [...] From the visualization one is able to see the 'skeptic-friendly' sources, realclimate.org and, to a lesser extent, climatescience.gov stand out as skeptic-friendly. Sourcewatch also is prominent, albeit as a progressive watchdog group 'exposing' the skeptics. Remarkably, news sites, generally speaking, do not mention the climate change skeptics by name. Whilst news watchers and listeners may have the impression that 'uncertainty' in the climate change 'debate' continues in a general sense (as opposed to, say, in more specific, scientific sub-discussions), 'uncertainty' appears to be discussed without resort to the well-known, or identified, skeptics." (6)

His last example concerned which animals associated with climate change, issue animals, were represented in different spheres. It turned out Google, Google News, and Technorati present a whole different 'image' about which animals are referred to a lot in the climate change debate.

The semantic web with the use of a universal ontology or a folksonomy?

Florian Cramer declared all attempts at creating universal classification schemes (ontologies) futile. To rephrase his arguments in the words of Theodor Nelson, who coined the term Hypertext:

Last week's categories, perhaps last night's field, may be gone today. [...] The categories are chimerical (or temporal) and our categorization systems must evolve as they do. Information systems must have built in the capacity to accept the new categorization systems as they evolve from, or outside, the framework of the old. (7), emphasis by the author.

Florian illustrated this with historical material, but his real target was the German Theseus initiative, which seeks to develop AI-like tools for “automated logical deduction”. Florian sees folksonomies as more promising than ontologies.

Open-Search

My presentation can be found on http://www.open-search.net/Opensearch/QuaeroTalk. In this talk I explained what the problem with the current search engines is, according to Open-Search: because the control over information is central and nobody can look into their algorithms or decisions, it is susceptible to censorship, manipulation and profiling. Taking out the central part and making it open would solve a lot of problems. We believe this is possible through an open source p2p project. Of course there are a lot of challenges as well: spam, efficient query propagation, et cetera, but most of all we haven't had time to build a community and we have neither the time nor the money to put a lot of continued effort into the development of Open-Search.

Richard Rogers had some constructive comments for Open-Search.

Manipulation vs. spam

Normally search engine companies argue that their logics are closed and not transparent, because if they were open, the search would be open to spamming and manipulation. In the presentation, you say that you strive to be open about the logics, but have no answer yet for spam. Search engine companies already have an answer. Yours?

My first answer would be the one from Nutch:

Search engines work hard to construct ranking algorithms that are immune to manipulation. Search engine optimizers still manage to reverse-engineer the ranking algorithms used by search engines, and improve the ranking of their pages. For example, many sites use link farms to manipulate search engines' link-based ranking algorithms, and search engines retaliate by improving their link-based algorithms to neutralize the effect of link farms.

With an open-source search engine, this will still happen, just out in the open. This is analogous to encryption and virus protection software. In the long term, making such algorithms open source makes them stronger, as more people can examine the source code to find flaws and suggest improvements. Thus we believe that an open source search engine has the potential to better resist manipulation of its rankings.

A second answer would be that because people can devise their own ranking plug-ins with different ranking schemes, it would be harder to manipulate and spam them.
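
To give an impression of what such plug-ins could look like, here is a small sketch. It is purely hypothetical and not the Open-Search plug-in interface: the function names, the document fields (`text`, `inlinks`) and the two toy scoring schemes are assumptions made for illustration.

```python
def rank_by_term_frequency(query, doc):
    """Score a document by how often the query words occur in its text."""
    words = doc["text"].lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank_by_inlinks(query, doc):
    """Score a document by its number of incoming links, ignoring the query."""
    return doc["inlinks"]


def rank_results(query, docs, plugin):
    """Order documents by the score of the chosen ranking plug-in."""
    return sorted(docs, key=lambda doc: plugin(query, doc), reverse=True)


# Hypothetical documents with made-up fields.
docs = [
    {"url": "a", "text": "search search engines", "inlinks": 1},
    {"url": "b", "text": "open search", "inlinks": 9},
]
by_text = [doc["url"] for doc in rank_results("search", docs, rank_by_term_frequency)]
by_links = [doc["url"] for doc in rank_results("search", docs, rank_by_inlinks)]
```

The two plug-ins order the same documents differently, so a spammer who inflates incoming links would gain nothing with users who rank by text, and vice versa.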

Personalization vs. tribalism

With plug-ins people can choose their own ranking algorithm, and thus can personalize results. Classically, there is the idea that shared media experiences (people seeing similar things) make society, i.e., commonality in exposure but different views about what it all means (cf. community, which is different). Personalization, however, creates tribes: hate groups privileging other hate sites and their results, dictatorial regimes privileging 'official' sites. Search engines claim a kind of egalitarianism (indexing the 'whole web'), but then rank according to authority. Yours?

Search engines might claim a kind of egalitarianism but this is not because of indexing the whole web. Florian Schneider's presentation for example has questioned the 'borders' of the database and index of a search engine. There is also the tendency of search engines to localize and personalize, e.g. iGoogle serving results which are specifically tailored to you (recall also perfect recall from Michael Zimmer's talk), or a specific country version of a particular search engine. One might also be tempted to talk about an 'objective' ordering but there are enough examples that prove the current ranking schemes of the big search engines are not that objective at all.

The shared media experience that makes society is already being diffused by the broadening offer of information channels via the internet and interactive television. Of course, personalization by the current engines also adds to the decline of the shared media experience.

One might think a p2p engine like Open-Search has the danger of tribalism and segregation because your peers might not 'know' as much as somebody else's peers. At the level of the search engine however, there are no peers. To the open-search engine, the p2p layer is an abstraction of a generic storage device, a database. It is assumed that each peer will have access to the entire database at all times, regardless of which peers connect to which peers. We are not using a flooding model, as does e.g. Soulseek.

Freedom of Expression and Search Engines

Joris van Hoboken has been interested in Open-Search from the beginning. I invited him to do a presentation with me. Joris is a PhD student whose research concerns regulation problems in search engines. At the public forum on Quaero he gave a short presentation on the implications of freedom of expression for search engine law, and for government involvement in particular. There are three actors on which you can focus regarding freedom of expression and search engines: the search engine, the information provider, and the user. From the user's perspective "Freedom of expression in the context of Internet search implies that a search engine has to make its index and ranking machinery openly available for its end users of various kinds to support free (not as in gratis) access to information. (user becomes the real search engine editor)."

The final discussion

Two days of conference raised a lot of critique of privately owned centralized search engines. Questions about the borders of indexed information, access to and control over information, objectivity in ranking, and concerns about privacy led the participants to favor a system like Open-Search. Open-Search and similar systems, however, are in their infancy – a lot of problems still need to be solved technically and politically.

Jodi Dean has a great post about the final discussion, which I quote here:

The conference closed on a shockingly optimistic and unified note [...] Why such a great note? Likely because of three crucial interventions that resulted in the sense of a politics around P2P/open search, a politics that would assert the failures and limits of a search (foregoing the knowledge claims of a god/subject supposed to know and thus attempting to divert transferential investments into authority) engine. This could seem counter-intuitive. Who wants a search engine that doesn't claim to be reliable, thorough, and objective? Perhaps those who recognize that there is no such search engine and take responsibility for this limited, partial, and shared knowledge. The three interventions:

Florian Schneider directly politicized open search. It had been implicit in the discussion, but he made it explicit and political. He also used the term exodus as a kind of movement constitutive of the political.

Daniel van der Velden rendered exodus as more of a decision, and thus as requiring a kind of awareness or even consciousness (which makes the projects demonstrating the failures and interventions of google--which doesn't live up to its anti-evil ideals--all the more important).

For me, these two ideas seemed to conflict. There is hardly an exodus from google, rather the opposite--the problem is the way people flock to it, rely on it--like Wal-Mart and McDonalds.

But Florian Cramer traversed this dilemma, refused the false choice--and gave a rousing speech that all agreed marked an appropriate end point for this phase of the conversation. So, he said that exodus is a metaphor, with limits, and that exodus can't mean here anything like a kind of neo-luddite movement/moment. And, he refused the demand for an image. More specifically, he said that the very question of 'what would a European search engine look like' should be eliminated (for good techie reasons involved API, available public interface). There isn't one answer, one image, one model.

This fits well with the theme of the imaginary that I took from the conference. It accepts neither the imaginary, nor calls for a symbolic (name, authority, law). It traverses these with a different kind of accountability (clearly not quite ready for release, but maybe soon in beta). Maybe this is something like an act in information politics.

And, if information is value and search engines add, create, and arrange value, what sort of value would P2P search engines create?

To me it was a great forum. A lot of interesting intellectuals shared their thoughts and theories about search engines. It gave me a lot of inspiration and deepened my conceptualization about the politics of search engines. I hope that this group of researchers goes on with the problematique and that follow-up conferences are organized. Further discussion of the pitfalls and benefits of a system like Open-Search is encouraged to take place in the comments and on the mailing lists.


Google censors 911truth.org

10 months, 2 days ago by KoenMartens
In a well-documented case of censorship, Google again shows its true meaning of "don't be evil".
In a university blog, open-search suspect Erik Borra writes about the decline and final demise of 911truth.org in Google search results. On 911truth.org, it has become apparent that several readers have noticed the disappearance of 911truth.org from Google.

Google attracted more bad press, with the recent Gmail vulnerabilities also posing a serious threat to users of Google services.


Open-search to participate in Forum On QuaerO

10 months, 2 weeks ago by KoenMartens
Open-search is asked to participate in a public think tank on the politics of the search engine. The forum is brought together by Quaero, a consortium of technology firms and research labs working together on multimedia and web search projects.
From the flyer:

"QUAERO: isn't that the search engine that former French president Jacques CHIRAC declared to be the EUROPEAN challenge to Google? A public alternative to Silicon Valley-born commercial search engines, funded by the French state, in service of the PUBLIC GOOD, in the true tradition of the GRAND PROJET? An INFORMATION MACHINE capable of reclaiming European LANGUAGE and intellectual HERITAGE in the age of GLOBALIZATION?

NO. Quaero is the name of a CONSORTIUM of technology firms and RESEARCH LABS working together on multimedia and WEB SEARCH PROJECTS. It is a STATE-SPONSORED effort to stimulate PRIVATE French technological competitiveness.

According to François BOURDONCLE, one of its participating developers, 'Quaero is definitely not a project to build a web search engine, it is a project to make significant advances on the handling and indexing of multimedia content. It is completely out of the question to build a new, state-owned, state-operated, or even state-funded search engine. Only the R&D around these cutting-edge multimedia indexing technologies are in the scope of the project.'

But still, the issues that the idea of Quaero has raised – since its public launch by the former French president – constitute a formidable challenge. Internet search engines are political projects proper if only because they give and take power; they represent science, technology, (trans)national politics, private enterprise, culture, territoriality and language in ever different combinations. They are also social spaces. Internet search, the indispensable public tool that allows one to survey the ever increasing web, is currently in the hands of only a few global players, to whose private interests its setup corresponds."


Presentations and Articles

1 year, 1 month ago by ErikBorra
Koen has written an article about Google as the Microsoft of the 21st century. It can be found here (in Dutch). Another interesting (Dutch) article about Google, by Michiel Leenaars from gridnet.nl, is Google's verklikkernetwerk.

Apart from developing (see our previous post) we are also very busy giving presentations about open-search: last week at versgeperst.info, tomorrow at ISOC.nl, and Monday at the Holland Open Source Conference.


some technical details

1 year, 1 month ago by RobinGareus
Closing in on the Holland Open.
Here are some technical details that we're gonna reveal in a prototype at the Holland Open conference:

  • the oscar peer-to-peer layer uses a gossip-engine to multiplex traffic over a single TCP port. – it also allows tunnelling traffic, e.g. via HTTP, to circumvent firewalls without much performance penalty.

  • in this prototype the data-signing is not part of open-search, but of the P2P layer: every p2p-packet sent or received contains an RSA-SHA signature (yielding ~35% overhead at the moment) – the signature will identify the host responsible for providing the information (not the ID of the source of the information, i.e. crawler or user)

  • only "bad clients", not "annoying users", can be banned, and the ban affects the local open-search-client only, not the whole network.

  • we're testing & debugging a crawler with a local perl::DBD database (currently SQLite) – the SQL interface can be replaced with a p2p layer and is currently used to develop and debug the latter.

  • privacy can be achieved as an additional feature. A user can select any content of the p2p-network using its identifier and checksum to sign its content with a private key. A user's signed content can be referenced like any other content in the p2p-network (you can search for it), and it's up to the end-user-application to make use of it.
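
To illustrate what multiplexing several kinds of traffic over a single TCP port means, here is a small framing sketch. It is not the oscar wire format, which is not described here: the header layout (a 2-byte channel id followed by a 4-byte payload length) and the function names are assumptions chosen only to show the principle.

```python
import struct

# Assumed header: channel id (2 bytes) and payload length (4 bytes),
# both in network byte order.
HEADER = struct.Struct("!HI")


def pack_frame(channel, payload):
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel, len(payload)) + payload


def unpack_frames(stream):
    """Split a byte stream into (channel, payload) pairs."""
    frames = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        channel, length = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break  # incomplete frame: wait for more bytes
        frames.append((channel, stream[start:end]))
        offset = end
    return frames


# Two logical channels sharing one byte stream.
wire = pack_frame(1, b"gossip") + pack_frame(2, b"query: quaero")
frames = unpack_frames(wire)
```

Because every frame carries its own channel id and length, the receiver can hand each payload to the right handler even though everything arrives over one connection.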


Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.