Towards a Web of Culture and Science
Contribution to the international CODATA conference 2004 in Berlin
Jürgen Renn, MPIWG
Ladies and gentlemen,
It is a great honor to address you, and I would like to thank the organizers for giving me the opportunity to present an open-access initiative of the German Max Planck Society, the French CNRS, the Chinese Academy of Sciences, the Library of Alexandria, and other major international science and education organizations.
In the following, I will describe the crisis which culture and science are facing in what is still the beginning of the Internet Age.
I will then present a vision of how to overcome this crisis and indicate steps towards implementing it, in particular the Berlin Declaration on open access to science and culture, signed by many international organizations in Berlin in 2003.
Let me begin by describing the crisis, turning first to the crisis of culture on the Web. It has two dimensions.
The medium of today and tomorrow, the Internet, might leave behind a culture which is the heritage of our past but which we urgently need to meet the challenges of the future. This cultural heritage is presently in danger of being left behind, of missing the boat of the rapid technological developments carrying us into a new information age.
The bulk of the information which forms the core of our cultural heritage is largely excluded from the information system constituting the backbone of an ever more knowledge-based world.
The great works of art and literature, the multitude of languages of this world, traditions sometimes reaching back over millennia, the treasures of scientific, scholarly, and philosophical writings going back to the dawn of our civilization are not being transferred to the new medium on the scale necessary to preserve them, at a time when wars and dwindling public funds threaten them with rapid degradation.
Take the example of some websites that supposedly offer culture on the Web; that, at least, was the motive for investing public funds in their creation...
As a matter of fact, however, they often merely present dead links or password-protected information, hallmarks of the lack of longevity, transparency, and interoperability of culture on the present Web.
The deficit in the extent to which cultural information is available on the Web is accompanied by the underdevelopment of cultural techniques adequate to the new technologies.
Reading, writing, and arithmetic, the traditional cultural techniques, must be complemented by new cultural techniques that allow every individual to exploit the Internet optimally as a representation of collective human knowledge.
What we need is more than the relatively limited ability of today's browsers to display information offered by a server in different ways determined by the client.
What we need are possibilities for a much more active role of the users, ultimately overcoming altogether the client-server asymmetry that presently shapes the Web.
The question is, of course, how to overcome the present crisis which threatens the link between our past and our future.
Before coming to answers, let me turn to the second major crisis of this transitional period, the crisis of science.
It is most visible in rising journal prices, which make scientific information ever less accessible, to developing countries, for instance, but more generally to all those who have produced scientific knowledge, mostly with public funds. Scientific organizations are in fact forced to buy back the information they produced in the first place.
But in effect, the so-called journals crisis amounts to a complete breakdown of the traditional division of labor in the scientific information circuit.
According to this traditional model, research results are produced by scientists. This is and will remain the most cost-intensive element of the information circuit.
The results of research are disseminated by publishers and archived by libraries.
Information is filtered by a process of evaluation performed by scientists (peers) and organized by publishers.
Only what survives this filtering process is disseminated.
This well-established traditional system is now endangered by technological changes with radical consequences.
Even within the system of printed information, these technological changes are felt in the rising prices charged by publishers for dissemination, which scientific organizations are no longer able to cover and which dramatically widen the divide between industrialized and developing countries with regard to the availability of scientific information.
The information revolution has radically changed the technical and economic basis for maintaining the scientific information flow. This radical change is evident from the as yet unexploited potential of the Web for scientific communication.
Dissemination is no longer a cost-intensive component. It can, in principle, be handled by scientists without the services of publishers.
In the electronic medium, evaluation follows and does not precede dissemination. It no longer has to amount to a simple "in/out" decision about publication.
There is no longer any reason to preclude access to the information hinterland, to observational and experimental data, software tools, or historical sources, which presently serve only as a logistical background for published research results. Making such additional information available will help to ensure the reliability of scientific information, broaden the scope of available resources, and avoid the duplication of efforts. Moreover, the Web offers completely new forms of scholarly publication, ranging from digital libraries of cuneiform tablets to entries in biological databases.
The new medium could facilitate the selection process and improve its quality. The immediacy and, in principle, unrestricted scope of electronic dissemination increase the likelihood of rapid responses, helping to distinguish valuable from less valuable contributions.
Against the background of this impressive potential of the Web for scientific communication, it becomes particularly evident what is wrong with the present system of scientific dissemination, dominated by publishers with a quasi-monopolistic status:
As I have mentioned, there are, first of all, the increasing costs of scholarly journals, which tie up capital urgently needed to build a more adequate and efficient infrastructure for scientific dissemination and thus amount to a waste of public money.
Then there are the commercial barriers to the connectivity of knowledge, which enforce a fragmented landscape of information islands rather than fostering the development of a global representation of human knowledge made up of interoperable contributions by all players.
It is also important not to forget that publishers offer no guarantee of the long-term archiving of information, again a challenge that remains with public institutions.
As I have mentioned before, we also lack an adequate access and retrieval infrastructure corresponding to the needs of scientists and educators.
And finally, mapping the traditional commercial publication system onto the new medium perpetuates the digital divide in science. In fact, simply creating a mirror image of the traditional system on the Web amounts to erecting an artificial boundary that cuts developing countries off from the scientific information flow.
Let us now turn to approaches towards a solution of the double crisis of culture and science on the Web.
The two standard solutions are the big player solution and the scout solution. Both have failed to create an adequate infrastructure that would foster the much-needed transfer of scientific and cultural content from the old medium to the new one.
The big player solution is most familiar from the present debates on electronic journals where a few publishers use their near monopoly to erect new barriers of accessibility.
But for the digital availability of cultural heritage, the situation is perhaps even more problematic, resembling a gold rush where everybody tries to stake out claims.
The big players have in fact long since begun to secure exclusive rights on the reproduction of cultural artefacts, be they manuscripts of Leonardo da Vinci or representations of traditional cultures.
But in spite of their eagerness to control large domains of cultural heritage, the big players have so far failed to create an infrastructure that guarantees a steady and reliable flow of content from the old media into the new, an infrastructure offering equitable access to all nations and peoples, many of whom have been deprived of their heritage by the vicissitudes of history.
The scout solution, on the other hand, is based on the assumption that the transfer of cultural and scientific content to the new medium can essentially be achieved by pilot ventures.
It amounts to the realization that bringing culture and science to the Internet means settling a new continent, rather than just exploiting its resources in a gold rush.
But it also amounts to the problematic assumption that this can be done by merely sending out a few scouts to survey the new territory.
As a matter of fact, the scout solution, too, has largely failed to launch a self-sustaining dynamic of culture and science on the Web.
I have already shown you examples of the inadequate presentation of culture on the Web.
One has to add, however, that it was precisely the scout solutions that first demonstrated the potential of the Web for creating new cultural spaces.
Here you see the homepage of the Perseus Project located at Tufts University, one of the most important pioneering projects, actually predating the Web itself.
It has demonstrated that the new information technologies make it possible to radically change the outlook on a culture such as ancient Greek culture, traditionally split among disciplines such as philology, archeology, art history, and the history of science and technology.
With the help of the Perseus Project it has become possible to associate, for instance, the mention of Greek musical instruments in classical texts with images of such instruments found on Greek vases. Tasks such as compiling a vocabulary of ancient Greek architectural terms, which traditionally took a scholar's lifetime, can now be addressed in a day or two.
Support is also provided by language technology, offering automated morphological analysis linked to dictionary entries.
As a result, you can actually perform semantic searches in the corpus of classical texts covered by the project.
The major drawback of even this shining example, however, is that it has largely remained an island solution, a true representative of the scout approach, failing to launch a self-sustaining dynamic which would extend its achievements to the Web at large.
The right solution to the double crisis can only be found if we have a vision adequate not only to the needs of specific disciplines but to the Web as a global representation of human knowledge.
The vision I would like to sketch here is that of a Web of Culture and Science.
This is a vision concerning both the enrichment of the Web with content and its future technological development, hopefully turning it indeed into the global and accessible representation of human knowledge mentioned above.
To create a self-sustaining dynamic that enriches the Web with meaningful content represented in adequate structures, we need a support program for open access aimed at building up a technical and social infrastructure. Such a support program is the core of what we have called the Agora solution, by analogy with an institution of ancient Greece where the common good emerged from the contributions of all citizens.
To create the tools which make it possible to exploit this content adequately for science and education, we need to develop a semantic Web in the sense of an infrastructure allowing future users to truly interact with the content they find.
Let me first turn to the Agora solution, which aims at building up an infrastructure that increasingly turns the consumers of the Web into producers.
In fact, we urgently need support for creating an open-access infrastructure for making resources freely available online with little effort and in a way that guarantees the interoperability with other contents and tools, thus creating an added value for every user.
Turning to the Web of the future, I will limit myself here to a few remarks:
First of all, we need to realize that the future transformation of the Web will be driven not only by technical issues of speed and bandwidth but by innovative usage scenarios, just as was the case when the Web itself was invented at CERN in Geneva.
Making the Web more democratic, for instance, will also create a technological drive from the client-server asymmetry to peer-to-peer interactions, from browsers used by essentially passive clients to knowledge weavers used by active citizens.
Let me expand on this point:
The hyperlinks of the Web represent structures of meaning that transcend the meaning represented by individual texts. At present, however, these "webized" structures of meaning lack any longevity and can only be used blindly, e.g., by search engines, which at best optimize navigation by taking into account the statistical behavior of Web users. So far, these structures of meaning can hardly be made the object of interventions by the Web community itself. There is at present no way to construct complex networks of meaningful relations between Web contents. In fact, providers have no influence on the links to the content they provide, and users have no impact on the available access structures to that content, except by becoming content providers themselves.
The Web of the future will probably continue to be based essentially on the representation of meaning by text. Contrary to the existing Web, however, its emerging paradigm is no longer constituted by the client-server asymmetry but by informed peer-to-peer interactions, that is, by cooperation among equally competent partners who act as providers and servers at the same time. Future users will work on shared knowledge, constructing new meaning while accessing the existing body of knowledge represented in the Web through meaningful links to texts and text corpora.
A basic faculty of human thinking is the ability to reflect on existing knowledge and to produce, so to speak, data on data. But when considering the future of the Web, the problem of reflection is predominantly perceived as that of creating, representing, and exploiting metadata in a much more limited sense. In a wider perspective, however, semantic linking should not be restricted to the use of specifically prepared metadata sets but should exploit the meaning structure of the Web itself in order to provide a content-based semantic access to information.
If, in the future, navigation can be based on content-specific metadata resulting in dynamically changing ontologies, and if powerful link-editing functionalities become part of future "knowledge weaving web environments," a self-organizing mechanism of the Web could be implemented which, in the medium term, would improve the hypertext linking of the Web and hence increase its transparency.
In summary, a future Web of Culture and Science should be characterized by significantly enhanced longevity, interactivity, and transparency.
Let me conclude with some words on the implementation of this vision, coming back to the Berlin declaration.
The Berlin Declaration stands in the tradition of the Budapest and Bethesda declarations and other similar initiatives in favor of open access to scientific information on the Web.
It distinguishes itself by pointing to the need to ensure open access to cultural heritage as well as to scientific information, following the example set by the charter of the European Cultural Heritage Online Initiative supported by the European Commission.
A further hallmark of the Berlin Declaration is its concrete perspective on the institutional implementation of open access, including measures for self-archiving and the creation of other support structures that facilitate open access.
The Berlin Declaration was presented at the World Summit on the Information Society 2003 in Geneva and contributed significantly to placing open access to scientific knowledge and cultural heritage in the final declaration of the summit, a step considered likely to trigger a paradigm change all over the world.
The Berlin Declaration was signed in October 2003 by major national and international governmental, scientific, cultural, and educational organizations.
They consider their mission only half complete if the information they produce is not made freely available to society.
In their view, science is otherwise simply unable to realize its full impact, so that investments in science fail to yield the returns they could in principle attain.
Let me quote from the Berlin Declaration:
"In order to realize the vision of a global and accessible representation of knowledge, the future Web has to be sustainable, interactive, and transparent. Content and software tools must be openly accessible and compatible."
"Our organizations are interested in the further promotion of the new open-access paradigm to gain the most benefit for science and society."
The Berlin Declaration also recommends specific measures for implementing the open-access paradigm.
Scientists are encouraged to publish their work according to the principles of the open-access paradigm.
The holders of cultural heritage are encouraged to support open-access by providing their resources on the Internet.
Let me briefly show you a list of the present signatories of the Berlin Declaration.
On the German side, it has been signed not only by the Max Planck Society but also by all the major research organizations associated with the Max Planck Society in the so-called alliance, such as the German Research Foundation, the Fraunhofer Society, the Leibniz and Helmholtz Associations, the German Science Council, and the Association of Universities. Taken together, they organize and fund the lion's share of German basic and applied research. The Berlin Declaration has also been signed by the Berlin-Brandenburg Academy, one of the national galleries, and the German Library Association. All of these institutions have been pressed by ever scarcer funds for science and culture to use their resources as effectively as possible and to regain control over the knowledge they produce.
On the international level, the Berlin Declaration has been signed by the French CNRS and the National Hellenic Research Foundation, as well as by other major research funding and governmental organizations from Belgium, Spain, Austria, Norway, Italy, and Hungary. It has furthermore been signed by transnational organizations such as the Academia Europaea.
The core text of the Berlin Declaration was closely coordinated with the American Bethesda group, representing major research organizations in the US such as the Howard Hughes Medical Institute, the National Institutes of Health, and the University of California.
Several international follow-up conferences are in preparation in order to achieve a closer coordination between all players involved.
As for the humanities, my own field, the Berlin Declaration has been a great encouragement for many archives, museums, libraries, and research institutions to make their contents freely available.
Allow me therefore to focus on how the vision of an open-access infrastructure I have sketched is being realized in this particular field.
What you see here is a timeline of so-called seed collections ranging from 3000 BC to the present, offering access to major collections of cultural heritage assembled within the framework of the European Cultural Heritage Online (ECHO) Initiative.
In order to assemble these collections from materials dispersed all over Europe and beyond, in collaboration with museums, archives, and research institutions, it was crucial that, within the context of the ECHO Initiative, we were able to offer such institutions an open-access infrastructure in the spirit of the Agora solution, helping them to overcome the competence and technology thresholds separating them from the Web. The infrastructure built up by the ECHO Initiative allows for web-based collaboration on images and texts and automatically creates, for instance, links from any text embedded in the infrastructure to dictionaries for a variety of languages ranging from Ancient Greek to Chinese.
Here you see the example of a text from the early 17th century co-authored by a Chinese and a European scholar. The text is available both as an image and a transcription which is connected with a Chinese-English dictionary.
There are several options for image processing, for annotating images, as well as additional tools for editing and annotating XML texts.
At present, the Chinese text shown here is being compared in a web-based collaboration using these facilities with a number of European sources written in Latin, Italian, or Dutch, all of which are also freely available online and coupled with the appropriate language technology for these languages.
The previous example came from a seed collection of documents related to Chinese scientific knowledge.
We use the term seed collection in the sense of a collection of digitized documents freely available on the Web and associated with an infrastructure that makes it easy to add further documents related to its content.
Another example is a seed collection dedicated to Renaissance architecture. Here you see an example from it: a photographic documentation of construction details of Florence Cathedral.
Another collection belonging to this seed collection consists of administrative documents related to the construction of the cathedral.
This collection is transcribed and richly annotated. It is, at the same time, the result and the precondition of scholarly work.
It is indeed remarkable how, in the context of such an infrastructure, research results can be turned into navigational devices across the primary materials.
This example also shows how a seed collection differs from a traditional digital library. It comprises several autonomous collections which are, however, integrated not only by a common subject but also by a common, extensible infrastructure, open to the association of further documents and collections.
Another example is the Cuneiform Digital Library Initiative (CDLI), a collaboration of numerous institutions from all over the world, now also represented as a seed collection in the ECHO infrastructure.
CDLI has united material dispersed over museums worldwide and makes more than 70,000 cuneiform tablets accessible as images and transcriptions, combined with search tools, an electronic journal, and other elements of a rich open-access infrastructure.
Here you see a manuscript by Leibniz.
While scholars have traditionally focused on manuscript collections in order to prepare editions, sometimes working for decades or even centuries to produce such an edition, say of the works of Leibniz or Galileo, the new media now allow immediate access to such resources, overcoming the bottleneck of traditional editorial projects.
The sheer difficulty of getting access to such precious manuscripts has entailed a high degree of specialization, which extends to the cultural techniques of gaining access to and working with such manuscripts.
Yet we should be careful not to mimic the fragmented structure of traditional scholarship in the new medium and should rather focus on the novel potential of such an infrastructure, which allows interoperability between, say, a digital representation of Leibniz's manuscripts and a digital collection of Galileo's manuscripts.
In the upcoming Einstein Year 2005, celebrating the centenary of Einstein's formulation of special relativity, we shall, of course, focus on making relevant historical and modern sources available (e.g. http://echo.mpiwg-berlin.mpg.de/content/relativityrevolution/gehrcke). We will address not only the public image of Einstein, as in this newspaper collection, also available online within the ECHO framework, but also the accessibility on the Web of mathematical formulae with physical meaning.
One of our aims is to integrate the infrastructure provided by the ECHO initiative for cultural heritage with the open-access platform developed by the Max Planck Society in order to create an electronic archive of its publications.
This integration has to be seen against the background of the double strategy adopted by the Max Planck Society.
In implementing the Berlin Declaration, the Max Planck Society has in fact followed a double strategy: on the one hand, fostering access to electronic information in the traditional journal format (e-Lib); on the other hand, developing new models of open-access electronic dissemination (e-doc) with the support of a newly founded innovation center, the Heinz Nixdorf Center.
The ECHO infrastructure is just one example of such an innovative infrastructure.
The e-doc platform of the Max Planck Society will be used to build up an electronic archive of all Max Planck publications, ensuring that the Society keeps full control over this extremely valuable resource and providing optimal conditions for the accessibility of its research output.
With the help of a large grant from the Federal Ministry of Education and Research, the open-access platform under development at the Max Planck Society is presently being built up in cooperation with the Fachinformationszentrum Karlsruhe. The aim is to create an open-access platform that can serve as a national pilot solution.
This brings me to the next steps. What are they? Very simple: you can join the Berlin process and thus help pave the way to the science of the future, which will have to be based on the open-access paradigm if we want to exploit our scientific and cultural resources as effectively as possible to meet the global challenges of humankind.
The original signatories of the Berlin Declaration agreed to hold follow-up conferences at regular intervals and to implement a detailed roadmap.
A first follow-up conference was held at CERN, which has agreed to play a leading role in realizing the vision of the Berlin Declaration; the next conference will be held in Southampton, and then we shall return to the Berlin area.
The aim of these conferences is the implementation of the roadmap, comprising in particular:
- Raising individual scientists' awareness of the need to implement open access,
- Communication within signatory organizations in order to avoid a duplication of efforts,
- Raising awareness in learned societies, also in order to identify discipline-specific needs,
- and raising awareness at the leadership level in order to stabilize the Berlin process at the political level.
There are, of course, many issues that require immediate action:
At present, particular efforts are being made to address the legal issues related to open-access.
As I have already mentioned, a critical issue is the creation of a sustainable technical infrastructure.
Furthermore, we have to find economically stable solutions for open access, acknowledging, of course, that open-access costs are research costs.
We also have a responsibility to help overcome the digital divide.
But first of all, we have to keep our own promises and take the necessary measures at the institutional level to encourage scientists to use open access as their venue of publication, for instance by ensuring long-term funding and guaranteeing the long-term operation of the necessary infrastructure.
The first results of our efforts are already encouraging. But they will come to nothing if we do not succeed in bringing together a broad alliance in favor of the vision of an open Web of culture and science, which brings me to my conclusion.
"Governments, universities, research institutions, funding agencies, foundations, libraries, museums, archives, learned societies and professional associations are invited to join the present signatories."
If you wish to do so, please contact the President of the Max Planck Society, Prof. Gruss, who has offered to coordinate the process.
For further information, including contact addresses, please consult either the website of the Max Planck Society or the brochure we have brought with us.
Thank you for your attention.