Proceedings of the AoIR-ASIST 2004 Workshop on Web Science Research Methods

The dynamic, self-organizing web – from models to measurement

Andrea Scharnhorst, Isidro Aguillo, Paul Wouters, Networked Research and Digital Information (Nerdi), NIWI-KNAW, The Royal Netherlands Academy of Arts and Sciences, PO Box 95110, 1090 HC Amsterdam, The Netherlands http://www.nerdi.knaw.nl, andrea.scharnhorst @ niwi.knaw.nl

The Web can be regarded as a self-organising system in a non-stationary state. Self-organisation refers to processes of structure formation that take place spontaneously inside the system and are not governed externally. The daily growth and change of the web results from many independent decisions. Surprisingly, these decisions do not lead to a random network but to a well-structured complex network. In such cases one speaks of collective behaviour: the individual actions turn out to be part of a coherent pattern of behaviour that produces a structure at the macro-level. In the case of the web this structure has been characterised as a scale-free network and a small world at the same time. The resulting network is not static. The state reached by the collective behaviour is not stable over time but in constant flux.

A conceptualisation of the web as a non-linear, non-stationary system has consequences for data gathering procedures on the web. Further, non-linear structures in the web network might influence any statistical analysis of web data, and hence the different approaches used in cybermetrics. However, a systematic reflection on these implications is still missing. Results such as scale-free degree distributions in hyperlink networks (Huberman, Barabási) or power laws in the distributions of measurable quantities on the web (Faloutsos, Katz) indicate the existence of non-linear mechanisms in the emergence of web structures. Non-linearity is an indicator of interaction taking place at the individual level. For instance, one could ask in which way the creation of a web page and its hyperlinks is influenced by others' activity on the web. Apart from some qualitative studies on the motivation of webmasters, this is still an open question: how do social interactions create the non-linearities in the probabilities that pages link to each other? In addition to this complexity, web structures are constantly changing. The web is self-organising and evolving over time and, in contrast to the traditional databases used for indicators, it is far from a steady state. The dynamic nature of the web concerns its growth, the changing link structure, the change in content and the change in the web technologies used.
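
To make the term "scale-free" concrete, the following is the standard textbook form of a power-law degree distribution. The symbols are notation assumed here for illustration (k for the number of links of a page, C for a normalisation constant, gamma for the exponent); they are not taken from the cited studies.

```latex
P(k) = C\,k^{-\gamma}, \qquad \log P(k) = \log C - \gamma \log k
```

On doubly logarithmic axes such a distribution appears as a straight line with slope -gamma. It has no characteristic scale because the ratio P(ak)/P(k) = a^{-gamma} does not depend on k.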

This paper connects theoretical approaches to modelling the web as an evolving complex network, and reflection on web data retrieval, with the analysis of a unique data set drawn from a sample of web pages. Using two different snapshots of a sample of around 1000 URLs we examine phenomena such as the growth of web pages, the stability of URLs versus institutional stability (content stability), dead links, changes in file type structure and outlink networks. We relate the findings to other investigations of the invisible web and web page persistence. Finally, we embed our findings in a more general reflection on the nature of the web as a medium for science and technology activities. More specifically, we discuss the implications of the specific dynamic character of the web for the development of web indicators, which are usually assumed to be based upon reliable, reproducible and standardised data. In the case of the Web, indicator research is confronted with changing representations of the communications of different communities (Leydesdorff, Scharnhorst). This paper discusses the methodological problems and foundations of measuring science on the web.
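
As an illustration of the kind of snapshot comparison described above, the following is a minimal Python sketch. It is not the procedure used in this study: the record layout (`PageRecord`), the rule that any 2xx or 3xx status counts as reachable, and the example URLs are all assumptions made for the sake of the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    """One URL as observed in one snapshot (hypothetical layout)."""
    status: int                       # HTTP status code seen by the crawler
    file_type: str                    # e.g. "html", "pdf"
    outlinks: frozenset = frozenset() # URLs this page links to


def is_alive(record):
    """Assumption: any 2xx or 3xx response counts as a reachable page."""
    return 200 <= record.status < 400


def compare_snapshots(first, second):
    """Summarise change between two snapshots, each a dict of URL -> PageRecord."""
    alive_first = {u for u, r in first.items() if is_alive(r)}
    alive_second = {u for u, r in second.items() if is_alive(r)}

    persisted = alive_first & alive_second
    dead = alive_first - alive_second  # reachable before, missing or failing now

    # Net change in the number of reachable pages per file type.
    shift = Counter(second[u].file_type for u in alive_second)
    shift.subtract(Counter(first[u].file_type for u in alive_first))

    # Persisted pages whose set of outlinks differs between the snapshots.
    changed = {u for u in persisted if first[u].outlinks != second[u].outlinks}

    return {
        "persisted": persisted,
        "dead": dead,
        "persistence_rate": len(persisted) / len(alive_first) if alive_first else 0.0,
        "file_type_shift": {t: n for t, n in shift.items() if n != 0},
        "changed_outlinks": changed,
    }


# Toy data standing in for two crawls of the same URL sample.
first = {
    "http://a.example/": PageRecord(200, "html", frozenset({"http://b.example/"})),
    "http://b.example/": PageRecord(200, "pdf"),
    "http://c.example/": PageRecord(200, "html"),
}
second = {
    "http://a.example/": PageRecord(
        200, "html", frozenset({"http://b.example/", "http://d.example/"})
    ),
    "http://b.example/": PageRecord(404, "pdf"),
    "http://c.example/": PageRecord(200, "html"),
}

summary = compare_snapshots(first, second)
print(sorted(summary["dead"]))
print(summary["file_type_shift"])
```

Note that this sketch only captures URL stability. A page that still responds at the same address but now belongs to a different institution or carries different content would count as persisted here, so content stability needs a separate comparison.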