Proceedings of the AoIR-ASIST 2004 Workshop on Web Science Research Methods
Small-world connectors across academic web spaces [Presentation]
Lennart Björneborn, Royal School of Library and Information Science, Denmark, http://www.db.dk/lb
The Web was developed as a tool to facilitate easy access to networked information sharing and browsing, with web links functioning as subject access points and guiding tools to discover optional and alternative directions and loopholes to encounter information. The traversal options and access points to information on the Web depend on where and whereto local web constructors have placed and targeted links. The Web is thus constructed through distributed knowledge organization by millions of local creators of web pages and links. The self-organization of hypertextual link structures on the Web may be conceived as macro-level aggregations of micro-level interactions; as an ongoing collaborative weaving of dynamic document networks conducted by a multitude of link creators.
An intriguing dimension of these giant document networks on the Web deals with small-world properties in the shape of short distances along link paths through intermediate web pages and web sites. Small-world networks are characterized by a combination of highly clustered network nodes and short average path lengths between pairs of nodes. In recent years, an avalanche of research has revealed small-world properties in a wide variety of networks, including biochemical, neural, ecological, physical, technical, social, economic, and informational networks. For instance, scientific collaboration networks and semantic networks may show small-world features. The coincidence of high local clustering and short global separation means that small-world networks consist of both small local and small global distances. This special small-world feature facilitates high efficiency in propagating information, ideas, contacts, signals, energy, or viruses, etc., both on a local and global scale in the concerned networks.
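The two quantities that characterize a small-world network, local clustering and average path length, can be illustrated with a minimal Python sketch. The toy graph and the function names below are assumptions made for illustration only; they are not data or tools from the study.

```python
from collections import deque


def clustering_coefficient(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    total = 0.0
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue  # fewer than two neighbours: contributes 0 to the mean
        nbrs = list(neighbours)
        # Count links that exist among the neighbours of this node.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in adj[nbrs[i]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path distance over all reachable ordered node pairs."""
    distance_sum = 0
    pair_count = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        distance_sum += sum(dist.values())
        pair_count += len(dist) - 1
    return distance_sum / pair_count if pair_count else 0.0


# Hypothetical toy graph: two tightly knit triangles joined by one bridge link.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

print(clustering_coefficient(toy_graph))
print(average_path_length(toy_graph))
```

In this toy graph the single bridge between c and d keeps all distances short even though most links stay inside the two triangles, which is the combination the small-world notion describes.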
On the Web, small-world link structures are concerned with core library and information science issues such as navigability and accessibility of information across vast document networks. For instance, short link distances along link paths affect the speed and exhaustivity with which web crawlers can reach and retrieve web pages when following links from web page to web page. Further, small-world web topologies may have implications for the ways users explore the Web and the ease with which they gather information. Small-world link structures may reflect cross-social connections between different interest communities, or cross-disciplinary contacts in scientific networks. Such small-world connectors are important as they counteract ‘balkanization’ of the Web into insularities of disconnected and unreachable subpopulations.
This presentation briefly outlines a webometric research project concerned with what types of web links, web pages, and web sites function as small-world connectors across topically dissimilar domains in an academic web space. A novel corona-shaped web graph model is introduced, as well as a five-step methodology in order to sample, identify and characterize small-world properties by zooming stepwise into more and more fine-grained web node levels among 7669 subsites harvested from 109 UK universities.
The methodology includes detailed case studies comprising 10 shortest path nets that contain all shortest link paths between pairs of topically dissimilar subsites in the strongly connected component in the investigated UK academic web graph. The network analysis program Pajek was used to extract all shortest link paths. The path nets were constructed to function as investigable small-world link structures – ‘mini small worlds’ – generated by the deliberate juxtaposition of topically dissimilar seed nodes.
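The idea of a shortest path net, that is, the union of all shortest link paths between two seed nodes, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Pajek procedure used in the study; the subsite names and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, start):
    """Breadth-first distances from start in a directed adjacency mapping."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path_net(links, source, target):
    """Return the set of directed links lying on at least one shortest path."""
    forward, backward = {}, {}
    for u, v in links:
        forward.setdefault(u, set()).add(v)
        backward.setdefault(v, set()).add(u)
    from_source = bfs_distances(forward, source)
    to_target = bfs_distances(backward, target)  # distances along reversed links
    if target not in from_source:
        return set()  # target is not reachable from source
    shortest = from_source[target]
    # A link (u, v) is on a shortest path exactly when the distance to u,
    # plus the link itself, plus the distance from v to the target,
    # equals the shortest distance between the two seeds.
    return {
        (u, v)
        for u, v in links
        if u in from_source
        and v in to_target
        and from_source[u] + 1 + to_target[v] == shortest
    }


# Hypothetical links between subsites; "math" and "art" play the seed nodes.
toy_links = [
    ("math", "cs"), ("math", "lib"), ("cs", "art"), ("lib", "art"),
    ("math", "phys"), ("phys", "chem"), ("chem", "art"),
]

path_net = shortest_path_net(toy_links, "math", "art")
print(sorted(path_net))
```

Here the longer route through "phys" and "chem" is left out of the net, while both two-step routes are kept, so the intermediate nodes "cs" and "lib" are the candidates for closer inspection.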
The Internet Archive was used to retrieve and examine source pages and target pages with interconnecting links within the 10 path nets.
Indicative findings suggest that personal link creators, such as researchers and students, as well as computer science-related subsites may be important small-world connectors across sites and topics in the investigated academic web space.
Personal link lists (e.g., bookmark lists) were the largest cross-topic page genre, providing about 40% of transversal (cross-topic) outlinks in the 10 path nets. Over 80% of the identified transversal links were related to academic activities such as research or teaching.
The indicated connective role of computer-science-related (CS) subsites in academic link structures probably reflects the auxiliary function of computer science in many scientific disciplines in natural sciences, technology, humanities, and social sciences. Furthermore, this auxiliary function may be combined with a more experienced and extrovert web presence by CS-related persons and institutions.
The investigated UK academic web space showed small-world properties with a high clustering coefficient and a low average path length (3.5) between reachable subsites. These findings are in line with the concept of a fractal self-similar Web with subsets of the Web displaying the same graph properties as the Web at large. Further, the present study indicates a close relation between Kleinberg’s concepts of hubs and authorities on the Web and the social network analytic measure of betweenness centrality. No literature has been found discussing such a relation.
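For readers unfamiliar with betweenness centrality, the measure can be sketched in a few lines of Python. The sketch assumes an unweighted directed graph and uses Brandes' accumulation scheme; the graph and the function name are hypothetical, and the code does not reproduce the analysis in the study.

```python
from collections import deque


def betweenness_centrality(adj):
    """Betweenness of each node in an unweighted directed graph.

    adj maps a node to the set of nodes it links to. A node's score is the
    sum, over ordered pairs of other nodes, of the fraction of shortest
    paths between the pair that pass through it.
    """
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        order = []                      # nodes in order of discovery
        preds = {n: [] for n in nodes}  # predecessors on shortest paths
        sigma = {n: 0 for n in nodes}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):       # accumulate from the farthest nodes back
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical graph: two equally short routes from "math" to "art".
toy_web = {"math": {"cs", "lib"}, "cs": {"art"}, "lib": {"art"}}

centrality = betweenness_centrality(toy_web)
print(centrality)
```

Each of the two intermediate nodes carries half of the shortest paths between the end nodes, so both receive a score of 0.5 while the end nodes score 0.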
The close examination of the 10 path nets gives intuitive support to how the Web may be conceived as a web of genres with a rich diversity of interlinked page genres and with genre drift, that is, changes in page genres along link paths. The presentation discusses hypothesized complementarities of topical uniformity and diversity (including topic drift and genre drift) in the formation of small-world link structures. A metaphor of crumpled-up paper is used to conceptualize how complementarities of convergent and divergent web structures – as reflected by topical uniformity and diversity, respectively – in ‘crumpled-up’ web spaces can be explored and exploited by complementarities of convergent searching and divergent browsing (including serendipitous encountering).
Special decentralized algorithms have been developed that utilize local connectivity information for identifying short paths through a network where no global link data are available. In particular, well-connected hub-like web nodes may be exploited in such decentralized algorithms. The findings in the present study suggest that the rich diversity of inlinks and outlinks to and from computer-science web sites and personal link lists may be utilized for such computer-aided navigation along small-world shortcuts, which also might be exploited as starting-points for more exhaustive web coverage by search engine crawlers.
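One simple way such a decentralized strategy might look is a degree-seeking walk that, at each step, sees only the current node's neighbours and how many links each of them has. The Python sketch below is an illustrative assumption, not one of the published algorithms referred to above; the node names and the function name are hypothetical.

```python
def degree_seeking_search(adj, source, target, max_steps=50):
    """Walk from source towards target using only local information.

    At each step the walker moves to the target if it is a neighbour;
    otherwise it moves to the unvisited neighbour with the most links.
    Returns the path taken, or None if the walk gets stuck.
    """
    path = [source]
    visited = {source}
    current = source
    for _ in range(max_steps):
        if current == target:
            return path
        neighbours = adj.get(current, set())
        if target in neighbours:
            path.append(target)
            return path
        candidates = [n for n in neighbours if n not in visited]
        if not candidates:
            return None  # dead end: no unvisited neighbour left
        # Prefer the best-connected neighbour; break ties by name.
        current = min(candidates, key=lambda n: (-len(adj.get(n, ())), n))
        visited.add(current)
        path.append(current)
    return None


# Hypothetical undirected neighbourhood with one hub-like node.
toy_neighbourhood = {
    "start": {"hub", "leaf"},
    "leaf": {"start"},
    "hub": {"start", "x", "y", "goal_nbr"},
    "x": {"hub"},
    "y": {"hub"},
    "goal_nbr": {"hub", "goal"},
    "goal": {"goal_nbr"},
}

route = degree_seeking_search(toy_neighbourhood, "start", "goal")
print(route)
```

The walk is drawn to the hub first because it is the best-connected neighbour, which mirrors the way hub-like nodes may serve as shortcuts when no global link data are at hand.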
As academic web spaces increasingly include extensive scholarly self-presentations and link creations, the sociology of science may employ small-world approaches including social network analytic concepts such as weak ties and betweenness centrality for automatic detection of informal social networks (‘invisible colleges’) and central gatekeepers. Such approaches may also be interesting to exploit in web mining, e.g., for identifying fertile areas for interdisciplinary exploration and innovative cross-pollination.
Björneborn, L. (2004). Small-world link structures across an academic web space: a library and information science approach. PhD thesis. Royal School of Library and Information Science. Available: http://www.db.dk/lb/phd/phd-thesis.pdf
[Also available chapter-wise at http://www.db.dk/lb]