FAQ

What does “inlink page count” mean?

The data collected are “site inlinks” to the home pages of the web sites. A site inlink is a page on a different web site that contains a hyperlink to the home page of the studied site. The inlink page count is the number of such pages.

What does “inlink site count” mean?

An inlink site count is the number of web sites (defined by domain names) containing one or more links to the assessed web site. Inlink site counts are derived from the page inlinks in two steps. First, the domain names are extracted from all of the inlink URLs. Second, each domain name is converted to a site name by chopping off its initial part, leaving the last segment before the top-level domain (or second-level domain, if one is used) plus that domain itself. For example, the URL

http://cybermetrics.wlv.ac.uk/audit/10101_main.htm

has domain name

cybermetrics.wlv.ac.uk

and its site name is the end of the domain name, comprising the second-level domain ac.uk and the immediately preceding segment wlv:

wlv.ac.uk

Inlink site counts are then obtained by counting the number of unique site names extracted from the URLs of the inlink pages.
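The derivation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the results: the function names and the short list of second-level domains are hypothetical, and a real analysis would need a much fuller list.

```python
from urllib.parse import urlsplit

# Assumption: a small illustrative set of second-level domains.
SECOND_LEVEL_DOMAINS = {"ac.uk", "co.uk", "org.uk", "gov.uk", "edu.au", "com.au"}


def site_name(url):
    """Return the site name for a URL, e.g. wlv.ac.uk for cybermetrics.wlv.ac.uk."""
    domain = (urlsplit(url).hostname or "").lower()
    parts = domain.split(".")
    # Keep three segments when the last two form a known second-level
    # domain, otherwise keep two (one segment plus the top-level domain).
    keep = 3 if ".".join(parts[-2:]) in SECOND_LEVEL_DOMAINS else 2
    return ".".join(parts[-keep:])


def inlink_site_count(inlink_urls):
    """Count the unique site names among the URLs of the inlink pages."""
    return len({site_name(url) for url in inlink_urls})


# Hypothetical inlink pages: four pages, but only two distinct sites.
inlink_pages = [
    "http://cybermetrics.wlv.ac.uk/audit/10101_main.htm",
    "http://www.wlv.ac.uk/news.htm",
    "http://www.example.com/links.html",
    "http://blog.example.com/2008/01/post.html",
]
print(len(inlink_pages), inlink_site_count(inlink_pages))
```

Here the inlink page count is the length of the list, while the inlink site count collapses pages from the same site into a single entry.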

Which is a better measure, "inlink page counts" or "inlink site counts"?

The inlink site count is a more robust measure and should always be used when available. Nevertheless, inlink page counts give broadly similar results in many cases. The reason that inlink page counts can be unreliable is that a link in a standard navigation bar or blogroll is repeated on every page of a web site. Hence a single linking site can contribute many page inlinks, inflating the inlink page count, whereas it adds only one to the inlink site count.

What is the significance of the link counts?

Links are used for many reasons and therefore the precise significance of a link count depends strongly upon context: why the particular set of links was created. Nevertheless, links imply at least recognition of the link target. A page or site with many links to it also tends to have more useful content. In an academic context, a page or site with many links to it tends to indicate more and higher quality research by the owning organisation, although no reliable research quality inferences can be made from link data for individual research groups.

Do the results depend upon the search engine chosen?

Yes, different search engines give different results, and there is a good explanation. No search engine covers the whole web, so the results are always underestimates. They vary because of the size of each search engine's database and, to a lesser extent, its strategy for crawling the web and reporting results. The results are nevertheless suitable for comparing link counts between web sites.

Why wasn’t Google used?

Google is not recommended for link counts because it does not report all the links it knows about: it reports only about 10%, and this figure is highly variable. Also, Google does not allow the exclusion of links from the same site (at least in its public interfaces), which is necessary for genuine comparisons between sites. Both Yahoo! and Windows Live Search are fine for link calculations, but Windows Live Search has currently (possibly temporarily) blocked access to its public link search features, so it can't be used. Hence Yahoo! and its partners, such as AltaVista, seem to be the only choice at the moment.

Are the figures accurate?

The data are from the search engine Yahoo! and reflect the number of pages that Yahoo! has found and not filtered out, rather than the number of pages on the whole web. Hence the link count numbers are normally underestimates, but it is nevertheless reasonable to compare link counts to different web sites.

Why are links from the same site excluded from the results?

Links between pages of the same site are often created for navigational purposes and in any case are less “impressive” indicators of recognition than links from other web sites. Hence it is standard practice in webometrics to exclude these from link counts.

How were lists of links obtained when there were more than 1,000 results?

Search engines normally return a maximum of about 1,000 results for a search. For searches with more results, a technique known as “automated query splitting” was used to get additional results. Essentially, this works by submitting a large set of derivative sub-queries and then combining the results.
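The general idea of splitting a query and combining the results can be sketched as follows. This is a minimal illustration, not the procedure actually used: the `search` callable, the `link:target` query syntax, the choice of split terms and the toy search engine are all hypothetical stand-ins, and the sketch assumes that adding or excluding a word is an acceptable way to form derivative sub-queries.

```python
MAX_RESULTS = 1000  # approximate cap mentioned in the text; varies by engine


def split_query(search, query, split_terms, max_results=MAX_RESULTS):
    """Collect results for query, splitting it whenever the cap is reached.

    search(query) is a hypothetical callable returning a list of URLs,
    truncated to at most max_results entries.
    """
    results = set(search(query))
    if len(results) < max_results or not split_terms:
        return results
    term, rest = split_terms[0], split_terms[1:]
    # Two derivative sub-queries: pages containing the term and pages without it.
    for sub_query in (f"{query} {term}", f"{query} -{term}"):
        results |= split_query(search, sub_query, rest, max_results)
    return results


def make_toy_search(pages, cap):
    """Build a stand-in search engine over {url: set_of_words}, for testing only."""
    def search(query):
        tokens = query.split()[1:]  # the first token stands in for the link query
        hits = []
        for url in sorted(pages):
            words = pages[url]
            if all((t[1:] not in words) if t.startswith("-") else (t in words)
                   for t in tokens):
                hits.append(url)
        return hits[:cap]
    return search


# Six hypothetical linking pages, but the toy engine returns at most four.
pages = {f"http://example.org/p{i}": ({"news"} if i % 2 else {"blog"})
         for i in range(6)}
toy_search = make_toy_search(pages, cap=4)
found = split_query(toy_search, "link:target", ["news"], max_results=4)
print(len(toy_search("link:target")), len(found))
```

Collecting the results in a set removes pages returned by more than one sub-query, which is what makes combining the sub-query results safe.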