The data collected are “site inlinks” to the home pages of the web sites. A site inlink is a page on a different web site that contains a hyperlink to the home page of the studied site.
A site inlink count is the number of web sites (defined by domain names) containing one or more links to the assessed web site. Site inlink counts are derived from the page inlinks by extracting the domain names from all of the inlink URLs and then converting each domain name to a site name by chopping off its initial part, leaving just the last segment before the top-level domain (or second-level domain, if one is used) together with that domain. For example, the URL
has domain name
and its site name is the end of the domain name, comprising the second-level domain ac.uk and the immediately preceding segment wlv: wlv.ac.uk.
Site inlink counts are then obtained by counting the number of unique site names extracted from the URLs of the inlink pages.
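The conversion from inlink URLs to a site inlink count can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names, the short list of second-level domains, and the example URLs are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Assumed, deliberately incomplete list of second-level domains; a real
# analysis would need a much fuller list.
SECOND_LEVEL_DOMAINS = {"ac.uk", "co.uk", "org.uk", "edu.au", "com.au"}


def site_name(url):
    """Reduce a URL to its site name by chopping off the start of the domain name."""
    host = urlparse(url).hostname or ""
    parts = host.lower().split(".")
    # Keep three segments when the last two form a known second-level domain,
    # otherwise keep the last segment before the top-level domain plus the TLD.
    keep = 3 if ".".join(parts[-2:]) in SECOND_LEVEL_DOMAINS else 2
    return ".".join(parts[-keep:])


def site_inlink_count(inlink_urls):
    """Count the unique site names among the URLs of the inlink pages."""
    return len({site_name(url) for url in inlink_urls})


# Hypothetical inlink page URLs, for illustration only.
inlinks = [
    "http://www.dept.wlv.ac.uk/a.html",
    "http://www.wlv.ac.uk/b.html",
    "http://blog.example.com/post1",
    "http://www.example.com/links.html",
    "http://example.org/",
]

print(site_name(inlinks[0]))       # wlv.ac.uk
print(site_inlink_count(inlinks))  # 3
```

The five pages come from only three sites, so the site inlink count is 3 even though the page inlink count is 5.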
The site inlink count is the more robust measure and should always be used when available. Nevertheless, page inlink counts give broadly similar results in many cases. Page inlink counts can be unreliable because a link in a standard navigation bar or blogroll is repeated on every page of a web site. Hence a single linking site can contribute a large number of page inlinks while counting only once as a site inlink.
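A small worked example shows how a blogroll can inflate a page inlink count. The data are invented for illustration: one hypothetical blog links from each of its 500 pages, and two other hypothetical sites link from one page each.

```python
from collections import Counter

# Hypothetical data: the site name of each page that links to the target.
# blog.example has the target in its blogroll, so every one of its pages links.
inlink_page_sites = ["blog.example"] * 500 + ["uni-a.example", "uni-b.example"]

pages_per_site = Counter(inlink_page_sites)
page_inlink_count = sum(pages_per_site.values())  # every linking page counts
site_inlink_count = len(pages_per_site)           # every linking site counts once

print(page_inlink_count)  # 502
print(site_inlink_count)  # 3
```

The page count suggests hundreds of independent endorsements, whereas the site count shows that only three sites link to the target.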
Links are created for many reasons, so the precise significance of a link count depends strongly upon context: why the particular set of links was created. Nevertheless, links imply at least recognition of the link target. A page or site with many links to it also tends to have more useful content. In an academic context, a page or site with many links to it tends to indicate more and higher quality research by the owning organisation, although no reliable research quality inferences can be made from link data for individual research groups.
Yes, the different search engines give different results, and there is a good explanation. No search engine covers the whole web, so the results are always underestimates, and they vary with the size of each search engine's database and, to a lesser extent, its strategy for crawling the web and reporting results. The results are nevertheless suitable for comparing link counts between web sites.
Google is not recommended for link counts because it reports only a fraction of the links it knows about (about 10%, and this figure is highly variable). Google also does not allow links from the same site to be excluded (at least in its public interfaces), which is necessary for genuine comparisons between sites. Both Yahoo! and Windows Live Search are fine for link calculations, but Windows Live Search has currently (possibly temporarily) blocked access to its public link search features, so it cannot be used. Hence Yahoo! and its partners, such as AltaVista, seem to be the only choice at the moment.
The data are from the search engine Yahoo! and reflect only the pages that Yahoo! has found and not filtered out, rather than the whole web. Hence the link counts are normally underestimates, but it is nevertheless reasonable to compare the link counts of different web sites.
Links between pages of the same site are often created for navigational purposes and in any case are less “impressive” indicators of recognition than links from other web sites. Hence it is standard practice in webometrics to exclude these from link counts.
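Excluding same-site links amounts to a simple filter once each inlink page has been reduced to its site name. The sketch below assumes the site names have already been extracted; the function name and the example pages are hypothetical.

```python
def exclude_same_site(target_site, inlinks):
    """Drop inlink pages that belong to the target's own site."""
    return [(url, site) for url, site in inlinks if site != target_site]


# Hypothetical (page URL, site name) pairs for pages linking to wlv.ac.uk.
inlinks = [
    ("http://www.wlv.ac.uk/about.html", "wlv.ac.uk"),       # internal link
    ("http://www.dept.wlv.ac.uk/staff.html", "wlv.ac.uk"),  # internal, subdomain
    ("http://www.example.org/links.html", "example.org"),
    ("http://blog.example.com/post", "example.com"),
]

external = exclude_same_site("wlv.ac.uk", inlinks)
print(len(external))  # 2
```

Comparing site names rather than full domain names means that links from subdomains of the same site are also treated as internal.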
Search engines normally return a maximum of about 1,000 results for a search. For searches with more results, a technique known as “automated query splitting” was used to get additional results. Essentially, this works by submitting a large set of derivative sub-queries and then combining the results.
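The general idea of query splitting can be sketched as below. This is an assumption-laden illustration rather than the procedure actually used: the search engine is replaced by a stand-in over an in-memory corpus, the cap is set to 10 instead of about 1,000 so that the example stays small, and the splitting rule (one sub-query requiring a word, one excluding it) is just one plausible way to generate derivative sub-queries.

```python
def make_search(corpus, cap):
    """Build a stand-in search function over an in-memory corpus.

    corpus maps a page URL to the set of words on that page. A query is a
    pair of word sets: words that must appear and words that must not.
    """
    def search(required, excluded):
        hits = sorted(url for url, words in corpus.items()
                      if required <= words and not (excluded & words))
        # Like a real engine, return at most `cap` results plus the total count.
        return hits[:cap], len(hits)
    return search


def split_query(search, split_words, required=frozenset(), excluded=frozenset()):
    """Collect all matches by recursively splitting queries that hit the cap."""
    hits, total = search(required, excluded)
    if total <= len(hits) or not split_words:
        return set(hits)  # complete, or no words left to split on
    word, rest = split_words[0], split_words[1:]
    # Two derivative sub-queries partition the matches: with and without `word`.
    with_word = split_query(search, rest, required | {word}, excluded)
    without_word = split_query(search, rest, required, excluded | {word})
    return set(hits) | with_word | without_word


# Hypothetical corpus of 30 pages; half contain "alpha", a third contain "beta".
corpus = {}
for i in range(30):
    words = set()
    if i % 2 == 0:
        words.add("alpha")
    if i % 3 == 0:
        words.add("beta")
    corpus[f"http://site{i}.example/"] = words

search = make_search(corpus, cap=10)
first_page, total = search(frozenset(), frozenset())
all_hits = split_query(search, ["alpha", "beta"])

print(len(first_page), total)  # 10 30
print(len(all_hits))           # 30
```

A single query returns only 10 of the 30 matching pages, while the combined sub-queries recover all 30. Real search engines report estimated rather than exact totals, so a practical implementation would need a less strict completeness test.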