Search Engine Queries for Webometrics

This page summarises the main queries useful for webometric purposes in the major commercial search engines. Thanks to Kim Holmberg, David Stuart, Liwen Vaughan and Isidro Aguillo for suggestions on earlier versions of this page.

February 2012: Hyperlink searches (e.g., linkdomain:, link:) no longer work in any of the major search engines except for linkfromdomain: in Bing and link: in Google (see below). Bing now has link search facilities similar to Yahoo!'s former Site Explorer (thanks to Han Woo Park for pointing this out). No APIs seem to give automatic access to link searches, except for Bing's linkfromdomain: operator. The best current (non-API) source of link data seems to be blekko.com (thanks to Rasmus Hagen for pointing this out). You must register and log in to use its SEO tools, which give access to lists of links.

Evaluating search engine queries: A very important part of a webometric study that is rarely reported in academic papers is the testing and evaluation of search engine queries. This involves the following stages:

  1. Identification of query syntax and formulation of queries. For this, report the URL of the search engine help page that lists the syntax. In a few cases the search engine does not document the syntax, so another source should be reported instead (e.g., a recent journal article or blog post).
  2. Description of queries used. This includes both the queries themselves and the types of web page that the queries are intended to match (e.g., the query site:wlv.ac.uk matches all pages with domain names ending in wlv.ac.uk).
  3. Testing the queries. Take a sample of the queries (e.g., 10), run them in the search engine and check that the results are correct in the sense described in (2). Checking a sample of 10 results per query should be enough. If many results are incorrect then the query syntax is wrong and must be fixed. For example, if you think that the query linkdomain:wlv.ac.uk matches pages containing a link to the University of Wolverhampton website, then this process will show you that it does not.
  4. Repeat 1-3 above for every search engine that you use because they do not follow the same rules.
  5. Write up the results of 1-4 above for any student projects because this is an important part of your work. For a research paper, a few sentences summarising the results are enough.
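Step 3 can be partly automated once the results have been copied from the search engine. The following Python sketch is illustrative only: the helper names, the fixed random seed and the result URLs are hypothetical, and it assumes that the results for each query have been collected by hand. It draws a reproducible sample of queries to check and, for a site: query, measures the proportion of returned URLs whose host is not the intended domain or one of its subdomains.

```python
import random
from typing import Callable
from urllib.parse import urlparse


def host_matches_domain(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` itself or ends in '.' + `domain`."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def sample_queries(queries: list[str], k: int = 10, seed: int = 0) -> list[str]:
    """Pick up to k queries to check by hand (fixed seed, so the sample can be reported)."""
    rng = random.Random(seed)
    return rng.sample(queries, min(k, len(queries)))


def error_rate(result_urls: list[str], is_correct: Callable[[str], bool]) -> float:
    """Fraction of checked result URLs that do NOT match the intended page type."""
    if not result_urls:
        return 0.0
    wrong = sum(1 for url in result_urls if not is_correct(url))
    return wrong / len(result_urls)


# Hypothetical results copied by hand from a site:wlv.ac.uk search.
results = [
    "http://www.wlv.ac.uk/",
    "http://www.scit.wlv.ac.uk/~someone/",
    "http://www.example.com/wlv.ac.uk.html",
]
rate = error_rate(results, lambda u: host_matches_domain(u, "wlv.ac.uk"))
print(f"{rate:.0%} of checked results are incorrect")  # 33% of checked results are incorrect
```

A high error rate points back to step 1: the query syntax, not the web site, is the first thing to revisit.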

History: Yahoo! Site Explorer has shut down. Yahoo! search is now powered by Bing and seems to be fully integrated with it. Bing has stopped most of its link searches and Google has shut down its university API. The information below therefore relates only to Google and Bing, since Yahoo! and AltaVista now give Bing's results.

The queries below give "URL citation" alternatives for when link searches are not available. These are described in the papers below, amongst others. "Title mention" queries are also possible but are not described here (see the first paper below). See also the discussion of link analysis with Webometric Analyst.

Webometric Search Engine Queries

Number of pages in a web site that has its own domain name D, or in a directory/path d
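The usual form of this query is site:D. The Python sketch below builds it and, as a hypothetical extension, appends a directory d after the domain; whether a given search engine accepts a path after site: is an assumption that should be checked as in step 3 before relying on it. The function name and the example path are illustrative only.

```python
def site_query(domain: str, path: str = "") -> str:
    """Build a site: query for domain D, optionally restricted to a directory/path d.

    ASSUMPTION: the search engine accepts site:D/d; test this before use.
    """
    domain = domain.strip().rstrip("/")
    path = path.strip().strip("/")
    return f"site:{domain}/{path}" if path else f"site:{domain}"


print(site_query("wlv.ac.uk"))              # site:wlv.ac.uk
print(site_query("wlv.ac.uk", "research"))  # site:wlv.ac.uk/research (hypothetical path)
```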

Number of pages containing a link to a web site with domain name D excluding all pages in the site D

Number of pages containing a link to a web site with domain name D (including all pages in the site D)
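Without link searches, these two counts can be approximated with URL citations: pages whose text mentions the domain name D. A minimal sketch, assuming the common form of a quoted domain name, with -site:D added to drop the site's own pages; the function name is hypothetical and the syntax should be tested in each search engine as described above.

```python
def url_citation_query(domain: str, exclude_self: bool = True) -> str:
    """URL citation query: pages mentioning the domain name D as a phrase.

    With exclude_self=True, the site's own pages are removed with -site:D.
    """
    query = f'"{domain}"'
    if exclude_self:
        query += f" -site:{domain}"
    return query


print(url_citation_query("wlv.ac.uk"))                      # "wlv.ac.uk" -site:wlv.ac.uk
print(url_citation_query("wlv.ac.uk", exclude_self=False))  # "wlv.ac.uk"
```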

Number of pages containing a link to a page http://P

Number of pages containing a link to a page http://P excluding all pages in the site D containing P (i.e., site inlinks)
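For a single page, the URL citation is the page's URL searched as a phrase. The sketch below assumes that the scheme (http://) and any trailing slash are dropped from P before quoting it, and that -site:D removes the pages of the site containing P; both are assumptions to test, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def page_citation_query(page_url: str, site_domain: str = "") -> str:
    """URL citation query for a single page http://P, optionally excluding site D."""
    parsed = urlparse(page_url)
    # ASSUMPTION: search for host + path without the scheme or a trailing slash.
    citation = parsed.netloc + parsed.path.rstrip("/")
    if parsed.query:
        citation += "?" + parsed.query
    query = f'"{citation}"'
    if site_domain:
        query += f" -site:{site_domain}"  # drop pages in the site containing P
    return query


print(page_citation_query("http://www.wlv.ac.uk/research/"))
# "www.wlv.ac.uk/research"
print(page_citation_query("http://www.wlv.ac.uk/research/", "wlv.ac.uk"))
# "www.wlv.ac.uk/research" -site:wlv.ac.uk
```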

Number of pages linked to by a web site with domain name D, excluding all pages in the site D (i.e., site outlinks)
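Site outlinks can still be counted in Bing with the linkfromdomain: operator. The sketch below assumes the combination linkfromdomain:D -site:D and also builds a Bing search URL so that the query can be checked by hand in a browser; the function names are hypothetical and the -site: exclusion should be verified as in step 3.

```python
from urllib.parse import urlencode


def outlink_query(domain: str) -> str:
    """Bing query for pages linked to by site D, excluding D's own pages."""
    # ASSUMPTION: Bing combines linkfromdomain: with a -site: exclusion.
    return f"linkfromdomain:{domain} -site:{domain}"


def bing_search_url(query: str) -> str:
    """URL for running a query by hand in the Bing web interface."""
    return "https://www.bing.com/search?" + urlencode({"q": query})


q = outlink_query("wlv.ac.uk")
print(q)  # linkfromdomain:wlv.ac.uk -site:wlv.ac.uk
print(bing_search_url(q))
```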

Number of pages containing links to both of two specified web sites with domain names D1 and D2, excluding all pages in the sites D1 and D2 (i.e., co-inlinks)

Number of pages linked to by both of two specified web sites with domain names D1 and D2, excluding all pages in the sites D1 and D2 (i.e., co-outlinks)
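Both co-link counts can be sketched in the same style. The co-inlink version below is a URL citation approximation (pages mentioning both domain names, outside both sites). The co-outlink version is only a guess: it assumes that Bing accepts two linkfromdomain: terms in one query and treats them as AND, which is an untested assumption to check before use. All names, including the second domain example.ac.uk, are hypothetical.

```python
def co_inlink_query(d1: str, d2: str) -> str:
    """URL citation co-inlinks: pages mentioning both D1 and D2, outside both sites."""
    return f'"{d1}" "{d2}" -site:{d1} -site:{d2}'


def co_outlink_query(d1: str, d2: str) -> str:
    """Co-outlinks in Bing: pages linked to by both D1 and D2, outside both sites.

    UNTESTED ASSUMPTION: Bing accepts two linkfromdomain: terms and ANDs them.
    """
    return f"linkfromdomain:{d1} linkfromdomain:{d2} -site:{d1} -site:{d2}"


print(co_inlink_query("wlv.ac.uk", "example.ac.uk"))
# "wlv.ac.uk" "example.ac.uk" -site:wlv.ac.uk -site:example.ac.uk
print(co_outlink_query("wlv.ac.uk", "example.ac.uk"))
# linkfromdomain:wlv.ac.uk linkfromdomain:example.ac.uk -site:wlv.ac.uk -site:example.ac.uk
```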

Tips

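For a web site whose domain name starts with www., remove the initial www. from D before running any of the searches above, so that the searches also cover the site's other subdomains (e.g., use wlv.ac.uk rather than www.wlv.ac.uk). This is important for big web sites with many domain names.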
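A minimal Python sketch of this tip (the function name is hypothetical): it trims and lower-cases the domain name and removes a leading www. only, leaving any other subdomain unchanged.

```python
def strip_www(domain: str) -> str:
    """Remove a leading 'www.' so that site: queries also cover other subdomains."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


print(strip_www("www.wlv.ac.uk"))   # wlv.ac.uk
print(strip_www("scit.wlv.ac.uk"))  # scit.wlv.ac.uk
```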

In all the above cases where URL citation queries are described, it may be possible to replace them with title mention queries, which may work better for organisations with distinctive names.
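A title mention query can be sketched in the same way. This assumes that the organisation's name is searched as a quoted phrase, with -site:D excluding its own web site; this page does not give the exact syntax, so the function name and query form are hypothetical and should be tested as described above.

```python
def title_mention_query(org_name: str, domain: str = "") -> str:
    """Title mention query: pages containing the organisation's name as a phrase,
    optionally excluding the organisation's own site D."""
    query = f'"{org_name.strip()}"'
    if domain:
        query += f" -site:{domain}"
    return query


print(title_mention_query("University of Wolverhampton", "wlv.ac.uk"))
# "University of Wolverhampton" -site:wlv.ac.uk
```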