Search Engine Queries for Webometrics
This page summarises the main queries useful for webometric purposes in the major commercial search engines. Thanks to Kim Holmberg, David Stuart, Liwen Vaughan and Isidro Aguillo for suggestions on earlier versions of this page.
November 2017: Hyperlink searches (e.g., linkdomain, link, linkfromdomain) no longer work in any of the major search engines. No APIs seem to give automatic access to link searches. The best current (non-API) source of link data seems to be blekko.com (thanks to Rasmus Hagen for pointing this out). You have to register and log in to use the SEO tools link, which gives access to lists of links.
October 2018: Bing phrase searches now sometimes need + before them to ensure accurate matches. This solution was discovered by Professor Enara Zarrabeitia Bilbao.
Evaluating search engine queries: A very important part of a webometric study that is rarely reported in academic papers is the testing and evaluation of search engine queries. This involves the following stages:
1. Identification of query syntax and formulation of queries. For this, the URL of the search engine help page listing the syntax should be reported. In a few cases the syntax is not reported by search engines and so another source should be reported instead (e.g., a recent journal article or blog post).
2. Description of queries used. This includes both the queries themselves and the types of web page that the queries are intended to match (e.g., the query site:wlv.ac.uk matches all pages with domain names ending in wlv.ac.uk).
3. Testing the queries. This involves taking a sample of the queries (e.g., 10), running them in the search engine and checking that the results are correct in the sense described in stage 2. Checking a sample of 10 results per query should be enough. If many results are incorrect then the query syntax is incorrect and must be fixed. For example, if you think that the query linkdomain:wlv.ac.uk matches pages containing a link to the University of Wolverhampton website then the above process will show you that it does not.
4. Repeat stages 1-3 for every search engine that you use because they do not follow the same rules.
5. Write up the results of stages 1-4 for any student projects because this is an important part of your work. For a research paper, a few sentences summarising the results are enough.
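The testing stage above can be organised with a short script. The following is a minimal Python sketch, not part of the original procedure: it draws a random sample of queries to test and summarises the manual checks of their results. The function names (sample_queries, proportion_correct) and the example queries are hypothetical, and the judgement of whether each result is correct still has to be made by hand.

```python
import random


def sample_queries(queries, k=10, seed=0):
    """Return up to k queries chosen at random, for manual testing."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    queries = list(queries)
    return rng.sample(queries, min(k, len(queries)))


def proportion_correct(judgements):
    """Share of manually checked results judged correct (True) for one query."""
    judgements = list(judgements)
    if not judgements:
        raise ValueError("no results were checked")
    return sum(judgements) / len(judgements)


# Hypothetical queries for a study of university web sites.
queries = [f"site:{d}" for d in ("wlv.ac.uk", "ox.ac.uk", "nih.gov")]
to_test = sample_queries(queries, k=2)

# Hypothetical manual judgements for 10 results of one query (True = correct).
checked = [True] * 9 + [False]
print(to_test, proportion_correct(checked))
```

A low proportion for any sampled query suggests that the syntax needs to be revisited before the full set of queries is run.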
History: Yahoo! Site Explorer has shut down. Yahoo! search is now powered by Bing and seems to be fully integrated into it. Bing has stopped most of its link searches and Google has shut down its university API. The information below relates only to Google and Bing, since Yahoo! and AltaVista now give Bing's results.
The queries below give "URL citation" alternatives when link searches are not available. These are described in the papers below, amongst others. "Title mention" queries are also possible but not described here (see the first paper below). See also the discussion of link analysis with Webometric Analyst.
- Thelwall, M., Sud, P., & Wilkinson, D. (2012). Link and co-inlink network diagrams with URL citations or title mentions. Journal of the American Society for Information Science and Technology, 63(4), 805-816.
- Thelwall, M. (2011). A comparison of link and URL citation counting. ASLIB Proceedings, 63(4), 419-425.
- Kousha, K. & Thelwall, M. (2007). Google Scholar citations and Google Web/URL citations: A multi-discipline exploratory analysis. Journal of the American Society for Information Science and Technology, 58(7), 1055-1065.
- Stuart, D. & Thelwall, M. (2006). Investigating triple helix relationships using URL citations: A case study of the UK West Midlands automobile industry. Research Evaluation, 15(2), 97-106.
- Kousha, K. & Thelwall, M. (2006). Motivations for URL citations to open access LIS library and information science articles: Exploring characteristics of sources of Web citation. Scientometrics, 68(3), 501-517.
Webometric Search Engine Queries
Number of pages in a Web site that has its own domain name D, or directory/path d
- Bing, Google: site:D
- The site: command does not just work for domain names. It also works on directories or paths, and even on full URLs (although there is little point for full URLs).
Number of pages containing a link to a web site with domain name D excluding all pages in the site D
- Google, Bing: Not possible.
- Alternatives
- An alternative is the URL citation query "http://D" -site:D. E.g., "http://www.wlv.ac.uk" -site:wlv.ac.uk matches pages outside the wlv.ac.uk web site that contain the text http://www.wlv.ac.uk. This URL citation may or may not be a hyperlink.
Number of pages containing a link to a web site with domain name D (including all pages in the site D)
- Not possible with any search engine. Try blekko.com SEO tools.
- An alternative is the URL citation query "http://D". E.g., "http://www.wlv.ac.uk" matches pages, inside or outside the wlv.ac.uk web site, that contain the text http://www.wlv.ac.uk. This URL citation may or may not be a hyperlink.
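When many sites are studied, the URL citation queries above can be generated rather than typed. This is a minimal Python sketch under stated assumptions: the function name url_citation_query and its parameters are hypothetical, the quoted host and the excluded site are passed separately (as in the wlv.ac.uk examples), and the output is only a query string to paste into a search engine, not a call to any search API.

```python
def url_citation_query(url_host, exclude_site=None):
    """Build a URL citation query for a host such as 'www.wlv.ac.uk'.

    If exclude_site is given (e.g. 'wlv.ac.uk'), a -site: term is appended
    so that pages from that site are excluded from the matches.
    """
    query = f'"http://{url_host}"'
    if exclude_site:
        query += f" -site:{exclude_site}"
    return query


# Excluding the site's own pages, then including them.
print(url_citation_query("www.wlv.ac.uk", exclude_site="wlv.ac.uk"))
print(url_citation_query("www.wlv.ac.uk"))
```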
Number of pages containing a link to a page http://P
Number of pages containing a link to a page http://P excluding all pages in the site D containing P (i.e., site inlinks)
- Google, Bing: Not possible
- Google alternative: The URL citation query "http://P" -site:D. E.g., "http://www.ncbi.nlm.nih.gov/pubmed/" -site:nih.gov matches pages outside the nih.gov web site that contain the text http://www.ncbi.nlm.nih.gov/pubmed/. This URL citation may or may not be a hyperlink.
- Bing alternative: The URL citation query "http://P" on its own. E.g., "http://www.ncbi.nlm.nih.gov/pubmed/" matches pages that contain the text http://www.ncbi.nlm.nih.gov/pubmed/, including pages within nih.gov. This URL citation may or may not be a hyperlink. The -site: command no longer works in Bing (as identified by Carlos Vílchez-Román in May 2014), so results from the same site D must be filtered out manually.
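The manual filtering needed for Bing results can be partly automated once the result URLs have been copied into a list. The sketch below is one possible approach, not an established tool: the function names in_site and drop_site_results and the sample URLs are hypothetical, and it assumes full URLs that include the http:// or https:// scheme.

```python
from urllib.parse import urlsplit


def in_site(url, site):
    """True if the URL's host is the site itself or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)


def drop_site_results(urls, site):
    """Keep only the result URLs that lie outside the given site."""
    return [u for u in urls if not in_site(u, site)]


# Hypothetical result URLs copied from a results page.
results = [
    "http://www.ncbi.nlm.nih.gov/pubmed/",
    "https://example.org/links.html",
    "http://notnih.gov.example.com/page",
]
print(drop_site_results(results, "nih.gov"))
```

Matching on the host name rather than on the whole URL string avoids discarding pages that merely mention nih.gov somewhere in their path or in an unrelated domain name.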
Number of pages linked to from a web site with domain name D, excluding all pages in the site D (i.e., site outlinks)
- Google, Bing: Not possible any more
- Equivalent URL citation-based queries: not possible.
Number of pages containing a link to any pages in both of two specified web sites with domain names D1 and D2, and excluding all pages in the sites D1 and D2 (i.e., co-inlinks)
- Google, Bing: Not possible.
- Google URL citation alternative: The URL citation query "http://D1" "http://D2" -site:D1 -site:D2. E.g., "http://www.wlv.ac.uk" "http://www.ox.ac.uk" -site:wlv.ac.uk -site:ox.ac.uk matches pages outside the wlv.ac.uk and ox.ac.uk web sites that contain both http://www.wlv.ac.uk and http://www.ox.ac.uk. URL citations may or may not be hyperlinks. Note that D1 and D2 in -site:D1 -site:D2 could be replaced by the main site of the domain name (e.g., bbc.co.uk for news.bbc.co.uk or www.bbc.co.uk) to avoid extra self-links.
- Bing URL citation alternative: The URL citation query "http://D1" "http://D2" on its own. E.g., "http://www.wlv.ac.uk" "http://www.ox.ac.uk" matches pages that contain both http://www.wlv.ac.uk and http://www.ox.ac.uk, including pages within the wlv.ac.uk and ox.ac.uk web sites. URL citations may or may not be hyperlinks. The -site: command no longer works in Bing (as identified by Carlos Vílchez-Román in May 2014), so pages from D1 and D2 must be filtered out manually.
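For a co-inlink study of several sites, one query is needed per pair of sites. The following Python sketch generates the Google-style queries for every pair; the function name co_inlink_query and the list of sites are hypothetical, and each entry pairs the host quoted in the URL citation with the main site used in -site:, following the note above about bbc.co.uk. For Bing, the -site: terms would be omitted and the results filtered by hand.

```python
from itertools import combinations


def co_inlink_query(host1, site1, host2, site2):
    """Google-style URL citation co-inlink query for two web sites."""
    return f'"http://{host1}" "http://{host2}" -site:{site1} -site:{site2}'


# Hypothetical input: (host cited in URLs, main site to exclude).
sites = [
    ("www.wlv.ac.uk", "wlv.ac.uk"),
    ("www.ox.ac.uk", "ox.ac.uk"),
    ("news.bbc.co.uk", "bbc.co.uk"),
]

# One query for each unordered pair of sites.
pair_queries = [
    co_inlink_query(h1, s1, h2, s2)
    for (h1, s1), (h2, s2) in combinations(sites, 2)
]
for q in pair_queries:
    print(q)
```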
Number of pages linked to by both of two specified web sites with domain names D1 and D2, and excluding all pages in the sites D1 and D2 (i.e., co-outlinks)
- Google, Bing: Not possible
- Equivalent URL citation-based queries: not possible.
Tips
For a web site with a domain name starting with www., remove the initial www. from D before running any of the searches above. This is important for big web sites with many domain names.
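The www. tip is easy to apply consistently with a small helper. This is a minimal Python sketch; the function names strip_www and site_query are hypothetical, and the helper assumes that D is a bare domain name with no http:// prefix or path.

```python
def strip_www(domain):
    """Remove one leading 'www.' from a domain name, if present."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def site_query(domain):
    """Build a site: query from a domain name without its initial www."""
    return f"site:{strip_www(domain)}"


print(site_query("www.wlv.ac.uk"))   # initial www. removed
print(site_query("news.bbc.co.uk"))  # no initial www., so left unchanged
```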
In all the above cases, where URL citation queries are described, it may be possible to substitute them with title mention queries, which may work better for organisations with distinctive names.