Search Engine Queries for Webometrics
This page summarises the main queries useful for webometric purposes in the major commercial search engines. Thanks to Kim Holmberg, David Stuart, Liwen Vaughan and Isidro Aguillo for suggestions on earlier versions of this page.
February 2012: Hyperlink searches no longer work in any of the major search engines except for linkfromdomain: in Bing (see below) and link: in Google (see below). Bing now has link search facilities similar to Yahoo!'s former Site Explorer (thanks to Han Woo Park for pointing this out). No APIs seem to give automatic access to link searches, with the exception of Bing's linkfromdomain: command. The best current (non-API) source of link data seems to be blekko.com (thanks to Rasmus Hagen for pointing this out). You have to register and log in to use the SEO tools link and access lists of links.
History: Yahoo! Site Explorer has shut down. Yahoo! search is now powered by Bing and seems to be fully integrated into it. Bing has stopped most of its link searches and Google has shut down its university API. The information below relates only to Google and Bing, since Yahoo! and AltaVista now return Bing's results.
The queries below give "URL citation" alternatives for cases where link searches are not available. These are described in the papers below, amongst others. "Title mention" queries are also possible but are not described here (see the first paper below). See also the discussion of link analysis with Webometric Analyst.
- Thelwall, M., Sud, P., & Wilkinson, D. (in press). Link and co-inlink network diagrams with URL citations or title mentions. Journal of the American Society for Information Science and Technology.
- Thelwall, M. (2011). A comparison of link and URL citation counting. ASLIB Proceedings, 63(4), 419-425.
- Kousha, K. & Thelwall, M. (2007). Google Scholar citations and Google Web/URL citations: A multi-discipline exploratory analysis, Journal of the American Society for Information Science and Technology, 57(6), 1055-1065.
- Stuart, D. & Thelwall, M. (2006). Investigating triple helix relationships using URL citations: A case study of the UK West Midlands automobile industry. Research Evaluation, 15(2), 97-106.
- Kousha, K. & Thelwall, M. (2006). Motivations for URL citations to open access library and information science articles. Scientometrics, 68(3), 501-517.
Webometric Search Engine Queries
Number of pages in a Web site that has its own domain name D, or directory/path d
- Bing, Google: site:D
- The site: command does not only work for domain names: it also works on directories or paths, and even on full URLs (although there is little point in using it for full URLs).
Number of pages containing a link to a web site with domain name D excluding all pages in the site D
- Google, Bing: Not possible.
- Alternatives
- An alternative is the URL citation query "http://D" -site:D. E.g., "http://www.wlv.ac.uk" -site:wlv.ac.uk matches pages outside the wlv.ac.uk web site that contain the text http://www.wlv.ac.uk. This URL citation may or may not be a hyperlink.
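The URL citation pattern above can be generated mechanically when many sites need to be queried. The following is a minimal sketch, not part of the original page: the function name url_citation_query and its parameters are hypothetical, and it only builds the query text, which would still need to be submitted to a search engine by hand or through whatever interface is available.

```python
def url_citation_query(target, exclude_site=None):
    """Build a URL citation query for a domain or page address.

    target       -- domain or page address without the http:// prefix
    exclude_site -- optional site whose own pages should be excluded
    (Both names are illustrative assumptions, not an established API.)
    """
    # The quotes force a phrase match on the full URL text.
    query = f'"http://{target}"'
    if exclude_site:
        # -site: removes pages hosted on the named site from the results.
        query += f" -site:{exclude_site}"
    return query


# Site-level example from the text above.
print(url_citation_query("www.wlv.ac.uk", exclude_site="wlv.ac.uk"))
# Page-level example, excluding the hosting site.
print(url_citation_query("www.ncbi.nlm.nih.gov/pubmed/", exclude_site="nih.gov"))
```

Leaving out exclude_site gives the variant that also counts pages within the target site itself.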
Number of pages containing a link to a web site with domain name D (including all pages in the site D)
- Not possible with any search engine. Try blekko.com SEO tools.
- An alternative is the URL citation query "http://D". E.g., "http://www.wlv.ac.uk" matches pages, inside or outside the wlv.ac.uk web site, that contain the text http://www.wlv.ac.uk. This URL citation may or may not be a hyperlink.
Number of pages containing a link to a page http://P
- Bing: Not possible. Google: link:P. Note that Google only gives a small sample of the links it has found.
- An alternative is the URL citation query "http://P". E.g., "http://www.ncbi.nlm.nih.gov/pubmed/" matches pages, on any web site, that contain the text http://www.ncbi.nlm.nih.gov/pubmed/. This URL citation may or may not be a hyperlink.
Number of pages containing a link to a page http://P excluding all pages in the site D containing P (i.e., site inlinks)
- Google, Bing: Not possible
- An alternative is the URL citation query "http://P" -site:D. E.g., "http://www.ncbi.nlm.nih.gov/pubmed/" -site:nih.gov matches pages outside the nih.gov web site that contain the text http://www.ncbi.nlm.nih.gov/pubmed/. This URL citation may or may not be a hyperlink.
Number of pages linked to from a web site with domain name D, excluding all pages in the site D (i.e., site outlinks)
- Bing: linkfromdomain:D or linkfromdomain:D -site:D. Note, however, that this command seems to have strange and unpredictable behaviour (as pointed out by Liwen Vaughan in Jan 2009), so please test it extensively before using it. It seems to automatically exclude links between different pages in the same domain but not between pages in different subdomains of the same site. A sample command, linkfromdomain:wlv.ac.uk site:wlv.ac.uk, should give thousands of results but only gave 1 (Jan 15, 2009). I am unable to think of any plausible explanation for this, unless it is a deliberate Microsoft attempt to stop the linkfromdomain: command from working with other commands like site:. David Minguillo pointed out that the command does not seem to work for subdomains: e.g., linkfromdomain:cybermetrics.wlv.ac.uk gives no results but linkfromdomain:wlv.ac.uk seemed correct (June 15, 2010).
- Google: Not possible
- Equivalent URL citation-based queries: not possible.
Number of pages containing a link to any pages in both of two specified web sites with domain names D1 and D2, and excluding all pages in the sites D1 and D2 (i.e., co-inlinks)
- Google, Bing: Not possible. Note that, in the alternative query below, D1 and D2 in -site:D1 -site:D2 could be replaced by the main site of the domain name (e.g., bbc.co.uk for news.bbc.co.uk or www.bbc.co.uk) to avoid extra self-links.
- An alternative is the URL citation-based query "http://D1" "http://D2" -site:D1 -site:D2. E.g., "http://www.wlv.ac.uk" "http://www.ox.ac.uk" -site:wlv.ac.uk -site:ox.ac.uk matches pages outside the wlv.ac.uk and ox.ac.uk web sites that contain both http://www.wlv.ac.uk and http://www.ox.ac.uk. URL citations may or may not be hyperlinks.
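For a co-inlink network diagram, one such query is needed for every pair of sites. The sketch below shows one way this might be generated; the function name co_inlink_query and the sites mapping are hypothetical, and the three domains are simply those mentioned as examples on this page. Each domain is mapped to the main site used for the -site: exclusion, following the note above about avoiding extra self-links.

```python
from itertools import combinations


def co_inlink_query(domain1, site1, domain2, site2):
    """Build a URL citation co-inlink query for two web sites.

    domain1, domain2 -- domain names to look for as URL citations
    site1, site2     -- main sites to exclude (may be shorter than the domains)
    """
    return (f'"http://{domain1}" "http://{domain2}" '
            f"-site:{site1} -site:{site2}")


# Illustrative mapping: domain name -> main site to exclude.
sites = {
    "www.wlv.ac.uk": "wlv.ac.uk",
    "www.ox.ac.uk": "ox.ac.uk",
    "news.bbc.co.uk": "bbc.co.uk",
}

# One query per unordered pair of sites.
queries = [
    co_inlink_query(d1, s1, d2, s2)
    for (d1, s1), (d2, s2) in combinations(sites.items(), 2)
]

for q in queries:
    print(q)
```

With n sites this produces n(n-1)/2 queries, so the number of searches grows quickly for larger site lists.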
Number of pages linked to by both of two specified web sites with domain names D1 and D2, and excluding all pages in the sites D1 and D2 (i.e., co-outlinks)
- Bing: linkfromdomain:D1 linkfromdomain:D2. David Minguillo pointed out that the linkfromdomain: command does not seem to work for subdomains: e.g., linkfromdomain:cybermetrics.wlv.ac.uk gives no results but linkfromdomain:wlv.ac.uk seemed correct (June 15, 2010).
- Google: Not possible
- Equivalent URL citation-based queries: not possible.
Tips
For a web site with a domain name starting with www., remove the initial www. from D before running any of the searches above. This is important for big web sites with many domain names, because pages hosted on other subdomains would otherwise be missed.
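When working from a list of domain names, this tip can be applied automatically. The following is a minimal sketch under the assumption that only a leading www. should be removed and that any other subdomain is left alone; the helper name strip_www is hypothetical.

```python
def strip_www(domain):
    """Remove a leading 'www.' from a domain name, if present."""
    prefix = "www."
    if domain.lower().startswith(prefix):
        return domain[len(prefix):]
    # Other subdomains (e.g. cybermetrics.wlv.ac.uk) are left unchanged.
    return domain


for d in ["www.wlv.ac.uk", "cybermetrics.wlv.ac.uk", "WWW.ox.ac.uk"]:
    # Build the page-count query for the cleaned domain.
    print(f"site:{strip_www(d)}")
```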
In all the above cases where URL citation queries are described, it may be possible to replace them with title mention queries, which may work better for organisations with distinctive names.