Use Site Checker to See a Web Server the Way a Search Engine Spider Does

You can find out how a competitor’s Web server looks to a search engine spider by running the Site Checker utility. This lets you look at what happens at the server level when a page is requested, before the page is ever displayed.

Generally, an SEO-friendly site should be free of server problems such as improper redirects and other obstacles that can stop a search spider in its tracks. (A redirect is a command that detours you from one page to another; an improper redirect is one that the search engine either can’t follow or is confused by.) When you run the Site Checker utility, it attempts to crawl the site the same way a search engine spider does and then produces a report. If you use the free Site Checker that is part of the SEOToolSet on http://www.seotoolset.com, the report lists any indexing obstacles it encounters, such as improper redirects, robot disallows, cloaking, virtual IPs, block lists, and more. Even if a page’s content is perfect, a badly configured server can keep it from reaching its full potential in the search engine rankings.
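One of the obstacles listed above, a robot disallow, can be checked by hand. The following is a minimal sketch using Python's standard `urllib.robotparser` module; it is not how the Site Checker itself works. The robots.txt content, the `is_allowed` and `blocked_urls` helper names, and the example.com URLs are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def blocked_urls(robots_txt: str, user_agent: str, urls: list) -> list:
    """Return the subset of urls that the spider is told not to crawl."""
    return [u for u in urls if not is_allowed(robots_txt, user_agent, u)]
```

Calling `blocked_urls(SAMPLE_ROBOTS_TXT, "Googlebot", [...])` with a list of a site's URLs returns the ones under `/private/`. To check a live site, you could instead call the parser's `set_url()` and `read()` methods to download its robots.txt file.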

You can use any Site Checker tool you have access to, but this example uses the Site Checker mentioned above:

  1. Go to http://www.seotoolset.com/tools/free_tools.html.

  2. Under the heading Site Checker, enter the URL of the site you want to check in the Web Page text box, and then click the Site Checker button.

    The first page of the Site Checker report for a classic cars Web page

    This example runs the Site Checker report for a classic cars site, as shown in the above figure.

In the report shown above, you can see that the site has a Sitemap.xml file, which serves to direct incoming bots. The more important item to notice, however, is the number 200 displayed in the Header Info section. This is the site’s server status code, and 200 means that the server is working properly and was able to return the requested page.
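If you want to look at a status code yourself, a rough sketch along the following lines can request a page without following redirects and record each hop. It uses only Python's standard library. The function names, the `ExampleSpider/0.1` user agent string, and the five-hop limit are assumptions chosen for illustration, not part of the Site Checker.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_status(url, user_agent="ExampleSpider/0.1", timeout=10):
    """Send one HEAD request; return (status code, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_status, max_hops=5):
    """Follow redirects by hand and return a list of (url, status) hops."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain
```

Calling `trace_redirects("http://www.example.com/")` returns the chain of status codes a spider would meet on the way to the final page. Some servers answer HEAD requests differently from GET requests, so treat the result as a hint, not a verdict.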

The following table explains the most common server status codes. These status codes are standardized as part of the HTTP specification, so they mean the same thing to everyone. (The official definitions are in RFC 2616, a copy of which is hosted on the World Wide Web Consortium (W3C) site at www.w3.org/Protocols/rfc2616/rfc2616-sec10.html if you want to research further.) This table boils down the technical language into plain English to show you what each server status code really means to you.

Server Status Codes and What They Mean

| Code | Description | Definition | What It Means (If It’s on a Competitor’s Page) |
| --- | --- | --- | --- |
| 200 | OK | The Web page appears as expected. | The server and Web page have the welcome mat out for the search engine spiders (and users too). This is not-so-good news for you, but it isn’t surprising either because this site ranks well. |
| 301 | Moved Permanently | The Web page has been redirected permanently to another Web page URL. | When a search engine spider sees this status code, it simply moves to the appropriate other page. |
| 302 | Found (Moved Temporarily) | The Web page has been moved temporarily to a different URL. | This status should raise a red flag. Although there are legitimate uses for a 302 redirect, these redirects can cause serious problems with search engines and could even indicate that something malicious is going on. Spammers frequently use 302 redirects. |
| 400 | Bad Request | The server could not understand the request because of bad syntax. | This could be caused by a typo in the URL. Whatever the cause, it means the search engine spider is blocked from reaching the content pages. |
| 401 | Unauthorized | The request requires user authentication. | The server requires a login in order to enter the page requested. |
| 403 | Forbidden | The server understood the request, but refuses to fulfill it. | The server is refusing access, which is a roadblock for a search engine spider. (This is all the better for you, although it may only be temporary.) |
| 404 | Not Found | The Web page is not available. | You’ve seen this error code; it’s the Page Not Found page that displays when the requested URL doesn’t exist on the server. Chances are that the Web page has been moved or deleted, or that the link to it is wrong. |
| 500 and higher | Miscellaneous Server Errors | The 500–505 status codes indicate that something’s wrong with the server. | |
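The table above can be condensed into a small lookup, shown here as a sketch. The `spider_outlook` and `describe` names and the short labels they return are assumptions invented for this example; only the standard phrases come from Python's `http.HTTPStatus`.

```python
from http import HTTPStatus


def spider_outlook(code: int) -> str:
    """Summarize, per the table, what a status code means for a spider."""
    if code == 200:
        return "crawlable"
    if code == 301:
        return "spider follows the permanent redirect"
    if code == 302:
        return "red flag: temporary redirect"
    if 400 <= code < 500:
        return "spider is blocked from the content"
    if code >= 500:
        return "server error"
    return "not covered by the table"


def describe(code: int) -> str:
    """Combine the standard phrase for a code with the summary above."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that Python's HTTPStatus enum does not know about.
        phrase = "Unknown"
    return f"{code} {phrase}: {spider_outlook(code)}"
```

For example, `describe(404)` combines the standard phrase "Not Found" with the "blocked" summary, which is a quick way to annotate a list of status codes collected from a competitor's pages.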

The other thing you want to glean from the Site Checker report is whether the page is cloaked. The Cloak Check requests the page while identifying itself as five different user agents (Internet Explorer, Mozilla Firefox, Googlebot, Slurp, and MSNbot) and checks whether the server returns the same content to all of them.

Cloak Check information from the Site Checker report
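The idea behind a cloak check can be sketched in a few lines: request the same URL under several user agent names and compare what comes back. This is only an approximation of the concept, not the Site Checker's actual method. The user agent strings below are simplified placeholders, not the exact strings real browsers and spiders send, and the function names are assumptions.

```python
import hashlib
import urllib.request

# Simplified placeholder strings; real browsers and spiders send longer ones.
USER_AGENTS = {
    "Firefox": "Mozilla/5.0 (placeholder) Firefox",
    "Googlebot": "Googlebot/2.1 (placeholder)",
    "Slurp": "Yahoo! Slurp (placeholder)",
}


def fetch_as(url: str, user_agent: str, timeout: int = 10) -> bytes:
    """Download url while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def find_mismatches(bodies: dict, baseline: str = "Firefox") -> list:
    """Return the names whose page body differs from the baseline's body."""
    reference = hashlib.sha256(bodies[baseline]).hexdigest()
    return sorted(
        name
        for name, body in bodies.items()
        if hashlib.sha256(body).hexdigest() != reference
    )
```

You could build the `bodies` dictionary with `{name: fetch_as(url, ua) for name, ua in USER_AGENTS.items()}` and pass it to `find_mismatches`. An exact hash comparison is strict: pages with rotating ads or timestamps differ on every request, so a mismatch is a reason to look closer, not proof of cloaking.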

To manually detect whether a competitor’s site uses cloaking (showing one version of a page’s content to users, but a different version to the spiders), you need to compare the spiderable version to the version that you are viewing as a user. Run a search that you know includes that Web page in the results set, and then click the Cached link under that URL when it appears. This shows you the Web page as it looked to the search engine the last time it was spidered. Keeping in mind that the current page may have changed a little in the meantime, compare the two versions. If you see entirely different content, you’re probably looking at cloaking.
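If you save the cached version and the live version as HTML, the comparison described above can be roughed out in code. The sketch below strips the markup and scores how similar the visible words are, using Python's standard `html.parser` and `difflib` modules. The helper names are assumptions, and where you draw the line between "changed a little" and "entirely different" is a judgment call.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the words of a page, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_words(html: str) -> list:
    """Return the words a reader (or a spider) would see in html."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.words


def similarity(html_a: str, html_b: str) -> float:
    """Score from 0.0 (nothing in common) to 1.0 (same words, same order)."""
    return SequenceMatcher(None, visible_words(html_a), visible_words(html_b)).ratio()
```

A score near 1.0 suggests the two versions carry essentially the same content, and a score near 0.0 suggests entirely different content, which is the pattern that points to cloaking.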