Use Site Checker to See a Web Server the Way a Search Engine Spider Does

You can find out how a competitor's Web server looks to a search engine spider by running the Site Checker utility. It shows you what happens at the server level when the page is requested, before the page itself is displayed.

Generally, an SEO-friendly site should be free of server problems such as improper redirects (instructions that send a visitor from one page to another in a way that the search engine can't follow or is confused by) and other obstacles that can stop a search spider in its tracks. When you run the Site Checker utility, it attempts to crawl the site the same way a search engine spider does and then spits out a report. If you use the free Site Checker that is part of the SEOToolSet at http://www.seotoolset.com, the report lists any indexing obstacles it encounters, such as improper redirects, robot disallows, cloaking, virtual IPs, block lists, and more. Even if a page's content is perfect, a bad server can keep it from reaching its full potential in the search engine rankings.
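Of the obstacles listed above, a robot disallow is the simplest to check on your own. As a minimal sketch (not how the SEOToolSet Site Checker actually works), the following Python uses the standard library's urllib.robotparser module to test whether a robots.txt file lets a particular spider crawl a URL. The robots.txt content and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def spider_can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines,
    # so this check needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks every spider from /private/
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(spider_can_fetch(sample_robots, "Googlebot", "http://www.example.com/index.html"))         # True
print(spider_can_fetch(sample_robots, "Googlebot", "http://www.example.com/private/page.html"))  # False
```

In practice you would load the site's real robots.txt (the parser's set_url() and read() methods can download it) instead of pasting sample text.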

You can use any Site Checker tool you have access to, but this example uses the Site Checker mentioned above:

  1. Go to http://www.seotoolset.com/tools/free_tools.html.

  2. Under the heading Site Checker, enter the URL of the site you want to check in the Web Page text box, and then click the Site Checker button.

    The first page of the Site Checker report for a classic cars Web page

    This example runs the Site Checker report for a classic cars site, as shown in the above figure.

In the report shown above, you can see that the site has a Sitemap.xml file, which helps direct incoming bots to the site's pages. The more important item to notice, however, is the number 200 that displays in the Header Info section. This is the site's server status code, and 200 means the server is A-okay and able to return the requested page properly.
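You can also read a server status code yourself. The Python sketch below assumes you only want the server's first answer (it does not follow redirects): it sends one HEAD request with the standard http.client module and returns the status code plus the Location header, which names the redirect target when there is one. The "Googlebot" User-Agent value and the throwaway local demo server are assumptions for illustration; some real servers also answer HEAD requests differently from ordinary GET requests.

```python
import http.client
import http.server
import threading
from typing import Optional, Tuple
from urllib.parse import urlsplit


def check_status(url: str, user_agent: str = "Googlebot") -> Tuple[int, Optional[str]]:
    """Request url once without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the status line and headers only, not the page body.
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical local site: /old redirects permanently to /, anything else returns 200."""

    def do_HEAD(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/")
        else:
            self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Start DemoHandler on a free local port; return (server, base URL)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


if __name__ == "__main__":
    server, base = start_demo_server()
    print(check_status(base + "/"))     # (200, None)
    print(check_status(base + "/old"))  # (301, '/')
    server.shutdown()
```

Against a live competitor site you would call check_status() with the page's real URL and skip the demo server.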

The following table explains the most common server status codes. These codes are defined in the HTTP/1.1 specification (RFC 2616, published by the Internet Engineering Task Force), so they mean the same thing to everyone. (The official definitions are hosted on the World Wide Web Consortium (W3C) site at www.w3.org/Protocols/rfc2616/rfc2616-sec10.html if you want to research further.) This table boils down the technical language into plain English to show you what each server status code really means to you.

Server Status Codes and What They Mean

200 OK
    Definition: The Web page appears as expected.
    On a competitor's page: The server and Web page have the welcome mat out for the search engine spiders (and users too). This is not-so-good news for you, but it isn't surprising either, because this site ranks well.

301 Moved Permanently
    Definition: The Web page has been redirected permanently to another URL.
    On a competitor's page: When a search engine spider sees this status code, it simply moves on to the new page.

302 Found (Moved Temporarily)
    Definition: The Web page has been moved temporarily to a different URL.
    On a competitor's page: This status should raise a red flag. Although there are supposedly legitimate uses for a 302 redirect, it can cause serious problems with search engines and could even indicate that something malicious is going on. Spammers frequently use 302 redirects.

400 Bad Request
    Definition: The server could not understand the request because of bad syntax.
    On a competitor's page: This could be caused by a typo in the URL. Whatever the cause, the search engine spider is blocked from reaching the content pages.

401 Unauthorized
    Definition: The request requires user authentication.
    On a competitor's page: The server requires a login before it will show the requested page, so a spider can't get in.

403 Forbidden
    Definition: The server understood the request but refuses to fulfill it.
    On a competitor's page: This indicates a technical problem that would be a roadblock for a search engine spider. (That's all the better for you, although it may be only temporary.)

404 Not Found
    Definition: The server can't find the requested Web page.
    On a competitor's page: You've probably seen this error; it's the "page not found" message that appears when you request a page that doesn't exist. Chances are the page has been removed or renamed, or the URL is wrong.

500 and higher: Miscellaneous Server Errors
    Definition: The 500–505 status codes indicate that something's wrong with the server.
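If you check many competitor pages, you may want the table's advice attached to each code automatically. This Python sketch paraphrases the table as a lookup dictionary and falls back on the status class (the first digit: 2xx success, 3xx redirection, 4xx client error, 5xx server error) for codes the table doesn't list. The wording of the notes is a condensed paraphrase, not official text.

```python
# Notes condensed from the status-code table; the wording is a paraphrase.
SEO_NOTES = {
    200: "OK: the page is served normally, so spiders can index it.",
    301: "Moved Permanently: spiders simply follow it to the new URL.",
    302: "Found (Moved Temporarily): a red flag; it can confuse search engines.",
    400: "Bad Request: the spider is blocked from reaching the content.",
    401: "Unauthorized: a login is required, so the spider can't get in.",
    403: "Forbidden: a roadblock for spiders, possibly temporary.",
    404: "Not Found: there is no page at this URL.",
}


def interpret_status(code: int) -> str:
    """Return a short SEO-oriented note for an HTTP status code."""
    if code in SEO_NOTES:
        return SEO_NOTES[code]
    # Fall back on the status class, which the first digit identifies.
    if 500 <= code <= 599:
        return "Server error: something is wrong with the server."
    if 400 <= code <= 499:
        return "Client error: the spider's request was refused or not understood."
    if 300 <= code <= 399:
        return "Redirect: check where it points and whether it is permanent."
    if 200 <= code <= 299:
        return "Success: the server returned a response."
    return "Unrecognized status code."


for code in (200, 302, 307, 410, 503):
    print(code, interpret_status(code))
```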

The other thing you want to glean from the Site Checker report is whether the page is cloaked. The Cloak Check requests the site five times, identifying itself each time as a different browser or spider: Internet Explorer, Mozilla Firefox, Googlebot, Yahoo!'s Slurp, and Microsoft's MSNbot. It then compares the pages it receives to make sure they all match.

Cloak Check information from the Site Checker report
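The idea behind the Cloak Check (requesting the same page under several identities and comparing the results) can be sketched in Python. This is an approximation under stated assumptions, not the Site Checker's actual code: the User-Agent strings are simplified, illustrative values, pages are compared by a SHA-256 hash of the raw response body, and the local demo server with its /cloaked page is hypothetical.

```python
import hashlib
import http.server
import threading
import urllib.request
from typing import Dict

# Simplified, illustrative User-Agent strings (assumptions, not exact values).
USER_AGENTS = {
    "Internet Explorer": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
    "Mozilla Firefox": "Mozilla/5.0 (Windows NT 6.1; rv:3.6) Gecko Firefox/3.6",
    "Googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Slurp": "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "MSNbot": "msnbot/2.0b",
}


def fetch_fingerprints(url: str) -> Dict[str, str]:
    """Fetch url once per User-Agent and return a SHA-256 digest of each body."""
    digests = {}
    for name, ua in USER_AGENTS.items():
        req = urllib.request.Request(url, headers={"User-Agent": ua})
        with urllib.request.urlopen(req, timeout=10) as resp:
            digests[name] = hashlib.sha256(resp.read()).hexdigest()
    return digests


def looks_cloaked(url: str) -> bool:
    """True if not every User-Agent received the same page body."""
    return len(set(fetch_fingerprints(url).values())) > 1


class CloakingDemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical site: /cloaked shows Googlebot different content."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if self.path == "/cloaked" and "Googlebot" in ua:
            body = b"<p>Keyword-stuffed page for spiders</p>"
        else:
            body = b"<p>Normal page for visitors</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Start the demo site on a free local port; return (server, base URL)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), CloakingDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


if __name__ == "__main__":
    server, base = start_demo_server()
    print(looks_cloaked(base + "/"))         # False
    print(looks_cloaked(base + "/cloaked"))  # True
    server.shutdown()
```

Because an exact hash flags any difference at all, pages with rotating ads, timestamps, or session IDs can look "cloaked" when they aren't, so treat a mismatch as a reason to compare the versions by eye.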

To manually detect whether a competitor's site uses cloaking (showing one version of a page's content to users but a different version to the spiders), you need to compare the version the spiders indexed with the version you see as a user. Do a search that you know includes that Web page in the results, and click the Cached link under that URL when it appears. This shows you the Web page as it looked to the search engine the last time it was spidered. Keeping in mind that the live page may have changed a little in the meantime, compare the two versions. If you see entirely different content, you're probably looking at cloaking.
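Once you have copied the HTML of both versions, a rough similarity score can back up your eyeball comparison. The following Python sketch strips tags with a crude regular expression and uses difflib.SequenceMatcher from the standard library to score how much wording the two versions share, from 0.0 (nothing in common) to 1.0 (identical). The 0.5 cutoff and the sample snippets are assumptions: a modest drop usually reflects ordinary edits since the page was cached, while a very low score matches the "entirely different content" signal described above.

```python
import difflib
import re
from typing import List


def visible_words(html: str) -> List[str]:
    """Crude text extraction: replace tags with spaces, then split into lowercase words."""
    return re.sub(r"<[^>]+>", " ", html).lower().split()


def similarity(cached_html: str, live_html: str) -> float:
    """Return a 0.0-1.0 ratio of how closely the two word sequences match."""
    matcher = difflib.SequenceMatcher(None, visible_words(cached_html), visible_words(live_html))
    return matcher.ratio()


def likely_cloaked(cached_html: str, live_html: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: flag pages whose versions share little wording."""
    return similarity(cached_html, live_html) < threshold


cached = "<h1>Classic Cars</h1><p>Restored 1965 Mustang for sale</p>"
live = "<h1>Classic Cars</h1><p>Restored 1966 Mustang for sale</p>"
spam = "<p>Buy cheap pills online now</p>"

print(round(similarity(cached, live), 2))  # 0.86
print(round(similarity(cached, spam), 2))  # 0.0
```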
