Avoid Using Session IDs in URLs to Improve Your Search Ranking

By Peter Kent

Session IDs can make search engine life interesting. A session ID identifies a particular person visiting the site at a particular time, which enables the server to track which pages the visitor looks at and which actions the visitor takes during the session.

If you request a page from a website, the web server that has the page sends it to your browser. Then if you request another page, the server sends that page, too, but the server doesn’t know that you’re the same person. If the server needs to know who you are, it needs a way to identify you each time you request a page. It does that by using session IDs.

Session IDs are used for a variety of reasons, but their main purpose is to allow web developers to create various types of interactive sites. For instance, if developers have created a secure environment, they may want to force visitors to go through the home page first. Or, the developers may want a way to resume an unfinished session.

By setting cookies containing the session ID on the visitor’s computer, developers can see where the visitor was in the site at the end of the visitor’s last session.
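As a rough illustration of the cookie approach, here is a minimal Python sketch using only the standard library. The cookie name sessionid and both helper functions are assumptions made for this example, not something a particular server requires; real frameworks normally handle this for you.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "sessionid"  # hypothetical cookie name chosen for this sketch


def new_session_header():
    """Create a random session ID and the Set-Cookie header that stores it."""
    session_id = secrets.token_hex(16).upper()  # 32 hexadecimal characters
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = session_id
    cookie[COOKIE_NAME]["path"] = "/"
    return session_id, cookie.output(header="Set-Cookie:")


def read_session_id(cookie_header):
    """Return the session ID found in a request's Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

The server would send the header from new_session_header() with its first response and call read_session_id() on each later request to recognize the returning visitor.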

Session IDs are common when running software applications that have any kind of security procedure, that need to store variables, or that want to defeat the browser cache — that is, ensure that the browser always displays information from the server, never from its own cache. Shopping cart systems typically use session IDs — that’s how the system can allow you to place an item in the shopping cart and then go away and continue shopping. It recognizes you based on your session ID.

A session ID can be passed back to the server in two ways:

  • Store it in a cookie.

  • Display it in the URL itself.

Some systems are set up to store the session ID in a cookie but then use a URL session ID if the user’s browser is set to not accept cookies. Here’s an example of a URL containing a session ID:

http://yourdomain.com/index.jsp;jsessionid=07D3CCD4D9A6A9F3CF9CAD4F9A728F44

The 07D3CCD4D9A6A9F3CF9CAD4F9A728F44 piece of the URL is the unique identifier assigned to the session.
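To make the structure of such a URL concrete, the following Python sketch separates a session ID from the rest of the address. It handles both the ;jsessionid= path form shown above and a query-string form. The set of parameter names is an assumption for illustration; a real site may use different names.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical list of session parameter names; adjust for the site in question.
SESSION_PARAMS = {"jsessionid", "sessionid", "phpsessid", "sid"}


def split_session_id(url):
    """Return (url_without_session_id, session_id_or_None)."""
    parts = urlparse(url)
    session_id = None

    # Path-parameter form: /index.jsp;jsessionid=...
    kept_params = []
    for param in filter(None, parts.params.split(";")):
        name, _, value = param.partition("=")
        if name.lower() in SESSION_PARAMS:
            session_id = value
        else:
            kept_params.append(param)

    # Query-string form: /page?sessionid=...
    kept_query = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in SESSION_PARAMS:
            session_id = value
        else:
            kept_query.append((name, value))

    clean = parts._replace(params=";".join(kept_params),
                           query=urlencode(kept_query))
    return urlunparse(clean), session_id
```

Calling split_session_id() on the example URL yields the plain page address and the 32-character identifier as separate values.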

If a search engine recognizes a URL as including a session ID, it probably doesn’t read the referenced page because each time the searchbot returns to your site, the session ID will have expired, so the server will do one of the following:

  • Display an error page rather than the indexed page, or perhaps display the site’s default page. In other words, the search engine has indexed a page that is no longer there when someone clicks the link in the search results page.

  • Assign a new session ID. The URL that the searchbot originally used has expired, so the server replaces the ID with another one and changes the URL. So, the spider could be fed multiple URLs for the same page.

Even if the searchbot reads the referenced page, it may not index it. Webmasters sometimes complain that a search engine entered their site, requested the same page over and over, and left without indexing most of the site. The searchbot simply got confused and left. Or, sometimes the search engine doesn’t recognize a session ID in a URL. One client had hundreds of URLs indexed by Google, but because they were all long-expired session IDs, they all pointed to the site’s main page.

These are all worst-case scenarios, as the major search engines’ searchbots do their best to recognize session IDs and work around them. Furthermore, Google recommends that if you are using session IDs, you use the canonical directive to tell the search engines the correct URL for the page. For instance, let’s say you’re using session IDs, and your URLs look something like this:

http://www.yourdomain.com/product.php?item=rodent-racing-gear&xyid=76345&sessionid=9876

A search engine might end up with hundreds of URLs effectively referencing the same page. So, you can put the <link> tag in the <head> section of your web pages to tell the search engines the correct URL, like this:

<link rel="canonical" href="http://www.yourdomain.com/product.php?item=rodent-racing-gear" />
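If your pages are generated by a program, the canonical tag can be built from the requested URL. The Python sketch below keeps only the query parameters that actually identify the page and drops everything else, including the session ID. Which parameters to keep (here, just item) is an assumption based on the example above, and the function name is made up for illustration.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link(url, keep=("item",)):
    """Build a <link rel="canonical"> tag that ignores session parameters.

    `keep` lists the query parameters that identify the page; it is a
    hypothetical default taken from the article's example URL.
    """
    parts = urlsplit(url)
    query = [(name, value)
             for name, value in parse_qsl(parts.query)
             if name in keep]
    clean_url = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
    # Escape the URL so characters such as & are valid inside the attribute.
    return '<link rel="canonical" href="%s" />' % escape(clean_url, quote=True)
```

Because the tag is derived from a whitelist of parameters, every session-specific variant of a product URL produces the same canonical address.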

Session ID problems are rarer than they once were; in the past, fixing a session ID problem was like performing magic: Sites that were invisible to search engines suddenly became visible! One site owner in a search engine discussion group described how his site had never had more than 6 pages indexed by Google, yet within a week of removing session IDs, Google had indexed over 600 pages.

If your site has a session ID problem, there are a couple of other things you can do, in addition to using the canonical directive:

  • Rather than use session IDs in the URL, store session information in a cookie on the user’s computer. Each time a page is requested, the server can check the cookie to see whether session information is stored there. However, the server shouldn’t require cookies, or you may run into further problems.

  • Get your programmer to omit session IDs if the device requesting a web page from the server is a searchbot. The server delivers the same page to the searchbot but doesn’t assign a session ID, so the searchbot can travel throughout the site without using session IDs. This process is known as user agent delivery, in which user agent refers to the device — browser, searchbot, or other program — that is requesting a page.
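The two options above can be combined in one small decision, sketched here in Python. This is only an outline under stated assumptions: the crawler tokens are a short illustrative list rather than a complete one, and the function names and the ;jsessionid= link format are chosen for this example.

```python
# Illustrative crawler tokens only; a production list would be longer
# and would need to be kept up to date.
BOT_TOKENS = ("googlebot", "bingbot")


def is_searchbot(user_agent):
    """Guess whether the User-Agent header belongs to a search engine crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BOT_TOKENS)


def page_url(path, session_id, user_agent, has_session_cookie):
    """Return the link to emit for `path`.

    The session ID goes into the URL only when the visitor is not a
    searchbot and has no session cookie to carry it instead.
    """
    if has_session_cookie or is_searchbot(user_agent):
        return path
    return f"{path};jsessionid={session_id}"
```

Note that only the links change; the page content sent to the searchbot is the same content a visitor would see.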

The user agent method has one potential problem: In the technique sometimes known as cloaking, a server sends one page to the search engines and another to real site visitors. Search engines generally don’t like cloaking because some websites try to trick them by providing different content from the content that site visitors see.

Of course, in the context of using this technique to avoid the session-ID problem, that’s not the intent; it’s a way to show the same content that the site visitor sees, so it isn’t true cloaking. However, the danger is that the search engines may view it as cloaking if they discover what is happening.