Remove Duplicate Content from Your Web Site for Better SEO Results
Duplicate content creates problems for search engines, so for the best search engine optimization (SEO) results, you should remove it from your Web site.
Content on the Web and on your own site can become duplicated either intentionally or by accident. Whatever the copycat’s motivation is, you don’t want people copying your original content if you can help it.
There are two basic types of duplicate content:
Outside-your-domain duplicate content. This type happens when two different Web sites have the same text.
Within-your-domain duplicate content. This second type refers to Web sites that create duplicate content within their own domain (the root of the site’s unique URL, such as www.domain.com).
Sites can end up having within-your-domain duplicate content due to their own faulty internal linking procedures, and often Webmasters don’t even realize they have a problem. If two or more pages within your own site duplicate each other, you inadvertently diminish the possibility of one or the other being included in search results.
You can end up with duplicate content within your own site for a variety of reasons, such as having multiple URLs all containing the same content; printer-friendly pages; pages that get built on the fly with session IDs in the URL; using or providing syndicated content; problems caused by using localization, minor content variations, or an unfriendly content management system; and archives.
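One of the causes above, session IDs in the URL, can be spotted by normalizing URLs and grouping the ones that collapse to the same address. The following is a minimal sketch, not a definitive tool: the parameter names in SESSION_PARAMS and the function names are assumptions made for illustration, and sorting the remaining query parameters assumes that their order does not change the page.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters that carry session state rather than content.
# Adjust this set to whatever your own site actually uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def normalize_url(url):
    """Return the URL without session parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()  # assumption: parameter order does not affect the content
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    # Only groups with more than one member are potential duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Feeding group_duplicates a list of URLs taken from your server logs or a crawl gives you a starting list of addresses that probably serve the same page. It says nothing about pages whose text matches but whose URLs differ in other ways.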
You should always stick with the best practice of having unique, original content throughout your site. Stay away from the edges of what might be all right with the search engines and play within the safe harbor.
To keep your site in the safe harbor, here are some ways you can avoid or remove duplicate content from within your own Web site:
Title, Description, Keywords tags. Make sure that every page has a unique Title tag, Meta description tag, and Meta keywords tag in the HTML code.
Heading tags. Make sure the heading tags (labeled H#) within the body copy differ from other pages’ headings. Keeping in mind that your headings should all use meaningful, non-generic words makes this a bit easier.
Repeated text, such as a slogan. If you have to show a repeated sentence or paragraph throughout your site, such as a company slogan, you should consider putting the slogan into an image on most pages. Pick the one Web page that you think should rank for that repeated content and leave it as text on that page so that the search engines can spider it. If anyone tries to search for that content, the search engines can find that unique content on the page you selected.
For example, if you have a classic car customization Web site that uses the slogan, “We restore the rumble to your classic car,” you probably want to display that throughout your site. But you should prevent the search engines from seeing the repetition. Leave it as HTML text on just one page, like your home page or the About Us page. Then everywhere else, just create a nifty graphic that lets users see the slogan, but not search engines.
Site map. Be sure that your site map (a page containing links to the pages in your site, like a table of contents) includes links to your preferred page’s URL, in cases where you have similar versions. The site map helps the search engines understand which page is your canonical (best or original) version. Matt Cutts, head of Google’s Web Spam team, defines canonicalization as “the process of picking the best URL when there are several choices.” The canonical URL is the one that is chosen at the end of the process, with all others being considered duplicates (non-canonical).
Consolidate similar pages. If you have whole pages that contain similar or identical text, decide which one you want to be the canonical page for that content. Then combine pages and edit the content as needed.
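The first check in the list above, unique Title and Meta description tags, is easy to automate once you have the HTML of your pages. The sketch below uses only Python’s standard html.parser module. The class and function names are made up for illustration, and it assumes you have already collected your pages into a dictionary that maps each URL to its HTML.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collect the Title text and the Meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_repeated_tags(pages):
    """pages maps URL -> HTML. Return {(tag, text): [urls]} for repeated values."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        parser.close()
        for name, text in (("title", parser.title), ("description", parser.description)):
            # Compare case-insensitively and ignore differences in spacing.
            key = " ".join(text.split()).lower()
            if key:
                seen[(name, key)].append(url)
    return {key: urls for key, urls in seen.items() if len(urls) > 1}
```

Any entry in the result is a Title or description shared by two or more pages, which makes those pages candidates for rewriting or consolidation. The comparison is exact after lowercasing and trimming, so near-identical wording is not flagged.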
If you do need to consolidate pages to a single, canonical page, a few precautions are in order (see the numbered steps below for details). You don’t want to accidentally wipe out any link equity you may have accumulated. Link equity refers to the perceived-expertise value of all the inbound links pointing to your Web page. You also don’t want to cause people’s links and bookmarks suddenly to break if they try to open your old page.
When consolidating two pages to make one your main, canonical version, take these precautions:
Check for inbound links.
Do a “link:domain.com/yourpage.html” search in Google or use the backlink checker in Yahoo Site Explorer to find out who’s already linked to your page. If one version has 15 links and the other version has 4,000, you know which one to keep: the one with 4,000 inbound links.
Update your internal links.
Make sure that your site map and all other pages in your site no longer link to the page you decided to remove.
Set up a 301 redirect.
When you take down the removed page’s content, put in its place a 301 redirect. A 301 redirect is not HTML; it is a response from your Web server (HTTP status code 301, “Moved Permanently”) that automatically reroutes any request for the old URL to the URL with the content that you want to retain.
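To show what a 301 redirect looks like at the server level, here is a minimal sketch written as a Python WSGI application. The paths in REDIRECTS are hypothetical examples, and most sites would set up the same rule in their Web server or content management system configuration rather than in application code.

```python
# Hypothetical mapping from removed pages to the canonical pages replacing them.
REDIRECTS = {"/old-page.html": "/yourpage.html"}


def app(environ, start_response):
    """Minimal WSGI app: answer removed URLs with a 301, anything else with a 404."""
    path = environ.get("PATH_INFO", "")
    target = REDIRECTS.get(path)
    if target is not None:
        # The Location header tells browsers and spiders where the content now lives.
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"Not Found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, you can serve app with make_server from the standard wsgiref.simple_server module and request the old path in a browser. The browser should land on the new URL.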