Invite Search Engine Spiders to Index Your Web Site
You may discover that important pages on your Web site haven’t been indexed by a search engine. In this case, you can invite the search engine spiders to your site so that they travel all of your internal links and index your site’s contents. Here are several effective ways to deliver that invitation:
External links. Have a link to your missing page added to a Web page that gets crawled regularly. Make sure that the link’s anchor text relates to your page’s subject matter. Ideally, the anchor text should contain your page’s keywords. Also, the linking page should relate to your page’s topic in some way so the search engines see it as a relevant site. After the link is in place, the next time the spiders come crawling, they follow that link right to your page. This sort of natural discovery process can be the quickest, most effective way to get a page noticed by the search engines.
Direct submission. Each search engine provides a way for you to submit a URL, which then goes into a queue waiting for a spider to go check it out. It’s not a fast or even reliable method to get your page noticed, but it won’t hurt you to do it.
Internal links. You should have at least two links pointing to every page in your Web site. This helps ensure that search engine spiders can find every page.
Site map. You should have a site map for your users, but for the search engines, you want to create another site map in XML (eXtensible Markup Language) format. Make sure that your XML site map contains the URL links to the missing pages, as well as every other page you want indexed. When a search engine spider crawls your XML site map, it follows the links and is more likely to thoroughly index your site.
The two versions of your site map provide direct links to your pages, which is helpful for users and important for spiders. Search engines use the XML site map file as a central hub for finding all of your pages. But the user’s site map is also crawled by the search engines. If the site map provides valuable anchor text for each link (for example, Frequently Asked Classic Car Questions instead of FAQs), it gives search engines a better idea of what your pages are about. Google specifically states in their guidelines that every site should have a site map.
There is a limit to the number of links you should have on the user-viewable site map. Small sites can place every page on their site map, but larger sites should not. Having more than 99 links on a page looks suspicious to a search engine because spammers have tried to deceive the search engines by setting up link farms (long lists of unrelated hyperlinks on a page) for profit. So include just the important pages, or split the site map into several site maps, one for each main subject category.
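As a rough way to check a user-viewable site map against the 99-link guideline above, the following sketch counts the links in a page's HTML. It uses only Python's standard library. The names `LinkCounter`, `count_links`, and `needs_splitting`, as well as the sample HTML, are hypothetical and chosen for illustration; the threshold comes from the text above, not from any published search engine rule.

```python
from html.parser import HTMLParser

MAX_LINKS = 99  # guideline from the text above, not an official search engine limit


class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def count_links(html):
    """Return the number of <a href="..."> links found in the HTML text."""
    parser = LinkCounter()
    parser.feed(html)
    return len(parser.hrefs)


def needs_splitting(html, limit=MAX_LINKS):
    """Return True when the page carries more links than the limit."""
    return count_links(html) > limit


# Hypothetical site map fragment with three links
sample = """
<ul>
  <li><a href="/faq.html">Frequently Asked Classic Car Questions</a></li>
  <li><a href="/parts.html">Classic Car Parts</a></li>
  <li><a href="/events.html">Classic Car Events</a></li>
</ul>
"""
print(count_links(sample))      # 3
print(needs_splitting(sample))  # False
```

If `needs_splitting` returns True for your site map page, that is a hint to trim it to the important pages or break it up by subject category.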
Unlike a traditional site map, an XML site map doesn’t have a 99-link limit. There are still some limitations (for example, the sitemaps.org protocol caps a single file at 50,000 URLs), but the file is meant to act as a feed directly to the search engines. For full details on how to create an XML site map, visit sitemaps.org, the official site map guideline site run by the search engines.
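To make the XML site map idea concrete, here is a minimal sketch that builds one with Python's standard library. It emits only the required `urlset`, `url`, and `loc` elements from the sitemaps.org protocol. The function name `build_sitemap` and the example.com URLs are assumptions made for illustration; check sitemaps.org for the optional elements and the current limits before relying on it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file cap in the sitemaps.org protocol


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per URL."""
    if len(urls) > MAX_URLS_PER_FILE:
        raise ValueError("too many URLs for one sitemap file; split it up")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, including one that the search engines missed
pages = [
    "http://www.example.com/",
    "http://www.example.com/faq.html",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Save the result as a file on your Web server (commonly named sitemap.xml) and list every page you want indexed, including the missing ones.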
The server record of every time a page could not be displayed on a Web site.
A comparison between the old version of a Web site (A) and the new version (B) to see which one performs better.
The measure of users that a Web site attracts. Acquisition metrics come from Web analytics data and include percent of new visitors, average number of visits per visitor, and average pages viewed per visitor.
One or more ads that target a set of grouped keywords.
Special terms that you can insert in your search engine query to find specific types of information that a general search can't provide.
A mathematical formula a search engine uses to establish which Web pages are the most relevant to a user’s search query. Algorithms can be fairly simple or multi-layered and complex.
The words that make up a hyperlink that a user clicks.
A storage area on a server where older Web content is out of the way, but still accessible.
A term used for a fake grassroots market campaign; this derives its name from AstroTurf, which is artificial grass.
A Web site that has been banished from a search engine index because it was caught spamming or using other sneaky methods to fool the search engines into giving it a better ranking.
The rate at which data moves from one point to another over an Internet connection.
A search engine that tries to guess what exactly a user is looking for based upon their previous search inquiries.
A list of sites suspected of illegal acts such as child pornography, e-mail spam, or hacking.
The integration of different content types onto a search results page, such as images, videos, news, blogs, books, maps, and so on.
Short for ‘Web log.’ An online conversation medium used to cover a wide range of subjects, including entertainment, politics, fashion, lifestyles, and technology. A blog can be anything from a personal journal to a media and corporate platform for describing new products and services.
A measure of the percentage of people who leave a Web site right after entering a page on that site, usually within seconds and without visiting any other page on the site.
Establishing a company name and associating it with that business; well-known examples include Nike and Xerox.
A software application that enhances a Web browser’s existing features.
The stored version of a Web page.
In the case of Web sites with duplicate content, the site the Webmaster prefers visitors and search engines to access. The primary, main Web site.
CGI. An acronym for Common Gateway Interface.
A Web icon or link that can be clicked to submit or vote for an article on a particular social news or bookmarking site.
A report that overlays Web pages on a site and indicates, on a per-page basis, which links visitors are clicking on to go to other pages on the site. Often, the most clicked-on links are bigger, have a richer color, or a note indicating how many clicks the link received.
The frequency (number of times) that an Internet ad is clicked on.
A technique in which the content of a Web page presented to a search engine spider is different from that presented to a user’s browser; as a result, the spiders see one page, while the user sees something entirely different.
CMS. An acronym for Content Management System.
Managing an organization’s reputation and building their brand via social networks.
Run words together without spaces.
Software that helps you create, edit, and manage a Web site.
A technique of organizing a Web site into subject themes by linking related pages together.
Writing the HTML for a Web site in such a way that the page content is delivered to the search engine spiders before any scripts or navigation elements.
Distribution of a Web site’s content to other sites.
An action taken by a Web site visitor that meets the sales or business goal of that site. This is also a term for site visitors who become customers.
The path that a visitor to a Web site takes to perform a desired action, most commonly a purchase.
Basic processes on a Web site that can be measured to determine whether visitors are performing desired actions, or abandoning the site.
The number of visitors who actually purchase something, sign up, or take whatever action is appropriate on a Web site.
A small text file that a Web server automatically assigns to a user's browser; cookies are used to track, authenticate, and maintain specific information about users.
The amount an advertiser pays each time their Web ad is clicked on a search engine results page.
The last part of an Internet domain name; specifically, the letters that follow the final dot of any domain name. A country-code TLD is specific to a particular country, such as .ca, .cn, .uk, or .mx.
A separate file that is used to control formatting of text and images on a Web site.
A feature that allows a Web advertiser to specify when during the day their advertising is shown on search engine results pages.
A type of coded command that redirects a browser to a different location than what was expected via the Web link that the user clicked.
A Web server that is used exclusively by one Web site, and not shared with any other sites.
User data such as gender, age, and so on.
Users who type a URL directly into their Web browser’s address bar.
A list of Web sites a search engine can search through that’s typically compiled by people, rather than by computer programs.
The placement of a keyword throughout a Web page.
The base address of a Web site, such as mydomain.com.
A company accredited and authorized to register Internet domain names.
A Web page submitted to search engine spiders that has been designed to satisfy the specific algorithms for various search engines, but is not intended to be viewed by visitors. A doorway page is usually filled with text content that makes it rank high for a certain keyword.
Identical or similar content that appears elsewhere on a Web site or on the Web.
Any Web site that sells a particular product or service.
A big word that is so laborious to type and so obscure in usage that only a very serious searcher would think of entering it in a search engine query.
Unsolicited e-mail that is sent indiscriminately to many people.
Converted into a form that cannot be understood except by knowing one or more secret decryption keys.
Any type of interactive media object on a Web site that engages the Web site visitors’ interest. Examples include images, videos, audio, games, and applications.
A message that a Web server sends to a visiting browser when something goes wrong.
The last page a visitor is on before they leave a Web site.
The navigational links found at the bottom of a Web page; these usually include the organization’s top navigation links, legal information, and a site map.
An HTML technique for combining multiple documents in different sub-windows, all within a single browser window.
The identification of a Web page as belonging to or being relevant for a particular country.
A technique used by a search engine to personalize search results to include local listings for search terms, based on a computer’s IP address. This technique is often used when a person searches for items that involve brick-and-mortar businesses or services that need to be provided locally.
A visual heat map showing how people’s eyes scan a search results page and how long they look at a particular result before moving on.
A Web search where the search engine makes suggestions on queries; examples include Google or Yahoo! Suggests.
Attempt to break into computer networks and bypass their security.
A third-party company that leases out Web space by the month or year, similar to office space.
A control file that allows server configuration changes on a per-directory basis. The file controls that directory and all of the subdirectories contained within it. Usually, this file is placed in the root folder of a Web site.
The number of times an ad appears in search results to search engine users.
The database of Web sites that a search engine pulls results from in response to search queries.
The process of taking raw data and categorizing it, removing duplicate information, and generally organizing it all into an accessible structure (think filing cabinet versus paper pile).
A search engine that searches only a particular Web site.
The main hub connections of the Internet, which are primarily located in major cities around the world (Los Angeles, Denver, New York, and so on).
A message board where users can log on and post about topics on a related subject.
The numeric code that identifies the logical address where a computer, server, or site resides on the Web.
ISP. An acronym for Internet Service Provider.
A scripting language used to add functionality to Web sites.
The yardstick by which an organization measures the success of its Web site. KPIs are based on a company’s overall business goals, and the role their Web site plays in achieving those goals. KPIs are specific to a company, and are not influenced by industry averages or competitors' KPIs.
A word relating to Web site content that search engines use to determine whether the site is relevant.
The measurement of the number of times a keyword or keyword phrase appears on a Web page, compared to the total number of words on the page.
A search query containing two or more words that Web page content relates to.
Putting every word, not just relevant ones, into the content of a keyword tag; repeating keywords over and over again.
The Web page that a visitor first goes to when clicking on an ad in a search engine.
A search engine specializing in Web sites that are tied to a limited physical area (also known as a geotargeted area).
Similar to a cookie, an LSO is a text file that can be read only by the Web site creating it. Browsers and anti-spyware programs can't delete LSOs, and most users don't know how to remove them.
A statistical concept that says that items that are in comparatively low demand can nonetheless add up to quite large volumes.
Keywords, or search queries, made up of several words or a phrase.
Formatting and other types of HTML code, such as Font tags to define the font style.
An HTML command in the head section (top part) of a Web page’s HTML code that gives instructions to search engine spiders whether to index the page and whether to follow its links.
Text that is added to the HTML of a page to describe it for a search engine.
The automatic reloading of a Web page.
A search engine that does not maintain a database of its own, but instead combines results from multiple search engines.
A quantitative measure of something, such as a process, rate, or amount.
A full copy of a Web page or site.
A program that measures the keyword density of multiple Web pages.
Testing small changes to a Web site, such as the change of a certain font, or a button instead of an arrow. Typically, this involves testing many small changes to the same page at once instead of two totally separate pages, as in A/B testing.
A list that is made up of the keywords that an advertiser does not want their ads to show up for.
A software program that allows users to subscribe to and read RSS feeds.
Generic text that isn’t customized for a Web site’s various subject themes and keywords.
In A/B testing, a test that is run on two A pages (pages on which no changes have been made) in order to establish a baseline and make sure the traffic isn’t adversely affected.
Web sites where people can meet and interact with one another; examples include MySpace and Facebook.
The HTML tags and the visible content on a Web page.
The changing of the underlying code of a Web page for the purpose of search engine optimization.
A term for a Web site that allows anyone to access and edit its content.
A computer program whose source code is available for free to the public.
Web pages that the search engines find on their own using their spiders. These results are not paid for by advertisers.
Google’s patented algorithm that assigns weight to a page based on the number, quality, and authority of links to and from the page (along with other factors).
Ads or sponsored links that site owners have paid a search engine to display on its results pages, based on keywords.
Automatically generated URL characters that carry information to the receiving Web page about a user.
Reviewing the flow, page by page, that a user takes while visiting a Web site.
Paid advertising that appears in search results, for which advertisers pay a fee every time their ad is clicked.
A profile that represents a target audience based on calculated averages of their buying processes and demographics.
PPC. An acronym for Pay Per Click.
A separate version of a Web page that is designed specifically for printing. A printer-friendly page contains the same content as a regular Web page, but without the heavy images and advertisements that require a lot of printer ink.
The parts of a URL that pass data to a Web page. Query strings aren’t readable because they contain symbols (such as ?, &, and +) as well as codes, session IDs, and so on.
The degree to which a Web site reaches customers. Measures of reach include overall traffic volumes, number of visits, number of new visitors, ratio of new to returning visitors, and visitor geographic data. Reach metrics depend on information from various sources.
A method of distributing links to new content in a Web site; the recipients are people who’ve subscribed to the RSS feed for that site.
An HTML command that automatically forwards incoming links to a different Web page.
An element that visitors are responding to on a Web site, such as an image, a newsletter, or an e-mail.
How fast a Web server is and how long it takes for users to load a page from that server.
A measure of how many customers remain once they come to a Web site.
Looking at what Web server a user is coming from.
Content that enhances a Web page, such as images, video, and audio.
ROI. An acronym for Return On Investment.
RSS. An acronym for Really Simple Syndication.
News feeds that automatically show updates to a Web site that offers RSS content.
The ability to expand Web server resources as needed.
A powerful technology that enables a Web designer to use a wide variety of fonts on their Web pages without sacrificing search-engine friendliness or accessibility.
A person who sends a robot to a Web site to copy (or scrape) the entire site and then republish it as their own.
The text box in a search engine page where users type their search queries, or whatever it is that they’re looking for.
A Web application designed to hunt for specific keywords and group them according to relevance.
The word or phrase that a user types into the search box of a search engine.
Links to specialized vertical search engines that narrow a search into a specific type of result, such as images or news. Clicking one of these links takes a user to a results page with only images or only news.
Testing the variables in incoming Web traffic, such as finding out the demographics of the incoming traffic by asking them to answer certain questions.
SEO. An acronym for Search Engine Optimization.
SERP. An acronym for Search Engine Results Page.
The software and hardware that runs a Web site.
Records that measure the amount of traffic that a Web site receives. A server automatically creates a server log of all the activity it performs during a given time period, be it hours, days, or minutes.
The current time period a user is active on a Web site after logging on.
The process of organizing a Web site’s content into distinct subject categories, in order to group related content. Each silo has its own landing page and supporting pages.
A Web page containing links to the pages in that Web site, similar to a table of contents.
Link structure for moving around a Web site.
A measure of how quickly visitors leave a Web page.
Any sort of online environment that allows social interaction, including blogs, social news sites like Digg and Reddit, social networking sites like MySpace and Facebook, and others.
A Web site where people can meet and interact with one another.
A site where news stories and articles from anywhere on the Web can be voted on by users, and the importance of a story or article is determined by the audience rather than by the editors of the site or source. Examples include Digg, StumbleUpon, and Reddit.
The plain HTML code used to create a Web page.
Any tactic or Web page that is used to deceive a search engine into a false understanding of what the whole Web site is about or its importance.
A small program that search engines use to read and rank Web sites. Spiders are also referred to as robots, bots, or crawlers.
A situation where a spider gets caught in an endless loop and is forced to abandon the Web page because it has no other alternative.
Using multiple forms of a word (such as customize instead of customization).
A measure of how long a user stays on a Web page.
Very common words such as the, a, to, if, who, and so forth, which serve to connect ideas in search phrases but don't add much in the way of meaning to the phrases’ content. Stop words are ignored by search engines.
A dependent domain set up within a primary domain. For example, in http://events.classiccarcustomization.com/, events is the subdomain, classiccarcustomization is the domain, and .com is the top-level domain.
Web traffic that is interested in a site’s product or service, and is thus more likely to provide conversions to the site.
The (usually attractive and styled) links found at the top of a Web page, such as the Home Page, About Us, Contact Us, and specific category links; these top links are also referred to as global navigation.
The root of a Web site’s URL.
The number of visitors a Web site receives.
The act of deliberately being rude and offensive just to make people angry on blogs and other Web forums; trolling generally gets a user banned from the blog or site.
UGC. An acronym for User Generated Content.
The percentage of time a Web site is up and running, not including scheduled maintenance periods.
URL. An acronym for Uniform Resource Locator.
Information describing whether a visitor to a Web site is a person or a search engine robot.
A process that identifies spiders coming to a Web site, enabling the Webmaster to keep out bad spiders.
An easy-to-remember Web address used to market a specific product, person, or service.
Search engines that restrict their search by industry, geographic area, or file type.
An organizational scheme in a Web site that connects related pages using links instead of moving related Web pages into new directories.
Following HTML standards set by the World Wide Web Consortium (W3C).
The study of visitor activity on an individual Web site.
The complete records of a user’s previous Google searches and the Web sites they’ve visited or bookmarked.
The measurement of what's happening on the Internet, such as the number and types of people online; the number of broadband versus dial-up connections, advertisers, and advertisements (shapes, sizes, level of annoyance); and all things related to the Internet as a whole.
A collection of Web sites from around the Internet that join together through interlinking in a circular structure.
The combination of software and hardware that runs a Web site. A Web server receives each user request and serves back the requested pages to the user’s browser.
A list of Web sites or agents (such as spiders) that are approved to enter a Web site.
A piece of HTML code that can be embedded in a Web page and that a user can interact with.
A type of information site whose content is entirely user-generated, such as Wikipedia.
An international consortium where member organizations, a full-time staff, and the public work together to develop Web standards.
XML. An acronym for eXtensible Markup Language.
A document designed specifically to be readable by a search engine.
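The keyword density definition above (the number of times a keyword or keyword phrase appears on a page, compared to the total number of words on the page) can be sketched as a simple ratio. The function name `keyword_density`, the word-splitting rule, and the sample sentence are assumptions made for illustration. Keyword tools differ in how they count multiword phrases, so treat the result as one reasonable reading of the definition rather than a standard formula.

```python
import re


def keyword_density(text, keyword):
    """Return occurrences of `keyword` divided by total words, as a percentage."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = keyword.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the page and count exact phrase matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits / len(words)


sample = "Classic car parts for every classic car owner"
print(keyword_density(sample, "classic car"))  # 25.0
```

Here the phrase appears twice among eight words, which gives 25 percent.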