
The Importance Of robots.txt

Control Search Engine Spiders

Originally Published: January 14, 2003

Editor's Note: The robots.txt file lets you control search engine spiders on a per-page basis. To control spiders for individual links, use the rel="nofollow" anchor attribute (introduced 1/19/05).

What is robots.txt?

http://www.yourwebsite.com/robots.txt
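
As the URL above suggests, spiders look for the file at the root of the host. The following is a minimal Python sketch of how a spider might derive that location from any page address. The function name `robots_url` and the example page address are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on the example host used in this article.
print(robots_url("http://www.yourwebsite.com/support/index.html"))
```

Because only the scheme and host are kept, a robots.txt file placed in a subdirectory would not be found by a spider working this way.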

How do I create a robots.txt file?

(Editor's Note: We have created the Simple robots.txt Creator to make creating robots.txt files easy.)
User-agent: googlebot
Disallow: /cgi-bin/
User-agent: googlebot
Disallow: /support
User-agent: *
Disallow: /cgi-bin/

Where can I find user agent names?

Things you should avoid in robots.txt

  1. Don't use comments in the robots.txt file

    Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.

    "Disallow: /support # Don't index the support directory" might be misinterpreted as "Disallow: /support#Don't index the support directory".


  2. Don't use white space at the beginning of a line. For example, don't write

        User-agent: *
        Disallow: /support

    but

    User-agent: *
    Disallow: /support


  3. Don't change the order of the commands. For your robots.txt file to work, the User-agent line must come before the Disallow lines that apply to it. Don't write

    Disallow: /support
    User-agent: *

    but

    User-agent: *
    Disallow: /support


  4. Don't use more than one directory in a Disallow line. Do not use the following

    User-agent: *
    Disallow: /support /cgi-bin/ /images/

    Search engine spiders cannot understand that format. The correct syntax for this is

    User-agent: *
    Disallow: /support
    Disallow: /cgi-bin/
    Disallow: /images/


  5. Be sure to use the right case. The file names on your server are case sensitive. If the name of your directory is "Support", don't write "support" in the robots.txt file.


  6. Don't list all files. If you want a search engine spider to ignore all files in a particular directory, you don't have to list each file. For example:

    User-agent: *
    Disallow: /support/orders.html
    Disallow: /support/technical.html
    Disallow: /support/helpdesk.html
    Disallow: /support/index.html

    You can replace this with

    User-agent: *
    Disallow: /support


  7. Don't rely on an "Allow" command

    The original robots.txt standard does not define an "Allow" command, and not every spider understands it. Only mention files and directories that you don't want to be indexed. All other files can be indexed automatically if they are linked on your site.
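
Several of the points above can be checked mechanically. The following Python sketch flags comments, leading white space, a Disallow line that appears before any User-agent line, several paths on one Disallow line, and "Allow" lines. The function name `lint_robots` and its warning texts are hypothetical. It assumes, as in the original robots.txt standard, that a blank line ends a record. It does not check letter case (point 5) or redundant file lists (point 6), since both depend on what is actually on the server.

```python
def lint_robots(text: str) -> list[str]:
    """Return warnings for the pitfalls listed above (hypothetical helper)."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # Assumption: a blank line ends a record, so the next
            # record needs its own User-agent line.
            seen_user_agent = False
            continue
        if "#" in raw:
            warnings.append(f"line {number}: contains a comment")
        if raw[0] in " \t":
            warnings.append(f"line {number}: starts with white space")
        # Ignore the comment part before looking at the command itself.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                warnings.append(f"line {number}: Disallow before User-agent")
            if len(value.split()) > 1:
                warnings.append(f"line {number}: more than one path in Disallow")
        elif field == "allow":
            warnings.append(f"line {number}: Allow is not understood by every spider")
    return warnings


# A deliberately broken sample that mixes up the order and packs two paths
# into one Disallow line.
sample = "Disallow: /support /cgi-bin/\n  User-agent: *\n"
for warning in lint_robots(sample):
    print(warning)
```

A file that follows the advice above, such as a "User-agent: *" line followed by "Disallow: /support", produces no warnings.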

Tips and tricks for robots.txt:

1. How to allow all search engine spiders to index all files

2. How to prevent all spiders from indexing any file

User-agent: *
Disallow: /

3. Where to find more complex robots.txt examples.
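
The first two tips can be tried out locally with Python's standard `urllib.robotparser` module. This is only a sketch: the helper name `allowed`, the spider name `examplebot`, and the test path are hypothetical. It also assumes the rule from the original robots.txt standard that a Disallow line with an empty value permits everything.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "examplebot") -> bool:
    """Report whether the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Tip 1: an empty Disallow value excludes nothing (assumption stated above).
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Tip 2: a single slash excludes the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(allowed(ALLOW_ALL, "/support/index.html"))
print(allowed(BLOCK_ALL, "/support/index.html"))
```

Python's parser is only one implementation, so a real search engine spider may interpret edge cases differently.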

Editor's Note: Use our Simple robots.txt Creator to generate robots files and our robots.txt tester to check your robots file.

Copyright by Axandra GmbH, publishers of SEOProfiler, a complete SEO software solution.


All product names, copyrights and trademarks mentioned in this newsletter are owned by their respective trademark and copyright holders.


© 2024 Search Engine Promotion Help