Why You Should Validate Your HTML
Originally Published: February, 2004
Editor's Note: For more information about indexing and spidering problems see site indexing problems and why search engine spiders miss pages. Consider hiring a professional webmaster to make your site HTML compliant.
If you're a regular reader of the MarketPosition newsletter, then you already know about the potential pitfalls of using frames, Flash, and dynamically generated pages on your site. Search engine spiders struggle with these technologies, so using them heavily can limit your success in the search engines.
Well, here's another item to add to that list: invalid HTML code. Bad HTML can hurt your site in the search engines without you ever realizing it.
What exactly is HTML validation? It's the process of checking the syntax of your HTML code to find places where you've violated the rules of the language. The official rules for writing HTML are defined by the World Wide Web Consortium (W3C). Those rules include strict definitions stating which HTML tags are legitimate parts of the language, and how you should structure your HTML documents.
HTML errors that violate these rules include things like badly nested tags (where you incorrectly close one element before another), content model violations (where you nest tags that aren't allowed inside one another), and badly formed tables.
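To make "badly nested tags" concrete, here is a minimal sketch in Python that uses the standard library's `html.parser` module to flag closing tags that arrive out of order. The names `NestingChecker` and `check_nesting` are made up for this illustration, and the check is far cruder than a real validator: it knows nothing about content models, and it will complain about legal shortcuts such as an unclosed `<p>`.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never put on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records closing tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of element names that are currently open
        self.errors = []     # human-readable descriptions of each problem

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line, _ = self.getpos()
        if tag not in self.open_tags:
            self.errors.append(f"line {line}: </{tag}> has no open <{tag}> to close")
            return
        if self.open_tags[-1] != tag:
            # The innermost open element is a different one: bad nesting.
            self.errors.append(
                f"line {line}: </{tag}> closed before <{self.open_tags[-1]}>")
        # Recover by discarding everything down to and including the match.
        while self.open_tags.pop() != tag:
            pass


def check_nesting(html):
    """Return a list of nesting problems found in the given HTML string."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


# The <b> element is closed while <i> is still open inside it.
for problem in check_nesting("<p><b>bold <i>both</b> italic</i></p>"):
    print(problem)
```

Swap the closing tags into the right order (`</i></b>`) and the function returns an empty list.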
Sound confusing? Don't worry - many HTML editors include a built-in validator that will check your page and point out this sort of error. In addition, online services like the WDG HTML Validator and the W3C's own validator offer free page validation.
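If you would rather script the check than paste pages into a form, the W3C's current checker also accepts pages over HTTP. The sketch below assumes the public endpoint `https://validator.w3.org/nu/?out=json`, which takes the page as the POST body and answers with a JSON report. The function names and the `User-Agent` string are invented for this example, and you should read the service's usage notes before sending it any volume of pages.

```python
import json
import urllib.request

# Assumption: the public W3C checker endpoint, asked for JSON output.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_request(html):
    """Build (but do not send) a POST request carrying the page to check."""
    return urllib.request.Request(
        CHECKER_URL,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            "User-Agent": "html-check-sketch/0.1",  # made-up identifier
        },
        method="POST",
    )


def summarize(report):
    """Reduce the checker's JSON report to 'line N: message' strings for errors."""
    return [
        f"line {message.get('lastLine', '?')}: {message['message']}"
        for message in report.get("messages", [])
        if message.get("type") == "error"
    ]


def validate(html):
    """Send the page to the checker and return its error messages (needs network)."""
    with urllib.request.urlopen(build_request(html), timeout=30) as response:
        return summarize(json.load(response))
```

Calling `validate(open("index.html").read())` would then return one string per reported error, or an empty list for a clean page.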
So what exactly is the impact of this sort of error? It depends upon who's reading the page. Errors may have no impact at all in your browser, or they could cause the text to appear in the wrong place or in the wrong size on the page. At their worst, HTML errors can keep sections of your Web page from displaying.
To be honest, validation is about as much fun as a trip to the dentist. Sometimes it feels like you're getting a root canal. The first time you validate your page, you could see dozens of errors. That's especially true if you coded your pages by hand, which tends to result in more errors. Even FrontPage and other WYSIWYG editors don't always produce code that validates cleanly. There's some assurance in the idea that a search engine would try to be compatible with the common HTML errors created by popular editors like FrontPage, but it's still not a sure bet.
What makes matters worse is that it's hard to see the value of fixing all of these problems when your page displays just fine under Internet Explorer. Indeed, the reason the W3C stresses validation is that following the official rules of the HTML language makes it easier for browsers to interpret your page correctly. If the latest browsers do that already, why bother with HTML validation?
The reason is simple: search engine spiders also need to interpret your HTML. And while the Microsoft and Netscape browsers are very forgiving of your HTML errors, search engine spiders aren't nearly as kind. It helps to think of a search engine spider as a web browser - just like a browser, the spider needs to interpret your page and figure out what you're saying. Only then can it properly index your page. Search engine spiders also care about the structure of your Web page because they give extra weight to keywords placed inside certain HTML tags.
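Here is a rough illustration of why structure matters to a program that reads your page. The sketch collects the text found inside a few tags, and which tags to use is purely an assumption for this example (`title`, `h1`, `h2`), since no search engine publishes its list. It is a toy built on Python's `html.parser`, not a model of how any real spider works.

```python
from html.parser import HTMLParser

# Assumption for this example only: the tags a reader program treats as emphasis.
WEIGHTED_TAGS = {"title", "h1", "h2"}


class WeightedTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text inside properly closed weighted tags."""

    def __init__(self):
        super().__init__()
        self.current = None  # weighted tag we are inside, if any
        self.buffer = []     # text pieces seen since that tag opened
        self.found = []      # (tag, text) pairs for tags that were closed

    def handle_starttag(self, tag, attrs):
        if tag in WEIGHTED_TAGS and self.current is None:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.found.append((tag, "".join(self.buffer).strip()))
            self.current = None


def weighted_text(html):
    """Return (tag, text) pairs for each weighted element that opens and closes."""
    collector = WeightedTextCollector()
    collector.feed(html)
    collector.close()
    return collector.found


good = "<title>Blue Widgets</title><h1>Widget Catalog</h1><p>Plain text</p>"
typo = "<title>Blue Widgets</title><h1>Widget Catalog<h1><p>Plain text</p>"
print(weighted_text(good))
print(weighted_text(typo))
```

In the second page the heading's closing tag is missing its slash. A browser would still show a heading, but this simple reader never sees the element end, so it never records the heading text at all.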
But there's a big difference between web browsers and search engine spiders: web browsers are under pressure from the marketplace to correctly display as many Web pages as possible. Any browser hoping to be taken seriously needs to understand the latest web technologies and be able to understand all those badly written pages out there on the Web. Users would quickly abandon any browser that fails to render the average Web page.
One would think that same market pressure would push the search engines to improve their spiders, making them more forgiving. After all, there's a great deal of competition in the search engine world. Yet that doesn't seem to be happening, partly because it's difficult to tell when this issue has come into play. It's surprising how far search engine spiders lag behind the major browsers.
I don't want to give you the impression that any and all HTML errors will wreck your search engine ranking. Spiders do tend to forgive many errors, such as badly nested tags. But I have direct experience with bad HTML hurting a search engine ranking. A few months ago I helped a webmaster who had lost his Top 10 ranking because of a simple typo in his HTML. One badly placed angle bracket kept Googlebot from correctly parsing the home page, causing it to fall completely out of the index. The page displayed correctly under all the major browsers, but it still caused problems for Googlebot.
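A typo like that one is easy to miss by eye, but a few lines of code can catch the most obvious cases. The sketch below is a deliberately simple heuristic, not a description of how Googlebot or any validator parses a page: it reports every `<` that runs into another `<` before reaching a `>`. The function name is made up, and the check will raise false alarms on scripts, comments, and unescaped `<` characters in ordinary text.

```python
def find_unclosed_tags(html):
    """Return (line, column) of each '<' that meets another '<' before any '>'."""
    problems = []
    tag_start = None  # position of the '<' we are currently inside, if any
    line, column = 1, 0
    for character in html:
        if character == "\n":
            line += 1
            column = 0
            continue
        column += 1
        if character == "<":
            if tag_start is not None:
                # A new tag began before the previous one was closed.
                problems.append(tag_start)
            tag_start = (line, column)
        elif character == ">":
            tag_start = None
    if tag_start is not None:
        # The page ended while a tag was still open.
        problems.append(tag_start)
    return problems


# The link's opening tag is missing its closing angle bracket.
print(find_unclosed_tags('<a href="/widgets"<b>Widgets</b></a>'))
```

Each reported position points at the start of the tag that was never closed, which is usually where the typo is.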
So validating your pages is a wise precaution, particularly if you write the HTML code by hand. Clean, well-written HTML is important if you want to ensure a good search engine ranking. It also helps guarantee that your page will be displayed properly on older or more obscure browsers that are less forgiving. Therefore, you have two compelling reasons to validate your HTML today.
Christine Churchill is President of KeyRelevance.com, a full-service search engine marketing firm. She is also on the Board of Directors of the Search Engine Marketing Professional Organization (SEMPO) and serves as co-chair of the SEMPO Technical Committee.
This article is copyrighted and has been reprinted with permission from FirstPlace Software.