Google to start crawling HTML Forms

14 April, 2008 (09:09) | Google

One of the accepted truths of SEO, that search engines cannot see content sitting behind a form, fell today as Google announced that Googlebot (the software it uses to crawl and index the web) can now fill in and submit HTML forms on webpages.

In outline, Googlebot takes prominent keywords from the website the form is on and uses them to fill in the form's text fields (for dropdown lists, radio buttons and tick boxes it will presumably pick one of the available options), then submits the form. If it gets back what it considers sensible results, it indexes the resulting page and carries on with its crawl.
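To see why this is feasible, it helps to remember that submitting a GET form simply produces a URL with the field values in the query string. The Python sketch below illustrates that idea only; it is not Google's actual algorithm. The function name candidate_urls, the example.com form address, the field names and the keyword list are all made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action, text_fields, keywords, select_fields=None):
    """Build the URLs a GET form could produce for some guessed inputs.

    action        -- the form's action URL (hypothetical here)
    text_fields   -- names of free-text inputs
    keywords      -- words to try in the text inputs
    select_fields -- mapping of dropdown/radio names to their listed options
    """
    select_fields = select_fields or {}
    names = list(select_fields)
    urls = []
    for keyword in keywords:
        # Naive choice: put the same keyword in every text field.
        text_params = {name: keyword for name in text_fields}
        # Try every combination of the options the form itself offers.
        for combo in product(*(select_fields[n] for n in names)):
            query = dict(text_params)
            query.update(zip(names, combo))
            urls.append(action + "?" + urlencode(query))
    return urls


urls = candidate_urls(
    "http://example.com/search",
    text_fields=["q"],
    keywords=["blue widgets", "seo"],
    select_fields={"sort": ["price", "date"]},
)
for url in urls:
    print(url)
```

Each URL printed is an ordinary link that a crawler could fetch and judge like any other page, which is why the technique fits GET forms so naturally.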

This announcement applies only to GET forms. If you don’t want your forms submitted, simply use robots.txt to exclude the URLs they submit to.
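As a rough sketch of the robots.txt approach, the Python snippet below uses the standard library's urllib.robotparser to check a sample rule set. The /search path and the example.com URLs are assumptions for illustration; substitute whatever address your own form submits to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers away from the
# URL a search form submits to. Disallow rules match by prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="Googlebot"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


# A form result URL (excluded) and an ordinary page (still crawlable).
print(allowed("http://example.com/search?q=seo"))
print(allowed("http://example.com/about.html"))
```

Because the rule matches by prefix, it also covers any other path beginning with /search, so pick a form address that does not share a prefix with pages you do want indexed.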

Full details can be found at: http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html

Comments

Comment from zohai
Date: May 21, 2008, 12:28 am

Pretty smart, eh? But then what’s the point of crawling into forms, which most of the time are just application forms? Nevertheless, neat stuff =)

Comment from Social Picard
Date: May 21, 2008, 2:37 am

Google will always be my first search engine. But being at the top certainly puts them under pressure to stay there. I don’t know whether this move will benefit us, the users, but from what I understand of the articles given, it will cause users much more of a headache.

Comment from info
Date: May 25, 2008, 7:02 pm

Informative article. Thanks for putting it up. Google seems to be outdoing themselves each and every time. Hope it stays that way.

Comment from George
Date: June 3, 2008, 4:07 am

That is good news, especially for webmasters. I am learning SEO; I came here and found this blog very informative.

Comment from Mark
Date: June 5, 2008, 4:32 pm

Very good information. Glad I visited this blog.

Comment from Singapore SEO Consultant
Date: August 26, 2008, 5:41 pm

If, for whatever reason, you do not want search engine robots to reach certain pages on your website, simply use robots.txt. Googlebot may be getting smarter, but it must still do its crawling without affecting webmasters.
