December 7, 2006
Notes From the Meet the Crawlers Session
I’m currently sitting in a session at Search Engine Strategies called “Meet the Crawlers.” The panel is made up of engineers from the big engines: Google, Yahoo, MSN and Ask.
I’ve always loved this session because the information doesn’t get any better than this. If they say something, it’s pretty much gospel.
I thought I’d bullet some of the more interesting comments they have made so far:
- Google, MSN and Yahoo are adhering to a standard sitemap format, but Ask currently is not.
- Having multiple variations of a URL (usually because of session IDs) does not hurt you. The engine simply picks one.
- When rebuilding a site on the same domain, 301 redirect the old URLs to the new URLs, then create a new sitemap and submit it. You can possibly also submit the old sitemap so the engines detect the 301 redirects.
- If people are putting content on your site (such as comments) that contains links, mark those links with rel="nofollow".
- The “site:” search is much more accurate in Yahoo than in other engines (according to Yahoo). So if you are seeing big variations in the number of pages between Google and Yahoo, keep in mind that those are just estimates.
- Sitemaps are used as signals to augment the engines’ listings. A sitemap is a “hint,” but it’s not the be-all and end-all.
- Crawlers used to have an issue with crawl depth, so they didn’t go many levels deep in sub-folders. That is no longer the case: “reasonably deep” levels should not be an issue. If the structure makes sense for a user, then it should be fine. Usability is a good proxy for this.
- In Google, don’t worry about having listings in the “supplemental results”. It just takes time. Supplemental is not a penalty and you will get traffic from supplemental results.
- The supplemental results are an extra layer of the Google index. It’s a bit larger and takes a little longer to update. It’s perfectly normal to see some of your pages within supplemental.
- Ask would not comment on whether they will implement something like Yahoo Site Explorer and Google Webmaster Central. MSN says they are taking feedback on that.
- If you have a shopping cart that serves products at both HTTP and HTTPS URLs, you should tell the engines which version to use via robots.txt. Even better, don’t have products in both versions. The engines do support indexing HTTPS URLs.
- If you have text that describes a Flash animation and “gracefully degrades” for someone who doesn’t have Flash, you won’t have much of a problem.
- Imagine the crawler as a vision-impaired user. Run your site through a reader that plays it back as sound, and look at your site through a text-only browser. This will show you very clearly what the crawlers see.
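
If you don’t have a text-only browser handy, here is a rough way to approximate that last tip. This is only a minimal sketch using Python’s standard html.parser module, and it is my own illustration, not something the panel showed. The names TextOnlyView and text_only are made up for this example, and it assumes a crawler sees roughly the visible text plus image alt text, which is a simplification of what any real engine does.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects the visible text of a page, roughly as a text-only browser would.

    Hypothetical helper for illustration; real crawlers are far more involved.
    """

    # Content inside these tags is never shown to a text-only reader.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "img":
            # Images contribute only their alt text, if they have any.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style block.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_only(html):
    """Return the text a text-only reader would get, one chunk per line."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><script>var x = 1;</script></head>"
        '<body><h1>Widgets</h1><img src="a.png" alt="Blue widget">'
        "<p>Buy now</p></body></html>"
    )
    print(text_only(page))
```

If a page comes back nearly empty from something like this (an all-Flash page or an image-only navigation bar, for example), that is a hint that the crawler is not getting much from it either.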