Inefficient Search Bots

Caught a story in a feed that basically asks whether Google is indexing itself. In this example, Google turns 3,000 pages into more than 11,000, and the thread mentions another site where the index has grown more than tenfold. I've mentioned Google eating this domain before, so I've just checked. The site backup I have weighs in at around 108 MB. The bandwidth Googlebot has consumed this month is 2.77 GB. In the same units, I make that 0.1 against 2.77, which is quite a difference. (The Inktomi Slurp bot came second with 1.69 GB.)
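A quick check of that arithmetic, as a rough sketch: it assumes decimal units (1 GB = 1000 MB) and treats the 108 MB backup as a stand-in for what a crawler actually downloads, which is only an approximation. The function name is made up for illustration.

```python
# Figures from the post; decimal units (1 GB = 1000 MB) are an assumption.
SITE_BACKUP_MB = 108
GOOGLEBOT_GB = 2.77
SLURP_GB = 1.69

def times_site_size(transfer_gb, site_mb):
    """Hypothetical helper: how many copies of the site a month's transfer amounts to."""
    return transfer_gb / (site_mb / 1000)

google_ratio = times_site_size(GOOGLEBOT_GB, SITE_BACKUP_MB)
slurp_ratio = times_site_size(SLURP_GB, SITE_BACKUP_MB)
print(f"Googlebot: {google_ratio:.1f}x the site")  # Googlebot: 25.6x the site
print(f"Slurp: {slurp_ratio:.1f}x the site")       # Slurp: 15.6x the site
```

In other words, Googlebot pulled down something like 25 copies of the site in a month.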
But why? Those numbers show that, whatever else Google may be, it is wildly inefficient. Before I get accused of picking on Google, I'm not... sort of. Several engines appear in the stats, and together they account for around 6.5 GB of data transfer. If Google is seen as the target to beat, and it consumes 2.77 GB, what happens if the others all increase their bandwidth habit to gain ground? This is just a blog, nothing special and nothing peculiar to mark it out, yet roughly once every 30-36 hours Googlebot re-indexes every single page. I've nothing against that (I have no bandwidth worries), but it seems so inefficient, and if MSN and Yahoo merely matched Google, that would come to around 10 GB a month.

In their race to gather, they are going to end up costing people money. Ranking higher in results does not matter if a site carries no ads, and the very people this could affect are those who have chosen smaller hosting packages because they see what they are doing as being of low interest, which it is until the bots come into play. Or do they index according to some other algorithm?
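The re-indexing interval roughly accounts for the observed figure. The sketch below assumes a 30-day month and, again, that one full crawl transfers about as much as the 108 MB backup; both are assumptions, not measurements. It also multiplies Googlebot's figure by three for the "if MSN and Yahoo matched Google" case, ignoring the smaller bots.

```python
# Inputs from the post; the 30-day month and "one crawl ~= backup size"
# are assumptions made for this estimate.
SITE_MB = 108
GOOGLEBOT_GB = 2.77
HOURS_PER_MONTH = 30 * 24  # assumed 30-day month

def monthly_transfer_gb(site_mb, recrawl_hours):
    """Estimated GB per month if the whole site is re-fetched every recrawl_hours."""
    passes = HOURS_PER_MONTH / recrawl_hours
    return passes * site_mb / 1000

low = monthly_transfer_gb(SITE_MB, 36)   # slower end of the 30-36 hour cycle: 20 passes
high = monthly_transfer_gb(SITE_MB, 30)  # faster end: 24 passes
print(f"One bot, full recrawl every 30-36 h: {low:.2f}-{high:.2f} GB/month")
# One bot, full recrawl every 30-36 h: 2.16-2.59 GB/month

# Google, Yahoo and MSN each transferring as much as Googlebot does now.
three_bots = 3 * GOOGLEBOT_GB
print(f"Three bots at Googlebot's rate: {three_bots:.2f} GB/month")
# Three bots at Googlebot's rate: 8.31 GB/month
```

A full pass every day and a half lands in the same ballpark as the 2.77 GB actually logged, and three bots at that rate already approach the 10 GB mark before the smaller crawlers are counted.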

One thought on “Inefficient Search Bots”

  1. You can't really control Googlebot (which is a shame), and if you complain to Google that it's hammering your site, it'll most likely never return. The others, though, respect the Crawl-delay directive in robots.txt (which is good). I had to calm down msnbot, and I think it works for Slurp and Inktomi too (I banned Slurp because it was worse than Googlebot for me... and I was in a bad mood).

    User-agent: thebotname
    Crawl-delay: 3600

    The crawl delay is in seconds (if I remember right).
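As a sketch of how a crawler might read that rule, Python's standard-library `urllib.robotparser` parses `Crawl-delay` as a number of seconds. "thebotname" is the commenter's placeholder, and "SomeOtherBot" is a hypothetical crawler with no matching entry. Individual search engines may interpret the directive differently, and as the comment notes, Googlebot can't be throttled this way.

```python
from urllib.robotparser import RobotFileParser

# The rule from the comment; "thebotname" stands in for a real crawler name.
ROBOTS_TXT = """\
User-agent: thebotname
Crawl-delay: 3600
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("thebotname")    # seconds to wait between requests
other = parser.crawl_delay("SomeOtherBot")  # no matching entry, so None
max_fetches_per_day = 24 * 60 * 60 // delay  # one request per hour at most

print(delay, other, max_fetches_per_day)  # 3600 None 24
```

A delay of 3600 seconds caps a well-behaved bot at 24 page fetches a day, which is very gentle; a smaller value would let it get round a whole blog faster while still spreading the load.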
