Google: One week in

So far, Googlebot has visited some 7139 times (earlier counts: 2600, then 3574), and it has taken 175.25 MB of data (earlier: 46.53, then 64.66).

If I search on Google for “tamba.org.uk”, I see: “Sorry, no information is available for the URL romanticrobot.net”

If I then pick the option offered below that, “Find web pages from the site romanticrobot.net”, I see: “Your search – site:romanticrobot.net – did not match any documents.”

So Google lovers, why the hell is it taking my data ?

(On 9 other search engines, a search for ‘Mark tamba2’ gives this domain as #1. So either Google is losing its touch, or I am banned. And if I am banned, why take my data ? Why crawl my site ?).

Information: The figure I just added is 175 MB. My domain right now stands at 30 MB. They have taken my whole domain nearly SIX times. Why precisely should I not consider this to be theft ?
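For anyone who wants to check the sums, here is a rough sketch using only the figures quoted in this post. The one assumption of mine is that 1 MB is counted as 1024 KB; the average per visit is not a figure from my logs, just the total divided by the visit count.

```python
# Figures quoted in the post above.
domain_size_mb = 30.0      # size of the whole domain
data_taken_mb = 175.25     # data fetched by Googlebot so far
visits = 7139              # Googlebot visits so far

# How many whole-domain copies the fetched data amounts to.
copies_of_domain = data_taken_mb / domain_size_mb

# Average data per visit, assuming 1 MB = 1024 KB (my assumption).
kb_per_visit = data_taken_mb * 1024 / visits

print(f"Copies of the domain fetched: {copies_of_domain:.2f}")
print(f"Average per visit: {kb_per_visit:.1f} KB")
```

The first number comes out a little under six, which is where “nearly SIX times” comes from.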

10 thoughts on “Google: One week in”

  1. At a guess I’d say Googlebot finds your site because it is heavily linked by other people. The crawler searches around your site and pulls what information it finds and then dumps it for submission to the engine … but it looks like you’re banned. So when it comes to submitting all the information that Googlebot found to the directory it all just gets dumped.

    I’d say ban checking isn’t part of Googlebot’s remit. So it just trawls sites for information as a drone. Only when it comes to loading up the directory does the ban check come into play and your site ends up in the proverbial bin.

    It may sound harsh, but if you’ve ended up on Google’s ban list they aren’t going to give two hoots about your feelings.

    I’ve got similar problems with Yahoo as well … but it’s probably down to an htaccess or robots issue – even though I’m sure I have it right … just can’t be bothered to become an htaccess/robots guru right now to resolve it.

  2. I actually do not give a damn if I am banned – I am perfectly aware of why this happened and I will use that header image permanently if I do not get a reasonable answer.

    What annoys me (I am being SO polite this morning) is what I see as the pointless theft of my data. I can afford the bandwidth, but it’s a principle here. What is also annoying – and this should disturb others – is that I cannot see how I can exclude Googlebot yet allow others in – they do NOT obey the standard, and they write the bot to act in a way that makes sure they get what others can. That is insidious behaviour.

  3. If Else – they have a month. What Google have done is added two urls to their database which are listed here and only here, so given that they will list a url which has only one other website pointing to it, I cannot see why my site is not listed unless it is banned.
    They tell me in an email that:

    Although our robots may visit your site, we cannot guarantee that your pages will be thoroughly crawled or indexed. For site indexing and ranking recommendations, please visit http://www.google.com/webmasters/guidelines.html

    I did, and submitted a sitemap. The domain is xhtml valid. It is linked.
    That “cannot guarantee” is their get-out clause.

    What bugs me is NOT the fact they will not list me, but that they take my data anyway. THAT is the issue here, not my PR / linkage / inclusion or whatever. If they will not list me, why crawl me ? Their bots are not stupid.
    It’s taken as a deal I guess … they crawl and display your data under what they would term “fair use” and for that, you get traffic. You “spend” bandwidth on the bot to get traffic. So if I’m “spending”, what do I get ?

  4. I have mailed Google numerous times.
    They assure me I am not banned or penalised – but they would do that.

    Google have had every possible opportunity to address my concerns – they are actively choosing not to.

    It would be a pleasure to post that I had got it wrong and I was back in G’s search results. Somehow though, this won’t happen.
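On the question raised in comment 2 of shutting out Googlebot while letting other crawlers in: under the robots exclusion standard, that is normally written as a per-agent block in robots.txt. The sketch below feeds such a file to Python's standard `urllib.robotparser` to show how a standard-following parser reads it. The robots.txt content and the agent name `SomeOtherBot` are illustrative assumptions, not taken from this site, and the check says nothing about whether any real crawler actually honours the rules, which is the complaint here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow everyone else.
# An empty "Disallow:" line means "nothing is disallowed".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
site_url = "http://romanticrobot.net/"

# What a compliant crawler would conclude for each user agent.
googlebot_allowed = parser.can_fetch("Googlebot", site_url)
other_allowed = parser.can_fetch("SomeOtherBot", site_url)  # made-up agent name

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Under these rules a compliant parser refuses Googlebot and admits the other agent. If the server logs still show Googlebot fetching pages with such a file in place, that would be the evidence worth sending to Google.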
