Filed Under (Geekspeak, Website) by Justin on 2005-06-27

Do any of you use MSN as your primary search engine? The MSNbot that crawls my site is pretty much just stealing bandwidth, so I’m about to ban that bot from crawling here. For comparison purposes, allow me to present you with a few statistics from the month of June 2005.

    Exhibit A – MSNbot

  • Crawl hits = 9561
  • Bandwidth used = 124.43 MB
  • Visits to wantmoore.com as a result of searches @ MSN = 41

    Exhibit B – Googlebot

  • Crawl hits = 3415
  • Bandwidth used = 51.74 MB
  • Visits to wantmoore.com as a result of searches @ Google = 683*

The cost/benefit just isn’t enough for me. Not that I’m starving for bandwidth, and not because it’s causing performance issues (Go Mindstorm Hosting! http://mindstormhosting.com), but it just annoys me. And since it’s my site, I can do what I want, right?
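For anyone who wants the cost/benefit spelled out as a ratio, here is a rough Python sketch. It only divides the June 2005 figures listed above, pairing each bot with the visits that came from its own search engine (41 from MSN searches, 683 from Google searches). The dictionary layout and the function name `cost_per_visit` are made up for illustration.

```python
# Figures copied from the June 2005 stats in the post; nothing is measured anew.
# Each bot is paired with visits referred by its own search engine.
stats = {
    "MSNbot": {"crawl_hits": 9561, "bandwidth_mb": 124.43, "search_visits": 41},
    "Googlebot": {"crawl_hits": 3415, "bandwidth_mb": 51.74, "search_visits": 683},
}


def cost_per_visit(bot):
    """Return (MB of crawl bandwidth, crawl hits) spent per referred visit."""
    s = stats[bot]
    mb_per_visit = s["bandwidth_mb"] / s["search_visits"]
    hits_per_visit = s["crawl_hits"] / s["search_visits"]
    return mb_per_visit, hits_per_visit


for bot in stats:
    mb, hits = cost_per_visit(bot)
    print(f"{bot}: {mb:.2f} MB and {hits:.1f} crawl hits per visit")
```

By this measure, MSNbot comes out at roughly 3.03 MB and 233.2 crawl hits per referred visit, against about 0.08 MB and 5.0 crawl hits for Googlebot.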

So, effective immediately:

User-agent: MSNBot
Disallow: /
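
A quick way to sanity-check a rule like this before deploying it is to feed the lines to the robots.txt parser in Python's standard library. This is only a sketch: the page URL is a hypothetical example, and the user-agent names passed to `can_fetch` are short tokens rather than the full strings the crawlers send.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the post, fed to the parser directly.
rules = [
    "User-agent: MSNBot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical page URL, used only for illustration.
url = "http://wantmoore.com/index.html"

# The parser lowercases agent names before comparing, so "msnbot" matches "MSNBot".
msn_allowed = parser.can_fetch("msnbot", url)
google_allowed = parser.can_fetch("Googlebot", url)

print(msn_allowed, google_allowed)  # False True
```

Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.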

If you notice any errors anywhere or your access gets blocked for some reason (I’m blocking some other bots and things as well), please email me and let me know.

* Includes both standard web searches as well as Google Images searches.

(9) Comments   


Steve on 27 June, 2005 at 1:42 pm #

I seem to be seeing something completely different from you:

(#1) Googlebot – 17166 hits – 334 MB – 2137 visitors (plus 167 from Images)
(#5) MSNBot – 1546 hits – 33.74 MB – 70 visitors

Very bizarre, but MSNBot doesn’t seem to be abusing me half as badly as it’s hit you, but Googlebot sure is hitting my website a damn large number of times!!

Syntax on 27 June, 2005 at 10:59 pm #

Ahh, the mysterious ways of the search engine!

[…] As part of the Bigdaddy infrastructure switchover, Google has been working on frameworks for smarter crawling, improved canonicalization, and better indexing. On the smarter crawling front, one of the things we’ve been working on is bandwidth reduction. For example, the pre-Bigdaddy webcrawl Googlebot with user-agent “Googlebot/2.1 (+http://www.google.com/bot.html)” would sometimes allow gzipped encoding. The newer Bigdaddy Googlebots with user-agent “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” are much more likely to support gzip encoding. That reduces Googlebot’s bandwidth usage for site owners and webmasters. From my conversations with the crawl/index team, it sounds like there’s a lot of head-room for webmasters to reduce their bandwidth by turning on gzip encoding. […]

WebMetricsGuru on 23 April, 2006 at 8:30 pm #

Google’s Crawl Caching Proxy…

Matt Cutts went over BigDaddy in several Webmasterworld Sessions last week and just did a post on his blog that sums that up.  "When you surf around the web, you fetch pages via your ISP. Some ISPs cache web pages……

the life of justin moore » five minutes of fame on 23 April, 2006 at 9:11 pm #

[…] Matt Cutts, famous Google employee, has linked to an old post of mine related to search engine crawling and bandwidth usage. Welcome to all the visitors stopping by. […]

ffl on 10 November, 2006 at 5:51 pm #

I get hit a lot by the MSN bots, but usually they are the first to index and bring in results. Lately it has been yahoo though.

Sometimes yahoo bots piss me off. They crawl hundreds of megs and I end up with about 50 visitors 🙁

Mattg on 10 January, 2007 at 6:19 am #

Strange to know about MSNbot crawler. I will check this.

forum overzicht on 2 February, 2007 at 2:02 pm #

Same goes for me: Googlebot visits more often. MSN has more links indexed too, compared to the Google index, so it’s kind of strange.

Prashant on 19 February, 2007 at 4:59 am #

This is really a common problem with MSNbot: it eats up bandwidth while crawling. I have also banned it. Use Google instead.
