Can EE generate excessive Googlebot traffic?
MikeCaine
2nd of December 2006 (Sat), 04:39
I seem to be suffering from excessive googlebot traffic and I'm wondering if it's possible that EE might be the cause? I had a similar problem in May when an upgrade to a genealogy script caused googlebot to seemingly get stuck in a section of the site and I had to apply a patch update to sort it. I've now noticed another rise in googlebot traffic and I'm wondering what the cause is and if anyone else here has noticed a similar rise recently?
Would having the additional sitemap generate a lot more googlebot traffic?
My googlebot stats are -
May 65,575 hits / 2.08GB bandwidth (genealogy script problem)
June 23,101 hits / 117MB
July 11,052 hits / 56.55MB
August 5,487 hits / 25.39MB
September 8,750 hits / 53.48MB
October 52,833 hits / 3.04GB
November 74,429 hits / 3.86GB
Nothing else has changed on my site for a while, except for the EE upgrade and about 20 new photos added.
The other search bots are nowhere near as active on the site as googlebot.
Pekka
2nd of December 2006 (Sat), 06:03
I would be happy to see Google indexing the site thoroughly! The EE 2 upgrade will make Google reindex your site because the source code has changed completely.
You can always add robots.txt to your site if you wish to restrict them.
MikeCaine
2nd of December 2006 (Sat), 07:20
So it should die down again soon?
I only have 490 photos in EE at the moment and there seems to be a huge jump in googlebot traffic and bandwidth used. I had a quota of 5GB for that site and googlebot has chomped its way through 3.86GB of it last month. I'm reluctant to restrict googlebot if the crawling is necessary, but I was wondering if it was getting stuck in some sort of recursive links type thing?
Pekka
2nd of December 2006 (Sat), 11:11
It sees what you see. Recursive loops are not happening because the structure arrays are prebuilt and checked for that (actually, if you did get an endless recursive loop, you would get a PHP error page or a page-not-found error).
Maybe you could say in robots.txt that you do not allow image downloads; that would cut the bandwidth used a lot. Google would still crawl the content.
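For anyone wanting to try this, here is a minimal sketch of such a rule, checked with Python's standard urllib.robotparser. The /photos/ directory and the example.com host are placeholders only, not the real EE directory layout; Googlebot-Image is the user agent Google uses for its image crawler. Whether this lowers the figure a stats package reports under "Googlebot" depends on how that package groups user agents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /photos/ stands in for wherever the images live.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/
"""


def build_parser(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
image_url = "http://example.com/photos/0001.jpg"  # placeholder URL
page_url = "http://example.com/index.php"         # placeholder URL

# Show which crawler may fetch which URL under these rules.
for agent in ("Googlebot-Image", "Googlebot"):
    for url in (image_url, page_url):
        print(agent, url, parser.can_fetch(agent, url))
```

With these rules the image crawler is kept out of the placeholder image directory, while the regular Googlebot can still fetch the pages.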
wkitty42
2nd of December 2006 (Sat), 12:58
i'll also add that search engine spiders crawling your site is not "necessary"... at least not if you don't want them to index you and/or have your site(s) in their search engines...
you might also want to add a line to robots.txt, like pekka mentions, for googlebot to not download graphic files from your exhibition directories... however, if you do want google images and google video to access that info, then you may not want to restrict them...
this goes for not only google but also msn, ask.com, and all the others out there that perform these tasks... remember, too, that google has at least two user agents... one for the regular content and one for the graphic stuff... there may be a third for the video stuff but i'm not sure about that...
one place to check on some of this stuff would be over at webmasterworld (http://www.webmasterworld.com/home.htm)... i've had an account there for years and the assistance has been fantastic when i needed it... they even used to post notices when google used to do what was known as the "google dance" where the spiders came out and you could watch the search indexes change for your phrases and they'd finally settle down to whatever the new configuration was... you could even see the new indexes travel around the world to the google centers in other countries... it was fun times, back then, watching the google spiders and indexes dance like that :)
MikeCaine
3rd of December 2006 (Sun), 08:25
I hadn't considered that google would be downloading the images all over again. Most of my traffic comes via google so I don't want to cut it out altogether, I just wish it didn't consume so much bandwidth - it's well ahead of all the others in hits and bandwidth. E.g. for November
Googlebot 74,429 hits 3.86 GB
Inktomi Slurp 3,526 hits 16.41 MB
MSNBot 3,012 hits 38.84 MB
I'll check out webmasterworld, it looks interesting
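One rough way to read the November figures above is to work out the average transfer per hit for each bot. The sketch below does only that arithmetic; it assumes the stats package reports binary units (1 MB = 1024 KB), which the thread does not confirm.

```python
# November figures as quoted in the post: (hits, amount, unit).
NOVEMBER = {
    "Googlebot": (74_429, 3.86, "GB"),
    "Inktomi Slurp": (3_526, 16.41, "MB"),
    "MSNBot": (3_012, 38.84, "MB"),
}

# Assumption: binary units, i.e. 1 MB = 1024 KB and 1 GB = 1024 MB.
KB_PER_UNIT = {"MB": 1024, "GB": 1024 * 1024}


def kb_per_hit(hits, amount, unit):
    """Average kilobytes transferred per request."""
    return amount * KB_PER_UNIT[unit] / hits


for bot, (hits, amount, unit) in NOVEMBER.items():
    print(f"{bot}: {kb_per_hit(hits, amount, unit):.1f} KB per hit")
```

On these assumptions Googlebot averages roughly 54 KB per hit, against roughly 5 KB for Slurp and 13 KB for MSNBot, which is consistent with it fetching image files rather than only HTML pages.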
wkitty42
4th of December 2006 (Mon), 20:29
many search engines download all content over and over and over again... it is their way of "ensuring" that they get everything right... i had a couple that used to download archive files (ie: zip, arj, arc) a lot more than they needed to in order to determine the file type and other info... many archives contain their signatures within the first 100 bytes... after talking with them, they modified their procedures so that they don't suck so much of my (and everyone else's) bandwidth when it comes to archive files...
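To illustrate the point about signatures, here is a small sketch that identifies an archive from its leading bytes alone. The sniff_archive helper and its signature table are made up for this example and cover only ZIP and ARJ; this is not how any particular search engine does it.

```python
import io
import zipfile

# Well-known leading signatures; only the first few bytes are needed.
SIGNATURES = {
    b"PK\x03\x04": "zip",  # local file header of a non-empty ZIP
    b"\x60\xea": "arj",    # ARJ header id
}


def sniff_archive(head):
    """Return an archive type for the first bytes of a file, or None."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


# Build a small zip in memory and look at just its first 100 bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "hello")
head = buffer.getvalue()[:100]

print(sniff_archive(head))
```

A crawler that only wants the file type could ask for just those bytes with an HTTP Range request (for example "Range: bytes=0-99"), assuming the server honours range requests.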