Robots and robots.txt


Oceanwatcher
8th of August 2005 (Mon), 14:13
On one hand it is nice to be findable through search engines; on the other hand, you do not want anyone else to just "steal" your images and put them in their own system.

So how do we balance this? What should we let the robots index? I searched for robots.txt, but I think denying everything is a bit of overkill.

I am looking for good suggestions about which directories we should allow the robots to index. Or rather, in robots.txt lingo: what NOT to disallow...

Suggestions, anyone?

Oceanwatcher
12th of August 2005 (Fri), 22:16
Don't tell me that nobody has given this a thought?

MikeCaine
13th of August 2005 (Sat), 04:38
I think you need to ask why you are putting photos on the net in the first place.

If it's for anyone to view, then you will have to let search engines find them. You'll also have to accept that anyone viewing a photo can "steal" it, no matter what you do to stop it. All you can do is spoil or deface the photo with a watermark or a copyright statement through the middle of it.

If it's just for selected clients, then use password access for the photos.

You have your URL in your sig, so I assume you want anyone to view the site?

Oceanwatcher
13th of August 2005 (Sat), 08:23
As far as I understand EE so far, the password protection is not worth much, since anyone who knows the picture paths and file names can view the pictures.

But I would definitely let search engines hit my front page and index it. In general it would be nice if they indexed everything in the database (the text) except the images themselves.

So I set robots.txt to exclude the image folder. But there are probably other folders that search engines need not bother with. Which are they? Maybe Pekka could tell us which parts of the EE installation are irrelevant to a search engine and which are not?
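A quick way to check that such a robots.txt blocks only the image folder is Python's standard `urllib.robotparser` module. This is a minimal sketch, assuming a hypothetical layout in which EE is installed under `/ee/` and the images live under `/ee/photos/`; substitute your own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: EE lives under /ee/ and the image files under /ee/photos/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ee/photos/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def report(parser: RobotFileParser, paths, agent: str = "Googlebot") -> dict:
    """Map each path to True (crawlable) or False (excluded) for the given agent."""
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    for path, allowed in report(rules, ["/", "/ee/index.php", "/ee/photos/sunset.jpg"]).items():
        print(f"{path:25} {'crawlable' if allowed else 'excluded'}")
```

Keep in mind that robots.txt is a request that well-behaved crawlers honor, not access control: anyone who knows an image URL can still download it.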

Pekka, I think it would be a good idea to have a sitemap in EE, because it makes it easy for search engines to find what they need.
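For reference, a sitemap in the sitemaps.org XML format is essentially a list of `<url>` entries, each with a `<loc>` element holding an absolute URL. A minimal sketch that builds one with the standard library; the page URLs are hypothetical placeholders, not EE's real page names, and a real generator would read them from EE's database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemaps.org-style XML document listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text automatically.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page URLs for illustration only.
    pages = [
        "http://www.example.com/",
        "http://www.example.com/ee/index.php?gallery=1&page=2",
    ]
    print(build_sitemap(pages))
```

To write a file that also carries an XML declaration, the `urlset` element can be wrapped in `ET.ElementTree(...)` and saved with `.write("sitemap.xml", encoding="utf-8", xml_declaration=True)`.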

Yes, I would like people to see my pics. And I am desperately waiting for the next version of EE, which is said to have a better security system, so I can put personal pics online for my family (and eventually clients) to see without any fear of exposing them to the world.

But for now, I just want to come up with a generic robots.txt for EE that includes the folders needed for the site to be indexed correctly and excludes the ones that are not needed.
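One way to keep such a file maintainable is to generate it from a list of directories. A minimal sketch; the EE directory names below are hypothetical stand-ins, since which folders an installation contains depends on where and how it was set up.

```python
def make_robots_txt(excluded_dirs, user_agent: str = "*") -> str:
    """Build a robots.txt record that disallows each directory for the given agent."""
    lines = [f"User-agent: {user_agent}"]
    for directory in excluded_dirs:
        name = directory.strip("/")
        if not name:
            continue  # skip "/" or "": disallowing "/" would block the whole site
        # The trailing slash limits the rule to the directory itself:
        # "Disallow: /ee/admin" would also match "/ee/admin_notes.html".
        lines.append(f"Disallow: /{name}/")
    if len(lines) == 1:
        lines.append("Disallow:")  # an empty Disallow value excludes nothing
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical EE folders; replace with the ones in your own installation.
    print(make_robots_txt(["ee/admin", "/ee/includes/", "ee/photos"]), end="")
```

With the hypothetical list above this prints a `User-agent: *` line followed by `Disallow:` lines for `/ee/admin/`, `/ee/includes/` and `/ee/photos/`.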

ArtM
14th of August 2005 (Sun), 01:26
Using the include & exclude protocols from

http://www.robotstxt.org/

you should be able to narrow search engine indexing to desired directories only.


- Art
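Narrowing indexing to a few directories usually means an `Allow` line for each wanted directory followed by a blanket `Disallow: /`. Note that `Allow` was not part of the original 1994 exclusion convention; it is an extension that major crawlers support and that RFC 9309 later standardized. Crawlers also resolve conflicting rules differently: Python's `urllib.robotparser` takes the first matching line in file order, while Google and RFC 9309 use the longest (most specific) match. Listing the `Allow` lines first gives the same answer under both. A sketch, with `/gallery/` as a hypothetical directory name:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical: only /gallery/ should be crawled, everything else is off limits.
ALLOW_FIRST = """\
User-agent: *
Allow: /gallery/
Disallow: /
"""

# Same rules in the opposite order, to show that this parser is order-sensitive.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /gallery/
"""


def crawlable(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if agent may fetch path under the given robots.txt text."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ["/gallery/2005/beach.html", "/", "/ee/admin/"]:
        print(path, crawlable(ALLOW_FIRST, path), crawlable(DISALLOW_FIRST, path))
```

A longest-match crawler would allow `/gallery/` under both orderings; only the Allow-first ordering is safe for simple first-match parsers.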

Oceanwatcher
14th of August 2005 (Sun), 09:03
My problem is not how... I know all about that. The problem is what. What directories should be excluded?

What I have done so far is exclude everything except the root directory. Are there any directories that should definitely NOT be excluded?
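If "everything except the root directory" means the files in the web root stay crawlable, note that `Disallow: /` cannot express that: it is a prefix that matches every path, including the front page. Under the original convention the subdirectories have to be listed one by one. A sketch that derives that list from the web root on disk, assuming the top-level directories map directly to URL paths; the `keep` parameter is a hypothetical hook for directories you decide to leave crawlable.

```python
import sys
from pathlib import Path


def subdir_disallow_lines(web_root, keep=()) -> list:
    """Return a 'Disallow: /name/' line for each top-level subdirectory of web_root,
    skipping names in keep. Files directly in the root are not listed, so they stay crawlable."""
    keep = set(keep)
    return [
        f"Disallow: /{entry.name}/"
        for entry in sorted(Path(web_root).iterdir())
        if entry.is_dir() and entry.name not in keep
    ]


if __name__ == "__main__":
    # Usage (hypothetical script name): python make_robots.py /path/to/web/root
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print("User-agent: *")
    # An empty Disallow value excludes nothing, so use it if there are no subdirectories.
    for line in subdir_disallow_lines(root) or ["Disallow:"]:
        print(line)
```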

ArtM
14th of August 2005 (Sun), 10:16
I think that depends on how you have your directories and paths set up. My images are all in a separate (set of) directories outside EE, so I can deny indexing of EE separately from the images.

I don't see any reason to allow search indexing of anything within EE except the 'gallery' or subset directories you want. So excluding everything EXCEPT those makes sense. If I remember correctly, you can also deny or allow by file name and partial path.

- Art
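The file-name and partial matching Art mentions follows from robots.txt rules being plain path prefixes: a rule matches every URL path that starts with it, whether that path names a directory or a file. A sketch with hypothetical paths, again checked with Python's `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block full-size images by file-name prefix, and one script by name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photos/large_
Disallow: /ee/slideshow.php
"""


def allowed(path: str) -> bool:
    """Return True if a generic crawler may fetch path under ROBOTS_TXT."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    return rules.can_fetch("*", path)


if __name__ == "__main__":
    for path in ["/photos/large_sunset.jpg",   # starts with /photos/large_ -> excluded
                 "/photos/thumb_sunset.jpg",   # different prefix -> crawlable
                 "/ee/slideshow.php?id=7",     # query string still starts with the rule
                 "/ee/index.php"]:             # no rule matches -> crawlable
        print(path, allowed(path))
```

Wildcards such as `/*.jpg` are a later extension: the original convention and Python's parser treat rule paths as literal prefixes only.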
