Wayback machine - stealing content?
Pekka
21st of May 2005 (Sat), 12:32
I came across this site: http://www.archive.org/web/web.php
Basically they store websites on their hard disks and give public access to view them. They store fullsize photos, too, for everybody to view and link to. Like
http://web.archive.org/web/20040516141051/photography-on-the.net/gallery/D30_photos/medium/CRW_8413_00001.jpg
In my book this clearly violates copyright laws. I could just as well grab whole Dpreview and display it for educational purposes! And in http://www.archive.org/about/terms.php they say: "If you provide any content to the Archive, you grant the Archive a nonexclusive, royalty-free right to use that content." Do I provide content when I put a website on net???
They give a method of removing the site from archive by adding robots.txt file. This sounds like: we steal from you until you stop us. That is not the way copyright works.
:evil:
Opinions? Anyone know more about that site? Has anyone sued them for copyright infringement?
Pekka
21st of May 2005 (Sat), 12:55
Quote from http://www.jisc.ac.uk/uploaded_documents/archiving_legal.pdf , referenced in Stanford University page http://fairuse.stanford.edu/commentary_and_analysis/2003_11_hirtle.html about "Digital Preservation and Copyright":
In short, the Internet Archive largely ignores copyright law in the process of collecting its material, provides only a limited (and, arguably, effectively valueless) protection for the material once stored, and in effect disclaims any responsibility for what is done with the material by the end user, as well as any liability that the end user may incur in accessing the material. Given the litigious nature of the US, it will be interesting to see if the Internet Archive’s success in avoiding litigation over its activities will continue for much longer.
tommykjensen
21st of May 2005 (Sat), 13:05
That kind of behavior stinks :evil:
I think I came across this site a long time ago. In any case I have a robots.txt file on my site to block this archive from collecting.
We're sorry, access to http://klein-jensen.dk has been blocked by the site owner via robots.txt.
primoz
21st of May 2005 (Sat), 14:26
Nice one... NOT! Another reason to have a watermark over my photos on the web. I actually knew about this "archive" but never thought of it this way. And especially, I never bothered to read their "terms of use".
Btw... I'm not a lawyer, but how does it go: "if you provide content TO Archive...". In this case I did NOT provide content to the Archive... they just took it. Sorry, maybe it's completely wrong legally, but for me this is stealing, not providing.
Rob612
21st of May 2005 (Sat), 14:55
I hate to say it, but I am afraid that legally it's going to be a huge and probably useless battle; they can just move the system to another country (i.e. any place in Russia, South America or Eastern Europe).
The use of robots.txt is the only feasible possibility. :(
I am sure that they ARE breaking the law somehow (not a lawyer though), but the internet is a difficult beast to manage when it comes to copyright.
CyberPet
21st of May 2005 (Sat), 15:08
How do you make one of those robot.txt files?
Wazza
21st of May 2005 (Sat), 15:27
Yes, I know very much of this site.
I haven't really thought about the copyright infringements.
I'm sorry Pekka, in your case, it's not helping you in any way, and it's sharing your personal pics without notifying you. It basically is just stealing and locking a memory there.
In my case, most of my old sites haven't been archived, except one which I used for a game. But it's actually a great benefit having it archived, with all the track records etc. There should be an easier way of requesting that pages be removed from their database.
A couple of links I use:
http://web.archive.org/web/20030626081343/www.deadlineracers.net/cgi-bin/nfs5/nfs5stats.cgi?action=showwrs&lang=eng&db=1030000000
http://web.archive.org/web/20030706001814/www.deadlineracers.net/cgi-bin/nfs5/nfs5stats.cgi?action=showuser&usrid=259&lang=eng&db=1030000000
(fortunately in my situation it's good to have them backed up, as the site is now down. But I can fully understand being on the other side, with valuable personal subjects, and photos. :()
tommykjensen
21st of May 2005 (Sat), 15:33
How do you make one of those robot.txt files?
In its very simplest form, you just create a file named "robots.txt" in the root of the website containing
User-agent: *
Disallow: /
This will disallow all robots, including Google and other search engines. You can specify more options so that Google is allowed. Read more here. (http://www.searchengineworld.com/robots/robots_tutorial.htm)
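For anyone who wants to block only the archive and keep search engines, here is a rough sketch of how such a file could be checked with Python's standard robots.txt parser before uploading it. It assumes the Archive's crawler honours rules addressed to the user-agent name "ia_archiver"; that name, the example.com URL and the helper name `allowed` are assumptions for illustration only, so check the Archive's own exclusion instructions for the current name.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the archive's crawler obeys rules addressed to "ia_archiver".
# An empty "Disallow:" line means "nothing is disallowed" for that group.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    photo = "http://example.com/gallery/photo.jpg"  # hypothetical URL
    print("ia_archiver:", allowed("ia_archiver", photo))
    print("Googlebot:  ", allowed("Googlebot", photo))
```

Keep in mind this only tells you what a well-behaved robot would do with the file; it cannot make any crawler obey it.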
CyberPet
22nd of May 2005 (Sun), 09:35
Thanks Tommy! :)
gkuenning
22nd of May 2005 (Sun), 09:49
I find it sad that people are getting so up in arms about what is essentially a library service.
The Internet Archive is an invaluable resource that operates according to well established ethical principles. Have we also only just "discovered" that Google caches every Web page it visits, unless the publisher requests otherwise? Are we up in arms at Google for "publishing" our content?
The reality is that if you publish something openly on the Internet, you are (in practice) granting a non-exclusive royalty-free license to anyone and everyone who decides to download it. You can't stop them, and most of the time you won't even know. Contrast that with Google, the Internet Archive, and similar services. All you have to do is follow a long-established and simple standard saying "Please ignore me" and they will comply.
Of course, the bad guys will ignore your robots.txt file and steal your stuff anyway.
You can stress out about it, or you can decide that it doesn't really make a difference whether somebody halfway across the world uses your photo for a screen saver.
And also (of course) by prohibiting legitimate archival services from saving your content, you won't be available for historians to study and you won't be able to prove in court that your content first appeared on June 1st, 1998. But that's your choice.
Pekka
22nd of May 2005 (Sun), 10:17
I find it sad that people are getting so up in arms about what is essentially a library service.
The Internet Archive is an invaluable resource that operates according to well established ethical principles. Have we also only just "discovered" that Google caches every Web page it visits, unless the publisher requests otherwise? Are we up in arms at Google for "publishing" our content?
Google cache is under same legal doubts, see Google about it. E.g. http://news.zdnet.co.uk/internet/ecommerce/0,39020372,2137329,00.htm
"well established ethical principles" - does that include reinventing copyright laws to justify own behaviour?
The reality is that if you publish something openly on the Internet, you are (in practice) granting a non-exclusive royalty-free license to anyone and everyone who decides to download it. You can't stop them, and most of the time you won't even know. Contrast that with Google, the Internet Archive, and similar services. All you have to do is follow a long-established and simple standard saying "Please ignore me" and they will comply.
"non-exclusive royalty-free license" - no way! Placing material on the net does not put it out of reach of copyright laws - you may view the content and store a personal copy of it just like you may listen to a CD and make a personal copy of it, but you may not redistribute it without permission.
Of course, the bad guys will ignore your robots.txt file and steal your stuff anyway.
And this means? Are you saying it is not worth having any copyright laws? everything should be free to use and share, art, music, movies?
And also (of course) by prohibiting legitimate archival services from saving your content, you won't be available for historians to study and you won't be able to prove in court that your content first appeared on June 1st, 1998. But that's your choice.
There are many ways to prove origins of content - by getting testimony from your clients for example. And as archive.org does not archive all sites on the net your point is moot.
Persian-Rice
22nd of May 2005 (Sun), 10:41
That is one reason why I never ever post my images on the net except on the rare occasion. Only on a private website in a flash movie might I consider putting up a gallery. It's so hard to trust people these days.
FlyingPete
22nd of May 2005 (Sun), 16:11
It is one thing to be able to watermark images, but what about other content?
You THIS IS COPYRIGHTED CONTENT could THIS IS COPYRIGHTED CONTENT put THIS IS COPYRIGHTED CONTENT text THIS IS COPYRIGHTED CONTENT posted THIS IS COPYRIGHTED CONTENT on THIS IS COPYRIGHTED CONTENT the THIS IS COPYRIGHTED CONTENT web THIS IS COPYRIGHTED CONTENT up THIS IS COPYRIGHTED CONTENT like THIS IS COPYRIGHTED CONTENT this :D
I don't think so.
Citizensmith
22nd of May 2005 (Sun), 19:21
Google have been annoying people in other ways as well. Their broadband accelerator service was causing great concern as it was another variety of caching service. The primary concern was that sites that rely on click-throughs for payment would lose lots of clicks, as the site would be loading from Google's servers and not the site's host. Secondly, people with bandwidth concerns were worried as the servers would regularly cache entire popular web sites, so you'd be hit for the bandwidth whether or not anyone actually used your site.
The service is currently on hold. Another example of a good idea that has some major flaws that people seem to have ignored.
Jesper
23rd of May 2005 (Mon), 02:23
In my book this clearly violates copyright laws. I could just as well grab whole Dpreview and display it for educational purposes! And in http://www.archive.org/about/terms.php they say: "If you provide any content to the Archive, you grant the Archive a nonexclusive, royalty-free right to use that content." Do I provide content when I put a website on net???
They give a method of removing the site from archive by adding robots.txt file. This sounds like: we steal from you until you stop us. That is not the way copyright works.
The line "If you provide any content..." is probably meant for people who consciously add content to the archive, not for any website in the world that they put in the archive without the owner of the website knowing about it.
Of course they can't just copy any publicly accessible website and assume they have all the rights to the content they copied. I don't believe anything like that would ever hold up in a court of law.
I don't think it's as nasty as it may seem at first sight, I don't think we need to worry too much about this.
Their terms also mention explicitly that you must not infringe anyone's copyright when you're using the archive:
You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you certify that your use of any part of the Archive's Collections will be noncommercial and will be limited to noninfringing or fair use under copyright law. In using the Archive's site, Collections, and/or services, you further agree (a) not to violate anyone's rights of privacy, (b) not to act in any way that might give rise to civil or criminal liability, (c) not to use or attempt to use another person's password, (d) not to collect or store personal data about anyone, (e) not to infringe any copyright, trademark, patent, or other proprietary rights of any person, ...
gkuenning
23rd of May 2005 (Mon), 05:45
Google cache is under same legal doubts
True, but the key word here is "doubts". This is an area that the law did not foresee, and it is going to take a long time to resolve it. But I'll note that it is analogous to cases that tried to allege that in-computer copying (e.g., loading an image into RAM) was illegal. The last time I was an expert witness on such a case, I never even testified, because the judge could see the silliness of the argument.
As a practical matter, there is no difference between the content I post on my Web site and Google's or Wayback's copies of it. All three make my content available to anybody who wants to see it. None charge a fee. All three give me a reasonable way to remove my content. (To be fair, it's obviously more work to remove from three instead of one, unless you use a robots.txt file ahead of time.)
The article Pekka linked to (NY Times is upset about Google cache) is nicely balanced by a recent research study showing that major newspapers are losing influence because they use various methods to control how people read their content. For example, I never pay the NY Times a fee to read an old story. I just go to AP or the BBC instead. In the same way, every restriction you put on your pictures will reduce the number of people who see them. You may have valid reasons for doing so, and it's your choice. But I think it's silly to get excited because Google has a physical copy that duplicates your own physical copy.
"well established ethical principles" - does that include reinventing copyright laws to justify own behaviour?
Actually, I was referring to centuries of standard practice by libraries. The Internet Archive is trying to be a library (it was invented by academics for academic purposes), and they've tried very hard to find reasonable ways to apply the ethical principles of libraries to the modern world. That includes giving you ways to remove things you don't think they should archive.
If you study the history of copyright, incidentally, the major cases of "reinventing copyright laws to justify [one's] own behavior" have essentially always come from large corporations seeking to line their pockets. Does the name Disney ring a bell? :D Even in the Feist case, a rare win for the public, the company doing the copying (Feist) didn't reinvent the laws to their own advantage; they engaged in perfectly reasonable and time-honored behavior and found themselves sued by a very well-funded company.
"non-exclusive royalty-free license" - no way! Placing material on the net does not put it out of reach of copyright laws - you may view the content and store a personal copy of it just like you may listen to a CD and make a personal copy of it, but you may not redistribute it without permission.
I said in practice. Legally, you are 100% correct (well, 99.9%--it's not yet settled what protections apply to stuff posted without a notice, but case law suggests that it's probably protected as "All rights reserved"). But in practice if you post it, you've made it possible for people to copy it and redistribute it without permission. If you're like me, you simply lack the resources to track down and prosecute any violators. So in practice you've granted a license even though legally you have done no such thing.
And this means? Are you saying it is not worth having any copyright laws?
I don't recall saying anything of the sort, although (in other places) I have been known to point out that the current laws don't match the modern world very well.
everything should be free to use and share, art, music, movies?
That's EXACTLY what the copyright law says. Limited times, remember? That's why people are free to use and share Shakespeare, Mozart, da Vinci, and even Edison's movies.
There are many ways to prove origins of content - by getting testimony from your clients for example. And as archive.org does not archive all sites on the net your point is moot.
Testimony from my "clients" would be very expensive; they're far-flung and hard to find. OTOH, since archive.org archives my site, it doesn't matter what else it archives. I can prove that my content existed on my site in a certain form in 1997. I can't prove that it did NOT exist on another site in 1996, any more than I can prove that it wasn't in the British Library in 1935. But the archive.org evidence is very strong and is cheap to acquire.
As I said before, I'm sad that people are getting excited. Archive.org is a wonderful service, as is Google. I'm very glad that they store a copy of my copyrighted content. If others don't want that, there are lots of ways to protect a Web site, including robots.txt files and registration systems. Just understand that by doing so, your Web presence will be greatly reduced. (See this thread (http://photography-on-the.net/forum/showthread.php?threadid=74268).)