
Am I legally allowed to cache other people's web pages on my site?

@Chiappetta492

Posted in: #Cache #Legal

I'm developing a small content aggregation app that frequently relies on posts from other websites, which occasionally go down or remove the content. Would I be breaking any law if I cached those pages on my own server? I would keep linking to the original URL and only show the cached content if that URL was unavailable.

I know websites like Google and the Wayback Machine do similar things.


@Ann8826881

Be careful! The two websites you cite both cache content, but they are not equivalent. For the record, you should always consult a lawyer before making copies of any work, however small or honest your effort may be. Copyright is a tricky business.

Here is a link to the U.S. Copyright Law: www.copyright.gov/title17/

How the two sites differ:

There is a primary difference between the two sites. One respects copyright law (by default) and the other violates the law routinely (by default).

Google.com

Google is an opt-in/opt-out website. Google respects the robots.txt file, and this is paramount to understanding the difference. You can block Google from indexing your site with relative certainty using the robots.txt file. Google also respects the noindex meta tag, which can be used on a page-by-page basis, and it obeys the noarchive meta tag.

Robots.txt: support.google.com/webmasters/answer/6062596?hl=en&ref_topic=6061961
Robots Meta-tag: developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
NoArchive Meta-tag:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">


Using these options, Google will not store a local copy of your website or pages. This is important to know and remember.

Google gives you full control over how your website appears in search, if it appears at all.
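If your aggregation app wants to honor the same opt-out signals before it stores a copy, the check might look roughly like the sketch below. It uses only the Python standard library and is not a description of how Google implements this. The names `may_cache`, `RobotsMetaParser`, and the user-agent string `ExampleAggregatorBot` are hypothetical, and treating `noindex` and `none` as also ruling out a cached copy is a conservative assumption on my part. Passing this check only tells you the site owner has not opted out; it does not give you permission to copy.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags.

    Hypothetical helper: by default it also reads tags addressed to
    "googlebot", which is a conservative assumption for a third-party bot.
    """

    def __init__(self, agent_names=("robots", "googlebot")):
        super().__init__()
        self.agent_names = {name.lower() for name in agent_names}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() in self.agent_names:
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_cache(robots_txt, page_url, page_html,
              user_agent="ExampleAggregatorBot"):
    """Return True only if neither robots.txt nor a meta tag opts out."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, page_url):
        return False

    meta = RobotsMetaParser()
    meta.feed(page_html)
    # Assumption: any of these directives means "do not keep a copy".
    blocking = {"noarchive", "noindex", "none"}
    return not (blocking & meta.directives)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head></html>'
    print(may_cache(rules, "https://example.com/post.html", page))
    print(may_cache(rules, "https://example.com/post.html", "<html></html>"))
```

In a real app you would fetch the site's robots.txt and the page itself before calling `may_cache`, and re-check periodically, since a site owner can add these directives after you first cached the page.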

Archive.org

Archive.org is an opt-out only site. It is said that archive.org respects the robots.txt file. However, this is not fully the case. Sites that have correctly used the robots.txt file to block archive.org from indexing their pages can find that their pages appear in the archive anyway. Why? Because by default, archive.org will index your site and pages regardless of any directive and will (theoretically) honor your wishes only when the archive is presented. This mechanism has several flaws, the primary one being that the robots.txt directive must always exist, even after the site has been dismantled. Even then, archive.org has been sloppy in how it implements the noindex directive and will from time to time display archives of sites and pages against the site owner's wishes. This is a clear violation of copyright law, which by default prohibits any and all copies of another's work without express permission or license, with small and limited carve-outs for educational purposes and for reference in other work.

Even if you use the one option that archive.org provides, it will index your pages anyway, which is a clear violation of copyright law, and it will display your pages from time to time, or in perpetuity, when it errs in respecting the noindex directive.

Archive.org has been sued quite a few times and remains vulnerable to lawsuits for copyright infringement despite the attempt to carve out an exception in the Digital Millennium Copyright Act. It does not take many Google searches to find lawsuits filed against archive.org and legal take-down notices sent to it.

You can add The Huffington Post to your list.

The Huffington Post has routinely violated copyright law and established its dominance by doing so. I will not get into the details; suffice it to say that HuffPo has also been sued a number of times.

Your rights to others' work.

By default, you do not have rights to others' work, regardless of what you intend to do with it. There are carve-outs in the law that allow for reference to other work or for special purposes within the academic community. I have written several answers on the subject, which I have linked below (in no particular order). Though some may be very specific to the question asked, they will give insight into the issue.

I would not recommend indexing, archiving, or otherwise making a copy of any work that is not your own.

Can you be sued for using publicly used images on a blog without copyright permission?

How much of your content needs to be copied before you can file a DMCA complaint?

How (il)legal is it to get data from a 100% accessible but not "exposed" API
