How (il)legal is it to get data from a 100% accessible but not "exposed" API
I found a website that provides a huge filterable table with up-to-date data about cities in a country. This site uses an infinite scrolling approach to load the rows for the table.
By exploring the site using Chrome's Developer Tools, I found it makes AJAX requests to some internal URL to get the data. This URL includes a lot of query parameters corresponding with the filters.
I tried to access that URL directly in my browser and I am getting all the data in a nice JSON format. I can even play around with the filters to get the concrete data I need. The URL is publicly reachable, so I do not need to do anything hacky; I'm just calling a URL that is out there on the net.
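To make the scenario concrete, here is a minimal sketch of how such a filter URL is typically assembled. The endpoint `https://example.com/internal/cities` and the parameter names (`country`, `minPopulation`, `page`, `pageSize`) are hypothetical placeholders, not taken from any real site, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; a real site would use its own internal path.
BASE_URL = "https://example.com/internal/cities"


def build_request_url(filters, page=1, page_size=50):
    """Return the URL an infinite-scroll table might request for one page of rows."""
    params = dict(filters)
    # Paging parameter names are assumptions; infinite scroll usually sends something similar.
    params["page"] = page
    params["pageSize"] = page_size
    # Sort the parameters so the resulting URL is deterministic.
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


url = build_request_url({"country": "ES", "minPopulation": 100000}, page=2)
print(url)
# Decode the query string again to see the individual filters.
print(parse_qs(urlsplit(url).query))
```

Whether actually requesting such a URL is permitted is exactly what the question asks; the sketch only illustrates the mechanics.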
So my question is: how legal or illegal is it for me to use that URL to retrieve the data for my own purposes?
Note: I do not want to create the same kind of cities list, but I want to use that data to create a little online game, potentially to earn a little money...
IMPORTANT Notes about some responses and comments
This is just an example scenario, I'm not looking for a place to grab data about cities. Consider a website with data about updated football players' performance in a season if you want.
As for the concrete country, again, I'm not thinking of any particular legislation; if you know of a country where this is clearly (il)legal, that would be useful information.
More posts by @Vandalay111
(IANAL, and laws and norms vary widely throughout the world, but certain things tend to remain consistent due to IP treaties. If you have a professional issue outside of your specialty, consult with a professional.)
Generally, legally, an API is not considered to be "intended for public consumption" unless it's actively documented as a public API, with specified terms of service. The fact that the public can reach the API does not make it public.
In cases where the status of the data itself isn't starkly public-domain, and in a few cases where it clearly is public domain, the intent of the entity providing the API matters a great deal. If the website operator intended the API to be used to feed a dynamic webpage or a mobile application (to name two common examples), any other usage is "unauthorized" unless specifically authorized somewhere. If the intended consumer was a snippet of dynamic code in a specific webpage, for the purpose of placing human-understandable pixels on a display in a specific, (hopefully) pleasing and useful manner, any other usage is unauthorized.
The technical ability to enter a building through an open window without opening or breaking anything won't protect you from being arrested for criminal trespass...
Also, it is almost never wise to play "technical ability" vs "original intent" games with an intellectual property lawyer. If nothing else, remember that the lawyers who consistently lose those cases don't keep getting paid for them.
What you are talking about is likely fine. You are hyperlinking to information then transforming it.
In Perfect 10, Inc. v. Amazon.com, Inc.,[19] the Ninth Circuit again
considered whether an image search engine's use of thumbnail was a
fair use. Although the facts were somewhat closer than in the Arriba
Soft case, the court nonetheless found the accused infringer's use
fair because it was "highly transformative." The court explained:
We conclude that the significantly transformative nature of Google's
search engine, particularly in light of its public benefit, outweighs
Google's superseding and commercial uses of the thumbnails in this
case. … We are also mindful of the Supreme Court's direction that "the
more transformative the new work, the less will be the significance of
other factors, like commercialism, that may weigh against a finding of
fair use."
In addition, the court specifically addressed the copyright status of
linking, in the first US appellate decision to do so:
Google does not…display a copy of full-size infringing photographic
images for purposes of the Copyright Act when Google frames in-line
linked images that appear on a user's computer screen. Because
Google's computers do not store the photographic images, Google does
not have a copy of the images for purposes of the Copyright Act. In
other words, Google does not have any "material objects…in which a
work is fixed…and from which the work can be perceived, reproduced, or
otherwise communicated" and thus cannot communicate a copy. Instead of
communicating a copy of the image, Google provides HTML instructions
that direct a user's browser to a website publisher's computer that
stores the full-size photographic image. Providing these HTML
instructions is not equivalent to showing a copy. First, the HTML
instructions are lines of text, not a photographic image. Second, HTML
instructions do not themselves cause infringing images to appear on
the user's computer screen. The HTML merely gives the address of the
image to the user's browser. The browser then interacts with the
computer that stores the infringing image. It is this interaction that
causes an infringing image to appear on the user's computer screen.
Google may facilitate the user's access to infringing images. However,
such assistance raised only contributory liability issues and does not
constitute direct infringement of the copyright owner's display
rights. …While in-line linking and framing may cause some computer
users to believe they are viewing a single Google webpage, the
Copyright Act, unlike the Trademark Act, does not protect a copyright
holder against acts that cause consumer confusion.
State of US law after Arriba Soft and Perfect 10
The Arriba Soft
case stood for the proposition that deep linking and actual
reproduction in reduced-size copies (or preparation of reduced-size
derivative works) were both excusable as fair use because the
defendant's use of the work did not actually or potentially divert
trade in the marketplace from the first work; and also it provided the
public with a previously unavailable, very useful function of the kind
that copyright law exists to promote (finding desired information on
the Web). The Perfect 10 case involved similar considerations, but
more of a balancing of interests was involved. The conduct was excused
because the value to the public of the otherwise unavailable, useful
function outweighed the impact on Perfect 10 of Google's possibly
superseding use.
Moreover, in Perfect 10, the court laid down a far-reaching precedent
in favor of linking and framing, which the court gave a complete pass
under copyright. It concluded that "in-line linking and framing may
cause some computer users to believe they are viewing a single Google
webpage, [but] the Copyright Act . . . does not protect a copyright
holder against acts that cause consumer confusion."
Emphasis mine. linky
You are simply using a hyperlink, you aren't making a copy, you aren't displaying a copy, you aren't harming business interests, and you are highly transforming it. I'd say you have every element needed to be fine. But, IANAL.
While closetnoc has discussed the issue of the data itself, there's a larger legal concern: you are not authorized to access the API offering the data.
The baseline for most computer crime laws involves the notion of "unauthorized access to a computer system". You should not confuse this reference to authorization in the legal sense with the concept of authorization when it comes to access control. The owner of a system does not have to secure his system for access to it to be illegal, just as you are still trespassing when you enter a house with an unlocked door.
In this case the apparent lack of security does not imply an authorization to use it. The concept of the internet has little precedent so far in case law, but you can imagine serving a website over HTTP on port 80 to imply public authorization to view it. Conversely, background RPC protocols (even if they run over HTTP requests) are not typically understood to be publicly available unless the operator publishes the service as such, granting authorization for use to third parties.
So ongoing use of the API to retrieve data would be illegal. The act of taking a data dump from the API to build your own dataset would also be illegal. Whether use of the data after that is illegal is a giant grey area but closetnoc has covered most of the concerns.
Of course if you modify the data dump after the fact to be unrecognisable it will be next to impossible to prove that you committed a crime. But if you're going to that much trouble why not source the data from a legal source instead?
One thing that doesn't seem very clear in the other answers here...
Whether it's "legal" or not, first and foremost, depends on the country. If we're talking about the United States, for example, then using the data itself is not illegal. However, I'd advise you to use the real data from the US Census. They offer tons of data through what they call TIGER products. This data set is the same data set that GIS professionals use to populate Bing maps, Google maps, etc.
However, while the data may be freely available, that does not necessarily mean the data from this exposed API is legally available. You say it's in JSON form, which suggests it's been 'massaged' from its original format into this format, and that custom format could fall under intellectual property. That, I believe, would be illegal to use unless you have a license to use it. Like others here, I am not a lawyer, but the company doesn't even need to point the finger at you and call you a hacker. Proprietary data is proprietary data, even if it is handed out unintentionally. You should contact the company, let them know all of this data is exposed to the outside world, and ask for permission to use it. Without doing that, and with this question on Stack Exchange as evidence, it'd be easy to build a case against you. You've essentially said "This doesn't look legit, but I like it anyway and I want to make money off of it." Again, I'm not a lawyer, but that doesn't look like a great way to start a trial.
The thing is, though, if you're interested in city names and other geographic data, almost all of it is freely available, regardless of country. Last I knew, the US publishes the most data, but there's data out there for virtually every country. I'm hesitant to say all only because I'm a programmer and proving a "for all" statement is hard... if you pick an arbitrary country, the chances are better than good that the data is out there. If you have a specific country in mind, head to the GIS Stack Exchange. The main things you're looking for are called "shapefiles", so ask a question like "Where can I get shapefiles for __________?" There's also OpenStreetMap, which is an open source map. I'm not sure how easy it is to get their shapefile data, but if you can get it from them (and I don't see why you wouldn't be able to, since you're able to run offline maps based on locally stored information), then you have all the data you need and you're in the clear legally. You'll have to spend time massaging the massive amounts of data down to what you want, but shapefiles are always very well defined and easy to parse.
Let me be clear. There is one thing I know fairly well, and it is copyright law. I am not a lawyer; however, knowledge of copyright was a constant requirement of my consultancy for 30 years. As an added bonus, I consulted primarily to telcos and often worked with subscriber data and the analysis and presentation of said data for sale and re-use. I am, at least, uniquely qualified to answer this question on this forum.
I will explain this the best that I can by: one, defining proprietary versus ordinary means; two, defining the cited case exception and other related copyright considerations; and three, being clear on the answer.
Let me clarify copyright some. The example of a phone book is a misnomer. When you get a telephone, you have entered into a private contract agreement as a private citizen with a private company, and the resulting information, made public or not, is private proprietary data. The contents of a phone book are therefore proprietary (pay attention to this word) simply because they cannot generally be obtained through any means other than company data sources, that is, the subscriber data. If data can be derived through ordinary means, such as walking around and writing down house numbers and street names, then that is publicly available data and clear to use. This is not to say that telephone numbers cannot be obtained through ordinary means. They can be.
To clarify further. To quote from: www.lib.umich.edu/copyright/facts-and-data
In no case does copyright protection for an original work of
authorship extend to any idea, procedure, process, system, method of
operation, concept, principle, or discovery, regardless of the form in
which it is described, explained, illustrated, or embodied in such
work.
This paragraph is misleading. The exception described in this paragraph is covered by patent and other laws. Copyright only extends to the creation of a work.
The "sweat of the brow" doctrine refers to any activity such as going house to house and gathering the data manually. This is the definition of ordinary means. It is possible to knock on doors and ask for the same telephone data. Only inasmuch as you can gather the facts by ordinary means is that data, or that portion of the proprietary data, public.
The ordinary way around using telephone data is to: one, obtain the original data through legal means; and two, apply the fair use doctrine. This would entail getting a copy of the phone book directly from the company, which may be free or for a charge, and organizing the facts within it in a different way so as to create a new work. Have you tried to get a Seattle phone book when you are in Chicago? You will find that the telephone company will likely charge you a surprising fee for it. However, if you are a telephone subscriber in Seattle and you ask for a Seattle phone book, the fee would be far less or even free. I have had to do this many times. There are people whose job it is just to obtain telephone books from telcos in person and pay the fee if required.
The ruling in Feist Publications v. Rural Telephone, cited at the above link (in this answer), hinges on two facts: one, that the rural cooperative operator, as a local monopoly, was required by operational agreement to make the data publicly available; and two, that the presentation of the work was copyrighted and not the facts contained within, due to fact #1. Therefore, this case can be considered as precedent only within narrow parameters and otherwise must be discarded. Ordinarily, private company subscriber data is not required by agreement to be made public. You have to remember that rural cooperatives are established as public trusts/entities for the public good, are owned by the public and/or cooperative members, and therefore operate under legal restrictions that allow them to be approved to operate or exist. Each case is different. Citing the above case (on the linked page) as an argument without explaining the carve-out exceptions is misleading.
In the early days of the Bell Telephone company, the company was required as a monopoly to make telephone data public unless restricted by the subscriber. When the Bell company was split into the baby Bells, Bell Atlantic, Bell South, and so on, these companies were still required as monopolies to make telephone data public as defined before. But with deregulation and indeed with VoIP, cellular, and other options, monopolies are rare. Only in monopoly scenarios can the above cited argument be made.
Continuing to cite the link above (in this answer):
Just because data is not protected by copyright, does not mean there
are not other legal considerations that may come into play when you
wish to use someone else’s dataset.
Keep this in mind.
Any given dataset and the presentation thereof, regardless of the data origin, is a work unto itself. The public presentation of the facts, regardless of the means, is a work unto itself.
Given that you are not obtaining the data through ordinary means, even though the data is made public, and regardless of the original origin of the data, it is not free to use as you described. You could be criminally charged and held civilly liable for potential copyright infringement, as well as for criminal trespass and illicit use of computer and other communications equipment not ordinarily authorized, and the conduct can fall under RICO statutes.
Is it legal to use? No! Absolutely not! It was not obtained through ordinary means, nor is it likely the intent of the website operator to expose proprietary data. Any absence of an AUP (acceptable use policy) will not help you. There are assumptions made under the law as to the "reasonable man", "reasonable standard", and "reasonable assumption" that protect the website owner in this case. It is not reasonable that a clever person would use a "vulnerability in the design/creation" of the website to obtain data for other use. As well, if the site profits from its activities, further protections come into play.
It is legal as long as you do not have to enter a password to get it, but some less sophisticated companies may claim hacking and sic a lawyer on you anyways. You must be prepared to defend yourself. You will be found not guilty, as they are publishing their data to the public, but it might cost to defend yourself. Prosecutors and cops defer to corporations. This happens often in security notices, where someone will notify a company of a security hole then the company will charge them with hacking. The company is also free to change the API without notice, possibly breaking your app.
www.extremetech.com/computing/146323-canadian-college-expels-student-for-white-hat-security-probing
Stop me if you’ve heard this before: A technology enthusiast gets slightly overzealous in checking for security holes, finds a significant vulnerability, comes forward with the information, and legal and personal threats are then made.
www.bostonglobe.com/metro/2014/03/29/the-inside-story-mit-and-aaron-swartz/YvJZ5P6VHaPJusReuaN7SI/story.html
The intruder was lurking somewhere on the MIT campus, downloading academic journal articles by the hundreds of thousands.
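On the practical point above that the operator is free to change the API without notice: a minimal sketch of defensive parsing, so that a changed response shape fails loudly instead of silently breaking an app. The payload layout (`rows`) and the field names (`name`, `population`) are assumptions made up for illustration; the real JSON structure is not described in this thread.

```python
import json

# Hypothetical field names; adjust to whatever the real payload contains.
REQUIRED_FIELDS = ("name", "population")


class UnexpectedPayload(ValueError):
    """Raised when a response no longer matches the shape the app expects."""


def parse_rows(raw_text):
    """Parse a JSON response and verify each row still has the expected fields."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise UnexpectedPayload(f"response is not valid JSON: {exc}") from exc

    # Assumed layout: a top-level object holding a list under "rows".
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        raise UnexpectedPayload("expected a top-level object with a 'rows' list")

    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise UnexpectedPayload(f"row {index} is not an object")
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            raise UnexpectedPayload(f"row {index} is missing fields: {missing}")
    return rows


sample = '{"rows": [{"name": "Springfield", "population": 30000}]}'
print(parse_rows(sample))
```

Raising a dedicated exception lets the calling code distinguish "the upstream format changed" from ordinary bugs and fall back to previously stored data.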
It probably depends on the nature of the data. Pure data (think telephone directory) cannot be copyrighted. So a list of cities from an API should be fair game to copy and show to users. However, if that API has descriptions of the city, those descriptions would fall under copyright law and you wouldn't be able to use them without violating copyright.
If you can legally copy the data, I would recommend copying it to your own site to prevent your API usage from being shut down prematurely.