What is the way to prevent Google from scraping and crawling a URL in plain text on a page?
Problem
I have a website with documentation that includes examples with plain text URLs. The other day, I noticed that Google Webmaster Tools was reporting that one of those URLs generated a Page Not Found error.
Question
What is the best way to prevent Google from scraping such plain text URLs? (Other than using example.com: I am using my own domain name in those sample URLs, which I think makes more sense.)
Hide directory solution
I found that I could add a folder, in my case /api, to robots.txt, and at least all of those URLs were then ignored.
User-agent: *
Disallow: /api
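To make the effect of that rule concrete, here is a minimal sketch of how a Disallow value is matched against URL paths: as a plain prefix, so /api covers /api/v1/users but also /apidocs. The function name isDisallowed and the sample paths are hypothetical, and the sketch deliberately ignores wildcards, Allow lines, and per-agent groups that real crawlers also handle.

```javascript
// Simplified model of robots.txt matching: a path is blocked when it
// starts with any Disallow prefix. Wildcards and Allow rules are ignored.
function isDisallowed(path, disallowPrefixes) {
  return disallowPrefixes.some(function (prefix) {
    // An empty Disallow value blocks nothing.
    return prefix !== "" && path.startsWith(prefix);
  });
}

var rules = ["/api"]; // taken from the "Disallow: /api" line

console.log(isDisallowed("/api/v1/users", rules)); // true
console.log(isDisallowed("/apidocs", rules));      // true (prefix match, not directory match)
console.log(isDisallowed("/docs/setup", rules));   // false
```

Writing the rule as /api/ instead would limit it to paths inside that folder.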
However, not all the URLs in my documentation are about the REST API, and I still have the problem with other pages that I cannot reasonably add to robots.txt (to me, it would not make sense to add each page individually to robots.txt when those pages do not exist in the first place).
That being said, I'm not sure this is a good solution as far as SEO is concerned, since in effect those pages still generate an internal link 404 error (or maybe it's considered to be a 403?).
2 Comments
(You might want to confirm whether the pages are HTML or text. I'm guessing HTML from your response to the JavaScript query.)
If the URLs are marked up as links, you can add rel="nofollow" as an attribute on them. As per support.google.com/webmasters/answer/96569?hl=en this is something Google advocates.
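Generate the text with a simple JavaScript function, so the literal URL you want appears when the page is read, but not in the raw HTML that gets scraped.
In the header, something like:
<script>
// Writes the sample URL into the page at render time,
// so the full literal URL is not present in the HTML source.
function hide(str) {
  document.write('http://example.com' + str);
}
</script>
Inline, something like:
<script>hide("/foo/bar.html")</script>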
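Since document.write is discouraged in modern pages, the same idea could be sketched without it, assuming each sample is marked with a placeholder element. The data-sample-path attribute, the buildSampleUrl and fillSampleUrls names, and the example.com host are all hypothetical. Google can also render JavaScript, so neither version is guaranteed to keep the URL away from the crawler.

```javascript
// Hypothetical helper: assembles the sample URL from its parts at runtime,
// so the complete URL never appears as one literal in the HTML source.
function buildSampleUrl(scheme, host, path) {
  return scheme + "://" + host + path;
}

// Fills every element carrying a data-sample-path attribute with its URL.
// The root argument is any object with querySelectorAll, normally document.
function fillSampleUrls(root) {
  root.querySelectorAll("[data-sample-path]").forEach(function (el) {
    var path = el.getAttribute("data-sample-path");
    el.textContent = buildSampleUrl("http", "example.com", path);
  });
}

// Only runs in a browser; outside one there is no document object.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    fillSampleUrls(document);
  });
}
```

In the page body, a placeholder such as <span data-sample-path="/foo/bar.html"></span> would then display http://example.com/foo/bar.html once the script has run.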