
What is the way to prevent Google from scraping and crawling a URL in plain text on a page?

@Nickens628

Posted in: #Googlebot #Url

Problem

I have a website with documentation that includes examples with plain text URLs. The other day, I noticed that Google Webmaster Tools reported that one of those URLs generated a Page Not Found error.

Question

What is the best way to prevent Google from scraping such plain text URLs? (Other than switching to example.com: I use my own domain name in those sample URLs, which I think makes more sense.)

Hide directory solution

Note that I found I could add a folder, in my case /api, to robots.txt, and at least all of the URLs under it were ignored.

User-agent: *
Disallow: /api
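
To see which sample URLs the Disallow: /api rule above would cover, here is a minimal sketch of prefix matching. The helper name isDisallowed is made up for illustration, and the sketch assumes plain prefix matching only; it does not handle the "*" and "$" wildcards that Google also supports, nor Allow rules.

```javascript
// Hypothetical helper: plain prefix matching only, no "*" or "$" wildcards.
function isDisallowed(path, disallowRules) {
  // An empty Disallow value blocks nothing, so skip it.
  return disallowRules.some((rule) => rule !== "" && path.startsWith(rule));
}

const rules = ["/api"]; // taken from the robots.txt above

console.log(isDisallowed("/api/v1/users", rules)); // true
console.log(isDisallowed("/apiary", rules));       // true: "/api" is a prefix, not a directory
console.log(isDisallowed("/docs/api", rules));     // false
```

Because the rule is a prefix, writing Disallow: /api/ (with a trailing slash) would limit it to the folder itself.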


However, not all the URLs in my documentation are about the REST API, and I still have the problem with some other pages that I just cannot add to robots.txt. (At least to me, it would not make sense to add each page individually to robots.txt when those pages do not exist in the first place.)

That being said, I'm not sure this is a good solution as far as SEO is concerned, since in effect those pages still generate an Internal Link 404 error (or maybe it's considered to be a 403?).


2 Comments


@Gretchen104

(You might want to confirm whether the pages are HTML or plain text. I'm guessing HTML from your response to the JavaScript query.)

If the URLs are rendered as links, you can add rel="nofollow" as an attribute on them. As per support.google.com/webmasters/answer/96569?hl=en this is something Google advocates.


@Jennifer507

Generate the text with a simple JavaScript function, so that the literal
you want appears when the page is read, but not when it is scraped.

In the header, something like:

<script>
// Writes the full sample URL into the page at the point where hide() is called.
function hide(str) {
  document.write('http://example.com' + str);
}
</script>


Inline, something like:

<script>hide("/foo/bar.html")</script>
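
If you would rather avoid document.write, the same idea can be sketched with DOM calls. This is only a sketch under assumptions: the data-sample-path attribute and the names BASE, sampleUrl and renderSampleUrls are invented here, not part of any library. Also keep in mind that Googlebot can execute JavaScript, so neither variant is guaranteed to keep the URL out of its sight.

```javascript
// Hypothetical base; swap in your own domain.
const BASE = "http://example.com";

// Pure helper, so it can be checked without a browser.
function sampleUrl(path) {
  return BASE + path;
}

// Fill every element carrying a data-sample-path attribute
// (an attribute name invented for this sketch) with the full URL as text.
function renderSampleUrls(doc) {
  for (const el of doc.querySelectorAll("[data-sample-path]")) {
    el.textContent = sampleUrl(el.getAttribute("data-sample-path"));
  }
}

// Only runs in a browser; skipped elsewhere (for example under Node).
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => renderSampleUrls(document));
}
```

In the page you would then write something like <code data-sample-path="/foo/bar.html"></code>, and the text is filled in once the document has loaded.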
