Mobile app version of vmapp.org

Stop Google from crawling my site (already blocked with robots.txt)

@Bethany197

Posted in: #Google #Googlebot #RobotsTxt #Youtube

Annoyed by YouTube's music deletion in Germany, I started my own "YouTube clone" just for private use. It automatically downloads my subscriptions and some videos with special keywords or from special YouTubers. All of that works fine. And none of it is accessible from the outside (you'd need a username and a password, which only I have).

On my start page there are a lot of links to videos from my subscriptions that haven't been downloaded yet. When I click one of these links, the video is shown using the original YouTube embed feature. That works fine, too.

But now my problem: a few minutes ago I watched a video with that embed-thing and I just saw this in my Apache log:

66.249.89.90 - - [20/Dec/2014:21:40:52 +0100] "GET my_youtube_clone HTTP/1.1" 200 2780 "-" "Mediapartners-Google"


I already have all bots blocked via robots.txt, so obviously Google is picking up the page URLs from the YouTube referrers and crawling them, ignoring robots.txt while it does so.

Google didn't get anything useful out of it since, as I said, you'd need a password. Still, I am quite annoyed that Google ignores robots.txt and uses YouTube referrers as a source of URLs to crawl.

Is there any way to completely stop that?


2 Comments


 

@Heady270

Mediapartners-Google is the user agent that Google uses to crawl pages with AdSense ads on them. The crawling is likely related to ads shown by the video.

Remove the ads and Google will stop trying to crawl like this.
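If removing the ads is not an option, another thing to try is addressing the crawler by name in robots.txt. According to Google's crawler documentation, the AdSense crawler does not follow rules in the catch-all `User-agent: *` group and only obeys a group that names it, which would explain why a block-everything robots.txt did not stop it. A minimal sketch, assuming the existing file is just a blanket `Disallow: /` for all bots (the question does not show the actual file):

```
# Group addressed to the AdSense crawler by name
User-agent: Mediapartners-Google
Disallow: /

# Assumed existing catch-all group for every other bot
User-agent: *
Disallow: /
```

Whether the requests stop completely is something to verify in the Apache log afterwards.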



 

@Smith883

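You can put a robots meta tag in the page's <head></head> to keep most search engines from indexing it (note that this controls indexing; the crawler still has to fetch the page to see the tag):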

<meta name="robots" content="noindex">


If you only want to block Google specifically, you can use this instead:

<meta name="googlebot" content="noindex">


Google's own documentation describes the same: support.google.com/webmasters/answer/93710?hl=en


