Google is displaying a URL in my site's index with the cache of that result being completely different

@Si4351233

Posted in: #Google #Indexing #Search

OK, this may turn out to be a silly question, but I have observed Google displaying a URL in the index whose cached result is completely different, when the URL shouldn't even be there in the first place.

Description:

I built a random function for the website docur.co.

The function initiates with a request to:
docur.co/random
Robots are blocked from that URL in:
docur.co/robots.txt
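
(For illustration, the handler does nothing more than pick a documentary and redirect to it; the sketch below is a simplification in Flask, not the actual docur.co code, and the slugs are made up.)

# Simplified sketch of the /random handler (Flask assumed; slugs are placeholders, not real data).
import random
from flask import Flask, redirect

app = Flask(__name__)

DOCUMENTARY_SLUGS = ["example-documentary-one", "example-documentary-two"]

@app.route("/random")
def random_documentary():
    # Pick any documentary and send the visitor there with a temporary redirect.
    slug = random.choice(DOCUMENTARY_SLUGS)
    return redirect("/documentary/" + slug, code=302)
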
However, Google has followed this URL and produced the following search result:

[screenshot of the search result]

This is the cache:

[screenshot of the cached page]

My question is: can anyone tell me what exactly is going on here? As mentioned above, I may have done something wrong...

Update:

Maybe adding rel="nofollow" directly to the anchor, on top of the robots.txt directive, will ensure that Google does not follow the URL?


1 Comment


@Berumen354

You have an error in your robots.txt file.

On line 11 you have Allow: /. A robots.txt file doesn't say which files and directories you can allow, only which ones you can disallow; the only directives in the original robots.txt standard are User-agent and Disallow.

Because the Disallow: /random rule comes after that invalid line, it is possible that Googlebot hit the unrecognised directive, could not process it, and stopped processing the rest of the robots.txt file, treating it as if it didn't exist at all.

You can validate your robots.txt file using a tool such as the one at tool.motoricerca.info/robots-checker.phtml

As for why the cached version differs from the live page: the cache is whatever Google saw the last time the spider crawled the URL, which in the case of your cached link was 6 April 2016 at 16:05:27 GMT.
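
If you would rather check it from the command line, a rough sketch like the one below (Python standard library only; it assumes the file is reachable at docur.co/robots.txt) flags every directive outside the original User-agent/Disallow pair:

from urllib.request import urlopen

# Directives from the original robots.txt standard; extend this set with
# "allow", "sitemap" or "crawl-delay" if you target crawlers that accept them.
KNOWN_DIRECTIVES = {"user-agent", "disallow"}

with urlopen("https://docur.co/robots.txt") as response:
    lines = response.read().decode("utf-8").splitlines()

for number, raw in enumerate(lines, start=1):
    text = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
    if not text:
        continue
    directive = text.split(":", 1)[0].strip().lower()
    if directive not in KNOWN_DIRECTIVES:
        print("line", number, "uses an unrecognised directive:", text)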

A new version of your robots.txt file which you could use is...
#The date is August 29th, 1997.
#Robots have taken over the world and documentaries cease to be created by humans.
#What will happen next?

#Want to join the Docur team?
#E-mail jonbonsilver//at//gmail//dot//com

#Full access for the internet archive.

User-agent: ia_archiver
Disallow: /random

#Every robot that honours the robots.txt standard:

User-agent: *

#Request file from Docur once every second:

Crawl-delay: 1

#Disallowed urls:

#Lets not send bots on a random documentary mission:

Disallow: /random

Disallow: /new-documentaries
#Above is a temp line due to indexing problems.
Disallow: /?page
Disallow: /live-search
Disallow: /vote
Disallow: /favourite
Disallow: /watch-later
Disallow: /save-list
Disallow: /comment
Disallow: /commentlike
Disallow: /commentdislike
Disallow: /add-review
Disallow: /submit-review
Disallow: /add-to/*
Disallow: /post-list
Disallow: /edit-list
Disallow: /documentary-search
Disallow: /new-list-item
Disallow: /settings
Disallow: /notificationread
Disallow: /documentary/*/l
Disallow: */newest
Disallow: */oldest
Disallow: */highest
Disallow: */lowest
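
Before uploading the new file you could also feed it to a parser and confirm the plain Disallow rules behave as intended. Below is a minimal sketch using Python's urllib.robotparser; note that it does not understand the * wildcard lines, so those still need an online tester such as the one linked above.

from urllib.robotparser import RobotFileParser

# Paste the full proposed robots.txt into this string (shortened here for the sketch).
PROPOSED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /random

User-agent: *
Crawl-delay: 1
Disallow: /random
Disallow: /settings
Disallow: /vote
"""

parser = RobotFileParser()
parser.parse(PROPOSED_ROBOTS_TXT.splitlines())

# Plain prefix rules can be checked directly; wildcard rules such as
# "Disallow: /documentary/*/l" are not interpreted by this parser.
for path in ("/random", "/settings", "/vote", "/documentary/example-film"):
    for agent in ("*", "ia_archiver"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
        print(agent, path, verdict)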


