Google is displaying a URL in my site's index with the cache of that result being completely different
Ok, this may turn out to be a silly question but I have observed Google displaying a URL in the index with the cache of that result being completely different when it shouldn't even be there in the first place.
Description:
I built a random function for the website docur.co
The function initiates with a request to:
docur.co/random
Robots are blocked from this URL in my robots.txt:
docur.co/robots.txt
However Google has followed this URL and produced the following search result:
This is the cache:
My question is: can anyone tell me what exactly is going on here? As I said, I may have done something wrong...
Update:
Maybe adding rel="nofollow" directly to the anchor, on top of the robots directive, will ensure that Google does not follow the URL?
1 Comments
You have an error in your robots.txt file.
On line 11 you have Allow: /. The original robots.txt standard has no way to say which files and directories are allowed, only which are disallowed; the only directives it defines are "User-agent" and "Disallow". Allow is a later extension, and not every crawler handles it the same way.
Because the Disallow: /random directive comes after the Allow line, it is possible that Googlebot hit a directive it could not process and then ignored the rest of the robots.txt file, as if it did not exist at all.
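How a parser reacts to an Allow line depends on the parser. As an illustrative sketch only: the code below uses Python's standard-library urllib.robotparser, not Googlebot, and the two file layouts are hypothetical because the original robots.txt is not reproduced here. This particular parser accepts Allow and applies the first matching rule in a group, so the order of the two lines decides whether /random is blocked.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical layout: the blanket Allow comes before the Disallow.
ALLOW_FIRST = """\
User-agent: *
Allow: /
Disallow: /random
"""

# Hypothetical layout: the Disallow comes first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /random
Allow: /
"""

if __name__ == "__main__":
    url = "http://docur.co/random"  # scheme is an assumption
    print(can_fetch(ALLOW_FIRST, url))     # True: "Allow: /" matches first
    print(can_fetch(DISALLOW_FIRST, url))  # False: "Disallow: /random" matches first
```

Other crawlers may resolve the same two lines differently, so this shows only that rule order can matter, not what Google did.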
You can validate your robots.txt file using a tool such as the one located at tool.motoricerca.info/robots-checker.phtml
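If you would rather check locally, a minimal sketch using Python's standard-library urllib.robotparser looks like this. The excerpt of rules, the "Googlebot" agent string, the http scheme and the helper name blocked_urls are assumptions made for illustration, and this parser does not interpret "*" wildcards inside paths, so only plain prefix rules are tested.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of a robots.txt with plain prefix rules only (no "*" wildcards,
# which urllib.robotparser does not interpret).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /random

User-agent: *
Crawl-delay: 1
Disallow: /random
Disallow: /new-documentaries
Disallow: /live-search
"""


def blocked_urls(robots_txt, agent, urls):
    """Return the URLs that `agent` may not fetch according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    urls = [
        "http://docur.co/",
        "http://docur.co/random",
        "http://docur.co/live-search",
    ]
    # Prints the subset of `urls` that the rules above block for this agent.
    print(blocked_urls(ROBOTS_TXT, "Googlebot", urls))
```

A check like this tells you how one parser reads the file; it does not guarantee that every crawler reads it the same way.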
As for why the cached version differs from the live version: the cached version is what Google saw when its spider crawled the page, which for your cached link was 6 April 2016 at 16:05:27 GMT.
A new version of your robots.txt file which you could use is...
#The date is August 29th, 1997.
#Robots have taken over the world and documentaries cease to be created by humans.
#what will happen next?
#Want to join the Docur team?
#E-mail jonbonsilver//at//gmail//dot//com
#Full access for the internet archive.
User-agent: ia_archiver
Disallow: /random
#Every robot that honours the robots.txt standard:
User-agent: *
#Request file from Docur once every second:
Crawl-delay: 1
#Disallowed urls:
#Let's not send bots on a random documentary mission:
Disallow: /random
Disallow: /new-documentaries
#Above is a temp line due to indexing problems.
Disallow: /?page
Disallow: /live-search
Disallow: /vote
Disallow: /favourite
Disallow: /watch-later
Disallow: /save-list
Disallow: /comment
Disallow: /commentlike
Disallow: /commentdislike
Disallow: /add-review
Disallow: /submit-review
Disallow: /add-to/*
Disallow: /post-list
Disallow: /edit-list
Disallow: /documentary-search
Disallow: /new-list-item
Disallow: /settings
Disallow: /notificationread
Disallow: /documentary/*/l
Disallow: */newest
Disallow: */oldest
Disallow: */highest
Disallow: */lowest