Stop bots from crawling old links with extensions

@Cooney921

Posted in: #Googlebot #RobotsTxt #WebCrawlers

I've recently switched to MVC3, which uses extensionless URLs, but Google and Bing have a wealth of old links that they are still crawling, and those pages no longer exist.

So I'm trying to find out if there is a way, via robots.txt or some other method, to tell Google/Bing that any link ending in an extension is no longer valid. Is this possible?

On pages that I'm concerned a user may have saved as a favorite, I'm displaying a 404 page that lists links to the new pages (I decided not to just redirect them, as I don't want to maintain redirects forever). For Google's and Bing's sake, I do have the canonical tag in the header.
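For reference, the canonical tag looks like the following in the page head (the URL here is just a placeholder for one of the new extensionless pages):

<link rel="canonical" href="http://example.com/products/list" />

Here is my current robots.txt: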

User-agent: *
Allow: /
Disallow: /*.*


EDIT: I just added the third line (shown above) and it appears to do what I want: allow a path, but disallow a file. Can anyone confirm this?


1 Comment


@Cofer257

First, the "Allow" directive in your robots.txt does nothing, since robots spider everything by default.

Blocking robots from *.* is probably OK in some situations, but remember that you are blocking every URL that simply contains a dot. A more reliable method is to block individual extensions (if there are not too many), e.g. *.html and *.php on separate lines, as sketched below.
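A minimal robots.txt sketch of that approach (the extensions listed are examples; substitute whichever ones your old URLs actually used; note that the * wildcard and the $ end-of-URL anchor are extensions to the original robots.txt standard, but both Google and Bing support them):

User-agent: *
Disallow: /*.html$
Disallow: /*.php$
Disallow: /*.aspx$

The $ anchor matches only URLs that end with the extension, so URLs that merely contain a dot elsewhere remain crawlable.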

The preferred method of moving to new pages is a 301 redirect, which should always be used unless it is technically difficult. (Although they are 'permanent' redirects, you do not need to maintain them forever: a few months is fine.) It's better for users too, as they get a seamless experience.
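As a rough sketch of what that could look like in MVC3 (the controller, action, and route names here are hypothetical, and in practice you would constrain the catch-all route so it only matches the old extension URLs):

using System.Web.Mvc;

// Hypothetical catch-all controller for legacy extension URLs.
// Assumes a route registered last in Global.asax, e.g.:
// routes.MapRoute("Legacy", "{*path}",
//     new { controller = "Legacy", action = "Go" });
public class LegacyController : Controller
{
    public ActionResult Go(string path)
    {
        // Drop the old extension (.html, .aspx, etc.) and issue a
        // permanent (301) redirect to the extensionless URL.
        var extensionless = "/" + System.IO.Path.ChangeExtension(path ?? "", null);
        return RedirectPermanent(extensionless);
    }
}

RedirectPermanent sends a 301 status code, so search engines transfer the old URL's ranking to the new one.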


