Robots.txt Disallow command

@Kaufman445

Posted in: #DuplicateContent #RobotsTxt #SearchEngines

How do I use robots.txt to disallow folders which are being crawled due to a bad URL structure? They're currently causing a duplicate page error.

The URL that has been crawled incorrectly is:
abc.com/forum/index.php?option=com_forum

However, the correct page is:
abc.com/index.php?option=com_forum

Is robots.txt the correct way of excluding this? I'm thinking about using:

Disallow: /forum/


Won't that also block legitimate content in the /forum/ folder of my site?


1 Comment


@Dunderdale272

If abc.com/forum/index.php?option=com_forum is the only URL being crawled incorrectly, then you can just add Disallow: /forum/index.php to your robots.txt file. Adding Disallow: /forum/ would tell robots to ignore everything in that directory, which doesn't sound like what you want.
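For example, a minimal robots.txt along those lines (assuming you want the rule to apply to all crawlers) might look like:

User-agent: *
Disallow: /forum/index.php

Disallow rules match by URL prefix, so this covers /forum/index.php itself plus every query-string variant such as /forum/index.php?option=com_forum, while leaving other paths under /forum/ crawlable.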

You could also add a 301 redirect from abc.com/forum/index.php?option=com_forum to abc.com/index.php?option=com_forum to tell robots what the URL should be. This will also help any users who accidentally land on the wrong URL.
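If the site runs on Apache with mod_rewrite enabled (an assumption, but the URL pattern looks like a typical Joomla setup), one way to sketch that redirect is in the .htaccess file at the document root; the RewriteCond is needed because RewriteRule alone can't match the query string:

RewriteEngine On
# only fire when the query string is exactly option=com_forum
RewriteCond %{QUERY_STRING} ^option=com_forum$
# 301 from /forum/index.php to /index.php; Apache re-appends the original query string
RewriteRule ^forum/index\.php$ /index.php [R=301,L]

Because the substitution contains no "?" of its own, Apache carries the original query string over, so the redirect lands on /index.php?option=com_forum. Adjust the RewriteCond pattern if other query parameters can appear on the bad URL.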
