Mobile app version of vmapp.org

Apache directory pages in default setup = duplicate content?

@Angela700

Posted in: #Apache2 #Directory #DirectoryListing #DuplicateContent #Url

I'm trying to figure out whether Google or any other service would treat Apache folder pages as duplicate content if each access attempt uses a random number of slashes. For example, these five URLs would all produce the exact same file listing:
example.com/folder
http://example.com//folder
example.com/folder/
http://example.com/folder//
example.com//folder///

If so, which is the better option: write an Apache module that redirects the URLs above to a single one; write a module that reproduces the directory listing but adds a directive telling robots not to index the pages; use a special Apache directive, if one exists, that solves this issue; or just do nothing and assume Google treats directory listings differently from regular webpages?

I don't want to disable directory listings, though, as some clients with little computing knowledge need access to them.


@Ann8826881

Potentially, having the same resource accessible on multiple URLs (i.e. with multiple slashes) is duplicate content. However, whether this is really a duplicate content problem is another matter.

For it to be a "problem", the search engines need to find references to these URLs before they start crawling them. And it is likely to take a significant number of these "malformed" URLs before they start competing with the preferred URL.

Unless your site is generating (and linking) these malformed URLs (this would definitely need to be fixed) you probably don't need to do anything.

You can see from your access logs whether these malformed URLs are being accessed.
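As a rough illustration of that log check, here is a minimal Python sketch. It assumes the log uses Apache's Common or Combined Log Format; the function name `find_multi_slash_requests` and the sample log lines are made up for the example.

```python
import re

# Matches the quoted request line in a Common/Combined Log Format entry,
# e.g. "GET //folder/ HTTP/1.1". The log format is an assumption.
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (\S+) HTTP/[\d.]+"')


def find_multi_slash_requests(log_lines):
    """Return request targets whose path part (before any '?') contains '//'."""
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like a request entry
        target = match.group(1)
        path = target.split("?", 1)[0]  # ignore the query string
        if "//" in path:
            hits.append(target)
    return hits


# Made-up sample lines, only to show the call.
sample = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /folder/ HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET //folder/// HTTP/1.1" 200 512',
]
print(find_multi_slash_requests(sample))
```

In practice you would pass in the lines of your real access log file instead of `sample`. If nothing (or only the odd bot) shows up, there is probably nothing to fix.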


if in each access attempt a random number of slashes are used.


How would this be happening? This seems highly unlikely. If I saw this in the logs I would suspect a possible bad-bot (after discounting a rogue script) and consider blocking the request.


is there a special apache directive I can use to solve this issue


There is no single Apache directive that I am aware of. When mapping the request to a resource, Apache simply "collapses" the repeated slashes behind the scenes; the slashes are still present in the requested URL. But what would be an "accepted solution"? Both redirecting and rejecting such a malformed request could be valid.
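To make the redirect option concrete, the logic boils down to collapsing runs of slashes and redirecting only when that changes the path. The Python sketch below shows just that decision; it is not an Apache module, and the function names `canonical_path` and `redirect_target` are hypothetical.

```python
import re


def canonical_path(path):
    """Collapse every run of consecutive slashes in a URL path into one slash."""
    return re.sub(r"/{2,}", "/", path)


def redirect_target(path):
    """Return the path to redirect to, or None if the path needs no redirect."""
    cleaned = canonical_path(path)
    return cleaned if cleaned != path else None


for requested in ["//folder///", "/folder//", "/folder/"]:
    print(requested, "->", redirect_target(requested))
```

A request for `//folder///` or `/folder//` would be redirected to `/folder/`, while `/folder/` itself is left alone. The rejecting variant would use the same check but answer with an error status instead of a redirect.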

You could also send a rel="canonical" link in an HTTP response header (the Link header) to resolve any ambiguity.
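For illustration, this is roughly what such a header value looks like and how it could be derived from the requested URL. It is only a Python sketch: the function name `canonical_link_header` is hypothetical, and dropping the query string is an assumption that seems reasonable for directory listings.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_link_header(url):
    """Build a Link header value pointing at the slash-collapsed form of url."""
    parts = urlsplit(url)
    # Collapse repeated slashes in the path only; fall back to "/" if empty.
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    # Assumption: query string and fragment are dropped from the canonical URL.
    canonical = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<{canonical}>; rel="canonical"'


print("Link:", canonical_link_header("http://example.com//folder///"))
```

Every slash variant of the folder URL would then carry the same Link header, so search engines are told which one is preferred without any redirect taking place.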


assume Google treats directory listings differently from regular webpages?


A webpage is a webpage. However, a "directory listing" is likely to be considered low-quality content anyway and is probably only going to be returned in the SERPs (if at all) if the user is specifically looking for it. So whether it appears in the SERPs or not may not be an issue anyway.
