Why treat URLs with different path capitalization and trailing slash as different?
These are all strictly different URLs:
http://www.example.com/page
http://www.example.com/pAge
http://www.example.com/page/
http://www.example.com/paGE/
I get that it conforms to the strict ISO rules, but why? How many websites are there out there that actually treat page and page/ as different URLs you can visit? Or actually use capitalisation to differentiate content? If they did I would tell them they are probably doing it wrong.
Why do we have to waste our time conforming to these rules? Isn't it quite trivial for Google to work out that page and page/ are the same page and probably shouldn't be treated as duplicate content?
More posts by @Moriarity557
No offense intended, but case sensitivity is vital to URLs today. Case-sensitive short links are used millions of times a day:
http://bit.ly/ri2LhQ
http://bit.ly/ri2LHq
These are two vastly different sites, and that is only possible because of case sensitivity.
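To put a rough number on why a link shortener cares, here is a small back-of-the-envelope sketch. It assumes a 6-character code drawn from letters and digits (matching the length of ri2LhQ); the real alphabet and code length a given shortener uses are not something I know, so treat the figures as illustrative only.

```python
import string

# Assumed alphabets for a hypothetical 6-character short code.
case_sensitive = string.ascii_letters + string.digits      # a-z, A-Z, 0-9: 62 symbols
case_insensitive = string.ascii_lowercase + string.digits  # a-z, 0-9: 36 symbols

CODE_LENGTH = 6  # same length as "ri2LhQ"

sensitive_codes = len(case_sensitive) ** CODE_LENGTH
insensitive_codes = len(case_insensitive) ** CODE_LENGTH

print(sensitive_codes)                       # 56800235584
print(insensitive_codes)                     # 2176782336
print(sensitive_codes // insensitive_codes)  # 26

# The two example codes differ only by case, so a case-insensitive
# scheme would collapse them into a single link.
collide_without_case = "ri2LhQ".lower() == "ri2LHq".lower()
print(collide_without_case)                  # True
```

Under those assumptions, folding case throws away roughly 26 out of every 27 possible codes of the same length.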
I get that it conforms to the strict ISO rules, but why?
Different operating systems sit behind the various servers on the net, and for some of them a directory or file named page is not the same as one named Page. Those really are two different locations, and not even necessarily the same type of location (directory or page). The web server might be configured as case-insensitive, but you can't assume that. Therefore, the rules have to assume that case matters; if a particular server doesn't care, nothing breaks. Realistically, it's probably not a great idea to rely on case differences, but the situation does exist and so it has to be accounted for, sometimes with things like mod_speling.
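For what it's worth, the generic URI syntax (RFC 3986) treats the scheme and host as case-insensitive and leaves the rest of the URL case-sensitive. A minimal sketch of what a client can safely conclude from the URL text alone might look like this; the function name same_resource_by_syntax is made up for illustration, and it deliberately ignores finer points such as default ports and percent-encoding.

```python
from urllib.parse import urlsplit

def same_resource_by_syntax(a: str, b: str) -> bool:
    """Hypothetical helper: compare two URLs without asking the server.

    Scheme and host ignore case; path and query are compared exactly,
    because only the server knows whether it treats them as equal.
    """
    ua, ub = urlsplit(a), urlsplit(b)
    return (
        ua.scheme.lower() == ub.scheme.lower()
        and (ua.hostname or "") == (ub.hostname or "")  # .hostname is lowercased
        and ua.port == ub.port
        and ua.path == ub.path
        and ua.query == ub.query
    )

base = "http://www.example.com/page"
print(same_resource_by_syntax(base, "HTTP://WWW.EXAMPLE.COM/page"))   # True
print(same_resource_by_syntax(base, "http://www.example.com/pAge"))   # False
print(same_resource_by_syntax(base, "http://www.example.com/page/"))  # False
```

A crawler that guessed "same page" for the last two would be right on many servers and wrong on others, which is the whole problem.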
How many websites are there out there that actually treat page and page/ as different URLs you can visit?
They are different. It's just almost always hidden from you:
1. When you go to example.com/foo/ the web server knows you are asking for a directory, so it looks in there for a file matching whatever it's configured to recognize as a directory index. So you eventually end up at example.com/foo/index.html, for example.
2. If you go to example.com/foo the server actually looks for a file named just foo in the root directory. If it doesn't find one, it checks whether there's a directory named foo, and if so you go back to #1.
What you seem to be reading as "normal" behavior in #2 is actually a fallback to handle a likely case.
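The two cases can be sketched roughly like this. It is a toy model of a static file server, not how any particular server is implemented: resolve, INDEX_NAMES, and the foo/bar layout are all invented for the example, and it skips things a real server must handle (path traversal, permissions, and so on).

```python
import tempfile
from pathlib import Path

INDEX_NAMES = ("index.html",)  # assumed directory-index setting

def resolve(docroot: Path, url_path: str) -> tuple[int, str]:
    """Return (status, path) for a request, mimicking a simple static server."""
    target = docroot / url_path.lstrip("/")
    if url_path.endswith("/"):
        # Case 1: the client asked for a directory, so look for an index file.
        if target.is_dir():
            for name in INDEX_NAMES:
                if (target / name).is_file():
                    return 200, url_path + name
        return 404, url_path
    # Case 2: no trailing slash, so try a file with exactly that name first.
    if target.is_file():
        return 200, url_path
    # Fallback: a directory with that name exists, so send the client to case 1.
    if target.is_dir():
        return 301, url_path + "/"
    return 404, url_path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "foo").mkdir()
    (root / "foo" / "index.html").write_text("directory index")
    (root / "bar").write_text("extension-less file")

    results = [resolve(root, p) for p in ("/foo/", "/foo", "/bar", "/bar/")]

for result in results:
    print(result)
# (200, '/foo/index.html')
# (301, '/foo/')
# (200, '/bar')
# (404, '/bar/')
```

Note that /bar and /bar/ give different answers here: with an extension-less file in play, the slash really does change what you are asking for.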
How many sites actually use extension-less filenames is irrelevant. Again: it's a real case, so it needs to be accounted for.
If they did I would tell them they are probably doing it wrong.
That is an opinion.
You can back it up with various practical arguments about case-insensitivity and how to handle extension-less URLs that I don't necessarily disagree with, but factually you would be wrong to say this.
This is not a Google policy; these are basic rules.
From a Windows user's point of view, case-sensitive filenames are difficult to understand. However, under Unix/Linux systems, pAge and page are not the same file or directory, and the same goes for web servers running on them.
The trailing slash is a configuration issue (or choice).
Keep in mind that on most web servers, the server will issue a 30x redirect from /page to /page/, thus requiring a second request to your server.
You can make your web server case-insensitive and configure it in any way you want to comply with your own rules.
But again, it is not related to Google at all.
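If you do decide your own site should treat all of those variants as one page, one way to think about it is a single canonical form plus a redirect for everything else. This is only a sketch of one possible policy (lowercase path, no trailing slash); canonical_redirect is a made-up name, and a site that relies on case, like a link shortener, obviously could not use it.

```python
from typing import Optional

def canonical_redirect(path: str) -> Optional[str]:
    """Return the path to redirect to, or None if path is already canonical.

    Assumed policy for this example: lowercase path, no trailing slash
    (except for the site root "/").
    """
    canonical = path.lower()
    if len(canonical) > 1 and canonical.endswith("/"):
        canonical = canonical.rstrip("/") or "/"
    return None if canonical == path else canonical

for requested in ("/page", "/pAge", "/page/", "/paGE/"):
    print(requested, "->", canonical_redirect(requested))
# /page -> None
# /pAge -> /page
# /page/ -> /page
# /paGE/ -> /page
```

Every non-canonical request still costs the extra round trip mentioned above, so links inside the site should already point at the canonical form.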