Does Googlebot add any parameters to a URL when crawling?

@Heady270

Posted in: #Googlebot #UrlParameters

I have set up a memory caching system that will use the URL as a key to cache a copy of an HTML page in memory to make it faster. This brings page load time down from around 4 seconds to under a second.

The flaw is that if someone adds something like ?test=1 to the URL, it will miss the memory cache. That is mostly fine, since several pages on the site, such as filters, rely on URL variables to make the page unique.

My issue is that I am worried Google's spiders add custom variables to the URL when crawling, causing them to miss the cache. To get the best possible search ranking, I want the page to load as quickly as possible. I was thinking I could filter out specific URL parameters when checking the cache, so that something like a timestamp added by a spider wouldn't cause a cache miss.
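
Roughly what I have in mind, as a minimal sketch (assuming APCu as the in-memory store; renderPage() and the names in $ignored are placeholders for my real code):

    <?php
    // Parameters that should not make a page unique for caching purposes.
    $ignored = ['test', 'timestamp'];

    function cacheKeyFor(string $uri, array $ignored): string
    {
        $parts  = parse_url($uri);
        $params = [];
        if (!empty($parts['query'])) {
            parse_str($parts['query'], $params);
            // Drop the ignored parameters before building the key.
            $params = array_diff_key($params, array_flip($ignored));
            ksort($params); // ?a=1&b=2 and ?b=2&a=1 share one cache entry
        }
        $query = $params ? '?' . http_build_query($params) : '';
        return ($parts['path'] ?? '/') . $query;
    }

    $key  = cacheKeyFor($_SERVER['REQUEST_URI'], $ignored);
    $html = apcu_fetch($key, $hit);
    if (!$hit) {
        $html = renderPage();         // stands in for my real page generation
        apcu_store($key, $html, 300); // keep the cached copy for five minutes
    }
    echo $html;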

Anyone know what URL parameters, if any, Googlebot will add when crawling a web page?


2 Comments


@Nickens628

To answer your basic question: Googlebot only crawls URLs to your site that it finds on your own pages or on other domains that link to you. It doesn't invent any "magical" URLs of its own.


I have set up a memory caching system that will use the URL as a key to cache a copy of an HTML page in memory to make it faster.


You probably want to reword your question to: "How do I make the page load faster in all environments?"

First, let's address a big issue you raised in the comments:


Apache is adding latency, usually around 600ms, which I can't seem to reduce despite my best efforts, so I have had to focus on making up for it on the PHP side of things.


PHP can't compensate for Apache speed issues. In fact, a bad configuration in any web server program can slow things down.

Test your configuration locally. Log in to the same server where the webpages are stored via a shell and try to access one of your pages. If your server is Linux-based, use a command like wget <insert URL> and you'll get basic stats on how long the page took to download and process. Convert that time to seconds: the result should be under about 20ms (0.020 seconds). If it is above 50ms (0.05 seconds), create an HTML-only page on the same server and try accessing that page instead. If those numbers look normal, your PHP code needs serious optimization. If they are still very high, reconfigure Apache itself and remove any modules you don't need.
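
If it helps, here is a rough way to time a page from the server itself using PHP, assuming the curl extension is installed (run it on the server with php time_check.php; the URL is a placeholder for one of your own pages):

    <?php
    $url = 'http://localhost/some-page'; // placeholder: a page on this host

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print the body
    curl_exec($ch);

    $ttfb  = curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME); // time to first byte
    $total = curl_getinfo($ch, CURLINFO_TOTAL_TIME);         // whole transfer
    curl_close($ch);

    printf("Time to first byte: %.3fs, total: %.3fs\n", $ttfb, $total);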

If you really want to dig into the details of why a page loads so slowly and you're an advanced enough computer user, log in to the server and watch the application run internally. On Linux, the strace program lets you do just that by tracing the system calls a running process makes.

If your local speed is still too slow even after all this advice, consider having the server's disk and memory checked for errors (including data corruption); hardware faults will slow down every service on the server.

Once speed is OK on the server, go to webpagetest.org and test your pages from different locations around the world. Start with a location near the server and you should see your pages load very quickly. The site also offers suggestions when your grades are low.

And please let browsers cache all static content by sending appropriate HTTP headers, so the exact same image isn't re-downloaded from the server on every page visit. Redbot.org can help you determine whether your HTTP caching setup is correct and optimized.
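
If any of your static assets happen to be served through a PHP script rather than directly by Apache, a minimal sketch of the headers involved could look like this (for files Apache serves itself, mod_expires and mod_headers do the same job in the server configuration; the path is illustrative):

    <?php
    $path = __DIR__ . '/assets/logo.png'; // illustrative asset path

    header('Content-Type: image/png');
    header('Cache-Control: public, max-age=31536000'); // let browsers keep it for a year
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($path)) . ' GMT');
    readfile($path);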


@Holmes151

Googlebot does not add any additional URL parameters of its own when it crawls your site.

The "complete" URLs that Googlebot crawls (which may or may not include URL parameters) are URLs that have been discovered, either on your site or on external sites that link to you.

If you find Googlebot crawling unexpected URLs or URL parameters, it may indicate a misconfiguration on your own site, or that some other site is maliciously linking to keyword-rich URLs on your domain (if your site is susceptible to that) in order to manipulate your SEO.

A related question, not necessarily relevant to you unless you are using tracking parameters, although these are exactly the kind of URL parameters your caching algorithm could ignore:


Is there a set of well-known tracking parameters besides utm_*?
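
For illustration only, a non-exhaustive starting point for such an ignore list could look like this (these are commonly used tracking parameters, but verify them against your own analytics setup before stripping anything):

    <?php
    $knownTracking = [
        'gclid',   // Google Ads click ID
        'fbclid',  // Facebook click ID
        'msclkid', // Microsoft Advertising click ID
        'mc_eid',  // Mailchimp email ID
    ];

    function isTrackingParam(string $name, array $known): bool
    {
        // Covers utm_source, utm_medium, utm_campaign, utm_term, utm_content, ...
        return strncmp($name, 'utm_', 4) === 0 || in_array($name, $known, true);
    }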
