
How does Google discover and index URLs that aren't linked or in the sitemap?

@Reiling115

Posted in: #Google #GoogleIndex #Indexing #Seo

I can see that multiple URLs from my website have been crawled by Google; I can see this by using the site: operator in Google Search.

I was wondering: what are all the possible places from which Google picks up these URLs? I checked, and many of my crawled URLs are not in the sitemap, and we haven't linked to these URLs from any other page either. How would Google discover such content?

Is there any way I can see all of my Google-indexed URLs and get information about how Google discovered those pages?




2 Comments


 

@Cugini213

I recently had the same issue and was puzzled about how Google knew about an internal URL on my site.

The directory in question for me was /piwik (an open-source alternative to Google Analytics).

Google also crawls URLs it finds in your source files (such as your HTML). If there are URLs in there, for example in <meta> tags or inside <script> blocks, Google will crawl and index them.
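As a rough sketch of what that looks like, a page's source can expose URLs that never appear as visible links. Everything below (the domain, paths, and variable name) is a placeholder, not taken from the original post:

```html
<head>
  <!-- Googlebot can pick up URLs from places like these, even though
       no clickable <a> link points at them: -->
  <meta property="og:url" content="https://example.com/hidden-page/">
  <link rel="alternate" href="https://example.com/feed.xml">
  <script>
    // URL strings embedded in JavaScript can also be discovered
    var trackerUrl = "https://example.com/piwik/piwik.php";
  </script>
</head>
```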



 

@Nimeshi995

There are many places Google can go to find your site's pages. Your sitemap, and what's linked on your live site, are only a small part of it. Your XML sitemap is merely a signal to Google, Bing, and other search engines to index your most important pages and to take note of new content (if you're using a CMS with a plugin that automatically updates the sitemap).
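For context, a minimal XML sitemap is just a list of the URLs you want search engines to prioritize; the domain and date below are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/important-page/</loc>
    <lastmod>2016-01-15</lastmod>
  </url>
</urlset>
```

Crucially, it is only a hint: Google can, and routinely does, index URLs that never appear in it.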

When Google gets into your site, it follows all kinds of links, not just page-level links. It can index files, taxonomies, multiple versions of pages... In a CMS like Drupal, where everything is a node, it can even index portions of pages.

This is why it's important to know your CMS and how it works on the back end. You have to use a combination of noindex meta tags, canonicalization, redirects, robots.txt, and Google Search Console / Bing Webmaster Tools to control what is and isn't crawled and indexed.
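As a sketch of how those controls fit together (the domain and paths are placeholders, not a recommendation for any particular site):

```
# robots.txt — blocks crawling of a path, but does NOT guarantee
# de-indexing: Google may still index a blocked URL it learns of elsewhere
User-agent: *
Disallow: /piwik/
```

And on pages that should stay out of the index:

```html
<!-- noindex: the page may be crawled, but engines are asked not to index it -->
<meta name="robots" content="noindex">
<!-- canonical: consolidates duplicate versions onto one preferred URL -->
<link rel="canonical" href="https://example.com/preferred-page/">
```

Note that noindex only works if the page is crawlable; blocking a page in robots.txt while also giving it a noindex tag means Google may never see the tag.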

Using Search Console to look at inbound links, Moz's Open Site Explorer to analyze the link profile of any individual page, and a tool like Screaming Frog SEO Spider (the first is free; the second and third are freemium) will let you analyze both internal and external links. Between these, you should be able to diagnose the source.


