
Site was hacked months ago. How do I remove created pages from Google's listings?

@Nimeshi995

Posted in: #GoogleSearch #GoogleSearchConsole #HackedSite

My site was hacked about 4-6 months ago and, as a result, lots of pages were created on my site. These pages no longer exist and were removed a week after the hack, but they still appear in Google's search results and I have over 1,000 crawl errors in Google Search Console. I thought Google would have removed these pages by now.

The pages all seem to refer to the following path:

mydomain.com/glpkvn[number here]/lity/[number here]

Where [number here] is a randomly generated number.

What is the best way to remove these from the Google search results and also tidy up my Google Search Console?

Thanks




2 Comments


 

@Alves908

While I am a firm believer in a 410 over a 404 response, this depends on Google actually visiting each page one at a time. If your site does not enjoy frequent visits from Googlebot because it is not considered a fresh and frequently updated site, it could take quite some time for Google to find each page before removing it.

When a site is hacked, it is often impractical to remove each URL individually using the Remove URLs option in Google Search Console, though this remains an option, with limitations of course. More on this later.

One potentially faster option is to use the robots.txt file.

Google will fetch the robots.txt file each time it visits your site, provided it has not already fetched a fresh copy within the last 24 hours. This is seen as a reasonable compromise between fetching the robots.txt file on every visit and fetching it too infrequently. Previously there was no standard for this, and there were always detractors who felt the robots.txt file was read either too frequently or not frequently enough. Sometimes Google cannot win.

When the robots.txt file is fetched, it is saved within the index and applied as Googlebot goes about its business. However, there is also a process that applies regular expression (regex) style rules, easily derived from the rules found within the robots.txt file, and removes matching URLs and pages from the index. This is not done immediately, likely to avoid short-term mistakes made by the webmaster; but because robots.txt is taken very seriously as a pivotal rules mechanism for well-behaved robots, Google applies it fairly quickly. It may still take days or weeks, but it is done in bulk.

For this reason, the robots.txt file is often the fastest way to remove URLs, provided they can be matched by a pattern. While not every search engine treats robots.txt directives equally, Google fortunately supports wildcards, which gives you a serious advantage:

User-agent: Googlebot
Disallow: /glpkvn*/


On support.google.com/webmasters/answer/6062596?hl=en&ref_topic=6061961, under "Pattern-matching rules to streamline your robots.txt code", you will see a similar example.
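Because Disallow rules are prefix matches, a plain prefix rule should cover the same URLs and is understood even by crawlers that do not support wildcards. A minimal sketch, assuming every hacked URL begins with /glpkvn:

User-agent: *
Disallow: /glpkvn

Using User-agent: * rather than targeting only Googlebot keeps other well-behaved crawlers away from the hacked paths as well.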

Google does not guarantee that the URLs will be removed and states that removal can take some time:
support.google.com/webmasters/answer/7424835?hl=en&ref_topic=6061961#h17 support.google.com/webmasters/answer/7424835?hl=en&ref_topic=6061961#h18
However, it has been my experience that this method works and works faster than waiting for Google to fetch each page one at a time.

One warning: if you block Google from fetching these pages via the robots.txt file, Google will never see a 404 or 410 error for them. You have to choose one method or the other. Google does recommend using Google Search Console to remove URLs.

I prefer to wait for Google to remove pages naturally using a 404. A 410 is faster, since a 404 is retested several times before the page is removed. However, given that your site has been hacked and these pages remain within the search results, it may be wise to remove the pages using another method. I have personally removed pages in bulk using the robots.txt method, though it was a couple of years ago. Which one you use is up to you.



 

@LarsenBagley505

What is the best way to remove these from the Google search results


Make the affected pages return a 410 HTTP status code.

You can use the Apache mod_rewrite module (or an equivalent) to write a server configuration rule that checks whether a certain pattern exists in the URL and, if it does, returns a 410 response to the user.

If your web server is Apache, create a file named .htaccess in the document root folder and, depending on your specific situation, add any of these rules (mod_rewrite must be enabled, and the file needs a RewriteEngine On line before the first rule):

RewriteEngine On
RewriteRule ^glpkvn([0-9]+)/lity/([0-9]+)$ - [R=410,L]


This rule (above) checks whether the URL is example.com/glpkvn####/lity/#### (where #### is any number of numerical digits); if there's a match, rule processing stops and the server responds with a 410 status.

RewriteRule ^glpkvn(.*)$ - [R=410,L]


This rule checks whether the URL starts with example.com/glpkvn and, if it does, returns a 410 response.

If you want the match to be case insensitive (so the URL can start with example.com/glpkvn or example.com/GLPkvn), add the NC flag like so:

RewriteRule ^glpkvn(.*)$ - [R=410,L,NC]


The reason to use a 410 status is that 410 means the resource is gone permanently, which signals to Google that it can drop the page from its index without the repeated re-checks a 404 normally gets.
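If mod_rewrite is not available, mod_alias can return the same status with a single directive. This is only a sketch, assuming mod_alias is enabled and that every hacked URL begins with /glpkvn; it goes in the same root .htaccess file:

RedirectMatch 410 ^/glpkvn

Either way, it is worth requesting one of the affected URLs afterwards and confirming that the server answers with 410 Gone rather than 404.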


and also tidy up my Google Search Console?


Do the above steps first; then, when you access Search Console, use the Remove URLs option to remove the bad URLs.


