
GWT: Generate more complete crawl error report

@Holmes151

Posted in: #CrawlErrors #GoogleSearchConsole #Reporting

I'm a developer in charge of managing Webmaster Tools and related issues (including correcting crawl errors) for dozens (hundreds, maybe?) of active sites. As part of my duties, I create a report of every discrepancy, including all pages generating a 404 and all pages that link to those pages.

Currently within Webmaster Tools I'm able to download a CSV file of all pages with a 404 response, but I then have to manually click on every single one of those links and copy the "linked from" field to paste into my spreadsheet. This is extremely tedious and seems unnecessary; I would expect to be able to download all that data at once. Ultimately I'm looking for one CSV file that lists every URL with a 404 along with every URL that links to each of them.

Am I overlooking this functionality somewhere or does anyone have a good solution?


Edit 1 (2/11/2013):


Example of what the csv output looks like now:

URL,Response Code,News Error,Detected,Category
www.abcdef.com/123.php,404,,11/12/13,Not found
www.abcdef.com/456.php,404,,11/12/13,Not found


Which is great, but let's say 123.php has 5 pages that link to it. Now I have to duplicate that row in my spreadsheet 4 more times, then go into Webmaster Tools, get all the URLs that link to the page, and add that data to my spreadsheet.

The output I would prefer:

URL,Response Code,Linked From,News Error,Detected,Category
www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage1.php,,11/12/13,Not found
www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage2.php,,11/12/13,Not found
www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage3.php,,11/12/13,Not found
www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage1.php,,11/12/13,Not found
www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage2.php,,11/12/13,Not found
www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage3.php,,11/12/13,Not found


Note the (hypothetical) addition of a "Linked From" column, as well as the fact that there are still only 2 unique URLs (like before) but all of the "Linked From" pages are shown in one report.
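The row duplication described above can be scripted once the linking pages have been collected. Below is a minimal Python sketch of that one step, not a way to get the data out of Webmaster Tools. The function name `expand_report` and the `linked_from` mapping are assumptions made for illustration; how the mapping gets filled (by hand, from a crawler export, from logs) is left open.

```python
import csv
import io


def expand_report(export_csv, linked_from):
    """Return CSV text with one row per (broken URL, linking page) pair.

    export_csv:  text of the crawl-error export, header row included.
    linked_from: hypothetical dict mapping a broken URL to the list of
                 pages that link to it, collected separately.
    """
    reader = csv.DictReader(io.StringIO(export_csv))
    # Put the new column right after "Response Code", as in the preferred output.
    fields = list(reader.fieldnames)
    fields.insert(fields.index("Response Code") + 1, "Linked From")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # URLs with no known linking page are kept as one row with an empty cell.
        for source in linked_from.get(row["URL"]) or [""]:
            writer.writerow({**row, "Linked From": source})
    return out.getvalue()


# Sample data taken from the example rows in the question.
export = (
    "URL,Response Code,News Error,Detected,Category\n"
    "www.abcdef.com/123.php,404,,11/12/13,Not found\n"
    "www.abcdef.com/456.php,404,,11/12/13,Not found\n"
)
links = {
    "www.abcdef.com/123.php": [
        "http://www.ghijkl.com/naughtypage1.php",
        "http://www.ghijkl.com/naughtypage2.php",
    ],
}
print(expand_report(export, links))
```

Each broken URL is repeated once per linking page, so the result can be sorted or filtered on either column in a spreadsheet.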


Edit 2 (2/12/2013):


To clarify, my question is less about detecting and correcting 404s and more about generating a report of what Google has listed as errors. Oftentimes these errors aren't even valid anymore, but I still need documentation to show that Google detected a problem and that the problem is now fixed.
Many of the "linked from" URLs I find are actually outdated, cached resources. For example, I'll frequently see that the linked-from URL is the sitemap, which is actually an old sitemap cached by Google that points to an old page. Neither the sitemap nor the old page exists anymore, but they still appear in my crawl error reports because they are cached resources.


3 Comments


@Phylliss660

There is another tool out there that I use called Screaming Frog. It's kind of a Swiss Army knife - www.screamingfrog.co.uk/seo-spider/

From the way you describe your duties, you would need the commercial version, as the free version is capped. One key feature is that it crawls your site and lists URLs by response code, whether that's a 404 or something else. You can also see the referencing links and export the data. There are options that let you obey or ignore your site's robots.txt file.

And if you use Stephen's suggestion about reading the log file, you might want to check out Splunk.com. It lets you dig deep into the logs and create reports.


@Heady270

Instead of relying on Webmaster Tools, you can get your own list of 404 errors from your server's log files.

I find that the most important 404 errors are the ones that users actually encounter when clicking on a link. Those errors typically have a referrer associated with them. I generate separate reports of errors with referrers and without. Both reports get sorted by number of occurrences in the log file. The whole process can be automated easily with a bit of shell scripting.
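As a rough illustration of the two reports described above, here is a minimal Python sketch (used in place of shell scripting). It assumes the server writes the common Apache/nginx "combined" log format; the regular expression, the function name `count_404s`, and the sample lines are all made up for the example and would need adjusting for a real log.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def count_404s(lines):
    """Return two Counters: 404s that had a referrer, and 404s that had none."""
    with_ref = Counter()
    without_ref = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group("status") != "404":
            continue
        referrer = match.group("referrer")
        if referrer and referrer != "-":
            with_ref[(match.group("path"), referrer)] += 1
        else:
            without_ref[match.group("path")] += 1
    return with_ref, without_ref


# Made-up sample lines in combined format.
sample = [
    '1.2.3.4 - - [11/Dec/2013:10:00:00 +0000] "GET /123.php HTTP/1.1" 404 512 "http://www.ghijkl.com/naughtypage1.php" "Mozilla/5.0"',
    '1.2.3.5 - - [11/Dec/2013:10:01:00 +0000] "GET /123.php HTTP/1.1" 404 512 "http://www.ghijkl.com/naughtypage1.php" "Mozilla/5.0"',
    '1.2.3.6 - - [11/Dec/2013:10:02:00 +0000] "GET /456.php HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '1.2.3.7 - - [11/Dec/2013:10:03:00 +0000] "GET /index.php HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

with_ref, without_ref = count_404s(sample)
# most_common() sorts by number of occurrences, highest first.
for (path, referrer), hits in with_ref.most_common():
    print(hits, path, referrer)
for path, hits in without_ref.most_common():
    print(hits, path)
```

Because `count_404s` accepts any iterable of lines, an open log file can be passed to it directly instead of the sample list.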


@Nimeshi995

Your question is a little vague, and it's hard to tell exactly what the problem is. If my answer misses the point, it might be wise to edit your question to state more directly what you can't do and what you want to happen.

Downloading 404's from Google Webmaster Tools

As you know, you can download all the URLs using Google Webmaster Tools. What you may not know is that you don't need to tick any check boxes or increase the view to 500 rows; it downloads the lot as CSV. CSV files are plain text files with columns and rows, nothing fancy, so as you would expect they don't contain href-based hyperlinks, only text. Normally you can turn an entry into a link by editing the cell and pressing Enter, after which it becomes clickable, but that is daunting if you have thousands to do.

Converting CSV Text Links to Click-able Links in Excel 2010

You also didn't mention which program you're using, and the method varies between spreadsheet programs. I'm going to use Excel, as it's generally the most commonly used; if you're using something else you'll need to Google it, but at least you'll know where to start.

In Excel and other spreadsheet programs you can write macros that do pretty much anything you want, within the limits of what their coding supports. In Excel you can easily convert all text links to clickable links by running a macro.

Finding the Macros Section in Excel 2010

Finding the Macros section (or anything else, really) in Excel 2010 is annoying, but it's under...


View tab > Macros (far right)


Making your own Macro

You want to make your own macro to convert the text links into hyperlinks, so make sure the spreadsheet is loaded and highlight all the entries that you want to turn into links.

Go to the Macros button and click View Macros.

Type in a macro name such as "HyperAdd" (macro names can't contain spaces) and click the Create button.

Once you're in the macro editor, highlight everything and press the Delete key (assuming you don't already have macros of your own in there).

Then paste in this code:

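Sub HyperAdd()
    ' Google Webmaster Tools CSV Hyperlink Maker
    ' Turns every non-empty cell in the current selection into a clickable hyperlink.
    Dim xCell As Range
    Dim linkAddress As String
    For Each xCell In Selection
        linkAddress = Trim(xCell.Value)
        If Len(linkAddress) > 0 Then
            ' Exported URLs may lack a scheme; add one so Excel treats them as web links.
            If InStr(linkAddress, "://") = 0 Then linkAddress = "http://" & linkAddress
            ActiveSheet.Hyperlinks.Add Anchor:=xCell, Address:=linkAddress
        End If
    Next xCell
End Sub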


Then look for the small green play icon in the same window. As long as all the text links are still highlighted, clicking it will turn them into clickable hyperlinks.

The macro is saved with the spreadsheet for later use, provided you save the workbook in a macro-enabled format (.xlsm); a CSV file can't store macros.
