
Google News Crawl Errors

@Heady270

Posted in: #GoogleNews #GoogleSearchConsole

Recently, one of our clients had their news listings taken off Google News because of a large number of crawl errors. Our attempts to get them back by fixing most of the errors reported in our dashboard were successful; however, we have now received a strange-looking error:


The article body that we extracted from the HTML page appears to be too long to be a news article. We generated this error to avoid including what might be an incorrect piece of text. Common causes include news articles that contain user-contributed comments below the article, or HTML layouts that contain other material besides the news article itself.


This error has been reported for a page that has no user-contributed content; even the advertisements and related content are now loaded via AJAX. I am trying to find out how we can solve this error.

@Berryessa370

I'm very confused by the structure of your site, based on Inspect Element. Different sections of your site, such as the sidebars, footers, comments, and other-stories sections, appear to be strangely mixed together; this may be what is confusing Google as well.

(The following is based on the link you posted that was edited out of your original post)
For instance, if I right-click any of the actual text in your story and choose Inspect Element in Chrome, I see this structure:

html -> body -> div#content -> div#main-body.DIrconflict -> div#main-body-content -> div -> div.wid81.fill.blueline -> div -> div -> div.wid69.fll.lh20.mgr5.mtg5 -> div -> actual text


This seems excessive. Are you using a CMS for this site? If it has been customized, I suspect it wasn't customized particularly well (no offense meant).
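If you want a number to compare templates with, a rough sketch like the one below parses a page with Python's standard `html.parser` and reports how many elements enclose the most deeply nested run of visible text. The class and function names (`TextDepthParser`, `max_text_depth`) are made up for illustration, and this is only a diagnostic heuristic, not anything Google documents using.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextDepthParser(HTMLParser):
    """Records how many open elements enclose each non-blank run of text."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.depths = []  # (depth, first 40 chars of the text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> add no nesting

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        # Skip script/style bodies, which are not visible article text.
        if text and not (self.stack and self.stack[-1] in ("script", "style")):
            self.depths.append((len(self.stack), text[:40]))


def max_text_depth(html: str) -> int:
    """Hypothetical helper: deepest nesting level at which text appears."""
    parser = TextDepthParser()
    parser.feed(html)
    parser.close()
    return max((depth for depth, _ in parser.depths), default=0)


# The path from Inspect Element above: html, body, then nine nested divs.
nested = "<html><body>" + "<div>" * 9 + "Story text" + "</div>" * 9 + "</body></html>"
flat = "<html><body><article><p>Story text</p></article></body></html>"
print(max_text_depth(nested))  # 11
print(max_text_depth(flat))    # 4
```

The flatter version puts the same text four levels deep inside an `<article>`, which is much easier for any extractor to isolate.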

As a further point, it appears that the actual text is not in <p> tags and is instead plain text inside the parent <div>. Normally, to get space between paragraphs, you would just set a margin on the <p> tag itself; instead, it appears that you place an empty div with a 15px margin between every paragraph, which is definitely not best practice.
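To audit a template for both problems, a minimal sketch (again using only Python's standard `html.parser`; the names `ParagraphAudit` and `audit_paragraphs` are hypothetical) can count text runs whose direct parent is a `<div>` and `<div>` elements that contain nothing at all, which is what a spacer div looks like. In the fixed markup, the spacing would come from a CSS rule on the paragraphs, for example `.story p { margin: 0 0 15px; }` (the `.story` class is just an assumed name).

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ParagraphAudit(HTMLParser):
    """Counts text placed directly in <div>s and empty (spacer) <div>s."""

    def __init__(self):
        super().__init__()
        self.stack = []          # each entry: [tag, has_content]
        self.bare_div_text = 0   # text runs whose direct parent is a <div>
        self.empty_divs = 0      # <div>s with no child elements and no text

    def _mark_parent(self):
        if self.stack:
            self.stack[-1][1] = True

    def handle_starttag(self, tag, attrs):
        self._mark_parent()      # a child element counts as content
        if tag not in VOID:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        if not any(t == tag for t, _ in self.stack):
            return               # stray closing tag
        while self.stack:
            t, has_content = self.stack.pop()
            if t == "div" and not has_content:
                self.empty_divs += 1
            if t == tag:
                break

    def handle_data(self, data):
        if not data.strip():
            return
        self._mark_parent()
        if self.stack and self.stack[-1][0] == "div":
            self.bare_div_text += 1


def audit_paragraphs(html: str) -> tuple:
    """Returns (bare text runs in divs, empty divs)."""
    parser = ParagraphAudit()
    parser.feed(html)
    parser.close()
    return parser.bare_div_text, parser.empty_divs


spacer_style = ('<div>First paragraph.<div style="margin:15px"></div>'
                'Second paragraph.</div>')
p_style = '<div class="story"><p>First paragraph.</p><p>Second paragraph.</p></div>'
print(audit_paragraphs(spacer_style))  # (2, 1)
print(audit_paragraphs(p_style))       # (0, 0)
```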

For further reading about site structure, please check out this page from w3.org: www.w3.org/wiki/HTML/Elements/aside. In its "Point" section, you'll see it lists things like quotes, sidebars, advertising, and groups of nav elements. Marking these up properly helps Google figure out your site when it crawls it: search results will only come from the main content, but Google will still learn about new links to crawl from the navigation sections.
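To illustrate why this markup helps, here is a toy extractor (a sketch only; it is not how Google's extractor works, and `MainTextExtractor`/`extract` are hypothetical names). It keeps text that is not inside `<nav>`, `<aside>`, or `<footer>`, while still collecting every link on the page, mirroring the idea that results come from the main content but links are discovered everywhere.

```python
from html.parser import HTMLParser

# Elements treated as secondary material in this toy heuristic.
SECONDARY = {"nav", "aside", "footer"}
SKIP = {"script", "style"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MainTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.main_text = []  # text outside nav/aside/footer
        self.links = []      # every href on the page, including navigation

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not any(t in SECONDARY or t in SKIP for t in self.stack):
            self.main_text.append(text)


def extract(html: str) -> tuple:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.main_text, parser.links


page = ('<body><nav><a href="/world">World</a></nav>'
        '<article><h1>Headline</h1><p>Story text.</p>'
        '<aside>Related: <a href="/other">Other story</a></aside></article>'
        '<footer><a href="/about">About</a></footer></body>')
text, links = extract(page)
print(text)   # ['Headline', 'Story text.']
print(links)  # ['/world', '/other', '/about']
```

With unmarked `<div>`s everywhere, a heuristic like this has nothing to go on and would lump the sidebar and footer text in with the story.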

@Eichhorn148

If your navigation and/or footer contain many links, they could also be causing this error. Try marking up the navigation, footer, and article with the HTML5 <nav>, <footer>, and <article> elements.
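A quick way to see whether those links are actually marked up is to count anchors by their nearest landmark ancestor. This is a minimal sketch with Python's standard library; the landmark list, the `"unmarked"` bucket, and the function name `count_links_by_region` are assumptions for illustration.

```python
from html.parser import HTMLParser

LANDMARKS = ("nav", "footer", "article", "aside", "header", "main")
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LinkRegionCounter(HTMLParser):
    """Counts <a> tags by their nearest HTML5 landmark ancestor."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Walk outward from the innermost open element.
            region = next((t for t in reversed(self.stack) if t in LANDMARKS),
                          "unmarked")
            self.counts[region] = self.counts.get(region, 0) + 1
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def count_links_by_region(html: str) -> dict:
    parser = LinkRegionCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


before = ('<div id="menu"><a href="/a">A</a><a href="/b">B</a></div>'
          '<div id="story">Text <a href="/c">C</a></div>'
          '<div id="foot"><a href="/d">D</a></div>')
after = ('<nav><a href="/a">A</a><a href="/b">B</a></nav>'
         '<article>Text <a href="/c">C</a></article>'
         '<footer><a href="/d">D</a></footer>')
print(count_links_by_region(before))  # {'unmarked': 4}
print(count_links_by_region(after))   # {'nav': 2, 'article': 1, 'footer': 1}
```

A large `"unmarked"` count suggests menu and footer links are sitting in generic `<div>`s next to the story text.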

Another thing: the HTML source of the page can be at most 256KB.
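To check a page against that limit, a small sketch like this measures the encoded size of the HTML source. It assumes "256KB" means 256 × 1024 bytes of UTF-8 source; if the limit is counted differently, adjust `LIMIT_BYTES`. The helper names are hypothetical, and `fetch_html` uses only the standard `urllib.request.urlopen` call.

```python
import urllib.request

# Assumption: 256KB read as 256 * 1024 bytes of UTF-8 encoded HTML source.
LIMIT_BYTES = 256 * 1024


def html_size_ok(html: str, limit: int = LIMIT_BYTES) -> tuple:
    """Returns (within_limit, size_in_bytes) for an HTML source string."""
    size = len(html.encode("utf-8"))
    return size <= limit, size


def fetch_html(url: str) -> str:
    """Downloads a page's raw HTML (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(html_size_ok("<p>short</p>"))           # (True, 12)
print(html_size_ok("a" * (LIMIT_BYTES + 1)))  # (False, 262145)
```

Note that non-ASCII characters take more than one byte in UTF-8, so counting characters can underestimate the size.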
