3. Site Autopsy – Check #1

When I start looking at a website to find possible reasons for poor rankings or penalties, I always start with the site itself and carry out a number of checks. If the site itself is poor, there is often little need to look further.

One of the first things I check when I look at any site is the sitemap. Using the sitemap, it is often easy to spot pages that might have duplicate content. Now, the Cayenne site does have a sitemap, created in HTML, and I could look at that. However, I use a tool that I have developed for my own personal use (it is not currently available to buy, but will be later this year). This tool offers me a lot of different reports on a site, including the sitemap, and I’ll be using the same tool for a few different checks in this autopsy. However, for my tool to work, it needs to spider a site from an XML sitemap. The Cayenne site does not have an XML sitemap, so we’ll need to create one that my tool can use.

It’s fairly easy and takes only seconds to do. If you want to do it for one of your sites, then head on over to XML-Sitemaps.

NOTE: This tool assumes that all pages are ultimately reachable from the homepage, so if you have a bad site structure, it may not find all of your pages, but that is also a major concern for Google rankings.  Also note that this tool has a limit of 500 pages, so you’ll need to look elsewhere if you want to analyze massive sites.
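That reachability assumption is easy to picture as a breadth-first walk over a link graph. The sketch below is my own illustration (the page names and link graph are invented, not taken from any real site): a page that no other page links to is simply never discovered, which is exactly why orphan pages get missed by this kind of crawler.

```python
from collections import deque

# Toy link graph: page -> pages it links to. Invented for illustration;
# a real crawler would fetch each page and extract its <a href> links.
LINKS = {
    "/": ["/about.html", "/articles.html"],
    "/about.html": [],
    "/articles.html": ["/patio-design.html"],
    "/patio-design.html": [],
    "/orphan-page.html": [],   # exists on the server, but nothing links to it
}

def reachable_from_home(links: dict) -> set:
    """Breadth-first search starting at the homepage ('/')."""
    seen = {"/"}
    queue = deque(["/"])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

found = reachable_from_home(LINKS)
print(sorted(found))                      # the orphan page never appears
print(sorted(set(LINKS) - found))         # pages the crawler cannot find
```

The orphan page is missing from the crawl results, which mirrors the note above: a sitemap generator that spiders from the homepage can only report what your internal linking exposes.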

On the XML Sitemaps site, enter the URL of the site homepage.


… and click the Start button.

You will then be able to follow the progress as the script creates an XML sitemap:


When it is complete, you’ll be given a link to download the XML Sitemap.


I actually don’t need to download it; I can just right-click and copy the URL. All I need to do then is enter a couple of bits of information into my tool:


The top line is the homepage URL of the site we are interested in. The second line is the URL of the XML sitemap. I can then parse the sitemap by clicking a button. Depending on the size of your site, it can take a while, since every page is downloaded and analyzed.

Once done, you can then run various reports.  For this first check, I want perhaps the simplest of all reports – a list of website URLs:


This simply returns a list of all pages found on the sitemap, and by ordering them by file name, I can quickly check that filenames are sufficiently different to warrant separate pages of content.  I’ll show you an example of a bad site in a moment, but first, here is a snapshot of the URLs of the Cayenne site:


Each page on the site has been named according to the title of the article, so I can check that each article is sufficiently different. I am looking for two or more pages that might contain the same information. I cannot find any on this site, which is great. All of the content seems to be quite narrow in its focus (e.g. concentrating on one disease problem and how Cayenne pepper can help it) with very little overlap.
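My reporting tool is private, but the URL-listing report itself is simple to reproduce. Here is a rough sketch in Python, assuming a standard XML sitemap (the sample sitemap and URLs below are made up for illustration): it pulls out every `<loc>` entry and sorts by the final path segment, so similar filenames end up next to each other.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# In practice you would fetch the sitemap over HTTP; a small inline
# sample keeps this sketch self-contained.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/patio-designs.html</loc></url>
  <url><loc>http://example.com/cayenne-pepper-health.html</loc></url>
  <url><loc>http://example.com/patio-design.html</loc></url>
</urlset>"""

def urls_by_filename(sitemap_xml: str) -> list:
    """Return all <loc> URLs, sorted by their final path segment."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]
    # Sort by filename so near-duplicate slugs sit adjacent in the list.
    return sorted(locs, key=lambda u: urlparse(u).path.rsplit("/", 1)[-1])

for url in urls_by_filename(SITEMAP_XML):
    print(url)
```

With the list sorted this way, “patio-design.html” lands directly above “patio-designs.html”, which is the whole point of the report: suspicious pairs jump out at a glance.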

So the Cayenne site passed the first test.  Let me show you an example of a site that gives me some concerns.  Here are a few pages on the sitemap (and you can see how ordering the URLs by filename helps):


There is one file called “new home construction checklist 2” and another called “new home construction checklist”.  Is there a need for two articles that are apparently about the same thing?  If they are totally different articles, on totally different aspects of the work, then give them better or more descriptive titles and filenames.  If they are similar in content, merge them into one article.

Let’s look at another example on the same website:


And another example of some pages I would need to look into:


And this one:


It’s this last example that really highlights a problem with many older websites: content was built around keyword phrases. These keywords were identified using keyword research tools to find high-demand, low-competition phrases, and a page was then set up for each one, with the sole purpose of ranking high for that one phrase. Clearly, “patio design” and “patio designs” were two separate phrases this webmaster was trying to rank for, using the page-per-phrase tactic that used to work.
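This page-per-phrase pattern can often be caught automatically. One rough heuristic (my own sketch, not a feature of any tool mentioned in this post) is to normalize each filename by stripping the extension, any trailing copy number like “-2”, and a trailing plural “s”, then flag any URLs that collapse onto the same key. The sample URLs below are invented to echo the examples above.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

def normalize_slug(url: str) -> str:
    """Reduce a URL's filename to a rough key: drop the extension,
    a trailing copy number like '-2', and a trailing plural 's'."""
    name = urlparse(url).path.rsplit("/", 1)[-1]
    name = re.sub(r"\.[a-z]+$", "", name)    # drop .html / .php etc.
    name = re.sub(r"[-_ ]?\d+$", "", name)   # drop trailing "-2", " 2"
    return name.rstrip("s")                  # treat singular/plural alike

def possible_duplicates(urls: list) -> dict:
    """Group URLs by normalized slug; keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_slug(url)].append(url)
    return {key: hits for key, hits in groups.items() if len(hits) > 1}

pages = [
    "http://example.com/patio-design.html",
    "http://example.com/patio-designs.html",
    "http://example.com/new-home-construction-checklist.html",
    "http://example.com/new-home-construction-checklist-2.html",
    "http://example.com/cayenne-pepper-health.html",
]
for key, group in possible_duplicates(pages).items():
    print(key, "->", group)
```

This is only a first-pass filter. A crude plural rule will throw up false positives on some filenames, so every flagged pair still needs the human judgement described above: are these genuinely different articles, or candidates for a merge?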

Keep an eye out for the next post in this autopsy, as we continue to evaluate the Cayenne pepper website, to try to find out why it was apparently penalized.


2 thoughts on “3. Site Autopsy – Check #1”

  • Marshall Estes

Andy > On my site, I have a couple of pages where page 2 is a continuation of an article. I split it into two pages so the pages were about 500 to 600 words each rather than one long page of more than 1000 words. Has Google stopped penalizing long articles, as I heard they used to do? If so, I will combine them and do a redirect in my htaccess file.

    • Andy Williams Post author

Two things here. Firstly, I have never believed Google penalized long pages. I have some pages on my sites that have 5000 words. A more important question is: does splitting a long article make it easier for your visitor? Probably. However, if you are splitting purely for SEO reasons, don’t. Think about what is best for the visitor. Some sites overdo the article splitting and spread across several pages an article that really should be on one page, and they do this for SEO reasons and to expose their visitors to more ads during the read.

      OK, second point. If you do split a long article into shorter ones, I’d recommend you rename the files so that the filename accurately reflects the content on each page. Adding the -2 to the end of the main article title (and filename) is lazy, and something WordPress will do by default if you try to save two posts with the same filename.