Case Study #4 – Evidence for the critics


Does themeing play a part in Search Engine Rankings? (Original report data published 31st July 2008)

The Idea behind this report

I believe that:

“With all other things being equal, the page that is best themed for a search term will rank higher than the less well themed pages”.

Now, this is not something that I can prove because all “other things” will never be equal. All I can do is look for evidence to back up my theory. This isn’t as easy as it may sound, since ranking well in Google depends on a lot of factors, including off-page factors such as inbound links.

It’s not even a question of choosing pages with identical PageRank for a study like this, since PageRank is achieved through inbound links, and not all inbound links are equal. Each one can have an entirely different thing to say about the page it links to.

For example, two pages about “widgets” may both have a PageRank of 3. One page may have a predominance of inbound links using the link text “blue widgets”, while the other may have a predominance of inbound links saying “green widgets”. If I were to search for “blue widgets”, I would expect to see the former page appear higher, yet both pages are about widgets. Searching for “green widgets” might bring a different order.

Since it is impossible to get “all other things” to be the same, I thought to heck with those “other things”. I simply wanted to see how themeing was spread across the search results irrespective of off-page factors, so this is what I did.

  1. I searched for the term gestational diabetes at Google. The term was chosen for no other reason than I am interested in this topic, and there is a lot of competition for this term.
  2. I copied the on-page text from pages that ranked 1-10, 100-109 and 361-372 (I had to skip a couple because they were PDF files, and I wanted to compare web pages). To do this, I loaded the web page in Firefox, clicked inside the page somewhere, and pressed CTRL + A to highlight everything, and CTRL + C to copy it to the clipboard.
  3. I pasted (CTRL + V) all 30 articles into the Fat Content Creator software that comes with my “Fat Content Course”.
  4. I used KRA Pro to find some theme words for this topic. KRA Pro returned a lot of potential theme words (over 100), which needed whittling down. Fortunately, it returns them in order of importance, so I could work from the top, and trim any words that I felt were not 100% related to this topic. I ended up with 39 words that I felt were related to this topic.
  5. I loaded the theme words into the Fat Content Creator for each of the 30 articles, and began checking the articles using the built-in “Theme Report”.
  6. I took a screenshot of each Theme Report Summary, so I could include them in this short report. You will see all 30 screenshots from the theme reports later in this report.
  7. I then took data averages from each of the three groups of articles, so that we could compare the themeing of the pages in the top 10 of Google, with those ranking around the 100 mark, and those ranking below 340.
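The counting behind steps 5 to 7 can be sketched in a few lines of Python. This is only a rough illustration, not the Fat Content Creator's actual calculation: the function names `theme_report` and `group_average` are hypothetical, matching is assumed to be exact and case-insensitive with no stemming, and the proprietary Quality Theme Score is not reproduced.

```python
import re
from collections import Counter


def theme_report(text, theme_words):
    """Return simple theme statistics for one article (illustrative only)."""
    # Assumption: lowercase the text and match whole words exactly (no stemming).
    tokens = re.findall(r"[a-z']+", text.lower())
    theme_set = {w.lower() for w in theme_words}
    counts = Counter(t for t in tokens if t in theme_set)
    total_words = len(tokens)
    theme_hits = sum(counts.values())
    return {
        # How many distinct theme words appear at least once.
        "different_theme_words_used": len(counts),
        # Share of all article words that are theme words.
        "themed_percent": 100.0 * theme_hits / total_words if total_words else 0.0,
        # Share of the theme word list that the article covers.
        "percent_of_theme_words_used": (
            100.0 * len(counts) / len(theme_set) if theme_set else 0.0
        ),
    }


def group_average(reports):
    """Average each metric over a group of per-article reports."""
    return {k: sum(r[k] for r in reports) / len(reports) for k in reports[0]}


# Tiny made-up example, not one of the 30 pages in the study.
sample = "Gestational diabetes raises blood glucose during pregnancy."
words = ["gestational", "diabetes", "blood", "glucose", "pregnancy", "insulin"]
report = theme_report(sample, words)
```

In practice you would call `theme_report` once per copied page with the 39 theme words, then pass each group of 10 reports to `group_average`.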

Be aware that the tool I am using in this study to check the themeing of web pages is an article editor. It is not a tool to reverse engineer Google or any other search engine. It merely looks at the words on a page, which is what I am interested in since I want to see if themeing is important.

Before I show you the results, let me give you the theme words I chose, since these are obviously of interest.

age

baby

birth

blood

body

care

cause

child

condition

diabetes

diet

disease

doctor

during

eat

exercise

exercising

food

gestational

glucose

health

help

high

info

insulin

late

level

low

pregnancy

pregnant

prevent

research

risk

sugar

test

treatment

type

weigh

women


And the Results…


Top 10 pages Theme Reports
These are screenshots from the theme report generated by the Fat Content Creator.

[Screenshots clip_image002 to clip_image020: theme report summaries for the pages ranking 1-10]

Averages of top 10
Different Theme Words Used – 34.9

Themed % – 19.6 (1 theme word every 5.1 article words)

Percentage of Theme Words Used – 89.5%

Quality Theme Score – 86.9%

Results for 100 – 109
These are screenshots from the theme report generated by the Fat Content Creator.

[Screenshots clip_image022 to clip_image040: theme report summaries for the pages ranking 100-109]

Averages 100-109
Different Theme Words Used – 27.8

Themed % – 18.2 (1 theme word every 5.5 article words)

Percentage of Theme Words Used – 71.2%

Quality Theme Score – 64.9%

Results for Pages 340-351 (two pages were omitted because they were PDFs)
These are screenshots from the theme report generated by the Fat Content Creator.

[Screenshots clip_image042 to clip_image060: theme report summaries for the pages ranking 340-351]

Averages 340-351
Different Theme Words Used – 19.2

Themed % – 14.4 (1 theme word every 6.9 article words)

Percentage of Theme Words Used – 49.2%

Quality Theme Score – 52.0%

Comparing the Summaries Side By Side

Parameter                         Top 10    100-109    340-351
# Different Theme Words Used      34.9      27.8       19.2
Themed %                          19.6      18.2       14.4
Percentage of Theme Words Used    89.5      71.2       49.2
Quality Theme Score*              86.9      64.9       52.0

*Quality Theme Score is a calculation that the software makes to give an idea of how well themed an article is for the selected theme words. It takes into account the size of the article and how many theme words are used (and the range of them), and it tries to detect keyword stuffing.

It is also a useful metric when themeing content because it will tell you if you are “over-themeing” your content. While it is true that pages reach the top of Google with very high theme percentages, as an author of new content, it is better to take the cautious route.

This score was never intended to help “reverse engineer” Google rankings, because the score is calculated using theme words chosen by the user. The type of analysis I have done will fail if your theme words are not the right ones. It is for this reason that when I found the theme words for this study, I turned to Google and used KRA Pro to tell me what was important.

Looking across each row, we can see that as we move down the search engine positions:

  1. Number of different theme words used DECREASES
  2. Themed Percentage DECREASES
  3. Percentage of Theme Words Used DECREASES
  4. Quality Theme Score DECREASES
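As a quick sanity check on the four observations above, the short Python sketch below re-enters the reported group averages by hand and confirms that each metric falls from one group to the next. It also recomputes the "1 theme word every N article words" figures from the Themed % values (N = 100 / Themed %). The dictionary keys and the helper name `strictly_decreasing` are my own labels for illustration, not anything produced by the software.

```python
# Group averages as reported in this study, typed in by hand.
averages = {
    "Top 10":  {"different": 34.9, "themed_pct": 19.6, "used_pct": 89.5, "quality": 86.9},
    "100-109": {"different": 27.8, "themed_pct": 18.2, "used_pct": 71.2, "quality": 64.9},
    "340-351": {"different": 19.2, "themed_pct": 14.4, "used_pct": 49.2, "quality": 52.0},
}

# Groups ordered from best to worst ranking.
groups = ["Top 10", "100-109", "340-351"]
metrics = ["different", "themed_pct", "used_pct", "quality"]


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


# For each metric, does it fall as we move down the rankings?
trend = {m: strictly_decreasing([averages[g][m] for g in groups]) for m in metrics}

# A Themed % of p means roughly one theme word every 100 / p article words.
words_per_theme_word = {
    g: round(100 / averages[g]["themed_pct"], 1) for g in groups
}

print(trend)
print(words_per_theme_word)
```

With only three groups this shows a consistent direction, not a statistical test, so it should be read as a summary of the table and nothing more.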

I will leave you to draw your own conclusions from this study, but to my mind it’s yet more evidence that themeing is important in the search engine algorithms. However, as always, we must be aware that on-page factors are only part of the equation.

My personal belief is that themeing is very important in search engine rankings. It is common knowledge that Latent Semantic Indexing is a technology used by the search engines in their ranking algorithms, so this should not be a surprise to anyone.

What do you think?

Resources:

KRA Pro – Keyword Analysis & Site Blueprinting Tool. KRA Pro was used in this report to find theme words for “gestational diabetes”.

Creating Fat Content Course – including the Fat Content Creator Software used in the analysis of web pages in this report.

 

Appendix

After completing this report, I realized that people who do not know me might think I just made up the Theme Report Summaries and that they were not based on real pages.

I therefore went back to Google, and found the sites again that I analyzed, taking screenshots of the Google results pages to show you exactly which sites I used in this report and where they ranked in Google. The screenshots also show that I did not pick and choose the sites in this study to make my theory fit (which I was accused of doing by one person in the earlier theme reports). Instead I searched for gestational diabetes, and then took 10 consecutive web pages from the top, middle and bottom of the search results from Google’s main index.

The first screenshot is the last page of Google results for this phrase. Although Google initially reported 1.2 million matching pages, it only serves up 449 pages from its main index, so in fact there were only 449 pages that Google considered important and unique enough to show us.

clip_image062

The next screenshot shows the top 10 results used in this study:

clip_image064

And the pages ranking 101-110:

clip_image066

The next screenshot shows the pages from 340-351. These are split across multiple results pages because of a shift in rankings:

clip_image068

And the next 8:

clip_image070

And the last of the 10:

clip_image072

The pages chosen in this last batch, while split across three search engine results pages, represent 10 consecutive pages in the results, ignoring the two PDF files that were there.



3 thoughts on “Case Study #4 – Evidence for the critics”

  • Andy Beard

    Take a look at that “last page of SERPs” again – every result is from books.google.com

    I think you need to cross reference your data more, as for instance there is a strong correlation between length of article and SERP, and length of article and keyword use.

    Those first 10 results smell of just one thing – massive authority

    That they might also contain useful thick articles might be irrelevant, or this is a chicken/egg scenario.

    • Andy

      I actually think you have a chicken/egg scenario here and in much the same way as the egg obviously came first (neither of the parents of the first chicken was a chicken) so too the quality content came before the authority status of the sites.

      To write a good article on gestational diabetes (as in this study), the article will need to be long, and obviously the longer an article, the more related theme words there will be (if written naturally by someone who knows the topic). Shorter articles just won't cover the material.

      The whole point of this (and several other studies) was to show that articles need to cover a range of words and phrases related to the topic of the article. Without those words and phrases you won't compete for any mildly competitive term. Authority sites are authority sites BECAUSE they have shown that they know their stuff and create quality content, but to do so, they have to write longer, theme word rich articles.

  • Stu

    Hi Andy

    Excellent tutorial

    It is very refreshing to see all these ideas on one page. I totally agree with your ideas on this subject as I have been using this method for years and it works beautifully.
    What I cannot understand is how on earth some websites with no content, no meta keywords, no optimisation and, in some cases, lots of Flash end up in the number one slot.
    I have checked the age of some of these sites and that is not the reason they are at the top, and they are not AdWords adverts either.
    I can't give any examples here, but I guess that you will know what I mean.