SEO MARKETING SERVICES

Search Engine Marketing Lessons & Tutorials

SEO Marketing News - Google Changes Daily

This entry was posted on 6 April 2006, 1:09 PM and is filed under SEO News.

Google Algorithm Problems


Have you noticed anything different with Google lately?
The Webmaster community certainly has, and if recent talk on several search engine optimization (SEO) forums is any indication, Webmasters are very frustrated. Over roughly the past two years, Google has introduced a series of algorithm and filter changes that have made search results unpredictable, and many clean (non-spam) websites have been dropped from the rankings.

Google updates used to be monthly, and then quarterly. Now, with so many servers, there seem to be several different sets of search results rolling through the servers at any given time during a quarter. Part of this is the recent Big Daddy update, which is as much an infrastructure update as an algorithm update.

We believe Big Daddy runs on a 64-bit architecture. Pages seem to drop from a first-page ranking to a spot on the 100th page or, worse yet, into the Supplemental index. Google's algorithm changes started in November 2003 with the Florida update, which now ranks as a legendary event in the Webmaster community. Then came the updates named Austin, Brandy, Bourbon, and Jagger. Now we are dealing with Big Daddy!

The Google algorithm problems seem to fall into four categories: canonical issues, the Sandbox, duplicate content issues, and supplemental page issues.

1. Canonical Issues: These occur when a search engine treats www.yourdomain.com, yourdomain.com, and yourdomain.com/index.html as different websites. When Google does this, it flags the copies as duplicate content and penalizes them. Also, if the version left unpenalized is http://yourdomain.com, but every site that links to you uses www.yourdomain.com, then the version left in the index will have no ranking. These are basic issues that other major search engines, such as Yahoo and MSN, have no problem dealing with. Google is possibly the greatest search engine in the world (ranking themselves as a 10 on a scale of 1 to 10). They provide tremendous results for a wide range of topics, and yet they cannot resolve some basic indexing issues.
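To make the canonical problem concrete, here is a minimal sketch of the kind of URL normalization a site owner might apply when deciding which single address to publish and link to. It assumes the www form is the preferred one and that index.html/index.htm are served as a directory's default page; `canonicalize` is a hypothetical helper, not anything Google publishes.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names are served as a directory's default page.
DEFAULT_DOCUMENTS = {"index.html", "index.htm"}


def canonicalize(url: str, host_prefix: str = "www.") -> str:
    """Collapse www/non-www and /index.html variants of a URL into one form.

    Hypothetical helper: the preferred form (always "www.") is a site-owner
    choice, not a rule published by any search engine.
    """
    # Bare host names like "yourdomain.com" have no scheme; assume http.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if not host.startswith(host_prefix):
        host = host_prefix + host
    path = parts.path or "/"
    # Drop a trailing default document so "/index.html" and "/" match.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # The fragment (#...) never reaches the server, so discard it.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = ["www.yourdomain.com", "yourdomain.com", "yourdomain.com/index.html"]
print({canonicalize(v) for v in variants})  # {'http://www.yourdomain.com/'}
```

All three addresses from the paragraph collapse to one URL. A site owner would typically pair a rule like this with permanent (301) redirects, so that only one version is ever served and every inbound link points at the same address.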

2. The Sandbox: This has become one of the legends of the search engine world. It appears that websites, or the links to them, are "sandboxed" for a period before they are given full rank in the index, a kind of maturing period. Some even think it applies only to a set of competitive keywords, because those were the ones being manipulated the most. The Sandbox's existence is debated, and Google has never officially confirmed it. The hypothesis behind it is that Google knows no one can create a 100,000-page website overnight, so they have implemented a type of time penalty for new links and sites before they fully make the index.
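Because the Sandbox is only a hypothesis, the following is purely a toy model of what a "time penalty for new links" could look like, not a description of anything Google does. It assumes a hypothetical 180-day maturing window (`SANDBOX_DAYS`) over which a link's weight ramps linearly from 0 to 1; both the window and the linear ramp are illustrative assumptions.

```python
from datetime import date

# Purely hypothetical maturing period: Google has never confirmed that a
# Sandbox exists, let alone how long it lasts.
SANDBOX_DAYS = 180


def link_weight(first_seen: date, today: date, sandbox_days: int = SANDBOX_DAYS) -> float:
    """Weight a link from 0.0 (brand new) up to 1.0 once it has 'matured'."""
    age_days = (today - first_seen).days
    return max(0.0, min(1.0, age_days / sandbox_days))


def sandboxed_link_score(first_seen_dates: list[date], today: date) -> float:
    """Sum of link weights: new links count for little until they age."""
    return sum(link_weight(d, today) for d in first_seen_dates)


today = date(2006, 4, 6)
old_site_links = [date(2004, 1, 1)] * 10   # ten links, all well past the window
new_site_links = [date(2006, 3, 7)] * 10   # ten links, 30 days old
print(sandboxed_link_score(old_site_links, today))  # 10.0
print(sandboxed_link_score(new_site_links, today))
```

Under this model, two sites with the same ten inbound links score very differently depending only on how long those links have existed, which is the effect Webmasters describe.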

3. Duplicate Content Issues: These have become a major issue on the Internet. Because every indexed page is another chance to rank, black hat SEOs (search engine optimizers) started copying entire sites' content under their own domain names, instantly producing a ton of web pages (for example, by downloading an encyclopedia onto a website). In response to this abuse, Google aggressively attacked duplicate content abusers with their algorithm updates. But in the process they knocked out many legitimate sites as collateral damage. One example occurs when someone scrapes your website: Google sees both sites and may decide that the legitimate one is the duplicate. About the only thing a Webmaster can do is track down these sites as they are scraped and submit a spam report to Google. Another big issue is that there are many legitimate uses of duplicate content. News feeds are the most obvious example: a news story is covered by many websites because it is content viewers want. Any filter will inevitably catch some legitimate uses.
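Google has not published how its duplicate filters work. One widely used textbook technique for spotting near-duplicate pages is to break each page into overlapping word sequences ("shingles") and compare the sets with Jaccard similarity. The sketch below assumes 4-word shingles and a simple lowercase tokenizer; both are illustrative choices.

```python
import re


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


original = "Google aggressively attacked duplicate content abusers with their algorithm updates"
scraped = original + ". Visit our sponsors"

print(jaccard(shingles(original), shingles(original)))  # 1.0
print(jaccard(shingles(original), shingles(scraped)))   # 0.7
```

Note that the similarity score is symmetric: it says the two pages are near-copies but nothing about which one came first. That is exactly why a filter built on this kind of comparison can keep the scraper and drop the original.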

4. Supplemental Page Issues: Webmasters fondly refer to this as Supplemental Hell. This issue has been reported in forums such as WebmasterWorld for over a year, but a major shake-up around February 23 has led to a huge outcry from the Webmaster community.

This recent shake-up was part of the ongoing Big Daddy rollout, which should finish this month. The issue is still unclear, but here is what we know. Google has two indexes: the main index that you get when you search, and the Supplemental index, which contains pages that are old, no longer active, have returned errors, and so on.

The Supplemental index is a kind of graveyard where web pages go when they are no longer deemed active. No one disputes the need for a Supplemental index. The problem, though, is that active, recent, and clean pages have been showing up in it. Like a dungeon, once pages go in, they rarely come out. The issue simmered at a low level for over a year, but the recent February upset has sparked a lot of discussion. Little is known about it, and no one seems able to find a common cause.

Google updates were once fairly predictable, with monthly updates that Webmasters anticipated with both joy and angst. Google followed a well-publicized algorithm that gave each web page a PageRank, a number based on the number and rank of other web pages linking to it. When someone searched on a term, all of the web pages deemed relevant were then ordered by their PageRank.
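The PageRank idea described above can be sketched with the standard power-iteration formulation and the commonly cited damping factor of 0.85. This is the simplified textbook model the paragraph describes (rank flows along links, then relevant pages are sorted by rank), not Google's production ranking; the four-page link graph is made up for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration; scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


def order_results(relevant: list[str], rank: dict[str, float]) -> list[str]:
    """Order the pages deemed relevant to a query by PageRank, highest first."""
    return sorted(relevant, key=lambda p: rank.get(p, 0.0), reverse=True)


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(order_results(["a", "c", "d"], ranks))  # ['c', 'a', 'd']
```

Page c wins because three pages link to it, and page a ranks second because its single inbound link comes from the highly ranked c; page d, which nobody links to, keeps only the baseline share.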

Google uses a number of factors, such as keyword density, page titles, meta tags, and header tags, to determine which pages are relevant. The original algorithm favored incoming links and their anchor text: the more links pointing to you with a given anchor text, the better you ranked for that keyword.
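The anchor-text effect can be illustrated with a simple tally. This is a toy signal under an assumed rule (count the inbound links whose anchor text contains the keyword as a whole word); Google never published how anchor text was weighted, and the example anchors are invented.

```python
from collections import Counter


def anchor_text_counts(inbound_anchors: list[str]) -> Counter:
    """Tally inbound links by their (case- and whitespace-normalized) anchor text."""
    return Counter(" ".join(a.lower().split()) for a in inbound_anchors)


def anchor_score(inbound_anchors: list[str], keyword: str) -> int:
    """Toy signal: how many inbound links mention the keyword in their anchor text."""
    kw = keyword.lower()
    return sum(n for text, n in anchor_text_counts(inbound_anchors).items()
               if kw in text.split())


anchors = ["Hand-made canoes", "hand-made  canoes", "Murphy's", "best canoes"]
print(anchor_text_counts(anchors).most_common(1))  # [('hand-made canoes', 2)]
print(anchor_score(anchors, "canoes"))             # 3
```

A signal this easy to count is also easy to inflate by buying or trading links with the desired anchor text, which helps explain why it became a target for manipulation.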

As Google gained the bulk of Internet searches in the early part of the decade, ranking well in their engine became highly coveted. Add to this the release of Google's AdSense program, and it became very lucrative. If a website could rank high for a popular keyword, it could run Google ads through AdSense and split the revenue with Google!

IN OTHER NEWS

The Curse of Keyword Density

To rank individual sites by their relevance to a user’s query, early search engines relied heavily on keywords – words that appeared in headings and within the site’s corpus (body of text).

These early spiders didn’t do much except count the number of times keywords appeared within the corpus. The higher the frequency, the more relevant the site. And thus was born SEO text.

Murphy’s Olde Time Canoes are all hand made. You won’t find a better hand-made canoe than a Murphy’s Olde Time Canoe. When it comes to hand-made canoes, Murphy’s Olde Time Canoes deliver more canoe bang for your canoe buck.

Gibberish. Garbage text. Words written for a machine, not for humans. An SE bot that spidered Murphy’s website would have no trouble determining what the company sold. Unfortunately, to any literate carbon-based life form, this was hardly the stuff of which sales were made.
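The word counting those early spiders did can be sketched as a keyword-density calculation run on the Murphy’s text. This is a toy model of the behaviour described above, not any engine’s actual formula; the tokenizer (runs of lowercase letters, digits, apostrophes and hyphens) and treating "canoe" and "canoes" as the same keyword are assumptions.

```python
import re

MURPHY = (
    "Murphy's Olde Time Canoes are all hand made. You won't find a better "
    "hand-made canoe than a Murphy's Olde Time Canoe. When it comes to hand-made "
    "canoes, Murphy's Olde Time Canoes deliver more canoe bang for your canoe buck."
)


def keyword_density(text: str, *forms: str) -> float:
    """Fraction of the words in text that match any of the given keyword forms."""
    # Assumed tokenizer: runs of letters, digits, apostrophes and hyphens.
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    targets = {f.lower() for f in forms}
    if not words:
        return 0.0
    return sum(w in targets for w in words) / len(words)


print(f"{keyword_density(MURPHY, 'canoe', 'canoes'):.1%}")  # 17.9%
```

Seven of the 39 words are "canoe" or "canoes": great for a counter, tiresome for a human reader.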

The Quality of SERPs
Search engines live or die by the quality of their search results. That’s the whole raison d’être of a search engine: to produce search engine results pages (SERPs) that list the most useful sites.

Google, Yahoo and the hundreds of other SEs spend a bundle tweaking the rating mechanisms used to assess and rank a site. These complex algorithmic formulas are top-secret, eyes-only information. In fact, Google’s search algorithm is its most valuable asset. It’s the product it sells to all of us who use it every day.

SE algorithms are constantly being tweaked. Weighting factors are added, deleted, increased or decreased with one goal in mind – to deliver the highest quality SERPs possible.

That means that sites providing useful, helpful information and other on-line amenities, like relevant links, are more valuable to SE users and, therefore, should rank higher than sites that don’t provide helpful data.

So What Do Search Engines Want Today?
With each tweak of the algorithm, spiders became more sophisticated at judging how useful and helpful a given site is. Oh sure, keyword counts still matter. So does keyword placement. But SEs are smart enough to detect keyword-stuffed SEO text. And they don’t much like it.

Instead, the sophisticated algorithms employed by Google and Yahoo (the two biggies) want more than keyword density. A lot more.

Keep It Fresh
Google’s spider visits at least once every two weeks. Yahoo makes the rounds every 48 hours. When a spider visits, it takes a snapshot of the site. When you do a Google search, simply click ‘Cached’ to see the site as it was when last indexed.

Now, when that spider comes back a few days or weeks later, it compares the cached copy with the current site, looking for new content. Search engines like new content. No new content, and you may lose a few points in the PR race.
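The article doesn’t say how a spider decides that content is new. One simple approach, sketched here as an assumption, is to fingerprint the page’s visible text with a hash and compare the cached fingerprint with the current one. The crude regex tag stripping is for illustration only, and this check detects only whether anything changed, not how much.

```python
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """Hash a page's text so that markup or whitespace-only changes are ignored."""
    text = re.sub(r"<[^>]+>", " ", html)         # crude tag stripping (assumption)
    normalized = " ".join(text.split()).lower()   # collapse whitespace, ignore case
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_new_content(cached_html: str, current_html: str) -> bool:
    """True if the current page's text differs from the cached snapshot."""
    return content_fingerprint(cached_html) != content_fingerprint(current_html)


cached = "<p>Hand-made canoes</p>"
print(has_new_content(cached, "<p>Hand-made   canoes</p>\n"))                  # False
print(has_new_content(cached, "<p>Hand-made canoes</p><p>Spring models</p>"))  # True
```

Normalizing before hashing means that reformatting a page does not count as an update; only a change in the words themselves does.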

This doesn’t mean you have to update your site daily or weekly to maintain a nice PR. But update it now and then, and you’ll keep your rank and maybe even move up a few pages.