Thursday, May 15, 2008

General SEO information:

In the early days of Internet development, its users were a privileged minority and the amount of available information was relatively small. Access was mainly restricted to employees of various universities and laboratories who used it to access scientific information. In those days, the problem of finding information on the Internet was not nearly as critical as it is now.

Site directories were one of the first methods used to facilitate access to information resources on the network. Links to these resources were grouped by topic. Yahoo, opened in April 1994, was the first project of this kind. As the number of sites in the Yahoo directory inexorably increased, the developers of Yahoo made the directory searchable. Of course, it was not a search engine in its true form because searching was limited to those resources whose listings were put into the directory. It did not actively seek out resources and the concept of SEO was yet to arrive.

Such link directories have been used extensively in the past, but nowadays they have lost much of their popularity. The reason is simple – even modern directories with lots of resources only provide information on a tiny fraction of the Internet. For example, the largest directory on the network is currently DMOZ (or Open Directory Project). It contains information on about five million resources. Compare this with the Google search engine database containing more than eight billion documents.

The WebCrawler project started in 1994 and was the first full-featured search engine. The Lycos and AltaVista search engines appeared in 1995 and for many years AltaVista was the major player in this field.

In 1997 Sergey Brin and Larry Page created Google as a research project at Stanford University. Google is now the most popular search engine in the world.

Currently, there are three leading international search engines – Google, Yahoo and MSN Search. They each have their own databases and search algorithms. Many other search engines use results originating from these three major search engines, and the same SEO expertise can be applied to all of them. For example, the AOL search engine (search.aol.com) uses the Google database while AltaVista, Lycos and AllTheWeb all use the Yahoo database.

Principles Of Search Engines:

To understand SEO you need to be aware of the architecture of search engines. They all contain the following main components:

Spider - This program downloads web pages just like a web browser. The difference is that a browser displays the information presented on each page (text, graphics, etc.) while a spider does not have any visual components and works directly with the underlying HTML code of the page. You may already know that there is an option in standard web browsers to view source HTML code.

Crawler – This program finds all links on each page. Its task is to determine where the spider should go either by evaluating the links or according to a predefined list of addresses. The crawler follows these links and tries to find documents not already known to the search engine.

Indexer - This component parses each page and analyzes the various elements, such as text, headers, structural or stylistic features, special HTML tags, etc.

Database – This is the storage area for the data that the search engine downloads and analyzes. Sometimes it is called the index of the search engine.

Results engine – The results engine ranks pages. It determines which pages best match a user's query and in what order the pages should be listed. This is done according to the ranking algorithms of the search engine. It follows that page rank is a valuable and interesting property, and it is what any SEO specialist is most interested in when trying to improve a site's position in search results. In this article, we will discuss the SEO factors that influence page rank in some detail.

Web server – The search engine web server usually contains an HTML page with an input field where the user can specify the search query he or she is interested in. The web server is also responsible for displaying search results to the user in the form of an HTML page.
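
To make the division of labor between these components more concrete, here is a minimal Python sketch. It is only an illustration and not how any real search engine is built: the class name PageParser, the function parse_page and the example.com address are all made up for this example. It reads raw HTML the way a spider does, collects the links a crawler would follow, and gathers the words an indexer would store.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outbound links and text words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # addresses a crawler would hand to the spider next
        self.words = []  # words an indexer would store in the database

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Keep the text between tags, lowercased and split into words.
        self.words.extend(data.lower().split())


def parse_page(base_url, html):
    """Return (links, words) found in the HTML source of one page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.words


# Hypothetical page source, as a spider would see it.
sample_html = (
    '<html><body><h1>SEO basics</h1>'
    '<a href="/about.html">About us</a></body></html>'
)
links, words = parse_page("http://www.example.com/index.html", sample_html)
print(links)
print(words)
```

A real crawler would also remember which addresses it has already seen, so that the spider is only sent to documents not yet known to the search engine.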

Ranking Factors In Search Engines:

Several factors influence the position of a site in the search results. They can be divided into external and internal ranking factors. Internal ranking factors are those that are controlled by SEO-aware website owners (text, layout, etc.) and will be described next.

Amount of text on a page – A page consisting of just a few sentences is less likely to get to the top of a search engine list. Search engines favor sites that have a high information content. Generally, you should try to increase the text content of your site in the interest of SEO. The optimum page size is 500-3000 words (or 2000 to 20,000 characters).

Number of keywords on a page – Keywords must be used at least three to four times in the page text. The upper limit depends on the overall page size – the larger the page, the more keyword repetitions can be made. Keyword phrases (word combinations consisting of several keywords) are worth a separate mention. The best SEO results are observed when a keyword phrase is used several times in the text with all keywords in the phrase arranged in exactly the same order. In addition, all of the words from the phrase should be used separately several times in the remaining text. There should also be some difference (dispersion) in the number of entries for each of these repeated words.

Let us take an example. Suppose we optimize a page for the phrase “seo software” (one of our SEO keywords for this site). It would be good to use the phrase “seo software” in the text 10 times, the word “seo” 7 times elsewhere in the text and the word “software” 5 times. The numbers here are for illustration only, but they show the general SEO idea quite well.
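
If you want to check a draft against counts like these, a small script can do the counting. The sketch below is just one possible way to do it, written in Python. The function name count_phrase_and_words and the sample text are invented for this example. It counts exact occurrences of the phrase and, separately, how often each word of the phrase appears elsewhere in the text.

```python
import re


def count_phrase_and_words(text, phrase):
    """Count exact phrase occurrences and each phrase word used outside the phrase."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)

    phrase_count = 0
    standalone = {word: 0 for word in phrase_tokens}
    i = 0
    while i < len(tokens):
        if tokens[i:i + n] == phrase_tokens:
            phrase_count += 1
            i += n  # skip the whole phrase so its words are not counted twice
        else:
            if tokens[i] in standalone:
                standalone[tokens[i]] += 1
            i += 1
    return phrase_count, standalone


# Made-up sample text, far shorter than a real page.
text = "SEO software helps with SEO. Good software saves time. Try our seo software."
print(count_phrase_and_words(text, "seo software"))
```

Punctuation and letter case are ignored here, which is an assumption of this sketch and not a statement about how search engines tokenize text.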

Keyword density and SEO – Keyword page density is a measure of the relative frequency of the word in the text expressed as a percentage. For example, if a specific word is used 5 times on a page containing 100 words, the keyword density is 5%. If the density of a keyword is too low, the search engine will not pay much attention to it. If the density is too high, the search engine may activate its spam filter. If this happens, the page will be penalized and its position in search listings will be deliberately lowered.

The optimum value for keyword density is 5-7%. In the case of keyword phrases, you should calculate the total density of each of the individual keywords comprising the phrases to make sure it is within the specified limits. In practice, a keyword density of more than 7-8% does not seem to have any negative SEO consequences. However, it is not necessary and can reduce the legibility of the content from a user’s viewpoint.
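
The density calculation itself is simple arithmetic: the number of times the keyword occurs, divided by the total number of words, times 100. Here is a short Python sketch of it. The function name keyword_density is made up for this example, and the way the text is split into words is an assumption, since search engines do not publish how they count.

```python
import re


def keyword_density(text, keyword):
    """Return the share of words in text equal to keyword, as a percentage."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


# The example from the text: a word used 5 times on a page of 100 words.
page_text = " ".join(["seo"] * 5 + ["filler"] * 95)
density = keyword_density(page_text, "seo")
print(density)

# Check against the 5-7% range suggested above.
print(5.0 <= density <= 7.0)
```

For a keyword phrase, you would call the function once for each word of the phrase and look at the results together.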


Description Meta tag – This is used to specify page descriptions. It does not influence the SEO ranking process but it is very important. A lot of search engines (including the largest one – Google) display information from this tag in their search results if this tag is present on a page and if its content matches the content of the page and the search query.

Experience has shown that a high position in search results does not always guarantee large numbers of visitors. For example, if your competitors' search result description is more attractive than the one for your site then search engine users may choose their resource instead of yours. That is why it is important that your Description Meta tag text be brief, but informative and attractive. It must also contain keywords appropriate to the page.

External Ranking Factors:

Why inbound links to sites are taken into account - As you can see from the previous section, many factors influencing the ranking process are under the control of webmasters. If these were the only factors then it would be impossible for search engines to distinguish between a genuine high-quality document and a page created specifically to achieve high search ranking but containing no useful information. For this reason, an analysis of inbound links to the page being evaluated is one of the key factors in page ranking. This is the only factor that is not controlled by the site owner.

It makes sense to assume that interesting sites will have more inbound links. This is because owners of other sites on the Internet will tend to publish links to a site if they think it is a worthwhile resource. The search engine will use this inbound link criterion in its evaluation of document significance.

Therefore, two main factors influence how pages are stored by the search engine and sorted for display in search results:

* Relevance, as described in the previous section on internal ranking factors.

* Number and quality of inbound links, also known as link citation, link popularity or citation index. This will be described in the next section.



Link importance (citation index, link popularity) - You can easily see that simply counting the number of inbound links does not give us enough information to evaluate a site. It is obvious that a link from www.microsoft.com should mean much more than a link from some homepage like www.hostingcompany.com/~myhomepage.html. You have to take into account link importance as well as number of links.

Search engines use the notion of citation index to evaluate the number and quality of inbound links to a site. Citation index is a numeric estimate of the popularity of a resource expressed as an absolute value representing page importance. Each search engine uses its own algorithms to estimate a page citation index. As a rule, these values are not published.

As well as the absolute citation index value, a scaled citation index is sometimes used. This relative value indicates the popularity of a page relative to the popularity of other pages on the Internet. You will find a detailed description of citation indexes and the algorithms used for their estimation in the next sections.



Google PageRank – Theoretical Basics - The Google company was the first company to patent the system of taking into account inbound links. The algorithm was named PageRank. In this section, we will describe this algorithm and how it can influence search result ranking.

PageRank is estimated separately for each web page and is determined by the PageRank (citation) of other pages referring to it. It is a kind of “virtuous circle.” The main task is to find the criterion that determines page importance. In the case of PageRank, it is the possible frequency of visits to a page.

I shall now describe how a user’s behavior when following links to surf the network is modeled. It is assumed that the user starts viewing sites from some random page. Then he or she follows links to other web resources. There is always a possibility that the user may leave a site without following any outbound link and start viewing documents from a random page. The PageRank algorithm estimates the probability of this event as 0.15 at each step. The probability that our user continues surfing by following one of the links available on the current page is therefore 0.85, assuming that all links are equal in this case. If he or she continues surfing indefinitely, popular pages will be visited many more times than the less popular pages.

The PageRank of a specified web page is thus defined as the probability that a user may visit the web page. It follows that the sum of probabilities for all existing web pages is exactly one because the user is assumed to be visiting at least one Internet page at any given moment.

Since it is not always convenient to work with these probabilities the PageRank can be mathematically transformed into a more easily understood number for viewing. For instance, we are used to seeing a PageRank number between zero and ten on the Google Toolbar.

According to the ranking model described above:
* Each page on the Net (even if there are no inbound links to it) initially has a PageRank greater than zero, although it will be very small. There is a tiny chance that a user may accidentally navigate to it.
* Each page that has outbound links distributes part of its PageRank to the referenced page. The PageRank contributed to these linked-to pages is inversely proportional to the total number of links on the linked-from page – the more links it has, the lower the PageRank allocated to each linked-to page.
* A “damping factor” is applied to this process so that the total distributed PageRank is reduced by 15%. This is equivalent to the probability, described above, that the user will not visit any of the linked-to pages but will navigate to an unrelated website.
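
The three points above can be tried out on a tiny made-up web. The Python sketch below uses the commonly published form of the calculation, in which each page's value is (1 - d) / N plus d times the sum, over all pages linking to it, of their value divided by their number of outbound links, with d = 0.85 and N the number of pages. This is an illustration of the model only and not Google's actual code. The function name pagerank, the four-page example web, and the handling of pages without outbound links (their value is spread evenly over all pages) are assumptions of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank; links maps each page to the list of pages it links to.

    Assumes every linked-to page also appears as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small base share from the "random jump" case.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outbound in links.items():
            if outbound:
                # The damped rank is split equally among the outbound links.
                share = damping * rank[page] / len(outbound)
                for target in outbound:
                    new_rank[target] += share
            else:
                # No outbound links: assume the user jumps to any page at random.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# A hypothetical four-page web. Page D has no inbound links at all.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this example page C, which receives links from three pages, comes out on top, while page D still keeps a small value greater than zero even though nothing links to it. The values add up to one, as the probability model requires.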

Let us now see how this PageRank process might influence the process of ranking search results. We say “might” because the pure PageRank algorithm just described has not been used in the Google algorithm for quite a while now. We will discuss a more current and sophisticated version shortly. There is nothing difficult about the PageRank influence – after the search engine finds a number of relevant documents (using internal text criteria), they can be sorted according to the PageRank since it would be logical to suppose that a document having a larger number of high-quality inbound links contains the most valuable information.

Thus, the PageRank algorithm "pushes up" those documents that are most popular outside the search engine as well.


Google PageRank – Practical Use - Currently, PageRank is not used directly in the Google algorithm. This is to be expected since pure PageRank characterizes only the number and the quality of inbound links to a site, but it completely ignores the text of links and the information content of referring pages. These factors are important in page ranking and they are taken into account in later versions of the algorithm. It is thought that the current Google ranking algorithm ranks pages according to thematic PageRank. In other words, it emphasizes the importance of links from pages with content related by similar topics or themes. The exact details of this algorithm are known only to Google developers.

You can determine the PageRank value for any web page with the help of the Google ToolBar that shows a PageRank value within the range from 0 to 10. It should be noted that the Google ToolBar does not show the exact PageRank probability value, but the PageRank range a particular site is in. Each range (from 0 to 10) is defined according to a logarithmic scale.

Here is an example: each page has a real PageRank value known only to Google. To derive a displayed PageRank range for their ToolBar, they use a logarithmic scale as shown in this table:

Real PR == ToolBar PR

1-10 == 1
10-100 == 2
100-1000 == 3
1000-10,000 == 4


This shows that the PageRank ranges displayed on the Google ToolBar are not all equal. It is easy, for example, to increase PageRank from one to two, while it is much more difficult to increase it from six to seven.
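
The table can be written as a few lines of Python to show why each step up is harder than the last. This is only a sketch of the illustration above: the real values and the real base of the scale are known only to Google, so the base of 10 and the function name toolbar_pr are assumptions made here. The table also leaves open which range a boundary value such as 10 belongs to, and this sketch puts it in the higher range.

```python
def toolbar_pr(real_pr, base=10, max_value=10):
    """Map a hypothetical 'real' PageRank value to a 0-10 ToolBar range."""
    band = 0
    threshold = 1
    # Each range is `base` times wider than the one before it.
    while real_pr >= threshold and band < max_value:
        band += 1
        threshold *= base
    return band


for value in (5, 50, 500, 5000):
    print(value, "==", toolbar_pr(value))
```

With a base of 10, moving from range one to range two needs a real value about 10 times larger, and the same factor of 10 applies again at every later step, so the absolute gap to cross keeps growing.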

In practice, PageRank is mainly used for two purposes:

1. Quick check of the site's popularity. PageRank does not give exact information about referring pages, but it allows you to quickly and easily get a feel for the site's popularity level and to follow trends that may result from your SEO work. You can use the following rule-of-thumb measures for English language sites: PR 4-5 is typical for most sites with average popularity. PR 6 indicates a very popular site while PR 7 is almost unreachable for a regular webmaster. You should congratulate yourself if you manage to achieve it. PR 8, 9, 10 can only be achieved by the sites of large companies such as Microsoft, Google, etc. PageRank is also useful when exchanging links and in similar situations. You can compare the quality of the pages offered in the exchange with pages from your own site to decide if the exchange should be accepted.

2. Evaluation of the competitiveness level for a search query, which is a vital part of SEO work. Although PageRank is not used directly in the ranking algorithms, it allows you to indirectly evaluate relative site competitiveness for a particular query. For example, if the search engine displays sites with PageRank 6-7 in the top search results, a site with PageRank 4 is not likely to get to the top of the results list using the same search query.

It is important to recognize that the PageRank values displayed on the Google ToolBar are recalculated only occasionally (every few months) so the Google ToolBar displays somewhat outdated information. This means that the Google search engine tracks changes in inbound links much faster than these changes are reflected on the Google ToolBar.