PageRank is a numeric value that represents how important a page is on the web.
Google figures that when one page links to another page, it is effectively
casting a vote for the other page. The more votes that are cast for a page, the
more important the page must be. Also, the importance of the page that is
casting the vote determines how important the vote itself is. Google calculates
a page's importance from the votes cast for it, and the importance of each vote
is taken into account when a page's PageRank is calculated.
PageRank is Google's way of deciding a page's importance. It matters because it
is one of the factors that determine a page's ranking in the search results. It
isn't the only factor that Google uses to rank pages, but it is an important
one.
From here on in, we'll occasionally refer to PageRank as "PR".
Notes:
Not all links are counted by Google. For instance, they filter out links from
known link farms. Some links can cause a site to be penalized by Google. They
rightly figure that webmasters cannot control which sites link to their sites,
but they can control which sites they link out to. For this reason,
links into a site cannot harm the site, but links from a site can be harmful if
they link to penalized sites. So be careful which sites you link to. If a site
has PR0, it is usually a penalty, and it would be unwise to link to it.
To
calculate the PageRank for a page, all of its inbound links are taken into
account. These are links from within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
That's
the equation that calculates a page's PageRank. It's the original one that was
published when PageRank was being developed, and it is probable that Google
uses a variation of it but they aren't telling us what it is. It doesn't matter
though, as this equation is good enough.
In
the equation 't1 - tn' are pages linking to page A, 'C' is the number of
outbound links that a page has and 'd' is a damping factor, usually set to
0.85.
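As a minimal sketch, the equation can be applied once to a single page like
this. The function name and the (PageRank, outbound link count) pairs are
illustrative assumptions, not something taken from Google.

```python
def page_rank_once(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn)) once.

    inbound: one (pagerank, outbound_link_count) pair per page linking to A.
    The pair layout is an assumption made for this sketch.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)

# Page A is linked to by a PR 1 page with 1 outbound link
# and by a PR 1 page with 2 outbound links.
value = page_rank_once([(1.0, 1), (1.0, 2)])
print(value)  # 0.15 + 0.85 * (1/1 + 1/2)
```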
We
can think of it in a simpler way:-
a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that
links to it)
"share" = the linking page's PageRank divided by the number of outbound links on
the page.
A
page "votes" an amount of PageRank onto each page that it links to.
The amount of PageRank that it has to vote with is a little less than its own
PageRank value (its own value * 0.85). This value is shared equally between all
the pages that it links to.
From
this, we could conclude that a link from a page with PR4 and 5 outbound links
is worth more than a link from a page with PR8 and 100 outbound links. The
PageRank of a page that links to yours is important but the number of links on
that page is also important. The more links there are on a page, the less
PageRank value your page will receive from it.
If the PageRank value differences between PR1, PR2,.....PR10 were equal then
that conclusion would hold up, but many people believe that the values between
PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very
good reason for believing it. Nobody outside Google knows for sure one way or
the other, but the chances are high that the scale is logarithmic, or similar.
If so, it means that it takes a lot more additional PageRank for a page to move
up to the next PageRank level than it did to move up from the previous PageRank
level. The result is that it reverses the previous conclusion, so that a link
from a PR8 page that has lots of outbound links is worth more than a link from
a PR4 page that has only a few outbound links.
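The two conclusions can be compared with a little arithmetic. This is only a
sketch: the function names are made up for illustration, and the logarithmic
base of 5 is an assumption, since nobody outside Google knows the real scale.

```python
D = 0.85  # damping factor from the equation

def vote_linear(toolbar_pr, outbound_links):
    """Vote value if the PR1..PR10 figures were the actual PageRank values."""
    return D * toolbar_pr / outbound_links

def vote_log(toolbar_pr, outbound_links, base):
    """Vote value if each PR level multiplies actual PageRank by `base`.

    The base is a guess used only for this illustration.
    """
    return D * base ** toolbar_pr / outbound_links

linear_pr4 = vote_linear(4, 5)     # PR4 page with 5 outbound links
linear_pr8 = vote_linear(8, 100)   # PR8 page with 100 outbound links
log_pr4 = vote_log(4, 5, base=5)
log_pr8 = vote_log(8, 100, base=5)

print(linear_pr4 > linear_pr8)  # True: on an equal-step scale the PR4 link wins
print(log_pr4 < log_pr8)        # True: on this assumed log scale the PR8 link wins
```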
Whichever
scale Google uses, we can be sure of one thing. A link from another site
increases our site's PageRank. Just remember to avoid links from link farms.
Note
that when a page votes its PageRank value to other pages, its own PageRank is
not reduced by the value that it is voting. The page doing the voting doesn't
give away its PageRank and end up with nothing. It isn't a transfer of
PageRank. It is simply a vote according to the page's PageRank value. It's like
a shareholders meeting where each shareholder votes according to the number of
shares held, but the shares themselves aren't given away. Even so, pages do
lose some PageRank indirectly, as we'll see later.
Ok
so far? Good. Now we'll look at how the calculations are actually done.
For
a page's calculation, its existing PageRank (if it has any) is abandoned
completely and a fresh calculation is done where the page relies solely on the
PageRank "voted" for it by its current inbound links, which may have
changed since the last time the page's PageRank was calculated.
The equation shows clearly how a page's PageRank is arrived at. But what isn't
immediately obvious is that it can't work if the calculation is done just once.
Suppose we have 2 pages, A and B, which link to each other, and neither has
any other links of any kind. This is what happens:-
Step 1: Calculate page A's PageRank from the value of its inbound links
Page A now has a new PageRank value. The calculation used the value of the
inbound link from page B. But page B has an inbound link (from page A) and its
new PageRank value hasn't been worked out yet, so page A's new PageRank value is
based on inaccurate data and can't be accurate.
Step 2: Calculate page B's PageRank from the value of its inbound links
Page B now has a new PageRank value, but it can't be accurate because the
calculation used the new PageRank value of the inbound link from page A, which
is inaccurate.
It's
a Catch 22 situation. We can't work out A's PageRank until we know B's
PageRank, and we can't work out B's PageRank until we know A's PageRank.
Now
that both pages have newly calculated PageRank values, can't we just run the
calculations again to arrive at accurate values? No. We can run the
calculations again using the new values and the results will be more accurate,
but we will always be using inaccurate values for the calculations, so the
results will always be inaccurate.
The problem is overcome by repeating the calculations many times. Each time
produces slightly more accurate values. In fact, total accuracy can never be
achieved because the calculations are always based on inaccurate values. 40 to
50 iterations are sufficient to reach a point where any further iterations
wouldn't produce enough of a change to the values to matter. This is precisely
what Google does at each update, and it's the reason why the updates take so
long.
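The repeated calculation can be sketched for the two-page case as follows. The
`pagerank` function and its `links` dictionary are assumptions made for this
illustration, and it recalculates every page from the values of the previous
pass, which is only one possible order of working. The pages start at 0 here so
that the gradual improvement is visible.

```python
def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        # Every page is recalculated from the previous pass's values.
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

links = {"A": ["B"], "B": ["A"]}  # A and B link only to each other
after_1 = pagerank(links, 1, start=0.0)
after_10 = pagerank(links, 10, start=0.0)
after_50 = pagerank(links, 50, start=0.0)
for label, ranks in (("1", after_1), ("10", after_10), ("50", after_50)):
    print(label, "iterations:", ranks)
```

Each pass moves both values closer to 1, and by 50 passes the remaining change
is too small to matter.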
One
thing to bear in mind is that the results we get from the calculations are proportions.
The figures must then be set against a scale (known only to Google) to arrive
at each page's actual PageRank. Even so, we can use the calculations to channel
the PageRank within a site around its pages so that certain pages receive a
higher proportion of it than others.
NOTE:
You may come across explanations of PageRank where the same equation is stated
but the result of each iteration of the calculation is added to the
page's existing PageRank. The new value (result + existing PageRank) is then
used when sharing PageRank with other pages. These explanations are wrong for
the following reasons:-
1. They quote the same, published equation - but then change it from
PR(A) = (1-d) + d(......) to PR(A) = PR(A) + (1-d) + d(......)
It isn't correct, and it isn't necessary.
2. We will be looking at how to organize links so that certain pages
end up with a larger proportion of the PageRank than others. Adding to the
page's existing PageRank through the iterations produces different proportions
than when the equation is used as published. Since the addition is not a part
of the published equation, the results are wrong and the proportioning isn't
accurate.
According
to the published equation, the page being calculated starts from scratch at
each iteration. It relies solely on its inbound links. The 'add to the
existing PageRank' idea doesn't do that, so its results are necessarily wrong.
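A small experiment shows the difference. This is a sketch under assumptions: the
`step` function, its `add_existing` switch and the 3-page link structure (A
links to B and C, which both link back to A) are chosen purely for illustration.

```python
def step(links, ranks, d=0.85, add_existing=False):
    """One pass of the equation; optionally add the page's existing value."""
    new = {}
    for page in links:
        votes = sum(ranks[s] / len(t) for s, t in links.items() if page in t)
        new[page] = (1 - d) + d * votes
        if add_existing:
            new[page] += ranks[page]  # the 'add to existing PageRank' idea
    return new

links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
published = {page: 1.0 for page in links}
additive = {page: 1.0 for page in links}
for _ in range(20):
    published = step(links, published)
    additive = step(links, additive, add_existing=True)

print("published total:", sum(published.values()))
print("additive total: ", sum(additive.values()))
print("A's share, published:", published["A"] / sum(published.values()))
print("A's share, additive: ", additive["A"] / sum(additive.values()))
```

With the equation as published the site total stays at 3, whereas the additive
version keeps growing with every pass and gives page A a different share.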
Fact: A website has a maximum
amount of PageRank that is distributed between its pages by internal links.
The
maximum PageRank in a site equals the number of pages in the site * 1. The
maximum is increased by inbound links from other sites and decreased by
outbound links to other sites. We are talking about the overall PageRank in the
site and not the PageRank of any individual page. You don't have to take my
word for it. You can reach the same conclusion by using a pencil and paper and
the equation.
Fact: The maximum amount of
PageRank in a site increases as the number of pages in the site increases.
The
more pages that a site has, the more PageRank it has. Again, by using a pencil
and paper and the equation, you can come to the same conclusion. Bear in mind
that the only pages that count are the ones that Google knows about.
Fact: By linking poorly, it
is possible to fail to reach the site's maximum PageRank, but it is not
possible to exceed it.
Poor
internal linkages can cause a site to fall short of its maximum but no kind of
internal link structure can cause a site to exceed it. The only way to increase
the maximum is to add more inbound links and/or increase the number of pages in
the site.
Cautions: Whilst I thoroughly
recommend creating and adding new pages to increase a site's total PageRank so
that it can be channeled to specific pages, there are certain types of pages
that should not be added. These are pages that are all identical or very
nearly identical and are known as cookie-cutters. Google considers them to be
spam and they can trigger an alarm that causes the pages, and possibly the
entire site, to be penalized. Pages full of good content are a must.
What can we do with this 'overall' PageRank?
We
are going to look at some example calculations to see how a site's PageRank can
be manipulated, but before doing that, I need to point out that a page will be
included in the Google index only if one or more pages on the web link
to it. That's according to Google. If a page is not in the Google index, any
links from it can't be included in the calculations.
For
the examples, we are going to ignore that fact, mainly because other 'Pagerank
Explained' type documents ignore it in the calculations, and it might be
confusing when comparing documents. The calculator
operates in two modes:- Simple and Real. In Simple mode, the calculations
assume that all pages are in the Google index, whether or not any other pages
link to them. In Real mode the calculations disregard unlinked-to pages. These
examples show the results as calculated in Simple mode. 
Let's
consider a 3 page site (pages A, B and C) with no links coming in from the
outside. We will allocate each page an initial PageRank of 1, although it makes
no difference whether we start each page with 1, 0 or 99. Apart from a few
millionths of a PageRank point, after many iterations the end result is always
the same. Starting with 1 requires fewer iterations for the PageRanks to
converge to a suitable result than when starting with 0 or any other number.
You may want to use a pencil and paper to follow this or you can follow it with
the calculator.
The
site's maximum PageRank is the amount of PageRank in the site. In this case, we
have 3 pages so the site's maximum is 3.
At
the moment, none of the pages link to any other pages and none link to them. If
you make the calculation once for each page, you'll find that each of them ends
up with a PageRank of 0.15. No matter how many iterations you run, each page's
PageRank remains at 0.15. The total PageRank in the site = 0.45, whereas it
could be 3. The site is seriously wasting most of its potential PageRank.
Example 1 
Now
begin again with each page being allocated PR1. Link page A to page B and run
the calculations for each page. We end up with:-
Page A = 0.15
Page B = 1
Page C = 0.15
Page
A has "voted" for page B and, as a result, page B's PageRank has
increased. This is looking good for page B, but it's only 1 iteration - we
haven't taken account of the Catch 22 situation. Look at what happens to the
figures after more iterations:-
After
100 iterations the figures are:-
Page A = 0.15
Page B = 0.2775
Page C = 0.15
It
still looks good for page B but nowhere near as good as it did. These figures
are more realistic. The total PageRank in the site is now 0.5775 - slightly
better but still only a fraction of what it could be.
NOTE:
Technically, these particular results are incorrect because of the special
treatment that Google gives to dangling links, but they
serve to demonstrate the simple calculation.
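The Example 1 figures can be reproduced with a short script. It is a sketch
only: the `pagerank` function and `links` layout are illustrative assumptions,
and every page is recalculated from the previous pass's values because that
order matches the figures shown above.

```python
def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

links = {"A": ["B"], "B": [], "C": []}  # only one link: A to B
after_1 = pagerank(links, 1)
after_100 = pagerank(links, 100)
print(after_1)
print(after_100, "total:", sum(after_100.values()))
```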
Example 2 
Try
this linkage. Link all pages to all pages. Each page starts with PR1 again.
This produces:-
Page A = 1
Page B = 1
Page C = 1
Now we've achieved the maximum. No matter how many iterations are run, each
page always ends up with PR1. The same results occur by linking in a loop. E.g.
A to B, B to C and C to A. View this in the calculator.
This
has demonstrated that, by poor linking, it is quite easy to waste PageRank and
by good linking, we can achieve a site's full potential. But we don't
particularly want all the site's pages to have an equal share. We want one or
more pages to have a larger share at the expense of others. The kinds of pages
that we might want to have the larger shares are the index page, hub pages and
pages that are optimized for certain search terms. We have only 3 pages, so
we'll channel the PageRank to the index page - page A. It will serve to show
the idea of channeling.
Example 3 
Now
try this. Link page A to both B and C. Also link pages B and C to A. Starting
with PR1 all round, after 1 iteration the results are:-
Page A = 1.85
Page B = 0.575
Page C = 0.575
and
after 100 iterations, the results are:-
Page A = 1.459459
Page B = 0.7702703
Page C = 0.7702703
In
both cases the total PageRank in the site is 3 (the maximum) so none is being
wasted. Also in both cases you can see that page A has a much larger proportion
of the PageRank than the other 2 pages. This is because pages B and C are
passing PageRank to A and not to any other pages. We have channeled a large
proportion of the site's PageRank to where we wanted it.
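For readers without the calculator, Example 3 can be checked with the sketch
below. The `pagerank` function is an illustrative assumption that recalculates
all pages from the previous pass's values. The run that starts from 0 is there
to show that the starting value makes no practical difference.

```python
def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

# A links to B and C; B and C both link back to A.
links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
after_1 = pagerank(links, 1)
after_100 = pagerank(links, 100)
from_zero = pagerank(links, 100, start=0.0)
print(after_1)
print(after_100, "total:", sum(after_100.values()))
print(from_zero)
```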
Example 4 
Finally,
keep the previous links and add a link from page C to page B. Start again with
PR1 all round. After 1 iteration:-
Page A = 1.425
Page B = 1
Page C = 0.575
By comparison to the 1 iteration figures in the previous example, page A has
lost some PageRank, page B has gained some and page C stayed the same. Page C
now shares its "vote" between A and B. Previously A received all of it.
That's why page A has lost out and why page B has gained.
and after 100 iterations:-
Page A = 1.298245
Page B = 0.9999999
Page C = 0.7017543
When the dust has settled, page C has lost a little PageRank because, having
now shared its vote between A and B instead of giving it all to A, A has less
to give to C in the A-->C link. So adding an extra link from a page causes the
page to lose PageRank indirectly if any of the pages that it links to return
the link. If the pages that it links to don't return the link, then no PageRank
is lost. To make it more complicated, if the link is returned even indirectly
(via a page that links to a page that links to a page etc), the page will lose
a little PageRank. This isn't really important with internal links, but it does
matter when linking to pages outside the site.
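The indirect loss can be seen by running the same calculation with and without
the extra link from page C to page B. As before, this is a sketch: the
`pagerank` function and the dictionary names are assumptions for illustration.

```python
def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

without_link = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
with_link = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}  # adds C to B

before = pagerank(without_link, 100)
after = pagerank(with_link, 100)
print("page C before:", before["C"])
print("page C after: ", after["C"])  # lower, although C only added a link
```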
Example 5: new pages
Adding
new pages to a site is an important way of increasing a site's total PageRank
because each new page will add an average of 1 to the total. Once the new pages
have been added, their new PageRank can be channeled to the important pages.
We'll use the calculator to demonstrate these.
Let's add 3 new pages to Example 3. Three new pages but they don't do anything
for us yet. The small increase in the Total, and the new pages' 0.15, are
unrealistic as we shall see. So let's link them into the site.
Link each of the new pages to the important page, page A. Notice that the Total
PageRank has doubled, from 3 (without the new pages) to 6. Notice also that
page A's PageRank has almost doubled.
There is one thing wrong with this model. The new pages are orphans. They
wouldn't get into Google's index, so they wouldn't add any PageRank to the site
and they wouldn't pass any PageRank to page A. They each need to be linked to
from at least one other page. If page A is the important page, the best page to
put the links on is, surprisingly, page A. You can play around with the links
but, from page A's point of view, there isn't a better place for them.
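The same models can be tried without the calculator. The sketch below assumes
the new pages are called D, E and F and uses an illustrative `pagerank`
function. Like the calculator's Simple mode, it counts pages even when nothing
links to them.

```python
def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

# New pages D, E and F link to A but nothing links to them (orphans).
orphans = {"A": ["B", "C"], "B": ["A"], "C": ["A"],
           "D": ["A"], "E": ["A"], "F": ["A"]}
# Page A also links to the new pages, so they are no longer orphans.
linked = dict(orphans, A=["B", "C", "D", "E", "F"])

orphan_ranks = pagerank(orphans, 100)
linked_ranks = pagerank(linked, 100)
print("orphans: A =", orphan_ranks["A"], "total =", sum(orphan_ranks.values()))
print("linked:  A =", linked_ranks["A"], "total =", sum(linked_ranks.values()))
```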
It
is not a good idea for one page to link to a large number of pages so, if you
are adding many new pages, spread the links around. The chances are that there
is more than one important page in a site, so it is usually suitable to spread
the links to and from the new pages. You can use the calculator to experiment
with mini-models of a site to find the best links that produce the best results
for its important pages.
Examples summary
You
can see that, by organising the internal links, it is possible to channel a
site's PageRank to selected pages. Internal links can be arranged to suit a
site's PageRank needs, but it is only useful if Google knows about the pages,
so do try to ensure that Google spiders them.
Inbound and Outbound links
Examples
of these could be given but it is probably clearer to read about them (below)
and to 'play' with them in the calculator.
Questions
When
a page has several links to another page, are all the links counted?
E.g.
if page A links once to page B and 3 times to page C, does page C receive 3/4
of page A's shareable PageRank?
The
PageRank concept is that a page casts votes for one or more other pages.
Nothing is said in the original PageRank document about a page casting more
than one vote for a single page. The idea seems to be against the PageRank
concept and would certainly be open to manipulation by unrealistically
proportioning votes for target pages. E.g. if an outbound link, or a link to an
unimportant page, is necessary, add a bunch of links to an important page to
minimize the effect.
Since
we are unlikely to get a definitive answer from Google, it is reasonable to
assume that a page can cast only one vote for another page, and that additional
votes for the same page are not counted.
When
a page links to itself, is the link counted?
Again,
the concept is that pages cast votes for other pages. Nothing is said in the
original document about pages casting votes for themselves. The idea seems to
be against the concept and, also, it would be another way to manipulate the
results. So, for those reasons, it is reasonable to assume that a page can't
vote for itself, and that such links are not counted.
"Dangling links are simply links that point
to any page with no outgoing links. They affect the model because it is not
clear where their weight should be distributed, and there are a large number of
them. Often these dangling links are simply pages that we have not downloaded
yet..........Because dangling links do not affect the ranking of any other page
directly, we simply remove them from the system until all the PageRanks are
calculated. After all the PageRanks are calculated they can be added back in
without affecting things significantly." - extract from the original PageRank
paper by Google’s founders, Sergey Brin and Lawrence Page.
A dangling link is a link to a page that has no links going from it, or a link
to a page that Google hasn't indexed. In both cases Google removes the links
shortly after the start of the calculations and reinstates them shortly before
the calculations are finished. In this way, their effect on the PageRank of
other pages is minimal.
The
results shown in Example 1 (right diag.) are wrong because page B has no links
going from it, and so the link from page A to page B is dangling and would be
removed from the calculations. The results of the calculations would show all
three pages as having 0.15.
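A rough sketch of that treatment follows. It makes some loud assumptions: the
`drop_dangling` helper is invented for illustration, it removes dangling links
in a single pass before the calculations start, and it doesn't model the
reinstatement near the end. How Google actually does it isn't public.

```python
def drop_dangling(links):
    """Remove links whose target page has no outgoing links (single pass)."""
    return {page: [t for t in targets if links.get(t)]
            for page, targets in links.items()}

def pagerank(links, iterations, start=1.0, d=0.85):
    """links maps every page to the list of pages it links to."""
    ranks = {page: start for page in links}
    for _ in range(iterations):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
    return ranks

example_1 = {"A": ["B"], "B": [], "C": []}  # the link from A to B is dangling
cleaned = drop_dangling(example_1)
ranks = pagerank(cleaned, 100)
print(cleaned)
print(ranks)
```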
It may suit site functionality to link to pages that have no links going from
them. Doing so doesn't take any PageRank from the other pages, but it is a
waste of potential PageRank. Take a look at this example.
The site's potential is 5 because it has 5 pages, but without page E linked in,
the site only has 4.15.
Link
page A to page E and click Calculate. Notice that the site's total has gone down very
significantly. But, because the new link is dangling and would be removed from
the calculations, we can ignore the new total and assume the previous 4.15 to
be true. That's the effect of functionally useful, dangling links in the site.
There's no overall PageRank loss.
However,
some of the site's potential total is still being wasted, so link Page E back
to Page A and click Calculate. Now we have the maximum PageRank that is possible with 5 pages.
Nothing is being wasted.
Although
it may be functionally good to link to pages within the site without those
pages linking out again, it is bad for PageRank. It is pointless wasting
PageRank unnecessarily, so always make sure that every page in the site links
out to at least one other page in the site.
Inbound
links (links into the site from the outside) are one way to increase a site's
total PageRank. The other is to add more pages. Where the links come from
doesn't matter. Google recognizes that a webmaster has no control over other
sites linking into a site, and so sites are not penalized because of where the
links come from. There is an exception to this rule but it is rare and doesn't
concern this article. It isn't something that a webmaster can accidentally do.
The linking page's PageRank is important, but so is the number of links going
from that page. For instance, if you are the only link from a page that has a
lowly PR2, you will receive an injection of 0.15 + 0.85(2/1) = 1.85 into your
site, whereas a link from a PR8 page that has another 99 links from it will
increase your site's PageRank by 0.15 + 0.85(8/100) = 0.218. Clearly, the PR2
link is much better - or is it? If the toolbar values sit on a logarithmic
scale, as discussed elsewhere in this article, that is probably not the case.
Once
the PageRank is injected into your site, the calculations are done again and
each page's PageRank is changed. Depending on the internal link structure, some
pages' PageRank is increased, some are unchanged but no pages lose any
PageRank.
It
is beneficial to have the inbound links coming to the pages to which you are
channeling your PageRank. A PageRank injection to any other page will be spread
around the site through the internal links. The important pages will receive an
increase, but not as much of an increase as when they are linked to directly.
The page that receives the inbound link makes the biggest gain.
It
is easy to think of our site as being a small, self-contained network of pages.
When we do the PageRank calculations we are dealing with our small network. If
we make a link to another site, we lose some of our network's PageRank, and if
we receive a link, our network's PageRank is added to. But it isn't like that.
For the PageRank calculations, there is only one network - every page that
Google has in its index. Each iteration of the calculation is done on the
entire network and not on individual websites.
Because
the entire network is interlinked, and every link and every page plays its part
in each iteration of the calculations, it is impossible for us to calculate the
effect of inbound links to our site with any realistic accuracy.
Outbound
links are a drain on a site's total PageRank. They leak PageRank. To counter
the drain, try to ensure that the links are reciprocated. Because of the
PageRank of the pages at each end of an external link, and the number of links
out from those pages, reciprocal links can gain or lose PageRank. You need to
take care when choosing where to exchange links.
When
PageRank leaks from a site via a link to another site, all the pages in the
internal link structure are affected. (This doesn't always show after just 1
iteration). The page that you link out from makes a difference to which pages
suffer the most loss. Without a program to perform the calculations on specific
link structures, it is difficult to decide on the right page to link out from,
but the generalization is to link from the one with the lowest PageRank.
Many
websites need to contain some outbound links that are nothing to do with
PageRank. Unfortunately, all 'normal' outbound links leak PageRank. But there
are 'abnormal' ways of linking to other sites that don't result in leaks.
PageRank is leaked when Google recognizes a link to another site. The answer is
to use links that Google doesn't recognize or count. These include form actions
and links contained in javascript code.
Form actions
A form's 'action' attribute does not need to be the url of a form parsing script.
It can point to any html page on any site. Try it.
Example:
<form name="myform" action="http://www.domain.com/somepage.html">
<a href="javascript:document.myform.submit()">Click here</a>
</form>
To
be really sneaky, the action attribute could be in some javascript code rather
than in the form tag, and the javascript code could be loaded from a 'js' file
stored in a directory that is barred to Google's spider by the robots.txt file.
Javascript
Example: <a href="javascript:goto('wherever')">Click here</a>
Like the form action, it is sneaky to load the javascript code, which contains
the urls, from a separate 'js' file, and sneakier still if the file is stored in
a directory that is barred to googlebot by the robots.txt file.
The "rel" attribute
As of 18th January 2005, Google, together with other search engines, is
recognising a new attribute to the anchor tag. The attribute is
"rel", and it is used as follows:-
<a href="http://www.domain.com/somepage.html" rel="nofollow">link text</a>
The attribute tells Google to ignore the link completely. The link won't help
the target page's PageRank, and it won't help its rankings. It is as though the
link doesn't exist. With this attribute, there is no longer any need for
javascript, forms, or any other method of hiding links from Google.
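To check which links on a page would still be counted, something like the
following sketch could be used. It relies on Python's standard `html.parser`
module. The `FollowedLinks` class name and the sample HTML are assumptions for
illustration, and it says nothing about how Google itself parses pages.

```python
from html.parser import HTMLParser

class FollowedLinks(HTMLParser):
    """Collect href values of <a> tags that are not marked rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. "external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if attributes.get("href") and "nofollow" not in rel_values:
            self.links.append(attributes["href"])

sample = (
    '<a href="http://www.domain.com/counted.html">counted</a> '
    '<a href="http://www.domain.com/somepage.html" rel="nofollow">ignored</a>'
)
parser = FollowedLinks()
parser.feed(sample)
print(parser.links)  # only the link without rel="nofollow"
```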
First, let
me explain in more detail why the values shown in the Google toolbar are not
the actual PageRank figures. According to the equation, and to the creators of
Google, the billions of pages on the web average out to a PageRank of 1.0 per
page. So the total PageRank on the web is equal to the number of pages on the
web * 1, which equals a lot of PageRank spread around the web.
The
Google toolbar range is from 1 to 10. (They sometimes show 0, but that figure
isn't believed to be a PageRank calculation result). What Google does is divide
the full range of actual PageRanks on the web into 10 parts - each part
is represented by a value as shown in the toolbar. So the toolbar values only
show what part of the overall range a page's PageRank is in, and not the actual
PageRank itself. The numbers in the toolbar are just labels.
Whether
or not the overall range is divided into 10 equal parts is a matter for debate
- Google aren't saying. But because it is much harder to move up a toolbar
point at the higher end than it is at the lower end, many people (including me)
believe that the divisions are based on a logarithmic scale, or something very
similar, rather than the equal divisions of a linear scale.
Let's assume that it is a logarithmic, base 10 scale, and that it takes 10
properly linked new pages to move a site's important page up 1 toolbar point. It
will take 100 new pages to move it up another point, 1000 new pages to move it
up one more, 10,000 to the next, and so on. That's why moving up at the lower
end is much easier than at the higher end.
In
reality, the base is unlikely to be 10. Some people think it is around the 5 or
6 mark, and maybe even less. Even so, it still gets progressively harder to move
up a toolbar point at the higher end of the scale.
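The arithmetic of that assumption is easy to sketch. Everything here is
hypothetical: the function name, the starting figure of 10 pages and both bases
are illustrations of the idea, not known Google values.

```python
def new_pages_per_point(points, base=10, first_step=10):
    """New pages needed for each successive toolbar point, if every point
    costs `base` times as much as the one before (an assumed model)."""
    return [first_step * base ** k for k in range(points)]

base_10 = new_pages_per_point(4)
base_5 = new_pages_per_point(4, base=5)
print(base_10)  # [10, 100, 1000, 10000]
print(base_5)   # [10, 50, 250, 1250]
```

Even with the smaller base, each toolbar point costs several times more than
the one before it.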
Note
that as the number of pages on the web increases, so does the total PageRank on
the web, and as the total PageRank increases, the positions of the divisions in
the overall scale must change. As a result, some pages drop a toolbar point for
no 'apparent' reason. If the page's actual PageRank was only just above a
division in the scale, the addition of new pages to the web would cause the
division to move up slightly and the page would end up just below the division.
Google's index is always increasing and they re-evaluate each of the pages on
more or less a monthly basis. It's known as the "Google dance". When
the dance is over, some pages will have dropped a toolbar point. A number of
new pages might be all that is needed to get the point back after the next
dance.
The
toolbar value is a good indicator of a page's PageRank but it only indicates
that a page is in a certain range of the overall scale. One PR5 page could be
just above the PR5 division and another PR5 page could be just below the PR6
division - almost a whole division (toolbar point) between them.
Domain names and Filenames
To
a spider, www.domain.com/, domain.com/, www.domain.com/index.html and domain.com/index.html are different urls and,
therefore, different pages. Surfers arrive at the site's home page whichever of
the urls are used, but spiders see them as individual urls, and it makes a
difference when working out the PageRank. It is better to standardize the url
you use for the site's home page. Otherwise each url can end up with a
different PageRank, whereas all of it should have gone to just one url.
If
you think about it, how can a spider know the filename of the page that it gets
back when requesting www.domain.com/ ? It can't. The filename could be index.html, index.htm,
index.php, default.html, etc. The spider doesn't know. If you link to
index.html within the site, the spider could compare the 2 pages but that seems
unlikely. So they are 2 urls and each receives PageRank from inbound links.
Standardizing the home page's url ensures that the Pagerank it is due isn't
shared with ghost urls.
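One way to standardize the url when generating links is sketched below, using
Python's standard `urllib.parse` module. The `canonical_url` function, the
preferred "www." host and the list of index filenames are assumptions for
illustration, and the urls need their "http://" scheme to be parsed correctly.

```python
from urllib.parse import urlsplit, urlunsplit

# Filenames treated as the directory's default page (an assumed list).
INDEX_NAMES = {"index.html", "index.htm", "index.php", "default.html"}

def canonical_url(url, preferred_host="www.domain.com"):
    """Map the www/non-www and index-file variants of a url onto one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare_host = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (preferred_host, bare_host):
        host = preferred_host
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        path = head + "/"  # drop the index filename, keep the directory
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))

variants = [
    "http://www.domain.com/",
    "http://domain.com/",
    "http://www.domain.com/index.html",
    "http://domain.com/index.html",
]
canonical = {canonical_url(u) for u in variants}
print(canonical)  # all four variants collapse to a single url
```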
Example: Go to my UK Holidays and UK Holiday
Accommodation site - how's that for a nice piece of link text ;). Notice
that the url in the browser's address bar contains "www.". If you
have the Google Toolbar installed, you will see that the page has PR5. Now
remove the "www." part of the url and get the page again. This time
it has PR1, and yet they are the same page. Actually, the PageRank is for the
unseen frameset page.
When
this article was first written, the non-www URL had PR4 due to using different
versions of the link URLs within the site. It had the effect of sharing the
page's PageRank between the 2 pages (the 2 versions) and, therefore, between
the 2 sites. That's not the best way to do it. Since then, I've tidied up the
internal linkages and got the non-www version down to PR1 so that the PageRank
within the site mostly stays in the "www." version, but there must be
a site somewhere that links to it without the "www." that's causing
the PR1.
Imagine the page, www.domain.com/index.html. The index page contains links to
several relative urls; e.g. products.html and details.html. The spider sees
those urls as www.domain.com/products.html and www.domain.com/details.html. Now
let's add an absolute url for another page, only this time we'll leave out the
"www." part - domain.com/anotherpage.html. This page links back to the
index.html page, so the spider sees the index page as domain.com/index.html.
Although it's the same index page as the first one, to a spider, it is a
different page because it's on a different domain. Now look what happens. Each
of the relative urls on the index page is also different because it belongs to
the domain.com/ domain. Consequently, the link structure is wasting a site's
potential PageRank by spreading it between ghost pages.
Adding new pages
There
is a possible negative effect of adding new pages. Take a perfectly normal
site. It has some inbound links from other sites and its pages have some
PageRank. Then a new page is added to the site and is linked to from one or
more of the existing pages. The new page will, of course, acquire PageRank from the
site's existing pages. The effect is that, whilst the total PageRank in the
site is increased, one or more of the existing pages will suffer a PageRank
loss due to the new page making gains. Up to a point, the more new pages that
are added, the greater is the loss to the existing pages. With large sites,
this effect is unlikely to be noticed but, with smaller ones, it probably
would.
So, although adding new pages does increase the total PageRank within the site,
some of the site's pages will lose PageRank as a result. The answer is to link
new pages in such a way within the site that the important pages don't suffer,
or add sufficient new pages to make up for the effect (that can sometimes mean
adding a large number of new pages), or better still, get some more inbound
links.
The Google toolbar
If you have the Google toolbar installed in your browser, you will be used to
seeing each page's PageRank as you browse the web. But all isn't always as it
seems. Many pages that Google displays the PageRank for haven't been indexed in
Google and certainly don't have any PageRank in their own right. What is
happening is that one or more pages on the site have been indexed and a
PageRank has been calculated. The PageRank figure for the site's pages that
haven't been indexed is allocated on the fly - just for your toolbar. The
PageRank itself doesn't exist.
It's
important to know this so that you can avoid exchanging links with pages that
really don't have any PageRank of their own. Before making exchanges, search
for the page on Google to make sure that it is indexed.
Sub-directories
Some people believe that Google drops a page's PageRank by a value of 1 for
each sub-directory level below the root directory. E.g. if the value of pages
in the root directory is generally around 4, then pages in the next directory
level down will be generally around 3, and so on down the levels. Other people
(including me) don't accept that at all. Either way, because some spiders tend
to avoid deep sub-directories, it is generally considered to be beneficial to
keep directory structures shallow (directories one or two levels below the
root).
ODP and Yahoo!
It used to be thought that Google gave a Pagerank boost to sites that are
listed in the Yahoo! and ODP (a.k.a. DMOZ) directories, but these days general
opinion is that they don't. There is certainly a PageRank gain for sites that
are listed in those directories, but the reason for it is now thought to be
this:-
Google
spiders the directories just like any other site and their pages have decent
PageRank and so they are good inbound links to have. In the case of the ODP,
Google's directory is a copy of the ODP directory. Each time that sites are
added and dropped from the ODP, they are added and dropped from Google's
directory when they next update it. The entry in Google's directory is yet
another good, PageRank boosting, inbound link. Also, the ODP data is used for
searches on a myriad of websites - more inbound links!
Listings in the ODP are free but, because sites are reviewed by hand, it can
take quite a long time to get in. The sooner a working site is submitted, the
better. For tips on submitting to DMOZ, see this DMOZ article.