Google-friendly URLs, please
ThatAdamGuy
4th of January 2004 (Sun), 04:17
This is also something that greatly interests me!
It seems that Google is not fond of pages that have too many variables, and -- oh my goodness -- though I love the look of EE's pages (simple, clean, friendly!), the URLs are just downright scary.
Like this one:
link (http://www.photography-on-the.net/gallery/photo.php?photo=461&exhibition=7&offset=31&lang=eng&varlist=a%3A15%3A%7Bi%3A0%3Bs%3A1%3A%220%22%3Bi%3A1%3Bs%3A31%3A%22ee_order_to_exhibition.ee_order%22%3Bi%3A2%3Bs%3A4%3A%22DESC%22%3Bi%3A3%3Bs%3A20%3A%22ee_photo.ee_photo_id%22%3Bi%3A4%3Bs%3A0%3A%22%22%3Bi%3A5%3Bs%3A1%3A%221%22%3Bi%3A6%3Bs%3A6%3A%22thumbs%22%3Bi%3A7%3Bi%3A30%3Bi%3A8%3Bs%3A6%3A%22public%22%3Bi%3A9%3Bs%3A7%3A%22default%22%3Bi%3A10%3Bs%3A1%3A%227%22%3Bi%3A11%3Bs%3A2%3A%2210%22%3Bi%3A12%3Bs%3A1%3A%220%22%3Bi%3A13%3Bs%3A1%3A%220%22%3Bi%3A14%3Bs%3A1%3A%220%22%3B%7D&lang=eng)
No search engine is going to index pages with URLs like that :(.
I did notice that, accepting defaults, this URL seems to work just as well:
http://www.photography-on-the.net/gallery/photo.php?photo=461
Is it possible to have simpler URLs like this in the current EE beta, or is this something that you would consider? At minimum, I'd really like to see some things (like lang=eng) simply omitted from standard URLs when they are a site's default.
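One way to get what is asked for here, sketched under assumptions: when building links, compare each parameter with the site's default and leave it out when they match. The function name photo_link and the default values below are hypothetical, not taken from EE.

```php
<?php
// Hypothetical sketch: build a photo link that leaves out every parameter
// whose value equals the site default (e.g. lang=eng), so the common case
// collapses to photo.php?photo=461. The defaults below are assumptions.
function photo_link(string $base, array $params, array $defaults): string
{
    foreach ($defaults as $key => $value) {
        // compare as strings, since values read back from a URL are strings
        if (isset($params[$key]) && (string) $params[$key] === (string) $value) {
            unset($params[$key]);
        }
    }
    // http_build_query() URL-encodes the pairs and (by default) joins them with '&'
    return $params ? $base . '?' . http_build_query($params) : $base;
}

$defaults = ['lang' => 'eng', 'offset' => 0];
echo photo_link('photo.php', ['photo' => 461, 'lang' => 'eng', 'offset' => 0], $defaults), "\n";
echo photo_link('photo.php', ['photo' => 461, 'lang' => 'fin', 'offset' => 31], $defaults), "\n";
```

The first call yields the short photo.php?photo=461 form; only settings that differ from the defaults (the second call) make it into the URL.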
I've been quite sincerely stunned by the quality of the photos I've seen featured by you and other users of EE, and it seems a shame that much of this is 'lost' to the Net, given folks' oft-reliance on Google and other search engines for finding things.
Thank you much for your time, and best of the New Year to you and yours!
ThatAdamGuy
4th of January 2004 (Sun), 04:17
Oops, I meant to post this as a reply to a similar note; apologies for making it a new separate topic!
Pekka
4th of January 2004 (Sun), 07:29
The URL you wrote actually holds only 6 parameters (one of them is a "package").
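For reference, the "package" is the varlist parameter. URL-decoded, it begins a:15:{i:0;s:1:"0";i:1;s:31:"ee_order_to_exhibition.ee_order";... which is the text format PHP's serialize() produces for a 15-element array. Assuming that is what it is, a sketch of unpacking such a parameter could look like this (unpack_varlist is a hypothetical name, not EE's code):

```php
<?php
// Hypothetical helper: unpack a "varlist"-style parameter that carries a
// URL-encoded, PHP-serialized array of browsing settings.
function unpack_varlist(string $query): array
{
    // parse_str() splits the query string and URL-decodes each value
    parse_str($query, $params);
    if (!isset($params['varlist'])) {
        return [];
    }
    // allowed_classes => false refuses to create objects from untrusted input
    $data = unserialize($params['varlist'], ['allowed_classes' => false]);
    return is_array($data) ? $data : [];
}

// Build a query string shaped like the long URL above, then unpack it
$query = 'photo=461&exhibition=7&varlist='
    . urlencode(serialize(['0', 'ee_order_to_exhibition.ee_order', 'DESC']));
print_r(unpack_varlist($query));
```

Passing allowed_classes => false matters because the value arrives from the visitor's browser, and unserialize() on untrusted input could otherwise instantiate objects.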
Of course I have thought about this. The whole thing is the result of avoiding cookies.
There are a few ways to get around this problem.
1. The hack described at http://www.zend.com/zend/spotlight/searchengine.php, which has several drawbacks and does not always work.
2. Use cookies, which will force people to accept cookies in order to keep their searches etc. while browsing the gallery.
3. PHP sessions, which depend on the PHP installation and setup, and use a cookie anyway.
4. A database-based session system, which will work fine on all systems but will need one cookie, too.
5. A parameter storage system which will hold each different URL parameter string and substitute it with just one ID on the fly. This would add maybe a couple of hundred kilobytes more data to the database, but would not need any cookies.
6. Build a new transparent parameter-passing method using custom HTTP headers.
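Option 5 above can be sketched roughly as follows. It is only an illustration: the class name ParamStore is hypothetical, and an in-memory PHP array stands in for the database table that would store one row per distinct parameter string.

```php
<?php
// Hypothetical sketch of option 5: store each distinct parameter set once
// and refer to it by a short numeric ID in URLs. A real system would keep
// the id => string map in a database table instead of in memory.
class ParamStore
{
    private array $byId = [];   // id => serialized parameter string
    private array $byHash = []; // md5 of that string => id, for de-duplication

    // Return the ID for a parameter set, storing it on first use
    public function idFor(array $params): int
    {
        ksort($params); // same settings in any order => same ID
        $packed = serialize($params);
        $hash = md5($packed);
        if (!isset($this->byHash[$hash])) {
            $id = count($this->byId) + 1;
            $this->byId[$id] = $packed;
            $this->byHash[$hash] = $id;
        }
        return $this->byHash[$hash];
    }

    // Look up the parameter set for an ID; an empty array means "use defaults"
    public function paramsFor(int $id): array
    {
        if (!isset($this->byId[$id])) {
            return [];
        }
        return unserialize($this->byId[$id], ['allowed_classes' => false]);
    }
}

$store = new ParamStore();
$u = $store->idFor(['exhibition' => 7, 'offset' => 31, 'sort' => 'DESC']);
echo "photo.php?photo=461&u=$u\n";
```

Sorting the keys first means two visitors who arrive at the same settings in a different order share one ID, which keeps down both the number of stored rows and the number of distinct URLs.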
None of the above are quick solutions; they will all require time to code and test.
P.S.: Google may change their settings any day, and all the work done to comply with their current "rules" (which no one really knows) could turn out to be wasted time.
ThatAdamGuy
4th of January 2004 (Sun), 07:42
Ah, it is more complicated than I suspected.
However, I'd like to respectfully urge you to consider whatever the path of least resistance is, via the use of cookies.
I know that some people are not happy with them, but I think it's fair to ask these worried people to then give up SOME functionality when viewing picture galleries. Besides, I think those of us who are apt to implement EE are the more responsible sort. I know I have a privacy policy prominently posted (wow, say that 10 times fast!) on my site, and I think that my site visitors are very comfortable with my limited use of cookies.
It's a trade-off. Some people (I think very few) will be offended or annoyed by the use of cookies if they're implemented into EE. But many, many more people will not find wonderful EE-based galleries if the internal pages are obscured from Google and other search engines.
And indeed, other search engines don't tend to like long URLs, either (and as someone who hangs out on Search Engine Optimization forums, I read about this a lot!)
---
Incidentally, I just did a little test to see how many 'internal' EE pages (individual photo pages) are in the Google index. Of the 10 pages I tested, 3 were found. That's actually better than I thought, and it's likely a function of those pages' Google PageRank (the higher the PR, the deeper Google is willing to crawl into a site).
But I still highly vote for Friendly URLs not just for the search engines, but also for the many folks who -- not noticing the shorter URL options at the bottom of the pages -- simply try to cut and paste the URL showing in their browser address window, and then find that it's not easily e-mailable to friends.
Anyway, thank you for your quick response to my note here and in my other threads, too!
Pekka
4th of January 2004 (Sun), 08:00
ThatAdamGuy wrote:
But I still highly vote for Friendly URLs not just for the search engines, but also for the many folks who -- not noticing the shorter URL options at the bottom of the pages -- simply try to cut and paste the URL showing in their browser address window, and then find that it's not easily e-mailable to friends.
The advantage of that long URL is that it holds the whole state of the current "moment" of browsing, from image size and listing type to search/sorting etc.
ThatAdamGuy
5th of January 2004 (Mon), 04:27
Yes, I now understand the purpose of the long URLs (though they seem particularly long and cryptic, even for the info contained!), but I'm suggesting that the webmasters AND users would be far better served by having this info in cookies.
People are used to (and largely accepting of) cookies nowadays. Heck, even this board (and practically every other board out there) uses cookies so we don't have to sign in again every time :). Cookies would also be handy with EE so that people wouldn't have to retype their name and e-mail address when they make comments.
Lastly, I discovered yet one more reason why the long EE URLs are particularly unfortunate: they render Google's AdSense program practically useless. AdSense, as you may already know, is a contextual text advertising system... it looks at a page to determine what the page is about, and then serves ads that are relevant to the person viewing that page. For instance, on my Lindy Hop page, there are ads for swing dancing, and often even local swing dance events, based upon my IP address.
Eventually, I plan to rid my site of ALL graphical ads and simply have one unobtrusive Google AdSense ad block on each page in its place to help me pay for my server costs.
But with EE's currently long URLs, AdSense gets confused, thinking that every time it sees a page, it's new, so it queues it up in its line for determining a context. But the next visitor is likely to hit the same page via a different URL, so AdSense is stymied, and serves a relatively generic ad. That's bad for my visitors, and that's bad for my revenues.
Perhaps, as a compromise, you could have an option to enable or disable simple-URLs (with or without cookies, with or without "browsing moment" info).
Thanks for putting up with my long-windedness! :)
Pekka
5th of January 2004 (Mon), 06:04
I'll see how cookies can be used to get rid of complex URLs. One technical problem is that one EE page has several different sets of link parameters (variables), so it will need some work to make a cookie out of all that.
BenV
9th of January 2004 (Fri), 14:19
I'll see how cookies can be used to get rid of complex URLs. One technical problem is that one EE page has several different sets of link parameters (variables), so it will need some work to make a cookie out of all that.
Pekka:
Remember that you can still package everything up like you do now and stick it in a cookie instead of the URL.
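A minimal sketch of that suggestion, assuming the browsing state is a flat array of settings. The cookie name ee_state and the setting names are hypothetical; JSON is used instead of serialize() so that cookie data coming back from the browser never reaches unserialize().

```php
<?php
// Hypothetical sketch: keep the browsing "package" in one cookie instead of
// the URL, falling back to defaults when the cookie is missing or broken.
const STATE_COOKIE = 'ee_state'; // hypothetical cookie name

function state_to_cookie_value(array $state): string
{
    return json_encode($state);
}

function state_from_cookie(array $cookies, array $defaults): array
{
    if (!isset($cookies[STATE_COOKIE])) {
        return $defaults;
    }
    $state = json_decode($cookies[STATE_COOKIE], true);
    // ignore a malformed cookie; otherwise let stored values override defaults
    return is_array($state) ? array_merge($defaults, $state) : $defaults;
}

// In a page, before any output is sent:
//   setcookie(STATE_COOKIE, state_to_cookie_value($state), 0, '/');
// and on the next request:
//   $state = state_from_cookie($_COOKIE, $defaults);
$defaults = ['size' => 'default', 'list' => 'thumbs', 'offset' => 0];
$cookies  = [STATE_COOKIE => state_to_cookie_value(['offset' => 31])];
print_r(state_from_cookie($cookies, $defaults));
```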
There are two problems here as I see it. One is search engine indexing: Google is great at indexing dynamic content but the AdSense issue is pretty interesting -- that's a pretty big issue to me.
I think a more important usability issue is URL persistence. IOW, the URLs should be technology agnostic, both now and in the future. You should be able to change the underlying technology and keep the URLs. So instead of:
http://www.photography-on-the.net/gallery/photo.php?photo=461
We have:
http://www.photography-on-the.net/gallery/photo/5/461/
i.e. show photo 461 in gallery 5
or
http://www.photography-on-the.net/gallery/exhibit/5/31/
i.e. show exhibition 5 starting at number 31
You could even have an optional user-provided identifier in each record which would make the URLs look like:
http://www.photography-on-the.net/gallery/photo/Family/MyWifeSmiling/
http://www.photography-on-the.net/gallery/exhibit/Family/31/
You get the idea. The browser doesn't ever care if the page is generated by PHP or Perl or bash or Python or . . .
The only thing PHP needs to know on the URL is enough information to generate the page in its default state. Everything else can come from cookies or a MySQL session record or whatever.
Here's some really simple code to handle the basics of processing the URL:
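$a_URI = URI_to_array();

if ($a_URI[0] == 'photo')
{
    // handle the showing of a photo: photo/<gallery>/<photo id>/
    // (validate these before using them in a database query)
    $gallery = isset($a_URI[1]) ? $a_URI[1] : null;
    $photoid = isset($a_URI[2]) ? $a_URI[2] : null;
}
elseif ($a_URI[0] == 'exhibit')
{
    // handle the showing of a gallery: exhibit/<gallery>/<offset>/
    $gallery = isset($a_URI[1]) ? $a_URI[1] : null;
    $offset  = isset($a_URI[2]) ? (int) $a_URI[2] : 0;
}
// etc.

// ----------------------------------------------------------------------------
// URI_to_array():
// Takes the request path (without the ?query string), removes the directory
// the script lives in (e.g. /gallery) and splits the rest on '/'.
// Returns an array containing each "directory", e.g. photo/5/461/ gives
// array('photo', '5', '461').
function URI_to_array()
{
    // keep only the path part of the request
    $path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // strip the script's directory, so /gallery/photo/5/461/ becomes /photo/5/461/
    $base = rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\');
    if ($base !== '' && strpos($path, $base . '/') === 0)
    {
        $path = substr($path, strlen($base));
    }

    // trim the leading and trailing slashes so there are no empty elements
    return explode('/', trim($path, '/'));
}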
Of course, that's a really simplistic example, but you get the idea. At this point, I think cookies are basically ubiquitous.
The only other requirement would be a mod_rewrite entry in the .htaccess file to point everything to your main script:
# do our rewrite magic
RewriteEngine On
# these dirs are left alone and served as-is
RewriteRule ^images(.*)$ - [L]
RewriteRule ^whatever(.*)$ - [L]
# everything else that is not an existing file gets handled by index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*$ index.php [L]
---
BenV
YMMV
ThatAdamGuy
9th of January 2004 (Fri), 15:23
http://www.photography-on-the.net/gallery/photo/5/461/
Yes, yes, yes!
This is great for a million reasons (for Webmasters, for search engines, for users, etc.)
And as a side benefit, it even would make AdSense work! :)
oishf7
9th of January 2004 (Fri), 16:54
The only other requirement would be a mod_rewrite entry in the .htaccess file to point everything to your main script:
I was going to suggest using mod_rewrite as well. Unfortunately, mod_rewrite is an Apache module, so it has to be enabled when Apache is compiled or configured, and that may force some people to re-compile. That is not THAT big a trade-off; the benefits far outweigh the inconvenience.
My only other comment has to do with URL formatting.
You could even have an optional user-provided identifier in each record which would make the URLs look like:
http://www.photography-on-the.net/gallery/photo/Family/MyWifeSmiling/
http://www.photography-on-the.net/gallery/exhibit/Family/31/
Between the two options, allowing for Keywords to be included in the URL makes a BIG difference from an SEO perspective as you know. My recommendation would be to approach this in a manner that does allow for meaningful words to be included in the URL.
Thank you Pekka for all of your hard work. It is greatly appreciated.
Luke
Bcaps
9th of January 2004 (Fri), 23:00
vBulletin gets around this issue by using an "archive". The archive is essentially a mirror of the main forum, but without all the "?" and whatnot in the URLs (I don't really understand the specifics). The archive then points to the actual posts. Google indexes the archive, and when someone goes to a particular archived post they are forwarded to the actual post. Or something :D
ThatAdamGuy
10th of January 2004 (Sat), 06:34
I know, I know, I just get so excited about this topic, but I just thought I'd throw out one more 'bonus' from improved SEO-friendliness in URLs:
More EE sites will get seen... and this visibility will likely translate into more EE users. More EE users = more diverse and interesting interplay here on the boards, and hopefully more peer-to-peer support (which means Pekka can concentrate more on coding than on answering questions from newbies like me :D)
Pekka
10th of January 2004 (Sat), 15:36
Just some short comments.
- Google seems to accept the first 2 variables in PHP URLs. This means, for example, that on my site EVERY GALLERY PAGE is indexed, as you can see from
Google Image Search results (http://images.google.com/images?hl=en&lr=&ie=UTF-8&oe=utf-8&q=+site%3Aphotography-on-the.net+pekka&sa=N&tab=wi)
Click an image and it opens the gallery page. In addition to the two-parameter rule (which exists because more and more pages are dynamic and they cannot ignore them all), I'm sure they have a URL length restriction, which is why that long parameter (even if there is only one) is rejected.
- Although "folderized" parameters sound good, mod_rewrite cannot be a requirement for a basic feature like addresses. Most users use rented virtual servers where adding .htaccess files may not be an option. Also, modifying EE to work through a single index.php is just not sensible right now. Maintaining several parallel address systems is really not at the top of my list. So, other solutions must be found.
- Making Google AdSense work cannot by itself be the reason to do a lot of programming; why should it be? Google can change their indexing behaviour any day. Effort should go into making general search engine indexing position pages higher, not into supporting a very specific commercial service. Finding a solution to the general problem will of course help AdSense too, and that is of course OK; I'm not anti-ad :)
- The most urgent and most effective changes regarding search engine positioning will be adding keywords to
TITLE TAG
H1 HEADER TAG
URL
above is the order of importance.
TITLE TAG:
This is what you see at the top of the browser. It is easy to add any data here, as long as it is not too long (most search engines limit the length of this data). I'll have keywords in the title in the next version, and will strip all repetitive, unnecessary data like "Exhibition browser:".
H1 HEADER TAG:
Current EE displays the photo header as a paragraph. This will be changed to an H1 tag so search engines get the main content title right away.
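The TITLE and H1 changes might look something like the sketch below. The function names are hypothetical, and the 70-character cap is an assumed example value, since the post only says search engines have length limits without giving a number.

```php
<?php
// Hypothetical sketch: keywords in the TITLE tag, photo header as an H1.
const TITLE_MAX = 70; // assumed example limit, not a documented engine rule

function title_tag(string $pageTitle, array $keywords): string
{
    // strip repetitive prefixes that add nothing for search engines
    $pageTitle = trim(str_replace('Exhibition browser:', '', $pageTitle));
    $text = $keywords ? $pageTitle . ' - ' . implode(', ', $keywords) : $pageTitle;
    if (strlen($text) > TITLE_MAX) {
        // byte-based cut; multibyte titles would need mb_substr() instead
        $text = rtrim(substr($text, 0, TITLE_MAX));
    }
    return '<title>' . htmlspecialchars($text) . '</title>';
}

function photo_heading(string $photoTitle): string
{
    // H1 instead of <p>, so the main content title is marked up as such
    return '<h1>' . htmlspecialchars($photoTitle) . '</h1>';
}

echo title_tag('Exhibition browser: Harbour at dusk', ['boat', 'harbour', 'Cayman']), "\n";
echo photo_heading('Harbour at dusk'), "\n";
```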
URL:
Without resorting to any esoteric URL solutions, a keyword string could be added as a dummy anchor (http://www.example.com/photo.php?photo=12#boat_harbour_ship_cayman).
But in addition to that, the parameters must be reduced.
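A keyword anchor like the one in that example could be generated from a photo's keywords as sketched here (keyword_anchor is a hypothetical helper). The part after # is never sent to the server, so it does not change which page EE renders. The separator is a parameter because dashes versus underscores comes up later in this thread.

```php
<?php
// Hypothetical helper: turn photo keywords into a fragment ("anchor") string
// such as #boat_harbour_ship_cayman.
function keyword_anchor(array $keywords, string $sep = '_'): string
{
    $words = [];
    foreach ($keywords as $k) {
        // lowercase, replace each run of non-alphanumerics with the separator,
        // then trim separators off the ends of the keyword
        $w = trim(preg_replace('/[^a-z0-9]+/', $sep, strtolower($k)), $sep);
        if ($w !== '') {
            $words[] = $w;
        }
    }
    return $words ? '#' . implode($sep, $words) : '';
}

echo 'http://www.example.com/photo.php?photo=12'
    . keyword_anchor(['Boat', 'Harbour', 'Ship', 'Cayman']), "\n";
```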
I have now built a model for a very fast and simple solution that does not require any cookies, does not require mod_rewrite or any server/PHP settings, gives simple shareable URLs with a full snapshot of the current browsing situation, and simplifies URLs to http://www.example.com/gallery/photo.php?p=852&u=2593 (or just http://www.example.com/gallery/photo.php?u=2593), which will be accepted by Google's engine, too.
An additional easy measure to improve search result placement will be changing all instances of the word "exhibition" to "gallery", because that is what people mostly search for. The word "exhibit" will remain only in the EE application name. This can be done easily at the translation level.
These search engine issues will be solved.
ThatAdamGuy
10th of January 2004 (Sat), 16:15
Pekka,
So much of this sounds really promising, and with the title tag and H1, I'm guessing not that difficult to add in given the potential punch! :)
I hope I'm not being a bother, but I am still curious to better understand the future URL format. Most importantly, I'd like to know:
1) How many distinct URLs can a single photo have under the new system and what is this dependent upon?
2) Assuming the answer is more than one, will there also be multiple "link to this photo" URLs listed as well, or just one?
3) Is the parameter data merely referencing a specific photo and a specific gallery number? Or is it something else?
Thanks so much for the info and for the upcoming changes!
Pekka
10th of January 2004 (Sat), 18:38
1) How many distinct URLs can a single photo have under the new system and what is this dependent upon?
All combinations you can get with variables in use. Depends on how people browse :wink:
For example these URLs may represent the same photo page with different "settings":
http://www.foo.com/gallery/photo.php?photo=123&u=236473
http://www.foo.com/gallery/photo.php?photo=123&u=6234
http://www.foo.com/gallery/photo.php?photo=123&u=34234
http://www.foo.com/gallery/photo.php?photo=123&u=12323
I can give you more detailed technical info when it is released.
2) Assuming the answer is more than one, will there also be multiple "link to this photo" URLs listed as well, or just one?
The easy link of course works. But any of those new-type links can be used as well in forums etc. The new link holds a snapshot of the moment the link was made: all settings from the list page (keyword, sorting, output type, current image, language etc.) are reproduced when the link is opened. The only thing that will not be included is the pass. For passes other than "public" I will most likely add the database-based session support which I now have in place in the EE editor.
3) Is the parameter data merely referencing a specific photo and a specific gallery number? Or is it something else?
It holds all of that.
ThatAdamGuy
10th of January 2004 (Sat), 19:19
Ah, this unfortunately does create some problems with Google. And yes, I know you can't (and shouldn't) do all your prioritizing based on Google :D
Much of what determines page ranking in Google is, as you know, "PageRank" (actually named after co-founder Larry Page, amusingly enough). PageRank is based upon each individual Web page's "popularity" on the Web.
So if lots of sites (and a few REALLY popular sites, like the New York Times) link to example.com/gallery.php?p=37&u=4958 -- a photo of a model airplane -- then searches for "model airplane photo" are likely to turn up this image at the very top, over the zillion other photos of model airplanes (even given the same keywords on their pages!)
However, if this photo, with many different 'browsing situations', has 40 URL variants and sites around the net link to many of those, the PR of each will be significantly diluted, causing the photo to be ranked much lower on Google.
So, a few thoughts:
This *ONLY* affects Google (and the search sites that use Google results). Google will be less important after Yahoo stops using their listings, and I also understand that it's neither necessarily reasonable nor fair for you to worry all that much about a single (albeit important) search engine.
The only solution I see is removing current browsing situation info from the URL, which you've implied would be difficult or cumbersome to achieve. This, then, would either necessitate cookies, or the abandonment of (IMHO, useful) current browsing situation data.
And indeed, there are problems with putting current browsing situation data into a cookie, even for users! Though I personally don't care about those folks who refuse cookies (they're making a conscious choice, and I see no need to cater to them), I can imagine that the 'cookied' URLs would present problems when sharing URLs with others. For instance, if I wanted to share a photo of me and my friend Christiana and have friends see other Christiana photos within the same gallery, this would be a problem if that photo also appeared in an "Adam photos" gallery. I can assure you that my friends would much rather see more Christiana photos :D... and without the additional info in the URL, this wouldn't be automatically triggered.
---
I feel bad about seemingly pushing this one particular concern, but as Google is a pretty major search engine and PageRank is a significant factor in search result ordering, I wanted to make sure you were aware of the downsides in creating many different URLs for the same image.
I do want to reiterate, though, that I appreciate the huge shortening of the URL and -- with the exception of Google results ranking and AdSense, I think -- I cannot see any problems with the new structure.
Pekka
10th of January 2004 (Sat), 19:53
I'm PRETTY sure engines like Google can see that when they get
?var1=12345&var2=fhjdhfjd
?var1=12345&var2=xx
?var1=12345&var2=334
?var1=12345&var2=foo
that they all point to the same document. They spend millions of dollars developing the spiders, and they must have noticed that half the web runs off databases now.
Compared to the "new" system, the current EE's URL is "random" in exactly the same way: both represent the same thing, only one has 300 characters and the other 5.
I could check if the cookie saves OK (this is possible only on the second page the user browses), and then we would have either
?photo=123 (with cookies)
or
?photo=123&u=37483 (without cookies)
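The cookie check described above could be sketched like this, under assumptions: the test cookie name ee_ok and the function names are hypothetical, and the state ID is assumed to come from the stored-parameter system.

```php
<?php
// Hypothetical sketch: a test cookie is set on the first page; if it comes
// back on a later request, the browsing state can live in a cookie and links
// carry only the photo ID, otherwise links also carry the state ID u.
const TEST_COOKIE = 'ee_ok'; // hypothetical cookie name

function cookies_work(array $cookies): bool
{
    return isset($cookies[TEST_COOKIE]);
}

function photo_href(int $photo, int $stateId, array $cookies): string
{
    $query = ['photo' => $photo];
    if (!cookies_work($cookies)) {
        $query['u'] = $stateId; // no cookie came back: state travels in the URL
    }
    return '?' . http_build_query($query);
}

// First page, before any output: setcookie(TEST_COOKIE, '1', 0, '/');
// Later pages pass $_COOKIE; explicit arrays are used here for illustration.
echo photo_href(123, 37483, []), "\n";
echo photo_href(123, 37483, [TEST_COOKIE => '1']), "\n";
```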
ThatAdamGuy
10th of January 2004 (Sat), 20:21
Pekka,
I've posted queries in a couple of relevant WebmasterWorld forums to see if having a differing second variable in a database driven page will cause Google to either:
- penalize a site for duplicate content
- diffuse PageRank
I'm confident that I'll get some decent answers, perhaps even from the famous GoogleGuy (Google employee) himself, and I'll post back here with what I find.
Thanks so much for your interest and your patience with my seemingly never-ceasing issues on this topic. I really appreciate it.
ThatAdamGuy
10th of January 2004 (Sat), 21:02
Okay, and while I'm waiting to hear back from some of the experts over there, I'll also drop a related issue here: URL persistence.
If one moves a photo from one gallery to another, does the URL remain constant (at least the first variable under your new scheme)?
And how about in these circumstances:
- a photo is re-watermarked
- a photo is resized
- a photo is given new keyword or caption info
- a photo is added to or removed from other galleries
With blogs, of course, I'm guessing it's easier to have a persistent URL. And I know that there are more complexities in photo databases. But I still want to be assured that when I finally get around to posting my photos, that the URLs will stay relatively consistent after this next change :)
BenV
10th of January 2004 (Sat), 22:58
Although "folderized" parameters sound good, mod_rewrite cannot be a requirement for a basic feature like addresses. Most users use rented virtual servers where adding .htaccess files may not be an option.
Pekka:
Most people on rented servers who have access to MySQL and PHP with the ftp modules compiled in will more than likely have the ability to use .htaccess files and have mod_rewrite enabled.
I think the EE requirements already exceed the threshold of likelihood that people will be able to use mod_rewrite.
Also, modifying EE to work through a single index.php is just not sensible right now.
You don't have to:
RewriteRule ^photo/.* photo.php
RewriteRule ^exhibition/.* exhibition.php
RewriteRule ^whatever/.* whatever.php
# etc...
YMMV
----
BenV
Pekka
11th of January 2004 (Sun), 04:27
Okay, and while I'm waiting to hear back from some of the experts over there, I'll also drop a related issue here: URL persistence.
If one moves a photo from one gallery to another, does the URL remain constant (at least the first variable under your new scheme)?
And how about in these circumstances:
- a photo is re-watermarked
- a photo is resized
- a photo is given new keyword or caption info
- a photo is added to or removed from other galleries
With blogs, of course, I'm guessing it's easier to have a persistent URL. And I know that there are more complexities in photo databases. But I still want to be assured that when I finally get around to posting my photos, that the URLs will stay relatively consistent after this next change :)
With the first parameter photo=xxx, whatever you do with that photo still gets you the same page. If you move the photo around galleries or use it in several galleries in EE, the photo id will always remain the same.
The "browser state" parameter u=xxx can even be transferred from an old site to a new site, because the data is in the database :)
Also, if you take this further, one could set up a system where you define a text for each photo id, so URLs may be
http://www.foo.com/gallery/photo=paris_1
http://www.foo.com/gallery/photo=paris_2
http://www.foo.com/gallery/photo=rome_colosseum_night
http://www.foo.com/gallery/photo=salonen_in_concert_22
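Resolving such text aliases back to the permanent photo ID could be as simple as the following sketch. The alias table is hypothetical (an array standing in for a database column), and plain numeric IDs keep working so existing links do not break.

```php
<?php
// Hypothetical sketch: map a user-defined text alias (or a plain numeric ID)
// back to the permanent photo ID; null means "no such photo".
function resolve_photo(string $value, array $aliases): ?int
{
    if (preg_match('/\A[0-9]+\z/', $value) === 1) {
        return (int) $value;          // old-style numeric ID still works
    }
    return $aliases[$value] ?? null;  // unknown alias => not found
}

$aliases = ['paris_1' => 101, 'rome_colosseum_night' => 205];
var_dump(resolve_photo('rome_colosseum_night', $aliases));
```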
Pekka
11th of January 2004 (Sun), 04:30
Most people on rented servers who have access to MySQL and PHP with the ftp modules compiled in will more than likely have the ability to use .htaccess files and have mod_rewrite enabled.
I think the EE requirements already exceed the threshold of likelihood that people will be able to use mod_rewrite.
With the release version of EE 1.5, the ftp module is not required. So EE will fully work on basic PHP with safe mode on, because the GD2 library is included in PHP (since 4.2 or so).
Also, modifying EE to work through a single index.php is just not sensible right now.
You don't have to:
RewriteRule ^photo/.* photo.php
RewriteRule ^exhibition/.* exhibition.php
RewriteRule ^whatever/.* whatever.php
# etc...
OK. But I still think I'll try other methods first.
Jouko
11th of January 2004 (Sun), 04:41
I think this topic is really a non-issue. Google is (more than) capable of handling even the current URL formats. AdSense (and Google) have way shorter release cycles than EE, trying to follow them would be difficult.
Furthermore, I would oppose using cookies (or mod_rewrite) for this:
1. Every new requirement (from user or webhotel) is a bad thing.
2. I don't think cookies should be used to simplify URLs in this kind of case. They should be used to remember the settings/behaviour of a visitor, like the image size he/she prefers etc.
3. I have some money on my bank account that WANTS to get on Pekka's account. The faster the next release is out, the faster the money gets where it belongs to. ;-)
Jouko
http://galleria.vierumaki.com/
Pekka
14th of January 2004 (Wed), 05:11
Just an update to this:
I have now coded the new URL system into EE and done a lot of empirical testing with it. The final result: after I drastically reduced the number of possible combinations, merely by merging the variable "offset" into the variable u (the in-between character can be freely chosen), we get URLs like
http://www.example.com/gallery/list.php?photo=976&u=114|61
or
http://www.example.com/gallery/list.php?photo=976&u=114-61
or
http://www.example.com/gallery/list.php?photo=976&u=114,61
or
http://www.example.com/gallery/list.php?photo=976&u=114_gallery_61
or
http://www.example.com/gallery/list.php?photo=976&u=114_pekka_saarinen_61
whatever works/looks best.
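Since the in-between text can be freely chosen, one way to read the merged u value back is to take the leading digits as the stored state ID and the trailing digits as the offset, ignoring whatever sits in between. A sketch under that assumption (parse_u and make_u are hypothetical names); a bare u=2593 is treated as offset 0:

```php
<?php
// Hypothetical sketch: split u=114|61, u=114-61, u=114_gallery_61 etc. into
// the stored state ID (leading digits) and the list offset (trailing digits).
function parse_u(string $u): ?array
{
    if (!preg_match('/\A(\d+)(?:\D.*?(\d+))?\z/', $u, $m)) {
        return null; // not a valid u value
    }
    return ['state' => (int) $m[1], 'offset' => isset($m[2]) ? (int) $m[2] : 0];
}

// Build the merged value; the separator (or decorative text) is free-form
function make_u(int $state, int $offset, string $sep = '-'): string
{
    return $state . $sep . $offset;
}

print_r(parse_u('114_pekka_saarinen_61'));
echo make_u(114, 61, '|'), "\n";
```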
Technical talk:
Gzip compression for stored parameters is also there if needed, so there should be no noticeable database disk space penalty. Although the parameter packs are read from MySQL, the speed of the system remains the same, because the (few milliseconds of) extra time taken to execute this parameter system is counteracted by a much smaller HTML code size. As a welcome side effect, the JavaScript link system is not needed any more.
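Compressing the stored parameter packs might look like this minimal sketch. It uses PHP's zlib functions gzcompress() and gzuncompress(), which apply the same deflate algorithm as gzip but in the zlib container format; the function names and the sample settings are hypothetical. For very short packs the compressed form may not be smaller, so this pays off mainly for long ones.

```php
<?php
// Hypothetical sketch: compress a serialized parameter pack before storing it
// (e.g. in a BLOB column) and restore it when a u=... link is opened.
function pack_params(array $params): string
{
    return gzcompress(serialize($params), 9);
}

function unpack_params(string $blob): array
{
    $raw = gzuncompress($blob);
    if ($raw === false) {
        return []; // corrupt or non-compressed data: fall back to defaults
    }
    $params = unserialize($raw, ['allowed_classes' => false]);
    return is_array($params) ? $params : [];
}

$params = ['sort' => 'ee_photo.ee_photo_id', 'order' => 'DESC', 'list' => 'thumbs', 'per_page' => 30];
$blob = pack_params($params);
printf("%d bytes serialized, %d bytes compressed\n", strlen(serialize($params)), strlen($blob));
```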
Ok, back to PHP...
ThatAdamGuy
14th of January 2004 (Wed), 19:26
That looks great!
The advice I've been reading on WebmasterWorld has been mixed, but a few things have been constant:
- the fewer variables, the better
- never ever use the string "id=" in a URL, because Google assumes it's a session ID
- spaces and underscores are bad; use dashes instead (Google treats this_phrase as one word, "thisphrase", but this-phrase as a phrase of two words)
Anyway, thanks, Pekka, for coding this much nicer URL!