Modern Software Experience

2011-03-22

Adventures in Frame Forwarding

or How not to host a Website

  1. Why Frame Forwarding
  2. Frame Forwarding Problems
  3. Moving the Site

the last bits

This is the last episode in the frame forwarding saga. So far, I've discussed why I used frame forwarding in the first place, the various issues I encountered with frame forwarding, and how I moved the site to a shared hosting company. It may seem the story is done, but there is one final chapter: the old site.

issues

There are multiple issues surrounding the old site. One is that, the moment the domain name transfer completed, there were two identical web sites at two different locations. Search engines do not like duplicated content. I don't want to serve an old page when there is an updated one.

Another issue is that the Google index still contains hard links to the old site, and we'd like Google to replace these with the permalinks. And finally, even when we've solved these two problems, there are still hard links to the old site out there; if we don't do something, people will follow these links and not realise that they are on an old and outdated copy of the site.

redirection

The solution to all these problems is redirection. When you follow a hard link to the old site, you'll find that it has changed. Every page has been replaced by a very bare-bones one that tells you that you are in the wrong place, and then points you to the right one. For example, this is what the about page looks like now:

[Screen shot: Old Site Redirect Page]

The page tells you that you are in the wrong place because you followed a hard link to an old hosting location, and that you will be redirected to the proper page. The page not only tells you what the proper page is, but also provides a clickable link. So, if the automatic redirection does not work for some reason, you can click the link.
The page also asks you to inform the author of the hard link you followed, and remarks that although this redirection page corrects the mistake, it will not be around forever.

The page will probably be around for a few years. One day I will, for whatever reason, decide to switch to another ISP, and the old site will be deleted. The few hard links to the old location that are still around by then probably aren't very important.

visitors

It is not a particularly friendly page. It is very business-like. It provides a very brief explanation of what is going on, why it is happening and how it is done. It gets the job done, and that is good enough. There aren't many hard links to the old location, and even users who follow such a link aren't likely to actually see the page.

The page specifies that the browser should redirect to the proper page in zero seconds. The explanatory text is there for the few users who do notice the redirect and wonder what is going on, and for users who have configured their browser to not follow redirection commands.
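Concretely, the redirect on each replacement page is a single meta element in the page header; the full template appears further below, but with a hypothetical file name such as about.htm filled in, the instruction looks like this:

<meta http-equiv="refresh" content="0; url=http://www.tamurajones.net/about.htm" />

The 0 before the semicolon is the time-out in seconds; the browser navigates to the given URL as soon as it has read the instruction.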

Most visitors will be redirected to the proper page without noticing the redirection page. This solves the immediate issue that although the site has moved, the Google index still shows hard links to the old location; whenever any visitor follows such a link, they'll end up on the proper page anyway, and if they decide to bookmark that page, they'll bookmark the permalink.

webmasters

Webmasters that have a hard link to the old location will be alerted to the change when they check their links. They will not be informed of a broken link; after all, every page they linked to is still there. However, a good link checker will notice that there is a redirection command now and include that in its link report. When the webmaster decides to check it out, they'll have no problem understanding what happened or discovering the permalink they should be using.

search engines

Because there are both permalinks to the site out there and hard links to the old location in the search index, web crawlers will find both the old and the new site. If the old site had not been updated, their back-end software would discover that both have the same content, and possibly penalise the site for that.
Now that the back-end software finds that every page, however large it was, has been replaced with a small redirection page, it understands that the entire site has moved. Whenever it visits any of the old pages still in its index, it finds a small page with a redirection command and a link to that same page repeated in the body of the page.

Ideally, the redirection would be handled by the web server. There are two HTTP status codes to tell a browser - and a web crawler - that a page has moved. Code 301 tells the visitor that the page has moved permanently, code 302 tells the visitor that the page has moved temporarily.
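With control over the web server, a request for any old page would ideally be answered with a response along these lines; this is a sketch, with about.htm again a hypothetical file name:

HTTP/1.1 301 Moved Permanently
Location: http://www.tamurajones.net/about.htm

The browser - or web crawler - then fetches the page from the address in the Location header, and a crawler knows it can replace the old link with the new one.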

The site has moved permanently, so we'd like to give the web crawler a 301 code and a link to the new location. Without some control over the web server, that is not possible, so we must make do with pages carrying a meta-refresh instruction.
Many web crawlers interpret a redirection time-out of 0 or 1 seconds as a permanent redirect (code 301) and anything longer as a temporary redirect (code 302). That is the real reason that the time-out on these pages is zero seconds; the zero-second time-out tells search engines to replace one link with another. Sure enough, without any other intervention by me, at least one link was updated within a day. Random checks suggest that most search engine links have been updated now.

making the redirect pages

In the early days of the site, I occasionally changed every page manually. That was practical back then, but there are more than four hundred pages now. Obviously, what I needed to do was create a template, and then have some program create all the replacement pages from that template.

This is the template I created:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en-gb" />
<meta name="author" content="Tamura Jones" />
<meta name="robots" content="noindex, follow, nosnippet, noarchive, noimageindex" />
<meta http-equiv="refresh" content="0; url=http://www.tamurajones.net/%FILENAME%" />

<title>Redirect Page</title>
<meta name="keyword" content="forwarding." />
<meta name="description" content="Outdated hard link." />
<meta name="summary" content="Forwarding." />
</head>

<body>

<h1>Redirection</h1>

<h2>What</h2>
<p>You will be redirected to <a href="http://www.tamurajones.net/%FILENAME%">http://www.tamurajones.net/%FILENAME%</a></p>

<h2>Why</h2>
<p>The creator of the link you just followed did not use a permalink, but a hard link to an old hosting location.</p>
<p>You'd do the owner of the link a favour by alerting them to their mistake.</p>
<p>This page corrects the mistake by redirecting to the permalink, but this page will not be around forever.</p>

<h2>How</h2>
<p>This page has a so-called <em>meta refresh</em> in the page header.</p>
<p>The meta refresh instructs the browser to navigate to the proper page.</p>

<p>Copyright © 2011 Tamura Jones. All rights reserved.</p>

</body>

</html>

It is a fairly ordinary XHTML page; it just has %FILENAME% where the filename should go.
Notice the robots instruction; it tells web crawlers that they should not index the page, not display snippets from the page, not archive the page and not display images from the page, but that they should follow the links on the page.
This way, even a bot that does not understand the meta-refresh redirection will still come and visit the new site.

directory list

The program to create the redirection pages gets two inputs: the template and a list of filenames. It is possible to let a program read a directory and then use that list, but creating the list manually allows filtering it without having to write any code for that.
I created a simple list of files using the command DIR /B /A-D /OEN > filelist.txt; that generates a file named filelist.txt that contains the names of all the files in the directory, excluding subdirectories, sorted by extension first and name second. I then edited that list to remove a few unneeded file names, such as desktop.ini.
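As an aside, the same list can be produced with the PowerShell that is introduced below; a rough equivalent of that DIR command, written for the PowerShell 2.0 of the day, is:

# list files only (no subdirectories), sort by extension first and name second,
# and write the bare file names to filelist.txt
Get-ChildItem | Where-Object { -not $_.PSIsContainer } |
    Sort-Object Extension, Name |
    ForEach-Object { $_.Name } |
    Set-Content filelist.txt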

PowerShell

Now, any UNIX administrator would whip up a shell script to quickly generate the desired web pages from the template and the file list, but I am running Windows, and Windows does not ship with a scripting language, and using C or C++ is overkill.
However, I remember attending a presentation, about five years ago, about the then new Windows PowerShell. That seemed like just the ticket. So I installed PowerShell, and after battling some of the oddities that every first-time PowerShell user runs into, came up with the following script:

# CreateFileFromTemplate.ps1
#
# WARNINGS:
# This script will bluntly overwrite any existing files.
# There is no error handling. None at all.

Write-Host CreateFileFromTemplate.ps1 -ForegroundColor Green
# Write-Host "This script creates the files listed in $filelist, based on the template in $template."
# Write-Host "Occurrences of %FILENAME% in the template will be replaced by the filename."

$filelist = "filelist.txt"
$template = "template.txt"

Write-Host "Processing $filelist" -ForegroundColor Cyan

$filenames = Get-Content $filelist
ForEach ($filename in $filenames)
{
    # Write-Host -NoNewLine "Creating $filename"

    # read the unmodified template, replace the placeholder with the
    # filename, and write the result to a file of that same name
    $content = Get-Content $template
    $modified = $content -Replace "%FILENAME%", $filename
    Set-Content $filename $modified
    Write-Host -NoNewLine "." -ForegroundColor Yellow
}
Write-Host "`nDone." -ForegroundColor Cyan

This script produces all the redirection pages in just a few seconds. It reads the file list, and then loops through it. For each filename, it reads the unmodified template, modifies it by replacing each occurrence of %FILENAME% with the filename, and then writes the result to a file, using that same filename.
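A likely candidate for the first-time oddities mentioned above is the execution policy: out of the box, PowerShell refuses to run script files at all. Assuming that is the obstacle, allowing locally created scripts and then running the script looks like this:

# allow locally created, unsigned scripts; run once, from an administrator prompt
Set-ExecutionPolicy RemoteSigned
# run the script from the directory that contains template.txt and filelist.txt
.\CreateFileFromTemplate.ps1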

upload

Conceptually, the final step is to upload all these redirection files to the old site, overwriting the articles that are there. Of course, what I actually did was upload all these files to another directory. Then, after the domain transfer was done, I renamed directories.
The benefits of renaming over overwriting are two-fold. First, once the upload was done, renaming directories was all that was left to do. Second, I will delete the directory containing the old site eventually, but for now, all the files of the old site are still around, just in case.

The End

Thus end my Adventures in Frame Forwarding.
I've turned my mistake into this series of articles to share with you some of the things I learned along the way. I learned that frame forwarding is cheap, in both the literal and the figurative meaning of the word. I learned that eNom's frame forwarding service is ridiculously unreliable and that their support staff does not even understand HTML. I experienced how much search engines dislike it, and how inconsistently web browsers treat it. I learned that even though Internet Explorer 9 is the first Microsoft browser in a decade that deserves to be called a web browser, Microsoft still managed to make it fail basic stuff that other web browsers handle with ease.

frame forwarding sucks

Above all, I learned that frame forwarding sucks. It makes sense to use your domain name from day one. It makes sense not to splurge when you don't have visitors yet, but frame forwarding isn't such a hot idea.
I knew that frame forwarding is less than perfect, but hey, how bad a solution can it be when so many domain registrars still offer this service to their clients? Well, nothing like personal experience to find out. I should have listened to myself. I should have opted for real web site hosting from the start; the hassle saved is well worth the money paid.
This site is truly frames-free now, and I like it that way.
