Implementation of the Webcache

The webcache now consists of 4 php files and some lines in .htaccess. The files can be downloaded below, but I recommend that you read this first.
The cache only runs on Unix-based hosting (Linux, FreeBSD, Solaris ...) because it relies on a few Unix commands. It can, however, easily be changed to run under Windows by substituting the corresponding command for mv (move). It is also designed to work with .htaccess (Apache web server), which should be running on most hosts anyway.
dfcg-config.inc.php: I preserved the original name of the config file from Ben and added a few more lines.
cache_frontpage.php: This file is responsible for caching the front page only. This is also where I have a random block. Since this script only caches a single page, I call it every 10 minutes via a cron job.
cache_comm.php: This file caches all nodes (actually only nodes that have a path alias) that have comments enabled.
cache_nocomm.php: This file caches all nodes with a path alias that have comments disabled and are static.

It makes sense to split the cache like this. I only need to call cache_nocomm.php if I made a major change (e.g. in the navigation menu). Otherwise these nodes are rather static and don't change much. It takes about 5-6 hours to recreate the cache for all those pages. There is a sleep time defined in the config file ($sleep_static = 10). This causes the process to pause for 10 s between files as they are cached. This is the reason it takes so long, but it greatly reduces the impact on the server. After all, this is what we are after: to take load off the server and accommodate more users with less CPU usage.
The file cache_comm.php only caches those pages that have comments enabled. In order to distinguish between the two, I did a simple table lookup:
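A minimal sketch of the throttled loop described above. Only $sleep_static comes from the config file; the function name cache_all and the $cacheOne callback are hypothetical stand-ins for whatever the real script does to fetch and store one page.

```php
<?php
// Value from the config file: seconds to pause between two cached pages.
$sleep_static = 10;

/**
 * Hypothetical helper: runs $cacheOne for every path and sleeps between
 * pages (but not after the last one). Returns the number of pages processed.
 */
function cache_all(array $paths, callable $cacheOne, int $sleepSeconds): int
{
    $paths = array_values($paths);
    $last = count($paths) - 1;
    $count = 0;
    foreach ($paths as $i => $path) {
        $cacheOne($path);
        $count++;
        if ($i < $last && $sleepSeconds > 0) {
            sleep($sleepSeconds); // give the server a break
        }
    }
    return $count;
}
```

The sleep dominates the run time: at 10 s per page, 2000 pages would spend about 20000 s (roughly 5.5 hours) just pausing.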

$sql = 'SELECT nid FROM `node` WHERE `comment` > 0 LIMIT '.$n_offset.', '.$n_dynamic;

This simply checks whether the comment column in the node table has a value greater than 0. This may be different in your CMS, but it should be rather easy to figure out by playing around with phpMyAdmin.
The query above is pretty much the main difference between cache_comm.php and cache_nocomm.php (there the condition is = 0 instead of > 0).
$n_offset and $n_dynamic are defined in the config file. They limit the number of nodes cached in one run, so that a later run can pick up at an offset. This way the whole process can be split into multiple runs, in case you have problems with your provider. Currently I am just shy of 2000 nodes and I am o.k., but with a larger website you may get into trouble for running processes for such a long time. It depends on your host. Either way, the hooks are there in case you want to modify it.
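To illustrate how the offset and limit split the work into several runs, here is a small sketch. The function names build_node_query and batch_offsets are made up for this example and are not part of the downloadable scripts; only the query text follows the one shown above.

```php
<?php
/**
 * Hypothetical helper: builds the node query for one batch.
 * $commentsEnabled = true mirrors cache_comm.php (> 0),
 * false mirrors cache_nocomm.php (= 0).
 */
function build_node_query(bool $commentsEnabled, int $offset, int $limit): string
{
    $condition = $commentsEnabled ? '> 0' : '= 0';
    return 'SELECT nid FROM `node` WHERE `comment` ' . $condition
        . ' LIMIT ' . $offset . ', ' . $limit;
}

/**
 * Hypothetical helper: lists the offsets needed to cover $total nodes
 * in batches of $batchSize.
 */
function batch_offsets(int $total, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1');
    }
    $offsets = [];
    for ($offset = 0; $offset < $total; $offset += $batchSize) {
        $offsets[] = $offset;
    }
    return $offsets;
}
```

For example, batch_offsets(2000, 500) yields the offsets 0, 500, 1000 and 1500, one per run. Note that MySQL does not guarantee a stable row order without an ORDER BY, so a script that relies on offsets across runs would probably want to add ORDER BY nid.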
Next I need to look up the URL alias:

$sql2 = 'SELECT dst FROM `url_alias` WHERE `src` = CONVERT(_utf8 \'node/'.$nid.'\' USING latin1) COLLATE latin1_general_ci LIMIT 0, 1 ';

This simply takes the node number and looks up the URL alias that goes with it.
Since I make heavy use of the path alias in Drupal (SEO-friendly URLs; similar modules are available for other CMSs), I had to do some more lookups on the database to determine the exact filenames.
Example:
The url:
http://www.aguntherphotography.com/galleries/south_america/peru.html
is actually a Drupal node, but the path alias makes it look nice. I would like to preserve this throughout the cache, so I have to create the file
galleries/south_america/peru.html
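The mapping from alias to file can be sketched like this. The function name cache_paths and the two directory arguments are assumptions for illustration; the real scripts take their directories from the config file.

```php
<?php
/**
 * Hypothetical helper: maps a URL alias to the file that is written in the
 * temp folder and the file that ends up in the static folder.
 */
function cache_paths(string $alias, string $tempDir, string $staticDir): array
{
    $relative = ltrim($alias, '/');
    // Refuse aliases that would climb out of the cache directories.
    if (in_array('..', explode('/', $relative), true)) {
        throw new InvalidArgumentException('Alias must not contain ".."');
    }
    return [
        'temp'   => rtrim($tempDir, '/') . '/' . $relative,
        'static' => rtrim($staticDir, '/') . '/' . $relative,
    ];
}
```

With the alias galleries/south_america/peru.html and a static directory of /home/xxx/public_html/static, the 'static' entry becomes /home/xxx/public_html/static/galleries/south_america/peru.html.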


I am not a PHP programmer, so I helped myself with a little trick. I simply went to phpMyAdmin and did some sorting and selecting on the respective nodes. I then copied the code that phpMyAdmin generated and modified it slightly. So I am pretty proud of my first little PHP hack, especially since it works ;-)

The rest of the code is similar to Ben's original program. I made sure that each of the 3 scripts generates its own lock and log files, so that they can run concurrently. This is important since I regenerate the front page every 10 minutes, while generating the static pages without comments takes about 6 hours.
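How the original scripts implement their lock files is not shown here, so the following is only a sketch of one common approach: each script opens its own lock file and takes a non-blocking exclusive lock with flock(). The function name acquire_lock and the lock file names are made up.

```php
<?php
/**
 * Hypothetical helper: tries to take an exclusive lock on $lockFile.
 * Returns the open file handle on success (keep it open while working),
 * or false if the file cannot be opened or is already locked.
 */
function acquire_lock(string $lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, do not truncate
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // another run of the same script is still busy
    }
    return $handle;
}

// One lock file per script, so the scripts never block each other.
$lock = acquire_lock(sys_get_temp_dir() . '/cache_frontpage.lock');
if ($lock === false) {
    exit("cache_frontpage is already running\n");
}
// ... do the caching work here ...
flock($lock, LOCK_UN);
fclose($lock);
```

Because each script uses a different file, the 10-minute front page job is never held up by the long-running static job, while two overlapping runs of the same script are still prevented.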

Right now there is one little drawback.
The url:
http://www.aguntherphotography.com/galleries/south_america/peru.html

requires me to have the directory:
galleries/south_america
in the temp folder and in the folder where the static files end up.
There are two ways out of this. First, the script could be extended to create the directory automatically if it does not exist.
A better way, however, would be to strip out the "/" and simply generate a file:
galleriessouth_americaperu.html
and change the code in the .htaccess file accordingly.
This is probably the cleanest and easiest solution. PHP has a function to remove the slashes (str_replace), and .htaccess should be able to handle this with a simple rule.
This change is on my list, but so are other things. Right now it is easier to simply create a directory in two places whenever I add a new hierarchy.
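Both ways out can be sketched in a few lines of PHP. Neither function is part of the current scripts; the names ensure_parent_dir and flatten_alias are made up for this example.

```php
<?php
/**
 * Option 1 (hypothetical): create the directory for a cache file if it is
 * missing. The third argument of mkdir() creates nested directories.
 */
function ensure_parent_dir(string $filePath): bool
{
    $dir = dirname($filePath);
    // The second is_dir() covers the case where another process
    // created the directory in the meantime.
    return is_dir($dir) || mkdir($dir, 0755, true) || is_dir($dir);
}

/**
 * Option 2 (hypothetical): drop all slashes so every cached page
 * lives in one flat directory.
 */
function flatten_alias(string $alias): string
{
    return str_replace('/', '', $alias);
}
```

One thing to keep in mind with the flat variant: two different aliases can collapse to the same filename (a/bc.html and ab/c.html both become abc.html), so it only works if the aliases on the site cannot collide this way.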

The file cache_frontpage.php does nothing more than download the front page and save it in the document root as index.html.

The .htaccess code looks like this:
  RewriteCond %{QUERY_STRING} =""
  RewriteRule ^$ index.html [L]
  RewriteCond /home/xxx/public_html/static%{REQUEST_URI} -f
  RewriteCond %{QUERY_STRING} !^.+$
  RewriteRule ^(.*)$ static/$1 [L]

I think the first two lines can be dropped, by simply changing this line in my .htaccess file:
DirectoryIndex index.php
to
DirectoryIndex index.html
Even though this would affect every directory, I don't have any index.html files anywhere but in the static cache.

There is some room for optimization in both the .htaccess code and the PHP code. I will soon implement the change that cuts out all slashes, to make the scripts more user-friendly. The scripts are tailored to my own needs, but you may find them useful as a starting point for your own development. Maybe your needs are very similar and you don't need any mods.

Download the PHP files by clicking me.

Excellent!

Is this on the drupal.org handbook or was it shot down by somebody for whatever reason?

Oh, hey. Nice to see you on my site. Haven't seen you in a while.

I am not exactly sure what you mean by "shot down", but this is something I stole from Ben as outlined here and modified to suit my needs.

I wrote in a thread on drupal.org about it, but I didn't feel it was my right to take credit for it since Ben had the original idea.

Excellent!

Hi, I stumbled across this while doing a Google search and I wonder if it will solve my Drupal speed issues. I have recently moved over to DreamHost from a dedicated server in an attempt to cut costs and I'm finding the speed of Drupal on this system is excruciatingly slow. I'm also more familiar with Xaraya than with Drupal and I assumed Drupal would implement a similar caching system to Xaraya, which uses (your choice) flat file caching similar to yours, or database caching (or memcache and others). I find the compiled HTML file caching to be the speediest. I'd love to see something like this in Drupal. In Xaraya it is implemented so that if there is a change to the site the cache is automatically updated. I think blocks aren't cached by default but you can fine-tune the caching so that certain elements are cached. I'd love to see some figures of speeds with different kinds of caching implemented. I will give your caching system a go once my new Drupal site is complete. One of the other things that strikes me as slow in Drupal is the messy nature of the modules: each one seems to want to load its own CSS, JavaScript and other files on every page. I know the browser should cache these but it still seems like a lot of extra data is being requested when it may not be needed. I'd like to see a better override system for the modules, where a theme developer can easily override CSS and JavaScript files, for instance so he could compile one JavaScript file optimised for quicker loading rather than all the module JS files, and likewise for the CSS. While Drupal is an amazing product, I have to say Xaraya seems better coded, or directed, implemented, whatever. I just wish there were more developers on that project. Regards, Tanc.

If you are implementing this cache, keep in mind that currently only nodes with a path alias are cached. For me this worked out nicely, but it may not for you.
The version I actually had running includes a check that stops a file from being copied from the temp space to the static dir when it is too small. This is important! If the MySQL request fails, you will get a file that is less than 1 kB.
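A check like this could look roughly as follows. This is a sketch, not the code from my running version: the function name publish_if_valid is made up, the 1024-byte threshold is taken from the 1 kB figure above, and rename() stands in for the mv command the scripts use.

```php
<?php
/**
 * Hypothetical helper: moves a freshly generated file from the temp space
 * into the static dir, but only if it is large enough to be a real page.
 * Returns true if the file was published, false if it was rejected.
 */
function publish_if_valid(string $tempFile, string $staticFile, int $minBytes = 1024): bool
{
    if (!is_file($tempFile)) {
        return false;
    }
    clearstatcache(true, $tempFile); // make sure filesize() is not stale
    if (filesize($tempFile) < $minBytes) {
        return false; // most likely an error page from a failed request
    }
    return rename($tempFile, $staticFile);
}
```

A rejected file simply stays in the temp space, so the previous good copy in the static dir keeps being served.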

Currently the cache is disabled on my site since I have been adding a lot of stuff (changing the menu structure of each page). I will re-enable it later. The cache also makes sense if only a relatively small part of your nodes allows comments. Then you could simply not cache those.
