Caching Web Sites With PEAR Cache

Improve site performance with a PHP-driven cache.

Traffic Jam

Think of today's Web as a freeway, ferrying cars of all shapes and sizes to their destinations. Zoom in a few factors, and focus on the red Ferrari travelling north at high speed. Notice how it's able to dodge and swerve past its larger brethren because of its small size and efficient design. Then take a look at the large black truck a few kilometres behind it, slowly lumbering forward with its load of cut-price refrigerators. Notice how slowly it's travelling, and how its monstrous size hinders its ability to take advantage of changes in the traffic flow.

More often than not, today's Web sites more closely resemble the big black truck crawling along the freeway than the little red Ferrari zipping through traffic. This is a natural consequence of the bells and whistles - sophisticated interfaces, streaming media, dynamically-generated content - increasingly sported by Web sites in their attempts to attract and retain visitors. As sites become more content-rich, as their reliance on dynamic data sources increases, as more and more requests come in per minute, it's only natural that the first casualty be the performance of the system.

Fortunately, it's not all doom and gloom. By "caching" certain sections of your Web site, you can reduce the load on your server, significantly improve application response times and make your users happier. Over the course of this article, I'm going to use PEAR's Cache class to show you how.

A Cache For Every Need

First up, what's a cache? According to one definition, a cache "is a store of information that is designed to improve the accessibility or availability of data to the user." Simply put, it is a location where copies of information are kept so they can be sent to the user quickly and without using scarce Internet resources.

Caches can be maintained at various levels depending on the requirements of the user. Unknown to many, the most commonly used caching mechanism is the Web browser itself. Modern Web browsers download content to a temporary location on the hard drive before rendering it to the user. And usually, if you visit the same page again, the browser will just pick it up from the local cache (unless you configured it otherwise).

At a workplace, it's highly likely that you share your Internet connection with a large group of users, through a proxy server. Clever network administrators often use the proxy server's cache to save copies of frequently-requested pages. Subsequent requests for such pages are directly serviced from the proxy server's cache. This system is usually replicated at different levels of the food chain - it's not uncommon to find ISPs caching content in order to reduce traffic that might otherwise eat up precious bandwidth on their Internet backbone.

Finally, Web sites often implement a caching system to serve their own content faster. In its simplest form, such a system consists of sending static "snapshots" of dynamic pages to clients, rather than re-creating the pages anew in response to every request. This reduces server load, and frees up resources for other tasks. The snapshots are refreshed at regular intervals to ensure they are reasonably "fresh". This is the type of caching discussed in this tutorial.

A Galaxy Far, Far Away

The Cache class is a PEAR package which provides a caching framework for PHP developers. It's currently maintained by Christian Stocker, Ulf Wendel and Helgi Þormar, and is freely available from http://pear.php.net/package/Cache.

To understand how it works, consider the following simple script:

<?php
// include the package
include("Cache.php");

// initialize cache
$cache = new Cache("file", array("cache_dir" => "cache"));

// generate an ID for this cache
$id = $cache->generateID("starwars");

// if cached data available, print it
// add a message indicating this is cached output
if ($quote = $cache->get($id)) {
    echo $quote;
    echo " [cached]";
// else obtain fresh data and print it
// also save it to the cache for the next run
// auto-expire the cached data after 2 min
} else {
    $quote = "Do, or do not. There is no try. -- Yoda, Star Wars";
    echo $quote;
    $cache->save($id, $quote, 120);
}
?>

Now, the first time you run this script, the page will be generated from scratch and you'll see something like this:

Do, or do not. There is no try. -- Yoda, Star Wars

Next, try refreshing the page. This time, the output will be retrieved from the cache:

Do, or do not. There is no try. -- Yoda, Star Wars [cached]

Let's take a quick look at the script above. I begin by importing the Cache class, and creating a new Cache() object. This object is initialized with two arguments: the "container" to use for the cached data (here, a file, although you can also use a database), and an array containing container-specific options (here, the location of the cache directory).

You can use multiple simultaneous caches with the Cache class, which is why every cache must be given a unique identifier. The object's generateID() method is used to create this from the identifying string "starwars".
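To make the idea of an ID concrete, here's a small sketch. It assumes (and this is my understanding of the package, not something shown above) that the ID is simply a hash of the serialized input, which would mean arrays work just as well as strings. The makeCacheId() function is a hypothetical stand-in, not part of PEAR.

```php
<?php
// Hypothetical stand-in for generateID(): hash the serialized value so
// that any PHP value (string, array, ...) maps to a fixed-length ID.
// Assumption: the real method works along similar lines.
function makeCacheId($value)
{
    return md5(serialize($value));
}

// a plain string, as in the example above
$idA = makeCacheId("starwars");

// arrays of request parameters work too
$idB = makeCacheId(array("page" => "index", "lang" => "en"));
$idC = makeCacheId(array("page" => "index", "lang" => "fr"));

echo $idA . "\n";                                    // 32 hexadecimal characters
echo ($idB === $idC ? "same" : "different") . "\n";  // different inputs, different IDs
```

The practical upshot: anything that changes the page (a query string, a language setting) should go into the value you build the ID from, or two different pages will end up sharing one cache entry.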

Once an instance of the Cache object has been created, the business logic to use it becomes fairly simple. The first step is to check if the required data already exists in the cache. If it doesn't, it should be generated from the original data source, and a copy saved to the cache for future use. If it does, you can do something useful with it - write it to a file, pipe it to an external program or output it to the client.

Checking whether the data already exists in the cache is accomplished with the get() method, while writing a fresh snapshot to the cache is done with the save() method. Notice that both methods require the ID generated by generateID() to identify which cache to access; the save() method additionally lets you specify a duration (in seconds) for which the cache is valid.

The steps above make up a fairly standard process for using the Cache class, and you'll see them being repeated over and over again in subsequent examples as well.
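If you're curious what a file-based get()/save() pair has to do behind the scenes, the following plain-PHP sketch shows the general idea: store the data together with an expiry time, and treat anything past that time as missing. The sketchSave() and sketchGet() helpers and the file naming are my own inventions for illustration; the real container is more elaborate.

```php
<?php
// Hypothetical helpers, not part of PEAR: a bare-bones file cache
// that stores a value together with its expiry time.
function sketchSave($dir, $id, $data, $lifetime)
{
    $record = array("expires" => time() + $lifetime, "data" => $data);
    file_put_contents("$dir/cache_$id", serialize($record));
}

function sketchGet($dir, $id)
{
    $file = "$dir/cache_$id";
    if (!is_file($file)) {
        return null;                      // never cached
    }
    $record = unserialize(file_get_contents($file));
    if ($record["expires"] < time()) {
        unlink($file);                    // stale: discard it
        return null;
    }
    return $record["data"];
}

$dir = sys_get_temp_dir();
$id  = md5("starwars-sketch");

sketchSave($dir, $id, "Do, or do not. There is no try.", 120);
var_dump(sketchGet($dir, $id));          // the quote, still fresh

sketchSave($dir, $id, "old news", -1);   // already expired
var_dump(sketchGet($dir, $id));          // NULL
```

Note that a stale entry and a missing entry look the same to the caller, which is exactly why the "check, else regenerate and save" pattern works.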

Mixing It Up

Of course, in the real world, it's unlikely that you'll want to cache a single line of output. It's more likely that you'll have a complete page to cache, usually with PHP code mixed in with the HTML formatting instructions. It's fairly easy to adapt the steps outlined in the previous section to fit this situation, by adding an output buffer to the mix. Take a look:

<?php
// include the package
include("Cache.php");

// initialize cache
$cache = new Cache("file", array("cache_dir" => "cache"));

// generate an ID for this cache
$id = $cache->generateID("index");

// look in the cache for this page
// if it exists, print it and exit
if ($data = $cache->get($id)) {
    echo $data;
    echo "<br> [cached data]";
// if not, generate the page from scratch
// and save it to the output buffer
} else {
    ob_start(); ?>

    <!-- page begins -->
    <html>
    <head></head>
    <body>
    My favourite fruits:
    <ul>
    <?php
    // mix in some PHP code
    $fruits = array("apples", "oranges", "bananas");
    foreach ($fruits as $f) {
        echo "<li>$f</li>";
    } ?>
    </ul>
    </body>
    </html>
    <!-- page ends -->

<?php
    // save the output buffer to the cache
    // for the next run
    // then dump it to the client
    $output = ob_get_contents();
    $cache->save($id, $output, 120);
    ob_end_flush();
}
?>

Here, if the call to get() finds nothing in the cache, the script initializes PHP's output buffer with ob_start() and then generates a complete page (including the dynamic PHP bits). The contents of the output buffer are then retrieved with a call to ob_get_contents(), and saved to the cache with the save() method, in readiness for the next request. The ob_end_flush() function then flushes the output buffer and sends the page to the browser.
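The output buffering part is worth seeing in isolation, since it's what makes page caching possible in the first place. Here's a minimal sketch using only PHP's built-in buffering functions; captureOutput() and renderFruits() are hypothetical names I've made up for the demonstration.

```php
<?php
// Hypothetical helper: run a callback and return whatever it printed
function captureOutput($callback)
{
    ob_start();                 // start buffering
    call_user_func($callback);  // anything echoed now goes to the buffer
    return ob_get_clean();      // fetch buffer contents and stop buffering
}

function renderFruits()
{
    echo "<ul>";
    foreach (array("apples", "oranges", "bananas") as $f) {
        echo "<li>$f</li>";
    }
    echo "</ul>";
}

$html = captureOutput("renderFruits");

// at this point nothing has reached the client; $html holds the markup
echo strlen($html) . " bytes captured\n";
echo $html . "\n";
```

Once the page is sitting in a string like this, saving it to a cache is no different from saving the one-line quote in the first example.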

An alternative way to do this involves using the Cache_Output class, a subclass of the Cache class. This subclass (which internally uses output buffering) inherits all of the attributes of the parent class, and adds two methods to the parent class' method collection: a start() method, which begins caching data, and an end() method, which marks the end of data caching. The following listing, which is equivalent to the previous one, demonstrates usage:

<?php
// include the package
include("Cache.php");
include("Cache/Output.php");

// initialize cache
$cache = new Cache_Output("file", array("cache_dir" => "cache"));

// generate an ID for this cache
$id = $cache->generateID("index");

// look in the cache for this page
// if it exists, print it and exit
if ($data = $cache->start($id)) {
    echo $data;
    echo "<br> [cached data]";
// if not, generate the page from scratch
// and save it to the output buffer
} else {
    ?>

    <!-- page begins -->
    <html>
    <head></head>
    <body>
    My favourite fruits:
    <ul>
    <?php
    // mix in some PHP code
    $fruits = array("apples", "oranges", "bananas");
    foreach ($fruits as $f) {
        echo "<li>$f</li>";
    } ?>
    </ul>
    </body>
    </html>
    <!-- page ends -->

<?php
    // save the output buffer to the cache
    // for the next run
    // then dump it to the client
    echo $cache->end(120);
}
?>

Here, the start() method checks the cache to see if a page already exists for the named ID; if it doesn't, the page is generated in the output buffer, saved to the cache and then flushed to the browser once the end() method is called. The same as the previous script, really, except this version is a little more compact.
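To show how a start()/end() pair can wrap output buffering, here's a toy version built on a plain PHP array. TinyOutputCache is a hypothetical class written for this illustration only; it is not the Cache_Output source, and because its array lives in memory, the loop below merely simulates two requests within a single run.

```php
<?php
// Hypothetical class for illustration only; the real Cache_Output
// stores its data in a container rather than a PHP array.
class TinyOutputCache
{
    private $store = array();   // id => array(expiry time, content)
    private $currentId;

    // return cached content if still fresh, else begin buffering
    public function start($id)
    {
        if (isset($this->store[$id]) && $this->store[$id][0] >= time()) {
            return $this->store[$id][1];
        }
        $this->currentId = $id;
        ob_start();
        return "";
    }

    // stop buffering, remember the content, hand it back
    public function end($lifetime)
    {
        $content = ob_get_clean();
        $this->store[$this->currentId] = array(time() + $lifetime, $content);
        return $content;
    }
}

$cache = new TinyOutputCache();

// two passes stand in for two separate page requests
for ($run = 1; $run <= 2; $run++) {
    if ($data = $cache->start("index")) {
        echo $data . " [cached data]\n";
    } else {
        echo "My favourite fruits: apples, oranges, bananas";
        echo $cache->end(120) . "\n";
    }
}
```

The first pass generates the line and stores it; the second pass gets it straight back from start() and tags it as cached.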

Flava Of Da Month

Just as you can cache pages, so too can you cache variables. The Cache_Application subclass lets you save application-level variables to the cache, and make them available to all instances of an application. This means that different users of an application can "share" variables with each other, even though their application instances are running in different sessions.

An example will make this clearer:

<html>
<head></head>
<body>

<?php
// include the package
include("Cache.php");
include("Cache/Application.php");

// initialize cache
$cache = new Cache_Application("file", array("cache_dir" => "cache/"));

// if new flavour submitted
// save its value to the cache
if (!empty($_POST['flava'])) {
    $cache->register("flava", $_POST['flava']);
}

// get cached variables
$VARS =& $cache->getData();

// if required variable exists in cache
// print its value
if (!empty($VARS['flava'])) {
    echo "Current flavour: " . htmlspecialchars($VARS['flava']);
}
?>

<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
Enter a new flavour: <input name="flava" type="text">
<input type="submit" name="submit" value="Save">
</form>

<?php
// run destructor to save variables to cache
$cache->_Cache_Application();
?>

</body>
</html>

Now, every time a new flavour is submitted, it is saved to the application cache as a variable. Because the cache is centralized and accessible to all application instances, new instances can read this variable by importing it into the $VARS array with getData(). These new instances may themselves alter the current flavour (even though they may be running in different sessions) and save the value back to the application cache.

Such application-level caching is very useful to make global configuration changes to a running application, because changes made by one instance (for example, the administrator instance) are immediately visible in all other instances.
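The sharing itself is less magical than it sounds: the variables simply live somewhere every request can reach. The sketch below imitates this with a single serialized file. TinyAppVars, its explicit save() method and the file name are all hypothetical; they only mirror the register()/getData() idea and make no claim about how Cache_Application is actually written.

```php
<?php
// Hypothetical class, not the PEAR implementation: variables are kept
// in one serialized file that every request reads and writes.
class TinyAppVars
{
    private $file;
    private $data = array();

    public function __construct($file)
    {
        $this->file = $file;
        if (is_file($file)) {
            $this->data = unserialize(file_get_contents($file));
        }
    }

    public function register($name, $value)
    {
        $this->data[$name] = $value;
    }

    public function getData()
    {
        return $this->data;
    }

    // write the variables back so other requests can see them
    public function save()
    {
        file_put_contents($this->file, serialize($this->data), LOCK_EX);
    }
}

$file = sys_get_temp_dir() . "/tiny_app_vars.ser";

// "request" one stores a flavour
$first = new TinyAppVars($file);
$first->register("flava", "mint");
$first->save();

// "request" two, a separate object, sees the same value
$second = new TinyAppVars($file);
$vars = $second->getData();
echo "Current flavour: " . $vars["flava"] . "\n";
```

Two objects that never talk to each other end up agreeing on the flavour, purely because both read the same file.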

Keeping Time

Why stop with variables? It's also possible to cache the output of user-defined functions with the Cache class...or, more precisely, with the Cache_Function subclass. This subclass inherits all the methods and properties of the parent Cache class, and adds an additional one: the call() method, which is used to save the return value of a PHP function to the cache.

Using the Cache_Function object's call() method is simplicity itself: the method accepts the name of the function to be cached as its first argument, and input parameters to that function as additional arguments. It then checks the cache to see if the function has been previously cached. If it has, the previous return value is retrieved without actually executing the function; if it has not, the function is executed (with appropriate arguments) and the result returned to the caller and simultaneously saved to the cache for future use.

Here's an example:

<?php
// function to return
// the current time
function getTime()
{
    return date("H:i:s");
}

// include the package
include("Cache.php");
include("Cache/Function.php");

// initialize cache
$cache = new Cache_Function("file", array("cache_dir" => "cache/"), 300);

// call cached function
echo "The cached time is " . $cache->call("getTime") . ". The real time is " . getTime();
?>

And here's an example of the output:

The cached time is 13:08:50. The real time is 13:11:40

As you might imagine, this class has the potential to do much more than simply confuse users in search of the time. Here's another, more realistic example, which caches a database result set and uses the cached set in order to reduce the time spent on executing subsequent queries for the same data. The result set is obtained via a function called getStories(), converted into a PHP array and cached using the call() method of the Cache_Function class.

<?php
// returns a list of stories
// from a content source
function getStories()
{
    // open connection to database
    $connection = mysql_connect("localhost", "joe", "secret") or die("Unable to connect!");
    mysql_select_db("db1") or die("Unable to select database!");

    // formulate and execute query
    $query = "SELECT headline FROM stories ORDER BY tstamp DESC";
    $result = mysql_query($query) or die("Error in query: " . mysql_error());

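    // iterate over result set and create array
    // (start with an empty array so the function
    // still returns one when no rows match)
    $arr = array();
    while ($row = mysql_fetch_object($result)) {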
        $arr[] = $row->headline;
    }

    // close connection
    mysql_close($connection);

    // return array of records
    return $arr;
}

// include the package
include("Cache.php");
include("Cache/Function.php");

// initialize cache
$cache = new Cache_Function("file", array("cache_dir" => "cache/"), 300);

// call function
$data = $cache->call("getStories");

// iterate over return value and print elements
echo "<ul>";
foreach ($data as $s) {
    echo "<li>$s</li>";
}
echo "</ul>";
?>

This kind of thing comes in particularly handy when your database server is overworked and cannot deal efficiently with the volume of requests it's getting. Remember that this technique is most appropriate for data that is relatively stable, and doesn't change on a minute-by-minute basis.
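The idea behind call() can be sketched in a few lines of plain PHP: build a key from the function name and its arguments, and only run the function when no fresh result is stored under that key. Everything here (cachedCall(), the StoryLookup class, the in-memory $store array) is hypothetical and exists only to illustrate the technique; unlike the real call(), my helper takes the arguments as an array.

```php
<?php
// Hypothetical helper: cache a function's return value, keyed by the
// function name and its arguments.
function cachedCall(array &$store, $function, array $args = array(), $lifetime = 300)
{
    $id = md5(serialize(array($function, $args)));

    if (isset($store[$id]) && $store[$id]["expires"] >= time()) {
        return $store[$id]["value"];      // reuse the earlier result
    }

    $value = call_user_func_array($function, $args);
    $store[$id] = array("expires" => time() + $lifetime, "value" => $value);
    return $value;
}

// stand-in for an expensive query; counts how often it really runs
class StoryLookup
{
    public static $runs = 0;

    public static function fetch($category)
    {
        self::$runs++;
        return array("$category story 1", "$category story 2");
    }
}

$store = array();

cachedCall($store, "StoryLookup::fetch", array("scifi"));
cachedCall($store, "StoryLookup::fetch", array("scifi"));    // served from $store
cachedCall($store, "StoryLookup::fetch", array("fantasy"));  // new arguments, runs again

echo "fetch() actually ran " . StoryLookup::$runs . " times\n";
```

Three calls, but the function body runs only twice: the repeated "scifi" request never touches the expensive code. Because the arguments are part of the key, different inputs never get each other's results.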

Different Strokes

So far, all the examples you've seen have used a file cache. However, the Cache class also allows you to use a database for your cache. To do this, first create a database table to hold the cached data. If you're using MySQL, this is the SQL you'll need:

CREATE TABLE cache (
   id CHAR(32) NOT NULL DEFAULT '',
   cachegroup VARCHAR(127) NOT NULL DEFAULT '',
   cachedata BLOB NOT NULL,
   userdata VARCHAR(255) NOT NULL DEFAULT '',
   expires INT(9) NOT NULL DEFAULT 0,
   changed TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   INDEX (expires),
   PRIMARY KEY (id, cachegroup)
);

Next, initialize your Cache() object to use a "db" container instead of a file. Note that the options to the Cache() object constructor change to reflect the database access parameters and table name:

<?php
// include the package
include("Cache.php");

// initialize cache
$cache = new Cache("db", array("dsn" => "mysql://user:password@localhost/mydb", "cache_table" => "cache"));

// generate an ID for this cache
$id = $cache->generateID("starwars");

// if cached data available, print it
// add a message indicating this is cached output
if ($quote = $cache->get($id)) {
    echo $quote;
    echo " [cached]";
// else obtain fresh data and print it
// also save it to the cache for the next run
// auto-expire the cached data after 30 min
} else {
    $quote = "Do, or do not. There is no try. -- Yoda, Star Wars";
    echo $quote;
    $cache->save($id, $quote, 1800);
}
?>

When you run this script, the MySQL table will be used as a cache, instead of a file.
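If the "dsn" string in the listing above looks cryptic, it helps to see it pulled apart. The snippet below is a rough illustration only: it uses PHP's built-in parse_url() to split the string, which is not how the database layer parses it, but it does show what each piece encodes.

```php
<?php
// Rough illustration only: split a DSN into its parts with parse_url().
// The database layer has its own DSN parser; this just shows what the
// string encodes.
$dsn   = "mysql://user:password@localhost/mydb";
$parts = parse_url($dsn);

echo "driver:   " . $parts["scheme"] . "\n";            // mysql
echo "username: " . $parts["user"] . "\n";              // user
echo "password: " . $parts["pass"] . "\n";              // password
echo "host:     " . $parts["host"] . "\n";              // localhost
echo "database: " . ltrim($parts["path"], "/") . "\n";  // mydb
```

Swap in your own credentials, host and database name, and make sure the "cache_table" option matches the table you created.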

Cache Cow

Now that you know the theory, let's wrap this tutorial up with a real-world example of how useful the Cache class can be in improving performance on your Web site. As you may (or may not) know, Amazon.com has opened up its product catalog, allowing developers to create Amazon-backed online stores with its ECS service. This service allows developers to interact with Amazon.com's catalog and transaction system using SOAP.

The only problem? Querying the ECS system, retrieving XML-encoded data and formatting it for use on a Web page is a complex process, one which can affect the performance of your store substantially. In order to compensate for this, clever developers can consider caching some of the ECS data on their local systems, reducing the need to query Amazon.com for every single request and thereby improving performance on their site.

Here's a script that demonstrates how this might work:

<html>
<head><basefont face="Arial"></head>
<body>
<?php
// include Cache class
include("Cache.php");
include("Cache/Output.php");

// initialize cache
$cache = new Cache_Output("file", array("cache_dir" => "cache"));

// generate an ID for this cache
$id = $cache->generateID("scifi");

// look in the cache for this page
// if it exists, print it and exit
if ($data = $cache->start($id)) {
    echo $data;
    echo "<br> [cached data]";
// if not, generate the page from scratch
// and save it to the output buffer
} else {
    // include SOAP class
    include("SOAP/Client.php");

    // initialize SOAP client
    $soapclient = new SOAP_Client("http://webservices.amazon.com/onca/soap?Service=AWSECommerceService");

    // set data for SOAP request
    // get top sellers in Sci-Fi/Fantasy category
    $params = array('SubscriptionId' => '0GFYHCJY9H6HW2FTFTG2',
                    'SearchIndex' => 'Books',
                    'Sort' => 'salesrank',
                    'ResponseGroup' => 'Small',
                    'BrowseNode' => '25');

    // get node information
    $result = $soapclient->call("ItemSearch", $params);
    if (PEAR::isError($result)) {
        die("Something went wrong...");
    }

    // print items
    foreach ($result['Items']->Item as $item) {
        echo "<b><a href=\"" . $item->DetailPageURL . "\">" . $item->ItemAttributes->Title . "</a></b><br />";
        echo $item->ItemAttributes->Author . "<p />";
    }

    // save the output buffer to the cache
    // for the next run
    // then dump it to the client
    echo $cache->end(1800);
}
?>
</body>
</html>

I'm assuming here that you already know how to use ECS (in case you don't, look at for a tutorial on the subject) and so will focus on the caching aspects of the system.

Since this Web page contains intermingled HTML and PHP code, it's convenient to use the Cache_Output class here. The first task, obviously, is to initialize and configure the cache; I've set the cache to refresh itself at 30-minute intervals, since this seems like a reasonable period of time. Next, I've used the start() method to see if any data already exists in the cache, and display it if so.

If no data exists, the PEAR SOAP class is include()-d in the script, a SOAP client is instantiated, and a request is made to Amazon.com for a list of sci-fi bestsellers (node ID 25 in the ECS system). The response is then parsed and formatted into a Web page suitable for display; it is also simultaneously saved to the cache so that future requests for the same page can be served instantly, without having to query the ECS system each time.
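A little arithmetic shows why the 30-minute lifetime matters. The sketch below assumes a single cached page, one regeneration per expiry, and a made-up traffic figure of 10,000 visits a day; real numbers will differ, and simultaneous visitors arriving just after expiry could trigger a few extra requests.

```php
<?php
$lifetime      = 1800;           // cache lifetime in seconds (30 minutes)
$secondsPerDay = 24 * 60 * 60;   // 86400
$visitsPerDay  = 10000;          // hypothetical traffic figure

// the page can only be regenerated once per lifetime window
$maxUpstream = (int) ceil($secondsPerDay / $lifetime);

echo "Without cache: $visitsPerDay ECS requests per day\n";
echo "With cache:    roughly $maxUpstream ECS requests per day at most\n";
```

Under those assumptions, the number of upstream requests depends on the cache lifetime rather than on how busy the site is.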

And that's about all I have time for today. I hope this tutorial offered some insight into how PEAR's Cache class can be used to speed up performance on your Web site, and gave you some ideas to improve the responsiveness of your site. Until next time...happy coding!

This article was first published on 03 Nov 2006.