The Need For Speed
In the old days, a Web site usually consisted of static HTML pages and perhaps a few images to liven up the text. No more is this the case - sophisticated interfaces, streaming media, dynamically-generated content and other enhancements have all contributed to make today’s Web more content-rich and interactive than ever before. Most often, this is a Good Thing - greater accessibility and more quality content only make the Web more attractive to new users, and increase its usefulness to the community at large.
However, there is a downside to this phenomenon as well. As sites become more content-rich, as their reliance on dynamic data sources increases, as their servers struggle to meet the thousands of requests coming in per minute, it’s only natural that the first casualty be the performance of the system. This is clearly visible on the Web, which today more closely resembles a slow traffic jam than a fast-moving freeway.
Fortunately, there is a workaround, one that many sites have used successfully to improve performance: caching. Over the course of this article, I’m going to show you a few examples of this technique in action, using my favourite language, PHP, and an open-source implementation of a server-side cache called Cache_Lite. Flip the page, and let’s get started!
The Food Chain
According to one definition (http://www.nottm.edu.org.uk/techi/networks/caching.pdf), a cache “is a store of information that is designed to improve the accessibility or availability of data to the user.” Simply put, it is a location where copies of information are kept so they can be sent to the user quickly, without wasting already scarce Internet resources.
Now, caches can be maintained at various levels depending on the requirements of the user. Unknown to many, the most commonly used caching mechanism is the Web browser itself. Modern Web browsers download content to a temporary location on the hard drive before rendering it to the user. And usually, if you visit the same page again, the browser will just pick it up from the local cache.
At a workplace, it’s highly likely that you share your Internet connection with a large group of users. Here, the local cache of the Web browser does not always help to optimize the utilization of resources. In such a situation, network administrators often utilize the caching features of the proxy server used to share the Internet connection. This allows them to ensure that commonly-accessed sites are cached at the time of the first request. Subsequent requests to the same page are directly serviced from the proxy server’s cache. Of course, this is not advisable for Web sites with highly dynamic content; proxy servers can be configured to avoid caching these pages.
The above caching mechanism is often replicated at different levels. It is not uncommon to find ISPs caching content in order to reduce traffic that might otherwise eat up precious bandwidth on the Internet backbones.
Finally, Web sites often implement complex caching mechanisms so as to serve their own content faster. This is the type of caching that this article discusses - caching your data at the application level. Keep reading.
Return Of The Jedi
The Cache_Lite class comes courtesy of PEAR, the PHP Extension and Application Repository (http://pear.php.net). In case you didn’t know, PEAR is an online repository of free PHP software, including classes and modules for everything from data archiving to XML parsing. When you install PHP, a whole bunch of PEAR modules get installed as well; the Cache_Lite class is one of them.
In case your PHP distribution didn’t include Cache_Lite, you can get yourself a copy from the official PEAR Web site, at http://pear.php.net - simply extract the distribution archive into your PEAR directory (or, if you have the PEAR command-line installer, run pear install Cache_Lite) and you’re ready to roll!
Let’s begin with something simple - building a primitive cache using Cache_Lite object methods. Here’s the code:
<?php
// include the package
require_once("Lite.php");
// set an ID for this cache
$id = "starwars";
// set some variables
$options = array(
    "cacheDir" => "cache/",
    "lifeTime" => 50
);
// create a Cache_Lite object
$objCache = new Cache_Lite($options);
// test if there exists a valid cache
if ($quote = $objCache->get($id)) {
    // if so, display it
    echo $quote;
    // add a message indicating this is cached output
    echo " [cached]";
} else {
    // no cached data
    // implies this data has not been requested in last cache lifetime
    // so obtain it and display it
    $quote = "Do, or do not. There is no try. -- Yoda, Star Wars";
    echo $quote;
    // also save it in the cache for future use
    $objCache->save($quote, $id);
}
?>
Don’t worry if it didn’t make too much sense - all will be explained shortly. For the moment, just feast your eyes on the output:
Do, or do not. There is no try. -- Yoda, Star Wars
Now refresh the page within 50 seconds (the cache lifetime set in the script) - you should see something like this, indicating that this time the page has been retrieved from the cache.
Do, or do not. There is no try. -- Yoda, Star Wars [cached]
Let’s take a closer look at how this was accomplished, on the next page.
Digging Deeper
When implementing a cache for your Web page, the first step is, obviously, to include the Cache_Lite class:
<?php
// include the package
require_once("Lite.php");
?>
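You can either provide an absolute path to this file, or do what most lazy programmers do - add your PEAR installation directory to PHP’s “include_path” setting, so that you can access any of the PEAR classes without needing to type in long, convoluted file paths. (In a standard PEAR installation, the file actually lives at Cache/Lite.php relative to the PEAR directory, so with only the PEAR directory on the include path you would write require_once("Cache/Lite.php"); the examples in this article assume the file can be found simply as Lite.php.)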
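A minimal sketch of the include_path approach follows. It assumes the PEAR directory is /usr/share/php, a common location on Linux systems but only an assumption here (running pear config-get php_dir will tell you where yours is). The same directory could instead be added once, for every script, via the include_path setting in php.ini.

```php
<?php
// ASSUMPTION: PEAR packages live in /usr/share/php; change this to match your system.
$pearDir = "/usr/share/php";
// add the PEAR directory to the include path for this script only
set_include_path(get_include_path() . PATH_SEPARATOR . $pearDir);
// in a standard PEAR layout, Cache_Lite's main file is Cache/Lite.php
if (stream_resolve_include_path("Cache/Lite.php") !== false) {
    require_once "Cache/Lite.php";
} else {
    echo "Cache_Lite not found; include_path is " . get_include_path();
}
```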
The Cache_Lite object can support multiple caches simultaneously, so long as every cache created has a unique identifier. In this case, I’ve used the identifier “starwars” to uniquely distinguish the cache I’ll be using.
<?php
// set an ID for this cache
$id = "starwars";
?>
Next, an object of the Cache_Lite class needs to be initialized, and assigned to a PHP variable.
<?php
// create a Cache_Lite object
$objCache = new Cache_Lite($options);
?>
This variable, $objCache, now serves as the control point for cache manipulation.
The constructor of the Cache_Lite class can be provided with an associative array containing configuration parameters; these parameters allow you to customize the behaviour of the cache. In the example above, this array contains two parameters, “cacheDir”, which specifies the directory used by the cache, and “lifeTime”, which specifies the period for which data should be cached, in seconds.
<?php
// set some variables
$options = array(
    "cacheDir" => "cache/",
    "lifeTime" => 50
);
?>
Note that the directory specified must already exist and be writable by the Web server process, or else the cache will simply not work.
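Since the cache directory has to be in place before Cache_Lite can use it, it can be worth checking for it up front. The sketch uses a hypothetical helper, ensure_cache_dir(), and a directory under the system temp dir purely so it runs anywhere; on a real site you would point it at your own cache/ directory (ending in a slash, as in the examples here).

```php
<?php
// Hypothetical helper: create the cache directory if it is missing,
// then report whether the current process can write to it.
function ensure_cache_dir($dir)
{
    if (!is_dir($dir) && !mkdir($dir, 0775, true)) {
        return false;
    }
    return is_writable($dir);
}

// ASSUMPTION: a demo directory under the system temp dir
$cacheDir = sys_get_temp_dir() . "/cache_lite_demo/";
if (!ensure_cache_dir($cacheDir)) {
    die("Cache directory $cacheDir is missing or not writable");
}
// these options would then be passed to new Cache_Lite($options)
$options = array("cacheDir" => $cacheDir, "lifeTime" => 50);
```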
Once an instance of the Cache_Lite object has been created, the business logic to use it becomes fairly simple. The first step is to check if the required data already exists in the cache. If it doesn’t, it should be generated from the original data source, and a copy saved to the cache for future use. If it does, you can do something useful with it - write it to a file, pipe it to an external program or - as I’ve done here - simply output it to the screen for all to admire.
<?php
// test if there exists a valid cache
if ($quote = $objCache->get($id)) {
    // if so, display it
    echo $quote;
    // add a message indicating this is cached output
    echo " [cached]";
} else {
    // no cached data
    // implies this data has not been requested in last cache lifetime
    // so obtain it and display it
    $quote = "Do, or do not. There is no try. -- Yoda, Star Wars";
    echo $quote;
    // also save it in the cache for future use
    $objCache->save($quote, $id);
}
?>
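Most of this logic is accomplished via the get() and save() methods of the Cache_Lite object. The get() method checks to see if valid (unexpired) data exists in the cache and returns it if so, or false otherwise, while the save() method saves data to the cache. The save() method accepts the data to be saved, together with a unique identifier, as input arguments; the get() method uses this identifier to find and retrieve the cached data. Because the script tests the return value of get() directly, cached data that PHP evaluates as false (an empty string or "0", for example) would be treated as a cache miss.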
The steps above make up a fairly standard process for using the Cache_Lite class, and you’ll see them being repeated over and over again in subsequent examples as well.
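To make the get()/save() contract concrete, here is a hypothetical, stripped-down file cache written in plain PHP. It is not Cache_Lite’s source code and leaves out the many options the real class offers; it simply stores each identifier’s data in its own file and treats a file older than lifeTime seconds as stale.

```php
<?php
// Hypothetical minimal file cache mimicking the get()/save() contract.
// An illustration only, NOT Cache_Lite's actual implementation.
class TinyFileCache
{
    private $dir;
    private $lifeTime;

    public function __construct($dir, $lifeTime)
    {
        $this->dir = rtrim($dir, "/") . "/";
        $this->lifeTime = $lifeTime;
    }

    // one file per identifier; hashing keeps the file name safe
    public function path($id)
    {
        return $this->dir . "cache_" . md5($id);
    }

    // return the cached data, or false if it is missing or stale
    public function get($id)
    {
        $file = $this->path($id);
        clearstatcache();
        if (!is_file($file) || time() - filemtime($file) > $this->lifeTime) {
            return false;
        }
        return file_get_contents($file);
    }

    // store the data under the given identifier
    public function save($data, $id)
    {
        return file_put_contents($this->path($id), $data, LOCK_EX) !== false;
    }
}

$cache = new TinyFileCache(sys_get_temp_dir(), 50);
if (($quote = $cache->get("starwars")) === false) {
    $quote = "Do, or do not. There is no try. -- Yoda, Star Wars";
    $cache->save($quote, "starwars");
}
echo $quote;
```

Comparing get()’s result against false with === means a cached empty string or "0" still counts as a hit.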
In And Out
The example you just saw cached the output of PHP statements. It’s also possible to cache static HTML content, through creative use of PHP’s output buffering functions. Consider the following example, which revises the code on the previous page to cache HTML markup instead of PHP command output:
<?php
// include the package
require_once("Lite.php");
// set some variables
$options = array(
    "cacheDir" => "cache/",
    "lifeTime" => 5
);
// create a Cache_Lite object
$objCache = new Cache_Lite($options);
// test if there exists a valid cache
if ($page = $objCache->get("starwars")) {
    // if so, display it
    echo $page;
    // add a message indicating this is cached output
    echo " [cached]";
} else {
    // no cache
    // so display the HTML output
    // and save it to a buffer
    ob_start(); ?>
<html>
<head></head>
<body>
Do, or do not. There is no try.
<br>
-- Yoda, <i>Star Wars</i>
</body>
</html>
<?php
    // page generation complete
    // retrieve the page from the buffer
    $page = ob_get_contents();
    // and save it in the cache for future use
    $objCache->save($page, "starwars");
    // also display the buffer and then flush it
    ob_end_flush();
}
?>
In this case, PHP’s output buffering functions have been used to capture all the output generated subsequent to the call to ob_start(), and store this output in a buffer. Once the entire output has been captured, the ob_get_contents() function is used to retrieve the buffer contents to a variable. This variable (which now contains the entire HTML page) is then stored in the cache for future use. Finally, the ob_end_flush() function is used to end output buffering and send the contents of the buffer to the browser.
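The buffering steps can also be wrapped in a small reusable function. The sketch assumes a hypothetical capture_output() helper that runs a rendering callback and returns what it printed. It uses ob_get_clean(), which fetches the buffer and ends buffering in one call, instead of the ob_get_contents()/ob_end_flush() pair, so the caller decides when the page is actually sent.

```php
<?php
// Hypothetical helper: run a rendering callback and return everything it
// printed, instead of sending it straight to the browser.
function capture_output(callable $render)
{
    ob_start();
    $render();
    return ob_get_clean(); // read the buffer and stop buffering
}

$page = capture_output(function () {
    echo "<html><body>Do, or do not. There is no try.</body></html>";
});
// $page can now be passed to save() and then echoed
echo $page;
```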
Different Strokes
An alternative method to accomplish the task on the previous page lies with the Cache_Lite_Output class, a subclass of the Cache_Lite class. This subclass (which internally uses output buffering) inherits all of the attributes of the parent class, and adds two methods of its own: a start() method, which begins caching data, and an end() method, which marks the end of data caching.
Consider the following variant of the example on the previous page, which illustrates how this may be used:
<?php
// include the package
require_once("Output.php");
// set some variables
$options = array(
    "cacheDir" => "cache/",
    "lifeTime" => 5
);
// create a Cache_Lite_Output object
$objCache = new Cache_Lite_Output($options);
// test if there exists a valid cache
// if so, display it, else regenerate the page
if (!$objCache->start("starwars")) {
?>
<html>
<head></head>
<body>
Do, or do not. There is no try.
<br>
-- Yoda, <i>Star Wars</i>
</body>
</html>
<?php
    // end caching
    $objCache->end();
}
?>
Over here, if the call to start() - which is provided with a cache identifier - returns true, it implies that the requested data has already been cached, and the Cache_Lite_Output object takes care of printing the contents of the cache. If, however, the call to start() returns false, it implies that the data is not present in the cache, and so all subsequent output generated by the script will be cached, until the object’s end() method is invoked.
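To see roughly what start() and end() have to do, here is a hypothetical TinyOutputCache class that builds the same control flow on output buffering and plain files. It is an illustration under those assumptions, not the Cache_Lite_Output source: start() prints a fresh cached copy and returns true, or starts buffering and returns false; end() saves the buffer and flushes it to the browser.

```php
<?php
// Hypothetical start()/end() pair built on output buffering.
// An illustration only, NOT Cache_Lite_Output's real code.
class TinyOutputCache
{
    private $dir;
    private $lifeTime;
    private $id;

    public function __construct($dir, $lifeTime)
    {
        $this->dir = rtrim($dir, "/") . "/";
        $this->lifeTime = $lifeTime;
    }

    private function file($id)
    {
        return $this->dir . "out_" . md5($id);
    }

    // true: a fresh cached copy was printed; false: buffering has started
    public function start($id)
    {
        $file = $this->file($id);
        clearstatcache();
        if (is_file($file) && time() - filemtime($file) <= $this->lifeTime) {
            echo file_get_contents($file);
            return true;
        }
        $this->id = $id;
        ob_start();
        return false;
    }

    // save the buffered output, then send it on and stop buffering
    public function end()
    {
        file_put_contents($this->file($this->id), ob_get_contents(), LOCK_EX);
        ob_end_flush();
    }
}

$cache = new TinyOutputCache(sys_get_temp_dir(), 5);
if (!$cache->start("starwars_page")) {
    echo "<p>Do, or do not. There is no try.</p>";
    $cache->end();
}
```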
Bits And Bytes
It’s also possible to use the Cache_Lite object to cache certain sections of a page, rather than the entire page - useful if some parts of the page are updated more frequently than others. This is accomplished by specifying a different cache identifier for each block of data that is to be cached, and using these different identifiers to retrieve data when needed. Data that is not to be cached (that is, data that is to be retrieved from source every time it is needed) will simply not be included in the call to the Cache_Lite object’s save() method.
Consider the following example, which demonstrates the technique:
<?php
// include the class
require_once('Lite.php');
// initialize the cache for 24 hours
$objCache = new Cache_Lite(array("cacheDir" => "cache/", "lifeTime" => 86400));
// open connection to database
// (the mysql_* functions were removed in PHP 7; use mysqli or PDO on current PHP)
$connection = mysql_connect("localhost", "joe", "secret") or die("Unable to connect!");
mysql_select_db("db111") or die("Unable to select database!");
// first data block
// retrieve it from the cache, if available
if ($data = $objCache->get("featured_quote")) {
    // display cached data
    echo $data;
} else {
    // else query database for it, and also store it in cache
    // formulate and execute query
    $query = "SELECT quote FROM quotes ORDER BY RAND() LIMIT 1";
    $result = mysql_query($query) or die("Error in query: " . mysql_error());
    // get record
    $row = mysql_fetch_object($result);
    // display it
    echo $row->quote;
    // and also cache it
    $objCache->save($row->quote, "featured_quote");
}
// dynamic, non-cacheable page content goes here
// second data block
// test if there is a valid cache for this block
if ($data = $objCache->get("featured_character")) {
    // if so, display it
    echo $data;
} else {
    // formulate and execute query
    // (CHAR is a reserved word in MySQL, so the column name is quoted with backticks)
    $query = "SELECT `char` FROM characters ORDER BY RAND() LIMIT 1";
    $result = mysql_query($query) or die("Error in query: " . mysql_error());
    // get record
    $row = mysql_fetch_object($result);
    // display it
    echo $row->char;
    // and also cache it
    $objCache->save($row->char, "featured_character");
}
// close connection
mysql_close($connection);
?>
In this case, the two data blocks - the featured quote and the featured character - are cached for 24 hours; however, the content in between the two is not. Note the use of two different cache identifiers in this example, one for each block.
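The get-or-generate pattern repeated for each block can be factored into a helper. In the sketch, cached_block() is hypothetical, and ArrayCache is an in-memory stand-in with the same get()/save() shape as Cache_Lite, used only so the example runs without PEAR; with a real Cache_Lite object you would pass that instead. Anything printed outside cached_block() calls, like the timestamp, is regenerated on every request.

```php
<?php
// In-memory stand-in with the same get()/save() shape as Cache_Lite,
// used only so this sketch runs without PEAR (hypothetical class).
class ArrayCache
{
    private $store = array();

    public function get($id)
    {
        return isset($this->store[$id]) ? $this->store[$id] : false;
    }

    public function save($data, $id)
    {
        $this->store[$id] = $data;
        return true;
    }
}

// Hypothetical helper: return block $id from the cache, or build it
// with $generate() and cache the result under that identifier.
function cached_block($cache, $id, callable $generate)
{
    $data = $cache->get($id);
    if ($data === false) {
        $data = $generate();
        $cache->save($data, $id);
    }
    return $data;
}

$cache = new ArrayCache();
$calls = 0;
$quote = function () use (&$calls) {
    $calls++;
    return "Do, or do not. There is no try.";
};

echo cached_block($cache, "featured_quote", $quote);  // generated and cached
echo date("H:i:s");                                  // dynamic, never cached
echo cached_block($cache, "featured_quote", $quote);  // served from the cache
echo cached_block($cache, "featured_character", function () {
    return "Yoda";
});
```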
Of Form And Function
Just as you can cache HTML markup and PHP command output, it’s also possible to cache the output of user-defined functions with the Cache_Lite class…or, more precisely, with the Cache_Lite_Function subclass. This subclass inherits all the methods and properties of the parent Cache_Lite class, and adds an additional one: the call() method, which is used to save the return value of a PHP function to the cache.
Using the Cache_Lite_Function object’s call() method is simplicity itself: the method accepts the name of the function to be cached as its first argument, and input parameters to that function as additional arguments. It first checks the cache to see if the function’s return value has been cached previously. If it has, the previous return value is retrieved without actually executing the function; if it has not, the function is executed (with appropriate arguments), and the result is returned to the caller and simultaneously saved to the cache for future use.
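The essence of call() is memoization keyed on the function name and its arguments. The sketch uses a hypothetical cached_call() function and an in-memory array as the store, so, unlike Cache_Lite_Function, its results last only for the current request and never expire; it is meant to show the keying and the skip-if-cached logic, not PEAR’s actual implementation.

```php
<?php
// Hypothetical memoizing helper in the spirit of Cache_Lite_Function::call();
// NOT the PEAR implementation. $store is an in-memory array, so entries
// last only for the current request and never expire.
function cached_call(array &$store, $function, ...$args)
{
    // one cache entry per function name + argument list
    $key = md5($function . serialize($args));
    if (!array_key_exists($key, $store)) {
        // cache miss: execute the function and remember its return value
        $store[$key] = call_user_func_array($function, $args);
    }
    return $store[$key];
}

$calls = 0;

// a stand-in for an expensive function; counts how often it really runs
function slow_square($n)
{
    global $calls;
    $calls++;
    return $n * $n;
}

$store = array();
echo cached_call($store, "slow_square", 4), "\n"; // executes slow_square(4)
echo cached_call($store, "slow_square", 4), "\n"; // served from $store
echo cached_call($store, "slow_square", 5), "\n"; // new argument, executes again
```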
Here’s a simple example which demonstrates how this works:
<?php
// returns the current time
function get_time()
{
    // date() uses the current time when no timestamp is given
    // (calling mktime() with no arguments is an error as of PHP 8)
    return date("H:i:s");
}
// include the class
require_once('Function.php');
// configure the cache
$options = array(
    'cacheDir' => 'cache/',
    'lifeTime' => 600
);
// create an instance of the Cache_Lite_Function object
$objCache = new Cache_Lite_Function($options);
// cache the call to the get_time() function
// if return value is present in cache,
// Cache_Lite_Function will fetch it from the cache
// else the function will be called and the result returned
$time = $objCache->call('get_time');
echo "The cached time is " . $time . ". The real time is " . get_time();
?>
And here’s an example of the output:
The cached time is 13:08:50. The real time is 13:11:40
As you might imagine, this class has the potential to do much more than simply confuse users in search of the time. Flip the page to see what I mean.
No News Is Good News
Here’s another, more realistic example, which caches a database result set and uses the cached set in order to reduce the time spent on executing subsequent queries for the same data. The result set is obtained via a function called get_headlines(), converted into a PHP array and cached using the call() method of the Cache_Lite_Function class.
<?php
// returns an array of headlines from a database
function get_headlines()
{
    // open connection to database
    $connection = mysql_connect("localhost", "joe", "secret") or die("Unable to connect!");
    mysql_select_db("db111") or die("Unable to select database!");
    // formulate and execute query
    $query = "SELECT headline FROM news ORDER BY timestamp DESC";
    $result = mysql_query($query) or die("Error in query: " . mysql_error());
    // iterate over result set and create array
    // (initialized first so that an empty result set still returns an array)
    $arr = array();
    while ($row = mysql_fetch_object($result)) {
        $arr[] = $row->headline;
    }
    // close connection
    mysql_close($connection);
    // return array of records
    return $arr;
}
// include the class
require_once('Function.php');
// configure the cache
$options = array(
    'cacheDir' => 'cache/',
    'lifeTime' => 600
);
// create an instance of the Cache_Lite_Function object
$objCache = new Cache_Lite_Function($options);
// cache the call to the get_headlines() function
// if return value is present in cache,
// Cache_Lite_Function will fetch it from the cache
// else the function will be called and the result returned
$data = $objCache->call('get_headlines');
// iterate over return value and print elements
echo "<ul>";
foreach ($data as $h) {
    echo "<li>$h</li>";
}
echo "</ul>";
?>
Here’s what the output looks like:
Cache Cow
Now that you know the theory, let’s wrap this tutorial up with a real-world example of how useful the Cache_Lite class can be in improving performance on your Web site. As you may (or may not) know, Amazon.com recently opened up its product catalog, allowing developers to create Amazon-backed online stores with its new Amazon Web Services (AWS) offering. This service allows developers to interact with Amazon.com’s catalog and transaction system using SOAP, and it represents an important milestone in the progress of XML-based Web services.
The only problem? Querying the AWS system, retrieving XML-encoded data and formatting it for use on a Web page is a complex process, one which can affect the performance of your store substantially. In order to compensate for this, clever developers can consider caching some of the AWS data on their local systems, reducing the need to query Amazon.com for every single request and thereby improving performance on their site.
Here’s the script:
<?php
// include Cache_Lite_Output class
require_once("Output.php");
// set an ID for this cache
$id = "MyStore";
// configure the cache
$options = array(
    "cacheDir" => "cache/",
    "lifeTime" => 1800
);
// instantiate the cache
$objCache = new Cache_Lite_Output($options);
// does data exist in cache?
// if so, print it
// else regenerate it
if (!$objCache->start($id)) {
    // include SOAP class
    include("nusoap.php");
    // create an instance of the SOAP client object
    // (newer NuSOAP releases name this class nusoap_client)
    $soapclient = new soapclient("http://soap.amazon.com/schemas2/AmazonWebServices.wsdl", true);
    // uncomment the next line to see debug messages
    // $soapclient->debug_flag = 1;
    // create a proxy so that WSDL methods can be accessed directly
    $proxy = $soapclient->getProxy();
    // set up an array containing input parameters to be
    // passed to the remote procedure
    $params = array(
        'browse_node' => 1000,
        'page' => 1,
        'mode' => 'books',
        'tag' => 'melonfire-20',
        'type' => 'lite',
        'devtag' => 'YOUR_TOKEN_HERE',
        'sort' => '+salesrank'
    );
    // invoke the method
    $result = $proxy->BrowseNodeSearchRequest($params);
    // check for errors
    if (!empty($result['faultstring'])) {
        echo $result['faultstring'];
    } else {
        // no errors?
        $total = $result['TotalResults'];
        $items = $result['Details'];
        // format and display the results
?>
<html>
<head>
<basefont face="Verdana">
</head>
<body bgcolor="white">
<p> <p>
<table width="100%" cellspacing="0" cellpadding="5">
<tr>
<td bgcolor="Navy">
<font color="white" size="-1">
<b>Bestsellers</b>
</font></td>
<td bgcolor="Navy" align="right">
<font color="white" size="-1">
<b><?php echo date("d M Y"); ?></b>
</font></td> </tr>
</table>
<p>
Browse the catalog below:
<p>
<table width="100%" border="0" cellspacing="5" cellpadding="0">
<?php
        // one set of rows per product in the result set
        foreach ($items as $i) {
?>
<tr>
<td align="center" valign="top" rowspan="4"><a href="<?php echo $i['Url']; ?>"><img border="0" src="<?php echo $i['ImageUrlSmall']; ?>"></a></td>
<td><font size="-1"><b><?php echo $i['ProductName']; ?></b> / <?php echo implode(", ", $i['Authors']); ?> / <?php echo $i['Manufacturer']; ?></font></td>
</tr>
<tr>
<td><font size="-1">List Price: <?php echo $i['ListPrice']; ?> / Amazon.com Price: <?php echo $i['OurPrice']; ?></font></td>
</tr>
<tr>
<td><font size="-1">Release Date: <?php echo $i['ReleaseDate']; ?></font></td>
</tr>
<tr>
<td><font size="-1"><a href="<?php echo $i['Url']; ?>">Read more about this title on Amazon.com</a></font></td>
</tr>
<?php
        }
?>
</table>
<font size="2">
Disclaimer: All product data on this page belongs to Amazon.com. No guarantees are made as to accuracy of prices and information. YMMV!
</font>
</body>
</html>
<?php
    }
    // end caching: save the generated page and send it to the browser
    $objCache->end();
}
?>
and here’s what the output looks like:
I’m assuming here that you already know how to use AWS, so I’ll focus instead on the caching aspects of the system. (In case you don’t, flip the page for some links to tutorials that will teach you the basics, and remember that you’ll need a free AWS developer token for the script above to work.)
Since this Web page contains intermingled HTML and PHP code, it’s convenient to use the Cache_Lite_Output class here. The first task, obviously, is to initialize and configure the cache; I’ve set the cache to refresh itself at 30-minute intervals, since this seems like a reasonable period of time. Next, I’ve used the start() method to see if any data already exists in the cache, and display it if so.
If no data exists, the NuSOAP PHP class is include()-d in the script, a SOAP client is instantiated, and a request is made to Amazon.com to obtain a list of bestsellers (browse node ID 1000 in the AWS system, sorted by sales rank). The response is then parsed and formatted into a Web page suitable for display; it is also simultaneously saved to the cache so that future requests for the same page can be served instantly, without having to query the AWS system each time. The end result: faster responses to user clicks, and an overall enhancement in user perception of your site’s performance.
Endgame
And that’s about all we have time for. In this article, I introduced you to the PEAR Cache_Lite class, a PHP class designed specifically to provide a robust caching mechanism for Web pages. I showed you how to configure the location and the lifetime of the cache, and demonstrated how to use the class to cache both static HTML content and dynamic PHP output. I also gave you a quick tour of the Cache_Lite class’ two variants, the Cache_Lite_Output and Cache_Lite_Function classes, illustrating how they could be used to cache blocks of output and function return values, respectively. Finally, I wrapped things up with a real-world example, showing you how a cache can make a substantial difference when dealing with complex, XML-based applications like Amazon Web Services.
In case you’d like to learn more about the topics discussed in this article, you should consider visiting the following links:
Documentation for the PEAR Cache_Lite class, at http://pear.php.net/package-info.php?pacid=99
phpCache, a lightweight alternative to Cache_Lite, at http://0x00.org/php/phpCache/, and a tutorial on how to use it, at http://www.sitepoint.com/article/685
A comprehensive resource on the topic of Web caching, at http://www.web-caching.com/
An article discussing the benefits of caching your Web content, at http://www.ariadne.ac.uk/issue4/caching/
Using Amazon Web Services With PHP And SOAP, at https://www.melonfire.com/archives/trog/article/using-amazon-web-services-with-php-and-soap-part-1
Until next time…stay healthy!
Note: Examples are illustrative only, and are not meant for a production environment. Melonfire provides no warranties or support for the source code described in this article. YMMV!
This article was first published on 06 Jun 2003.