XPointer Basics

Uncover the third and final piece of the XML linking jigsaw with XPointer.

The XML Jigsaw

If you’ve been paying attention to this column over the past few weeks, you’ll already be familiar with XPath and XLink, two important pieces of the XML jigsaw. XPath provides a standard way to access specific nodes (or sets of nodes) within an XML document, while XLink offers XML document authors the ability to link XML data together in a myriad of different ways.

While XLink is a very significant component of the effort to bring HTML-like linking capabilities to XML, it merely provides the constructs to link different documents together; it does not provide any mechanism for locating and referencing specific segments within a document. XPath, however, does - it provides the constructs and functions that make it possible to traverse an XML document tree and move from one node to another within it. Put the two together, and you have the ability to create links not merely between documents, but between specific nodes of documents.

That’s where XPointer comes in. A critical part of the effort to improve XML’s linking capabilities, it expands on XPath’s basic functionality, making it possible to address specific nodes or ranges within an XML document in simpler and more efficient ways.

Over the course of this tutorial, I’ll be taking a closer look at XPointer and its capabilities. Since XPointer relies heavily on XPath concepts, this tutorial also contains a brief discussion of how XPath expressions are constructed, with an explanation of location paths, axes and predicates.

Before we begin, a couple of important disclaimers:

First, XPointer is not yet a W3C recommendation; it’s still in the process of getting there. Consequently, the material here may become invalid when the final recommendation emerges; you should always refer to the most current standard or recommendation for up-to-date information. This tutorial is based on the W3C’s XPointer Candidate Recommendation dated 11 September 2001.

Second, since XPointer is not yet a standard, there aren’t that many tools out there that know how to process XPointers accurately. As of this writing, the W3C’s XPointer activity page listed three tools, each with its own limitations. Consequently, you will find it difficult to road-test much of the material in this tutorial, and will probably have to wait until the final recommendation emerges to actually begin using XPointers in your development activities.

All clear? Let’s get started!

The Need For XPointer

XPointer evolved as part of the work of the W3C’s XML Linking Working Group, which is also responsible for XLink. While developing XLink, the Working Group realized the necessity of a language to address specific nodes - elements or character data - or sets of nodes within an XML document.

This wasn’t the only requirement, however - the XML Linking Working Group also identified the following requirements of such a language:

XPointers should be based on “descriptive links” - links which identify locations by their context.
XPointers should be easy to read and understand; they should also be simple to interpret for browsers and Web servers.
It should be possible to create an XPointer for any specific point in a document. Similarly, it should also be possible to create an XPointer that identifies a specific range of data within an XML document.
XPointers should include the capability to restrict node selections on the basis of user-imposed constraints, in much the same way as XPath uses predicates.
Minor changes to an XML document - changes in whitespace, line breaks and formatting - should not “break” an XPointer.

XPath already existed to meet some of these requirements; however, it is primarily intended for use with XSLT rather than XML links, and therefore does not address some of the more specialized items in the list above.

First Steps

Let’s look at XPath quickly, and then proceed to XPointer proper.

XPath makes it possible to locate a node, or set of nodes, at any level of an XML document tree, using a thingamajig known as a “location path.”

A location path may be either an absolute path, which expresses a location with reference to the root node, or a relative path, which expresses a location with reference to the current node (since this location is always in context to something else, it is also referred to as the “context node”). Location paths are made up of a series of “location steps”, each identifying one level in the XPath tree and separated from each other by a forward slash (/).

A location step can be further broken down into three components: there’s an “axis”, which defines the relationship to use when selecting nodes; a “node test”, which specifies the types of nodes to select; and optional “predicates” to filter out unwanted nodes from the resulting collection. (I’ll explain each of these in detail further down so that they become a little less frightening.)

The syntax of a location step is as follows:

axis::node-test[predicates]
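To make the three components concrete, here is a minimal Python sketch that pulls a single unabbreviated location step apart. The helper name split_step() is my own invention for illustration, not part of any XPath library, and the regular expression is deliberately simple: it assumes predicates do not themselves contain square brackets.

```python
import re

# Hypothetical helper: splits one unabbreviated location step into its parts.
# Assumes no "]" characters appear inside a predicate.
STEP = re.compile(r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+)(?P<preds>(\[[^\]]*\])*)$")

def split_step(step):
    """Return (axis, node_test, [predicates]) for a step like child::cast[1]."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return match.group("axis"), match.group("test"), predicates

print(split_step("child::cast"))
print(split_step("child::movie[@genre='sci-fi'][1]"))
```

The second call shows that a step may carry more than one predicate; each is returned separately, in the order written.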

Revolving Around An Axis

Since the first component of a location step is the axis, let’s deal with that first. An axis defines the relationship between the current node and the nodes to be selected - whether, for example, they are children of the current node, siblings of the current node, or the parent of the current node.

The XPath specification defines the following axes:

self - the “self” axis refers to the context node itself;

parent - the “parent” axis selects the parent of the context node;

child - the “child” axis selects the children of the context node;

attribute - the “attribute” axis refers to the attributes of the context node;

These are the most commonly-used ones; however, it’s quite likely that you’ll also find a use for:

ancestor - the “ancestor” axis selects the parent, grandparent, great-grandparent and all other ancestors of the context node;

descendant - the opposite of the “ancestor” axis, this axis selects the children (and children’s children) of the context node;

ancestor-or-self - this variant of the “ancestor” axis selects all ancestors of the context node as well as the node itself;

descendant-or-self - this variant of the “descendant” axis selects all descendants of the context node as well as the node itself;

following-sibling and preceding-sibling - these two axes contain the nodes at the same level in the document tree as the context node. Depending on which axis you use, you will get a collection of siblings which are either after or before the context node, in document order.

following and preceding - the “following” axis selects all nodes which come after the context node in document order (excluding its own descendants), while the “preceding” axis selects all nodes which come before the context node (excluding its ancestors).

namespace - the “namespace” axis selects the namespace nodes of the context node, one for each namespace declaration in scope for that element.

Once the relationship to be established has been defined and an appropriate node collection obtained, a node test can be used to further filter the items in the collection. This node test is connected to the axis by a double colon (::) symbol.

Typically, the node test is simply a name, which selects the nodes with that name; in such a case, the node type is deduced from the axis part of the location step (attributes for the “attribute” axis, namespaces for the “namespace” axis, elements for everything else). If, on the other hand, you’d like to select nodes on the basis of type, XPath offers some pre-defined node tests; text() selects text nodes, comment() selects comments, processing-instruction() selects PIs and the generic node() selects any and all nodes.

Finally, in case the resulting collection needs to be further broken down, XPath allows you to add optional predicates to each location step, enclosed within square brackets.
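If the axes still seem abstract, it may help to see a few of them computed by hand. The following Python sketch uses the standard library’s ElementTree module; the axis() function and the parent_of map are my own illustrative names, and the sketch handles element nodes only (no text, attribute or namespace nodes, and no separate root node above the document element).

```python
import xml.etree.ElementTree as ET

DOC = """<movie id="67" genre="sci-fi">
  <title>X-Men</title>
  <cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
  <director>Bryan Singer</director>
  <year>2000</year>
</movie>"""

root = ET.fromstring(DOC)
# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def axis(name, context):
    """Return the elements on the named axis of context, in document order."""
    if name == "self":
        return [context]
    if name == "child":
        return list(context)
    if name == "parent":
        return [parent_of[context]] if context in parent_of else []
    if name == "descendant":
        return list(context.iter())[1:]      # iter() yields the context itself first
    if name == "ancestor":
        found = []
        while context in parent_of:
            context = parent_of[context]
            found.append(context)
        return found[::-1]                   # outermost ancestor first
    if name in ("following-sibling", "preceding-sibling"):
        if context not in parent_of:
            return []
        siblings = list(parent_of[context])
        position = siblings.index(context)
        if name == "following-sibling":
            return siblings[position + 1:]
        return siblings[:position]
    raise ValueError(f"axis not covered by this sketch: {name}")

cast = root.find("cast")
for name in ("parent", "preceding-sibling", "following-sibling"):
    print(name, [element.tag for element in axis(name, cast)])
```

The attribute axis is not handled by the function because ElementTree exposes attributes as a plain dictionary (root.attrib) rather than as nodes.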

Proof Of The Pudding

By combining the axis and node test into a location step, and combining multiple location steps into a location path, it becomes possible to locate specific nodes within the document tree quite easily. Using the following XML sample, let’s consider some examples.

<?xml version="1.0"?>
<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
	<?play_trailer?>
</movie>

The path

/child::movie/child::cast/child::text()

references the text node

Hugh Jackman, Patrick Stewart and Ian McKellen

In order to make this a little easier to read (and write), XPath assumes a default axis of “child” if none is specified - which means that I could also write the above path as

/movie/cast/text()

The * character matches all child elements of the context node, while the @ prefix indicates that attributes, rather than elements, are to be matched. The path

/movie/*

would match all the children of the “movie” element, while the path

/movie/@*

would refer to all the attributes of the movie element. In case I need a specific attribute - say, “genre”, I could use the path

/movie/@genre

or the path

/movie/attribute::genre

both of which would reference the attribute whose value is

sci-fi

Finally, the path

/*

would reference the first element under the document root, which also happens to be the outermost element, while the path

//*

selects all the elements in the document.
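For readers who want to try these paths out, here is a rough Python equivalent using the standard library’s ElementTree module. Bear in mind that ElementTree implements only a small subset of XPath (no text() node test, no named axes, no absolute paths on an element), so each path is translated into the nearest ElementTree call rather than evaluated verbatim. The XML declaration and processing instruction are left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

DOC = """<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
</movie>"""

# fromstring() returns the document element, which is what /movie (or /*) selects.
movie = ET.fromstring(DOC)

# /movie/cast/text()
print(movie.find("cast").text)

# /movie/*
print([child.tag for child in movie])

# /movie/@*
print(movie.attrib)

# /movie/@genre
print(movie.get("genre"))

# //*
print([element.tag for element in movie.iter()])
```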

So far as constructing basic XPointers goes, the material above is more than sufficient for your needs. However, if you’d like to learn more about XPath’s capabilities, including its built-in functions and expressions, take a look at the XPath tutorial at INSERT XPATH LINK HERE, or at the XPath specification at http://www.w3.org/TR/xpath.html

A Fragmented View

So that takes care of XPath - but where does XPointer fit in?

According to the XPointer specification, XPointer serves as the basis for “fragment identifiers” in XLinks. You probably don’t know this, but you use fragment identifiers pretty frequently in your daily Web development activities, to link to specific segments of an HTML document - they’re the bits following the hash (#) symbol in a standard anchor tag. For example, in the hyperlink

<a href="http://www.paint.server/colors.html#tangerine">See what tangerine looks like</a>

the segment

#tangerine

would be a fragment identifier.
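If you’d like to see the split in action, Python’s standard urllib.parse module will separate a URL from its fragment identifier. This is only a quick illustration of where the fragment begins and ends; note that the hash symbol itself is a delimiter and is not returned as part of the fragment.

```python
from urllib.parse import urldefrag

# urldefrag() returns the URL without its fragment, plus the fragment itself.
url, fragment = urldefrag("http://www.paint.server/colors.html#tangerine")

print(url)
print(fragment)
```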

Just as fragment identifiers, when attached to an HTML hyperlink, direct the browser to a specific section of an HTML document, XPointers, when attached to an XLink, identify a specific location (or set of locations) within an XML document. This location could be a single character, a text string, a block of contiguous content, or a collection of elements; XPointer allows link authors to specify the location using XPath’s built-in functions and conditional tests.

Let’s take a quick example. Consider the following XML document, which I’ll call “movie.xml”:

<?xml version="1.0"?>
<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
	<?play_trailer?>
</movie>

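Here’s a simple XLink, which uses an XPointer to link to the data for “X-Men” using the movie’s “id” attribute. Strictly speaking, the id() function only matches attributes which a DTD declares to be of type ID, and an ID value must be an XML name, which cannot begin with a digit; the example glosses over both points to stay short: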

<item xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="movie.xml#xpointer(id('67'))">X-Men</item>

As you can see, XPointer, with the able assistance of XPath’s location paths, is used to create the fragment identifier that identifies a particular section of the XML data. Without XPointer, link authors would merely be able to reference XML documents; they would have no way of drilling down to specific points within those documents.
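Since tools that process XPointers are thin on the ground, here is a rough Python sketch of what a processor would have to do with the link above: split the href into a document reference and a fragment, pull the ID out of the xpointer() part, and find the matching element. It is a sketch under loud assumptions: it understands only the exact form xpointer(id('...')), and it treats any attribute named “id” as if it were of type ID, which a real processor would only do on the say-so of a DTD.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"   # ElementTree's namespaced attribute key

LINK = """<item xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="movie.xml#xpointer(id('67'))">X-Men</item>"""

MOVIE_XML = """<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
</movie>"""

link = ET.fromstring(LINK)
target, fragment = urldefrag(link.get(XLINK_HREF))

# Only the xpointer(id('...')) form is recognised by this sketch.
match = re.fullmatch(r"xpointer\(id\('([^']+)'\)\)", fragment)
wanted = match.group(1)

# Assumption: an attribute called "id" stands in for a DTD-declared ID attribute.
document = ET.fromstring(MOVIE_XML)      # stands in for loading the file named by target
hits = [element for element in document.iter() if element.get("id") == wanted]

print(target, fragment, [element.tag for element in hits])
```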

As the example above demonstrates, the syntax for an XPointer is usually

xpointer(expression)

For example,

xpointer(id('67')/title)

In order to offer link authors the flexibility to use “fallback links” - say, if the document structure or content changes - the XPointer specification allows for more than one XPointer to be specified simultaneously. For example,

xpointer(id('67')/cast) xpointer(id('67')/director)

The XPointers are evaluated in left-to-right order, and the first successful evaluation is used.
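The left-to-right fallback rule can be sketched in a few lines of Python. Everything here is illustrative: split_parts(), evaluate() and resolve() are names I made up, the evaluator understands only id('...') followed by an optional child path, and the “producer” element in the usage example is a hypothetical element that the sample document does not contain.

```python
import re
import xml.etree.ElementTree as ET

MOVIE_XML = """<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
</movie>"""

def split_parts(pointer):
    """Split 'xpointer(a) xpointer(b)' into ['a', 'b'], honouring nested parentheses."""
    parts, depth, start = [], 0, None
    for position, char in enumerate(pointer):
        if char == "(":
            if depth == 0:
                start = position + 1
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                parts.append(pointer[start:position])
    return parts

def evaluate(expression, document):
    """Tiny stand-in evaluator: id('x') optionally followed by /child/path."""
    match = re.fullmatch(r"id\('([^']+)'\)(?:/(.+))?", expression)
    if match is None:
        return []
    value, path = match.groups()
    # Assumption: any attribute named "id" is treated as an ID attribute.
    found = [element for element in document.iter() if element.get("id") == value]
    if path:
        found = [hit for element in found for hit in element.findall(path)]
    return found

def resolve(pointer, document):
    """Return the first part that selects something, along with its result."""
    for expression in split_parts(pointer):
        result = evaluate(expression, document)
        if result:
            return expression, result
    return None, []

document = ET.fromstring(MOVIE_XML)
used, nodes = resolve("xpointer(id('67')/producer) xpointer(id('67')/director)", document)
print(used, [node.text for node in nodes])
```

Because the document has no “producer” element, the first part selects nothing and the second part is used instead.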

The XPointer specification also defines two additional “shortcut” forms. The first of these is the so-called “bare name” form, which has been included primarily to maintain compatibility with the existing HTML fragment identifier. An XPointer of this form merely specifies a name, and refers to the element within the document which has an attribute of type ID matching that name. Which means that I could write the XPointer

xpointer(id('67'))

as the bare name

67

and the two would still be equivalent.

An XPointer may also consist of a “child sequence” - essentially, a series of steps beginning with the document element and drilling down from there to the required resource. Note that the steps are defined in terms of integers, not element names, and that a child sequence is written on its own, without the xpointer() wrapper - this XPointer references the first child element of the document element (the movie title).

/1/1

Bare names and child sequences may be combined together if desired.
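A child sequence is simple enough to resolve by hand, as this Python sketch shows. The function name resolve_child_sequence() is hypothetical, only element children are counted, and (as elsewhere in this tutorial’s examples) an attribute named “id” is assumed to be of type ID. The second call shows a bare name combined with a child sequence.

```python
import xml.etree.ElementTree as ET

MOVIE_XML = """<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
</movie>"""

def resolve_child_sequence(pointer, document):
    """Resolve pointers such as '/1/1' or '67/2'; return None if nothing matches."""
    name, _, steps = pointer.partition("/")
    indexes = [int(step) for step in steps.split("/")] if steps else []
    if name:
        # Bare-name start: begin at the element carrying that ID.
        matches = [element for element in document.iter() if element.get("id") == name]
        current = matches[0] if matches else None
    else:
        # No name: the first integer picks the document element, so it must be 1.
        if not indexes or indexes[0] != 1:
            return None
        current, indexes = document, indexes[1:]
    for number in indexes:
        if current is None:
            break
        children = list(current)             # element children only
        current = children[number - 1] if 1 <= number <= len(children) else None
    return current

document = ET.fromstring(MOVIE_XML)
print(resolve_child_sequence("/1/1", document).tag)
print(resolve_child_sequence("67/2", document).tag)
```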

As with XPath, you can use predicates and conditional tests to filter out unwanted nodes, and select from a variety of built-in functions to further classify the result nodeset. I’m not going to spend any more time on this, though - instead, I’d like to concentrate on XPointer’s extensions to XPath, which are primarily in the form of additional constructs to define “points” and “ranges”.

A Range Of Options

To the various node types defined in the XPath specification, XPointer adds two more: points and ranges.

A point is defined as the address of a specific location within an XML document. It is identified by two characteristics: a container node and an index number. The container node is the node which encloses the point, while the index number is an integer which indicates the relative position of the point among the children of the container node.

There are two types of points: node-points, which refer to XML elements, and character-points, which refer to the text contained within XML elements.

The index number within a point definition differs in meaning depending on whether the point is a node-point or a character-point. In the case of a node-point, the index number counts child nodes: an index of 0 places the point before the first child, and an index of n places it immediately after the nth child. In the case of a character-point, it counts the characters of the text string in the same way.

Points are defined with XPointer’s start-point() and end-point() functions, both of which accept a location path (or collection of location paths) as argument.

A range, defined as the area between two points, is created with the range() function, which returns, for each location passed to it, a range covering that location and everything within it.

An example might help to make this clearer. Consider the following XML document:

<?xml version="1.0"?>
<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
	<?play_trailer?>
</movie>

Now, the XPointer

xpointer(range(/))

would return a range covering the root node (/) and everything within it - in other words, a range covering the entire document.

The start and end points of this range would be accessible via the XPointers

xpointer(start-point(/))

xpointer(end-point(/))

and would point to the beginning and end of the document respectively.

In a similar manner, the XPointer

xpointer(range(//movie/title))

would identify the range

<title>X-Men</title>

while the XPointer

xpointer(start-point(//movie/director))

would point to the location just inside the “director” element, immediately before its first child (here, the text “Bryan Singer”).
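The container-plus-index idea behind points is easy to model. The Python sketch below is a simplified picture of my own, not XPointer’s data model: it covers character-points only, uses a plain string as the container, and the class names Point and Range are purely illustrative. An index of n means “immediately after the nth character”, so a range between two points maps neatly onto a string slice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    container: str   # the text that encloses the point
    index: int       # the point sits immediately after this many characters

@dataclass(frozen=True)
class Range:
    start: Point
    end: Point

    def text(self):
        """Return the characters lying between the two points."""
        if self.start.container != self.end.container:
            raise ValueError("this sketch only handles points in the same container")
        return self.start.container[self.start.index:self.end.index]

cast = "Hugh Jackman, Patrick Stewart and Ian McKellen"
begin = cast.index("Patrick")

selection = Range(Point(cast, begin), Point(cast, begin + len("Patrick")))
print(selection.text())
```

A range whose points sit at index 0 and at the length of the string covers the whole container, in the same spirit as a range covering an entire document.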

Asymmetrically Yours

The range() function is great for times when you need to define a range which begins and ends with the same element. However, for situations which require more asymmetric ranges, XPointer also offers the range-to() function, which allows link authors greater flexibility when defining ranges.

The range-to() function creates a range beginning with the context node and ending with the location set specified as an argument to the range-to() function. For example, the following XPointer

xpointer(id('67')/cast/range-to(id('67')/director))

defines a range beginning at the opening “cast” element and ending at the closing “director” element.
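As a very rough approximation of what this range takes in, the Python sketch below lists the sibling elements spanned from “cast” through “director”. Treat it as a picture rather than an implementation: a real range is a pair of points, not a list of elements, and the helper elements_between() is a name of my own.

```python
import xml.etree.ElementTree as ET

MOVIE_XML = """<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, Patrick Stewart and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
</movie>"""

def elements_between(parent, start_tag, end_tag):
    """Return the child elements from the first start_tag through the first end_tag."""
    children = list(parent)
    tags = [child.tag for child in children]
    first, last = tags.index(start_tag), tags.index(end_tag)
    return children[first:last + 1]

movie = ET.fromstring(MOVIE_XML)
spanned = elements_between(movie, "cast", "director")
print([element.tag for element in spanned])
```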

It’s also possible to identify points and ranges within character data - the string-range() function scans a specified location for a match to a user-specified string, and returns a range containing the result. So, the following XPointer

xpointer(string-range(//cast, "Patrick Stewart"))

would return a range enclosing the string “Patrick Stewart” from the “cast” element.

If the XML document above contained more than one “cast” element matching the specified string, the string-range() function would return multiple ranges, one for each match.

It’s possible to further constrain the range returned by specifying two additional arguments to the string-range() function - the position (counting from 1) of the character within the match at which the range should start, and the number of characters the range should span. So a modification of the XPointer above to

xpointer(string-range(//cast, "Patrick Stewart", 9, 4))

would return a range enclosing the substring “Stew”.
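The arithmetic behind those two extra arguments is worth spelling out. Here is a small Python imitation that works on a plain string; string_range() is my own stand-in, not a real XPointer implementation, and it returns zero-based (start, end) offsets suitable for slicing. The position argument counts from 1, so 9 lands on the “S” of “Stewart”.

```python
def string_range(text, needle, position=1, length=None):
    """Return (start, end) slice offsets for each match of needle in text.

    position is 1-based and counted from the first character of the match;
    length defaults to the length of the matched string.
    """
    if length is None:
        length = len(needle)
    ranges = []
    found = text.find(needle)
    while found != -1:
        start = found + position - 1
        ranges.append((start, start + length))
        found = text.find(needle, found + 1)
    return ranges

cast = "Hugh Jackman, Patrick Stewart and Ian McKellen"

for start, end in string_range(cast, "Patrick Stewart"):
    print(cast[start:end])

for start, end in string_range(cast, "Patrick Stewart", 9, 4):
    print(cast[start:end])
```

If the string occurs more than once, the function returns one pair of offsets per match.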

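The string-range() function matches against the text content of a location and ignores any embedded markup - which means that a change to the XML document above to read

<?xml version="1.0"?>
<movie id="67" genre="sci-fi">
	<title>X-Men</title>
	<cast>Hugh Jackman, <first>Patrick</first> <last>Stewart</last> and Ian McKellen</cast>
	<director>Bryan Singer</director>
	<year>2000</year>
	<?play_trailer?>
</movie>

would not affect the XPointer at all. Note that the characters of the string themselves must still be present: if the spaces were replaced by empty elements, the text would no longer contain “Patrick Stewart” and the match would fail.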
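One way to convince yourself that markup inside the text does not get in the way is to flatten an element to its text content and search that. The Python sketch below does so with ElementTree’s itertext() method; the “first” and “last” wrapper elements are hypothetical markup used purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the "cast" element with markup inside the text.
CAST = "<cast>Hugh Jackman, <first>Patrick</first> <last>Stewart</last> and Ian McKellen</cast>"

cast = ET.fromstring(CAST)

# itertext() walks the element in document order and yields only character data.
string_value = "".join(cast.itertext())

print(string_value)
print("Patrick Stewart" in string_value)
```

The match succeeds even though the string straddles two child elements, because the search is made against the flattened text rather than the markup.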

Since the string-range() function only returns string ranges, it follows that the start and end points of these returned ranges are always character-points, not node-points.

Linking Up

And that’s about it for the moment. As you can see, XPointer is a small but necessary bit of the XML Linking effort, and it will be interesting to see how it progresses over the next few months.

In the meanwhile, you might want to consider checking out the following links:

A list of XPointer requirements, at http://www.w3.org/TR/NOTE-xptr-req

The W3C’s current XPointer specification, at http://www.w3.org/TR/1998/WD-xptr

The W3C’s XPath specification, at http://www.w3.org/TR/xpath

Until next time… stay healthy!

Note: Examples are illustrative only, and are not meant for a production environment. YMMV!

This article was first published on 23 Nov 2001.