XML Parsing With DOM and Xerces (part 2)

Use your knowledge of DOM processing with Xerces to construct simple Web applications based on Xerces, XML and JSP.

Rewind

The first part of this article discussed XML parsing with the Xerces DOM parser, explaining the basics of creating and traversing an XML DOM tree. It also illustrated the theory with practical (and not-so-practical) examples, using parser interfaces and callbacks to to construct some simple Java programs.

Now, though the first part of this article included some interesting exercises, your knowledge is still incomplete. This is because, with everyone and their grandma jumping on the Web, it's necessary to understand XML parsing in the content of the Web to consider yourself even marginally proficient with Xerces. And so, this concluding article focuses on the dynamic generation of Web pages from an XML file, demonstrating how Java, JSP, Xerces and XML can be combined to create simple Web applications. Take a look!

The Writing On The Wall

Let's begin by quickly revisiting the sample XML file used earlier:

<?xml version="1.0"?>
<inventory>
   <!-- time to lock and load -->
   <item>
      <id>758</id>
      <name>Rusty, jagged nails for nailgun</name>
      <supplier>NailBarn, Inc.</supplier>
      <cost currency="USD">2.99</cost>
      <quantity alert="500">10000</quantity>
   </item>
   <item>
      <id>6273</id>
      <name>Power pack for death ray</name>
      <supplier>QuakePower.domain.com</supplier>
      <cost currency="USD">9.99</cost>
      <quantity alert="20">10</quantity>
   </item>
</inventory>

This next example ports the dynamically-printed tree structure you saw in a previous example to work with a Web server.

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import org.w3c.dom.*;
import java.io.*;

public class MyFourthDomApp {

   private Writer out;
   String Content = "";

   // a counter for keeping track of the "tabs"
   private int TabCounter = 0;

   // constructor
   public MyFourthDomApp (String xmlFile, Writer out) {

      this.out = out;

      //  create a Xerces DOM parser
      DOMParser parser = new DOMParser();

      //  parse the document and
      // access the root node with its children.
      try {
             parser.parse(xmlFile);
             Document document = parser.getDocument();
         NodeDetails(document);

            } catch (SAXException e) {
                  // something went wrong!
            } catch (IOException e) {
                  // something went wrong!
            }
      }

   // recursively traverse the document tree
   private void NodeDetails (Node node) {
      try {
         int type = node.getNodeType();

         if (type == Node.ELEMENT_NODE) {
            // if element
            FormatTree(TabCounter);
            out.write ("Element: " + node.getNodeName() + "<br>");
            if(node.hasAttributes()) {
               NamedNodeMap AttributesList = node.getAttributes();
               for(int j = 0; j < AttributesList.getLength(); j++) {
                  FormatTree(TabCounter);
                  out.write("Attribute: " +
AttributesList.item(j).getNodeName() + " = " +
AttributesList.item(j).getNodeValue() + "<br>");
               }
            }
         } else if (type == Node.TEXT_NODE) {
            // if character data
            Content = node.getNodeValue();
                    if (!Content.trim().equals("")){
                  FormatTree(TabCounter);
                  out.write ("Character Data: " + Content + "<br>");
            }
         } else if (type == Node.COMMENT_NODE) {
            // if comment
            Content = node.getNodeValue();
                   if (!Content.trim().equals("")){
                  FormatTree(TabCounter);
                  out.write("Comment: " + Content + "<br>");
            }
            // add code for other node types here if you like
         }

         NodeList children = node.getChildNodes();
         if (children != null) {
            for (int i=0; i< children.getLength(); i++) {
               TabCounter++;
               NodeDetails(children.item(i));
               TabCounter--;
            }
         }

      } catch (IOException e) {
         // something went wrong!
      }
   }

   // this formats the output as a tree
   private void FormatTree (int TabCounter) {
      try {
         for(int j = 1; j < TabCounter; j++) {
            out.write("&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;");
         }
      } catch (IOException e) {
         // something went wrong!
      }

   }

}

The most important difference between this example and the previous one is the introduction of a new Writer object, which makes it possible to redirect output to the browser instead of the standard output device.

private Writer out;

The constructor also needs to be modified to accept two parameters: the name of the XML file, and a reference to the Writer object.

// constructor
public MyFourthDomApp (String xmlFile, Writer out) {

   // snip!
}

Note that, since I'm using a Writer object, I also need to add some code to trap and resolve IOExceptions, if they occur. I've left that aside for a more detailed discussion a little later - however, you should be aware of this requirement and design your code appropriately.

You'll notice that I haven't made any major changes to the NodeDetails() function, other than the inclusion of HTML tags in the output - this is what will ultimately get sent to the browser. The only item left is to handle the connection between the Java class above and the Web server - which is where JSP comes in.

Here's the JSP page that brings it all together:

<%@ page language="java" import="java.io.IOException" %>
<html>
<head>
</head>
<body>
<%
   try {
        MyFourthDomApp myFourthExample = new
MyFourthDomApp("/www/xerces/WEB-INF/classes/inventory.xml",out);
   } catch (Exception e) {
         out.println("Something bad happened!" + e);
   }
%>

</body>
</html>

And here's what the output looks like:

For the moment, I'll ignore any errors that may occur when processing the JSP page, deferring this to a later discussion.

Stock Options

Now, how about something a little more useful? Let's suppose I want to display this same information in a neatly-formatted table, with those items that I'm low on highlighted in red. My preferred output would look something like this:

Here's the code to accomplish this:

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import org.w3c.dom.*;
import java.io.*;

public class MyFifthDomApp {

   private Writer out;
   private String local = "";
   private Integer Alert, Quantity;

       // constructor
       public MyFifthDomApp (String xmlFile, Writer out) throws SAXException {
        this.out = out;

         // create a Xerces DOM parser
         DOMParser parser = new DOMParser();

         // parse the document
         // get to the root node's children
         // call a recursive function to process the tree
         try {
                parser.parse(xmlFile);
                Document document = parser.getDocument();
                Element RootElement = document.getDocumentElement();
                NodeList Children = RootElement.getChildNodes();
                printData(Children);
                out.flush();

                 } catch (IOException e) {
                        throw new SAXException(e);
                 }
       }

   private void printData (NodeList NodeCollection) throws SAXException {
         try {
            if (NodeCollection != null) {
               for (int i=0; i< NodeCollection.getLength(); i++) {
                  local = NodeCollection.item(i).getNodeName();
                  if(local.equals("item")) {

                     // "item" element starts a new row
                     out.write("<tr>");
                     getElementData(NodeCollection.item(i));
                     printData(NodeCollection.item(i).getChildNodes());
                     out.write("</tr>");
                  } else if( local.equals("name") ||
local.equals("supplier")) {

                     // create table cells within row
                     // align strings left
                     out.write("<td><p align=left><font face=Verdana size=2>");
                     getElementData(NodeCollection.item(i));
                     printData(NodeCollection.item(i).getChildNodes());
                     out.write("</font></p></td>");
                  } else if( local.equals("id") || local.equals("cost") || local.equals("quantity")) {

                     // create table cells within row
                     // align numbers right

out.write("<td><p align=right><font face=Verdana size=2>");
                     getElementData(NodeCollection.item(i));
                     printData(NodeCollection.item(i).getChildNodes());
                     out.write("</font></p></td>");
                  }
               }
            }
         } catch (IOException e) {
            throw new SAXException(e);
         }
   }

   private void getElementData(Node parentNode) throws SAXException {
         try {
            // get the type of the first child of the nodeset.
            int childType = parentNode.getFirstChild().getNodeType();

            // only proceed further if it is a text node
            if (childType == Node.TEXT_NODE) {

               // get the value stored at the node
               String Content = parentNode.getFirstChild().getNodeValue();

               // check for whitespace
               // proceed only if non-empty string
               if (!Content.trim().equals("")){

                  // check if node needs special handling (does the parent node has attributes?)
                  if(parentNode.hasAttributes()) {

                     // if parent has attributes, this one needs special handling
                     // first get the attributes
                     NamedNodeMap AttributesList = parentNode.getAttributes();

                     // iterate through the collection and get attribute details
                     for(int j = 0; j < AttributesList.getLength(); j++) {

                        // element-specific attribute handling
                        if ( parentNode.getNodeName().equals("quantity") && AttributesList.item(j).getNodeName().equals("alert") ) {

                           Quantity = new Integer(Content);
                           Alert = new Integer(AttributesList.item(j).getNodeValue());
                           // if quantity lower than expected, highlight in red
                           if(Quantity.intValue() < Alert.intValue()) {
                              out.write("<font color=\"#ff0000\">" + Quantity + "</font>");
                           } else {
                              out.write("<font color=\"#000000\">" + Quantity + "</font>");
                           }

                        } else     {

                           // element has attributes, but no special treatment required
                           out.write(Content);

                        }

                     }
                  } else {

                     // parent node has no attributes
                     // just display the text
                     out.write(Content);

                  }

               }
            }
         } catch (IOException e) {
            throw new SAXException(e);
         }
   }
}

Before you consider diving out of your window to escape the squiggles above, you might want to read the explanation - it's nowhere near as complicated as it looks.

For most of the elements, I'm simply displaying the content as is. The only deviation from this standard policy occurs with the "quantity" element,which has an additional "alert" attribute. This "alert" attribute specifies the minimum number of items that should be in stock of the corresponding item; if the quantity drops below this minimum level, an alert should be generated.

With this in mind, I've designed the class above around two main functions, one to traverse the document tree and the other to actually print the content it finds. And so, I have the printData() function, which recursively processes the document tree, and the getElementData() function, which retrieves the data enclosed with the XML elements.

Let's take a closer look at the different elements of the listing above:

// constructor
public MyFifthDomApp (String xmlFile, Writer out) throws SAXException {
      this.out = out;

      // create a Xerces DOM parser
      DOMParser parser = new DOMParser();

      //  parse the document
      //  get to the root node's children
      // call a recursive function to process the tree
      try {
         parser.parse(xmlFile);
         Document document = parser.getDocument();
         Element RootElement = document.getDocumentElement();
         NodeList Children = RootElement.getChildNodes();
         printData(Children);
         out.flush();
      } catch (IOException e) {
                  throw new SAXException(e);
   }
}

If you look at this closely, you'll see that I've been holding out on you a little in the previous examples. I never told you about the getDocumentElement() method, which lets you immediately access the document (outermost) element of the XML document. Well, now you know.

Once a reference to the document element has been obtained, the nest step is to obtain a list of its children, and hand these over to printData() for processing.

Data Overload

Let's take a closer look at the printData() function:

private void printData (NodeList NodeCollection) throws SAXException {
      try {
         if (NodeCollection != null) {
            for (int i=0; i< NodeCollection.getLength(); i++) {
               local = NodeCollection.item(i).getNodeName();
               if(local.equals("item")) {
                  // "item" element starts a new row
                  out.write("<tr>");
                  getElementData(NodeCollection.item(i));
                  printData(NodeCollection.item(i).getChildNodes());
                  out.write("</tr>");
               } else if( local.equals("name") || local.equals("supplier")) {
                  // create table cells within row
                  // align strings left
                  out.write("<td><p align=left><font face=Verdana size=2>");
                  getElementData(NodeCollection.item(i));
                  printData(NodeCollection.item(i).getChildNodes());
                  out.write("</font></p></td>");
               } else if( local.equals("id") || local.equals("cost") ||
local.equals("quantity")) {
                  // create table cells within row
                  // align numbers right
out.write("<td><p align=right><font face=Verdana size=2>");
                  getElementData(NodeCollection.item(i));
                  printData(NodeCollection.item(i).getChildNodes());
                  out.write("</font></p></td>");
               }
             }
         }
      } catch (IOException e) {
         throw new SAXException(e);
      }
}

The printData() function maps different XML elements to appropriate HTML markup.

The different "item" elements correspond to rows within the table. The details of each item - name, supplier, quantity et al - are formatted as cells within each row of the table. Since Xerces represents this entire document as a tree, getting to each individual node involves moving between the different levels of the tree until a leaf, or terminal node, is encountered - which is why printData() utilizes the same recursive technique demonstrated in previous examples.

As stated earlier, for most of the elements, I'm simply displaying the content as is. The only deviation from this standard policy occurs with the "quantity" element, where an alert needs to be generated if the actual quantity falls below the minimum level. That's where the getElementData() function comes in - it contains code to test the current quantity against the minimum quantity, and highlight the data in red if the test fails. This function is quite complex, but the inline comments provided should help make it clearer.

private void getElementData(Node parentNode) throws SAXException {
      try {
         // get the type of the first child of the nodeset.
         int childType = parentNode.getFirstChild().getNodeType();

         // only proceed further if it is a text node
         if (childType == Node.TEXT_NODE) {

            // get the value stored at the node
            String Content = parentNode.getFirstChild().getNodeValue();

            // check for whitespace
            // proceed only if non-empty string
            if (!Content.trim().equals("")){

               // check if node needs special handling (does the parent
node has attributes?)
               if(parentNode.hasAttributes()) {

                  // if parent has attributes, this one needs special handling
                  // first get the attributes
                  NamedNodeMap AttributesList = parentNode.getAttributes();

                  // iterate through the collection and get attribute details
                  for(int j = 0; j < AttributesList.getLength(); j++) {

                     // element-specific attribute handling
                     if ( parentNode.getNodeName().equals("quantity") && AttributesList.item(j).getNodeName().equals("alert") ) {

                     Quantity = new Integer(Content);
                        Alert = new Integer(AttributesList.item(j).getNodeValue());

                        // if quantity lower than expected, highlight in red
                        if(Quantity.intValue() < Alert.intValue()) {
                           out.write("<font color=\"#ff0000\">" + Quantity + "</font>");
                        } else {
                           out.write("<font color=\"#000000\">" + Quantity + "</font>");
                        }

                     } else     {

                        // element has attributes, but no special treatment required
                        out.write(Content);
                     }
                  }
               } else {

                  // parent node has no attributes
                  // just display the text
                  out.write(Content);

               }
            }
         }
      } catch (IOException e) {
         throw new SAXException(e);
      }
}

This function does a couple of different things. First, it checks to see if the node passed to it as an input parameter is a text node. If it is, the value of the node is checked, as is whether or not the parent node (the enclosing element) has any attributes. If no attributes are present and the node passes the whitespace test, the value is printed as is. In case the parent node does contain attributes, it must be either a "cost" or "quantity" element. If it's a "quantity" element, an additional test needs to be performed between the stated and alert quantities, with the resulting output highlighted on the basis of the test result.

With that out of the way, all that's left is to connect the class with the JSP page:

<%@ page language="java" import="java.io.IOException" %>
<html>
<head>
</head>
<body>
<h1><font face=Verdana>Inventory Management</font></h1>

<table width="55%" cellpadding="5" cellspacing="5" border="1">
<!-header row à
<tr>
<td><p align=right><b><font face=Verdana size=2> Code</font></b></p></td>
<td><b><font face=Verdana size=2>Name</font></b></td>
<td><b><font face=Verdana size=2>Supplier</font></b></td>
<td><p align=right><b><font face=Verdana size=2>Cost</font></b></p></td>
<td><p align=right><font face=Verdana size=2><b>Quantity</b></font></p></td>
</tr>
<%
try {
      MyFifthDomApp myFifthExample = new MyFifthDomApp("/www/xerces/WEB-INF/classes/inventory.xml",out);
} catch (Exception e) {
            out.println("<font face=\"verdana\" size=\"2\">The following error occurred: <br><b>" + e + "</b></font>");
}
%>
</table>
</body>
</html>

Here's what it looks like:

Oops!

If you take a close look at the previous example, you'll notice some error-handling routines built into it. It's instructive to examine that, and understand the reason for its inclusion.

You'll remember that I defined a Writer object at the top of my program; this Writer object provides a convenient way to output a character stream, either to a file or elsewhere. However, if the object does not initialize correctly, there is no way of communicating the error to the final JSP page.

The solution to the problem is simple: throw an exception. This exception can be captured by the JSP page and resolved appropriately.

Let's take another look at the constructor, this time focusing on the error-handling built into it:

// constructor
public MyFifthDomApp (String xmlFile, Writer out) throws SAXException {

      // some code here

          } catch (IOException e) {
                 throw new SAXException(e);
          }
}

Why is this necessary? Because if you don't do this, and your Writer object throws an error, there's no way of letting the JSP document know what happened, simply because the Writer object is the only available line of communication between the Java class and the JSP document. It's a little like that chicken-and-egg situation we all know and love...

Now, in the JSP page, it's possible to set up a basic error resolution mechanism to display the error on the screen. In order to test-drive it, try removing one of the opening "item" tags from the XML document used in this example and accessing the JSP page again through your browser.

It's also possible to raise a DOMException; however, take a look at what the Xerces documentation has to say about this:

"DOM operations only raise exceptions in "exceptional" circumstances, i.e., when an operation is impossible to perform (either for logical reasons, because data is lost, or because the implementation has become unstable). In general, DOM methods return specific error values in ordinary processing situations, such as out-of-bound errors when using NodeList." (from http://xml.apache.org/xerces-j/apiDocs/org/w3c/dom/DOMException.html)

Dear Diary

Let's try one more example, this one demonstrating an alternative technique of formatting an XML document into HTML Here's the XML file:

<?xml version="1.0"?>
<todo>
   <item>
      <priority>1</priority>
      <task>Figure out how Xerces works</task>
      <due>2001-12-12</due>
   </item>
   <item>
      <priority>2</priority>
      <task>Conquer the last Quake map</task>
      <due>2001-12-31</due>
   </item>
   <item>
      <priority>3</priority>
      <task>Buy a Ferrari</task>
      <due>2005-12-31</due>
   </item>
   <item>
      <priority>1</priority>
      <task>File tax return</task>
      <due>2002-03-31</due>
   </item>
   <item>
      <priority>3</priority>
      <task>Learn to cook</task>
      <due>2002-06-30</due>
   </item>
</todo>

Here's what it should look like:

As with the previous examples, this has two components: the source code for the Java class, and the JSP page which uses the class. Here's the class:

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import org.w3c.dom.*;
import java.util.*;
import java.io.*;

public class MySixthDomApp {

   private Writer out;
   private String local = "";

   // define a hash table to store HTML markup
   // this hash table is used in the callback functions
   // for start, end and character elements ("priority" only)
   private Map StartElementHTML = new HashMap();
   private Map EndElementHTML = new HashMap();
   private Map PriorityHTML = new HashMap();

       // constructor
       public MySixthDomApp (String xmlFile, Writer out) throws SAXException {
         this.out = out;

         // initialize StartElementHTML Hashmap
         StartElementHTML.put("todo","<ol>\n");
         StartElementHTML.put("item","<li>");
         StartElementHTML.put("task","<b>");
         StartElementHTML.put("due","&nbsp;<i>(");

         // initialize EndElementHTML Hashmap
         EndElementHTML.put("todo","</ol>\n");
         EndElementHTML.put("item","</font></li>\n");
         EndElementHTML.put("task","</b>");
         EndElementHTML.put("due",")</i>");

         // initialize PriorityHTML Hashmap
         PriorityHTML.put("1","<font face=\"Verdana\" color=\"#ff0000\" size=\"2\">");
         PriorityHTML.put("2","<font face=\"Verdana\" color=\"#0000ff\" size=\"2\">");
         PriorityHTML.put("3","<font face=\"Verdana\" color=\"#000000\" size=\"2\">");

             //  create a Xerces DOM parser
             DOMParser parser = new DOMParser();

         // parse the document
         try {
            parser.parse(xmlFile);
                Document document = parser.getDocument();
            Element RootElement = document.getDocumentElement();
            NodeList Children = RootElement.getChildNodes();
            printData(Children);
            out.flush();
              } catch (IOException e) {
                     throw new SAXException(e);
              }
       }

   private void printData (NodeList NodeCollection) throws SAXException {
         try {
            // check if the node collection passed is NULL
            if (NodeCollection != null) {

               // iterate through the collection
               for (int i=0; i< NodeCollection.getLength(); i++) {

                  // store the name of the element in a string.
                  // this is used as a key into the HashMap.
                  local = NodeCollection.item(i).getNodeName();

                  if(NodeCollection.item(i).getNodeType() == Node.ELEMENT_NODE) {

                     // "priority" element needs special handling
                     // for everything else...
                     if(!((local.equals("priority")))) {

                        // get the HTML markup
                        String StartElement = StartElementHTML.get(local).toString();
                        String EndElement = EndElementHTML.get(local).toString();

                        // output starting HTML tags + content
                        out.write(StartElement + GetElementData(NodeCollection.item(i)));

                        // recursively traverse the tree
                        printData(NodeCollection.item(i).getChildNodes());

                        // output ending HTML tags
                        out.write(EndElement);

                     } else {
                        // handle the "priority" element differently.
                        // get the data for the "priority" element
                        String Priority = GetElementData(NodeCollection.item(i));

                        // use "priority" element value to get markup for the rest of the line
                        out.write((PriorityHTML.get(Priority)).toString());

                        // move on
                        printData(NodeCollection.item(i).getChildNodes());
                     }
                  }
               }
            }
         } catch (IOException e) {
               throw new SAXException(e);
         }
      }

      private String GetElementData(Node parentNode) {

         // get node type
         int childType = parentNode.getFirstChild().getNodeType();

         // return the node value if text node
         if (childType == Node.TEXT_NODE) {
            if(parentNode.getFirstChild().getNodeValue() != null) {
               return parentNode.getFirstChild().getNodeValue();
            }
         }
         // else return null
         return null;
      }
}

This is much cleaner and easier to read than the previous example, since it uses Java's HashMap object to store key-value pairs mapping HTML markup to XML markup. Three HashMap's have been used here: StartElementHTML, which stores the HTML tags for opening XML elements; EndElementHTML, which stores the HTML tags for closing XML elements; and PriorityHTML, which stores the HTML tags for the "priority" elements defined for each "item".

These HashMaps are populated with data in the class constructor:

// initialize StartElementHTML Hashmap
StartElementHTML.put("todo","<ol>\n");
StartElementHTML.put("item","<li>");
StartElementHTML.put("task","<b>");
StartElementHTML.put("due","&nbsp;<i>(");

// initialize EndElementHTML Hashmap
EndElementHTML.put("todo","</ol>\n");
EndElementHTML.put("item","</font></li>\n");
EndElementHTML.put("task","</b>");
EndElementHTML.put("due",")</i>");

// initialize PriorityHTML Hashmap
PriorityHTML.put("1","<font face=\"Verdana\" color=\"#ff0000\" size=\"2\">");
PriorityHTML.put("2","<font face=\"Verdana\" color=\"#0000ff\" size=\"2\">");
PriorityHTML.put("3","<font face=\"Verdana\" color=\"#000000\" size=\"2\">");

The rest of the constructor follows the pattern set previously - initialize parser, parse XML, get document element and first level children, and hand processing over to printData().

The printData() function has also been modified to incorporate the HashMaps described above. Every time the function encounters an XML element, it uses the element name as a key into the HashMaps, retrieves the corresponding HTML markup for that element, and prints it. If the element is a "priority" element, there's no real data to be printed; rather, the formatting of the entire to-do item changes to reflect the item priority.

private void printData (NodeList NodeCollection) throws SAXException {
      try {
         // check if the node collection passed is NULL
         if (NodeCollection != null) {

            // iterate through the collection
            for (int i=0; i< NodeCollection.getLength(); i++) {

               // store the name of the element in a string.
               // this is used as a key into the HashMap.
               local = NodeCollection.item(i).getNodeName();

               if(NodeCollection.item(i).getNodeType() == Node.ELEMENT_NODE) {

                  // "priority" element needs special handling
                  // for everything else...
                  if(!((local.equals("priority")))) {

                     // get the HTML markup
                     String StartElement = StartElementHTML.get(local).toString();
                     String EndElement = EndElementHTML.get(local).toString();

                     // output starting HTML tags + content
                     out.write(StartElement + GetElementData(NodeCollection.item(i)));

                     // recursively traverse the tree
                     printData(NodeCollection.item(i).getChildNodes());

                     // output ending HTML tags
                     out.write(EndElement);

                  } else {

                     // handle the "priority" element differently.
                     // get the data for the "priority" element
                     String Priority = GetElementData(NodeCollection.item(i));

                     // use "priority" element value to get markup for the rest of the line
                     out.write((PriorityHTML.get(Priority)).toString());

                     // move on
                     printData(NodeCollection.item(i).getChildNodes());
                  }
               }
            }
         }
      } catch (IOException e) {
         throw new SAXException(e);
      }
}

Here's the JSP code that uses the class above:

<%@ page language="java" import="java.io.IOException" %>
<html>
<head>
</head>
<body>
<h1><font face="Verdana">My Todo List</font></h1>
<%
   try {
      MySixthDomApp mySixthExample = new MySixthDomApp("/www/xerces/WEB-INF/classes/todo.xml ",out);
   } catch (Exception e) {
      out.println("<font face=\"verdana\" size=\"2\">Something bad just happened: <br><b>" + e + "</b></font>");
   }
%>
</body>
</html>

And here's what it all looks like:

Of Method And Madness

All the previous examples have discussed parsing an already-created XML document using the DOM. But how about flipping tradition on its head and dynamically creating an XML document using the DOM?

Sounds a little strange? Not at all. Consider a situation where data is stored in a database, yet needs to be converted into XML, either because the recipient system (or customer) needs it in that format, or simply to make it easier to process with XSLT. In such a situation, the ability to dynamically generate an XML document tree from a recordset would be very handy indeed.

This next example demonstrates, using a MySQL database table as the data source for an XML document tree. First, take a look at the MySQL table dump:

CREATE TABLE addressbook (
name varchar(255),
address text,
tel varchar(30),
email varchar(50),
url varchar(100)
);

INSERT INTO addressbook VALUES('John Doe', '94, Main Street, Nowhereville 13433, XY', '757 3838', johndoe@black.hole.com', 'http://www.unknown_and_unsung.com');

INSERT INTO addressbook VALUES('Sherlock Holmes', '220, Baker Street, London', '220 0220', 'sherlock@sherlock.com', 'http://www.sherlock.com');

So that's the data. Now, what about that Java code for dynamic tree generation?

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import org.apache.xerces.dom.*;
import org.w3c.dom.*;
import java.io.*;
import java.sql.*;

public class MySeventhDomApp {

   private Writer out;
      String Content = "";

   // a counter for keeping track of the tabs in the document tree
   private int TabCounter = 0;

      // constructor
   public MySeventhDomApp (Writer out) throws SAXException {

        this.out = out;

      Document document = new DocumentImpl();
      Element rootNode = document.createElement("addressbook");

      try {
         Class.forName("org.gjt.mm.mysql.Driver").newInstance();
      }   catch (Exception e) {
            System.err.println("Unable to load driver.");
        }

      try {
         // connect to the database and run a query
         Connection myConn = DriverManager.getConnection("jdbc:mysql://localhost/db177?user=john&password=doe");
         Statement stmt = myConn.createStatement();
         String query = "SELECT * FROM addressbook";
           ResultSet myResultSet = stmt.executeQuery(query);

           if (myResultSet != null) {
            // iterate through resultset
              while (myResultSet.next()) {

                 Element userNode = document.createElement("user");
               Element userChildNode = document.createElement("name");
               Element userChildNode = document.createElement("name");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("name")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("address");
               userChildNode = document.createElement("address");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("address")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("tel");
               userChildNode = document.createElement("tel");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("tel")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("email");
               userChildNode = document.createElement("email");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("email")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("url");
               userChildNode = document.createElement("url");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("url")));
               userNode.appendChild(userChildNode);

               rootNode.appendChild( userNode);
                }
           }
      } catch (SQLException e) {
         throw new SAXException(e);
      } catch (DOMException e) {
         throw new SAXException(e);
      }
      document.appendChild( rootNode );
      NodeDetails(document);
      }

   // recursively traverse the document tree
   private void NodeDetails (Node node) throws SAXException {
         try {
            int type = node.getNodeType();

            // handle each node type

            if (type == Node.ELEMENT_NODE) {
            // if element
               FormatTree(TabCounter);
               out.write ("Element: " + node.getNodeName() + "<br>");
               if(node.hasAttributes()) {
                  NamedNodeMap AttributesList = node.getAttributes();
                  for(int j = 0; j < AttributesList.getLength(); j++) {
                     FormatTree(TabCounter);
                     out.write("Attribute: " + AttributesList.item(j).getNodeName() + " = " + AttributesList.item(j).getNodeValue() + "<br>");
                  }
               }
            } else if (type == Node.TEXT_NODE) {
               // if character data
               Content = node.getNodeValue();
                       if (!Content.trim().equals("")){
                  FormatTree(TabCounter);
                  out.write ("Character Data: " + Content + "<br>");
               }
            } else if (type == Node.COMMENT_NODE) {
               // if comment
               Content = node.getNodeValue();
                      if (!Content.trim().equals("")){
                  FormatTree(TabCounter);
                  out.write("Comment: " + Content + "<br>");
               }
               // add code for other node types here if required
            }

            NodeList children = node.getChildNodes();
            if (children != null) {
               for (int i=0; i< children.getLength(); i++) {
                  TabCounter++;
                  NodeDetails(children.item(i));
                  TabCounter--;
               }
            }
      } catch (IOException e) {
         throw new SAXException(e);
      }
   }

   // function to format the output as a tree
   private void FormatTree (int TabCounter) throws SAXException {
         try {
            for(int j = 1; j < TabCounter; j++) {
               out.write("&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;");
            }
         } catch (IOException e) {
            throw new SAXException(e);
         }
   }
}

Plug this into a JSP document,

<%@ page language="java" import="java.io.IOException" %>
<html>
<head>
</head>
<body>
<h1><font face="Verdana">Address book</font></h1>
<%
   try {
      MySeventhDomApp mySeventhExample = new MySeventhDomApp(out);
   } catch (Exception e) {
      out.println("<font face=\"verdana\" size=\"2\">Something bad just happened: <br><b>" + e + "</b></font>");
   }
%>
</body>
</html>

and here's what the result looks like:

The first thing to do here is import a couple of extra things - the java.sql. and the org.apache.xerces.dom. classes. The first is required to connect to a database, the second provides the methods and interfaces to construct a DOM tree in memory.

import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.SAXException;
import org.apache.xerces.dom.*;
import org.w3c.dom.*;
import java.io.*;
import java.sql.*;

Next, the constructor, which does all the work:

// constructor
public MySeventhDomApp (Writer out) throws SAXException {

        this.out = out;

      Document document = new DocumentImpl();
      Element rootNode = document.createElement("addressbook");

      try {
         Class.forName("org.gjt.mm.mysql.Driver").newInstance();
      }   catch (Exception e) {
                  System.err.println("Unable to load driver.");
           }

      try {
         Connection myConn = DriverManager.getConnection("jdbc:mysql://localhost/db177?user=john&password=doe");
         Statement stmt = myConn.createStatement();
         String query = "SELECT * FROM addressbook";
         ResultSet myResultSet = stmt.executeQuery(query);

           if (myResultSet != null) {
              while (myResultSet.next()) {

               Element userNode = document.createElement("user");
               Element userChildNode = document.createElement("name");
               Element userChildNode = document.createElement("name");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("name")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("address");
               userChildNode = document.createElement("address");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("address")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("tel");
               userChildNode = document.createElement("tel");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("tel")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("email");
               userChildNode = document.createElement("email");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("email")));
               userNode.appendChild(userChildNode);

               userChildNode = document.createElement("url");
               userChildNode = document.createElement("url");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("url")));
               userNode.appendChild(userChildNode);

               rootNode.appendChild( userNode);
                }
           }
      } catch (SQLException e) {
         throw new SAXException(e);
      } catch (DOMException e) {
         throw new SAXException(e);
      }

      document.appendChild( rootNode );
      NodeDetails(document);
}

The DocumentImpl interface is the starting point in the process of constructing the DOM tree. This interface exposes the createElement() method, which you'll be seeing a lot of in the next few paragraphs. This createElement() method makes it possible to construct a new element node on the DOM tree.

Document document = new DocumentImpl();
Element rootNode = document.createElement("addressbook");

In the snippet above, it has been used to construct the outermost element of the tree, the <addressbook> element.

Next, the JDBC driver is used to connect to the MySQL database and retrieve a list of records from it.

try {
   Class.forName("org.gjt.mm.mysql.Driver").newInstance();
} catch (Exception e) {
   System.err.println("Unable to load driver.");
}

try {
   Connection myConn = DriverManager.getConnection("jdbc:mysql://localhost/db177?user=john&password=doe");
   Statement stmt = myConn.createStatement();
   String query = "select * from addressbook";
   ResultSet myResultSet = stmt.executeQuery(query);
   } catch (SQLException e) {
   throw new SAXException(e);
   } catch (DOMException e) {
   throw new SAXException(e);
}

Once all the records have been retrieved from the database, the next step is to iterate through the recordset and construct branches of the XML DOM tree to hold each data fragment.

if (myResultSet != null) {
   while (myResultSet.next()) {

      Element userNode = document.createElement("user");
      Element userChildNode = document.createElement("name");
      Element userChildNode = document.createElement("name");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("name")));
      userNode.appendChild(userChildNode);

      userChildNode = document.createElement("address");
      userChildNode = document.createElement("address");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("address")));
      userNode.appendChild(userChildNode);

      userChildNode = document.createElement("tel");
      userChildNode = document.createElement("tel");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("tel")));
      userNode.appendChild(userChildNode);

      userChildNode = document.createElement("email");
      userChildNode = document.createElement("email");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("email")));
      userNode.appendChild(userChildNode);

      userChildNode = document.createElement("url");
      userChildNode = document.createElement("url");

userChildNode.appendChild(document.createTextNode(myResultSet.getString("url")));
      userNode.appendChild(userChildNode);

      rootNode.appendChild( userNode);
       }
}

As the snippet above demonstrates, nodes can be created using the createElement() and createTextNode() methods, and attached to a parent node via the parent node's appendChild() method. In this manner, a complete DOM tree can be created to represent a well-formed XML document.

This dynamically-constructed DOM tree can now be processed further. You could write it to a file, transform it into something else with XSLT, or just display it to the user (which is what I've done, using the NodeDetails() function from a previous example).

Black Or White

Now, you can parse a document using either DOM or SAX, and achieve the same result. But which is better?

As with all complex questions, there's no simple answer.

The SAX approach is event-centric - as the parser travels through the document, it executes specific functions depending on what it finds. Additionally, the SAX approach is sequential - tags are parsed one after the other, in the sequence in which they appear. Both these features add to the speed of the parser; however, they also limit its flexibility in quickly accessing any node of the DOM tree.

As opposed to this, the DOM approach builds a complete tree of the document in memory, making it possible to easily move from one node to another (in a non-sequential manner). Since the parser has the additional overhead of maintaining the tree structure in memory, speed is an issue here; however, navigation between the various "branches" of the tree is easier. Since the approach is not dependent on events, developers need to use the exposed methods and attributes of the various DOM objects to process the XML data.

The DOM parser is a little slower, since it has to build a complete tree of the XML data, whereas the SAX parser is faster, since it's calling a function each time it encounters a specific tag type. You should experiment with both methods to see which one works better for you.

Link Out

And that's about it from me. I hope you enjoyed reading this article, and that it offered you some insight into the Xerces DOM parser's capabilities. If you'd like to learn more about the topic, you should check out the following resources:

The W3C's DOM page, at http://www.w3.org/DOM/

The Xerces home page, at http://xml.apache.org/xerces-j/

The Xerces API, at http://xml.apache.org/xerces-j/apiDocs/

XML Parsing With SAX And Xerces, at http://www.melonfire.com/community/columns/trog/article.php?id=106

A JavaScript view of the DOM, at http://www.melonfire.com/community/columns/trog/article.php?id=58

Until next time...stay healthy!

Note: All examples in this article have been tested with JDK 1.3.0, Apache 1.3.11, mod_jk 1.1.0, Xerces 1.4.4 and Tomcat 3.3. Examples are illustrative only, and are not meant for a production environment. YMMV!

This article was first published on15 Feb 2002.