Thursday, July 19, 2012

XML schema and DTD validation using SAX API

SAX (Simple API for XML) is an event-driven API for parsing XML. This post uses its Java implementation to show how to validate an XML document against a DTD or an XML Schema.
The example extends the DefaultHandler class, which supplies do-nothing default implementations of the ContentHandler interface and the other SAX handler interfaces (EntityResolver, DTDHandler and ErrorHandler). It serves as a convenient base class: rather than implementing all of ContentHandler yourself, you override only the methods you need to get the expected behavior.
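
To illustrate the "override only what you need" idea, here is a minimal sketch of a handler that overrides a single callback and inherits everything else from DefaultHandler. The class name ElementCounter and its count helper are made up for this illustration and are not part of the validator below.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides startElement only, inherits the rest.
class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Called once per opening tag; tally it by its qualified name.
        counts.merge(qName, 1, Integer::sum);
    }

    // Illustrative helper: parse an XML string and return the tag counts.
    static Map<String, Integer> count(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCounter handler = new ElementCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<a><b/><b/></a>")); // prints {a=1, b=2}
    }
}
```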

Here's a very simple XML Schema validator:

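import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserForXSD extends DefaultHandler {

    /*
     * DefaultHandler silently ignores warnings and recoverable errors,
     * which include validation errors, so report them explicitly.
     * */
    @Override
    public void warning(SAXParseException e) {
        System.out.println("Warning at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        System.out.println("Error at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    public static void main(String[] args) {
        /*
         * Checking whether a filename is present
         * */
        if (args.length == 0) {
            System.out.println("Enter a file name to be parsed.");
            System.exit(1);
        }
        else {
            /*
             * grab the filename
             * */
            String input = args[0];

            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();

                //enable namespace facility
                factory.setNamespaceAware(true);

                //set validating on
                factory.setValidating(true);
                SAXParser parser = factory.newSAXParser();

                //select XML Schema as the schema language
                parser.setProperty(
                          "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                          "http://www.w3.org/2001/XMLSchema"
                         );
                System.out.println("Validation by XMLSchema.");

                SAXParserForXSD handler = new SAXParserForXSD();
                parser.parse(input, handler);

            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}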

The lines that set this apart from a plain parsing program are the calls to factory.setNamespaceAware, factory.setValidating and parser.setProperty.
First, you have to tell the parser that the document is to be validated (factory.setValidating(true);). Because XML Schema relies on namespaces, you also have to turn on namespace awareness (factory.setNamespaceAware(true);).
The key distinction between a schema validator and a DTD validator is the parser property set here: it tells the parser that XML Schema is the schema language to validate against.
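
As a self-contained way to try these settings, the sketch below validates two in-memory documents against a small in-memory schema. It makes assumptions beyond the program above: the schema is supplied through the JAXP schemaSource property instead of a schema location hint inside the document, and the class name XsdValidationSketch, the note/to schema and the error-collecting handler are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class XsdValidationSketch {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    // Assumption: the schema is passed in via schemaSource rather than located from the document.
    static final String SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Made-up schema: a <note> element containing exactly one <to> string.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns the validation error messages; an empty list means the document is valid.
    static List<String> validate(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // schemaLanguage has to be set before schemaSource.
            parser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            parser.setProperty(SCHEMA_SOURCE, new InputSource(new StringReader(XSD)));
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Validation could not be run", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<note><to>Ann</to></note>"));     // conforms to the schema
        System.out.println(validate("<note><from>Bob</from></note>")); // <from> is not allowed
    }
}
```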

For a DTD validator, you'd comment out the line
                factory.setNamespaceAware(true);
and remove the parser.setProperty(...) call, as it's not required for DTD validation.
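
A minimal sketch of the DTD variant might look like the following. To keep it self-contained it assumes the DTD is declared inline in the document's DOCTYPE; the class name DtdValidationSketch, the sample documents and the error-collecting handler are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class DtdValidationSketch {
    // Made-up inline DTD: a <note> must contain <to> followed by <body>.
    static final String DOCTYPE =
          "<!DOCTYPE note ["
        + "<!ELEMENT note (to, body)>"
        + "<!ELEMENT to (#PCDATA)>"
        + "<!ELEMENT body (#PCDATA)>"
        + "]>";

    // Collects validation errors, which DefaultHandler would otherwise ignore.
    static class CollectingHandler extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }
    }

    // Returns the validation error messages; an empty list means the document is valid.
    static List<String> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Validation on; no setNamespaceAware and no schemaLanguage property.
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            CollectingHandler handler = new CollectingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (Exception e) {
            throw new IllegalStateException("Validation could not be run", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(DOCTYPE + "<note><to>Ann</to><body>Hi</body></note>"));
        System.out.println(validate(DOCTYPE + "<note><to>Ann</to></note>")); // <body> is missing
    }
}
```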

You can create a single program that runs both kinds of validation by using a second argument (args[1]) to select the type of validation required (dtd or xsd).
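
One possible shape for such a combined program is sketched below. The class name ValidatorSelector, the newParser helper and the usage message are assumptions made for this illustration; only the factory and parser settings come from the discussion above.

```java
import java.io.File;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class ValidatorSelector {
    static final String SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Builds a validating parser for the requested mode: "dtd" or "xsd".
    static SAXParser newParser(String mode) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            if ("xsd".equalsIgnoreCase(mode)) {
                // Schema validation: namespace awareness plus the schemaLanguage property.
                factory.setNamespaceAware(true);
                SAXParser parser = factory.newSAXParser();
                parser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
                return parser;
            }
            if ("dtd".equalsIgnoreCase(mode)) {
                // DTD validation: validating on, nothing else.
                return factory.newSAXParser();
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not configure the parser", e);
        }
        throw new IllegalArgumentException("Expected dtd or xsd, got: " + mode);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: java ValidatorSelector <file> <dtd|xsd>");
            System.exit(1);
        }
        SAXParser parser = newParser(args[1]);
        parser.parse(new File(args[0]), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                System.out.println("Error at line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
    }
}
```

It would be run as, for example, java ValidatorSelector note.xml xsd, where note.xml stands in for your own file.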