Back in CSCI399 …
• CSCI399 introduced XML and XSLT in the context of displaying data in web browsers.
  – XML
    • Mechanism for representing data collections in structured text files
  – XSLT
    • Rules and rule interpreter
      – Rule matching part, and action part
      – Search for data elements in XML document
      – Display data as specified in action part

XML
eXtensible Markup Language
Web usage
• Well, all the browsers support XML/XSLT

But really, not for end-user Web
• Use of XML in the browser is only a minor aspect.
• Primary role is the one IBM identified when XML was first being introduced:
PORTABLE DATA
Data
• Anything!
  – Configuration rules for programs
    • Web.xml, ant scripts, …
  – Chemical reactions
  – SVG graphics files
  – Product listings
  – ….

Going via XML is costly
• Conventional
  – Interrogate data-source using focused queries
  – Extract required data directly from result set
  – Process data
  – Generate outputs
• XML route
  – Server-side:
    • Interrogate data-source with a general query
    • Create verbose XML text document containing all data in result set
  – Client-side:
    • Retrieve XML text document from server
    • Re-analyze its contents
    • Extract required data
    • Process data
    • Generate output
Exchanging data
• Many needs for data exchange –
  – Medical centre records treatment and invoices patient’s insurance company
    • Need standard data defining
      – Patient
      – Insurance number
      – Treatment codes
      – Dates
      – …
  – Astronomer records data on observations on spectral properties of star cluster, needs to publish data so others can analyze and compare with other clusters
    • Need standard data defining spatial coordinates, time, spectrum

Data exchange
• Verbose text files, that must be repeatedly (and expensively) re-parsed,
• are independent of any specific application technology
• and can convey data that can be saved in any form of data store.
XML in CSCI398
1. A small segment on defining XML documents, DTDs, schemas etc
2. Simple introduction to parsing of XML with SAX, StAX and DOM
3. Generating valid XML to hold data extracted from relational databases

XML files
• XML document contents
  – Processing instructions
    • First, one intended for XML parser identifying dialect of XML in use (currently there is only one)
      <?xml version="1.0" encoding="ISO-8859-1"?>
    • There could be other <?xml related instructions, for example something specifying use of an XSL style sheet
XML files
  – Document type declarations
      <!DOCTYPE web-app
        PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
        "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
    • SYSTEM
      – URI of a DTD in local file system
    • PUBLIC
      – Using a published DTD that is used by many organizations as a standard data representation
        » Name
        » URI for where official version published
          (actually, DTD or schema isn't necessarily at that URI – but URIs are just a standard way of defining a unique name)
    • ?both
      – You have a local copy on disk of the official DTD

XML files
• Content
  – Nesting rules for “well formed” document result in a tree structure in the document content
[Diagram: document tree rooted at web-app; nodes shown: icon, servlet, servlet-name, servlet-class, init-param, param-name, param-value]
<web-app>
<servlet>
<servlet-name>Xfiles</servlet-name>
<servlet-class>XServlet</servlet-class>
</servlet>
<security-constraint>
<web-resource-collection>
<web-resource-name>Xfiles</web-resource-name>
<url-pattern>/XXX/ViewControlled</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>Agent-class-00</role-name>
</auth-constraint>
…
</security-constraint>
…
</web-app>
<ejb-JAR>
<input-file>HelloJetaceIn.jar</input-file>
<output-file>Hello.jar</output-file>
<session-bean dname="Hello.ser">
<session-timeout>0</session-timeout>
<state-management>
STATELESS_SESSION
</state-management>
…
<transaction-attr value="TX_NOT_SUPPORTED"/>
<isolation-level value="READ_COMMITTED"/>
</session-bean>
</ejb-JAR>
Attributes or elements
<isolation-level value="READ_COMMITTED"/>
• Why not:
<isolation-level>
  READ_COMMITTED
</isolation-level>
• Could use either.
  – If it’s primarily system control data, then use an attribute
  – If it’s something that user of XML document might be interested in then use an element
  – Often unclear; your DTD designer made some arbitrary choice

XML-Content
• <tag>…</tag>
• <tag values for required and optional attributes>
  …
  </tag>
• <tag values for required and optional attributes />
Validity
• But is it valid?
• Are those tag names defined?
• They may be properly nested, but is the nesting meaningful
  – A servlet-name tag shouldn’t appear immediately below the web-app level, it should only appear within a <servlet>…</servlet>
• Are attributes specified appropriately?

Don't always require validity
<tomcat-users>
  <user name="tomcat" password="tomcat" roles="tomcat" />
  <user name="role1" password="tomcat" roles="role1,worker" />
  <user name="John" password="Nhoj" roles="worker,role1" />
  …
  …
  <user name="Colin" password="Password"
        roles="boss,manager,worker" />
</tomcat-users>
DTD
• DTD file contents
  – Element definitions
    • <!ELEMENT [Element Name] [Element Definition/Type]>
  – Attribute definitions
    • <!ATTLIST [Owner element]
        [Attribute name] [type] [modifier]
        …
      >
  – Entity references
    • <!ENTITY [Entity Name] "[Replacement/Identifier]">

Elements
• <!ELEMENT TagOnly EMPTY>
  – No body, just a tag to hold some attributes
• <!ELEMENT AnyThingGoes ANY>
  – Completely unrestricted, and therefore essentially useless
    <AnyThingGoes>text and or nested tags</AnyThingGoes>
• <!ELEMENT TypicalInfoField (#PCDATA)>
  – Defining an element whose value is “parsed character data” – really a string
    <TypicalInfoField>text </TypicalInfoField>
• <!ELEMENT StructuralElement
    ([Nested Element], [Nested Element], … ) >
  – Defines real structure of document by specifying how elements are composed of nested sub-elements
  – Structure list has capabilities of specifying optional elements, repeating elements etc.

Structural Element
• Rules for defining nested elements similar to simple regular expressions
  – ?  Tag for optional sub-element
  – *  Tag for (0..n) repeatable optional sub-element
  – +  Tag for (1..n) repeatable required sub-element
  – |  Used to specify alternatives

Interpreted examples
<!ELEMENT ejb-JAR
    (input-file,
     output-file,
     (entity-bean | session-bean)+)>
• “Body” for <ejb-JAR>…</ejb-JAR> must include specification of input-file, output-file, and at least one and possibly many entity-bean or session-bean definitions.
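The ejb-JAR content rule can be checked mechanically with a validating parser. The following is a minimal sketch, assuming the DTD is supplied as an internal subset and that entity-bean and session-bean are simplified to EMPTY elements (the real DTD gives them nested content); the class name DtdCheck is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Internal DTD subset; the two bean declarations are simplified to EMPTY (assumption).
    static final String DOCTYPE =
        "<!DOCTYPE ejb-JAR [\n"
        + "<!ELEMENT ejb-JAR (input-file, output-file, (entity-bean | session-bean)+)>\n"
        + "<!ELEMENT input-file (#PCDATA)>\n"
        + "<!ELEMENT output-file (#PCDATA)>\n"
        + "<!ELEMENT entity-bean EMPTY>\n"
        + "<!ELEMENT session-bean EMPTY>\n"
        + "]>\n";

    // Returns true if the body is valid against the DTD above.
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);               // ask for DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        final boolean[] valid = { true };
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            // Validity problems are reported as (recoverable) errors.
            public void error(SAXParseException e) { valid[0] = false; }
            // Well-formedness problems are fatal.
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return valid[0];
    }

    public static void main(String[] args) throws Exception {
        String good = "<ejb-JAR><input-file>in.jar</input-file>"
            + "<output-file>out.jar</output-file><session-bean/></ejb-JAR>";
        String noBean = "<ejb-JAR><input-file>in.jar</input-file>"
            + "<output-file>out.jar</output-file></ejb-JAR>";
        System.out.println(isValid(good));    // at least one bean present
        System.out.println(isValid(noBean));  // the "+" rule is violated
    }
}
```

The second document is well formed, so parsing succeeds; it is only the validity check that flags the missing bean.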
Interpreted examples
<!ELEMENT web-app (icon?, display-name?,
    description?, distributable?, context-param*,
    servlet*, servlet-mapping*, session-config?,
    mime-mapping*, welcome-file-list?,
    error-page*, taglib*, resource-ref*,
    security-constraint*, login-config?,
    security-role*, env-entry*, ejb-ref*)>
• “Body” for <web-app>…</web-app>
  – Optional icon, display-name, distributable specifier
  – Possibly some context-parameters
  – Possibly some servlets
  – …
  – (it could even be empty and still be a valid web-app element!).

Typical Info field
<!ELEMENT servlet-name (#PCDATA)>
<!ELEMENT output-file (#PCDATA)>
• Familiar as things like
  <servlet-name>A4Servlet</servlet-name>
Attributes …
• From ejb’s DTD
<!ATTLIST entity-bean
    dname CDATA #REQUIRED>
• Entity-bean tag (which has lots of nested sub-elements) must include a value for its dname attribute, value is any character data
<entity-bean dname="eg">
  lots of stuff
</entity-bean>

Attributes …
• From ejb’s DTD
<!ELEMENT re-entrant EMPTY>
<!ATTLIST re-entrant value
    (true | false | TRUE | FALSE | True | False)
    #REQUIRED>
• “re-entrant” tag that is required to be nested inside all entity-bean definitions
• Just a tag – must have a true / false value given
<re-entrant value="false" />
Attributes …
• From ejb’s DTD
<!ELEMENT env-prop (#PCDATA)>
<!ATTLIST env-prop
    name    CDATA #REQUIRED
    comment CDATA #IMPLIED>
• An env-prop tag (<env-prop …>stuff</env-prop>) must have a name, and can have a comment

Entity references
• Seem to be mainly for inserting either small pieces of fixed text, or contents of complete files, into XML file.
• Example: DTD
<!ELEMENT CNOTICE (#PCDATA)>
<!ENTITY MyShortNotice "Copyright by me">
<!ENTITY MyLONGNotice SYSTEM "http://me.org/legal/copyright.xml">
• Example: XML
<CNOTICE>&MyShortNotice;</CNOTICE>
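A parser replaces an entity reference with the entity's text before the application sees the element content. A minimal sketch, assuming the CNOTICE declarations are placed in an internal DTD subset; the class name EntityDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {
    // Document with the entity declared in an internal DTD subset (assumption).
    static final String SAMPLE =
        "<!DOCTYPE CNOTICE [\n"
        + "<!ELEMENT CNOTICE (#PCDATA)>\n"
        + "<!ENTITY MyShortNotice \"Copyright by me\">\n"
        + "]>\n"
        + "<CNOTICE>&MyShortNotice;</CNOTICE>";

    // Parses the document and returns the text of its root element.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf(SAMPLE)); // prints "Copyright by me"
    }
}
```

Note that the reference must end with a semicolon (&MyShortNotice;); without it the document is not well formed.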
Complete DTD files
<?xml encoding="US-ASCII"?>
<!ELEMENT ejb-JAR (input-file, output-file,
    (entity-bean | session-bean)+)>
<!ELEMENT input-file (#PCDATA)>
<!ELEMENT output-file (#PCDATA)>
<!ELEMENT entity-bean (primary-key,
    re-entrant?, container-managed*,
    home-interface,
    etc etc env-prop*,
    dependency*)>
<!ATTLIST entity-bean dname CDATA #REQUIRED>
…
<!ATTLIST env-prop name CDATA #REQUIRED
    comment CDATA #IMPLIED>

Schema
• XML should allow you to define any kind of structured data
  – But DTD rules (a form of structured data) are represented in an entirely different format!
  – Inconsistent!
• XML Schema (XML Schema Definition file, .xsd) created to allow rules for valid XML application file to be defined using an XML file
Schema
• “An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself.
  These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.”
  Wikipedia

W3.org XMLSchema
• Provides some basic data types
  – Can use these to specify the data types for elements in your application’s XML files
    • string, base64Binary, hexBinary, integer, long, int, decimal, double, boolean, duration, date, anyURI, language, …
• Specifies how to define the elements in your XML document
  – Some elements will just be “simple types” – some data represented as one of the basic data types
  – Others will be “complex types” – built up from simpler elements using constraint rules
W3C example
• Purchase order …
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
  …
</purchaseOrder>

Purchase Order
• Name (as in the shipTo element) – it’s a string
• Zip (as in the shipTo element) – it’s a “decimal”!
• Order date (attribute in purchaseOrder) – it’s a date.
• Schema will specify such restrictions
Purchase order - structure
• A purchase order should have
  1. “Ship to” information
  2. “bill to” information
  3. An optional comment
  4. And some number of “items”
• Those data should appear in that order.
• What’s an “item”?
  – It’s an instance of another complex type that will have to be defined in terms of its simpler constituent elements

Parts of the schema
• Header
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Purchase order schema for Example.com. Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="purchaseOrder"
               type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>
What’s a purchase order type?
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

What’s a USAddress?
<xsd:complexType name="USAddress">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:element name="zip" type="xsd:decimal"/>
  </xsd:sequence>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
What’s an “items”?
<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          …

Occurrence constraints
• minOccurs
• maxOccurs
• fixed
• default
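The effect of minOccurs and maxOccurs can be seen by validating small documents against a schema. A minimal sketch using javax.xml.validation, assuming a cut-down schema with a single items element whose item children are limited to two (the maxOccurs="2" value and the class name OccurrenceCheck are illustrative, not from the W3C example):

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class OccurrenceCheck {
    // Hypothetical schema: <items> holds zero, one or two <item> strings.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='items'><xsd:complexType><xsd:sequence>"
        + "<xsd:element name='item' type='xsd:string' minOccurs='0' maxOccurs='2'/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    // Returns true if the document satisfies the schema above.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<items/>"));                                   // zero items allowed
        System.out.println(isValid("<items><item>a</item><item>b</item></items>")); // two items allowed
        System.out.println(isValid("<items><item>a</item><item>b</item><item>c</item></items>")); // three is too many
    }
}
```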
Extracting data
• Intended recipient shares DTD (or schema) and so knows the form of the tree-structure of data and knows which nodes contain the specific data required.
• So, can write a program that extracts the data from those nodes.
• Two styles¶
  – XML document is read; as each node is constructed it can be processed – most discarded, those with required data are analyzed
  – XML document is read and a complete tree-structure is built, tree then traversed to locate and process nodes with required data
¶ Well maybe 2.5 styles; the on-the-fly processing can now be done in both "push" and "pull" modes.

Parsing XML
Getting a Parser
• System configuration determines parser
– “Standard Java environment” has “.jar” files with a
parser implementation
(not necessarily same one for Java applications, servlets, J2EE!)
– You don’t specify the specific class, you use a factory
to create the parser using whatever is defined in
environment
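A minimal sketch of the factory approach, assuming the default parser bundled with the JDK is the one found in the environment. The class name ParserLookup and the helper isWellFormed are hypothetical; no concrete parser class is named anywhere in the code.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParserLookup {
    // Ask the factory for whatever parser the environment provides.
    static SAXParser newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.newSAXParser();
    }

    // Returns true if the text parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            newParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // A mismatched or missing tag ends up here as a SAXParseException.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // The implementation class depends on the runtime configuration.
        System.out.println("Parser class: " + newParser().getClass().getName());
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```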
Simple API for XML
Simple SAX
• Sax parser
  – somewhere in your application environment, default one in standard JDK/JRE
• Parser scrambles through contents of XML file, extracts elements, their attributes, and any character data.
• As each thing found, it invokes methods of a “ContentHandler” to deal with the data.

ContentHandler
• void startDocument()
  – Called at the beginning of a document; use as opportunity to create storage structures, initialize data etc
• void endDocument()
  – Received at the end of a document; tidy up, prepare final data for analysis.
ContentHandler : …Element, characters
• void startElement(String namespaceURI, String localName, String qName, Attributes atts)
  – Receive notification of the beginning of an element.
• void characters(char[] ch, int start, int length)
  – Receive notification of character data.
• void endElement(String namespaceURI, String localName, String qName)
  – Receive notification of the end of an element.
• These are the main functions

startElement()
• startElement called when Parser has finished with something like
  <THING class="Aclass" color="red">
  – Called routine can find that dealing with a “THING” element, and can get at list of its attributes
• Use this to:
  – Set context for processing any character data associated with THINGS (<THING>text</THING>) (“currentElement”)
  – Possibly create object that will collect data collected from nested elements of other types.
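The order in which the parser invokes these callbacks can be seen by recording each call. A minimal sketch, assuming a hypothetical EventTrace handler that only logs what it is told; whitespace-only character data is skipped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is the tag as written; atts gives access to its attributes.
        events.add("start " + qName + " attrs=" + atts.getLength());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start+length) of the buffer belongs to this call.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("chars " + text);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    // Parses the text and returns the recorded callback sequence.
    static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String e : trace("<THING class=\"Aclass\" color=\"red\">text</THING>")) {
            System.out.println(e);
        }
    }
}
```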
characters()
• characters called when Parser has text associated with <Tag>PCDATA</Tag>
• Use this to:
  – Set the value of the current element

characters() - problems
• You may get multiple calls to characters() –
  – various reasons:
    • buffering when reading a file
    • special characters in file
• Simple example that follows will ignore the problem; will show fix at end
endElement()
• endElement called when Parser has consumed the </THING> tag
• Use this to:
  – Finish processing of current element (if you created an object for the current element, then you may want to add this to a data collection that is being built)

Example
• XML data file (at this stage it is simply a well-formed XML file, there is no DTD to define validity)
  – List of books
    • Book characterized by a rank attribute (based on sales)
    • Book has sub-elements for
      – Title
      – Authors
      – Format
      – List price
      – Sales price
      – …
      (some elements are optional)
XML file
<LIST>
  <BOOK rank="1">
    <ISBN code="0130894680"/>
    <TITLE>Core Java 2, Volume 1: Fundamentals 5/e</TITLE>
    <AUTHORS>Cay S. Horstmann, Gary Cornell</AUTHORS>
    <FORMAT>(Paperback)</FORMAT>
    <STARS>4.5</STARS>
    <SHIPS>1</SHIPS>
    <LISTPRICE>44.99</LISTPRICE>
    <OURPRICE>31.49</OURPRICE>
    <SAVE>13.50</SAVE>
  </BOOK>
  <BOOK>
  …

Tasks
• Find:
  – Cost of least expensive book
  – Cost of most expensive book
  – Average cost of a Java book
• Create a collection of “book” objects (and print its contents); collection to hold all books with rating 4.0* or better
  – “book” has
    • Title, Authors, Cost, Stars
SAX based implementation
• Create a class that extends org.xml.sax.helpers.DefaultHandler
  – (DefaultHandler implements ContentHandler interface; defines defaults for methods)
• Define functions to process elements and characters as appropriate.
• Simple main program
  – Create a SAX parser
  – Link to Handler class
  – Use parser to process data file

Prices.java
import java.io.*;
import java.util.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
public class Prices extends DefaultHandler {
    …
    public static void main (String[] args) {
        …
    }
}

Prices.java – SAXy-stuff
• startElement()
  – Record element type in member variable currentElement
• characters()
  – If currentElement is “OURPRICE” have a price to process
    • Compare with min, max, add in sum, increment count

Prices.java
• A report stage (could have been made “endDocument”)
  – Print report on book prices
Prices.java
public static void main (String[] args) {
    if(args.length != 1) { …}
    String fileName = args[0];
    Prices pd = new Prices();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        SAXParser saxParser = factory.newSAXParser();
        saxParser.parse( new File(fileName), pd);
    }
    catch (ParserConfigurationException e) { … }
    catch (SAXException e) { … }
    catch (IOException e) { … }
    pd.reportResults();
}

Prices.java
public class Prices extends DefaultHandler {
String currentElement;
float minPrice;
float maxPrice;
float averagePrice;
int count;
public Prices()
{
currentElement = null;
minPrice = Float.MAX_VALUE;
maxPrice = (float) 0.0;
averagePrice = (float) 0.0;
count = 0;
}
Prices.java
public void startElement (String uri, String local,
        String qName, Attributes atts)
        throws SAXException
{
    currentElement=qName;
}
public void endElement(String namespaceURI,
        String localName,
        String qName)
        throws SAXException
{
    currentElement=null;
}

Prices.java
public void characters(char[] ch,
        int start,
        int length)
        throws SAXException
{
    if("OURPRICE".equals(currentElement))
        processOurPrice(new String(ch, start, length));
}

<BOOK rank="1">
  <ISBN code="0130894680"/>
  …
  <OURPRICE>31.49</OURPRICE>
  …
</BOOK>

Prices.java
private void processOurPrice(String str)
{
    try {
        float cost = Float.parseFloat(str);
        minPrice = (minPrice < cost) ? minPrice : cost;
        maxPrice = (maxPrice > cost) ? maxPrice : cost;
        averagePrice += cost;
        count++;
    }
    catch(Exception e) { }
}
Prices.java
public void reportResults()
{
    if(count==0) {
        System.out.println("There are no results to report");
    }
    else {
        averagePrice = averagePrice/((float)count);
        System.out.println("Minimum price $"+minPrice);
        System.out.println("Maximum price $"+maxPrice);
        System.out.println("Average price $"
            +averagePrice);
    }
}

Characters problem …
• Have to make it a bit more elaborate to handle cases where get multiple calls to characters()
• Handler class has private StringBuffer sb;
• characters()
  – accumulate string
• process at endElement()

characters()
public void characters(char buf[],
        int offset, int len) throws SAXException
{
    String s = new String(buf, offset, len);
    if(sb==null) sb = new StringBuffer();
    sb.append(s);
}
endElement()
public void endElement(String namespaceURI,
        String sName, String qName )
        throws SAXException
{
    String content = null;
    if(sb != null) {
        content = sb.toString(); sb = null;
    }
    if("OURPRICE".equals(qName)) {
        processOurPrice(content);
    }
    else …

Books.java
• Second SAX exercise
  – build collection of books that satisfy certain requirements
• As read each element – save its data
• When get to end of “book” element, check the saved data to see if constraints satisfied; if book satisfactory, add to a collection
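Putting the accumulate-then-process idea together for the price task gives something like the following. This is a sketch rather than the course's Prices.java: the class name BufferedPrices, the use of StringBuilder, and the choice to reset the buffer at every start tag are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferedPrices extends DefaultHandler {
    float minPrice = Float.MAX_VALUE;
    float maxPrice = 0f;
    float sum = 0f;
    int count = 0;
    private StringBuilder sb;   // text collected for the element being read

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        sb = new StringBuilder();   // start collecting afresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element; just accumulate.
        if (sb != null) sb.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the element's text known to be complete.
        if ("OURPRICE".equals(qName) && sb != null) {
            try {
                float cost = Float.parseFloat(sb.toString().trim());
                minPrice = Math.min(minPrice, cost);
                maxPrice = Math.max(maxPrice, cost);
                sum += cost;
                count++;
            } catch (NumberFormatException e) {
                // Ignore prices that are not numbers.
            }
        }
        sb = null;
    }

    // Parses the text and returns the handler holding the totals.
    static BufferedPrices run(String xml) throws Exception {
        BufferedPrices handler = new BufferedPrices();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        BufferedPrices p = run("<LIST><BOOK><OURPRICE>31.49</OURPRICE></BOOK>"
            + "<BOOK><OURPRICE>10.00</OURPRICE></BOOK></LIST>");
        System.out.println("min " + p.minPrice + " max " + p.maxPrice
            + " average " + (p.sum / p.count));
    }
}
```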
Books.java
• Define “book” class (~a struct)
• Define Books extends DefaultHandler
  – startDocument
    • Allocate a Vector to store books
  – startElement
    • If it’s a BOOK, create a new book object as “current book”
    • Always, set currentElement
  – …

Books.java
• Define Books extends DefaultHandler
  – …
  – characters
    • If current element is AUTHORS, OURPRICE, STARS or TITLE then set field of current book
  – endElement
    • If it’s a book, check “star” rating of current book object; if this >=4.0 add to collection of books
  – endDocument
    • Print contents of book collection

Books.java
This code uses the potentially buggy simpler characters() processing
import the-usual!
class book { … }
public class Books extends DefaultHandler {
    …
}

class book
class book {
    private String title;
    private String authors;
    private float cost;
    private float stars;
    public void setTitle(String str) { title = str; }
    public void setAuthors(String str) { authors = str; }
    public void setCost(String cst) {
        try {
            cost = Float.parseFloat(cst);
        }
        catch(Exception e) { }
    }
class book
    public void setStars(String strs) {
        try {
            stars = Float.parseFloat(strs);
        }
        catch(Exception e) { }
    }
    public float getStars() { return stars; }

class book
    public String toString() {
        StringBuffer buf = new StringBuffer();
        buf.append(title);
        buf.append("; ");
        buf.append(authors);
        buf.append("; $");
        buf.append(Float.toString(cost));
        buf.append("; ");
        buf.append(Float.toString(stars));
        buf.append("*");
        return buf.toString();
    }
}

Books.java : main
public class Books extends DefaultHandler {
    …
    public static void main (String[] args) {
        if(args.length != 1) { … }
        String fileName = args[0];
        Books bks = new Books();
        // Get factory, create parser
        …
        try { parser.parse(fileName, bks); }
        catch (SAXException e) { System.err.println (e); }
        catch (IOException e) { System.err.println (e); }
    }
}

Books.java : instance
public class Books extends DefaultHandler {
    Vector myBooks;
    book   currentbook;
    String currentElement;
    public Books()
    {
        currentElement = null;
        currentbook = null;
        myBooks = null;
    }
Books.java : start/end Document
public void startDocument()
        throws SAXException
{
    myBooks = new Vector();
}
public void endDocument()
        throws SAXException
{
    System.out.println("Good Java Books");
    Enumeration e = myBooks.elements();
    while(e.hasMoreElements())
        System.out.println(e.nextElement());
}

Books.java : startElement
public void startElement (String uri, String local,
        String qName, Attributes atts)
        throws SAXException
{
    if("BOOK".equals(qName)) {
        currentbook = new book();
    }
    currentElement=qName;
}
Books.java : characters
public void characters(char[] ch,
int start, int length) throws SAXException
{
String str = new String(ch, start, length);
if("AUTHORS".equals(currentElement))
currentbook.setAuthors(str);
else if("OURPRICE".equals(currentElement))
currentbook.setCost(str);
else if("STARS".equals(currentElement))
currentbook.setStars(str);
else if("TITLE".equals(currentElement))
currentbook.setTitle(str);
}
Books.java : endElement
public void endElement(String namespaceURI,
String localName,
String qName)
throws SAXException
{
if("BOOK".equals(qName)) {
if(currentbook.getStars()>=4.0)
myBooks.addElement(currentbook);
currentbook=null;
}
currentElement=null;
}
Bad notes in your SAX
• A mal-formed XML file will result in an exception – probably at the very end of the document when the parser finds that an opening tag wasn’t appropriately matched.

Deployment
• Nothing special
  – XML parsers are part of standard JDK, simply have to use the SAXParserFactory class.
Push or pull
• SAX
  – Parser
    • finds lexemes ("start element", characters etc)
    • "pushes" them into a registered handler
  – Your code – a kind of "event handler"
• StAX
  – Your code "pulls" successive elements from the parser

StAX
An alternative light weight parser
Push and pull – Sun's view
• Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.
• Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset; that is, the parser sends the data whether or not the client is ready to use it at that time.

Advantage pull? Sun says …
• Pull parsing provides several advantages over push parsing when working with XML streams:
  1. With pull parsing, the client controls the application thread, and can call methods on the parser when needed. By contrast, with push processing, the parser controls the application thread, and the client can only accept invocations from the parser.
  2. Pull parsing libraries can be much smaller and the client code to interact with those libraries much simpler than with push libraries, even for more complex documents.
  3. Pull clients can read multiple documents at one time with a single thread.
  4. A StAX pull parser can filter XML documents such that elements unnecessary to the client can be ignored, and it can support XML views of non-XML data.
StAX and SAX
• "StAX-enabled clients are generally easier to code than SAX clients. While it can be argued that SAX parsers are marginally easier to write, StAX parser code can be smaller and the code necessary for the client to interact with the parser simpler.
• StAX is a bidirectional API, meaning that it can both read and write XML documents. SAX is read only, so another API is needed if you want to write XML documents.
• SAX is a push API, whereas StAX is pull."

Comparison
Feature                     StAX              SAX               DOM
API Type                    Pull, streaming   Push, streaming   In memory tree
Ease of Use                 High              Medium            High
XPath Capability            Not supported     Not supported     Supported
CPU and Memory Efficiency   Good              Good              Varies
Forward Only                Supported         Supported         Not supported
Read XML                    Supported         Supported         Supported
Write XML                   Supported         Not supported     Supported
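The "Write XML" row is the practical difference between StAX and SAX. A minimal sketch of writing a BOOK element with XMLStreamWriter; the class name StaxWrite and the choice of fields are illustrative, not part of the course code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWrite {
    // Builds a small BOOK element as text using the StAX writer API.
    static String bookXml(String title, String price) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("BOOK");
        w.writeAttribute("rank", "1");
        w.writeStartElement("TITLE");
        w.writeCharacters(title);      // special characters such as & are escaped
        w.writeEndElement();
        w.writeStartElement("OURPRICE");
        w.writeCharacters(price);
        w.writeEndElement();
        w.writeEndElement();           // closes BOOK
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(bookXml("Core Java 2", "31.49"));
    }
}
```

The writer keeps track of open elements, so each writeEndElement() closes the most recently opened tag.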
StAX
• Dual personality:
  – Cursor API
    public interface XMLStreamReader {
        public int next() throws XMLStreamException;
        public boolean hasNext() throws XMLStreamException;
        public String getText();
        public String getLocalName();
        public String getNamespaceURI();
        // ... other methods not shown
    }
  – Iterator API
    public interface XMLEventReader extends Iterator {
        public XMLEvent nextEvent() throws XMLStreamException;
        public boolean hasNext();
        public XMLEvent peek() throws XMLStreamException;
        ...
    }

Book price example – StAX-iterator style
public class Main {
    private static float minPrice;
    …
    private static void reportResults() { … }
    private static void processStartElement(
            StartElement se) { … }
    private static void processOurPrice(String str) { .. }
    private static void processCharacters(Characters ch)
        { … }
    private static void processEndElement() { … }
    private static void processEvent(XMLEvent xmle) { … }
    public static void main(String[] args)
            throws Exception
        { … }
}
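For comparison with the iterator style, the same price scan can be written against the cursor API. A minimal sketch, assuming the input is passed as a string; the class name CursorPrices is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorPrices {
    // Returns the smallest OURPRICE value, or Float.MAX_VALUE if there is none.
    static float minPrice(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        float min = Float.MAX_VALUE;
        while (r.hasNext()) {
            // next() moves the cursor and reports what kind of item it now points at.
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "OURPRICE".equals(r.getLocalName())) {
                // getElementText() reads up to the matching end tag.
                min = Math.min(min, Float.parseFloat(r.getElementText().trim()));
            }
        }
        r.close();
        return min;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(minPrice("<LIST><BOOK><OURPRICE>31.49</OURPRICE></BOOK>"
            + "<BOOK><OURPRICE>10.00</OURPRICE></BOOK></LIST>"));
    }
}
```

The cursor style creates no event objects; the client asks the reader directly for the name or text at the current position.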
Main – get StAX parser, use it as iterator
public static void main(String[] args)
throws Exception {
String filename = args[0];
XMLInputFactory factory =
XMLInputFactory.newInstance();
factory.setProperty(
XMLInputFactory.IS_COALESCING,true);
XMLEventReader r = factory.createXMLEventReader(
filename, new FileInputStream(filename));
while (r.hasNext()) {
XMLEvent e = r.nextEvent();
processEvent(e);
}
reportResults();
}
processEvent – what is it?
private static void processEvent(XMLEvent xmle) {
if (xmle.isStartElement())
processStartElement(xmle.asStartElement());
else if (xmle.isCharacters())
processCharacters(xmle.asCharacters());
else if(xmle.isEndElement())
processEndElement();
}
Start and End elements
private static void processStartElement(
        StartElement se) {
    QName qname = se.getName();
    String str = qname.getLocalPart();
    currentElement = str;
}
private static void processEndElement() {
    currentElement = null;
}

processCharacters
private static void processCharacters(
        Characters ch) {
    if (ch.isWhiteSpace())
        return;
    String str = ch.getData();
    if("OURPRICE".equals(currentElement))
        processOurPrice(str);
}
processOurPrice
private static void processOurPrice(String str) {
try {
float cost = Float.parseFloat(str);
minPrice = (minPrice < cost) ?
minPrice : cost;
maxPrice = (maxPrice > cost) ?
maxPrice : cost;
averagePrice += cost;
count++;
}
catch(Exception e) { … }
}
DOM
Document Object Model
DOM
• Builds a tree-representation of the contents of an XML document
  – Nodes (many types)
  – Links to parents/children/siblings
• Supports mechanisms for
  – Traversing the tree
  – Searching for specific nodes
  – Processing groups of nodes
  – Tree surgery – removing, rearranging, and adding nodes

DOM's role
• DOM started out as an idea for trying to organize data in web-pages so that can provide more than just browsing.
  – Data to be structured and described by tags with defined semantics.
  – Search programs would then work more effectively as they could "understand" the contents of a document.
• DOM evolved to become a method of representing and manipulating structured data – XML data.
SAX or DOM
• SAX
  – One pass through XML
    • Select small subset of data
    • Possibly build own data structure to hold required subset of data
• DOM
  – One pass through XML to build "Document" (the DOM-tree)
  – Any number of passes through Document to perform different tasks
  – Structure built automatically to hold all data

DOM
• DOM more costly
  – Reads entire XML file into memory structures
  – Memory structures that are not economic (lots of little objects and links)
  – Lots more processing and checking
• But
  – DOM Document is the complete data set
  – You can do more elaborate processing
Programming …
• Program
  – Instantiate DOM parser via factory
  – Tell parser to process XML file
    • Catch any exceptions and quit
  – Ask parser for its "Document" (DOM-tree)
  – Write functions that traverse the DOM-tree performing whatever data analysis you need.
  – Should parser check "validity"?
    • Your choice. DOM parser doesn't need a reference DTD. But if you have a DTD then you can ask the DOM to be strict and enforce validity. Uses a lot of the DOM's time and therefore costly.

Java 1.5
• Life is a bit more complex with Java 1.5
  – More elaborate mechanisms for getting parser and hence document
  – Parsing and validation now separated
    • Parse the input to get a DOM document
    • Create a validator
    • Ask validator to validate document
      – requires XML Schema (XSD) – DTD not acceptable (sigh, those schema are hard to write and hard to read)
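The separate parse and validate steps might look like the following. This is a minimal sketch, assuming a one-element schema supplied as a string; the class name ParseThenValidate and the schema itself are illustrative. The DOM parser is made namespace-aware, which schema validation of a DOM tree needs.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseThenValidate {
    // Hypothetical schema declaring a single string element TITLE.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='TITLE' type='xs:string'/>"
        + "</xs:schema>";

    // Step 1: parse the input to get a DOM document (no validation yet).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Steps 2 and 3: create a validator and ask it to validate the document.
    static boolean isValid(Document doc, String xsd) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;   // document does not satisfy the schema
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(parse("<TITLE>Core Java 2</TITLE>"), XSD));
        System.out.println(isValid(parse("<PRICE>31.49</PRICE>"), XSD));
    }
}
```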
DOMProgram
• Parse a document. Walk the resulting tree
  – When encounter "ELEMENT" nodes, print out node name, attributes
  – When get a TEXT_NODE, print contents
• DOMProgram
  – Essentially procedural

DOMProgram.java : 1
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.ParserConfigurationException;
public class DOMProgram {
    private static void traverse (Node node, int indent) {
        …
    }
    public static void main (String[] args) {
        if(args.length<1) { … }
        …
    }
}
Getting parser
DocumentBuilderFactory factory = null;
DocumentBuilder parser = null;
try {
    factory = DocumentBuilderFactory.newInstance();
    parser = factory.newDocumentBuilder();
}
catch(ParserConfigurationException pce) {
    System.out.println(pce);
    System.exit(1);
}

Parsing, then invoke tree traversal
Document doc;
try {
    doc = parser.parse(args[0]);
}
catch (SAXException sae) { … }
catch (IOException e) { … }
Node n = doc.getFirstChild();
traverse (n, 0);
Tree walking
    private static void traverse (Node node, int indent)
    {
        int type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {… }
        if(type == Node.TEXT_NODE) { … }
        // recurse into the children, one level deeper
        NodeList children = node.getChildNodes();
        if (children != null) {
            for (int i=0; i< children.getLength(); i++)
                traverse (children.item(i), indent+1);
        }
    }
    if (type == Node.ELEMENT_NODE) {
        for(int i=0;i<indent;i++) System.out.print("\t");
        System.out.print(node.getNodeName());
        if(node.hasAttributes()) {
            System.out.print("(");
            NamedNodeMap nnm = node.getAttributes();
            for(int j=0;j<nnm.getLength();j++) {
                Node item = nnm.item(j);
                System.out.print("\t" + item.getNodeName());
                Node firstchild = item.getFirstChild();
                System.out.print("\t" +
                    firstchild.getNodeValue());
            }
            System.out.print(")");
        }
    }
    if(type == Node.TEXT_NODE) {
        String content = node.getNodeValue();
        System.out.println(" " + content);
    }
output
LIST
BOOK(   rank    1)
ISBN(   code    0130894680)
TITLE Core Java 2, Volume 1: Fundamentals 5/e
AUTHORS Cay S. Horstmann, Gary Cornell
FORMAT (Paperback)
STARS 4.5
SHIPS 1
Why so many newlines?
• Parser finds whitespace in input file and creates TEXT_NODE representing white space:
<STARS>4.5</STARS>□◊
□□□□□□□□<SHIPS>
Prices & Books – DOM style
• Build the Document DOM-tree
• Use standard traversal functions to find the nodes
• Code to build the Document essentially as shown previously.
• Once built, iterate through it finding prices associated with book nodes
• Then again to print details of nodes with good star ratings.
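Because the parser keeps whitespace-only TEXT_NODEs (the reason for the extra newlines), any traversal, including one that hunts for prices or star ratings, has to decide what to do with them. A minimal sketch of one option, skipping text nodes that are blank after trimming; the class name `SkipWhitespace` and its helpers are hypothetical, and the input is a made-up fragment:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SkipWhitespace {
    /** Recursively gathers the content of every text node that is not just whitespace. */
    static void collectText(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String content = node.getNodeValue().trim();
            if (!content.isEmpty()) {
                out.add(content); // whitespace-only nodes are silently dropped
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
    }

    /** Parses xml from a string and returns its non-blank text, in document order. */
    static List<String> nonBlankText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<String>();
            collectText(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BOOK>\n    <STARS>4.5</STARS>\n    <SHIPS>1</SHIPS>\n</BOOK>";
        System.out.println(nonBlankText(xml));
    }
}
```

Trimming is a blunt tool: it also strips leading and trailing spaces from real content, which is fine for this data but not for documents where whitespace matters.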
Validation
• This time validate the XML document
• But that means have to provide a schema file.
• Example on schemas
http://www.xml.com/lpt/a/2000/11/29/schemas/part1.html
Schemas – simple elements
<?xml version="1.0" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="TITLE" type="xs:string" />
<xs:element name="AUTHORS" type="xs:string" />
<xs:element name="FORMAT" type="xs:string" />
<xs:element name="SHIPS" type="xs:string" />
<xs:element name="STARS" type="xs:string" />
<xs:element name="LISTPRICE" type="xs:string" />
<xs:element name="OURPRICE" type="xs:string" />
<xs:element name="SAVE" type="xs:string" />
<xs:element name="USEDPRICE" type="xs:string" />
Schemas – element with attributes
<!-- ISBN type; could do better: define a restriction that code must be a string of digits -->
<xs:element name="ISBN">
<xs:complexType>
<xs:attribute name="code" type="xs:string" />
</xs:complexType>
</xs:element>
What I’m really saying is ISBN has an attribute but no content –
<ISBN code="0130894680"/>
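The "could do better" remark in the schema comment can be sketched as a pattern restriction on the code attribute. This is only an illustration under assumptions: the pattern `[0-9]{10}` (exactly ten digits, which would reject real ISBNs ending in X), the `use='required'` setting, and the class name `IsbnCheck` are not from the slides.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IsbnCheck {
    // Assumed tightening of the ISBN declaration: code must be exactly ten digits.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='ISBN'>"
      + "  <xs:complexType>"
      + "   <xs:attribute name='code' use='required'>"
      + "    <xs:simpleType>"
      + "     <xs:restriction base='xs:string'>"
      + "      <xs:pattern value='[0-9]{10}'/>"
      + "     </xs:restriction>"
      + "    </xs:simpleType>"
      + "   </xs:attribute>"
      + "  </xs:complexType>"
      + " </xs:element>"
      + "</xs:schema>";

    /** Validates the XML text directly from a stream, without building a DOM first. */
    static boolean accepts(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the validator reports an invalid document by throwing
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<ISBN code='0130894680'/>"));
        System.out.println(accepts("<ISBN code='01308-9468'/>"));
    }
}
```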
Schemas – structured element
<xs:element name="BOOK">
<xs:complexType>
<!-- Not certain that data were always defined in same order! so try "all" rather than sequence -->
<xs:all>
<xs:element ref="ISBN" />
<xs:element ref="TITLE" />
<xs:element ref="AUTHORS" />
<xs:element ref="FORMAT" />
<xs:element ref="STARS" minOccurs="0" maxOccurs="1" />
<xs:element ref="SHIPS" />
<xs:element ref="LISTPRICE" minOccurs="0" maxOccurs="1" />
<xs:element ref="OURPRICE" />
<xs:element ref="SAVE" minOccurs="0" maxOccurs="1" />
<xs:element ref="USEDPRICE" minOccurs="0" maxOccurs="1" />
</xs:all>
<xs:attribute name="rank" type="xs:string" />
</xs:complexType>
</xs:element>
Schemas – structured element
<xs:element name="LIST">
<xs:complexType>
<xs:sequence>
<xs:element ref="BOOK" minOccurs="1" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Setup … get schema etc
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.validation.*;
import javax.xml.transform.dom.DOMSource;
import org.xml.sax.SAXException;
class … {
    … process (String xmlFile, String xsdFile) {
        SchemaFactory schFactory = SchemaFactory.newInstance
            (XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = null;
        try {
            // compile the .xsd file into an in-memory Schema
            schema = schFactory.newSchema(new File(xsdFile));
        }
        catch(SAXException saxe) { … }
        Validator validator = schema.newValidator();
Parse XML file
        DocumentBuilderFactory factory = null;
        DocumentBuilder parser = null;
        try {
            factory = DocumentBuilderFactory.newInstance();
            // schema validation of a DOM tree expects a namespace-aware parse
            factory.setNamespaceAware(true);
            parser = factory.newDocumentBuilder();
        }
        catch(ParserConfigurationException pce) { … }
        Document document = null;
        try {
            document = parser.parse(xmlFile);
        }
        catch (SAXException sae) { … }
        catch (IOException ioe) { … }
Validate document
        try {
            // throws SAXException if the document does not match the schema
            validator.validate(new DOMSource(document));
        }
        catch (SAXException sae2) { … }
        catch (IOException ioe2) { … }
Walk the tree – gather price data
        NodeList bookNodes =
            document.getElementsByTagName("BOOK");
        float minprice = Float.MAX_VALUE;
        float maxprice = (float) 0.0;
        int count = 0;
        float average = (float) 0.0;
        for(int i=0;i<bookNodes.getLength();i++) {
            …
        }
        System.out.println("Price report - ");
        System.out.println("Minimum price $" + minprice);
        …
Walk the tree – gather price data
        for(int i=0;i<bookNodes.getLength();i++) {
            Element bNode = (Element) bookNodes.item(i);
            Node priceNode = bNode.getElementsByTagName(
                "OURPRICE").item(0);
            String price = priceNode.getFirstChild().getNodeValue();
            float aprice = Float.parseFloat(price);
            minprice = (minprice < aprice) ? minprice : aprice;
            maxprice = (maxprice > aprice) ? maxprice : aprice;
            // running total here; divide by count after the loop to get the mean
            average += aprice;
            count++;
        }
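The price-gathering loop can be tried out in isolation. A minimal self-contained sketch, assuming a made-up two-book LIST held in a string; the class name `PriceReport`, the helper `summarize` and the prices are hypothetical, and books without an OURPRICE element are skipped, which goes beyond what the slides show.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceReport {
    /** Returns {minimum, maximum, mean} of the OURPRICE values, or null if there are none. */
    static float[] summarize(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        NodeList bookNodes = document.getElementsByTagName("BOOK");
        float min = Float.MAX_VALUE;
        float max = 0.0f;
        float total = 0.0f;
        int count = 0;
        for (int i = 0; i < bookNodes.getLength(); i++) {
            Element book = (Element) bookNodes.item(i);
            NodeList prices = book.getElementsByTagName("OURPRICE");
            if (prices.getLength() == 0) continue; // no price recorded for this book
            float price = Float.parseFloat(prices.item(0).getTextContent().trim());
            min = Math.min(min, price);
            max = Math.max(max, price);
            total += price;
            count++;
        }
        if (count == 0) return null;
        return new float[] { min, max, total / count };
    }

    public static void main(String[] args) {
        // Hypothetical data, not taken from the course's books file.
        String xml = "<LIST>"
            + "<BOOK><OURPRICE>10.00</OURPRICE></BOOK>"
            + "<BOOK><OURPRICE>30.00</OURPRICE></BOOK>"
            + "</LIST>";
        float[] r = summarize(xml);
        System.out.println("Price report - ");
        System.out.println("Minimum price $" + r[0]);
        System.out.println("Maximum price $" + r[1]);
        System.out.println("Average price $" + r[2]);
    }
}
```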
Java books
        System.out.println("The popular Java books:");
        for(int i=0;i<bookNodes.getLength();i++) {
            Element bNode = (Element) bookNodes.item(i);
            NodeList temp = bNode.getElementsByTagName("STARS");
            if(temp.getLength()==0)continue; // A few books don't have star ratings
            Node starsNode = temp.item(0);
            String starCode = starsNode.getFirstChild().getNodeValue();
            float stars = Float.parseFloat(starCode);
            if(stars<4.0) continue;
            Node titleNode = bNode.getElementsByTagName("TITLE").item(0);
            System.out.print(titleNode.getFirstChild().getNodeValue());
            …
Nodes
• Lots of different kinds of nodes
– Document
– Attribute
– Element
– Entity
– TEXT
– …
Trees
• Tree structure – large and complex!
• The tree for Books XML file
– Root element “LIST”
• Children include BOOK ELEMENT-NODES and other junk (maybe some nodes representing “ignorable white space characters” found between each book)
– BOOK ELEMENT
• Children
- Attributes, and ELEMENT nodes for nested elements
– Text nodes with content
Processing functions
• Lots of functions (repertoire varies with Node type)
– Create iterator to walk through nodes
– Create treewalker to explore entire subtree
– Find element by tag name
– …
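The treewalker option listed under Processing functions can be sketched with the `org.w3c.dom.traversal` interfaces. This is a sketch under assumptions: it relies on the parser's Document also implementing `DocumentTraversal` (the JDK's default parser does, other implementations may not), and the class name `WalkElements` and the sample XML are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class WalkElements {
    /** Returns the names of all elements in document order, ignoring text nodes. */
    static List<String> elementNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        if (!(doc instanceof DocumentTraversal)) {
            throw new UnsupportedOperationException("DOM implementation lacks Traversal support");
        }
        // SHOW_ELEMENT makes the walker skip whitespace and other text nodes.
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, false);
        List<String> names = new ArrayList<String>();
        // The walker starts on the root; nextNode() returns null when the subtree is exhausted.
        for (Node n = walker.getCurrentNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<LIST>\n <BOOK><TITLE>A</TITLE></BOOK>\n <BOOK><TITLE>B</TITLE></BOOK>\n</LIST>";
        System.out.println(elementNames(xml));
    }
}
```

Compared with the hand-written recursive traverse function, the walker moves the "which nodes do I care about" decision into the whatToShow mask, so the whitespace TEXT_NODEs never reach the loop body.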
Creating a new tree & output of XML
• Sometimes useful to create new XML document with own data
• An example at
http://www.javazoom.net/services/newsletter/xmlgeneration.html
– create DOM tree in memory,
• create root node
• add children nodes (and child nodes of child nodes)
• add attribute nodes to other nodes
• add text nodes for data
– get a “transformer”
– use transformer to output structure as XML text
Creating a new tree & output of XML
• Suppose wanted a “good Java books” XML output
<goodjava>
<goodie>
<auth>Hu Mee</auth>
<title>Wizard Web Sites with JSP</title>
</goodie>
…
</goodjava>
Creating a new DOM tree
• Create an empty document –
    Document newdoc = parser.newDocument();
• Create a root element for new doc
    Element root = newdoc.createElement("goodjava");
    // attach the root to the document, otherwise there is nothing to output
    newdoc.appendChild(root);
• Add another child
    Element goodbook = newdoc.createElement("goodie");
    Element name = newdoc.createElement("auth");
    name.appendChild(newdoc.createTextNode("Hu Mee"));
    goodbook.appendChild(name);
    Element title = newdoc.createElement("title");
    title.appendChild(newdoc.createTextNode("Wizard Web Sites with JSP"));
    goodbook.appendChild(title);
    root.appendChild(goodbook);
Output as XML
    // needs javax.xml.transform.*, javax.xml.transform.dom.DOMSource
    // and javax.xml.transform.stream.StreamResult
    TransformerFactory tFactory =
        TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT,"yes");
    DOMSource source = new DOMSource(newdoc);
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);
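Put together, building the tree and writing it out might look like the sketch below. It is not the course code: the class name `GoodJavaWriter` and the helper `addTextChild`, which folds the repeated createElement / createTextNode / appendChild steps into one call, are assumptions, and the output goes to a string rather than System.out so that it can be inspected.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class GoodJavaWriter {
    /** Hypothetical helper: appends <tag>text</tag> to parent and returns the new element. */
    static Element addTextChild(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
        return child;
    }

    /** Builds the small goodjava tree in memory and serializes it as XML text. */
    static String build() {
        try {
            Document newdoc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = newdoc.createElement("goodjava");
            newdoc.appendChild(root); // the root must be attached to the document
            Element goodbook = newdoc.createElement("goodie");
            root.appendChild(goodbook);
            addTextChild(newdoc, goodbook, "auth", "Hu Mee");
            addTextChild(newdoc, goodbook, "title", "Wizard Web Sites with JSP");

            // A transformer created without a stylesheet just copies source to result.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(newdoc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```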
Why XML?
• Why?
– Safe data exchange and processing
• Avoids intimate contacts between data consumer and producer (and with all the viruses around these days, you really do want the safety factor).
</XML>
XML
• Producer
– Dump data in a file with format defined by a
DTD
• Consumer
– Display data as desired
– Extract specific data needed
– Possibly, generate new data – in another XML
document defined by another DTD – and return
to original data source.