XML
eXtensible Markup Language

Back in CSCI399 …
• CSCI399 introduced XML and XSLT in the context of displaying data in web browsers.
  – XML
    • Mechanism for representing data collections in structured text files
  – XSLT
    • Rules and rule interpreter
      – Rule matching part, and action part
      – Search for data elements in XML document
      – Display data as specified in action part

Web usage
• Well, all the browsers support XML/XSLT

But really, not for end-user Web
• Use of XML in browser is only a minor aspect.
• Primary role is the one IBM identified when XML was first being introduced: PORTABLE DATA

Data
• Anything!
  – Configuration rules for programs
    • Web.xml, ant scripts, …
  – Chemical reactions
  – SVG graphics files
  – Product listings
  – …

Going via XML is costly
• Conventional
  – Interrogate data-source using focused queries
  – Extract required data directly from result set
  – Process data
  – Generate outputs
• XML route
  – Server-side:
    • Interrogate data-source with a general query
    • Create verbose XML text document containing all data in result set
  – Client-side:
    • Retrieve XML text document from server
    • Re-analyze its contents
    • Extract required data
    • Process data
    • Generate output

Exchanging data
• Verbose text files, that must be repeatedly (and expensively) re-parsed,
• are independent of any specific application technology
• and can convey data that can be saved in any form of data store.

Data exchange
• Many needs for data exchange
  – Medical centre records treatment and invoices patient's insurance company
    • Need standard data defining
      – Patient
      – Insurance number
      – Treatment codes
      – Dates
      – …
  – Astronomer records data on observations of spectral properties of a star cluster, needs to publish data so others can analyze and compare with other clusters
    • Need standard data defining spatial coordinates, time, spectrum

XML in CSCI398
1. A small segment on defining XML documents, DTDs, schemas etc
2. Simple introduction to parsing of XML with SAX, StAX and DOM
3. Generating valid XML to hold data extracted from relational databases

XML files
• XML document contents
  – Processing instructions
    • First, one intended for XML parser identifying dialect of XML in use (currently there is only one)
        <?xml version="1.0" encoding="ISO-8859-1"?>
    • There could be other <?xml… related instructions, for example something specifying use of an XSL style sheet
  – Document type declarations
        <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
            "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">
    • SYSTEM
      – URI of a DTD in local file system
    • PUBLIC
      – Using a published DTD that is used by many organizations as a standard data representation
        » Name
        » URI for where official version published (actually, DTD or schema isn't necessarily at that URI – but URIs are just a standard way of defining a unique name)
    • ?both
      – You have a local copy on disk of the official DTD

XML files
• Content
  – Nesting rules for "well formed" document result in a tree structure in the document content
[Figure: tree diagram of a web-app document, with nodes web-app, icon, servlet, servlet-name, servlet-class, init-param, param-name, param-value]

    <web-app>
      <servlet>
        <servlet-name>Xfiles</servlet-name>
        <servlet-class>XServlet</servlet-class>
      </servlet>
      <security-constraint>
        <web-resource-collection>
          <web-resource-name>Xfiles</web-resource-name>
          <url-pattern>/XXX/ViewControlled</url-pattern>
        </web-resource-collection>
        <auth-constraint>
          <role-name>Agent-class-00</role-name>
        </auth-constraint>
        …
      </security-constraint>
      …
    </web-app>

    <ejb-JAR>
      <input-file>HelloJetaceIn.jar</input-file>
      <output-file>Hello.jar</output-file>
      <session-bean dname="Hello.ser">
        <session-timeout>0</session-timeout>
        <state-management>
          STATELESS_SESSION
        </state-management>
        …
        <transaction-attr value="TX_NOT_SUPPORTED"/>
        <isolation-level value="READ_COMMITTED"/>
      </session-bean>
    </ejb-JAR>

Attributes or elements
    <isolation-level value="READ_COMMITTED"/>
• Why not:
    <isolation-level>READ_COMMITTED</isolation-level>
• Could use either.
  – If it's primarily system control data, then use an attribute
  – If it's something that user of XML document might be interested in then use an element
  – Often unclear, your DTD designer made some arbitrary choice

XML-Content
• <tag>…</tag>
• <tag values for required and optional attributes> … </tag>
• <tag values for required and optional attributes />

Validity
• But is it valid?
• Are those tag names defined?
• They may be properly nested, but is the nesting meaningful
  – A servlet-name tag shouldn't appear immediately below the web-app level, it should only appear within a <servlet>…</servlet>
• Are attributes specified appropriately?

Don't always require validity
    <tomcat-users>
      <user name="tomcat" password="tomcat" roles="tomcat" />
      <user name="role1" password="tomcat" roles="role1,worker" />
      <user name="John" password="Nhoj" roles="worker,role1" />
      …
      …
      <user name="Colin" password="Password" roles="boss,manager,worker" />
    </tomcat-users>

DTD
• DTD file contents
  – Element definitions
    • <!ELEMENT [Element Name] [Element Definition/Type]>
  – Attribute definitions
    • <!ATTLIST [Owner element] [Attribute name] [type] [modifier] … >
  – Entity references
    • <!ENTITY [Entity Name] "[Replacement/Identifier]">

Elements
• <!ELEMENT TagOnly EMPTY>
  – No body, just a tag to hold some attributes
• <!ELEMENT AnyThingGoes ANY>
  – Completely unrestricted, and therefore essentially useless
        <AnyThingGoes>text and or nested tags</AnyThingGoes>
• <!ELEMENT TypicalInfoField (#PCDATA)>
  – Defining an element whose value is "parsed character data" – really a string
        <TypicalInfoField>text </TypicalInfoField>
• <!ELEMENT StructuralElement ([Nested Element], [Nested Element], … ) >
  – Defines real structure of document by specifying how elements are composed of nested sub-elements
  – Structure list has capabilities of specifying optional elements, repeating elements etc.

Structural Element
• Rules for defining nested elements similar to simple regular expressions
  – ?   Tag for optional sub-element
  – *   Tag for (0..n) repeatable optional sub-element
  – +   Tag for (1..n) repeatable required sub-element
  – |   Used to specify alternatives

Interpreted examples
    <!ELEMENT ejb-JAR (input-file, output-file, (entity-bean | session-bean)+)>
• "Body" for <ejb-JAR>…</ejb-JAR> must include specification of input-file, output-file, and at least one and possibly many entity-bean or session-bean definitions.

Interpreted examples
    <!ELEMENT web-app (icon?, display-name?, description?, distributable?,
        context-param*, servlet*, servlet-mapping*, session-config?,
        mime-mapping*, welcome-file-list?, error-page*, taglib*,
        resource-ref*, security-constraint*, login-config?, security-role*,
        env-entry*, ejb-ref*)>
• "Body" for <web-app>…</web-app>
  – Optional icon, display-name, distributable specifier
  – Possibly some context-parameters
  – Possibly some servlets
  – …
  – (it could even be empty and still be a valid web-app element!).

Typical Info field
    <!ELEMENT servlet-name (#PCDATA)>
    <!ELEMENT output-file (#PCDATA)>
• Familiar as things like
    <servlet-name>A4Servlet</servlet-name>

Attributes …
• From ejb's DTD
    <!ELEMENT re-entrant EMPTY>
    <!ATTLIST re-entrant value (true | false | TRUE | FALSE | True | False) #REQUIRED>
• "re-entrant" tag that is required to be nested inside all entity-bean definitions
• Just a tag – must have a true / false value given
    <re-entrant value="false" />

Attributes …
• From ejb's DTD
    <!ATTLIST entity-bean dname CDATA #REQUIRED>
• Entity-bean tag (which has lots of nested sub-elements) must include a value for its dname attribute, value is any character data
    <entity-bean dname="eg"> lots of stuff </entity-bean>

Entity references
• Seem to be mainly for inserting either small pieces of fixed text, or contents of complete files, into XML file.
• Example: DTD
    <!ELEMENT CNOTICE (#PCDATA)>
    <!ENTITY MyShortNotice "Copyright by me">
    <!ENTITY MyLONGNotice SYSTEM "http://me.org/legal/copyright.xml">
• Example: XML
    <CNOTICE>&MyShortNotice;</CNOTICE>

Attributes …
• From ejb's DTD
    <!ELEMENT env-prop (#PCDATA)>
    <!ATTLIST env-prop
        name    CDATA #REQUIRED
        comment CDATA #IMPLIED>
• An env-prop tag (<env-prop …>stuff</env-prop>) must have a name, and can have a comment

Complete DTD files
    <?xml encoding="US-ASCII"?>
    <!ELEMENT ejb-JAR (input-file, output-file, (entity-bean | session-bean)+)>
    <!ELEMENT input-file (#PCDATA)>
    <!ELEMENT output-file (#PCDATA)>
    <!ELEMENT entity-bean (primary-key, re-entrant?, container-managed*, home-interface, etc etc env-prop*, dependency*)>
    <!ATTLIST entity-bean dname CDATA #REQUIRED>
    …
    <!ATTLIST env-prop
        name    CDATA #REQUIRED
        comment CDATA #IMPLIED>

Schema
• XML should allow you to define any kind of structured data
  – But DTD rules (a form of structured data) are represented in an entirely different format!
  – Inconsistent!
• XML Schema (XML schema definition file, .xsd) created to allow rules for valid XML application file to be defined using an XML file

Schema
• "An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints." (Wikipedia)

W3.org XMLSchema
• Provides some basic data types
  – Can use these to specify the data types for elements in your application's XML files
    • string, base64Binary, hexBinary, integer, long, int, decimal, double, boolean, duration, date, anyURI, language, …
• Specifies how to define the elements in your XML document
  – Some elements will just be "simple types" – some data represented as one of the basic data types
  – Others will be "complex types" – built up from simpler elements using constraint rules

W3C example
• Purchase order …
    <?xml version="1.0"?>
    <purchaseOrder orderDate="1999-10-20">
      <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
      </shipTo>
      <billTo country="US">
      …
    </purchaseOrder>

Purchase Order
• Name (as in the shipTo element) – it's a string
• Zip (as in the shipTo element) – it's a "decimal"!
• Order date (attribute in purchase order) – it's a date.
• Schema will specify such restrictions

Purchase order - structure
• A purchase order should have
  • Header
  1. "Ship to" information
  2. "Bill to" information
  3. An optional comment
  4. And some number of "items"
• Those data should appear in that order.
• What's an "item"?
  – It's an instance of another complex type that will have to be defined in terms of its simpler constituent elements

Parts of the schema
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          Purchase order schema for Example.com.
          Copyright 2000 Example.com. All rights reserved.
        </xsd:documentation>
      </xsd:annotation>
      <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
      <xsd:element name="comment" type="xsd:string"/>

What's a purchase order type?
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress"/>
        <xsd:element name="billTo" type="USAddress"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="items" type="Items"/>
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date"/>
    </xsd:complexType>

What's a USAddress?
    <xsd:complexType name="USAddress">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="city" type="xsd:string"/>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zip" type="xsd:decimal"/>
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
    </xsd:complexType>

What's an "items"?
    <xsd:complexType name="Items">
      <xsd:sequence>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="productName" type="xsd:string"/>
              <xsd:element name="quantity">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:positiveInteger">
                    <xsd:maxExclusive value="100"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="USPrice" type="xsd:decimal"/>
              …

Occurrence constraints
• minOccurs
• maxOccurs
• fixed
• (default)

Extracting data
• Intended recipient shares DTD (or schema) and so knows the form of the tree-structure of data and knows which nodes contain the specific data required.
• So, can write a program that extracts the data from those nodes.
• Two styles¶
  – XML document is read, as each node is constructed it can be processed – most discarded, those with required data are analyzed
  – XML document is read and a complete tree-structure is built, tree then traversed to locate and process nodes with required data
¶Well maybe 2.5 styles; the on-the-fly processing can now be done in both "push" and "pull" modes.

Parsing XML

Getting a Parser
• System configuration determines parser
  – "Standard Java environment" has ".jar" files with a parser implementation (not necessarily same one for Java applications, servlets, J2EE!)
  – You don't specify the specific class, you use a factory to create the parser using whatever is defined in environment

SAX
Simple API for XML

Simple SAX
• SAX parser
  – somewhere in your application environment, default one in standard JDK/JRE
• Parser scrambles through contents of XML file, extracts elements, their attributes, and any character data.
• As each thing is found, it invokes methods of a "ContentHandler" to deal with the data.

ContentHandler
• void startDocument()
  – Called at the beginning of a document; use as opportunity to create storage structures, initialize data etc
• void endDocument()
  – Called at the end of a document; tidy up, prepare final data for analysis.

ContentHandler : …Element, characters
• void startElement(String namespaceURI, String localName, String qName, Attributes atts)
  – Receive notification of the beginning of an element.
• void characters(char[] ch, int start, int length)
  – Receive notification of character data.
• void endElement(String namespaceURI, String localName, String qName)
  – Receive notification of the end of an element.
• These are the main functions

startElement()
• startElement called when Parser has finished with something like
    <THING class="Aclass" color="red">
  – Called routine can find that it is dealing with a "THING" element, and can get at list of its attributes
• Use this to:
  – Set context for processing any character data associated with THINGS (<THING>text</THING>) ("currentElement")
  – Possibly create object that will collect data from nested elements of other types.

characters()
• characters called when Parser has text associated with <Tag>PCDATA</Tag>
• Use this to:
  – Set the value of the current element

characters() - problems
• You may get multiple calls to characters() – various reasons:
  • buffering when reading a file
  • special characters in file
• Simple example that follows will ignore problem, will show fix at end

endElement()
• endElement called when Parser has consumed the </THING> tag
• Use this to:
  – Finish processing of current element (if you created an object for the current element, then you may want to add this to a data collection that is being built)

Example
• XML data file (at this stage it is simply a well-formed XML file, there is no DTD to define validity)
  – List of books
    • Book characterized by a rank attribute (based on sales)
    • Book has sub-elements for
      – Title
      – Authors
      – Format
      – List price
      – Sales price
      – …
      – (some elements are optional)

XML file
    <LIST>
      <BOOK rank="1">
        <ISBN code="0130894680"/>
        <TITLE>Core Java 2, Volume 1: Fundamentals 5/e</TITLE>
        <AUTHORS>Cay S.
        Horstmann, Gary Cornell</AUTHORS>
        <FORMAT>(Paperback)</FORMAT>
        <STARS>4.5</STARS>
        <SHIPS>1</SHIPS>
        <LISTPRICE>44.99</LISTPRICE>
        <OURPRICE>31.49</OURPRICE>
        <SAVE>13.50</SAVE>
      </BOOK>
      <BOOK>
      …

Tasks
• Find:
  – Cost of least expensive book
  – Cost of most expensive book
  – Average cost of a Java book
• Create a collection of "book" objects (and print its contents); collection to hold all books with rating 4.0* or better
  – "book" has
    • Title, Authors, Cost, Stars

SAX based implementation
Prices.java
• Create a class that extends org.xml.sax.helpers.DefaultHandler
  – (DefaultHandler implements ContentHandler interface; defines defaults for methods)
• Define functions to process elements and characters as appropriate.
• Simple main program
  – Create a SAX parser
  – Link to Handler class
  – Use parser to process data file

Prices.java – SAXy-stuff
    import java.io.*;
    import java.util.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;

    public class Prices extends DefaultHandler {
        …
        public static void main (String[] args) { … }
    }

Prices.java
• startElement()
  – Record element type in member variable currentElement
• characters()
  – If currentElement is "OURPRICE" have a price to process
    • Compare with min, max, add in sum, increment count
• A report stage (could have been made "endDocument")
  – Print report on book prices

Prices.java
    public static void main (String[] args) {
        if(args.length != 1) { … }
        String fileName = args[0];
        Prices pd = new Prices();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = factory.newSAXParser();
            // parse the file named on the command line, sending events to pd
            saxParser.parse(new File(fileName), pd);
        }
        catch (ParserConfigurationException e) { … }
        catch (SAXException e) { … }
        catch (IOException e) { … }
        pd.reportResults();
    }

Prices.java
    public class Prices extends DefaultHandler {
        String currentElement;
        float minPrice;
        float maxPrice;
        float averagePrice;
        int
            count;

        public Prices() {
            currentElement = null;
            minPrice = Float.MAX_VALUE;
            maxPrice = (float) 0.0;
            averagePrice = (float) 0.0;
            count = 0;
        }

Prices.java
    public void startElement (String uri, String local, String qName, Attributes atts)
            throws SAXException {
        currentElement=qName;
    }

    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        currentElement=null;
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if("OURPRICE".equals(currentElement))
            processOurPrice(new String(ch, start, length));
    }

Prices.java
    <BOOK rank="1">
      <ISBN code="0130894680"/>
      …
      <OURPRICE>31.49</OURPRICE>
      …
    </BOOK>

Prices.java
    private void processOurPrice(String str) {
        try {
            float cost = Float.parseFloat(str);
            minPrice = (minPrice < cost) ? minPrice : cost;
            maxPrice = (maxPrice > cost) ? maxPrice : cost;
            averagePrice += cost;   // holds the running sum until reportResults()
            count++;
        }
        catch(Exception e) { }
    }

Prices.java
    public void reportResults() {
        if(count==0) {
            System.out.println("There are no results to report");
        }
        else {
            averagePrice = averagePrice/((float)count);
            System.out.println("Minimum price $"+minPrice);
            System.out.println("Maximum price $"+maxPrice);
            System.out.println("Average price $"+averagePrice);
        }
    }

Characters problem …
• Have to make it a bit more elaborate to handle cases where you get multiple calls to characters()
• characters() – accumulate string
• process at endElement()

characters()
• Handler class has
    private StringBuffer sb;

    public void characters(char buf[], int offset, int len)
            throws SAXException {
        String s = new String(buf, offset, len);
        if(sb==null) sb = new StringBuffer();
        sb.append(s);
    }

endElement()
    public void endElement(String namespaceURI, String sName, String qName)
            throws SAXException {
        String content = null;
        if(sb != null) {
            content = sb.toString();
            sb = null;
        }
        if("OURPRICE".equals(qName)) {
            processOurPrice(content);
        }
        else …

Books.java
• Second SAX exercise
  – build collection of books that satisfy
    certain requirements
• As each element is read
  – save its data
• When get to end of "book" element, check the saved data to see if constraints satisfied; if book satisfactory, add to a collection

Books.java
• Define "book" class (~a struct)
• Define Books extends DefaultHandler
  – startDocument
    • Allocate a Vector to store books
  – startElement
    • If it's a BOOK, create a new book object as "current book"
    • Always, set currentElement
  – …

Books.java
• Define Books extends DefaultHandler
  – …
  – characters
    • If current element is AUTHORS, OURPRICE, STARS or TITLE then set field of current book
  – endElement
    • If it's a book, check "star" rating of current book object; if this >=4.0 add to collection of books
  – endDocument
    • Print contents of book collection

Books.java
This code uses the potentially buggy simpler characters() processing
    import the-usual!
    class book { … }
    public class Books extends DefaultHandler { … }

class book
    class book {
        private String title;
        private String authors;
        private float cost;
        private float stars;

        public void setTitle(String str) { title = str; }
        public void setAuthors(String str) { authors = str; }
        public void setCost(String cst) {
            try { cost = Float.parseFloat(cst); }
            catch(Exception e) { }
        }

class book
        public void setStars(String strs) {
            try { stars = Float.parseFloat(strs); }
            catch(Exception e) { }
        }
        public float getStars() { return stars; }

class book
        public String toString() {
            StringBuffer buf = new StringBuffer();
            buf.append(title);
            buf.append("; ");
            buf.append(authors);
            buf.append("; $");
            buf.append(Float.toString(cost));
            buf.append("; ");
            buf.append(Float.toString(stars));
            buf.append("*");
            return buf.toString();
        }
    }

Books.java : main
    public class Books extends DefaultHandler {
        …
        public static void main (String[] args) {
            if(args.length != 1) { … }
            String fileName = args[0];
            Books bks = new Books();
            // Get factory, create parser
            …
            try {
                parser.parse(fileName, bks);
            }
            catch (SAXException e) {
                System.err.println (e);
            }
            catch (IOException e) {
                System.err.println (e);
            }
        }
    }

Books.java : instance
    public class Books extends DefaultHandler {
        Vector myBooks;
        book currentbook;
        String currentElement;

        public Books() {
            currentElement = null;
            currentbook = null;
            myBooks = null;
        }

Books.java : start/end Document
    public void startDocument() throws SAXException {
        myBooks = new Vector();
    }

    public void endDocument() throws SAXException {
        System.out.println("Good Java Books");
        Enumeration e = myBooks.elements();
        while(e.hasMoreElements())
            System.out.println(e.nextElement());
    }

Books.java : startElement
    public void startElement (String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if("BOOK".equals(qName)) {
            currentbook = new book();
        }
        currentElement=qName;
    }

Books.java : characters
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String str = new String(ch, start, length);
        if("AUTHORS".equals(currentElement)) currentbook.setAuthors(str);
        else if("OURPRICE".equals(currentElement)) currentbook.setCost(str);
        else if("STARS".equals(currentElement)) currentbook.setStars(str);
        else if("TITLE".equals(currentElement)) currentbook.setTitle(str);
    }

Books.java : endElement
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        if("BOOK".equals(qName)) {
            if(currentbook.getStars()>=4.0)
                myBooks.addElement(currentbook);
            currentbook=null;
        }
        currentElement=null;
    }

Bad notes in your SAX
• A mal-formed XML file will result in an exception – probably at the very end of the document when the parser finds that an opening tag wasn't appropriately matched.

Deployment
• Nothing special – XML parsers are part of standard JDK, simply have to use the parser factory classes (e.g. SAXParserFactory).
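
The pieces of the SAX price example (factory, handler, accumulating characters() and processing at endElement()) can be put together roughly as below. This is only a sketch under some assumptions: the class name PriceSummary and the method summarize are made up for illustration, the XML is read from a String rather than a file so the example is self-contained, and a StringBuilder is used where the slides use a StringBuffer.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; not part of the course code.
public class PriceSummary extends DefaultHandler {
    // Text collected for the element currently being read.
    private final StringBuilder text = new StringBuilder();
    private double min = Double.MAX_VALUE;
    private double max = 0.0;
    private double sum = 0.0;
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // a new element starts: discard text collected so far
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this several times for one element, so only accumulate here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("OURPRICE".equals(qName)) {
            double cost = Double.parseDouble(text.toString().trim());
            min = Math.min(min, cost);
            max = Math.max(max, cost);
            sum += cost;
            count++;
        }
        text.setLength(0);
    }

    /** Parses the XML text and returns {min, max, average}; all zeros if no prices were found. */
    public static double[] summarize(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PriceSummary handler = new PriceSummary();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        if (handler.count == 0) {
            return new double[] {0.0, 0.0, 0.0};
        }
        return new double[] {handler.min, handler.max, handler.sum / handler.count};
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data in the shape of the book list.
        String xml = "<LIST>"
            + "<BOOK rank=\"1\"><TITLE>A</TITLE><OURPRICE>31.49</OURPRICE></BOOK>"
            + "<BOOK rank=\"2\"><TITLE>B</TITLE><OURPRICE>20.00</OURPRICE></BOOK>"
            + "</LIST>";
        double[] r = summarize(xml);
        System.out.println("min=" + r[0] + " max=" + r[1] + " avg=" + r[2]);
    }
}
```

Processing the price in endElement() rather than in characters() means a price split over two characters() calls is still parsed as one number.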

Push or pull
• SAX
  – Parser
    • finds lexemes ("start element", characters etc)
    • "pushes" them into a registered handler
  – Your code – a kind of "event handler"
• StAX
  – Your code "pulls" successive elements from the parser

StAX
An alternative light weight parser

Push and pull – Sun's view
• Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.
• Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset; that is, the parser sends the data whether or not the client is ready to use it at that time.

Advantage pull? Sun says …
• Pull parsing provides several advantages over push parsing when working with XML streams:
  1. With pull parsing, the client controls the application thread, and can call methods on the parser when needed. By contrast, with push processing, the parser controls the application thread, and the client can only accept invocations from the parser.
  2. Pull parsing libraries can be much smaller and the client code to interact with those libraries much simpler than with push libraries, even for more complex documents.
  3. Pull clients can read multiple documents at one time with a single thread.
  4. A StAX pull parser can filter XML documents such that elements unnecessary to the client can be ignored, and it can support XML views of non-XML data.

StAX and SAX
• "StAX-enabled clients are generally easier to code than SAX clients. While it can be argued that SAX parsers are marginally easier to write, StAX parser code can be smaller and the code necessary for the client to interact with the parser simpler.
• StAX is a bidirectional API, meaning that it can both read and write XML documents.
  SAX is read only, so another API is needed if you want to write XML documents.
• SAX is a push API, whereas StAX is pull."

Comparison
    Feature                     StAX              SAX               DOM
    API Type                    Pull, streaming   Push, streaming   In memory tree
    Ease of Use                 High              Medium            High
    XPath Capability            Not supported     Not supported     Supported
    CPU and Memory Efficiency   Good              Good              Varies
    Forward Only                Supported         Supported         Not supported
    Read XML                    Supported         Supported         Supported
    Write XML                   Supported         Not supported     Supported

StAX
• Dual personality:
  – Cursor API
        public interface XMLStreamReader {
            public int next() throws XMLStreamException;
            public boolean hasNext() throws XMLStreamException;
            public String getText();
            public String getLocalName();
            public String getNamespaceURI();
            // ... other methods not shown
        }
  – Iterator API
        public interface XMLEventReader extends Iterator {
            public XMLEvent nextEvent() throws XMLStreamException;
            public boolean hasNext();
            public XMLEvent peek() throws XMLStreamException;
            ...
        }

Book price example – StAX-iterator style
    public class Main {
        private static float minPrice;
        …
        private static void reportResults() { … }
        private static void processStartElement(StartElement se) { … }
        private static void processOurPrice(String str) { .. }
        private static void processCharacters(Characters ch) { … }
        private static void processEndElement() { … }
        private static void processEvent(XMLEvent xmle) { … }
        public static void main(String[] args) throws Exception { … }
    }

Main – get StAX parser, use it as iterator
    public static void main(String[] args) throws Exception {
        String filename = args[0];
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // ask the parser to merge adjacent character events into one
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLEventReader r = factory.createXMLEventReader(
            filename, new FileInputStream(filename));
        while (r.hasNext()) {
            XMLEvent e = r.nextEvent();
            processEvent(e);
        }
        reportResults();
    }

processEvent – what is it?
    private static void processEvent(XMLEvent xmle) {
        if (xmle.isStartElement())
            processStartElement(xmle.asStartElement());
        else if (xmle.isCharacters())
            processCharacters(xmle.asCharacters());
        else if (xmle.isEndElement())
            processEndElement();
    }

Start and End elements
    private static void processStartElement(StartElement se) {
        QName qname = se.getName();
        String str = qname.getLocalPart();
        currentElement = str;
    }

    private static void processEndElement() {
        currentElement = null;
    }

processCharacters
    private static void processCharacters(Characters ch) {
        if (ch.isWhiteSpace()) return;
        String str = ch.getData();
        if("OURPRICE".equals(currentElement))
            processOurPrice(str);
    }

processOurPrice
    private static void processOurPrice(String str) {
        try {
            float cost = Float.parseFloat(str);
            minPrice = (minPrice < cost) ? minPrice : cost;
            maxPrice = (maxPrice > cost) ? maxPrice : cost;
            averagePrice += cost;
            count++;
        }
        catch(Exception e) { … }
    }

DOM
Document Object Model

DOM's role
• DOM started out as an idea for trying to organize data in web-pages so that they can provide more than just browsing.
  – Data to be structured and described by tags with defined semantics.
  – Search programs would then work more effectively as they could "understand" the contents of a document.
• DOM evolved to become a method of representing and manipulating structured data – XML data.

DOM
• Builds a tree-representation of the contents of an XML document
  – Nodes (many types)
  – Links to parents/children/siblings
• Supports mechanisms for
  – Traversing the tree
  – Searching for specific nodes
  – Processing groups of nodes
  – Tree surgery – removing, rearranging, and adding nodes

SAX or DOM
• SAX
  – One pass through XML
    • Select small subset of data
    • Possibly build own data structure to hold required subset of data
• DOM
  – One pass through XML to build "Document" (the DOM-tree)
  – Any number of passes through Document to perform different tasks
  – Structure built automatically to hold all data

DOM
• DOM more costly
  – Reads entire XML file into memory structures
  – Memory structures that are not economic (lots of little objects and links)
  – Lots more processing and checking
• But
  – DOM Document is the complete data set
  – You can do more elaborate processing

Java 1.5
• Life is a bit more complex with Java 1.5

Programming …
• Program
  – Instantiate DOM parser via factory
  – Tell parser to process XML file
    • Catch any exceptions and quit
  – Ask parser for its "Document" (DOM-tree)
  – Write functions that traverse the DOM-tree performing whatever data analysis you need.
  – Should parser check "validity"?
    • Your choice. DOM parser doesn't need a reference DTD. But if you have a DTD then you can ask the DOM to be strict and enforce validity. Uses a lot of the DOM's time and therefore costly.
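
The program outline above (get a parser from the factory, parse, take the Document, then walk it) could be sketched as follows for the book-price data. This is an illustration only: the class name DomPrices and the method priceRange are hypothetical, the input is a String rather than a file, and getElementsByTagName is used in place of a hand-written traversal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; not part of the course code.
public class DomPrices {

    /** Returns {min, max} over all OURPRICE elements, or null if the document has none. */
    public static double[] priceRange(String xml) throws Exception {
        // 1. Instantiate a DOM parser via the factory.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // 2. Parse the input; the result is the complete DOM tree.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // 3. Search the tree for the nodes holding the required data.
        NodeList prices = doc.getElementsByTagName("OURPRICE");
        if (prices.getLength() == 0) {
            return null;
        }
        double min = Double.MAX_VALUE;
        double max = -Double.MAX_VALUE;
        for (int i = 0; i < prices.getLength(); i++) {
            // getTextContent() joins the text below the element, however many text nodes there are.
            double cost = Double.parseDouble(prices.item(i).getTextContent().trim());
            min = Math.min(min, cost);
            max = Math.max(max, cost);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data in the shape of the book list.
        String xml = "<LIST>"
            + "<BOOK rank=\"1\"><OURPRICE>31.49</OURPRICE></BOOK>"
            + "<BOOK rank=\"2\"><OURPRICE>20.00</OURPRICE></BOOK>"
            + "</LIST>";
        double[] r = priceRange(xml);
        System.out.println("min=" + r[0] + " max=" + r[1]);
    }
}
```

Because the whole tree stays in memory, a second pass (for example to list well-rated books) can reuse the same Document without re-parsing.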
• With Java 1.5
  – More elaborate mechanisms for getting parser and hence document
  – Parsing and validation now separated
    • Parse the input to get a DOM document
    • Create a validator
    • Ask validator to validate document – requires an XML Schema (XSD) – DTD not acceptable (sigh, those schemas are hard to write and hard to read)

DOMProgram
• Parse a document
• Walk the resulting tree
  – When encounter "ELEMENT" nodes, print out node name, attributes
  – When get a TEXT_NODE, print contents
• DOMProgram
  – Essentially procedural

DOMProgram.java : 1
    import org.w3c.dom.*;
    import org.xml.sax.SAXException;
    import java.io.*;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.ParserConfigurationException;

    public class BasicDom {
        private void traverse (Node node, int indent) { … }
        public static void main (String[] args) {
            if(args.length<1) { … }
            …
        }
    }

Getting parser
    DocumentBuilderFactory factory = null;
    DocumentBuilder parser = null;
    try {
        factory = DocumentBuilderFactory.newInstance();
        parser = factory.newDocumentBuilder();
    }
    catch(ParserConfigurationException pce) {
        System.out.println(pce);
        System.exit(1);
    }

Parsing, then invoke tree traversal
    Document doc = null;
    try {
        doc = parser.parse(args[0]);
    }
    catch (SAXException sae) { … }
    catch (IOException e) { … }
    // start the walk at the first child of the Document node
    Node n = doc.getFirstChild();
    traverse (n, 0);

Tree walking
    private void traverse (Node node, int indent) {
        int type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) { … }
        if(type == Node.TEXT_NODE) { … }
        NodeList children = node.getChildNodes();
        if (children != null) {
            for (int i=0; i< children.getLength(); i++)
                traverse (children.item(i), indent+1);
        }
    }

    if (type == Node.ELEMENT_NODE) {
        for(int i=0;i<indent;i++) System.out.print("\t");
        System.out.print(node.getNodeName());
        if(node.hasAttributes()) {
            System.out.print("(");
            NamedNodeMap nnm = node.getAttributes();
            for(int j=0;j<nnm.getLength();j++) {
Node item = nnm.item(j); System.out.print("\t" + item.getNodeName()); Node firstchild = item.getFirstChild(); System.out.print("\t" + firstchild.getNodeValue()); } System.out.print(")"); 108 } } 18 output if(type == Node.TEXT_NODE) { String content = node.getNodeValue(); System.out.println(" " + content); } LIST BOOK( rank 1) ISBN( code 0130894680) TITLE Core Java 2, Volume 1: Fundamentals 5/e AUTHORS Cay S. Horstmann, Gary Cornell FORMAT (Paperback) STARS 4.5 109 SHIPS 1 110 Prices & Books DOM style Why so many newlines? • Build the Document DOM-tree • Use standard traversal functions to find the nodes • Parser finds whitespace in input file and creates TEXT_NODE representing white space: • Code to build the Document essentially as shown previously. • Once built, iterate through it finding prices associated with book nodes • Then again to print details of nodes with good star ratings. <STARS>4.5</STARS>□◊ □□□□□□□□<SHIPS> 111 Validation 112 Schemas – simple elements <?xml version="1.0" ?> • This time validate the XML document <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="TITLE" type="xs:string" /> <xs:element name="AUTHORS" type="xs:string" /> <xs:element name="FORMAT" type="xs:string" /> <xs:element name="SHIPS" type="xs:string" /> <xs:element name="STARS" type="xs:string" /> <xs:element name="LISTPRICE" type="xs:string" /> <xs:element name="OURPRICE" type="xs:string" /> <xs:element name="SAVE" type="xs:string" /> <xs:element name="USEDPRICE" type="xs:string" /> • But that means have to provide a schema file. 
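The whitespace text nodes described under "Why so many newlines?" can be observed directly. The following is a minimal sketch, not part of the original DOMProgram: the class name `WhitespaceNodes`, its helper methods, and the two-element BOOK fragment are made up for illustration. It parses a small indented fragment without a DTD and counts how many children of BOOK are elements and how many are whitespace-only text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are illustrative, not from the slides.
public class WhitespaceNodes {

    // Parse an XML string into a DOM Document using default parser settings.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder parser =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return parser.parse(new InputSource(new StringReader(xml)));
    }

    // Count direct children of 'parent' that are ELEMENT nodes.
    public static int countElementChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    // Count direct children of 'parent' that are text nodes holding only whitespace.
    public static int countWhitespaceChildren(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Indented input: the line breaks and spaces between tags become text nodes.
        String xml =
            "<BOOK>\n"
          + "    <STARS>4.5</STARS>\n"
          + "    <SHIPS>1</SHIPS>\n"
          + "</BOOK>";
        Node book = parse(xml).getDocumentElement();
        System.out.println("element children: " + countElementChildren(book));
        System.out.println("whitespace-only text children: "
            + countWhitespaceChildren(book));
    }
}
```

A traversal that should ignore layout whitespace can apply the same `trim().isEmpty()` check before printing a text node. That is one possible design choice; it would also hide any text content that is deliberately blank.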
• Example on schemas: http://www.xml.com/lpt/a/2000/11/29/schemas/part1.html

Schemas – element with attributes
    <!-- ISBN type; could do better by defining a restriction that code must be a string of digits -->
    <xs:element name="ISBN">
      <xs:complexType>
        <xs:attribute name="code" type="xs:string" />
      </xs:complexType>
    </xs:element>
• What I'm really saying is ISBN has an attribute but no content:
    <ISBN code="0130894680"/>

Schemas – structured element
    <xs:element name="BOOK">
      <xs:complexType>
        <!-- Not certain that data were always defined in same order! so try "all" rather than sequence -->
        <xs:all>
          <xs:element ref="ISBN" />
          <xs:element ref="TITLE" />
          <xs:element ref="AUTHORS" />
          <xs:element ref="FORMAT" />
          <xs:element ref="STARS" minOccurs="0" maxOccurs="1" />
          <xs:element ref="SHIPS" />
          <xs:element ref="LISTPRICE" minOccurs="0" maxOccurs="1" />
          <xs:element ref="OURPRICE" />
          <xs:element ref="SAVE" minOccurs="0" maxOccurs="1" />
          <xs:element ref="USEDPRICE" minOccurs="0" maxOccurs="1" />
        </xs:all>
        <xs:attribute name="rank" type="xs:string" />
      </xs:complexType>
    </xs:element>

Schemas – structured element
    <xs:element name="LIST">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="BOOK" minOccurs="1" maxOccurs="unbounded" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    </xs:schema>

Setup … get schema etc
    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.validation.*;
    import javax.xml.transform.dom.DOMSource;

    class … {
        … process(String xmlFile, String xsdFile) {
            SchemaFactory schFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = null;
            try {
                // Compile the XSD file into a reusable Schema object
                schema = schFactory.newSchema(new File(xsdFile));
            }
            catch (SAXException saxe) { … }
            Validator validator = schema.newValidator();

Parse XML file
            DocumentBuilderFactory factory = null;
            DocumentBuilder parser = null;
            try {
                factory = DocumentBuilderFactory.newInstance();
                // Validating a DOMSource needs a namespace-aware DOM, even when
                // the document declares no namespaces
                factory.setNamespaceAware(true);
                parser = factory.newDocumentBuilder();
            }
            catch (ParserConfigurationException pce) { … }
            Document document = null;
            try {
                document = parser.parse(xmlFile);
            }
            catch (SAXException sae) { … }
            catch (IOException ioe) { … }

Validate document
            try {
                // Throws SAXException if the document does not conform to the schema
                validator.validate(new DOMSource(document));
            }
            catch (SAXException sae2) { … }
            catch (IOException ioe2) { … }

Walk the tree – gather price data
    NodeList bookNodes = document.getElementsByTagName("BOOK");
    float minprice = Float.MAX_VALUE;
    float maxprice = 0.0f;
    int count = 0;
    float average = 0.0f;   // accumulates the sum of prices inside the loop
    for (int i = 0; i < bookNodes.getLength(); i++) { … }
    System.out.println("Price report - ");
    System.out.println("Minimum price $" + minprice);
    …

Walk the tree – gather price data
    for (int i = 0; i < bookNodes.getLength(); i++) {
        Element bNode = (Element) bookNodes.item(i);
        // OURPRICE is required by the schema, so item(0) exists in a valid document
        Node priceNode = bNode.getElementsByTagName("OURPRICE").item(0);
        String price = priceNode.getFirstChild().getNodeValue();
        float aprice = Float.parseFloat(price);
        minprice = (minprice < aprice) ? minprice : aprice;
        maxprice = (maxprice > aprice) ? maxprice : aprice;
        average += aprice;
        count++;
    }

Java books
    System.out.println("The popular Java books:");
    for (int i = 0; i < bookNodes.getLength(); i++) {
        Element bNode = (Element) bookNodes.item(i);
        NodeList temp = bNode.getElementsByTagName("STARS");
        if (temp.getLength() == 0) continue;   // A few books don't have star ratings
        Node starsNode = temp.item(0);
        String starCode = starsNode.getFirstChild().getNodeValue();
        float stars = Float.parseFloat(starCode);
        if (stars < 4.0) continue;
        Node titleNode = bNode.getElementsByTagName("TITLE").item(0);
        System.out.print(titleNode.getFirstChild().getNodeValue());
        …

Nodes
• Lots of different kinds of nodes
    – Document
    – Attribute
    – Element
    – Entity
    – TEXT
    – …

Trees
• Tree structure – large and complex!
• The tree for Books XML file
    – Root element “LIST”
        • Children include BOOK ELEMENT-NODES and other junk (maybe some nodes representing “ignorable white space characters” found between each book)
    – BOOK ELEMENT
        • Children elements
            - Attributes, and ELEMENT nodes for nested
    – Text nodes with content

Processing functions
• Lots of functions (repertoire varies with Node type)
    – Create iterator to walk through nodes
    – Create treewalker to explore entire subtree
    – Find element by tag name
    – …

Creating a new tree & output of XML
• Sometimes useful to create new XML document with own data
    – create DOM tree in memory,
        • create root node
        • add children nodes (and child nodes of child nodes)
        • add attribute nodes to other nodes
        • add text nodes for data
    – get a “transformer”
    – use transformer to output structure as XML text

Creating a new tree & output of XML
• An example at http://www.javazoom.net/services/newsletter/xmlgeneration.html
• Suppose wanted a “good Java books” XML output
    <goodjava>
      <goodie>
        <auth>Hu Mee</auth>
        <title>Wizard Web Sites with JSP</title>

Creating a new DOM tree
• Create an empty document
    Document newdoc = parser.newDocument();
• Create a root element for new doc
    Element root = newdoc.createElement("goodjava");
    // Attach the root to the document; otherwise the output is empty
    newdoc.appendChild(root);
• Add another child
    Element goodbook = newdoc.createElement("goodie");
    Element name = newdoc.createElement("auth");
    name.appendChild(newdoc.createTextNode("Hu Mee"));
    goodbook.appendChild(name);
    Element title = newdoc.createElement("title");
    title.appendChild(newdoc.createTextNode("Wizard Web Sites with JSP"));
    goodbook.appendChild(title);
    root.appendChild(goodbook);

Output as XML
    // Needs javax.xml.transform.* , javax.xml.transform.dom.DOMSource
    // and javax.xml.transform.stream.StreamResult
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    // Serialize the document built above
    DOMSource source = new DOMSource(newdoc);
    StreamResult result = new StreamResult(System.out);
    transformer.transform(source, result);

Why XML?
• Why?
    – Safe data exchange and processing
        • Avoids intimate contacts between data consumer and producer (and with all the viruses around these days, you really do want the safety factor).

</XML>

XML
• Producer
    – Dump data in a file with format defined by a DTD
• Consumer
    – Display data as desired
    – Extract specific data needed
    – Possibly, generate new data
        – in another XML document defined by another DTD
        – and return to original data source.
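The consumer role described above (extract specific data, then generate a new XML document) can be sketched by combining the earlier DOM steps: parse, search by tag name, build a new tree, and output it through a transformer. This is a sketch under stated assumptions rather than the course's own program. The class `GoodBooksConsumer`, its method names, and the second and third books in the sample input are hypothetical; only the element names and the Core Java entry come from the slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical consumer: reads a LIST of BOOKs, writes a <goodjava> document.
public class GoodBooksConsumer {

    // Text of the first <tag> element under 'parent', or null if there is none.
    public static String childText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0
            ? null
            : found.item(0).getTextContent().trim();
    }

    // Append <tag>text</tag> to 'parent'; does nothing when text is null.
    public static void addTextChild(Document doc, Element parent,
                                    String tag, String text) {
        if (text == null) return;
        Element child = doc.createElement(tag);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    // Keep books rated at least minStars and return the new document as XML text.
    public static String extractGoodBooks(String listXml, double minStars)
            throws Exception {
        DocumentBuilder parser =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document in = parser.parse(new InputSource(new StringReader(listXml)));

        // Build the output tree: root first, then one <goodie> per good book.
        Document out = parser.newDocument();
        Element root = out.createElement("goodjava");
        out.appendChild(root);

        NodeList books = in.getElementsByTagName("BOOK");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String stars = childText(book, "STARS");
            // Skip books with no rating or a rating below the threshold.
            if (stars == null || Double.parseDouble(stars) < minStars) continue;
            Element goodie = out.createElement("goodie");
            addTextChild(out, goodie, "auth", childText(book, "AUTHORS"));
            addTextChild(out, goodie, "title", childText(book, "TITLE"));
            root.appendChild(goodie);
        }

        // Serialize the new tree with an identity transformer.
        Transformer transformer =
            TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter text = new StringWriter();
        transformer.transform(new DOMSource(out), new StreamResult(text));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // First book is from the slides; the other two are made-up test data.
        String producerXml =
            "<LIST>"
          + "<BOOK><TITLE>Core Java 2, Volume 1: Fundamentals 5/e</TITLE>"
          + "<AUTHORS>Cay S. Horstmann, Gary Cornell</AUTHORS>"
          + "<STARS>4.5</STARS></BOOK>"
          + "<BOOK><TITLE>Hypothetical Low-Rated Book</TITLE>"
          + "<AUTHORS>A. N. Author</AUTHORS><STARS>3.0</STARS></BOOK>"
          + "<BOOK><TITLE>Hypothetical Unrated Book</TITLE>"
          + "<AUTHORS>A. N. Other</AUTHORS></BOOK>"
          + "</LIST>";
        System.out.println(extractGoodBooks(producerXml, 4.0));
    }
}
```

The producer and consumer share only the XML text and its agreed element names, which is the decoupling the slides argue for. A stricter consumer would validate the incoming document against the agreed schema before extracting anything.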
© Copyright 2024