
XML::DOM



NAME

XML::DOM - A perl module for building DOM Level 1 compliant document structures


SYNOPSIS

 use strict;
 use warnings;
 use XML::DOM;
 my $parser = XML::DOM::Parser->new;
 my $doc = $parser->parsefile ("file.xml");
 # print all HREF attributes of all CODEBASE elements
 my $nodes = $doc->getElementsByTagName ("CODEBASE");
 my $n = $nodes->getLength;
 for (my $i = 0; $i < $n; $i++)
 {
     my $node = $nodes->item ($i);
     my $href = $node->getAttributeNode ("HREF");
     next unless defined $href;    # skip CODEBASE elements without an HREF
     print $href->getValue . "\n";
 }
 # Write the document to a file
 $doc->printToFile ("out.xml");
 # Print to string
 print $doc->toString;
 # Avoid memory leaks - clean up circular references for garbage collection
 $doc->dispose;


DESCRIPTION

This module extends the XML::Parser module by Clark Cooper. The XML::Parser module is built on top of XML::Parser::Expat, which is a lower level interface to James Clark's expat library.

XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data structure that conforms to the API of the Document Object Model as described at http://www.w3.org/TR/REC-DOM-Level-1. See the XML::Parser manpage for other available features of the XML::DOM::Parser class. Note that the 'Style' property should not be used (it is set internally.)

The XML::Parser NoExpand option is more or less supported, in that it will generate EntityReference objects whenever an entity reference is encountered in character data. I'm not sure how useful this is. Any comments are welcome.

As described in the synopsis, when you create an XML::DOM::Parser object, the parse and parsefile methods create an XML::DOM::Document object from the specified input. This Document object can then be examined, modified and written back out to a file or converted to a string.
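A minimal sketch of that round trip, assuming a made-up inline document (the config/server names and the port values are illustrative only). It uses parse on a string rather than parsefile on a file, then reads an attribute, changes it and serializes the result with toString:

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input document, inlined so the example needs no file.
my $xml = '<config><server host="localhost" port="8080"/></config>';

my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse($xml);      # parse a string; parsefile() reads a file

# Examine: find the (only) server element and read one of its attributes.
my $servers = $doc->getElementsByTagName("server");
my $server  = $servers->item(0);
my $port    = $server->getAttribute("port");

# Modify: change the attribute in place.
$server->setAttribute("port", "9090");

# Write the modified document back out as a string.
my $out = $doc->toString;
print $out;

# Break circular references so the tree can be garbage collected.
$doc->dispose;
```

Replace the parse call with parsefile ("file.xml") to read from disk instead, as in the synopsis.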

When using XML::DOM with XML::Parser version 2.19 and up, setting the XML::DOM::Parser option KeepCDATA to 1 will store CDATASections in CDATASection nodes, instead of converting them to Text nodes. Subsequent CDATASection nodes will be merged into one. Let me know if this is a problem.
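A sketch comparing the two behaviours on a hypothetical inline document. The child_types subroutine is a helper written for this example, not part of XML::DOM, and the example assumes XML::Parser 2.19 or later, as the paragraph above requires:

```perl
use strict;
use warnings;
use XML::DOM;   # exports TEXT_NODE, CDATA_SECTION_NODE, ...

my $xml = '<script><![CDATA[if (a < b) { go(); }]]></script>';

# Hypothetical helper: node types of an element's children, in document order.
sub child_types {
    my ($elem) = @_;
    my @types;
    for (my $kid = $elem->getFirstChild; $kid; $kid = $kid->getNextSibling) {
        push @types, $kid->getNodeType;
    }
    return @types;
}

# Default: the CDATA section is converted to an ordinary Text node.
my $plain_doc   = XML::DOM::Parser->new->parse($xml);
my @plain_types = child_types($plain_doc->getDocumentElement);

# KeepCDATA => 1: the section is stored as a CDATASection node instead.
my $cdata_doc   = XML::DOM::Parser->new(KeepCDATA => 1)->parse($xml);
my @cdata_types = child_types($cdata_doc->getDocumentElement);

# Prints numeric node types (3 is TEXT_NODE, 4 is CDATA_SECTION_NODE).
print "default:   @plain_types\n";
print "KeepCDATA: @cdata_types\n";

$_->dispose for $plain_doc, $cdata_doc;
```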

When using XML::Parser 2.27 and above, you can suppress expansion of parameter entity references (e.g. %pent;) in the DTD, by setting ParseParamEnt to 1 and ExpandParamEnt to 0. See Hidden Nodes for details.

A Document has a tree structure consisting of Node objects. A Node may contain other nodes, depending on its type. A Document may have Element, Text, Comment, and CDATASection nodes. Element nodes may have Attr, Element, Text, Comment, and CDATASection nodes. The other nodes may not have any child nodes.
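The sketch below walks such a tree depth-first using getFirstChild and getNextSibling, labelling each node by its type constant. The input document and the walk subroutine are invented for illustration:

```perl
use strict;
use warnings;
use XML::DOM;   # also exports the node-type constants (ELEMENT_NODE etc.)

my $doc = XML::DOM::Parser->new->parse(
    '<doc><!-- note --><item id="1">text</item></doc>');

my @seen;   # node types in depth-first (document) order

# Hypothetical helper: print a node and, recursively, its children.
# Attr nodes are reached via getAttributes rather than the child list,
# so this walk does not visit them.
sub walk {
    my ($node, $depth) = @_;
    my $type = $node->getNodeType;
    push @seen, $type;
    my $label = $type == ELEMENT_NODE() ? "<" . $node->getTagName . ">"
              : $type == TEXT_NODE()    ? "text: " . $node->getNodeValue
              : $type == COMMENT_NODE() ? "comment: " . $node->getNodeValue
              :                           "node type $type";
    print "  " x $depth, $label, "\n";
    for (my $kid = $node->getFirstChild; $kid; $kid = $kid->getNextSibling) {
        walk($kid, $depth + 1);
    }
}

walk($doc->getDocumentElement, 0);
$doc->dispose;
```

Element, Text and Comment are the only types this small input produces; any other type falls through to the generic label.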

This module adds several node types that are not part of the DOM spec (yet.) These are: ElementDecl (for <!ELEMENT ...> declarations), AttlistDecl (for <!ATTLIST ...> declarations), XMLDecl (for <?xml ...?> declarations) and AttDef (for attribute definitions in an AttlistDecl.)


XML::DOM Classes

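The XML::DOM module stores XML documents in a tree structure with a root node of type XML::DOM::Document. Different nodes in the tree represent different parts of the XML file. The DOM Level 1 Specification defines the following node types: Element, Attr, Text, CDATASection, EntityReference, Entity, ProcessingInstruction, Comment, Document, DocumentType, DocumentFragment and Notation (e.g. XML::DOM::Element).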

In addition, the XML::DOM module contains the following nodes that are not part of the DOM Level 1 Specification: ElementDecl, AttlistDecl, AttDef and XMLDecl.

Other classes that are part of the DOM Level 1 Spec include XML::DOM::NodeList (returned by getElementsByTagName) and XML::DOM::NamedNodeMap.

Other classes that are not part of the DOM Level 1 Spec include XML::DOM::Parser, described above.


XML::DOM package

Constant definitions

The following predefined constants indicate which type of node it is.

 UNKNOWN_NODE (0)                The node type is unknown (not part of DOM)
 ELEMENT_NODE (1)                The node is an Element.
 ATTRIBUTE_NODE (2)              The node is an Attr.
 TEXT_NODE (3)                   The node is a Text node.
 CDATA_SECTION_NODE (4)          The node is a CDATASection.
 ENTITY_REFERENCE_NODE (5)       The node is an EntityReference.
 ENTITY_NODE (6)                 The node is an Entity.
 PROCESSING_INSTRUCTION_NODE (7) The node is a ProcessingInstruction.
 COMMENT_NODE (8)                The node is a Comment.
 DOCUMENT_NODE (9)               The node is a Document.
 DOCUMENT_TYPE_NODE (10)         The node is a DocumentType.
 DOCUMENT_FRAGMENT_NODE (11)     The node is a DocumentFragment.
 NOTATION_NODE (12)              The node is a Notation.
 ELEMENT_DECL_NODE (13)          The node is an ElementDecl (not part of DOM)
 ATT_DEF_NODE (14)               The node is an AttDef (not part of DOM)
 XML_DECL_NODE (15)              The node is an XMLDecl (not part of DOM)
 ATTLIST_DECL_NODE (16)          The node is an AttlistDecl (not part of DOM)

Usage:

   if ($node->getNodeType == ELEMENT_NODE)
   {
       print "It's an Element\n";
   }

Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never encounter it. The last 4 node types were added to support the 4 added node classes.

Global Variables

$VERSION

The variable $XML::DOM::VERSION contains the version number of this implementation, e.g. "1.43".

METHODS

These methods are not part of the DOM Level 1 Specification.

getIgnoreReadOnly and ignoreReadOnly (readOnly)

The DOM Level 1 Spec does not allow you to edit certain sections of the document, e.g. the DocumentType, so by default this implementation throws DOMExceptions (i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to edit a readonly node. These readonly checks can be disabled by (temporarily) setting the global IgnoreReadOnly flag.

The ignoreReadOnly method sets the global IgnoreReadOnly flag and returns its previous value. The getIgnoreReadOnly method simply returns its current value.

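 my $oldIgnore = XML::DOM::ignoreReadOnly (1);
 eval {
     # ... do whatever you want; eval catches any other exceptions ...
 };
 my $error = $@;                            # save it before restoring the flag
 XML::DOM::ignoreReadOnly ($oldIgnore);     # restore previous value
 die $error if $error;                      # then re-throw any other exception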

Another way to do it, using a local variable:

 { # start new scope
    local $XML::DOM::IgnoreReadOnly = 1;
    # ... do whatever you want, don't worry about exceptions ...
 } # end of scope ($IgnoreReadOnly is set back to its previous value)

isValidName (name)

Returns true if the specified name is a valid "Name" as defined in the XML spec. Characters with Unicode values > 127 are also supported.
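A quick check of a few candidate names. The names are arbitrary examples, and the grammar summary in the comment is a rough simplification of the XML spec's Name production:

```perl
use strict;
use warnings;
use XML::DOM;

# Roughly: a Name starts with a letter, "_" or ":", followed by
# letters, digits, ".", "-", "_" or ":".
my @names = ("item", "_private", "my-tag.v2", "2nd", "has space");

my %valid;
for my $name (@names) {
    $valid{$name} = XML::DOM::isValidName($name) ? 1 : 0;
    printf "%-12s %s\n", $name, $valid{$name} ? "valid" : "invalid";
}
```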

getAllowReservedNames and allowReservedNames (boolean)

The first method returns whether reserved names are allowed. The second takes a boolean argument and sets whether reserved names are allowed. The initial value is 1 (i.e. allow reserved names.)

The XML spec states that "Names" starting with (X|x)(M|m)(L|l) are reserved for future use. (Amusingly enough, the XML version of the XML spec (REC-xml-19980210.xml) breaks that very rule by defining an ENTITY with the name 'xmlpio'.) A "Name" in this context means the Name token as found in the BNF rules in the XML spec.

XML::DOM only checks for errors when you modify the DOM tree, not when the DOM tree is built by the XML::DOM::Parser.
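A sketch of the save, change and restore pattern for this global setting. It reads the old value with getAllowReservedNames rather than relying on the return value of allowReservedNames, which this page does not document:

```perl
use strict;
use warnings;
use XML::DOM;

# Save the current setting (initially 1, i.e. reserved names allowed).
my $saved = XML::DOM::getAllowReservedNames();

# Disallow names starting with (X|x)(M|m)(L|l) while editing the tree.
XML::DOM::allowReservedNames(0);
my $while_disabled = XML::DOM::getAllowReservedNames();

# ... modify the DOM tree here; the check applies to modifications,
#     not to trees built by XML::DOM::Parser ...

# Restore the previous setting.
XML::DOM::allowReservedNames($saved);
my $after_restore = XML::DOM::getAllowReservedNames();
```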

setTagCompression (funcref)

There are 3 possible styles for printing empty Element tags:

Style 0
 <empty/> or <empty attr="val"/>

XML::DOM uses this style by default for all Elements.

Style 1
 <empty></empty> or <empty attr="val"></empty>

Style 2
 <empty /> or <empty attr="val" />

This style is sometimes desired when using XHTML. (Note the extra space before the slash "/".) See http://www.w3.org/TR/xhtml1 Appendix C for more details.

By default XML::DOM compresses all empty Element tags (style 0.) You can control which style is used for a particular Element by calling XML::DOM::setTagCompression with a reference to a function that takes 2 arguments. The first is the tag name of the Element, the second is the XML::DOM::Element that is being printed. The function should return 0, 1 or 2 to indicate which style should be used to print the empty tag. E.g.

 XML::DOM::setTagCompression (\&my_tag_compression);
 sub my_tag_compression
 {
    my ($tag, $elem) = @_;
    # Print empty br, hr and img tags like this: <br />
    return 2 if $tag =~ /^(br|hr|img)$/;
    # Print other empty tags like this: <empty></empty>
    return 1;
 }
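To see the effect, the sketch below installs the same policy as an anonymous sub and prints a small, made-up document. Because the setting is global, it finishes by installing a function that always returns 0, which restores the default style described above:

```perl
use strict;
use warnings;
use XML::DOM;

# Same policy as my_tag_compression: style 2 for br/hr/img, style 1 otherwise.
XML::DOM::setTagCompression(sub {
    my ($tag, $elem) = @_;
    return 2 if $tag =~ /^(br|hr|img)$/;
    return 1;
});

my $doc = XML::DOM::Parser->new->parse('<body><br/><p/><img src="a.png"/></body>');
my $out = $doc->getDocumentElement->toString;
print "$out\n";

# Back to the default (style 0) for all elements.
XML::DOM::setTagCompression(sub { return 0 });
$doc->dispose;
```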


IMPLEMENTATION DETAILS


SEE ALSO

the XML::DOM::XPath manpage

The Japanese version of this document by Takanori Kawai (Hippo2000) at http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm

The DOM Level 1 specification at http://www.w3.org/TR/REC-DOM-Level-1

The XML spec (Extensible Markup Language 1.0) at http://www.w3.org/TR/REC-xml

The XML::Parser and XML::Parser::Expat manual pages.

XML::LibXML (see the XML::LibXML manpage) also provides a DOM parser. It is significantly faster than XML::DOM and is under active development, but it requires the GNOME libxml library.

XML::GDOME (see the XML::GDOME manpage) will provide the DOM Level 2 Core API. It should be as fast as XML::LibXML, but more robust, since it uses the memory management functions of libgdome. For more details see http://tjmather.com/xml-gdome/


CAVEATS

The method getElementsByTagName() does not return a "live" NodeList, i.e. the returned list does not reflect later changes to the document. Whether this is an actual caveat is debatable, but a few people on the www-dom mailing list seemed to think so. I haven't decided yet. It's a pain to implement, it slows things down and the benefits seem marginal. Let me know what you think.
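A small sketch of what "not live" means in practice, using an invented list/item document: a NodeList obtained before a change keeps its old length, and a fresh call to getElementsByTagName is needed to see the new element:

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<list><item/><item/></list>');
my $root = $doc->getDocumentElement;

# Snapshot of the matching elements at this moment.
my $items  = $doc->getElementsByTagName("item");
my $before = $items->getLength;

# Add a third item to the tree.
$root->appendChild($doc->createElement("item"));

# The earlier NodeList is not live, so its length is unchanged ...
my $stale = $items->getLength;

# ... while a fresh call sees the new element.
my $fresh_items = $doc->getElementsByTagName("item");
my $fresh       = $fresh_items->getLength;

print "before=$before stale=$stale fresh=$fresh\n";
$doc->dispose;
```

If code needs an up-to-date list after modifying the tree, it should simply call getElementsByTagName again.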


AUTHOR

Enno Derksen is the original author.

Send patches to T.J. Mather at <tjmather@maxmind.com>.

Paid support is available directly from the maintainers of this package. Please see http://www.maxmind.com/app/opensourceservices for more details.

Thanks to Clark Cooper for his help with the initial version.