java.lang.Object
gnu.xml.util.XMLWriter
public class XMLWriter
extends java.lang.Object
implements ContentHandler, LexicalHandler, DTDHandler, DeclHandler
NSFilter is one solution to this problem, in the context of processing pipelines. Something as simple as connecting this handler to a parser might not generate the correct output. Another workaround is to ensure that the namespace-prefixes feature is always set to true, if you're hooking this directly up to some XMLReader implementation.
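The namespace-prefixes workaround mentioned above can be sketched with the SAX parser bundled in the JDK. This is a minimal illustration rather than part of this class: the class name NamespacePrefixesSketch and the helper attributeNames are hypothetical, and the sketch only shows that the standard SAX2 feature decides whether xmlns declarations reach a handler as attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class NamespacePrefixesSketch {
    // Standard SAX2 feature identifier.
    static final String NS_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Hypothetical helper: parses the text and returns the qualified names
    // of every attribute reported to startElement().
    static List<String> attributeNames(String xml, boolean nsPrefixes) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(NS_PREFIXES, nsPrefixes);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String name, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        names.add(atts.getQName(i));
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<a:doc xmlns:a='urn:x' id='1'/>";
        // With the feature cleared, the xmlns:a declaration is not reported
        // as an attribute; with it set, the declaration is reported too.
        System.out.println(attributeNames(xml, false));
        System.out.println(attributeNames(xml, true));
    }
}
```

A handler that writes out only the attributes it receives would lose the namespace declaration in the first case, which is presumably the kind of incorrect output the text above warns about.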
TextConsumer
public XMLWriter() throws IOException
Constructs this handler with System.out used to write SAX events using the UTF-8 encoding. Avoid using this except when you know it's safe to close System.out at the end of the document.
public XMLWriter(OutputStream out) throws IOException
Constructs a handler which writes all input to the output stream in the UTF-8 encoding, and closes it when endDocument is called. (Yes it's annoying that this throws an exception -- but there's really no way around it, since it's barely possible a JDK may exist somewhere that doesn't know how to emit UTF-8.)
public XMLWriter(Writer writer)
Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, and this class can determine the name of the character encoding for this writer, that encoding name will be included in the XML declaration. See the description of the constructor which takes an encoding name for important information about selection of encodings.
- Parameters:
writer
- XML text is written to this writer.
public XMLWriter(Writer writer, String encoding)
Constructs a handler which writes all input to the writer, and then closes the writer when the document ends. If an XML declaration is written onto the output, this class will use the specified encoding name in that declaration. If no encoding name is specified, no encoding name will be declared unless this class can otherwise determine the name of the character encoding for this writer. At this time, only the UTF-8 ("UTF8") and UTF-16 ("Unicode") output encodings are fully lossless with respect to XML data. If you use any other encoding you risk having your data be silently mangled on output, as the standard Java character encoding subsystem silently maps non-encodable characters to a question mark ("?") and will not report such errors to applications. For a few other encodings the risk can be reduced. If the writer is a java.io.OutputStreamWriter, and uses either the ISO-8859-1 ("8859_1", "ISO8859_1", etc) or US-ASCII ("ASCII") encodings, content which can't be encoded in those encodings will be written safely. Where relevant, the XHTML entity names will be used; otherwise, numeric character references will be emitted. However, there remain a number of cases where substituting such entity or character references is not an option. Such references are not usable within a DTD, comment, PI, or CDATA section. Neither may they be used when element, attribute, entity, or notation names have the problematic characters.
- Parameters:
writer
- XML text is written to this writer.
encoding
- if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.
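The encoding caveats above can be illustrated with plain JDK classes. The following is a sketch under stated assumptions, not this class's implementation: EncodingFallbackSketch, lossyRoundTrip, and escapeUnencodable are hypothetical names. It shows the silent "?" substitution performed by the standard encoding machinery, and one way numeric character references could be substituted in content for characters the target encoding cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class EncodingFallbackSketch {
    // Encodes and decodes the text with the given charset. Characters that
    // cannot be encoded are silently replaced, with no error reported.
    static String lossyRoundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Hypothetical helper: replaces each code point the charset cannot
    // encode with a decimal numeric character reference. It does not escape
    // markup characters such as & or <, and it is only meaningful for
    // content and attribute values, not for names, comments, PIs, or CDATA.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String s = new String(Character.toChars(cp));
            if (encoder.canEncode(s)) {
                out.append(s);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        // prints "caf?"
        System.out.println(lossyRoundTrip(text, StandardCharsets.US_ASCII));
        // prints "caf&#233;"
        System.out.println(escapeUnencodable(text, StandardCharsets.US_ASCII));
    }
}
```

With ISO-8859-1 as the target, the same input passes through escapeUnencodable unchanged, since that encoding can represent U+00E9 directly.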
public final void attributeDecl(String eName, String aName, String type, String mode, String value) throws SAXException
SAX2: called on attribute declarations
- Specified by:
- attributeDecl in interface DeclHandler
public final void characters(char[] ch, int start, int length) throws SAXException
SAX1: reports content characters
- Specified by:
- characters in interface ContentHandler
public final void comment(char[] ch, int start, int length) throws SAXException
SAX2: called when comments are parsed. When XHTML is used, the old HTML tradition of using comments for inline CSS or for JavaScript code is discouraged. This is because XML processors are encouraged to discard comments, on the grounds that comments are for users (and perhaps text editors), not programs. Instead, use external scripts.
- Specified by:
- comment in interface LexicalHandler
public final void elementDecl(String name, String model) throws SAXException
SAX2: called on element declarations
- Specified by:
- elementDecl in interface DeclHandler
public final void endCDATA() throws SAXException
SAX2: called after parsing CDATA characters
- Specified by:
- endCDATA in interface LexicalHandler
public final void endDTD() throws SAXException
SAX2: called after the doctype is parsed
- Specified by:
- endDTD in interface LexicalHandler
public void endDocument() throws SAXException
SAX1: indicates the completion of a parse. Note that all complete SAX event streams make this call, even if an error is reported during a parse.
- Specified by:
- endDocument in interface ContentHandler
public final void endElement(String uri, String localName, String qName) throws SAXException
SAX2: indicates the end of an element
- Specified by:
- endElement in interface ContentHandler
public final void endEntity(String name) throws SAXException
SAX2: called after parsing a general entity in content
- Specified by:
- endEntity in interface LexicalHandler
public final void endPrefixMapping(String prefix)
SAX2: ignored.
- Specified by:
- endPrefixMapping in interface ContentHandler
public final void externalEntityDecl(String name, String publicId, String systemId) throws SAXException
SAX2: called on external entity declarations
- Specified by:
- externalEntityDecl in interface DeclHandler
protected void fatal(String message, Exception e) throws SAXException
Used internally and by subclasses, this encapsulates the logic involved in reporting fatal errors. It uses locator information for good diagnostics, if available, and gives the application's ErrorHandler the opportunity to handle the error before throwing an exception.
public final void flush() throws IOException
Flushes the output stream. When this handler is used in long lived pipelines, it can be important to flush buffered state, for example so that it can reach the disk as part of a state checkpoint.
public final void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
SAX1: reports ignorable whitespace
- Specified by:
- ignorableWhitespace in interface ContentHandler
public final void internalEntityDecl(String name, String value) throws SAXException
SAX2: called on internal entity declarations
- Specified by:
- internalEntityDecl in interface DeclHandler
public final boolean isCanonical()
Returns value of flag controlling canonical output.
public final boolean isExpandingEntities()
Returns true if the output will have no entity references; returns false (the default) otherwise.
public final boolean isPrettyPrinting()
Returns value of flag controlling pretty printing.
public final boolean isXhtml()
Returns true if the output attempts to echo the input following "transitional" XHTML rules and matching the "HTML Compatibility Guidelines" so that an HTML version 3 browser can read the output as HTML; returns false (the default) otherwise.
public final void notationDecl(String name, String publicId, String systemId) throws SAXException
SAX1: called on notation declarations
- Specified by:
- notationDecl in interface DTDHandler
public final void processingInstruction(String target, String data) throws SAXException
SAX1: reports a PI. This doesn't check for illegal target names, such as "xml" or "XML", or namespace-incompatible ones like "big:dog"; the caller is responsible for ensuring those names are legal.
- Specified by:
- processingInstruction in interface ContentHandler
public final void setCanonical(boolean value)
Sets the output style to be canonicalized. Input events must meet requirements that are slightly more stringent than the basic well-formedness ones, and include:
- Namespace prefixes must not have been changed from those in the original document. (This may only be ensured by setting the SAX2 XMLReader namespace-prefixes feature flag; by default, it is cleared.)
- Redundant namespace declaration attributes have been removed. (If an ancestor element defines a namespace prefix and that declaration hasn't been overridden, an element must not redeclare it.)
- If comments are not to be included in the canonical output, they must first be removed from the input event stream; this handler writes Canonical XML with comments by default.
- If the input character encoding was not UCS-based, the character data must have been normalized using Unicode Normalization Form C. (UTF-8 and UTF-16 are UCS-based.)
- Attribute values must have been normalized, as is done by any conformant XML processor which processes all external parameter entities.
- Similarly, attribute value defaulting has been performed.
Note that fragments of XML documents, as specified by an XPath node set, may be canonicalized. In such cases, elements may need some fixup (for xml:* attributes and application-specific context).
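Of the requirements above, the Normalization Form C one can be met with the JDK's java.text.Normalizer before character data is passed to a handler. This is a small sketch, assuming the caller normalizes text itself; the class NfcSketch and its helper toNfc are hypothetical names.

```java
import java.text.Normalizer;

final class NfcSketch {
    // Returns the text in Unicode Normalization Form C.
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT (two chars).
        String decomposed = "e\u0301";
        String composed = toNfc(decomposed);
        // NFC composes the pair into the single character U+00E9.
        System.out.println(decomposed.length() + " -> " + composed.length());
        System.out.println(
            Normalizer.isNormalized(composed, Normalizer.Form.NFC));
    }
}
```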
public final void setDocumentLocator(Locator l)
SAX1: provides parser status information
- Specified by:
- setDocumentLocator in interface ContentHandler
public final void setEOL(String eolString)
Assigns the line ending style to be used on output.
- Parameters:
eolString
- null to use the system default; else "\n", "\r", or "\r\n".
public void setErrorHandler(ErrorHandler handler)
Assigns the error handler to be used to present most fatal errors.
public final void setExpandingEntities(boolean value)
Controls whether the output text contains references to entities (the default), or instead contains the expanded values of those entities.
public final void setPrettyPrinting(boolean value)
Controls pretty-printing, which by default is not enabled (and currently is most useful for XHTML output). Pretty printing enables structural indentation, sorting of attributes by name, line wrapping, and potentially other mechanisms for making output more or less readable. At this writing, structural indentation and line wrapping are enabled when pretty printing is enabled and the xml:space attribute has the value default (its other legal value is preserve, as defined in the XML specification). The three XHTML element types which use another value are recognized by their names (namespaces are ignored). Also, for the record, the "pretty" aspect of printing here is more to provide basic structure on outputs that would otherwise risk being a single long line of text. For now, expect the structure to be ragged ... unless you'd like to submit a patch to make this be more strictly formatted!
public final void setWriter(Writer writer, String encoding)
Resets the handler to write a new text document.
- Parameters:
writer
- XML text is written to this writer.
encoding
- if non-null, and an XML declaration is written, this is the name that will be used for the character encoding.
public final void setXhtml(boolean value)
Controls whether the output should attempt to follow the "transitional" XHTML rules so that it meets the "HTML Compatibility Guidelines" appendix in the XHTML specification. A "transitional" Document Type Declaration (DTD) is placed near the beginning of the output document, instead of whatever DTD would otherwise have been placed there, and XHTML empty elements are printed specially. When writing text in US-ASCII or ISO-8859-1 encodings, the predefined XHTML internal entity names are used (in preference to character references) when writing content characters which can't be expressed in those encodings. When this option is enabled, it is the caller's responsibility to ensure that the input is otherwise valid as XHTML. Things to be careful of in all cases, as described in the appendix referenced above, include:
- Element and attribute names must be in lower case, both in the document and in any CSS style sheet.
- All XML constructs must be valid as defined by the XHTML "transitional" DTD (including all familiar constructs, even deprecated ones).
- The root element must be "html".
- Elements that must be empty (such as <br>) must have no content.
- Use both lang and xml:lang attributes when specifying language.
- Similarly, use both id and name attributes when defining elements that may be referred to through URI fragment identifiers ... and make sure that the value is a legal NMTOKEN, since not all such HTML 4.0 identifiers are valid in XML.
- Be careful with character encodings; make sure you provide a <meta http-equiv="Content-type" content="text/xml;charset=..." /> element in the HTML "head" element, naming the same encoding used to create this handler. Also, if that encoding is anything other than US-ASCII, make sure that if the document is given a MIME content type, it has a charset=... attribute with that encoding.
Also, some characteristics of the resulting output may be a function of whether the document is later given a MIME content type of text/html rather than one indicating XML (application/xml or text/xml). Worse, some browsers ignore MIME content types and prefer to rely on URI name suffixes -- so an "index.xml" could always be XML, never XHTML, no matter its MIME type.
Additionally, some of the oldest browsers have additional quirks, to address with guidelines such as:
- Processing instructions may be rendered, so avoid them. (Similarly for an XML declaration.)
- Embedded style sheets and scripts should not contain XML markup delimiters: &, <, and ]]> are trouble.
- Attribute values should not have line breaks or multiple consecutive white space characters.
- Use no more than one of the deprecated (transitional) <isindex> elements.
- Some boolean attributes (such as compact, checked, disabled, readonly, selected, and more) confuse some browsers, since they only understand minimized versions which are illegal in XML.
public void skippedEntity(String name) throws SAXException
SAX1: indicates a non-expanded entity reference
- Specified by:
- skippedEntity in interface ContentHandler
public final void startCDATA() throws SAXException
SAX2: called before parsing CDATA characters
- Specified by:
- startCDATA in interface LexicalHandler
public final void startDTD(String name, String publicId, String systemId) throws SAXException
SAX2: called when the doctype is partially parsed. Note that this, like other doctype related calls, is ignored when XHTML is in use.
- Specified by:
- startDTD in interface LexicalHandler
public void startDocument() throws SAXException
SAX1: indicates the beginning of a document parse. If you're writing (well formed) fragments of XML, neither this nor endDocument should be called.
- Specified by:
- startDocument in interface ContentHandler
public final void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
SAX2: indicates the start of an element. When XHTML is in use, avoid attribute values with line breaks or multiple whitespace characters, since not all user agents handle them correctly.
- Specified by:
- startElement in interface ContentHandler
public final void startEntity(String name) throws SAXException
SAX2: called before parsing a general entity in content
- Specified by:
- startEntity in interface LexicalHandler
public final void startPrefixMapping(String prefix, String uri)
SAX2: ignored.
- Specified by:
- startPrefixMapping in interface ContentHandler
public final void unparsedEntityDecl(String name, String publicId, String systemId, String notationName) throws SAXException
SAX1: called on unparsed entity declarations
- Specified by:
- unparsedEntityDecl in interface DTDHandler
public final void write(String data) throws SAXException
Writes the string as if characters() had been called on the contents of the string. This is particularly useful when applications act as producers and write data directly to event consumers.
public void writeElement(String uri, String localName, String qName, Attributes atts, String content) throws SAXException
Writes an element that has content consisting of a single string.
public void writeElement(String uri, String localName, String qName, Attributes atts, int content) throws SAXException
Writes an element that has content consisting of a single integer, encoded as a decimal string.
public void writeEmptyElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Writes an empty element.
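The convenience methods above can be read as shorthand for a short sequence of ordinary SAX events. The sketch below shows one plausible expansion against any ContentHandler; it is an assumption about the event sequence, not a copy of this class's code. The class ElementEventsSketch and the helper recordIntElement are hypothetical, and a recording handler stands in for the writer so the example runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class ElementEventsSketch {
    // Element whose content is a single string: start, characters, end.
    static void writeElement(ContentHandler h, String uri, String localName,
                             String qName, Attributes atts, String content)
            throws SAXException {
        h.startElement(uri, localName, qName, atts);
        char[] ch = content.toCharArray();
        h.characters(ch, 0, ch.length);
        h.endElement(uri, localName, qName);
    }

    // Element whose content is an integer, encoded as a decimal string.
    static void writeElement(ContentHandler h, String uri, String localName,
                             String qName, Attributes atts, int content)
            throws SAXException {
        writeElement(h, uri, localName, qName, atts,
                     Integer.toString(content));
    }

    // Hypothetical helper: records the events produced for an int element.
    static List<String> recordIntElement(String elementName, int content) {
        List<String> events = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String name, Attributes atts) {
                events.add("start:" + name);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("chars:" + new String(ch, start, length));
            }
            @Override
            public void endElement(String uri, String localName, String name) {
                events.add("end:" + name);
            }
        };
        try {
            writeElement(recorder, "", elementName, elementName,
                         new AttributesImpl(), content);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [start:count, chars:42, end:count]
        System.out.println(recordIntElement("count", 42));
    }
}
```

Under the same reading, an empty element would be the start and end events with no characters call in between.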