javax.mail.internet
Class MimeUtility
java.lang.Object
javax.mail.internet.MimeUtility
public class MimeUtility
extends java.lang.Object
This is a utility class that provides various MIME-related functionality.
It includes a set of methods to encode and decode MIME headers as per
RFC 2047. A brief description of handling such headers is given below:
RFC 822 mail headers must contain only US-ASCII characters. Headers that
contain non-US-ASCII characters must be encoded so that they contain only
US-ASCII characters. This process uses either the "B" encoding (BASE64) or
the "Q" encoding (a variant of quoted-printable) to encode certain characters.
RFC 2047 describes this in detail.
In Java, Strings contain (16-bit) Unicode characters. ASCII is a subset of
Unicode (and occupies the range 0-127). A String that contains only ASCII
characters is already mail-safe. If the String contains non-US-ASCII
characters, it must be encoded. An additional complexity in this step is that,
since Unicode is not yet a widely used charset, one might want to first
charset-encode the String into another charset and then do the
transfer-encoding.
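The two steps above (charset-encoding, then transfer-encoding) can be sketched with the JDK alone. This is a minimal illustration, not JavaMail's implementation: it assumes the "B" (BASE64) encoding, takes the charset as an explicit parameter, and ignores the 75-character limit RFC 2047 places on a single encoded-word. `EncodedWordSketch` and `encodeB` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical sketch of building one RFC 2047 "B" encoded-word (not JavaMail code).
final class EncodedWordSketch {
    private EncodedWordSketch() {}

    static String encodeB(String text, Charset charset) {
        // Step 1: charset-encode the Unicode string into bytes.
        byte[] bytes = text.getBytes(charset);
        // Step 2: transfer-encode those bytes with BASE64.
        String encoded = Base64.getEncoder().encodeToString(bytes);
        // Wrap the result in the encoded-word syntax =?charset?B?encoded-text?=
        return "=?" + charset.name() + "?B?" + encoded + "?=";
    }
}
```

For example, `encodeB("\u00e9", StandardCharsets.UTF_8)` yields `=?UTF-8?B?w6k=?=`, which contains only US-ASCII characters.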
Note that to get the actual bytes of a mail-safe String (say, for sending
over SMTP), one must do:
byte[] bytes = string.getBytes("iso-8859-1");
The setHeader() and addHeader() methods on MimeMessage and MimeBodyPart
assume that the given header values are Unicode strings that contain only
US-ASCII characters. Hence the callers of those methods must ensure that the
values they pass do not contain non-US-ASCII characters. The methods in this
class help do this.
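A caller can check whether a value is already mail-safe before handing it to setHeader() or addHeader(). The sketch below assumes "mail-safe" means every char lies in the US-ASCII range 0-127, as described above; `AsciiCheck` and `isMailSafe` are hypothetical names, not part of JavaMail.

```java
// Hypothetical check (not JavaMail code): true if every char is US-ASCII (0-127).
final class AsciiCheck {
    private AsciiCheck() {}

    static boolean isMailSafe(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0x7F) {
                return false; // found a non-US-ASCII character
            }
        }
        return true;
    }
}
```

Values that fail this check still need encoding, for instance with encodeText().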
The getHeader() family of methods on MimeMessage and MimeBodyPart return
the raw header value. These values might be encoded as per RFC 2047, and if
so, must be decoded into Unicode Strings. The methods in this class help to
do this.
static InputStream | decode(InputStream is, String encoding) - Decode the given input stream.
static String | decodeText(String etext) - Decode "unstructured" headers, that is, headers that are defined as '*text' as per RFC 822.
static String | decodeWord(String text) - The string is parsed using the rules in RFC 2047 for parsing an "encoded-word".
static OutputStream | encode(OutputStream os, String encoding) - Wrap an encoder around the given output stream.
static OutputStream | encode(OutputStream os, String encoding, String filename) - Wrap an encoder around the given output stream.
static String | encodeText(String text) - Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.
static String | encodeText(String text, String charset, String encoding) - Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.
static String | encodeWord(String text) - Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.
static String | encodeWord(String text, String charset, String encoding) - Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.
static String | getDefaultJavaCharset() - Get the default charset corresponding to the system's current default locale.
static String | getEncoding(DataHandler dh) - Same as getEncoding(DataSource) except that instead of reading the data from an InputStream it uses the writeTo method to examine the data.
static String | getEncoding(DataSource ds) - Get the content-transfer-encoding that should be applied to the input stream of this datasource, to make it mail-safe.
static String | javaCharset(String charset) - Convert a MIME charset name into a valid Java charset name.
static String | mimeCharset(String charset) - Convert a Java charset into its MIME charset name.
static String | quote(String text, String specials) - A utility method to quote a word, if the word contains any characters from the specified 'specials' list.
decode
public static InputStream decode(InputStream is,
String encoding)
throws MessagingException
Decode the given input stream.
The returned input stream is the decoded input stream.
All the encodings defined in RFC 2045 are supported here.
They include "base64", "quoted-printable", "7bit", "8bit", and
"binary". In addition, "uuencode" is also supported.
Parameters:
is - input stream
encoding - the encoding of the stream.
Returns:
decoded input stream.
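A stdlib-only sketch of the stream wrapping described above, covering only "base64" and the identity encodings ("7bit", "8bit", "binary"). It is not JavaMail's implementation: `DecodeStreamSketch` is a hypothetical name, quoted-printable and uuencode are left out, and it throws IllegalArgumentException where the real method throws MessagingException.

```java
import java.io.InputStream;
import java.util.Base64;
import java.util.Locale;

// Hypothetical sketch of decode(InputStream, String) for a subset of encodings.
final class DecodeStreamSketch {
    private DecodeStreamSketch() {}

    static InputStream decode(InputStream is, String encoding) {
        switch (encoding.toLowerCase(Locale.ROOT)) {
            case "base64":
                // The MIME decoder skips line breaks and other non-alphabet characters.
                return Base64.getMimeDecoder().wrap(is);
            case "7bit":
            case "8bit":
            case "binary":
                // Identity encodings: the data passes through unchanged.
                return is;
            default:
                // quoted-printable and uuencode are not covered by this sketch.
                throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }
}
```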
decodeText
public static String decodeText(String etext)
throws UnsupportedEncodingException
Decode "unstructured" headers, that is, headers that are defined as '*text'
as per RFC 822.
The string is decoded using the algorithm specified in RFC 2047, Section
6.1. If the charset-conversion fails for any sequence, an
UnsupportedEncodingException is thrown. If the String is not an RFC 2047
style encoded header, it is returned as-is.
Example of usage:
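MimePart part = ...
String value = null;
try {
    // getHeader() returns null if the header is not present
    String[] rawvalues = part.getHeader("X-mailer");
    if (rawvalues != null) {
        String rawvalue = rawvalues[0];
        try {
            value = MimeUtility.decodeText(rawvalue);
        } catch (UnsupportedEncodingException e) {
            // Unknown charset: fall back to the raw value
            value = rawvalue;
        }
    }
} catch (MessagingException me) {
    // getHeader() failure: leave value as null
}
return value;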
Parameters:
etext - the possibly encoded value
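The decoding rule can be sketched as follows, assuming only "B" encoded-words appear: each encoded-word is BASE64-decoded and charset-converted, ordinary text is copied unchanged, and whitespace that separates two adjacent encoded-words is dropped, as RFC 2047 (Section 6.2) requires for display. `DecodeTextSketch` is hypothetical; unlike decodeText(), it reports an unknown charset with an unchecked exception from `Charset.forName` rather than UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of decoding "B" encoded-words in unstructured text (not JavaMail code).
final class DecodeTextSketch {
    // An encoded-word using the "B" encoding: =?charset?B?encoded-text?=
    private static final Pattern B_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?[Bb]\\?([A-Za-z0-9+/=]*)\\?=");

    private DecodeTextSketch() {}

    static String decodeText(String etext) {
        Matcher m = B_WORD.matcher(etext);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean afterEncodedWord = false;
        while (m.find()) {
            String gap = etext.substring(last, m.start());
            // Whitespace between two adjacent encoded-words is dropped.
            if (!(afterEncodedWord && gap.trim().isEmpty())) {
                out.append(gap);
            }
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            out.append(new String(bytes, Charset.forName(m.group(1))));
            last = m.end();
            afterEncodedWord = true;
        }
        out.append(etext, last, etext.length()); // copy any trailing plain text
        return out.toString();
    }
}
```

Text with no encoded-words passes through unchanged, matching the "returned as-is" behavior described above.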
decodeWord
public static String decodeWord(String text)
throws ParseException,
UnsupportedEncodingException
The string is parsed using the rules in RFC 2047 for parsing an
"encoded-word".
If the parse fails, a ParseException is thrown. Otherwise, it is
transfer-decoded, and then charset-converted into Unicode. If the
charset-conversion fails, an UnsupportedEncodingException is thrown.
Throws:
ParseException - if the string is not an encoded-word as per RFC 2047.
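A minimal sketch of the parse, transfer-decode, and charset-convert sequence for a single encoded-word, handling both "B" and "Q". `DecodeWordSketch` is a hypothetical name, not JavaMail code, and it signals a malformed word with IllegalArgumentException instead of ParseException.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical sketch of decoding one RFC 2047 encoded-word (not JavaMail code).
final class DecodeWordSketch {
    private DecodeWordSketch() {}

    static String decodeWord(String word) {
        // Parse: =?charset?encoding?encoded-text?=
        if (word.length() < 4 || !word.startsWith("=?") || !word.endsWith("?=")) {
            throw new IllegalArgumentException("not an encoded-word: " + word);
        }
        String[] parts = word.substring(2, word.length() - 2).split("\\?", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("not an encoded-word: " + word);
        }
        // Transfer-decode the encoded text into bytes.
        byte[] bytes;
        if (parts[1].equalsIgnoreCase("B")) {
            bytes = Base64.getDecoder().decode(parts[2]);
        } else if (parts[1].equalsIgnoreCase("Q")) {
            bytes = decodeQ(parts[2]);
        } else {
            throw new IllegalArgumentException("unknown encoding: " + parts[1]);
        }
        // Charset-convert the bytes into a Unicode string.
        return new String(bytes, Charset.forName(parts[0]));
    }

    // "Q": '_' stands for a space, "=XX" is a hex-encoded byte, anything else is literal.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                out.write(' ');
            } else if (c == '=') {
                if (i + 2 >= text.length()) {
                    throw new IllegalArgumentException("truncated escape in: " + text);
                }
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```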
encode
public static OutputStream encode(OutputStream os,
String encoding)
throws MessagingException
Wrap an encoder around the given output stream.
All the encodings defined in RFC 2045 are supported here.
They include "base64", "quoted-printable", "7bit", "8bit" and "binary".
In addition, "uuencode" is also supported.
Parameters:
os - output stream
encoding - the encoding of the stream.
Returns:
output stream that applies the specified encoding.
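A stdlib-only sketch of wrapping an encoder around an output stream, covering "base64" and the identity encodings only. `EncodeStreamSketch` is hypothetical, not JavaMail's implementation; it throws IllegalArgumentException where the real method throws MessagingException. With the JDK's BASE64 encoder stream, the caller must close the returned stream so the final padding is written.

```java
import java.io.OutputStream;
import java.util.Base64;
import java.util.Locale;

// Hypothetical sketch of encode(OutputStream, String) for a subset of encodings.
final class EncodeStreamSketch {
    private EncodeStreamSketch() {}

    static OutputStream encode(OutputStream os, String encoding) {
        switch (encoding.toLowerCase(Locale.ROOT)) {
            case "base64":
                // MIME encoder: output lines of at most 76 characters separated by CRLF.
                return Base64.getMimeEncoder().wrap(os);
            case "7bit":
            case "8bit":
            case "binary":
                // Identity encodings: bytes are written unchanged.
                return os;
            default:
                // quoted-printable and uuencode are not covered by this sketch.
                throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }
}
```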
encode
public static OutputStream encode(OutputStream os,
String encoding,
String filename)
throws MessagingException
Wrap an encoder around the given output stream.
All the encodings defined in RFC 2045 are supported here.
They include "base64", "quoted-printable", "7bit", "8bit" and "binary".
In addition, "uuencode" is also supported. The filename
parameter is used with the "uuencode" encoding and is included in the
encoded output.
Parameters:
os - output stream
encoding - the encoding of the stream.
filename - name for the file being encoded (only used with uuencode)
Returns:
output stream that applies the specified encoding.
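To show where the filename ends up, here is a sketch of the traditional uuencode layout: a `begin` line carrying a file mode and the name, data lines that each encode up to 45 bytes and start with a length character, then a zero-length line and `end`. The mode 644, the use of '`' for zero values, and the `UuencodeSketch` name are assumptions for illustration; this is not JavaMail's encoder, and the library's exact output may differ.

```java
// Hypothetical sketch of the uuencode format (not JavaMail code).
final class UuencodeSketch {
    private UuencodeSketch() {}

    // Map a 6-bit value to a printable char; 0 becomes '`' rather than a space.
    private static char enc(int v) {
        v &= 0x3F;
        return v == 0 ? '`' : (char) (v + 32);
    }

    static String uuencode(byte[] data, String filename) {
        StringBuilder sb = new StringBuilder();
        // Header line; the mode 644 is an assumed default.
        sb.append("begin 644 ").append(filename).append('\n');
        for (int off = 0; off < data.length; off += 45) {
            int n = Math.min(45, data.length - off);
            sb.append(enc(n)); // length prefix: number of data bytes on this line
            for (int i = 0; i < n; i += 3) {
                // Missing bytes in the final group are treated as zero.
                int b0 = data[off + i] & 0xFF;
                int b1 = i + 1 < n ? data[off + i + 1] & 0xFF : 0;
                int b2 = i + 2 < n ? data[off + i + 2] & 0xFF : 0;
                sb.append(enc(b0 >> 2))
                  .append(enc((b0 << 4) | (b1 >> 4)))
                  .append(enc((b1 << 2) | (b2 >> 6)))
                  .append(enc(b2));
            }
            sb.append('\n');
        }
        sb.append("`\nend\n"); // zero-length line, then the trailer
        return sb.toString();
    }
}
```

For the three bytes of "Cat" and the filename `cat.txt`, the sketch produces `begin 644 cat.txt`, the data line `#0V%T`, a line holding only '`', and `end`.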
encodeText
public static String encodeText(String text)
throws UnsupportedEncodingException
Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.
The given Unicode string is examined for non US-ASCII characters. If the
string contains only US-ASCII characters, it is returned as-is. If the
string contains non US-ASCII characters, it is first character-encoded
using the platform's default charset, then transfer-encoded using either
the B or Q encoding. The resulting bytes are then returned as a Unicode
string containing only ASCII characters.
Note that this method should be used to encode only "unstructured"
RFC 822 headers.
Example of usage:
MimePart part = ...
String rawvalue = "FooBar Mailer, Japanese version 1.1";
try {
    // If we know for sure that rawvalue contains only US-ASCII
    // characters, we can skip the encoding part
    part.setHeader("X-mailer", MimeUtility.encodeText(rawvalue));
} catch (UnsupportedEncodingException e) {
    // encoding failure
} catch (MessagingException me) {
    // setHeader() failure
}
Parameters:
text - Unicode string
Returns:
Unicode string containing only US-ASCII characters
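The pass-through-or-encode behavior can be sketched with the "Q" encoding. Assumptions, all for illustration: the charset is passed explicitly rather than taken from the platform default, only letters and digits stay unescaped (RFC 2047 permits more, this is the conservative choice), and folding of long values is omitted. `QEncodeSketch` is a hypothetical name, not JavaMail code.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of encodeText() using the "Q" encoding (not JavaMail code).
final class QEncodeSketch {
    private QEncodeSketch() {}

    static String encodeText(String text, Charset charset) {
        // A string made only of US-ASCII characters is returned unchanged.
        if (text.chars().allMatch(c -> c < 0x80)) {
            return text;
        }
        StringBuilder sb = new StringBuilder("=?").append(charset.name()).append("?Q?");
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;
            if (v == ' ') {
                sb.append('_'); // "Q" writes a space as an underscore
            } else if ((v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9')) {
                sb.append((char) v); // letters and digits stay literal
            } else {
                sb.append(String.format("=%02X", v)); // everything else becomes =XX
            }
        }
        return sb.append("?=").toString();
    }
}
```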
encodeText
public static String encodeText(String text,
String charset,
String encoding)
throws UnsupportedEncodingException
Encode an RFC 822 "text" token into mail-safe form as per RFC 2047.
The given Unicode string is examined for non-US-ASCII characters. If the
string contains only US-ASCII characters, it is returned as-is. If the
string contains non-US-ASCII characters, it is first character-encoded
using the specified charset (or the platform's default charset if none is
given), then transfer-encoded using either the B or Q encoding. The
resulting bytes are then returned as a Unicode string containing only
ASCII characters.
Note that this method should be used to encode only "unstructured"
RFC 822 headers.
Parameters:
text - the header value
charset - the charset. If this parameter is null, the platform's
default charset is used.
encoding - the encoding to be used.
Currently supported values are "B" and "Q".
If this parameter is null, then the "Q" encoding is used if most of the
characters to be encoded are in the ASCII charset, otherwise "B"
encoding is used.
Returns:
Unicode string containing only US-ASCII characters
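The rule for a null `encoding` argument can be sketched as a simple majority count. `EncodingChoiceSketch` is hypothetical and counts UTF-16 chars; how JavaMail actually counts is not stated above. When ASCII and non-ASCII characters are equal in number, "most" does not hold, so this sketch picks "B".

```java
// Hypothetical heuristic for a null encoding argument (not JavaMail code).
final class EncodingChoiceSketch {
    private EncodingChoiceSketch() {}

    static String chooseEncoding(String text) {
        int ascii = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) < 0x80) {
                ascii++;
            }
        }
        // "Q" when most characters are ASCII, otherwise "B".
        return ascii * 2 > text.length() ? "Q" : "B";
    }
}
```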
encodeWord
public static String encodeWord(String text)
throws UnsupportedEncodingException
Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.
The given Unicode string is examined for non-US-ASCII characters.
If the string contains only US-ASCII characters, it is returned as-is.
If the string contains non-US-ASCII characters, it is first
character-encoded using the platform's default charset, then
transfer-encoded using either the B or Q encoding.
The resulting bytes are then returned as a Unicode string containing
only ASCII characters.
This method is meant to be used when creating RFC 822 "phrases". The
InternetAddress class, for example, uses this to encode its 'phrase'
component.
Parameters:
text - Unicode string
Returns:
Unicode string containing only US-ASCII characters.
encodeWord
public static String encodeWord(String text,
String charset,
String encoding)
throws UnsupportedEncodingException
Encode an RFC 822 "word" token into mail-safe form as per RFC 2047.
The given Unicode string is examined for non-US-ASCII characters.
If the string contains only US-ASCII characters, it is returned as-is.
If the string contains non-US-ASCII characters, it is first
character-encoded using the specified charset, then
transfer-encoded using either the B or Q encoding.
The resulting bytes are then returned as a Unicode string containing
only ASCII characters.
Parameters:
text - Unicode string
charset - the MIME charset
encoding - the encoding to be used.
Currently supported values are "B" and "Q".
If this parameter is null, then the "Q" encoding is used if most of the
characters to be encoded are in the ASCII charset, otherwise "B"
encoding is used.
Returns:
Unicode string containing only US-ASCII characters
getDefaultJavaCharset
public static String getDefaultJavaCharset()
Get the default charset corresponding to the system's current default
locale.
Returns:
the default charset of the system's default locale, as a Java charset.
getEncoding
public static String getEncoding(DataHandler dh)
Same as getEncoding(DataSource) except that instead of reading the data
from an InputStream it uses the writeTo method to examine the data.
This is more efficient in the common case of a DataHandler created
with an object and a MIME type (for example, a "text/plain" String)
because all the I/O is done in this thread.
In the case requiring an InputStream, the DataHandler uses a thread,
a pair of pipe streams, and the writeTo method to produce the data.
getEncoding
public static String getEncoding(DataSource ds)
Get the content-transfer-encoding that should be applied to the input
stream of this datasource, to make it mail-safe.
The algorithm used here is:
- If the primary type of this datasource is "text" and if all the bytes
in its input stream are US-ASCII, then the encoding is "7bit". If more
than half of the bytes are non-US-ASCII, then the encoding is "base64".
If fewer than half of the bytes are non-US-ASCII, then the encoding is
"quoted-printable".
- If the primary type of this datasource is not "text", then if all the
bytes of its input stream are US-ASCII, the encoding is "7bit". If
there is even one non-US-ASCII byte, the encoding is "base64".
Parameters:
ds - DataSource
Returns:
the encoding. This is either "7bit", "quoted-printable" or "base64"
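The selection rules above can be sketched as a pure function over the primary type and the data. `TransferEncodingSketch` is hypothetical: it takes a byte array instead of reading a DataSource stream, and since the text does not say what happens when exactly half the bytes are non-US-ASCII, this sketch picks "quoted-printable" in that case.

```java
// Hypothetical sketch of the getEncoding(DataSource) rules (not JavaMail code).
final class TransferEncodingSketch {
    private TransferEncodingSketch() {}

    static String chooseEncoding(String primaryType, byte[] data) {
        int nonAscii = 0;
        for (byte b : data) {
            if ((b & 0xFF) > 0x7F) {
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return "7bit"; // every byte is US-ASCII
        }
        if (!"text".equalsIgnoreCase(primaryType)) {
            return "base64"; // non-text data with at least one non-US-ASCII byte
        }
        // Text: base64 if more than half the bytes are non-US-ASCII, else quoted-printable.
        return nonAscii * 2 > data.length ? "base64" : "quoted-printable";
    }
}
```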
javaCharset
public static String javaCharset(String charset)
Convert a MIME charset name into a valid Java charset name.
Parameters:
charset - the MIME charset name
Returns:
the Java charset equivalent. If a suitable mapping is not available,
the passed-in charset is itself returned.
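On current JDKs, one plausible way to approximate this mapping is to let `java.nio.charset.Charset` resolve the name through its alias table and return the canonical name, falling back to the input as described above. This is an assumption about a reasonable approach, not how JavaMail implements javaCharset(); `JavaCharsetSketch` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical sketch of javaCharset() built on the JDK alias table (not JavaMail code).
final class JavaCharsetSketch {
    private JavaCharsetSketch() {}

    static String javaCharset(String charset) {
        try {
            // Resolve an alias such as "utf8" or "iso-8859-1" to the JDK's canonical name.
            return Charset.forName(charset).name();
        } catch (IllegalArgumentException e) {
            // Unknown or malformed name: return the input unchanged.
            return charset;
        }
    }
}
```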
mimeCharset
public static String mimeCharset(String charset)
Convert a Java charset into its MIME charset name.
Note that a future version of JDK (post 1.2) might provide this
functionality, in which case, we may deprecate this method then.
Parameters:
charset - the JDK charset
Returns:
the MIME/IANA equivalent. If a mapping is not possible, the passed-in
charset itself is returned.
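This direction can be sketched as a lookup table with the pass-through fallback described above. The table is hypothetical and deliberately tiny; the legacy Java names ISO8859_1, UTF8 and SJIS serve only as examples, and `MimeCharsetSketch` is not JavaMail code.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical Java-to-MIME charset name lookup (not JavaMail's mapping).
final class MimeCharsetSketch {
    private static final Map<String, String> JAVA_TO_MIME = Map.of(
            "iso8859_1", "ISO-8859-1",
            "utf8", "UTF-8",
            "sjis", "Shift_JIS");

    private MimeCharsetSketch() {}

    static String mimeCharset(String charset) {
        String mapped = JAVA_TO_MIME.get(charset.toLowerCase(Locale.ROOT));
        // Without a mapping, the passed-in name is returned unchanged.
        return mapped != null ? mapped : charset;
    }
}
```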
quote
public static String quote(String text,
String specials)
A utility method to quote a word, if the word contains any characters
from the specified 'specials' list.
The HeaderTokenizer class defines two special sets of delimiters:
MIME and RFC 822.
This method is typically used during the generation of RFC 822 and MIME
header fields.
Parameters:
specials - the set of special characters
Returns:
the possibly quoted word
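The quoting step can be sketched as follows: a word containing any character from `specials` is wrapped in double quotes, with embedded `"` and `\` backslash-escaped as RFC 822 quoted-strings require. `QuoteSketch` is hypothetical, and the specials string in the test is written out by hand rather than taken from HeaderTokenizer.

```java
// Hypothetical sketch of quote(String, String) (not JavaMail code).
final class QuoteSketch {
    private QuoteSketch() {}

    static String quote(String text, String specials) {
        boolean needsQuoting = false;
        for (int i = 0; i < text.length(); i++) {
            if (specials.indexOf(text.charAt(i)) >= 0) {
                needsQuoting = true;
                break;
            }
        }
        if (!needsQuoting) {
            return text; // no special characters: return the word unchanged
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\'); // escape quotes and backslashes inside the quoted-string
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```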