A deficiency in the specification of Unicode Normalization Forms has been found. The consequence is that some strings can be normalized into different strings by different implementations. In other words, two different implementations may return different output for the same input (because the specification can be interpreted in more than one way). Further, an implementation invoked again on one of the output strings may return a different string (because one of the interpretations of the ambiguous specification makes normalization non-idempotent). Fortunately, only a select few character sequences exhibit this problem, and none of them are expected to occur in natural languages (due to different linguistic uses of the involved characters).
A full discussion of the problem may be found at:
http://www.unicode.org/review/pr-29.html
The PR29 functions below allow you to detect problem sequences. So when would you want to use these functions? For most applications, such as those using Nameprep for IDN, this is likely to be only an interoperability problem. Thus, you may not need to care about it, as the character sequences will rarely occur naturally. However, if you are using a profile, such as SASLPrep, to process authentication tokens, authorization tokens, or passwords, there is a real danger that attackers may try to use the peculiarities in these strings to attack parts of your system. As only a small number of strings, and no naturally occurring strings, exhibit this problem, the conservative approach of rejecting the strings is recommended. If this approach is not used, you should instead verify that all parts of your system that process the tokens and passwords use an NFKC implementation that produces the same output for the same input.
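The conservative "reject" policy described above can be sketched as a small wrapper. This is only an illustration under stated assumptions: `token_check_fn`, `token_is_acceptable`, and the two `demo_check_*` stand-ins are hypothetical names that are not part of the library, and the sketch relies solely on the convention that a checker returns zero when no problem was found and a non-zero value otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker signature: returns 0 when no problem was found,
   and any non-zero value when the input should not be trusted. */
typedef int (*token_check_fn)(const char *utf8);

/* Hypothetical policy helper: accept a token only if the checker
   reports 0.  Returns 1 for "accept" and 0 for "reject". */
int token_is_acceptable(const char *utf8, token_check_fn check)
{
    assert(utf8 != NULL && check != NULL);
    return check(utf8) == 0;
}

/* Stand-in checkers so the policy can be tried without the library.
   They are NOT library code: one never flags, the other always flags. */
int demo_check_clean(const char *utf8)
{
    (void) utf8;
    return 0;
}

int demo_check_flag_all(const char *utf8)
{
    (void) utf8;
    return 1;
}
```

In a program that includes pr29.h, the zero-terminated UTF-8 checker documented later in this chapter, pr29_8z(), would presumably be passed as the `check` argument, so that both problem sequences and conversion failures lead to rejection.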
Technically inclined readers may be interested in knowing more about the implementation aspects of the PR29 flaw. See PR29 discussion.
pr29.h
To use the functions explained in this chapter, you need to include the file pr29.h using:
#include <pr29.h>
in: input array of Unicode code points.
len: length of the input array, in code points.
Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.
Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations).
in: zero terminated array of Unicode code points.
Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.
Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations).
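The two code point interfaces described above differ only in how the end of the input is found: one takes an explicit length, the other stops at a zero code point. As a minimal sketch of that relationship, the helper below computes the length that a counted interface would need from a zero-terminated buffer. The name `ucs4_length` is hypothetical and not part of the library, and nothing here is claimed to mirror the library's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the code points that precede the
   terminating zero in a zero-terminated array of code points. */
size_t ucs4_length(const uint32_t *in)
{
    size_t len = 0;

    assert(in != NULL);
    while (in[len] != 0)
        len++;
    return len;
}
```

With such a helper, a caller holding a zero-terminated buffer could pass the buffer together with `ucs4_length(buffer)` to the counted variant. A caller whose data may legitimately contain a zero code point has to track the length itself and use the counted variant directly.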
in: zero terminated input UTF-8 string.
Check the input to see if it may be normalized into different strings by different NFKC implementations, due to an anomaly in the NFKC specifications.
Return value: Returns the Pr29_rc value PR29_SUCCESS on success, and PR29_PROBLEM if the input sequence is a "problem sequence" (i.e., may be normalized into different strings by different implementations), or PR29_STRINGPREP_ERROR if there was a problem converting the string from UTF-8 to UCS-4.
rc: a Pr29_rc return code.
Convert a return code integer to a text string. This string can be used to output a diagnostic message to the user.
PR29_SUCCESS: Successful operation. This value is guaranteed to always be zero; the remaining ones are only guaranteed to hold non-zero values, for logical comparison purposes.
PR29_PROBLEM: A problem sequence was encountered.
PR29_STRINGPREP_ERROR: The character set conversion failed (only for pr29_8() and pr29_8z()).
Return value: Returns a pointer to a statically allocated string containing a description of the error with the return code rc.