idna2008 is a ground-up, pure-Haskell implementation of
Internationalized Domain Names in Applications (IDNA 2008):
- RFC 5891 (Protocol)
- RFC 5892 (Tables)
- RFC 5893 (Right-to-Left Scripts / Bidi)
- RFC 5895 (Mappings)
- RFC 3492 (Punycode)
Codepoint tables are derived from the Unicode Character Database
(17.0.0) by an in-tree generator script, so later Unicode editions can
be swapped in easily. There are no external C dependencies:
neither libicu nor libidn is required.
The motivations for this implementation were:
- To implement IDNA2008 faithfully, without some of the
  questionable additions from UTS #46.
- To model the transformations as either parsing of textual
  inputs to a ShortByteString DNS wire-form, or decoding
  from a DNS wire-form to presentation-form text. This is
  more flexible than just the “toASCII” or “fromASCII”
  text-to-text APIs.
- To handle application-configurable mixtures of label “forms”,
  allowing e.g. parsing of names like “*.αβγ.gr” if the application
  also wants to admit “wildcard” labels, and perhaps to
  examine the classification of each parsed label.
- To avoid external C library dependencies.
Two configuration knobs drive most of the behaviour:
- LabelFormSet: a constraint on which kinds of labels are
  admissible when parsing or decoding (after any mappings
  are applied when parsing).
- IDNAOpts: a flag set controlling validation strictness and
  use of mappings.
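As background for what "kinds of labels" can mean, here is a minimal sketch of the purely syntactic LDH taxonomy from RFC 5890 (section 2.3.1 and 2.3.2.1): non-reserved LDH labels, reserved LDH labels with "--" in the third and fourth positions, and the "xn--" subset of the latter. This is not idna2008's API, and the library's own label forms are richer; the names LdhClass and classifyLdh are invented for this illustration, and how they line up with the library's forms is an assumption.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, toLower)

-- Hypothetical type for this sketch only; not part of idna2008.
data LdhClass
  = NonLdh       -- not an LDH label at all (e.g. "*", "_tcp", non-ASCII)
  | NrLdh        -- ordinary LDH label without "--" in positions 3 and 4
  | ReservedLdh  -- "--" in positions 3 and 4, prefix other than "xn"
  | XnLabel      -- reserved LDH label starting with "xn" (any case)
  deriving (Eq, Show)

-- Classify one label according to the RFC 5890 syntactic categories.
-- No Punycode decoding or IDNA validation is attempted here, so an
-- XnLabel may or may not turn out to be a valid A-label.
classifyLdh :: String -> LdhClass
classifyLdh label
  | not isLdh                          = NonLdh
  | not reserved                       = NrLdh
  | map toLower (take 2 label) == "xn" = XnLabel
  | otherwise                          = ReservedLdh
  where
    ldhChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c == '-'
    isLdh =
      not (null label)
        && length label <= 63
        && all ldhChar label
        && take 1 label /= "-"
        && take 1 (reverse label) /= "-"
    reserved = take 2 (drop 2 label) == "--"
```

Under these definitions, "example" is NrLdh, "la--la--la" is ReservedLdh, "xn--ls8h" is XnLabel, and "*", "_tcp" and "abc$def" are all NonLdh. Telling a real A-label from a merely A-label-shaped one needs the full decoding and validation that the library performs.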
The deliberately extreme example below is a JSON rendering of three
things: the parse of an input that exhibits all the label forms, the DNS
presentation form of the resulting wire-form domain, and the decoding of
that domain back to text, with non-default settings that admit all those forms:
{
  "input": {
    "text": "*._tcp.abc$def.la--la--la.xn--ls8h.хn--нет.αβγ.example",
    "forms": [
      "WILDLABEL",
      "ATTRLEAF",
      "OCTET",
      "RLDH",
      "FAKEA",
      "LAXULABEL",
      "ULABEL",
      "LDH"
    ]
  },
  "presentation": "*._tcp.abc\\$def.la--la--la.xn--ls8h.xn--n---tdd3b5ap.xn--mxacd.example",
  "output": {
    "text": "*._tcp.abc\\$def.la--la--la.💩.хn--нет.αβγ.example",
    "forms": [
      "WILDLABEL",
      "ATTRLEAF",
      "OCTET",
      "RLDH",
      "LAXULABEL",
      "LAXULABEL",
      "ULABEL",
      "LDH"
    ]
  }
}
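The “input” element’s “text” field shows what the parser read, and its
“forms” field lists the form assigned to each label, in order.
The “output” element shows the result of decoding the wire-form domain
whose DNS zone presentation form is given in the “presentation” field.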
The two LAXULABEL forms in the output above are admitted only when explicitly
requested; in normal use they would be left as ASCII, designated FAKEA
(a label that looks like an A-label, based on its xn-- prefix, but doesn’t
decode to a valid U-label), if that form is allowed. Otherwise the decoder
would report an error.
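To make the xn-- labels in the example less opaque: an A-label is the xn-- prefix followed by the Punycode (RFC 3492) encoding of the corresponding U-label. The following is a minimal, self-contained sketch of the RFC 3492 encoding procedure only. It is not idna2008's code or API; the names punycode, adapt and encodeVarInt are invented for this illustration, and it assumes short inputs, omitting the overflow checks, length limits, validation, mappings and Bidi rules that a real IDNA2008 implementation needs.

```haskell
import Data.Char (chr, isAscii, ord)
import Data.List (foldl')

-- Bootstring parameters for Punycode (RFC 3492, section 5).
base, tMin, tMax, skew, damp, initialBias, initialN :: Int
base = 36
tMin = 1
tMax = 26
skew = 38
damp = 700
initialBias = 72
initialN = 128

-- Bias adaptation (RFC 3492, section 6.1).
adapt :: Int -> Int -> Bool -> Int
adapt delta numPoints firstTime = go 0 (d0 + d0 `div` numPoints)
  where
    d0 = if firstTime then delta `div` damp else delta `div` 2
    go k d
      | d > ((base - tMin) * tMax) `div` 2 = go (k + base) (d `div` (base - tMin))
      | otherwise = k + ((base - tMin + 1) * d) `div` (d + skew)

-- Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'.
digit :: Int -> Char
digit d
  | d < 26 = chr (ord 'a' + d)
  | otherwise = chr (ord '0' + d - 26)

-- Emit one delta as a generalized variable-length integer.
encodeVarInt :: Int -> Int -> String
encodeVarInt bias = go base
  where
    go k q
      | q < t = [digit q]
      | otherwise =
          digit (t + (q - t) `mod` (base - t))
            : go (k + base) ((q - t) `div` (base - t))
      where
        t = max tMin (min tMax (k - bias))

-- Encode a label's code points (RFC 3492, section 6.3), without "xn--".
punycode :: String -> String
punycode input = basic ++ delimiter ++ go initialN 0 initialBias b
  where
    cps = map ord input
    basic = filter isAscii input
    b = length basic
    delimiter = if b > 0 then "-" else ""
    go n delta bias h
      | h >= length cps = ""
      | otherwise =
          let m = minimum (filter (>= n) cps) -- next code point to insert
              delta0 = delta + (m - n) * (h + 1)
              step (d, bi, hh, acc) c
                | c < m = (d + 1, bi, hh, acc)
                | c == m =
                    (0, adapt d (hh + 1) (hh == b), hh + 1, acc ++ encodeVarInt bi d)
                | otherwise = (d, bi, hh, acc)
              (delta1, bias1, h1, out) = foldl' step (delta0, bias, h, "") cps
           in out ++ go (m + 1) (delta1 + 1) bias1 h1
```

Loaded into GHCi, `punycode "αβγ"` evaluates to `"mxacd"` and `punycode "💩"` to `"ls8h"`, matching the xn--mxacd and xn--ls8h labels in the example. The same function maps the mixed-script label хn--нет to n---tdd3b5ap, which is how the presentation form arrives at xn--n---tdd3b5ap.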
The recently announced dnsbase library can use idna2008 as the
parser implementation behind dnLit, its Template Haskell splice for
compile-time domain literals. idna2008 can also be used to render
wire-form domains to text with valid A-labels converted to U-label form.
Links
- Hackage: https://hackage.haskell.org/package/idna2008
- Source: https://github.com/vdukhovni/idna2008
- Companion: https://hackage.haskell.org/package/dnsbase
Feedback, bug reports and PRs welcome.