How do I make my test suites Unicode-aware?
The upcoming OpenTTCN Tester 2.57.0 release will fully support Unicode and the universal charstring type. This means that you can seamlessly mix multiple human languages in the same test suite, and even in the same module or test case. This long-awaited addition makes OpenTTCN even more suitable for testing Internet and internationalized protocols.
In this article we explain how you can deal with Unicode using OpenTTCN and what you can expect from the tool in this regard.
End users can observe the Unicode awareness of the tool through one of the following interface points:
importer3)
tester, or through standard interfaces like TCI-TM
While the end user is not directly concerned with the internal storage format of universal charstring literals in the precompiled test suite in the OpenTTCN system repository, some basic knowledge of this may be beneficial, too.
Starting with 2.57.0.RC2, the default document encoding for TTCN-3 and ASN.1 source files changed from ISO-8859-1 to UTF-8. If your source files use anything other than UTF-8, you must specify the encoding explicitly with the --encoding switch of the relevant importer3 command. The argument of the --encoding option can be any value from the "Canonical Name for java.nio API" column of the following table:
http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
Note that the table linked above applies to Java SE 6, so earlier versions of Java may not support all of the listed encodings. Also note that this dependency on the Java version applies only during the translation phase and not during execution, because the OpenTTCN runtime has no dependency on Java.
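As an alternative to passing --encoding, a legacy source file can be converted to UTF-8 once, before translation. The following is a minimal Python sketch of that conversion. The helper name `reencode_to_utf8` and the sample text are hypothetical and are not part of OpenTTCN. It assumes you already know the original encoding of the file.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Hypothetical helper for illustration; not part of OpenTTCN.
    """
    return data.decode(source_encoding).encode("utf-8")


# Sample text containing two non-ASCII Latin-1 characters.
latin1_bytes = "Grüße".encode("iso-8859-1")
utf8_bytes = reencode_to_utf8(latin1_bytes)
```

In ISO-8859-1 every character occupies one byte, while in UTF-8 the two non-ASCII characters occupy two bytes each, so the converted data is longer than the original.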
UTF-8 is just one encoding of the UCS character set. UTF-8 can represent every UCS character, whereas some other encodings cannot, because they cover only a subset of the full UCS repertoire. Whatever the original encoding of the document, however, translation normalizes all universal charstring literals and stores them in the OpenTTCN repository in a uniform way that is independent of the original document encoding. Repository storage of universal charstring literals is transparent to the user and is guaranteed to represent all Unicode code points properly. The OpenTTCN executor uses the same normalized 32-bit representation of each Unicode character at runtime, and this is likewise transparent to the user.
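The difference in character repertoire between encodings can be checked with a short Python sketch. The helper `can_encode` is a hypothetical name used only for this illustration. It shows the general point about encodings and makes no claim about how importer3 handles characters that cannot be represented.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


greeting = "Здравствуйте!"

# UTF-8 covers the whole UCS repertoire, so this always succeeds.
fits_utf8 = can_encode(greeting, "utf-8")
# ISO-8859-1 has no Cyrillic letters, so the greeting does not fit.
fits_latin1 = can_encode(greeting, "iso-8859-1")
# ISO-8859-5 is a Cyrillic encoding, so the greeting fits again.
fits_cyrillic = can_encode(greeting, "iso-8859-5")
```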
End users can observe Unicode characters at runtime only through the TCI data interface. The C version of this interface allocates 32 bits for each universal character, so it can represent the full UCS character repertoire in UTF-32 form, which is effectively a raw, unencoded representation of UCS. The Java version of this interface uses the built-in Java String type to represent universal charstring literals, so it effectively uses UTF-16 to represent the UCS character repertoire.
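The practical difference between the two forms shows up for characters outside the Basic Multilingual Plane: UTF-32 always uses one 32-bit unit per character, while UTF-16 needs a surrogate pair, that is, two 16-bit units. The Python sketch below counts code units in each form. The helper names are hypothetical, and the sketch does not use the TCI API itself.

```python
def utf32_units(text: str) -> int:
    """Number of 32-bit code units needed for text in UTF-32."""
    # The "-le" variant is used so that no BOM is prepended.
    return len(text.encode("utf-32-le")) // 4


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed for text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u0417"         # CYRILLIC CAPITAL LETTER ZE, inside the BMP
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

bmp_counts = (utf32_units(bmp_char), utf16_units(bmp_char))
astral_counts = (utf32_units(astral_char), utf16_units(astral_char))
```

Code that indexes into a UTF-16 string therefore cannot assume that one 16-bit unit equals one universal character, whereas in the 32-bit form it can.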
UTF-8 documents that begin with the byte order mark (BOM) EF BB BF are currently not supported, so you must remove the BOM from a UTF-8 document, if it is present, for importer3 to parse it correctly. For example, in the EditPlus text editor (as of version 2.31 (406)) you can do this from here:
Tools | Preferences | Files | UTF-8 signature | Always remove signature
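If your editor has no such option, the BOM can also be removed with a small script. The following Python sketch strips a leading EF BB BF sequence if it is present and leaves other data untouched. The function names are hypothetical, and the file variant rewrites the file in place, so keep a backup.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_utf8_bom_in_file(path: str) -> bool:
    """Remove the BOM from the file at path in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    file_path = Path(path)
    original = file_path.read_bytes()
    stripped = strip_utf8_bom(original)
    if stripped != original:
        file_path.write_bytes(stripped)
        return True
    return False
```

Working on raw bytes rather than decoded text keeps the rest of the file byte-for-byte identical.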
importer3 uses the following algorithm to detect the type of a character string literal during module translation:
| Algorithm item | Example |
|---|---|
| | "Hi!"U |
| | "Здравствуйте!" |
| | var universal charstring v1 := "Hi!"; |
| | p.send("Hi!"); |