From OpenTTCN DocZone

Last modified November 14, 2007

How do I make my test suites Unicode-aware?


The upcoming OpenTTCN Tester 2.57.0 release will have full support for Unicode and the universal charstring type. This means that you will be able to seamlessly mix multiple human languages in the same test suite, and even in the same module or test case. This long-awaited addition makes OpenTTCN even more suitable for testing Internet and internationalized protocols.

In this article we explain how you can deal with Unicode using OpenTTCN and what you can expect from the tool in this regard.

Where Unicode shows up

The end user can observe the Unicode awareness of the tool through one of the following interface points:

  • TTCN-3 source code used as input for the translator (importer3)
  • log output, obtained either through a command-line tool like tester, or through standard interfaces like TCI-TM
  • the TCI data interface, which has an API for introspection and construction of values of universal charstring type

While the end user is not directly concerned with the internal storage format of universal charstring literals in the precompiled test suite in the OpenTTCN system repository, some basic knowledge of this may be beneficial, too.

How Unicode support works

Starting with 2.57.0.RC2, the default document encoding for TTCN-3 and ASN.1 source files changed from ISO-8859-1 to UTF-8. If your source files use any encoding other than UTF-8, you need to specify it explicitly using the --encoding switch of the importer3 command you run. The argument of the --encoding option can be any value from the "Canonical Name for java.nio API" column of the following table:

http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html

Note that the table in the link above applies to Java SE 6, so earlier versions of Java may not support all of the listed encodings. Also note that this dependency on the Java version applies only during the translation phase and not during execution, because the OpenTTCN runtime has no dependency on Java.
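As an alternative to passing --encoding, a legacy source file can be converted to UTF-8 before translation. The following is a minimal Python sketch of that conversion, not part of OpenTTCN; the function name reencode_to_utf8 is hypothetical, and Python codec names are used here, which do not always match the java.nio canonical names that importer3 expects.

```python
import codecs


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # Fail early with LookupError if the encoding name is unknown to Python.
    codecs.lookup(source_encoding)
    return data.decode(source_encoding).encode("utf-8")


# "café" stored as ISO-8859-1: the e-acute is the single byte 0xE9.
legacy = "caf\u00e9".encode("iso-8859-1")
converted = reencode_to_utf8(legacy, "iso-8859-1")
print(converted)  # b'caf\xc3\xa9'
```

In UTF-8 the same character takes two bytes (C3 A9), which is why a file saved as ISO-8859-1 cannot simply be read as UTF-8 without conversion or an explicit encoding setting.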

UTF-8 is just one encoding for characters from the UCS character set. While UTF-8 can represent all UCS characters, some other encodings cannot; they are limited to a character repertoire that is a subset of the full UCS character set. Whatever the original encoding of the document, after translation all universal charstring literals are normalized and stored in the OpenTTCN repository in a uniform way that is independent of the original document encoding. The repository storage of universal charstring literals is transparent to the user, and it is guaranteed to represent all Unicode code points properly. The same normalized 32-bit representation of each Unicode character is also used by the OpenTTCN executor during runtime, where it is likewise transparent to the user.
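The difference in repertoire between encodings can be checked with a short Python sketch. It is only an illustration of the general point; the helper can_encode is hypothetical and has no connection to the OpenTTCN repository format.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


greeting = "Здравствуйте!"

print(can_encode(greeting, "utf-8"))       # True: UTF-8 covers all of UCS
print(can_encode(greeting, "iso-8859-5"))  # True: Latin/Cyrillic repertoire
print(can_encode(greeting, "iso-8859-1"))  # False: no Cyrillic letters

# Code points are independent of any byte encoding of the document.
print([f"U+{ord(ch):04X}" for ch in "Hi!"])  # ['U+0048', 'U+0069', 'U+0021']
```

The last line shows the idea behind an encoding-independent representation: once decoded, each character is identified by its code point alone.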

The end user can observe Unicode characters during runtime only through the TCI data interface. The C version of this interface allocates 32 bits for each universal character, which allows it to represent the full UCS character repertoire in UTF-32 form, effectively a raw unencoded representation of UCS. The Java version of this interface uses the built-in Java String type to represent universal charstring values, so it effectively uses the UTF-16 encoding to represent the UCS character repertoire.
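The practical consequence of the two representations is that element counts can differ for characters outside the Basic Multilingual Plane. The Python sketch below only illustrates the UTF-32 and UTF-16 encoding forms; it does not call the TCI data interface, and the function names are hypothetical.

```python
def utf32_units(text: str) -> int:
    """Number of 32-bit units needed for text (one per code point)."""
    return len(text.encode("utf-32-le")) // 4


def utf16_units(text: str) -> int:
    """Number of 16-bit units needed for text (two per supplementary character)."""
    return len(text.encode("utf-16-le")) // 2


# LATIN CAPITAL LETTER A, CYRILLIC CAPITAL LETTER ZHE, MUSICAL SYMBOL G CLEF
sample = "A\u0416\U0001D11E"

print(utf32_units(sample))  # 3
print(utf16_units(sample))  # 4: U+1D11E becomes a surrogate pair
```

Code that indexes into a value through a UTF-16 based string type therefore has to allow for surrogate pairs, whereas a 32-bit representation has exactly one element per character.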

UTF-8 documents that begin with a byte order mark (BOM, bytes EF BB BF) are currently not supported, so you must remove the BOM from a UTF-8 document, if present, for importer3 to parse it correctly. For example, in the EditPlus text editor you can do this from the following menu (as of version 2.31 (406)):

Tools | Preferences | Files | UTF-8 signature | Always remove signature
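If your editor has no such setting, the BOM can also be removed programmatically. The following Python sketch shows one way to do it on the raw bytes of a file; the function name strip_utf8_bom and the sample module text are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = b"\xef\xbb\xbfmodule M {}"
print(strip_utf8_bom(with_bom))        # b'module M {}'
print(strip_utf8_bom(b"module M {}"))  # b'module M {}' (unchanged)
```

To apply it to a file, read the file in binary mode, pass the bytes through the function, and write the result back in binary mode.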

Viewing test log results

  • Starting with 2.57.0.RC2, test log output is UTF-8 encoded, so universal charstring literals are represented using UTF-8. If the log contains characters that are not displayed correctly, we recommend saving the log to a file and then opening this file with a browser or a UTF-8 aware editor like EditPlus.
  • If you execute tests on a Linux server through a Windows client using PuTTY, you can configure PuTTY to display log output directly as UTF-8. The relevant setting in PuTTY is "Window / Translation / Received data assumed to be in which character set: UTF-8". This lets you view the results of execution in the PuTTY terminal window "as is", without any further adjustment of the encoding or the code page.
  • Under Windows, UTF-8 is code page 65001, so at least in theory you can run "chcp 65001" in the Windows terminal window to change its code page to UTF-8. This sometimes works and sometimes does not. It may not work with raster fonts, so try changing the terminal window font from raster to TrueType (Lucida Console).
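When none of the options above is available, a saved log can be made readable on any terminal by replacing non-ASCII characters with escape sequences. This is a minimal Python sketch under that assumption; the function name render_log_ascii and the sample log line are hypothetical and do not reflect the actual OpenTTCN log format.

```python
def render_log_ascii(raw: bytes) -> str:
    """Decode UTF-8 log bytes and escape every non-ASCII character."""
    text = raw.decode("utf-8")
    # backslashreplace turns e.g. U+0416 into the six characters \u0416
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


raw_line = 'v1 := "Ж"'.encode("utf-8")  # hypothetical log fragment
print(render_log_ascii(raw_line))        # v1 := "\u0416"
```

The escaped form is less pleasant to read than the real characters, but it shows unambiguously which code points the log contains.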

Detection of character string literal type

During module translation, importer3 uses the following algorithm to detect the type of a character string literal:

Table 1: Character string literal type recognition algorithm

  • if the character string literal ends with the U postfix, then it is treated as a universal charstring value
    • note that the U postfix is a proprietary OpenTTCN extension introduced to promote stronger typing in TTCN-3
    Example: "Hi!"U
  • otherwise, if the character string literal contains non-ASCII characters (i.e. the ones outside of ISO-646 character range), then it is treated as a universal charstring value
    Example: "Здравствуйте!"
  • otherwise, if the context in which the character string literal is used dictates it to be a universal charstring value (for example, LHS of an assignment is a variable of universal charstring type), then it is treated as a universal charstring value
    Example: var universal charstring v1 := "Hi!";
  • otherwise, the character string literal is treated as plain non-universal charstring value
    Example: p.send("Hi!");
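The recognition rules above can be sketched in Python as a chain of checks evaluated in order. This is an illustration of the described decision order only, not the importer3 implementation; the function classify_literal and its parameters are hypothetical, and the surrounding context is reduced to a single boolean flag.

```python
def classify_literal(text: str,
                     has_u_postfix: bool = False,
                     context_is_universal: bool = False) -> str:
    """Apply the four recognition rules in order and return the literal type."""
    # Rule 1: an explicit U postfix always wins.
    if has_u_postfix:
        return "universal charstring"
    # Rule 2: any character outside the 7-bit ISO-646 (ASCII) range.
    if any(ord(ch) > 0x7F for ch in text):
        return "universal charstring"
    # Rule 3: the usage context requires a universal charstring.
    if context_is_universal:
        return "universal charstring"
    # Rule 4: everything else is a plain charstring.
    return "charstring"


print(classify_literal("Hi!", has_u_postfix=True))         # universal charstring
print(classify_literal("Здравствуйте!"))                   # universal charstring
print(classify_literal("Hi!", context_is_universal=True))  # universal charstring
print(classify_literal("Hi!"))                             # charstring
```

The four calls correspond to the four examples in Table 1, in the same order.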

Test suites known to use Unicode

  • OMA BCAST ATS