SOME/IP Serialization

Ravitejakv · ‎10-02-2018

Purpose of Serialization

When data needs to be stored on a disk or transmitted over a network, it might happen that the device reading this data has a different architecture, OS, number of bits used for addressing etc. A mechanism must be followed through which data storage and transmission are consistent and hence interoperability is achieved. Serialization is used to this end.

Serialization is the process of converting data from a complex data structure into a stream of bytes that can then be saved on a disk or transmitted over a network. The main purpose of serialization is to save the current state of data and reconstruct it later in the same system or in a different environment. The reverse process, restoring data from a stream of bytes, is called deserialization. The figure below illustrates an example where a complex struct containing various basic datatypes is serialized and deserialized.

Figure 1 Example of serialization and deserialization
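As a rough illustration of the round trip described above, the following Python sketch packs a small record into bytes and restores it. The field layout (an unsigned 8-bit id, a signed 16-bit temperature and a 32-bit float ratio, big-endian, no padding) and the function names are assumptions made for this example only; they are not taken from the LabVIEW project.

```python
import struct

# Hypothetical layout: uint8 id, int16 temperature, float32 ratio.
# ">" selects big-endian byte order with no padding between fields.
LAYOUT = ">Bhf"

def serialize(sensor_id, temperature, ratio):
    """Convert the three fields into a flat byte stream."""
    return struct.pack(LAYOUT, sensor_id, temperature, ratio)

def deserialize(payload):
    """Restore the three fields from the byte stream."""
    return struct.unpack(LAYOUT, payload)

payload = serialize(7, -40, 14.5)
print(payload.hex())         # 07ffd841680000
print(deserialize(payload))  # (7, -40, 14.5)
```

Both sides must agree on the layout string; that agreement is what the database file described below provides.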

A SOME/IP message can carry one or more parameters. For example, in an automobile, several ECUs might be interested in knowing the engine’s temperature, revolutions per minute (RPM) etc. The engine ECU transmits these values in a cyclic SOME/IP Event so that all subscribed ECUs are notified of the values of these parameters. In this example, the Event contains three parameters: temperature, RPM and direction. If the engine ECU wants to transmit another parameter, say, ‘air-fuel ratio’, in addition to those mentioned above, it does so by including the data for this parameter in the payload, which lengthens the payload. If the datatype of one or more parameters changes, the length of the serialized payload also changes. For instance, engine temperature transmitted as a whole number can be changed to a floating-point number for finer resolution. The length of this parameter, which was, say, 16 bits before, then changes to either 32 or 64 bits.
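The effect of a datatype change on payload length can be checked with a few lines of Python. The sketch below assumes, purely for illustration, that RPM is an unsigned 16-bit integer and direction an unsigned 8-bit integer, and that fields are packed big-endian without padding; only the temperature type varies.

```python
import struct

# Assumed layouts (big-endian, no padding): temperature, RPM, direction.
LAYOUTS = {
    "int16 temperature": ">hHB",
    "float32 temperature": ">fHB",
    "float64 temperature": ">dHB",
}

def payload_sizes(layouts):
    """Return the serialized payload size in bytes for each layout."""
    return {name: struct.calcsize(fmt) for name, fmt in layouts.items()}

sizes = payload_sizes(LAYOUTS)
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

A receiver that still expects the 16-bit layout would read RPM and direction from the wrong offsets once the temperature grows to 32 or 64 bits.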

In the example above, it can be observed that if any of the parameters change, the length of the serialized payload also changes. For the deserializer to know the exact position and length of each parameter in the received payload, the serializer and deserializer must share a common description of the payload. The serializer must also be informed of any changes the user makes to the parameters. In the automotive industry, one solution is a ‘database file’ that contains the configuration description of an ECU. It can hold hundreds to thousands of parameters that define an ECU. When the user makes any changes to the ECU, they simply record the changes in the database file. All applications that access this ECU pick up the changes because they read the database file before accessing the ECU. Two such database file formats used in the automotive industry are:

Field Bus Exchange Format (FIBEX) files by ASAM (officially ASAM-MCD-2 NET (FIBEX))
ARXML files by AUTOSAR

However, implementing serialization/deserialization using a database involves two broad steps – first, parsing the database file to determine the serialization parameters; then, implementing serialization/deserialization using these parameters.

Implementation in the SOME/IP Example project

The serialization/deserialization VIs in the example project let the user serialize data easily using predefined method parameters. The table below lists a few (but not all) LabVIEW API functions grouped by datatype. The first column from the left lists VIs that can be used for serialization; the center column lists VIs that can be used for deserialization.

Table 1 Serialization and deserialization API functions

The last row in the table above shows polymorphic VIs for serialization and deserialization. Each polymorphic VI contains all the respective VIs shown in the first three rows of the table. Hence, the user needs to just drag-and-drop one of these polymorphic VI functions and select the required datatype from the selector dropdown of the polymorphic VI. All other VI functions are shown in the table to give an overview only and the user need not use them in their LabVIEW code (as polymorphic VIs are available). Once the required datatype is selected from the dropdown, the default VI icon changes to indicate the datatype selected. The figure below shows Polymorphic VI (expanded) used in this project for serialization.

Polymorphic VIs - Serializer (left) and Deserializer (right)

Serialization

The process of serialization mainly depends on whether the parameters are:

basic datatypes or complex datatypes
of variable length or fixed length

Serialization is, generally, used on the transmission side of the SOME/IP network. Figure below gives an overview of various steps involved in transmitting a SOME/IP message.

Figure 2 Overview of SOME/IP message transmission

Database represents the XML or ARXML file that contains configuration options for the ECU in the SOME/IP network. This database file must be read using a suitable parser in order to determine the configuration parameters required for serialization. These configuration parameters are, for example, a list of objects in which each object contains all the parameters required for serialization of a single user-input payload parameter. Thus, the number of objects in this list is the same as the number of user-input payload parameters.
Serializer then takes the user-input payload data and serializes it into the serialized payload in accordance with the configuration parameters.
LabVIEW SOME/IP API is used to generate the SOME/IP message that contains the serialized payload.
SOME/IP message is then transmitted over the Ethernet medium with the help of a transport protocol such as TCP or UDP.

The modules Database Parser (indicated by ① in the figure above) and Serializer (indicated by ②) together form the SOME/IP Serialization module which is a complete implementation of SOME/IP with serialization in LabVIEW. However, the Database Parser is not currently implemented in this API. Instead, it has been assumed that Configuration parameters are read using a Database Parser and are available to implement the Serializer module. Hence, the user must input the configuration parameters into the 'DB Params' control present in ReadDatabase.vi by manually looking up the database file.

Parameters required for serialization and deserialization are:

Byte order
Datatype
Alignment (currently not implemented in this API. Outputs the same alignment as input)
Applicable to arrays:
- Length-field size: Indicates the number of bytes used to carry length information of variable length arrays
- Dimension (currently only 1D arrays are supported)
- Variable or fixed length
- Minimum and maximum size in case of variable length arrays
Applicable to strings:
- Length-field size: Indicates the number of bytes used to carry length information of variable length strings
- Variable or fixed length
- Minimum and maximum size in case of variable length strings
- String encoding

Database Parser is assumed to output configuration parameters as shown in Figure 3 (left-side image). It is implemented as an array of clusters (with the name DB Params) in LabVIEW, with each cluster containing various parameters - such as ByteOrder, Alignment, Datatype, ArrayParams, StringParams - required for serialization and deserialization. ArrayParams and StringParams are, in turn, clusters used for serialization of arrays and strings respectively and are ignored during serialization of all other datatypes. However, Alignment parameter is currently ignored inside this API (i.e., alignment of parameters is not currently implemented). Individual controls contained in DB Params are expanded as shown in Figure 3 (right-side image).

Every VI used for serialization has an input called Index. It is used to specify the position of the current parameter being serialized in the payload. The value of Index determines which element from the array of configuration parameters (DB Params) will be used for serialization of current user-input parameter.

Figure 3 Assumed format for configuration parameters in LabVIEW (left) and individual controls of DB params expanded (right)
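The relationship between the Index input and the DB Params array can be sketched in Python as follows. This is only an analogy to the LabVIEW clusters: the class name `ParamConfig`, its fields and the datatype names are hypothetical, and alignment, arrays and strings are left out.

```python
import struct
from dataclasses import dataclass

@dataclass
class ParamConfig:
    """Hypothetical stand-in for one cluster of the DB Params array."""
    byte_order: str  # "big" or "little"
    datatype: str    # e.g. "int16", "uint16", "float32"

# struct format codes for the basic datatypes
_CODES = {
    "bool": "B", "uint8": "B", "int8": "b", "uint16": "H", "int16": "h",
    "uint32": "I", "int32": "i", "uint64": "Q", "int64": "q",
    "float32": "f", "float64": "d",
}

def serialize_param(db_params, index, value):
    """Serialize one value using the configuration stored at db_params[index]."""
    cfg = db_params[index]
    prefix = ">" if cfg.byte_order == "big" else "<"
    return struct.pack(prefix + _CODES[cfg.datatype], value)

# One configuration entry per user-input parameter, in payload order.
db_params = [
    ParamConfig("big", "int16"),     # index 0: temperature
    ParamConfig("big", "uint16"),    # index 1: RPM
    ParamConfig("little", "uint8"),  # index 2: direction
]
values = [-40, 3000, 1]
payload = b"".join(serialize_param(db_params, i, v) for i, v in enumerate(values))
print(payload.hex())  # ffd80bb801
```

Each value is paired with exactly one configuration entry, so the list of entries and the list of user-input parameters must have the same length and order.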

Deserialization

Deserialization is the process of converting data from on-the-wire format to their respective data structures. It is the inverse process of serialization. Deserialization of data is, generally, performed on the receiving‑side of a communication system. In a SOME/IP network, clients are the receivers of messages such as Responses, and Event and Field notifications. Hence, deserialization is performed, generally, on the client-side. However, deserialization is performed on the server-side when it receives Requests. Like serialization, the process of deserialization also depends not only on datatype of parameters, but also whether they are fixed length or variable length. Figure 4 shows an overview of steps involved during deserialization.

Figure 4 Overview of SOME/IP message reception

SOME/IP message, which is transmitted over Ethernet using TCP/UDP, is received at the other end of SOME/IP communication
LabVIEW SOME/IP API separates the payload data from the received SOME/IP message
This payload is then input to the SOME/IP Deserialization API. The Database file is read using a suitable parser to determine the configuration parameters
Deserializer is used to deserialize the payload in accordance with the configuration parameters. Payload data which is in on-the-wire format is converted to user-readable format in their respective datatypes

The module Deserializer together with Database Parser forms the SOME/IP Deserialization module which is a complete implementation of SOME/IP with deserialization in LabVIEW. Configuration parameters used here are the same as shown in Figure 3. Also, like serialization VIs, every deserialization VI has an input parameter called Index whose value indicates the position, in the DB Params array, of the configuration parameters to be used for deserialization of the current parameter.

Although the process of deserialization is the inverse of serialization, additional steps must be performed in order to ensure data integrity. These include checking that the length of the received data is as expected, that arrays have length fields inserted, and that strings start with a BOM.

In this implementation of LabVIEW SOME/IP API, serialization and deserialization of the following datatypes are supported:

Boolean values (transmitted as unsigned 8-bit integers)
Signed and unsigned integers of length 8, 16, 32 and 64 bits
Single and double precision floating-point numbers
Fixed and variable length arrays
Fixed and variable length strings

Example

A simple example is shown in the figure below which demonstrates the usage of serialization and deserialization API functions. Various input and output terminals are labelled to indicate their respective usage or significance.

Figure 5 Input and output terminals of serialization and deserialization VIs labelled

On the serialization-side of the LabVIEW code, the input parameters must be connected in ascending order starting from the first parameter (index 0 in the figure) till the last parameter (index 2 in the figure). These indices are used to identify the necessary elements containing the required configuration parameters in the ‘DB Params in’ array. Hence, it is important to make sure that the input parameters are connected in the correct order and, also, that the index values indicate the correct element-positions in the DB Params in array. Therefore, if there are ‘n’ user-input parameters, the index values range from 0 to n-1. However, on the deserialization-side of the LabVIEW code, the output parameters connected can appear in any order. In the example shown, they are connected in the ascending order; however, for example, the VI used for deserializing the ‘1st param’ can be connected after either the VI for ‘2nd param’ or ‘3rd param’. The offset of each parameter is calculated internally in every VI and the value of the parameter is copied separately to be output to the user. This enables deserialization of user parameters in any required order. However, like for serialization, the index values connected must correspond to the exact position of applicable configuration parameters in the ‘DB Params in’ array.
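The idea that each deserialization call computes its own offset, and can therefore run in any order, can be sketched in Python. The sketch assumes fixed-length parameters only and uses a hypothetical `DB_PARAMS` list of `struct` format strings in place of the LabVIEW DB Params array; it is not the internal logic of the VIs.

```python
import struct

# Hypothetical configuration: one struct format (byte order + type) per index.
DB_PARAMS = [">h", ">H", ">B"]  # temperature, RPM, direction

def deserialize_param(payload, index):
    """Return the parameter at `index`, locating it from the preceding sizes."""
    # Offset = total size of all parameters that precede this one.
    offset = sum(struct.calcsize(fmt) for fmt in DB_PARAMS[:index])
    fmt = DB_PARAMS[index]
    end = offset + struct.calcsize(fmt)
    if end > len(payload):
        raise ValueError(f"payload too short for parameter {index}")
    return struct.unpack_from(fmt, payload, offset)[0]

payload = bytes.fromhex("ffd80bb801")
# The calls are independent, so the order does not matter.
third = deserialize_param(payload, 2)
first = deserialize_param(payload, 0)
second = deserialize_param(payload, 1)
print(first, second, third)  # -40 3000 1
```

With variable-length arrays or strings, the offset of a later parameter would additionally depend on the length fields found in the payload, which this sketch does not handle.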

Arrays

AUTOSAR specifies that all arrays must be transmitted along with a ‘Length field’ at the beginning that indicates the length of the array that follows. This helps to allocate appropriate buffer-size during deserialization on the receiving end. This is achieved using the ‘Insert length-field’ block. AUTOSAR specifies that the type of length field can be either uint8, uint16 or uint32. The maximum array size that each length-field type can describe is shown in the table below.

Length field type	Maximum possible array size (bytes)
None	No length-field inserted^#
uint8	255
uint16	65535
uint32	2147483647^*

Table 2 Maximum possible array size according to the length-field type chosen

#This value is not valid. AUTOSAR specifies that the length-field must always be inserted in payload just before the array data starts.

*The theoretical maximum is 2³² − 1 = 4294967295 bytes. However, the maximum possible length of an array in 32-bit LabVIEW is 2147483647 (2³¹ − 1).
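A minimal Python sketch of inserting and reading a length field is shown below. It assumes the length field counts the bytes of array data that follow it and uses the same byte order as the elements; the function names and the `struct` element codes are illustrative choices, not part of the LabVIEW API.

```python
import struct

# struct codes for the length-field sizes in bytes: uint8, uint16, uint32
LENGTH_FIELD_CODES = {1: "B", 2: "H", 4: "I"}

def serialize_array(values, element_code, length_field_size, big_endian=True):
    """Prefix the packed elements with a length field holding their byte count."""
    prefix = ">" if big_endian else "<"
    body = struct.pack(f"{prefix}{len(values)}{element_code}", *values)
    max_len = (1 << (8 * length_field_size)) - 1
    if len(body) > max_len:
        raise ValueError(f"{len(body)} bytes do not fit a {length_field_size}-byte length field")
    header = struct.pack(prefix + LENGTH_FIELD_CODES[length_field_size], len(body))
    return header + body

def deserialize_array(payload, element_code, length_field_size, big_endian=True):
    """Read the length field, validate it, then unpack that many bytes."""
    prefix = ">" if big_endian else "<"
    if len(payload) < length_field_size:
        raise ValueError("payload shorter than the length field")
    (nbytes,) = struct.unpack_from(prefix + LENGTH_FIELD_CODES[length_field_size], payload, 0)
    element_size = struct.calcsize(prefix + element_code)
    if nbytes % element_size or length_field_size + nbytes > len(payload):
        raise ValueError("length field inconsistent with payload")
    count = nbytes // element_size
    return list(struct.unpack_from(f"{prefix}{count}{element_code}", payload, length_field_size))

wire = serialize_array([1, 2, 3], "H", 1)
print(wire.hex())                         # 06000100020003
print(deserialize_array(wire, "H", 1))    # [1, 2, 3]
```

With a uint8 length field, an array of 128 uint16 elements (256 bytes) is already rejected, because the largest value the field can hold is 255.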

Strings

Strings, like other datatypes, are stored or transmitted over a network as a stream of bits. The scripts of the world's languages together contain tens of thousands of characters. If every character of every language were assigned a number and stored at a fixed width, each character would occupy several bytes of memory, even the common ones that could fit in a single byte. To represent thousands of characters using as few bytes as possible, various encoding formats have been specified globally. ASCII (American Standard Code for Information Interchange) encoding and Unicode encoding are two of the prominent encoding schemes used across the world.

Unicode encoding standard assigns a unique number for all characters of various languages of the world. Thus, implementing Unicode enables software compatibility between various software platforms, devices and applications. Currently, there are three different forms of Unicode: UTF-8 (8-bit Unicode Transformation Format), UTF-16 (16-bit Unicode Transformation Format) and UTF-32 (32-bit Unicode Transformation Format). UTF-8 and UTF-16 are described here as they are used in this project.

UTF-8

UTF-8 encoding uses one to four 8-bit units to encode all the valid code points defined by Unicode [22]. When only one byte is used, the lower seven bits represent the ASCII characters and the most-significant bit is 0. Thus, UTF-8 is compatible with ASCII encoding. Two bytes are required to encode the next 1920 characters; three bytes are used to encode a large number of characters including most Chinese, Japanese and Korean characters; four bytes are used to encode all remaining characters, including various symbols and emojis.

UTF-16

UTF-16 encoding can be used to encode all the valid code points defined by Unicode. UTF-16 is also a variable-length encoding which uses one or two 16-bit units for encoding. Thus, the size of each character is either 16 or 32 bits. Since each 16-bit unit consists of two bytes, the order in which these bytes are transmitted must be defined. Hence, two possible UTF-16 encodings exist: UTF-16LE (UTF-16 Little Endian) and UTF-16BE (UTF-16 Big Endian).

In addition to Unicode, there exists another multibyte encoding method called Multibyte Character Set (MBCS). In this method, a variable number of bytes is used to encode each character. Windows OS supports MBCS to display text in various international languages. Generally, one or two bytes are used per character: byte values with the most-significant bit cleared represent the ASCII characters, and the remaining values are interpreted according to the language used by the OS. Hence, a given string value is interpreted differently in operating systems having different languages and code pages.

A few examples are shown in Table 3. Three characters are shown: Latin character ‘A’, Latin capital letter A with Diaeresis (or Umlaut) ‘Ä’ and Japanese Katakana letter ‘ア’ (equivalent to Latin ‘A’ and pronounced as the same).

It can be observed from the table that the length of the UTF-8 encoded characters varies between 1 and 3 bytes, whereas that of the UTF-16 encoded characters remains 2 bytes. For characters outside those shown, UTF-8 and UTF-16 encodings can be as long as 4 bytes. On the other hand, the length of characters encoded in MBCS varies between 1 and 2 bytes. Usually, the seven least-significant bits are encoded as in ASCII and the most-significant bit marks characters specific to the language. In the table below, it can be observed that the most-significant bit for ‘A’ is 0 (same as ASCII) and for the other two characters is 1.

Table 3 Unicode encoding examples
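The byte lengths discussed above can be reproduced with Python's built-in codecs. The sketch below encodes the three characters from the text plus one emoji (added here only to show a 4-byte case) in UTF-8, UTF-16BE and UTF-16LE; MBCS code pages are left out because their output depends on the code page chosen.

```python
ENCODINGS = ("utf-8", "utf-16-be", "utf-16-le")

def encoded_lengths(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}

# 'A', A with diaeresis, Katakana A, and an emoji outside the 16-bit range.
SAMPLES = ("A", "\u00c4", "\u30a2", "\U0001f600")

for ch in SAMPLES:
    hex_forms = {enc: ch.encode(enc).hex() for enc in ENCODINGS}
    # ascii() keeps the printout safe on consoles without Unicode support.
    print(ascii(ch), hex_forms)
```

The explicit `utf-16-be` and `utf-16-le` codecs do not prepend a BOM, which makes the raw code units easy to compare.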

Strings in LabVIEW are implemented as Multibyte Character Set (MBCS) strings. LabVIEW for Windows displays strings on the User Interface (UI) using the selected Windows code page. Hence, support for non-ASCII characters depends on the locale of the computer being used. In order to support multiple languages on the same computer, support for Unicode is needed.

In SOME/IP communication, AUTOSAR supports transmission of strings in various Unicode encoded formats. However, in this API, currently, UTF-8, UTF-16LE (UTF-16 Little Endian) and UTF-16BE (UTF-16 Big Endian) encoding formats are used as on-the-wire formats for transmission of strings. Since LabVIEW (not LabVIEW NXG editions) implements Multi-byte Encoding, it is necessary to convert the LabVIEW strings to either UTF-8, UTF-16LE or UTF-16BE encoding before transmitting them over the SOME/IP network. Each string transmitted over SOME/IP network must have a Byte Order Mark (BOM) which is a Unicode character used to indicate the string encoding and its endianness.

Table 4 shows the possible BOM values depending on string encoding. As can be seen, UTF-8 uses a three-byte BOM whereas UTF-16 strings use a two-byte BOM. Strings encoded in UTF-16 format should contain the BOM value at the beginning of the string. This is, however, optional for UTF-8 strings. In this serialization API, the BOM value is checked for at the beginning of each string; if it is not found, then the appropriate BOM value is inserted.

Table 4 BOM values for different encodings
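The check-and-insert behaviour described above can be sketched in Python using the BOM constants from the standard `codecs` module. The function names are hypothetical, and the sketch covers only the BOM handling, not the length field or any other part of string serialization.

```python
import codecs

# BOM byte sequences for the three on-the-wire encodings.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}

def encode_with_bom(text, encoding):
    """Encode `text` and prepend the BOM unless it is already present."""
    bom = BOMS[encoding]
    data = text.encode(encoding)
    return data if data.startswith(bom) else bom + data

def decode_checked(data, encoding):
    """Verify that `data` starts with the expected BOM, then decode the rest."""
    bom = BOMS[encoding]
    if not data.startswith(bom):
        raise ValueError(f"missing {encoding} BOM")
    return data[len(bom):].decode(encoding)

for enc in BOMS:
    wire = encode_with_bom("Hi", enc)
    print(enc, wire.hex(), decode_checked(wire, enc))
```

A text that already begins with the character U+FEFF encodes to the BOM bytes itself, so `encode_with_bom` does not insert a second one.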

This API uses a library that handles conversion between ASCII and Unicode encoding formats, as well as conversion between different Unicode-encoded formats. The library is available at https://forums.ni.com/t5/Reference-Design-Content/LabVIEW-Unicode-Programming-Tools/ta-p/3493021.
