How Unicode in LabVIEW NXG Benefits You

 

One of the biggest changes in LabVIEW NXG is how the string data type works. Like many other modern programming languages, the updated LabVIEW NXG string type supports Unicode. With this change, you can use the more than 100,000 characters currently defined by the Unicode Consortium. Because many LabVIEW applications are science- and engineering-focused, it helps that LabVIEW NXG now natively supports many scientific symbols, such as the delta symbol (Δ). Supporting Unicode also greatly improves your ability to create applications that need to function across languages and locales.

 

String Encoding Basics

 

Before diving into how string encoding works in LabVIEW and LabVIEW NXG, it’s important to have a basic understanding of what encoding means. At the highest level, string encoding refers to how a computer stores the binary data that represents human-readable text.
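As a quick illustration of this idea, here is a minimal Python sketch (Python is used in this post's examples only as a stand-in, since the concepts are language-independent). The same human-readable text becomes different bytes depending on the encoding chosen:

```python
# The same text maps to different byte sequences under different encodings.
text = "Δt"

print(text.encode("utf-8"))   # b'\xce\x94t'              -> 3 bytes
print(text.encode("utf-16"))  # b'\xff\xfe\x94\x03t\x00'  -> BOM plus 2 bytes per character
```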

 

String encoding isn’t a topic unique to LabVIEW. You can find many excellent articles about string encoding, including this article, which covers the required basics. You can skip its final section, entitled “Encodings and PHP.”

 

The rest of this document assumes that you’ve read or you understand the concepts described in the link above. I highly recommend reading the linked content as a refresher, even if you are familiar with string encoding.

 

In this document, the term “character” is used interchangeably with “code point.”

 

String Functions and Encoding in LabVIEW 

 

In LabVIEW, you can use the string data type for both text and binary data because LabVIEW uses Extended ASCII as its character encoding, just like many early programming languages. As a result, LabVIEW string functions with length or offset parameters operate “by byte.” Users interpret this as “by character” when operating on textual data and as “by byte” when operating on binary data. The API Changes and Examples section below demonstrates this.

 

In Extended ASCII, the first 128 characters (0 through 127) follow the ASCII standard and occupy a single byte each. Characters beyond the first 128 are defined by the system’s code page. For English and many Western European languages, the most common code page is Windows-1252, which defines the most common accented Latin and special characters used in Western European languages, each as a single byte. Because multibyte characters do not exist in Windows-1252, the copyright symbol (©) is represented by a single byte (0xA9) on a system whose default code page is Windows-1252.
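You can verify this with a short Python sketch (used here only to illustrate the encoding; the behavior is the same in any language that exposes code pages):

```python
# In Windows-1252, the copyright symbol occupies exactly one byte: 0xA9.
data = "©".encode("cp1252")
print(data)       # b'\xa9'
print(len(data))  # 1
```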

 

In Japanese systems, the most common code page is Shift JIS. Shift JIS contains the same 128 single-byte ASCII characters; however, it also defines Japanese characters and can use multiple bytes for a single character.
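Repeating the experiment with Shift JIS shows the multibyte behavior (again a Python sketch used purely for illustration):

```python
# Shift JIS keeps ASCII at one byte but uses two bytes for Japanese characters.
print("A".encode("shift_jis"))   # b'A'         -> 1 byte
print("日".encode("shift_jis"))  # b'\x93\xfa'  -> 2 bytes
```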

 

LabVIEW does offer unofficial support for Unicode. If you have developed code that needs to function in various locales, you may have encountered issues dealing with non-ASCII characters and different code page settings.

 

String Functions and Encoding in LabVIEW NXG

 

Creating LabVIEW NXG provided an opportunity to do what most modern programming languages have done and truly support Unicode. In LabVIEW NXG, strings are encoded using UTF-8.

 

UTF-8 is a variable-length, multibyte Unicode encoding that can represent every code point established in the Unicode Standard. One of its most beneficial characteristics is that it is fully backward compatible with ASCII: the 128 characters that make up standard ASCII have the exact same binary representation in UTF-8 as they do in ASCII. Using variable-width characters also saves memory by not inflating every character to a fixed number of bytes.
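A brief Python sketch illustrates the variable widths (the characters chosen are arbitrary examples):

```python
# UTF-8 uses 1 to 4 bytes per code point; plain ASCII stays at exactly 1 byte.
for ch in ["A", "©", "Δ", "日", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")
# A 1 byte(s)
# © 2 byte(s)
# Δ 2 byte(s)
# 日 3 byte(s)
# 😀 4 byte(s)
```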

 

The string functions in LabVIEW NXG operate on a “by-character” basis because that is how users think of textual data. Because the String Length function needs to return the number of characters in the string regardless of the number of bytes the string takes up in memory, the data in a string must be properly formatted UTF-8 data.
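The same distinction exists in other Unicode-aware languages. This Python sketch is only an analogy for what String Length reports in LabVIEW NXG:

```python
s = "Δt = 5"
print(len(s))                  # 6 characters -- what String Length reports
print(len(s.encode("utf-8")))  # 7 bytes -- Δ occupies 2 bytes in UTF-8
```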

 

Paradigm Shift: Binary Versus String Data

 

The biggest paradigm shift when using LabVIEW NXG is that you cannot use the string data type to represent both binary data and text.

 

If you are dealing with binary data, you should use an array of U8 integers. If you are dealing with text, you should use a string data type. The following section shows how to accomplish the same goal in LabVIEW and LabVIEW NXG.
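This split mirrors what text-native languages already do. Python, for example, draws the same line between its str (text) and bytes (binary) types, which correspond conceptually to the LabVIEW NXG string and U8 array types; the sketch below is an analogy, not NXG code:

```python
text = "temperature"                 # text: use the string type
packet = bytes([0x01, 0xA9, 0xFF])   # binary: use an array of U8 integers

# Crossing the boundary requires an explicit conversion with a known encoding.
as_bytes = text.encode("utf-8")      # string -> byte array
as_text = as_bytes.decode("utf-8")   # byte array -> string
```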

 

String nodes in LabVIEW NXG expect properly formatted UTF-8 string data. Type casting any data type to a string may cause unexpected behavior when displaying or manipulating the string. Type casting to a string may be prevented in the future.

 

API Changes and Examples

 

Just as string encoding has changed, so have LabVIEW NXG application programming interfaces (APIs). If your algorithm expects binary data, you need to make some changes. If your algorithm is just manipulating textual data, it should function identically.

 

Lengths and Offsets

 

One of the most prominent changes involves any string node that has a length or offset parameter. In LabVIEW (Extended ASCII encoding), all characters are a single byte, so nodes with length or offset parameters work in units of bytes. In LabVIEW NXG (UTF-8 encoding), this is not the case: any string node with a length or offset parameter works in units of characters. Because characters can span multiple bytes in UTF-8, the LabVIEW NXG runtime needs to walk the string and count characters instead of jumping directly to a point in memory. One downside is that this turns string functions with a length or offset into linear-time (O(n)) operations instead of constant-time (O(1)) operations. If your algorithm nests string functions with lengths and offsets inside loops, you can end up with quadratic (or worse) behavior and a correspondingly higher performance cost.
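To see why the runtime must walk the string, here is a hedged Python sketch of how counting UTF-8 code points works in principle (this is not NI’s actual implementation): every byte has to be inspected, so the cost grows linearly with the length of the string.

```python
def count_code_points(data: bytes) -> int:
    """Count UTF-8 code points by skipping continuation bytes (0b10xxxxxx)."""
    # Every byte must be inspected, so the cost is O(n) in the byte length.
    return sum(1 for b in data if (b & 0b1100_0000) != 0b1000_0000)

print(count_code_points("Δt = 5".encode("utf-8")))  # 6 characters in 7 bytes
```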

 

This also means that if you’ve cast binary data to the string data type and you then use the String Length function, the runtime can throw errors. The runtime expects UTF-8 data; if the bytes in memory are not valid UTF-8, the function fails.
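UTF-8-based languages generally fail the same way when handed arbitrary bytes; this Python analogy shows the kind of error involved:

```python
raw = bytes([0xFF, 0xA9, 0x01])  # arbitrary binary data, not valid UTF-8

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)
```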

 

 

 

(Image: pic 1.png)

(Image: pic 2.png)

 

Flattening to String Versus Flattening to Byte Array

 

In LabVIEW, the Flatten to String function essentially means flatten to binary data. If you flatten a number to a string, the result isn’t a human-readable string; it is the binary representation of the number. In LabVIEW NXG, we’ve replaced this with the Flatten to Byte Array and Unflatten from Byte Array functions.
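Conceptually, flattening is ordinary binary serialization. A Python sketch using the standard struct module makes the point (the exact byte layout LabVIEW uses may differ):

```python
import struct

# "Flatten" a double to its 8-byte big-endian binary representation.
flat = struct.pack(">d", 3.14)
print(flat)  # b'@\t\x1e\xb8Q\xeb\x85\x1f' -- binary data, not readable text

# "Unflatten" the bytes back into a number.
(value,) = struct.unpack(">d", flat)
print(value)  # 3.14
```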

 

A rule of thumb: never type cast any data type to a string.

 

(Image: pic 3.png)

(Image: pic 4.png)

 

Sending Binary Data to an Instrument or Network

 

Many instruments and network devices send and receive raw binary data. In LabVIEW NXG, the VISA, TCP, and UDP read/write commands offer an option to specify whether you want to work with strings or bytes.

 

Much of your existing code probably uses binary data or a mix of binary and string data. For these cases, you should use the binary configuration of these functions.

 

In the example below, the user is trying to send numeric data between a server and a client over TCP. Because the data being sent has nothing to do with text, you should use only byte arrays for data manipulation before sending and after receiving data through the TCP API.

 

In LabVIEW, you can type cast to a string and call String Length to determine how many bytes of data are present. This does not work in LabVIEW NXG and causes errors in your application: any time you use the String Length node, you are asking the runtime to count the number of Unicode code points in the string.

 

To convert this code sample, you need to replace the Type Cast nodes with Flatten to Byte Array nodes and replace the String Length function with an Array Size function. After doing this, you just need to use the binary configuration of the TCP functions.
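For readers who prefer text code, here is a rough Python equivalent of the corrected pattern (the function name and length-prefix framing are illustrative, not the NXG API): flatten the number to bytes, use the byte count rather than a character count, and send the bytes.

```python
import socket
import struct

def send_number(sock: socket.socket, value: float) -> None:
    # Flatten the number to bytes (the role Flatten to Byte Array plays).
    payload = struct.pack(">d", value)
    # Prefix with the byte count (the role Array Size plays), then send the bytes.
    sock.sendall(struct.pack(">I", len(payload)))
    sock.sendall(payload)
```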

 

 

(Image: pic 5.png)

 

 

Working With Different Encodings

 

You may need to interface with other string encodings outside of LabVIEW NXG; for example, if you need to share data files between LabVIEW and LabVIEW NXG. In LabVIEW NXG, several functions help you encode and decode binary string data.

 

In the example below, we need to write a binary file in LabVIEW NXG for use with an existing LabVIEW application. We create the binary file in LabVIEW NXG, but because LabVIEW uses a different encoding than LabVIEW NXG, we convert the string to binary data using the encoding that LabVIEW expects. With the String to Byte Array node in LabVIEW NXG, you can specify one of several common encodings to ensure that the LabVIEW application can parse the binary data and decode it back into a string.
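A Python sketch of the same idea (the file name and sample text are illustrative): encode the text with the code page the legacy application expects before writing the raw bytes.

```python
text = "Résumé © 2019"

# Encode with the code page the legacy LabVIEW application expects.
# A character missing from that code page would raise UnicodeEncodeError here.
data = text.encode("cp1252")

with open("legacy_data.bin", "wb") as f:
    f.write(data)  # raw bytes, decodable by the Extended ASCII reader
```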

 

 

(Image: pic 6.png)

 

Converting Code From LabVIEW to LabVIEW NXG

 

The LabVIEW Code Conversion Utility helps with some of the differences described above. For example, the Flatten to String function doesn’t exist in LabVIEW NXG. The conversion process replaces every instance of Flatten to String with a Flatten to Byte Array node and logs a FlattenToStringReplacedByFlattenToByteArray message in the conversion report. This helps you flag and manually inspect each instance; manual inspection is required because the downstream nodes need to use the byte array data type instead of the string data type.

 

Another mutation occurs if you use the Byte Array to String function in LabVIEW. In LabVIEW NXG, this node has an input that specifies which string encoding to use. The conversion utility places an Extended ASCII enum constant on this terminal to preserve the LabVIEW runtime behavior. Each instance logs a ByteArrayToStringChanged message encouraging you to review it, even though the runtime behavior doesn’t change.

 

While the conversion utility can flag and help mutate code to the correct usage in LabVIEW NXG, it can’t automatically fix every use. A good rule of thumb: if your virtual instrument works purely with text, the converted GVI should function identically; however, if you were using LabVIEW string functions to manipulate binary data carried on a string wire, the converted code requires manual updates.

 

 

Related Links:

 

LabVIEW NXG 3.0 Behavior Changes 

Regards,

Jon S.
National Instruments
LabVIEW NXG Product Owner
Comments
Member

In LabVIEW you can do some string operations more easily using a byte array (such as swapping characters). Is there an improved range of string manipulation functions (on the character level) to replace this pathway?

Active Participant

Thanks for the question, pauldavey.

 

I'm not 100% sure I'm following exactly what you described. It sounds like, in LabVIEW, you cast a string into a byte array and then do some byte swapping to rearrange characters. Many functions in the array palette have an analogous string function that works on characters. Here are some examples.

 

Array Size » String Length

Build Array » Concatenate String

Split 1D Array » String Subset

Rotate 1D Array » Rotate String

Reverse 1D Array » Reverse String

 

If you know that the string data contains only ASCII characters, you can still use the Flatten/Unflatten from Byte Array functions to do this.

 

Alternatively, all of the string functions work by characters, so any algorithm you build with the string APIs should work. If you have an example to share, I could probably help more.
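As an illustration of why the character-level functions are usually the right replacement, here is a Python analogy (not LabVIEW code): reversing by characters preserves multibyte text, while reversing the underlying bytes corrupts it.

```python
s = "Δt"

print(s[::-1])                         # 'tΔ' -- reversing by characters is safe
raw = s.encode("utf-8")[::-1]          # reversing the raw bytes instead...
print(raw.decode("utf-8", "replace"))  # ...corrupts the multibyte character
```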

Regards,

Jon S.
National Instruments
LabVIEW NXG Product Owner