UTF-8 Output from Excel Byte Stream


Sure, here it is, but it is really trivial. The current default value in the string control is what you get when running it with the language for non-Unicode applications set to Chinese (Simplified, China).

 

Untitled.png

 

As Andrey mentions, are you sure you are not just looking at it the wrong way? This file seems to be UTF-8 encoded. LabVIEW strings do not understand UTF-8 encoding (yet)! To display Simplified Chinese characters in a non-Unicode application, the string needs to use code page 936, also known as GBK (a superset of GB2312).
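One quick way to test the suspicion that the data is UTF-8 is to attempt a strict UTF-8 decode outside LabVIEW. The Python sketch below is only an illustration; the helper name `looks_like_utf8` is made up for this example, and Python's `cp936` codec is used as a stand-in for the Windows code page. Passing the check is a strong hint, not proof, and pure ASCII data passes under either encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    utf8_bytes = "压力值".encode("utf-8")
    # Python's GBK codec, assumed here to approximate Windows code page 936
    cp936_bytes = "压力值".encode("cp936")
    print(utf8_bytes.hex(" "), looks_like_utf8(utf8_bytes))
    print(cp936_bytes.hex(" "), looks_like_utf8(cp936_bytes))
```

Chinese text stored in code page 936 often fails this check because its byte pairs rarely form valid UTF-8 sequences, which is what makes the test useful as a first diagnostic.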

 

You can try to enable UTF-8 encoding for non-Unicode applications by checking the corresponding checkbox in the Control Panel dialog where you select the language. This option has been in Windows since Windows 10 version 1803, but it has remained an experimental Beta feature until today and was never declared fully operational. It probably never will be, as most applications are moving to full Unicode support anyhow, and those that don't have all kinds of trouble achieving full operability even with this feature enabled. With that checkbox enabled, the string should display correctly if your file is fully UTF-8 encoded, but it is still a Beta feature in Windows.

 

 

 

 

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 11 of 13

Hi Rolf and Andrey,

 

Thank you for your input.

 

I reran the test and confirmed that the byte streams are identical, regardless of whether the “Language for non-Unicode programs” is set to Chinese or English. This indicates that the issue is not related to ZLIB Extract Stream.vi.

 

It seems the problem occurs afterward, when LabVIEW interprets the three-byte UTF-8 characters.

 

String: 压力值
English: image.png
Simplified Chinese: image.png

 

Out of curiosity, I also tested a few other Chinese characters. Interestingly, some of them produce the correct UTF-8 values, such as:

 比例阀

 image.png

客户料号

image.png

 

This leads me to suspect that the issue may be character-specific.

Returning to my original example and isolating just the last character:

 压力值 --> 

 image.png

 

I noticed that LabVIEW does not display all three bytes for this character.

 

At this point, I’m not entirely sure what is causing this behavior.

 

Message 12 of 13
Solution
Accepted by topic author jamison.suade

We both pointed it out to you.

 

You are trying to display a UTF-8 string in a non-Unicode application. LabVIEW is a non-Unicode application. As such, you have to change the code page of your Windows system in order to display characters beyond the standard 7-bit ASCII characters (the western alphabet and numerals and some of the standard signs such as decimal point, comma, semicolon, etc.). Anything else requires a specific code page to be set. For most Western European characters this can be achieved with code page 1252, which contains the German umlauts, the French accented letters and a few others. This code page is still single-byte (at most 256 code points) and cannot display Chinese characters at all. Code page 936 can display Simplified Chinese characters. It is a so-called DBCS (double-byte character set) and uses two bytes per character, except for the standard 7-bit ASCII characters, which are still represented as a single byte. That is why the XML format elements still look correct: they are single-byte codes and correspond to the standard 7-bit ASCII codes.
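The byte widths described above can be checked with a few lines of Python. This is a minimal sketch, assuming Python's `cp1252` and `cp936` codecs behave like the Windows code pages of the same number; `encoded_length` is a hypothetical helper written for this example.

```python
from typing import Optional


def encoded_length(text: str, codec: str) -> Optional[int]:
    """Number of bytes `text` needs in `codec`, or None if it cannot be encoded."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # An ASCII character, a Western European letter, and a Chinese character
    for sample in ("<", "ü", "压"):
        widths = {
            codec: encoded_length(sample, codec)
            for codec in ("cp1252", "cp936", "utf-8")
        }
        print(repr(sample), widths)
```

The ASCII character takes one byte in all three encodings, while the Chinese character has no representation in code page 1252, takes two bytes in code page 936, and three bytes in UTF-8.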

 

Since the non-ASCII characters in your string are in UTF-8, they do not match what Windows believes those bytes should mean according to the code page 936 encoding, so it displays basically rubbish. Neither LabVIEW nor Windows is the culprit here. You give the string control UTF-8 encoded data, LabVIEW passes that data to Windows, and Windows displays it according to the currently configured code page. You could convert the data from UTF-8 to the current code page by using the Windows MultiByteToWideChar(CP_UTF8, ......) function to convert from UTF-8 to UTF-16 and then WideCharToMultiByte(CP_ACP, ..........) to convert it to the actual current locale. However, if your current locale is not set to Chinese, the conversion will be lossy: the string will contain question marks, which is the default replacement in WideCharToMultiByte() for any character that cannot be represented in the selected target code page (the current one when using CP_ACP, or an explicitly specified one such as 936).
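Both effects can be reproduced outside Windows. The Python sketch below is not the Win32 API: it uses Python's own codecs as a stand-in for MultiByteToWideChar and WideCharToMultiByte, and the function names `misread` and `utf8_to_code_page` are invented for this illustration.

```python
def misread(data: bytes, assumed_codec: str) -> str:
    """Interpret bytes under a code page they were not written in."""
    # Invalid or incomplete sequences become U+FFFD instead of raising
    return data.decode(assumed_codec, errors="replace")


def utf8_to_code_page(data: bytes, target_codec: str) -> bytes:
    """UTF-8 -> Unicode text -> target code page, '?' for unmappable characters."""
    text = data.decode("utf-8")  # role of MultiByteToWideChar(CP_UTF8, ...)
    # role of WideCharToMultiByte(...); "replace" substitutes '?' when encoding
    return text.encode(target_codec, errors="replace")


if __name__ == "__main__":
    utf8_bytes = "压力值".encode("utf-8")
    # What a code page 936 system makes of the raw UTF-8 bytes
    print(misread(utf8_bytes, "cp936"))
    # Proper conversion to code page 936, then shown via that code page
    print(utf8_to_code_page(utf8_bytes, "cp936").decode("cp936"))
    # Conversion to a Western code page loses every Chinese character
    print(utf8_to_code_page(utf8_bytes, "cp1252"))
```

As a possible explanation for the byte that seemed to go missing in the question: a three-byte UTF-8 character cannot be split into whole two-byte pairs, so a double-byte reading consumes two bytes as one unrelated glyph and leaves a lone lead byte that has nothing to pair with.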

 

Or you can experiment with the UTF-8 checkbox that I mentioned earlier. This SHOULD tell Windows that the "current code page" is UTF-8. This is however a bit of a hack. UTF-8 is technically not a code page but an entirely different encoding scheme that happens to be similar enough to some of the MBCS (multi-byte character set) code pages that it kind of works by using a corresponding UTF-8 specific collation table. But it has a limitation: if you read a code page 936 encoded file and display it in a string control, you get the same problem again, since those code page 936 byte sequences mean rubbish in the UTF-8 code space that all your non-Unicode applications, including LabVIEW, will now use.
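If files in both encodings have to be handled, the decision can be made per file instead of per system. The following Python sketch shows one common heuristic, which is an assumption of this example and not something Windows or LabVIEW does: try strict UTF-8 first and fall back to code page 936 only when that fails. The function name `decode_with_fallback` is hypothetical.

```python
def decode_with_fallback(data: bytes, fallback_codec: str = "cp936") -> str:
    """Decode as UTF-8 when valid, otherwise as the fallback code page."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy code page; unmappable bytes
        # become U+FFFD rather than aborting the whole read
        return data.decode(fallback_codec, errors="replace")


if __name__ == "__main__":
    for raw in ("压力值".encode("utf-8"), "压力值".encode("cp936")):
        print(raw.hex(" "), "->", decode_with_fallback(raw))
```

The heuristic can guess wrong for short strings whose legacy bytes happen to form valid UTF-8, so an explicit encoding marker in the file format is preferable where one exists.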

 

Instead of changing the code page system-wide, there is yet another Windows hack, available since Windows 10 version 1903, where you can add an application-specific code page to the manifest of an executable.

 

<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

 

This way you can make only LabVIEW use UTF-8 while the rest of the system uses your default code page (probably 936 if you want non-Unicode applications to be Simplified Chinese compliant).

But there are many gotchas here. This is a recent Windows hack that changes the current process's code page, but the process interfaces in many ways with the whole system, so there is always a chance that either LabVIEW or Windows itself has some functions that depend on the actual system setting rather than the process setting. It may work, but it has a high chance of causing problems somewhere. And all the resources you load to display anywhere in LabVIEW need to be in UTF-8 encoding, or you get the same problem again in reverse. It will almost certainly play havoc with the localized LabVIEW menus and dialogs if you try to use a Chinese install of LabVIEW, since those are encoded in code page 936 but will then be interpreted as UTF-8, causing unreadable menus and dialogs.
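To see which ANSI code page a process actually ended up with, the Win32 function GetACP() can be queried. The Python sketch below calls it through ctypes; the wrapper name `process_ansi_code_page` is made up for this example, and it returns None on anything other than Windows. Note that it reports the code page of the Python process itself. Checking LabVIEW would mean making the same kernel32 call from inside LabVIEW, for example through a Call Library Function Node.

```python
import ctypes
import sys
from typing import Optional

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def process_ansi_code_page() -> Optional[int]:
    """Return the ANSI code page of the current process, or None off Windows."""
    if sys.platform != "win32":
        return None
    # GetACP takes no arguments and returns the code page identifier
    return int(ctypes.windll.kernel32.GetACP())


if __name__ == "__main__":
    acp = process_ansi_code_page()
    if acp is None:
        print("Not running on Windows; no ANSI code page to query.")
    else:
        print(f"ANSI code page: {acp} (UTF-8 active: {acp == CP_UTF8})")
```

A value of 936 would indicate Simplified Chinese as the process code page, and 65001 would indicate that the UTF-8 setting or manifest entry took effect for that process.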

Message 13 of 13