LabVIEW

UTF-8 Output from Excel Byte Stream

Hi everyone,

 

I’m running into an encoding issue and could use some advice or insight.

 

I have the Chinese string 压力值 stored in a cell in an Excel file.

I’m reading the byte stream from this file using ZLIB Extract Stream__ogtk.vi.

ZLIB.jpg

 

 

Here’s the behavior I’m seeing:

When my “Language for non-Unicode programs” (Region Settings) is set to English (United States), the output byte stream is correct and matches UTF-8 encoding:
E5 8E 8B E5 8A 9B E5 80 BC

jamisonsuade_3-1773975157900.png

Original Character | UTF-8 Bytes | Latin-1 Resulting Character
压 | E5 8E 8B | E5 → å, 8E → Ž, 8B → ‹
力 | E5 8A 9B | E5 → å, 8A → Š, 9B → ›
值 | E5 80 BC | E5 → å, 80 → €, BC → ¼
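
The table above can be reproduced outside LabVIEW. The following is a minimal Python sketch and not part of the original post; the helper names `to_hex` and `as_cp1252` are made up for illustration. One assumption is worth stating: the printable characters shown in the table (Ž, Š, ‹, ›, €) belong to Windows-1252, whereas strict ISO 8859-1 maps 0x80 to 0x9F to control characters, so the sketch decodes with `cp1252`.

```python
def to_hex(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)


def as_cp1252(data: bytes) -> str:
    """Interpret every byte as one Windows-1252 character (assumed code page)."""
    return data.decode("cp1252")


text = "压力值"

# UTF-8 encoding of the whole string.
print(to_hex(text.encode("utf-8")))  # E5 8E 8B E5 8A 9B E5 80 BC

# Per character: the three UTF-8 bytes and how they look when each byte
# is shown as a single-byte character.
for ch in text:
    raw = ch.encode("utf-8")
    print(ch, to_hex(raw), as_cp1252(raw))
```

All nine bytes are defined in Windows-1252, which is why the English setting can show them one character per byte without losing any of them.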

 

However, when I switch the setting to Simplified Chinese, the last byte changes. 

E5 8E 8B E5 8A 9B E5 80 3F
(the BC becomes 3F, i.e., ?)

Byte.png

 

Misread bytes (GBK) | Resulting character
E5 8E |
8B E5 |
8A 9B |
E5 80 |
BC | ? (no mapping)
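
One way to arrive at exactly this byte pattern is a round trip through the active ANSI code page: the bytes are decoded to text and encoded back, and anything that cannot be paired is replaced by `?`. Whether any step in the LabVIEW or OpenG pipeline actually does this is an assumption that the thread does not confirm. The Python sketch below only shows that such a round trip is harmless under Windows-1252 and changes the trailing BC to 3F under GBK (code page 936). The function name `ansi_round_trip` is hypothetical.

```python
UTF8_BYTES = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")  # UTF-8 for 压力值


def to_hex(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)


def ansi_round_trip(data: bytes, codepage: str) -> bytes:
    """Decode bytes with a code page and encode them back.

    Sequences that are invalid in the code page are replaced instead of
    raising, which mimics a lossy conversion (an assumption, see lead-in).
    """
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


# Single-byte code page: every byte maps to a character and back.
print(to_hex(ansi_round_trip(UTF8_BYTES, "cp1252")))
# -> E5 8E 8B E5 8A 9B E5 80 BC

# GBK pairs the bytes two by two; the ninth byte (BC) is a lead byte
# without a trail byte, so it is replaced and comes back as '?' (3F).
print(to_hex(ansi_round_trip(UTF8_BYTES, "gbk")))
# -> E5 8E 8B E5 8A 9B E5 80 3F
```

The odd byte count is what matters here: eight bytes pair up as four two-byte GBK sequences, and the ninth has nothing to pair with.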

 

I then take this UTF-8 byte stream and convert it to UTF-16 in order to display the Unicode string. With this issue, the last character is not displayed when the system is in Chinese regional settings.
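
For reference, the UTF-8 to UTF-16 step can be sketched in Python as follows. This is an illustration only, not the LabVIEW implementation used in the post; the names `utf8_to_utf16le` and `is_valid_utf8` are hypothetical. It also shows why the last character is lost: once BC has become 3F, the final three bytes are no longer a valid UTF-8 sequence.

```python
GOOD = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")  # intact UTF-8 stream
BAD = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 3F")   # last byte replaced by '?'


def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16 little-endian; raises on invalid input."""
    return data.decode("utf-8", errors="strict").encode("utf-16-le")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_to_utf16le(GOOD).hex(" ").upper())  # 8B 53 9B 52 3C 50
print(is_valid_utf8(GOOD))                     # True
print(is_valid_utf8(BAD))                      # False

# A lenient decoder keeps the first two characters and drops the third.
lenient = BAD.decode("utf-8", errors="replace")
print(lenient.startswith("压力"), "值" in lenient)  # True False
```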

English | Simplified Chinese
jamisonsuade_5-1773975743476.png | jamisonsuade_6-1773975743477.png

 

 

So it seems that something in the pipeline fails to interpret the final character correctly under certain locale settings, or that there is a problem with ZLIB Extract Stream__ogtk.vi.
 
Hoping to hear your thoughts or any possible workarounds.
Message 1 of 13

I really hope that zlib doesn't just change any bytes. I am 99.99999% sure that it doesn't.

So the issue arises in the software that creates the Excel file.

 

I am not from China, do not speak Chinese either, and have so far only occasionally created software with a Chinese user interface. To me, converting Chinese text from UTF-16 to UTF-8 looks very difficult.

There are so many different "Simplified Chinese" character sets that the words "Simplified Chinese" alone don't help me much. If you say that you switched the regional settings to "Simplified Chinese", my question is: which code page is then used?

 

I think that the software which created the Excel file also failed on this in some way and replaced the unknown UTF-8 character "E5 80 BC" with a "?".

 

 

Message 2 of 13

zlib definitely doesn't change bytes based on the user interface language. It works entirely on bytes and not on characters. I wouldn't 100% exclude the possibility that something in the wrapper around it goes wrong. It could be in the C wrapper that I created (doesn't seem very likely but who knows, bugs do happen) or in the LabVIEW layer (slightly bigger chance).

 

I assume that both results come from the same input file, so the possibility that the file generator does something wrong should not be an issue.

 

But in order to debug that, I would need an example file to try it out, and I would have to install Simplified Chinese support on my system even though I understand about 0.01% of Chinese. 😀

 

It would also help to know which version of the OpenG ZIP library you are using. Is it a recent install through VIPM? The 5.x release series had a significant change in its underlying architecture to support many new features under the hood, including Unicode support in filenames, which allows accessing files with non-standard characters in the path name as well as long path names. It also supports extracting metadata for files, which can be in UTF-8. This should not affect the actual binary file stream itself, but as I said, bugs can happen.

 

Have you verified that the binary data (not looked at as string but simply UInt8 data) returned really changes? Or is it something in the string control somehow?
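
An independent way to answer that question is to read the same archive member with a different tool and compare bytes. The sketch below uses Python's standard `zipfile` module. The helper names `read_member` and `contains_utf8` are hypothetical, and the member name is an assumption: shared strings in an .xlsx file are typically stored in `xl/sharedStrings.xml`, and the cell text of an .ods file in `content.xml`. The demo builds a small in-memory archive so it runs without the real file.

```python
import io
import zipfile


def read_member(archive, member: str) -> bytes:
    """Return the decompressed bytes of one member of a ZIP archive.

    `archive` may be a path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)


def contains_utf8(raw: bytes, text: str) -> bool:
    """Check whether the UTF-8 encoding of `text` occurs in `raw`."""
    return text.encode("utf-8") in raw


# Stand-in for the real workbook: a tiny archive created in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("xl/sharedStrings.xml", "<si><t>压力值</t></si>".encode("utf-8"))
buf.seek(0)

raw = read_member(buf, "xl/sharedStrings.xml")
print(contains_utf8(raw, "压力值"))  # True
```

With the real file, `read_member("your_file.xlsx", "xl/sharedStrings.xml")` (path hypothetical) returns bytes that do not depend on the "Language for non-Unicode programs" setting, so they can serve as a reference for the UInt8 array that LabVIEW returns.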

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 3 of 13

Hello Rolf,


You don’t need to fully switch your system language to reproduce the issue. Changing only the “Language for non-Unicode programs” to Simplified Chinese is sufficient. This way, the rest of the system can remain in English, so it’s still manageable.

 

Regarding the OpenG ZIP library version, I unfortunately can’t provide an exact version number. The library was copied into our internal SVN repository, so there’s no straightforward way for me to verify it. However, I’m fairly certain it’s an older version. Is there any chance the version information is stored somewhere, for example in a VI description or front panel metadata? Otherwise, my next step would be to install the latest version via VIPM and test with that.

 

I can also confirm that the issue affects the raw data itself: the UInt8 array output changes when switching the non-Unicode language setting. See the comparison below. So it is definitely not a string control issue.

 

English | Simplified Chinese
English.jpg | Chinese.jpg

Message 4 of 13

Could you also post the actual xlsx file, just to be sure?

Rolf Kalbermatter
Message 5 of 13

Hello Rolf,

 

I am using an .ods file but also tested with an .xlsx file, and both show the same issue. I'm attaching both.

Message 6 of 13

Thanks. I'm definitely going to test it to get to the bottom of this. Whether it is .xlsx or .ods should not matter; I just naively assumed it would be .xlsx. I predominantly use LibreOffice myself, also at work.

But it will probably be this weekend. There is a lot of other stuff with higher urgency and stricter obligations.

Rolf Kalbermatter
Message 7 of 13

I did a quick test with the latest version of the OpenG ZIP library, 5.0.9, from vipm.io.

 

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to Chinese (simplified): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to Dutch (Netherlands): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to English (United States): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

 

So it looks like it is either fixed in the latest version, or it is more complex than just the language selection.

 

 

Rolf Kalbermatter
Message 8 of 13

Hi Rolf, 
I tried installing the latest version 5.0.9. But still got the same issue. Can you share your test.vi?

Message 9 of 13

@jamison.suade wrote:

Hi Rolf, 
I tried installing the latest version 5.0.9. But still got the same issue. Can you share your test.vi?


I think this is just a visual representation issue. LabVIEW has trouble with three‑byte UTF‑8 characters, and ‘BC’ and ‘3F’ at the end look visually the same, which is why you got confused.

 

Screenshot 2026-03-26 20.39.41.png

This is how it should look when the development environment, GUI framework and programming language are fully Unicode‑aware—for example, C#/WPF.

Screenshot 2026-03-26 20.46.59.png

I think you should use two-byte GBK encoding; then these characters will be displayed correctly in LabVIEW.

Screenshot 2026-03-27 07.42.43.png

Message 10 of 13