UTF-8 Output from Excel Byte Stream

jamison.suade · ‎03-19-2026

Hi everyone,

I’m running into an encoding issue and could use some advice or insight.

I have the Chinese string 压力值 stored in a cell in an Excel file.

I’m reading the byte stream from this file using ZLIB Extract Stream__ogtk.vi.

Here’s the behavior I’m seeing:

When my “Language for non-Unicode programs” (Region Settings) is set to English (United States), the output byte stream is correct and matches UTF-8 encoding:
E5 8E 8B E5 8A 9B E5 80 BC

Original Character	UTF-8 Bytes	Byte-by-byte reading (Windows-1252)	Resulting Characters
压	E5 8E 8B	E5 → å, 8E → Ž, 8B → ‹	åŽ‹
力	E5 8A 9B	E5 → å, 8A → Š, 9B → ›	åŠ›
值	E5 80 BC	E5 → å, 80 → €, BC → ¼	å€¼

However, when I switch the setting to Simplified Chinese, the last byte changes.

E5 8E 8B E5 8A 9B E5 80 3F
(the BC becomes 3F, i.e., ?)

Bytes misread as GBK	Resulting character
E5 8E	鍘
8B E5	嬪
8A 9B	姏
E5 80	鍊
BC	? (lone lead byte with no trail byte, so no mapping)
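
The table above is what an ANSI code-page round trip would produce. As a rough sketch in Python (not LabVIEW, and only a hypothesis about where the pipeline might go wrong, not a confirmed diagnosis), the snippet below treats the UTF-8 bytes as text in a given code page and converts them to Unicode and back. The helper name `ansi_round_trip` is made up for illustration, and `cp1252` and `gbk` are assumed stand-ins for the English (United States) and Simplified Chinese ANSI code pages.

```python
# Hypothetical illustration: what happens if some layer treats the raw bytes
# as text in the active ANSI code page and converts them to Unicode and back.

UTF8_BYTES = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")  # 压力值 encoded as UTF-8


def ansi_round_trip(data: bytes, codepage: str) -> bytes:
    """Decode bytes as `codepage`, then encode back, replacing anything unmappable."""
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


english = ansi_round_trip(UTF8_BYTES, "cp1252")  # assumed English ANSI code page
chinese = ansi_round_trip(UTF8_BYTES, "gbk")     # assumed Simplified Chinese code page

print(english.hex(" ").upper())
print(chinese.hex(" ").upper())
```

Under these assumptions the `cp1252` round trip returns the bytes unchanged, while the `gbk` round trip turns the trailing BC into 3F, because BC is a GBK lead byte with no trail byte after it. That would point at a text conversion somewhere in the chain rather than at the decompression itself.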

I then take this UTF-8 byte stream and convert it to UTF-16 in order to display the Unicode string. With this issue, the last character is not displayed when the system is in Chinese regional settings.
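
For reference, here is a minimal Python sketch of the UTF-8 to UTF-16 step applied to both byte streams. It is not the LabVIEW code I use; the function name `utf8_to_utf16le` is invented for this example and strict decoding is an assumption, chosen so that the damaged stream is reported instead of silently dropped.

```python
# Sketch of the UTF-8 -> UTF-16 conversion on both byte streams from this post.

good = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")  # English setting
bad = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 3F")   # Simplified Chinese setting


def utf8_to_utf16le(data: bytes) -> bytes:
    """Strictly decode UTF-8 and re-encode as UTF-16 little-endian (no BOM)."""
    return data.decode("utf-8", errors="strict").encode("utf-16-le")


print(utf8_to_utf16le(good).hex(" ").upper())

try:
    utf8_to_utf16le(bad)
    bad_is_valid = True
except UnicodeDecodeError as err:
    # E5 80 announces a three-byte character, but 3F is not a continuation byte.
    bad_is_valid = False
    print(f"invalid UTF-8 at byte offset {err.start}: {err.reason}")
```

The second stream is no longer valid UTF-8, which would explain why a converter either drops or replaces the last character.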

[Screenshots: displayed string under the English and Simplified Chinese settings]

So it seems that something in the pipeline fails to interpret the final character correctly under certain locale settings, or that there is a problem with ZLIB Extract Stream__ogtk.vi.

Hoping to hear your thoughts or any possible workarounds.

Martin_Henz · ‎03-20-2026

I really hope that zlib doesn't just change any bytes. I am 99.99999% sure that it doesn't.

So the issue arises in the software that creates the Excel file.

I am not from China, do not speak Chinese either, and have so far only occasionally created software with a Chinese user interface. To me, converting Chinese from UTF-16 to UTF-8 looks very difficult.

There are so many different "Simplified Chinese" character sets that the words "Simplified Chinese" don't help me much. If you say that you switched the regional settings to "Simplified Chinese", my question is: which code page is then used?

I think that the software that created the Excel file also failed on this in some way and replaced the unknown UTF-8 character "E5 80 BC" with a "?".

rolfk · ‎03-20-2026

zlib definitely doesn't change bytes based on the user interface language. It works entirely on bytes and not on characters. I wouldn't 100% exclude the possibility that something in the wrapper around it goes wrong. It could be in the C wrapper that I created (doesn't seem very likely but who knows, bugs do happen) or in the LabVIEW layer (slightly bigger chance).

I assume this happens with the same input file, so the possibility that the file generator does something wrong can be ruled out.

But in order to debug this I would need an example file to try it out with, and I would have to install Simplified Chinese support on my system, even though I understand about 0.01% of Chinese. 😀

Also, it would help to know which version of the OpenG ZIP library you are using. Is it a recent install through VIPM? The 5.x release series had a significant change in its underlying architecture to support many new features under the hood, including Unicode support in filenames, which allows accessing files with non-standard characters in the path name as well as long path names. It also supports extracting metadata for files, which can be in UTF-8. This should not affect the actual binary file stream itself, but as I said, bugs can happen.

Have you verified that the binary data (not looked at as string but simply UInt8 data) returned really changes? Or is it something in the string control somehow?
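
To make clear what kind of check is meant, here is a small Python sketch (not LabVIEW) that compares two runs byte by byte instead of looking at them as strings. The helper `first_difference` is a made-up name, and the sample values are the two byte sequences quoted in the first post.

```python
# Compare two extracted streams as raw bytes, independent of any string display.

def first_difference(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for the first mismatch, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset, x, y
    if len(a) != len(b):
        # One stream is a strict prefix of the other.
        return min(len(a), len(b)), None, None
    return None


english_run = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")
chinese_run = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 3F")

diff = first_difference(english_run, chinese_run)
print(diff)
```

If the mismatch shows up at this level, a string control or indicator cannot be the cause.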

Rolf Kalbermatter My Blog

DEMO, Electronic and Mechanical Support department, room 36.LB00.390

jamison.suade · ‎03-25-2026

Hello Rolf,

You don’t need to fully switch your system language to reproduce the issue. Changing only the “Language for non-Unicode programs” to Simplified Chinese is sufficient. This way, the rest of the system can remain in English, so it’s still manageable.

Regarding the OpenG ZIP library version, I unfortunately can’t provide an exact version number. The library was copied into our internal SVN repository, so there’s no straightforward way for me to verify it. However, I’m fairly certain it’s an older version. Is there any chance the version information is stored somewhere, for example in a VI description or front panel metadata? Otherwise, my next step would be to install the latest version via VIPM and test with that.

I can also confirm that the issue affects the raw data itself: the UInt8 array output changes when switching the non-Unicode language setting. See the comparison below. So it is definitely not a string control issue.

[Screenshots: UInt8 array output under the English and Simplified Chinese settings]

rolfk · ‎03-25-2026

Could you also post the actual xlsx file, just to be sure?

jamison.suade · ‎03-26-2026

Hello Rolf,

I am using an .ods file but also tested with an .xlsx file, and both show the same issue. I'm attaching both.

rolfk · ‎03-26-2026

Thanks. I'm definitely going to test it to get to the bottom of this. Whether it is .xlsx or .ods should not matter; I just naively assumed it would be .xlsx. I myself predominantly use LibreOffice, also at work.

But it will probably be this weekend. There is a lot of other stuff with higher urgency and stricter obligations.

rolfk · ‎03-26-2026

I did a quick test with the latest version of the OpenG ZIP library, 5.0.9, from vipm.io:

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to Chinese (simplified): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to Dutch (Netherlands): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

Windows 10, LabVIEW 2020 32-bit with Language for non-Unicode programs set to English (United States): returns \E5\8E\8B\E5\8A\9B\E5\80\BC

So it looks like it is either fixed in the latest version, or it is more complex than just the language selection.

jamison.suade · ‎03-26-2026

Hi Rolf,
I tried installing the latest version, 5.0.9, but still got the same issue. Can you share your test.vi?

Andrey_Dmitriev · ‎03-27-2026

@jamison.suade wrote:

Hi Rolf,
I tried installing the latest version, 5.0.9, but still got the same issue. Can you share your test.vi?

I think this is just a visual representation issue. LabVIEW has trouble with three‑byte UTF‑8 characters, and ‘BC’ and ‘3F’ at the end look visually the same, which is why you got confused.

This is how it should look when the development environment, GUI framework and programming language are fully Unicode‑aware—for example, C#/WPF.

I think you should use two-byte GBK encoding; then these characters will be displayed correctly in LabVIEW.
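
As a minimal sketch of that conversion, written in Python rather than LabVIEW: decode the UTF-8 bytes and re-encode them as GBK. Whether LabVIEW then shows the string correctly depends on the system's ANSI code page being the Simplified Chinese one, which is an assumption here.

```python
# Re-encode the UTF-8 bytes from the spreadsheet as two-byte GBK.

utf8_bytes = bytes.fromhex("E5 8E 8B E5 8A 9B E5 80 BC")

text = utf8_bytes.decode("utf-8")   # Unicode string 压力值
gbk_bytes = text.encode("gbk")      # two bytes per character for these three characters

print(gbk_bytes.hex(" ").upper())
```

The result is six bytes instead of nine, and it only reads back correctly on a machine that interprets the bytes as GBK.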
