LabVIEW

Unicode to UTF-8 conversion


Hello,

 

Has anyone ever converted Unicode characters (0000 - FFFF) to UTF-8? Are there any built-in modules that can be used for this conversion?

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=1024

 

There is an ASCII-text-to-UTF-8 module, but it does not work for Unicode characters.

 

-VS

 

 

Message 1 of 6
(7,342 Views)

The safe answer is "Not Yet", based on a search of LabVIEW Help and a Web search.  Part of the problem is that Unicode is U16, while ASCII (which LabVIEW basically uses) is U8.  You could write your own translator/mapper, but it would have to deal with the many-to-one problem (i.e. you'd need to decide which 256 Unicode characters you wanted to display as which ASCII character).

 

Bob Schor

 

P.S. -- If I'm wrong about this, I have every confidence that other readers of this Forum will point out my error ...

Message 2 of 6
(7,324 Views)

The conversion would be simple.  Why don't you write a VI for it and share it.

"If you weren't supposed to push it, it wouldn't be a button."
Message 3 of 6
(7,315 Views)
Solution
Accepted by topic author V_T_S

Actually, ASCII is a VERY LIMITED subset of Unicode. And no, Unicode is not UTF-16; rather, UTF-16 LE (little endian) is the Unicode encoding used on Windows. There are also UTF-8 and UTF-32, and the 16-bit and 32-bit encodings each come in an LE and a BE version.
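To make the difference between these encodings concrete, here is a small sketch in Python (chosen only for illustration, it is not part of LabVIEW). It encodes one character, the euro sign U+20AC, with each of the encodings named above using Python's standard codecs.

```python
# Encode a single character (EURO SIGN, U+20AC) in several Unicode encodings.
ch = "\u20ac"

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print the raw bytes as hex so the byte order is visible.
    print(f"{name:10s} {data.hex(' ')}")
```

The code point is the same in every case; only the byte layout differs, and the LE and BE variants are byte-swapped versions of each other.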

 

Windows NT uses UTF-16 LE everywhere internally, and translates it to 8-bit MBCS on demand for applications that don't use Unicode, such as LabVIEW. LabVIEW therefore does not really use ASCII but 8-bit MBCS. For most Windows code pages this means an extended ASCII code page, with the lower 128 character codes mapped to the standard 7-bit ASCII characters and the upper 128 codes mapped to code-page-specific characters. But East Asian code pages (Japanese, Chinese, Korean) define more than 256 characters, and then a single character suddenly consists of multiple bytes even in LabVIEW.
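The single-byte versus multi-byte behaviour of code pages can be illustrated with a short Python sketch. It uses Python's `cp1252` (Western European) and `cp932` (Japanese) codecs as stand-ins for the Windows code pages; the helper name `mbcs_length` is made up for this example.

```python
def mbcs_length(ch: str, codepage: str) -> int:
    """Return how many bytes one character occupies in the given code page."""
    return len(ch.encode(codepage))

samples = [
    ("A", "cp1252"),       # lower 128: plain 7-bit ASCII
    ("\u00e9", "cp1252"),  # e with acute accent: upper 128 of a Western code page
    ("A", "cp932"),        # ASCII is still one byte in the Japanese code page
    ("\u3042", "cp932"),   # hiragana A: needs two bytes in the Japanese code page
]

for ch, codepage in samples:
    print(f"U+{ord(ch):04X} in {codepage}: {mbcs_length(ch, codepage)} byte(s)")
```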

 

Linux nowadays mostly uses UTF-32 for wide characters internally (wchar_t is 32 bits there), LE or BE depending on the endianness of the CPU, but with most user systems running on x86/64 or ARM this is usually also LE.

At the user level it uses UTF-8, which is in fact also an MBCS encoding in which a single code point can consist of 1 to 4 bytes. The first 128 characters in the ASCII table map exactly to the first 128 characters in the Unicode standard.

So if your LabVIEW were running on a modern Linux system, you would theoretically already be using UTF-8 :-)
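For the original question (code points 0000 to FFFF), the 1-to-4-byte layout of UTF-8 can be sketched by hand. The following Python function is only an illustration of the bit layout, not a LabVIEW built-in; the name `encode_utf8` is made up, and the same shifts and masks could be wired in a VI.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates are not characters and have no UTF-8 form.
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

Everything in the range 0000 to FFFF needs at most three bytes; the four-byte branch only matters for code points above FFFF.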

 

On Windows, to get from UTF-16 LE to UTF-8 one simply needs to call the function WideCharToMultiByte() with the first parameter set to CP_UTF8 (65001) instead of CP_ACP (0).
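As a rough sketch of that call, here is how it might look from Python via `ctypes` (in LabVIEW the equivalent would be a Call Library Function Node on kernel32.dll). The wrapper name `utf16le_to_utf8` is made up for this example, the Windows branch is an untested assumption about how one would wire the call, and on other platforms the sketch falls back to Python's own codecs, which perform the same conversion.

```python
import sys

CP_UTF8 = 65001  # Win32 code page identifier for UTF-8

def utf16le_to_utf8(raw: bytes) -> bytes:
    """Convert a UTF-16 LE byte string to UTF-8."""
    n_units = len(raw) // 2  # number of 16-bit code units
    if n_units == 0:
        return b""
    if sys.platform == "win32":
        import ctypes
        wc2mb = ctypes.windll.kernel32.WideCharToMultiByte
        src = (ctypes.c_char * len(raw)).from_buffer_copy(raw)
        # First call with a zero-sized output buffer: returns the required size.
        needed = wc2mb(CP_UTF8, 0, src, n_units, None, 0, None, None)
        if needed == 0:
            raise ctypes.WinError()
        dst = ctypes.create_string_buffer(needed)
        written = wc2mb(CP_UTF8, 0, src, n_units, dst, needed, None, None)
        if written == 0:
            raise ctypes.WinError()
        return dst.raw[:written]
    # Portable fallback (assumption: same result as the Win32 call).
    return raw.decode("utf-16-le").encode("utf-8")
```

Calling the function twice, first to query the size and then to convert, avoids having to guess the output buffer length.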

 

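And be very careful to allocate a large enough buffer for the returned UTF-8 string. It can get up to 3 times as long in bytes as there are UTF-16 code units in the incoming string (a 4-byte UTF-8 sequence always comes from a surrogate pair, i.e. two code units), so allocating 4 bytes per code unit is always safe!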
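The size relationship can be checked with a few sample characters. This Python sketch (the helper name `utf8_bytes_per_utf16_unit` is made up) compares the UTF-8 byte count with the number of UTF-16 code units for characters from each UTF-8 length class.

```python
def utf8_bytes_per_utf16_unit(text: str) -> float:
    """Ratio of UTF-8 bytes to UTF-16 code units for the given text."""
    units = len(text.encode("utf-16-le")) // 2
    nbytes = len(text.encode("utf-8"))
    return nbytes / units

samples = {
    "A": "ASCII, 1 byte in UTF-8",
    "\u00e9": "Latin-1 range, 2 bytes in UTF-8",
    "\u20ac": "rest of the BMP, 3 bytes in UTF-8",
    "\U0001F600": "outside the BMP, 4 bytes in UTF-8 but 2 UTF-16 code units",
}

for ch, note in samples.items():
    print(f"U+{ord(ch):04X}: ratio {utf8_bytes_per_utf16_unit(ch)} ({note})")
```

In these samples the ratio never exceeds 3, so a buffer of 4 bytes per UTF-16 code unit is a safe upper bound. WideCharToMultiByte() also returns the exact required size when it is called with an output buffer size of 0, which avoids over-allocating.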

 

Rolf Kalbermatter
Message 4 of 6
(7,305 Views)

Thanks, RolfK, for the clear and concise response!

 

Bob Schor

Message 5 of 6
(7,277 Views)

Thank you for the detailed description, RolfK!!

Message 6 of 6
(7,271 Views)