LabVIEW

Converting html documents to plain text using LabVIEW

I have a large number of HTML test reports created in TestStand that I want to post-process (search through for data) using LabVIEW. Currently, if I open them I see all the HTML tags, and I want to know if there is an easy way to clean those out and get the raw information (similar to opening a document in a web browser and doing a "save as text file"). I have the Internet Toolkit and the "Internet Applications in LabVIEW" text, but couldn't find what I wanted in either place. Has anyone done this before?
My next course of action would be to use ActiveX and try to save each report as text through the browser, but that would be very slow (open IE, open file, save as text, close IE, open text file, process, delete text file... repeat thousands of times). Any info would be appreciated.
Message 1 of 12
(7,169 Views)
Hi John,

I have not personally explored this stuff myself, but,

On your functions palette go to

Advanced>>>Data manipulation>>>XML

and you will find a sub-palette full of stuff that looks good to me.

Let us know if any of these are useful.

Ben
Retired Senior Automation Systems Architect with Data Science Automation LabVIEW Champion Knight of NI and Prepper LinkedIn Profile YouTube Channel
Message 2 of 12
John,

This VI will get you part way. It will return everything that is not a markup tag. You will still want to unescape HTML codes, which you should be able to do with the Internet Toolkit VIs.

This VI is designed for speed and memory efficiency, so the code looks a little low-level and has no subVIs, which would duplicate data in memory (but it is very clean, nonetheless).
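For anyone reading without the attachment, here is a rough text-language sketch of the same general idea: walk the string once, keep whatever falls outside <...>, then unescape the HTML codes. It is Python, not a transcription of the VI, and the name strip_tags is made up for illustration. It assumes every "<" opens a tag and every ">" closes one.

```python
import html


def strip_tags(markup: str) -> str:
    """Keep only the characters outside <...> and unescape HTML entities.

    Hypothetical helper for illustration; assumes "<" always opens a tag
    and ">" always closes one.
    """
    kept = []
    in_tag = False
    for ch in markup:
        if ch == "<":
            in_tag = True       # start skipping tag contents
        elif ch == ">":
            in_tag = False      # tag finished, resume copying
        elif not in_tag:
            kept.append(ch)     # ordinary text outside any tag
    # Turn codes such as &amp; or &nbsp; back into characters.
    return html.unescape("".join(kept))


if __name__ == "__main__":
    print(strip_tags("<p>Pass &amp; done</p>"))
```

Collecting characters in a list and joining once at the end avoids rebuilding the string on every character, which matters when looping over thousands of reports.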

Good luck,

-Jim
JKI Blog
Message 3 of 12
Ben and Jim gave you some good answers to help you get started. What I'd like to add is that, to avoid all of this in the future, you should start saving the results from TestStand in a database. As you have found out, going through a couple thousand HTML files is time-consuming and requires a significant amount of disk space. Having the same test results in a single database is much more efficient, and it's a trivial matter for just about anyone to run some simple queries and analyse the data the way they want. A simple Access database will do in a lot of situations. If you really don't want to use a database, at least use the XML report generation in TestStand.
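To give a feel for what "simple queries" means here, below is a small sketch using Python's built-in sqlite3 module instead of Access. The table name, column names and sample rows are all invented for illustration; they are not the schema TestStand's database logging uses.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE step_result ("
        "uut_serial TEXT, step_name TEXT, status TEXT, value REAL)"
    )
    # Invented sample data, standing in for logged test steps.
    conn.executemany(
        "INSERT INTO step_result VALUES (?, ?, ?, ?)",
        [
            ("SN001", "Voltage Check", "Passed", 5.01),
            ("SN002", "Voltage Check", "Failed", 5.90),
            ("SN002", "Current Check", "Passed", 0.12),
        ],
    )
    return conn


def failed_steps(conn: sqlite3.Connection) -> list:
    """Return (serial, step, value) for every failed step."""
    cur = conn.execute(
        "SELECT uut_serial, step_name, value FROM step_result "
        "WHERE status = 'Failed' ORDER BY uut_serial, step_name"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(failed_steps(build_demo_db()))
```

One query like this replaces opening and searching every report file by hand.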
Message 4 of 12
Excellent advice, Dennis!
JKI Blog
Message 5 of 12
Dennis,
Thanks for the response. You're preaching to the choir here. Back in '97 or '98 I presented at NIWeek a system I designed using the old Test Executive Toolkit coupled to an MS SQL Server DB, so I'm well aware of the value in that. In 2000 I started working at my present job, and for the first 3 years we were running full tilt just trying to get test programs written. It's only now that I'm getting to develop high-level tools again. This is just a short-term solution I was trying to throw together. Kind of frustrating working with out-of-date tools, but at least we shipped a lot of product in that time.
John
Message 6 of 12
Thanks Jim,
I tried the VI on one of my reports, but it didn't work. I think the reason is that there are additional ">" characters that TestStand uses to denote a path to a VI within subsequences. I should be able to make minor modifications to it so that it doesn't see them. You're right, it is efficiently coded though!
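One possible way around stray ">" characters, sketched in Python rather than LabVIEW, is to let a real HTML tokenizer decide what counts as a tag. The standard-library html.parser.HTMLParser treats a ">" that appears outside a tag as ordinary text. The class and function names below are hypothetical, and the sample path is invented, not copied from an actual TestStand report.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True makes the parser unescape &amp; and friends.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; a bare ">" in the text arrives here.
        self.parts.append(data)


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.parts)


if __name__ == "__main__":
    # Invented example of a ">"-separated sequence path inside a table cell.
    print(html_to_text("<td>MainSequence > SubSequence > Test.vi</td>"))
```

The trade-off is speed: a full tokenizer does more work per character than a bare tag skipper, so it is worth timing on a batch of real reports first.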
Thanks again.
John
Message 7 of 12
Ben,
Thanks for the response. I tried the "Unescape XML.vi" and had no luck. I thought it would work too.
John
Message 8 of 12

Hi Jim,

Very efficient VI. Is there an OpenG version of this?

Regards,

Khalid

ps: I'm sorry if this spams "notification" mail to all others on this thread.


Message 9 of 12
Jim,

Great code, very efficient!

For all those of us who can deal with a slight speed penalty, we can get the same result by just using the built-in "Search and Replace Pattern" tool, as in the attached image (first seen in Kevin Price's early attempt at the recent HTML coding challenge posted here).
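Since the attached image isn't reproduced here, the following is only a guess at the kind of pattern involved, written in Python rather than as a LabVIEW diagram: replace every match of <[^>]*> with an empty string. Both the pattern and the function name are assumptions for illustration, not a copy of the posted code.

```python
import re

# "<", then any run of characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def remove_tags(markup: str) -> str:
    """Delete every <...> span; entities such as &amp; are left untouched."""
    return TAG_PATTERN.sub("", markup)


if __name__ == "__main__":
    print(remove_tags("<h1>Report</h1><p>Status: Passed</p>"))
```

Note that this only removes tags. Unescaping HTML codes would still be a separate step, as mentioned earlier in the thread.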

Christian

LabVIEW Champion. It all comes together in GCentral GCentral
What does "Engineering Redefined" mean??
Message 10 of 12