LabVIEW

Extracting HTML code from URL

I'm having problems extracting the HTML code from a URL address. I
already tried to use the DataSocket Read function, but it doesn't
always work. I would like to retrieve the HTML code from a website
that is already open in Windows Internet Explorer, something like a
macro that goes to the "View" menu of the open page in Internet
Explorer, chooses to view the HTML source, and maybe even saves it
to a plain text document. I'd be very grateful for any advice.
0 Kudos
Message 1 of 5
(3,972 Views)
Hello Mamian,

From what you have described, I believe that you are on the right
track in using the DataSocket Read VI. One problem you might
encounter with this method is a redirecting URL (a URL that sends
you to another URL). In that case, the read will not work. You would
have to use the absolute URL, which is the URL shown in the address
bar at the top of your internet browser.
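
For readers who want to find the absolute URL outside of LabVIEW, here is a minimal Python sketch. It is not part of the attached VI and only illustrates the idea: the standard library's `urllib.request.urlopen` follows HTTP redirects by default, and the response reports the URL that finally served the page. The function name `fetch_html` is made up for this example.

```python
import urllib.request
from typing import Tuple


def fetch_html(url: str, timeout: float = 10.0) -> Tuple[str, str]:
    """Fetch a page and return (final_url, html).

    urlopen follows HTTP redirects on its own, so final_url is the
    absolute URL that actually served the content. It may differ from
    the URL that was passed in.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
        return response.geturl(), html
```

The returned `final_url` is the value you would then hand to DataSocket Read. Note that this only covers redirects done at the HTTP level; a page that redirects through script would not be resolved this way.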

I attached an example VI that hopefully performs something similar to
what you are looking for. Also, I found a good thread
(http://exchange.ni.com/servlet/ProcessRequest?RHIVEID=101&RNAME=ViewQuestion&HOID=506500000008...)
that may help you accomplish this task.

I hope this helps! Let me know if there is anything else I can help
you with or clarify. Have a great weekend!

Jeremy L
National Instruments
0 Kudos
Message 2 of 5
(3,971 Views)
Hi Jeremy,
Thanks a lot for your answer and your help. Unfortunately, I still
have the problem. I wasn't able to look at the example you attached
because I only have LabVIEW 6i. I had already seen the thread you
found, but my problem persists. I think I know what you mean by a
redirecting URL: in Hotmail, for example, when you click a link,
Hotmail does not open the site at its absolute URL but at a
redirected one. Let me explain what this is about. There's a website
where I have an account. As soon as I log in, I enter a URL with a
dynamic IP number (for example: http://155.155.155.155/whatever.asp)
into the DataSocket Read VI. It sometimes works, but many times I get
HTML code that says I've been automatically logged out, even though I
can still use this IP to navigate the site. That's why I'm trying the
approach I described in my earlier question: maybe using ActiveX to
access the Windows IE window, view the code there, and save it as a
text file that I can read later with LabVIEW.
Any suggestions will help me. Thanks a lot for your support.
You too have a nice weekend.
Mamian
0 Kudos
Message 3 of 5
(3,971 Views)
Hi Mamian,
Sorry about that example version. However, I found a different way: you can open a website using ActiveX and return that page as an HTML source string, which I believe can then be written to a text file. I have attached a sample VI (version 6i) which contains some comments on the block diagram to explain what's going on. I tested it out on a couple of websites and did not have any problems. Hopefully this is what you're looking for!

Let me know if you need any other help with this VI or anything else!

Jeremy L
National Instruments
0 Kudos
Message 4 of 5
(3,972 Views)
I'm just adding a reply here because I think Jeremy's well-documented example might need a small tweak to work better.

In Step 5 ("Go to the activeElement property of the Document object"), I found that I got more of the content I was after by using the "body" property of the document instead. The rest of the code worked without any further changes, but if I stuck with Jeremy's activeElement version, I found that I was only getting a small amount of HTML that seemed to be associated with an ad at the top of the HTML page I was trying to parse.
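
For anyone reading without the VI at hand, the same property chain can be sketched in text form. This is a rough Python illustration, not Jeremy's code: it assumes a browser automation object that exposes the IWebBrowser2 members `Navigate`, `Busy`, `ReadyState`, and `Document`, and the function name `get_body_html` is made up. It reads `Document.body.outerHTML` rather than `activeElement`, since `activeElement` refers to whichever element currently has focus, which may explain why it returned only a fragment.

```python
import time

# READYSTATE_COMPLETE in the READYSTATE enumeration used by IWebBrowser2.
READYSTATE_COMPLETE = 4


def get_body_html(browser, url, timeout=30.0, poll_interval=0.1):
    """Navigate `browser` to `url` and return the HTML of the document body.

    `browser` is any object exposing Navigate, Busy, ReadyState and
    Document in the style of IWebBrowser2 (an assumption of this sketch).
    """
    browser.Navigate(url)
    deadline = time.monotonic() + timeout
    # Poll until the page has finished loading, or give up at the deadline.
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.monotonic() > deadline:
            raise TimeoutError("page did not finish loading: %s" % url)
        time.sleep(poll_interval)
    # Use the whole body, not activeElement (only the focused element).
    return browser.Document.body.outerHTML


# Possible wiring on Windows (assumes the third-party pywin32 package):
#   import win32com.client
#   ie = win32com.client.Dispatch("InternetExplorer.Application")
#   html = get_body_html(ie, "http://155.155.155.155/whatever.asp")
#   ie.Quit()
```

Passing the browser object in as a parameter keeps the polling logic separate from how the ActiveX object is created, so the function can be tried with a stand-in object on a machine without Internet Explorer.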

Overall, this is an excellent little snippet of code that shows how to make better use of the IWebBrowser2 object by delving down into its Document property. When simpler efforts (via TCP or ITK VIs or DataSocket) to obtain the HTML source associated with a URL fail because cookies or authentication is required, you might be able to get the job done using the Microsoft Web Browser ActiveX control and code like this.
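
To illustrate why a plain fetch can come back with a "logged out" page while the browser still works, here is a small Python sketch of a cookie-aware fetch. It is only an illustration under assumptions: the helper names `make_session_opener` and `fetch_with_session` are made up, and a real site would usually also need a login request (often a form POST) that is not shown here.

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar); the opener stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch_with_session(opener, url, timeout=10.0):
    """Fetch `url` through `opener` and return the page as text.

    Cookies set by earlier responses on the same opener are sent along,
    which is what a browser does and what a bare fetch does not.
    """
    with opener.open(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A fetch that starts without the session cookie looks like a brand-new visitor to the server, which matches the behaviour Mamian described. Driving the Web Browser control avoids the issue because the browser already holds the cookie.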

Thanks for the example, Jeremy,
John
0 Kudos
Message 5 of 5
(3,925 Views)