Web Browser ActiveX question

DJ_001 · ‎04-02-2002

Hi all,

I'm using MS' web browser control to view info off the web. What I'd like
to accomplish is programmatic saving of images that are currently being
displayed.

Unfortunately I can't seem to get this done. I've included two methods that
I've been trying. Has anyone accomplished this?

Method 1 gets a reference to the image in the control. I can retrieve
various properties from the image, but I can't seem to find the actual data.
I think the data can be retrieved using the 'toString' method, but I haven't
been able to figure out how...

Method 2 uses the ExecWB method with the 'SaveAs' command on the control.
Works great, but the problem is that I can't seem to disable the file dialog.

If anyone can help me out, I'd appreciate it.

Denis.

P.S. I've found some help for the methods/properties here:
http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser/webbrowser.asp

[Attachment Method 1.vi, see below]

[Attachment Method 2.vi, see below]

[Attachment MID7604 14.jpg, see below]

Labviewguru · ‎05-11-2002

Denis,

I am not exactly clear on precisely what you are trying to save, but I can offer this:

If you are only trying to save images, such as .jpg, .gif, etc, then there is a MUCH easier way to do this than trying to do what you are doing.

If you are trying to save an image of the webpage displayed, then this can also be more easily accomplished.

The problem I see is that you are making the common mistake of assuming that the browser is an actual reflection of the content of a webpage. This is not the case. A browser interprets HTML and other code, and displays what the code tells the browser to display. This includes images.

I will describe how to grab images from a webpage, and let you figure out how to save the entire page on your own.
Actually, once you learn to do what I describe, you will have all you need in order to save the entire page.

First, you want to grab the content. I use "DataSocket Read" to obtain it: just input the complete URL, and a string constant to the "type" input, and it gives you back the content. You then parse out the text. I simply use "Match String" and pull the data between src=" and the next " (those two are the exact strings searched for). This gives you the URLs of the images. Then you can input the URL of a .jpg or .gif to DataSocket Read to grab the image and save it.
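For readers who want to see the parsing step in text form, here is a minimal Python sketch of the same idea (it is not LabVIEW code, and the function names and the example.com URLs are made up for illustration). It pulls out everything between src=" and the next ", then resolves each value against the page URL so that relative paths become downloadable. Note that this simple match also picks up src attributes on non-image tags such as scripts or frames.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Everything between src=" and the next ", matched case-insensitively.
SRC_PATTERN = re.compile(r'src="([^"]*)"', re.IGNORECASE)


def extract_image_urls(html, page_url):
    """Return the absolute URL of every src="..." value, without duplicates."""
    urls = []
    for raw in SRC_PATTERN.findall(html):
        absolute = urljoin(page_url, raw)  # resolve relative paths
        if absolute not in urls:
            urls.append(absolute)
    return urls


def save_image(url, path):
    """Download one URL and write the raw bytes to path (hypothetical helper)."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    sample = '<html><body><img src="logo.gif"><img src="/pics/a.jpg"></body></html>'
    for url in extract_image_urls(sample, "http://example.com/dir/page.html"):
        print(url)
```

Each returned URL can then be passed to save_image, which plays the role of the second DataSocket Read.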

I hope this answers your questions. If not, email me.

Scott_Jordan · ‎05-23-2002

The DataSocket does succeed in getting to the html for parsing, but it does so by downloading the whole page again. In my case, the html is the result of ~20 seconds of processing on the other end of the Internet. Downloading it a second time (after the user has browsed to it for viewing) is inconvenient, wasteful and basically ugly. The html is right there in the browser... how can I get at it?

Incidentally, the Web Events.vi example is very tantalizing, but only two items are returned in ParamNames by the Wait On ActiveX Event.vi: the URL, and something called PDisp, which is just a text string "Microsoft Web Browser Control". Not too helpful.

What's needed is something that does what the View Source menu selection does every day in Internet Explorer. Near as I can tell, selecting View Source does not trigger any additional Internet activity, so that suggests it's possible to get at the html source without a secondary download.

Any suggestions would be VERY much appreciated.

Thanks in advance.

wiebe@CARYA · ‎05-24-2002

Hi,

I didn't get the start of this discussion, so I don't see why you need to
reload the page in the first place. You can use flags when loading the pages
(navNoReadFromCache=4, navNoWriteToCache=8) to control caching. When neither
is selected, the document is reloaded from cache; use this in combination with
the offline property. Anyway, this is what I often use to work with the
browser ocx:

To view the source inside the WebBrowser ocx, you can do the following:

a1) Load the page;
a2) get the document property, this is a variant
a3) convert it to a MSHTML.DispHTMLDocument, using Variant To Data
a4) use the documentElement property to get the root document element
a5) use the outerHTML property to get the source code.
a6) close all the references!
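The property chain in steps a2) to a5) can be sketched in Python as follows. This is only an illustration under assumptions: it takes any object that exposes Document, documentElement and outerHTML as attributes (which is how a COM wrapper would typically present them), and the demo uses a stand-in object rather than a real browser control. The function name is made up. Step a6) has no explicit counterpart here, because how references get released depends on the wrapper in use.

```python
from types import SimpleNamespace


def get_rendered_html(browser):
    """Follow Document -> documentElement -> outerHTML on a browser-like object."""
    document = browser.Document            # a2/a3: the document object
    if document is None:
        raise RuntimeError("no document has been loaded yet")
    root = document.documentElement        # a4: the root element
    return root.outerHTML                  # a5: the html as the browser holds it


if __name__ == "__main__":
    # Stand-in for a real control, used only to demonstrate the call chain.
    fake_browser = SimpleNamespace(
        Document=SimpleNamespace(
            documentElement=SimpleNamespace(outerHTML="<html><body>hi</body></html>")
        )
    )
    print(get_rendered_html(fake_browser))
```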

Note that this is not exactly the same as 'view source'. View source is the
code before the browser has processed it; the method above returns the
resulting html.

A better way might be to:

b1) load the url data (using WinInet Easy Get URL Bin.vi, available on
the internet, on the Moore Good Ideas site, I think).
b2) put it in the browser, by putting the html data into it. To do this,
attach a string to the Navigate2 URL (Navigate is obsolete). The string can
be a URL, but also html; just put about: before it:

about:

chapter1

b3) when an update is needed, repeat step 2.

You can combine the methods:

c1) load it and get the data from step a1) to a6)
c2) put it in the browser using b2)

Note that this might not work; the about: method is not suitable for all
html data. E.g. links to relative URLs do not work, because the base
URL is different!
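One possible workaround for the relative-URL problem, sketched in Python purely as an illustration (it is not something tested against the WebBrowser control, and the function names and example.com URLs are hypothetical): rewrite every src="..." and href="..." value to an absolute URL before prefixing the html with about:. Other limits of the about: method are not addressed by this.

```python
import re
from urllib.parse import urljoin

# src="..." and href="..." attribute values, matched case-insensitively.
ATTR_PATTERN = re.compile(r'\b(src|href)="([^"]*)"', re.IGNORECASE)


def absolutize_links(html, base_url):
    """Rewrite relative src/href values so they no longer depend on the base URL."""
    def replace(match):
        name, value = match.group(1), match.group(2)
        return '{}="{}"'.format(name, urljoin(base_url, value))
    return ATTR_PATTERN.sub(replace, html)


def to_about_url(html, base_url):
    """Build the string to hand to Navigate2: about: followed by the fixed html."""
    return "about:" + absolutize_links(html, base_url)


if __name__ == "__main__":
    page = '<a href="next.html">next</a><img src="/img/logo.gif">'
    print(to_about_url(page, "http://example.com/docs/index.html"))
```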

Regards,

Wiebe.

"Scott Jordan" wrote in message
news:506500000005000000727A0000-1021771306000@exchange.ni.com...
> The DataSocket does succeed in getting to the html for parsing, but it
> does so by downloading the whole page again. In my case, the html is
> the result of ~20 seconds of processing on the other end of the
> Internet. Downloading it a second time (after the user has browsed to
> it for viewing) is inconvenient, wasteful and basically ugly. The
> html is right there in the browser... how can I get at it?
>
> Incidentally, the Web Events.vi example is very tantalizing, but only
> two items are returned in ParamNames by the Wait On ActiveX Event.vi:
> the URL, and something called PDisp, which is just a text string
> "Microsoft Web Browser Control". Not too helpful.
>
> What's needed is something that does what the View Source menu
> selection does everyday in Internet Explorer. Near as I can tell,
> selecting View Source does not trigger any additional Internet
> activity, so that suggests it's possible to get at the html source
> without a secondary download.
>
> Any suggestions would be VERY much appreciated.
>
> Thanks in advance.

Search LabVIEW like a graph!

Scott_Jordan · ‎05-24-2002

Thank you! I'll try all your suggestions.

Is this documented anywhere? I've spent hours and hours looking for guidance.

Scott_Jordan · ‎05-24-2002

Thanks to the contributions of Cyril at NI and the participants in this thread, I am now the proud possessor of code that works beautifully for extracting the html of a page displayed in a LabVIEW-embedded browser. See the .LLB attached to this post.

Thanks, all!

wiebe@CARYA · ‎05-27-2002

You're welcome,

I get my info from www.microsoft.com, Microsoft's SDKs (downloadable from
Microsoft) and MSDN (just type e.g. DispHTMLDocument, and you'll get links,
mostly .NET, but also WebBrowser2 info).

Examples from VB, VC++ and JavaScript are sometimes helpful, but need
mapping (to LV).

Regards,

Wiebe.

"Scott Jordan" wrote in message
news:5065000000050000009D7A0000-1021771306000@exchange.ni.com...
> Thank you! I'll try all your suggestions.
>
> Is this documented anywhere? I've spent hours and hours looking for
> guidance.

BadJelly · ‎10-28-2005

Hi all,

Did anyone actually work out how to directly save an object referenced within a web page? I've had a play with the ideas contained in this thread and can replicate Scott's findings where you are prompted when calling the ExecWB method. However the inputs to this call do not work as I would expect (the filepath provided is ignored, as is the 'don't prompt user' constant).

I don't understand the suggestion made previously about using datasocket reads and putting a 'jpg or gif' (or any other file type for that matter) into it.

Basically I'm looking to expand on Scott's VI and I need assistance.

regards

Andy