I was poking around for RegEx examples in the Community and found this one titled "Determining Whether a String Contains Only Numbers". I posted two alternate solutions and thought I'd post one of them here.
Read the expression like this: Look from the start of the string (^) for one or more digits ([\d]+) followed, optionally, by a decimal point (\.?) and zero or more digits ([\d]*) up to the end of the string ($) OR (|) look from the start of the string (^) for a decimal point (\.) followed by one or more digits ([\d]+) up to the end of the string ($).
This will find a number formatted like .#, #.#, #. or #
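For anyone reading along without the original image, here is a minimal Python sketch of the expression as I read it from the description above. The exact pattern text is my reconstruction, and the names `NUMBER_RE` and `is_number` are made up for illustration. Python's `re` module is not the same engine LabVIEW uses, but this syntax should behave the same in both.

```python
import re

# Assumed reconstruction of the expression described above:
# digits with an optional decimal point and optional trailing digits,
# OR a leading decimal point followed by at least one digit.
NUMBER_RE = re.compile(r"^[\d]+\.?[\d]*$|^\.[\d]+$")

def is_number(text: str) -> bool:
    """Return True if text is formatted like .#, #.#, #. or #"""
    return NUMBER_RE.match(text) is not None

# Try a few inputs, including some that should be rejected.
for sample in [".5", "3.14", "7.", "42", ".", "1.2.3", "abc"]:
    print(repr(sample), is_number(sample))
```

A lone "." is rejected because both alternatives require at least one digit.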
You can modify this to look for a number anywhere in your input string by removing the carets and dollar signs. You'll have to replace each instance of "\." if you use the wrong symbol to indicate the decimal point (i.e., a comma). 😉
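Here is a rough Python sketch of that modification: the anchors are dropped so the pattern can match anywhere, and the decimal symbol is passed in rather than hard-coded. The function name `find_numbers` and its `decimal_point` parameter are hypothetical, just for illustration.

```python
import re

def find_numbers(text: str, decimal_point: str = ".") -> list[str]:
    """Return every number-like substring found in text."""
    # Escape the decimal symbol so "." is treated literally, not as "any character".
    dp = re.escape(decimal_point)
    # Same two alternatives as the anchored version, without ^ and $.
    pattern = rf"[\d]+{dp}?[\d]*|{dp}[\d]+"
    return re.findall(pattern, text)

print(find_numbers("x = 3.14, y = .5 and z = 7."))
print(find_numbers("3,14 and ,5", decimal_point=","))
```

Without the anchors the expression no longer validates the whole string; it just picks out whatever pieces look like numbers.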
Submatches are useful for extracting part of your match, but they can also be reused within the regexp itself. An example is matching beginning and ending tags when you don't know what tag you're looking for.
The angle brackets are not special characters so they'll result in a literal match. The expression "(\w*)" creates a submatch that will contain zero or more word characters. The second element should look familiar to you but the last element is special. The "</" marks the beginning of a closing tag, but you can't just look for the first instance of this because there may be nested tags. Use a backslash followed by the number one (called a backreference) to insert the first submatch. The desired text is in submatch 2, the second set of parentheses. You can test this by replacing "sarcasm" with another string.
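Since the expression itself was posted as an image, here is a small Python sketch of what the description above implies. The pattern `<(\w*)>(.*)</\1>` is my assumption about its form, and `extract_tag` is a made-up helper name.

```python
import re

# Assumed form of the expression described above:
#   <(\w*)>  opening tag, tag name captured as submatch 1
#   (.*)     the text between the tags, captured as submatch 2
#   </\1>    closing tag that must repeat submatch 1 (the backreference)
TAG_RE = re.compile(r"<(\w*)>(.*)</\1>")

def extract_tag(text: str):
    """Return (tag name, inner text) for the first matching tag pair, or None."""
    match = TAG_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(extract_tag("That was <sarcasm>a really great idea</sarcasm>, thanks."))
print(extract_tag("<a>mismatched</b>"))
```

The second call finds nothing because the closing tag does not repeat the name captured by the first submatch.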
This is a nice example that I can build from to write code that accesses webpages and extracts information from them.
I think I will exploit the power of RegExp's in the code. I may need some help along the way. At least I know where to post 😉
I did some more work on my answer to this post where the OP asked about "regular expression to match html tags". It may help with what you've proposed. I'm having some trouble with it, but I'm hoping we can work the problems out as we go forward.
I had to modify the expression for tags so I could work with multi-line strings. Like the following:
This is what I came up with to strip the tags off one level and skip the new line and tab characters. There is probably a more efficient way to do it, but this is working for me.
I left the input in '\' codes display to show the tab and newline characters.
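The actual expression was in the attached image, so what follows is only a guess at the idea, written in Python: let the "." match newlines so the tag pair can span several lines, then trim the leading and trailing tabs and newlines from what was captured. The pattern and the name `strip_one_level` are my own assumptions, not the poster's code.

```python
import re

# re.DOTALL lets "." match newline characters, so the tag pair may span lines.
# The backreference \1 still forces the closing tag to match the opening one.
LEVEL_RE = re.compile(r"<(\w+)>(.*)</\1>", re.DOTALL)

def strip_one_level(text: str) -> str:
    """Remove the outermost tag pair and surrounding tabs/newlines."""
    match = LEVEL_RE.search(text)
    if match is None:
        return text  # nothing to strip
    # strip() drops the newline and tab characters around the inner content.
    return match.group(2).strip()

nested = "<outer>\n\t<inner>text</inner>\n</outer>"
once = strip_one_level(nested)
print(repr(once))
print(repr(strip_one_level(once)))
```

Calling it repeatedly peels off one level of tags each time.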
I'm still trying to wrap my head around even a simple regular expression in LabVIEW.
For instance, let's say I want to delete any instance of <div ****whatever is in here ****>
an example would be:
or even </div>
Basically eradicating anything that contains "div" within < >.
How is that written in regex-ese?
<div R> </div>
I'm not sure I understand what you're after, but here's my interpretation. You'll have to put this inside a loop to handle multiple search and replace operations.
The key here is to use a Character Class to search for the closing ">". Read the Search String as: "Find a '<' followed by any number of anything except a '>', followed by the '>', OR look for exactly '</div>'." The Search and Replace String function will do the rest (you'll need to right-click the node and select Regular Expression).
Everything inside the square brackets constitutes a Character Class. Beginning one with a caret immediately after the opening bracket tells the Regexp engine to negate the characters in this class. The pipe | is the OR operator; it matches either the preceding or following expression.
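For comparison, here is a Python sketch of the same search string. I am assuming the literal "div" sits right after the opening bracket, since the goal was to remove only div tags; the original string was in an image. `remove_div_tags` is a hypothetical name.

```python
import re

# "<div" followed by any number of characters that are not ">", then ">",
# OR exactly "</div>". The [^>] is the negated character class.
DIV_RE = re.compile(r"<div[^>]*>|</div>")

def remove_div_tags(html: str) -> str:
    """Delete every opening and closing div tag, leaving the content."""
    # re.sub replaces all matches in one call, so no explicit loop is needed here.
    return DIV_RE.sub("", html)

sample = '<div class="lia-quilt-row"><h2>BreakPoint</h2></div>'
print(remove_div_tags(sample))
```

One caveat with this sketch: "<div[^>]*>" would also match any other tag whose name merely starts with "div", so tighten it if your input has such tags.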
I can get one or the other to work separately:
</div> works (obviously),
Is it misinterpreting the pipe as a character?
Have a look at the output string you posted.. 😉
Here's a section of the html code. It simply represents the <div> stuff that I want to remove.
As a matter of fact, I need to parse and extract some information from within the html code.
As I said... just a small portion and sample information to show the <div fields>. I do have something that works, but it is in 2 steps. I tried to get it working in a single step like the code you provided.
Obviously the code below is not intended to work, simply an example.. 😉
<a rel="nofollow" href="http://website.com" target="_blank">some website</a>
Some miscellaneous message, text and things that are not important. <a href="http://www.somewebsite.com" title="a website somewhere on the net"><strong>Title of website</strong></a>.</a>
</div><br /><br /><div style="width:200px"><div align="center"><br />
<img style="max-width: 500px; cursor: pointer;" onclick="window.open(this.src)" src="http://anotherwebsite.com/image.png" border="0" alt="" /><br />
As a matter of fact, you could do it with the source code from this thread... Unfortunately, if I try to post part of it, it quickly goes beyond the character limit for the message..
There are plenty of <div fields> to remove... 🙂
The purpose of posting was to learn to improve my regEx knowledge.. I'm trying to figure out why the "OR" '|' didn't work....
Let's try this portion of the source code from this thread...
<!-- End Global Navigation -->
<div class="lia-quilt-layout-one-column lia-quilt-forum-topic-page lia-quilt">
<div class="lia-quilt-row-header lia-quilt-row">
<div class="lia-quilt-column-common-header lia-quilt-column-single lia-quilt-column-24 lia-quilt-column">
<div class="lia-quilt-column-alley-single lia-quilt-column-alley">
<div class="lia-component-quilt-header lia-quilt-layout-header lia-quilt-header lia-quilt">
<div class="lia-quilt-row-title lia-quilt-row">
<div class="lia-quilt-column-page-title lia-quilt-column-left lia-quilt-column-10 lia-quilt-column">
<div class="lia-quilt-column-alley-left lia-quilt-column-alley">
<h2 class="lia-component-common-widget-page-title PageTitle"><a class="lia-link-navigation" id="link_0" href="/t5/BreakPoint/bd-p/BreakPoint">BreakPoint</a></h2>
</div><div class="lia-quilt-column-site-navigation lia-quilt-column-right lia-quilt-column-14 lia-quilt-column">
<div class="lia-quilt-column-alley-right lia-quilt-column-alley">
<div class="lia-component-common-widget-site-navigation SiteNavigationDropDown">
<div class="lia-menu-navigation-wrapper" id="siteNavigationDropDown">
<div class="lia-menu-navigation lia-js-click-menu">
<div class="dropdown-default-item"><a name="title" class="lia-link-navigation default-menu-option lia-js-menu-opener" rel="nofollow" id="dropDownLink" href="#">Go To</a>
<ul id="dropdownmenuitems" class="lia-menu-dropdown-items">
In the above example, all the <div class...> should disappear..