BreakPoint


Regular Expressions Board

I was poking around for RegEx examples in the Community and found this one titled "Determining Whether a String Contains Only Numbers".  I posted two alternate solutions and thought I'd post one of them here.

 


 

Read the expression like this: Look from the start of the string (^) for one or more digits ([\d]+) followed, optionally, by a decimal point (\.?) and zero or more digits ([\d]*) up to the end of the string ($) OR (|) look from the start of the string (^) for a decimal point (\.) followed by one or more digits ([\d]+) up to the end of the string ($).

 

This will find a number formatted like .#, #.#, #. or #
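For anyone following along outside LabVIEW, here is a minimal Python sketch of the expression as described above. The pattern string is reconstructed from that description (an assumption), with [\d]+ in the second branch so that a lone "." is rejected, which is what the ".#" format implies. The helper name is_number is made up for illustration.

```python
import re

# Pattern reconstructed from the description in the post (assumption):
#   ^[\d]+\.?[\d]*$   digits, optional decimal point, optional digits
#   ^\.[\d]+$         decimal point followed by at least one digit
NUMBER_ONLY = re.compile(r"^[\d]+\.?[\d]*$|^\.[\d]+$")


def is_number(text: str) -> bool:
    """Return True if the whole string looks like .#, #.#, #. or #."""
    return NUMBER_ONLY.search(text) is not None


if __name__ == "__main__":
    for sample in [".5", "3.14", "42.", "42", ".", "1.2.3", "abc"]:
        print(repr(sample), is_number(sample))
```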

 

You can modify this to look for a number anywhere in your input string by removing the carets and dollar signs.  You'll have to replace each instance of "\." if you use the wrong symbol to indicate the decimal point (i.e., a comma).  😉
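A rough Python sketch of both modifications: the carets and dollar signs are dropped so the number can sit anywhere in the input, and the decimal symbol is a parameter. The function name number_pattern and the sample strings are invented for illustration.

```python
import re


def number_pattern(decimal_symbol: str = ".") -> "re.Pattern[str]":
    """Build the unanchored number expression for a given decimal symbol."""
    sep = re.escape(decimal_symbol)  # escapes "." so it is matched literally
    return re.compile(rf"[\d]+{sep}?[\d]*|{sep}[\d]+")


if __name__ == "__main__":
    # Decimal point
    print(number_pattern().search("Voltage: 3.14 V").group(0))
    # Decimal comma
    print(number_pattern(",").search("Spannung: 3,14 V").group(0))
```

Without the anchors, search returns the first number it finds, so a string holding several numbers needs findall or a loop.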

Jim
You're entirely bonkers. But I'll tell you a secret. All the best people are. ~ Alice
For he does not know what will happen; So who can tell him when it will occur? Eccl. 8:7

Message 11 of 149
(6,666 Views)

 


@jcarmody wrote:

Submatches are useful in extracting a part of your match, but they can also be reused in the regexp.  An example is to match beginning and ending tags when you don't know what tag you're looking for.


 

The brackets are not special characters so they'll result in a literal match.  The expression "(\w*)" creates a submatch that will contain zero or more word characters. The second element should look familiar to you but the last element is special.  The "</" marks the beginning of a closing tag, but you can't just look for the first instance of this because there may be nested tags.  Use the backslash to insert the first submatch (called a backreference) indicated by the number one.  The desired text is in submatch 2, the second set of parentheses.  You can test this by replacing "sarcasm" by another string.
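The backreference idea can be tried in Python as well. This is only a sketch: the pattern is pieced together from the description above, and the second element is assumed to be (.*). The sample sentence inside the tags is made up.

```python
import re

# <(\w*)>   opening tag, tag name captured as submatch 1
# (.*)      the text between the tags, submatch 2 (assumed form)
# </\1>     closing tag that must repeat submatch 1 (the backreference)
TAG_PAIR = re.compile(r"<(\w*)>(.*)</\1>")

if __name__ == "__main__":
    text = "<sarcasm>I just love regular expressions</sarcasm>"
    match = TAG_PAIR.search(text)
    print(match.group(1))  # tag name
    print(match.group(2))  # text between the tags

    # The tag name is not hard-coded, so nested tags with other names
    # stay inside submatch 2.
    nested = TAG_PAIR.search("<a><b>x</b></a>")
    print(nested.group(1), nested.group(2))
```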

 


 

This is a nice example that I can build on to write code that accesses webpages and extracts information from them.

I think I will exploit the power of regexps in the code.  I may need some help along the way.  At least I know where to post 😉

______________________________________________________________________
Message 12 of 149
(6,601 Views)

I did some more work on my answer to this post where the OP asked about "regular expression to match html tags".  It may help with what you've proposed.  I'm having some trouble with it, but I'm hoping we can work it out as we go forward.

 


Jim

Message 13 of 149
(6,586 Views)

I had to modify the expression for tags so I could work with multi-line strings like the following:

 

<Action>

    <parameters>

        <parameter>Something</parameter>

    </parameters>

</Action>

 

This is what I came up with to strip the tags off one level and skip the new line and tab characters. There is probably a more efficient way to do it, but this is working for me.

 


 

I left the input in '\' Codes Display to show the tab and newline characters.
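One possible way to do the same thing in Python, offered as a sketch rather than the expression from the post: the DOTALL flag lets "." cross newlines, and \s* on either side swallows the tabs and newlines around the inner content. The names ONE_LEVEL and strip_outer_tag are invented here.

```python
import re

# <(\w+)>    opening tag, name captured as submatch 1
# \s*(.*?)\s*  inner content without surrounding whitespace, submatch 2
# </\1>      closing tag with the same name
ONE_LEVEL = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def strip_outer_tag(text: str) -> str:
    """Remove one level of enclosing tags; return text unchanged if none."""
    match = ONE_LEVEL.search(text)
    return match.group(2) if match else text


if __name__ == "__main__":
    xml = (
        "<Action>\n"
        "\t<parameters>\n"
        "\t\t<parameter>Something</parameter>\n"
        "\t</parameters>\n"
        "</Action>"
    )
    level1 = strip_outer_tag(xml)
    level2 = strip_outer_tag(level1)
    level3 = strip_outer_tag(level2)
    print(repr(level1))
    print(repr(level2))
    print(repr(level3))
```

Because the inner match is lazy, a tag nested inside another tag of the same name would be cut short at the first closing tag, so this only suits data like the example above.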

 

     Rob

Message 14 of 149
(6,529 Views)

I'm still trying to wrap my head around even a simple regular expression in LabVIEW.

 

For instance, let's say I want to delete any instance of <div ****whatever is in here ****>

an example would be:

 

<div class="usenext">

 

or even </div>

 

basically eradicating anything that contains "div" within < >.

 

How is that written in regex-ese?

 

<Thanks>

<div R>

</div>

______________________________________________________________________
Message 15 of 149
(6,429 Views)

I'm not sure I understand what you're after, but here's my interpretation.  You'll have to put this inside a loop to handle multiple search and replace operations.

 


 

The key here is to use a Character Class to search for the closing ">".  Read the Search String as: "Find a '<' followed by any number of anything except a '>', followed by the '>', OR look for exactly '</div>'."  The Search and Replace String function will do the rest (you'll need to right-click the node and select Regular Expression).

Everything inside the square brackets constitutes a Character Class.  Beginning one with a caret immediately after the opening bracket tells the Regexp engine to negate the characters in this class.  The pipe | is the OR operator; it matches either the preceding or following expression.
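The same search string can be tried in Python as a quick check. This is a sketch with a made-up input; the pattern assumes the opening branch starts with "<div", and Python's re.sub stands in for the Search and Replace String function. It replaces every match in one call, so no loop is needed in this version.

```python
import re

# <div[^>]*>   "<div", then anything except ">", then the closing ">"
# |            OR
# </div>       exactly "</div>"
DIV_TAGS = re.compile(r"<div[^>]*>|</div>")

if __name__ == "__main__":
    html = '<div class="usenext">Hello</div> <p>kept</p>'
    print(DIV_TAGS.sub("", html))
```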

Jim

Message 16 of 149
(6,447 Views)

Thanks Jim,

 

I can get one or the other to work separately:

 

<div[^>]*>  works

and

</div>  works (obviously),

 

but

 

<div[^>]*>|</div>

 

or

 

(<div[^>]*>)|(</div>)

 

didn't work.

 

Is it misinterpreting the pipe as a character?

Have a look at the output string you posted..  😉

______________________________________________________________________
Message 17 of 149
(6,435 Views)

Could you post some of the text you're working with?

Jim

Message 18 of 149
(6,426 Views)

Here's a section of the html code.  It simply represents the <div> stuff that I want to remove.

As a matter of fact, I need to parse and extract some information from within the html code.

 

As I said... just a small portion and sample information to show the <div fields>.   I do have something that works, but it is in 2 steps.  I tried to get it working in a single step as the code you provided.

Obviously the code below is not intended to work, simply an example.. 😉

 

 

============================================

 

 

    <div id="post_message">
      

      <div class="usenext">
<div class="title">
<a rel="nofollow" href="http://website.com" target="_blank">some website</a>
</div>
Some miscellaneous message, text and things that are not important. <a href="http://www.somewebsite.com" title="a website somewhere on the net"><strong>Title of website</strong></a>.</a>
</div><br /><br /><div style="width:200px"><div align="center"><br />
<img style="max-width: 500px; cursor: pointer;" onclick="window.open(this.src)"  src="http://anotherwebsite.com/image.png" border="0" alt="" /><br />

</div><br />
<br />
<br />
</div>

______________________________________________________________________
Message 19 of 149
(6,414 Views)

As a matter of fact, you could do it for this thread's own source code...  Unfortunately, if I try to post part of it, it quickly exceeds the character limit for a message..

 

There are plenty of <div fields> to remove...  🙂

 

The purpose of posting was to learn to improve my regEx knowledge..  I'm trying to figure out why the "OR"  '|' didn't work....

 

Thanks.


R

 

Let's try this portion of the source code from this thread...

 

<!-- End Global Navigation -->

                    
    
    <div class="MinimumWidthContainer">
        <div class="min-width-wrapper">
            <div class="min-width">        
                
                        <div class="lia-content">
                            
                            
        
       <div class="lia-quilt-layout-one-column lia-quilt-forum-topic-page lia-quilt">
    <div class="lia-quilt-row-header lia-quilt-row">

        <div class="lia-quilt-column-common-header lia-quilt-column-single lia-quilt-column-24 lia-quilt-column">            
            <div class="lia-quilt-column-alley-single lia-quilt-column-alley">
    <div class="lia-component-quilt-header lia-quilt-layout-header lia-quilt-header lia-quilt">
    <div class="lia-quilt-row-title lia-quilt-row">
        <div class="lia-quilt-column-page-title lia-quilt-column-left lia-quilt-column-10 lia-quilt-column">            
            <div class="lia-quilt-column-alley-left lia-quilt-column-alley">
    <h2 class="lia-component-common-widget-page-title PageTitle"><a class="lia-link-navigation" id="link_0" href="/t5/BreakPoint/bd-p/BreakPoint">BreakPoint</a></h2>
</div>            
        </div><div class="lia-quilt-column-site-navigation lia-quilt-column-right lia-quilt-column-14 lia-quilt-column">            
            <div class="lia-quilt-column-alley-right lia-quilt-column-alley">
    <div class="lia-component-common-widget-site-navigation SiteNavigationDropDown">

    <div class="lia-menu-navigation-wrapper" id="siteNavigationDropDown">    
    <div class="lia-menu-navigation lia-js-click-menu">
        <div class="dropdown-default-item"><a name="title" class="lia-link-navigation default-menu-option lia-js-menu-opener" rel="nofollow" id="dropDownLink" href="#">Go To</a>
            <div class="dropdown-positioning">
                <div class="dropdown-positioning-static">
                    
    <ul id="dropdownmenuitems" class="lia-menu-dropdown-items">

 

In the above example, all the <div class...> should disappear..
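As a Python sketch of that goal, here is a variant that avoids the pipe altogether by making the slash optional. It does not explain why the alternation failed; it is just another way to write the search. The sample is a shortened piece of the source above, and the \b is an addition so that a tag such as <divider> would be left alone.

```python
import re

# </?div   "<", an optional "/", then "div"
# \b       word boundary, so "<divider>" is not treated as a div tag
# [^>]*>   anything except ">" up to the closing ">"
DIV_TAGS = re.compile(r"</?div\b[^>]*>")

SAMPLE = (
    '<div class="lia-quilt-row-title lia-quilt-row">\n'
    '    <h2 class="PageTitle"><a href="/t5/BreakPoint/bd-p/BreakPoint">'
    "BreakPoint</a></h2>\n"
    "</div>"
)

if __name__ == "__main__":
    # subn returns the new string and the number of replacements made
    cleaned, count = DIV_TAGS.subn("", SAMPLE)
    print(count)
    print(cleaned)
```

The whitespace that surrounded the removed tags is left behind, so a second pass may be wanted if blank lines matter.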

______________________________________________________________________
Message 20 of 149
(6,412 Views)