BreakPoint


Regular Expressions Board

I need to parse a large (relatively speaking) text file for different information stored within different header areas..

I do not know if a regular expression is the better way to go, but I am trying to develop an elegant solution.

 

Since I cannot post the actual file, I will post an accurate mockup of the file, well... the area of interest... and take it from there..

 

Picture a large text file that somewhere hidden within it is the following:

(the parameter names have been changed to protect the innocent)

 

......

lots of stuff

.....

 

:30    :00000000
:31    :D00E0000
:03    :00000081
:03    :00000003
##END_OF_ABC_PARM_NORM_TABLE##

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_67_cdef
CIRCUS                       :67
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-66
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000

##START_OF_OTHER_PARM_TABLE##
Offset    RegValue
:03    :22221000
:03    :22221001
:10   

 

 

 

 

In the above data, I want to extract the value for "SomeThreshold".  Yes... notice all the \s in there..
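A minimal Python sketch of this lookup, assuming each parameter sits at the start of its own line and is separated from its value by spaces or tabs and a colon, as in the mockup. The function name `find_value` and the shortened sample text are illustrative only.

```python
import re

# Shortened copy of the mockup above (illustrative data only).
SAMPLE = """\
##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_67_cdef
CIRCUS                       :67
SomeThreshold             :-66
AnotherThreshold         :20480
Mode       :None
"""

def find_value(text, name):
    """Return the first value that follows `name` and a colon, or None."""
    # ^ with MULTILINE anchors the name at the start of a line.
    # [ \t]* absorbs the variable padding on either side of the colon.
    pattern = re.compile(
        r"^" + re.escape(name) + r"[ \t]*:[ \t]*(\S+)", re.MULTILINE
    )
    match = pattern.search(text)
    return match.group(1) if match else None

print(find_value(SAMPLE, "SomeThreshold"))
```

Because the search stops at the first hit, this only works as-is when the parameter name appears once in the text being searched.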

 

In the same file there is also data in a linear format..

 

 

:C17    :0    :0    :22025    :23225
:C18    :0    :0    :22075    :23275
##END_OF_STUFFABOVE##

##START_OF_MORE_CONF_PARAM##
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5
:blabla        :20    :0.36    :0.04    :2745 :13736    :-81    :3    :27    :-23    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:moreblabla        :20    :0.36    :0.04    :10000 :13736    :-81    :3    :27    :-18    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:evenmorebla       :20    :0.36    :0.04    :10000 :15680    :-75    :3    :24.5    :-20.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:theblacontinues        :20    :0.36    :0.04    :10000 :12032    :-70    :3    :24    :-21    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:willnotend        :20    :0.36    :0.04    :10000 :15626    :-68    :3    :22.5    :-22.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:moreblabla    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
:lotsofblabla    :20    :0.36    :0.04    :10000 :12995    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0   
##END_OF_OF_MORE_CONF_PARAM##

##START_OF_YETANOTHER_PARAM##
Mommy

,,,,,,,,,   

 

 

 

In this one, I want to extract the data for line "thisistheone" and the values for "Important" and "SomethingImportant".
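One way to read such a row without a regular expression is to pair the column names with the row's fields. The sketch below assumes that no name or value contains whitespace and that every value is prefixed with a colon; `row_to_dict` is a hypothetical helper, and the header and row are truncated to the first ten columns for brevity.

```python
# First ten columns of the mockup header and the matching row fields.
HEADER = ("Mode AThreshold Thresh1 Thresh2 Limit1 Limit2 "
          "Important ModeAvail AnotherMode SomethingImportant")
ROW = (":thisistheone    :20    :0.36    :0.04    :10000 :13695"
       "    :-59    :3    :19.5    :-25.5")

def row_to_dict(header_line, row_line):
    """Map each column name to the value found in the same position."""
    names = header_line.split()
    # Each field looks like ":value"; drop the leading colon.
    values = [field.lstrip(":") for field in row_line.split()]
    return dict(zip(names, values))

record = row_to_dict(HEADER, ROW)
print(record["Mode"], record["Important"], record["SomethingImportant"])
```

Looking values up by name means the result does not depend on where a column sits within the line.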

 

 

One thing I should also mention is that the above examples of text are repeated numerous times with a slightly different header name.

 

 

Message 41 of 150
(10,918 Views)

@Ray.R wrote:

I need to parse a large (relatively speaking) text file for different information stored within different header areas..

I do not know if a regular expression is the better way to go, but I am trying to develop an elegant solution.

 

Since I cannot post the actual file, I will post an accurate mockup of the file, well... the area of interest... and take it from there..

 


 

 



Try this:

 

Regular Expression Parser.png



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 42 of 150
(10,910 Views)

I am looking forward to trying out his elegant solution using a regular expression.

You guys are wizards at this.  I am anxious to get 1% as good as you guys..

 

It will have to wait until morning...  (Unless....)

 

I'll share how it goes.  🙂

 

Thanks!

Message 43 of 150
(10,904 Views)

 

Sorry, I missed an important detail.  It's a little bit more complicated.

Let's deal with the first portion..

 

Similar headers are repeated, as well as parameter names.  The task is to look for the appropriate header and MyName, then look for the corresponding "SomeThreshold" within the matching area.

 

So for instance, within the following header:

##SEP_OF_ME_IS_A_CONFIG_TABLE##

 

whose name is:

MyName                :ab50_34_cdef


look for:

SomeThreshold             :-55

lots of stuff

.....

 

:30    :00000000
:31    :E00E0000
:03    :00000081
:03    :00000003
##END_OF_ABC_PARM_NORM_TABLE##

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab40_67_cdef
CIRCUS                       :67
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-66
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000


##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_34_cdef
CIRCUS                       :34
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-55
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000

 

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab60_78_cdef
CIRCUS                       :78
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-22
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000

 

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab70_92_cdef
CIRCUS                       :92
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-44
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000

 

 

......

 

some more stuff

Message 44 of 150
(10,891 Views)

The other portion is also repeated with slightly different header names..

 

 

In this example, I want to find values located within the following header

##START_OF_MORE_CONF_PARAM3##

 

For parameters:
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5

 

located on this line:
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0 

 

I will find out this morning if there are other constraints, such as changing fields, which would mean that the location within the line is not reliable and the index has to come from the name.  By now you get the picture of the complexity and the reason I'd like to go to a regular expression.  Even if it is a partial solution, it will be much more elegant than what I started to implement 😉
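A rough Python sketch of picking one row out of one numbered section, assuming '#' never appears inside the data so that `[^#]*` cannot run past the end of the section. The function name `find_row` and the two-column sample are made up for illustration.

```python
import re

# Cut-down stand-in for the real tables (two columns only).
SAMPLE = """\
##START_OF_MORE_CONF_PARAM2##
Mode    Important
:thisistheone    :-88
##END_OF_OF_MORE_CONF_PARAM##

##START_OF_MORE_CONF_PARAM3##
Mode    Important
:thisistheone    :-59
:moreblabla    :-62
##END_OF_OF_MORE_CONF_PARAM##
"""

def find_row(text, header, row_name):
    """Return the raw line for `row_name` inside the `header` section."""
    # Capture everything between the named header and the next '#'.
    section = re.search(r"##" + re.escape(header) + r"##\s*([^#]*)", text)
    if section is None:
        return None
    # Within that chunk, grab the whole line that starts with ":row_name".
    row = re.search(
        r"^:" + re.escape(row_name) + r"\b.*$", section.group(1), re.MULTILINE
    )
    return row.group(0) if row else None

print(find_row(SAMPLE, "START_OF_MORE_CONF_PARAM3", "thisistheone"))
```

The same row name exists in every section, so narrowing the text to one section first is what keeps the second search from returning the wrong block.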

 

 

 

(Can't believe I exceeded 10,000 characters...  splitting the post into 2)

 

 I had to split the post into 3 sections..  The bottom 2 sections should be a single text.  I decided to attach a text file to make things easier..

 

 

 

Message 45 of 150
(10,887 Views)

(took out all the formatting)  ---- DARN!!  I have to split it again!!

 

:C17    :0    :0    :22025    :23225
:C18    :0    :0    :22075    :23275
##END_OF_STUFFABOVE##

##START_OF_MORE_CONF_PARAM1##
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5
:blabla        :20    :0.36    :0.04    :2745 :13736    :-81    :3    :27    :-23    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla        :20    :0.36    :0.04    :10000 :13736    :-81    :3    :27    :-18    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:evenmorebla       :20    :0.36    :0.04    :10000 :15680    :-75    :3    :24.5    :-20.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:theblacontinues        :20    :0.36    :0.04    :10000 :12032    :-70    :3    :24    :-21    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:willnotend        :20    :0.36    :0.04    :10000 :15626    :-68    :3    :22.5    :-22.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-99   :3    :19.5    :-10.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:lotsofblabla    :20    :0.36    :0.04    :10000 :12995    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
##END_OF_OF_MORE_CONF_PARAM##

##START_OF_MORE_CONF_PARAM2##
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5
:blabla        :20    :0.36    :0.04    :2745 :13736    :-81    :3    :27    :-23    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla        :20    :0.36    :0.04    :10000 :13736    :-81    :3    :27    :-18    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:evenmorebla       :20    :0.36    :0.04    :10000 :15680    :-75    :3    :24.5    :-20.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:theblacontinues        :20    :0.36    :0.04    :10000 :12032    :-70    :3    :24    :-21    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:willnotend        :20    :0.36    :0.04    :10000 :15626    :-68    :3    :22.5    :-22.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-88    :3    :19.5    :-15.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:lotsofblabla    :20    :0.36    :0.04    :10000 :12995    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
##END_OF_OF_MORE_CONF_PARAM##

Message 46 of 150
(10,885 Views)


##START_OF_MORE_CONF_PARAM3##
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5
:blabla        :20    :0.36    :0.04    :2745 :13736    :-81    :3    :27    :-23    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla        :20    :0.36    :0.04    :10000 :13736    :-81    :3    :27    :-18    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:evenmorebla       :20    :0.36    :0.04    :10000 :15680    :-75    :3    :24.5    :-20.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:theblacontinues        :20    :0.36    :0.04    :10000 :12032    :-70    :3    :24    :-21    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:willnotend        :20    :0.36    :0.04    :10000 :15626    :-68    :3    :22.5    :-22.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:lotsofblabla    :20    :0.36    :0.04    :10000 :12995    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
##END_OF_OF_MORE_CONF_PARAM##


##START_OF_MORE_CONF_PARAM4##
Mode        AThreshold    Thresh1    Thresh2    Limit1 Limit2 Important ModeAvail AnotherMode SomethingImportant Offset1 Value1 Offset2 Value2 Offset3 Value3 Offset4 Value4 Offset5 Value5
:blabla        :20    :0.36    :0.04    :2745 :13736    :-81    :3    :27    :-23    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla        :20    :0.36    :0.04    :10000 :13736    :-81    :3    :27    :-18    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:evenmorebla       :20    :0.36    :0.04    :10000 :15680    :-75    :3    :24.5    :-20.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:theblacontinues        :20    :0.36    :0.04    :10000 :12032    :-70    :3    :24    :-21    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:willnotend        :20    :0.36    :0.04    :10000 :15626    :-68    :3    :22.5    :-22.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-77    :3    :19.5    :-35.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:moreblabla    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
:lotsofblabla    :20    :0.36    :0.04    :10000 :12995    :-59    :3    :19.5    :-25.5    :0    :0    :0    :0    :0    :0    :0    :0    :0    :0  
##END_OF_OF_MORE_CONF_PARAM##


##START_OF_YETANOTHER_PARAM##
Mommy

,,,,,,,,,

 

Message 47 of 150
(10,884 Views)

As expected... I found out that the relative position within the header may change.

In the case of "Important" & "SomethingImportant", their relative locations may also change, which is why the position needs to be determined by the name.

 

I am already taking care of another delta which is the fact that the name may also change, but that's beyond the scope of this exercise..  😮

 

Now I am really curious to see the power of regex and compare it with what I am developing (probably a Rube candidate).

Message 48 of 150
(10,878 Views)

You won't be able to do this with a regular expression only. You will need some state logic to parse the data. Your first requirement is fairly easy: search for the desired #NAME you want and then get the value for MyName. I generally like to bracket my searches so that I don't get a value for MyName from some other section. For instance, if you know the section will start with '#' and the pound sign will not appear in any of the data, you can use that to bracket the area of the data that you are interested in. Pull out the chunk of data between the section name and the next '#'. Now you can search within this chunk for the various parameters of interest. If you need to validate the particular data set using MyName, that will be trivial.
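The bracketing idea can be sketched in Python as a small line-by-line state machine. This is only an illustration under the stated assumption that every section header has the form ##NAME## on its own line; `split_sections` and `threshold_for` are hypothetical names, and the sample is a shortened copy of the posted mockup.

```python
import re

HEADER_RE = re.compile(r"^##(.+?)##\s*$")

SAMPLE = """\
:03    :00000003
##END_OF_ABC_PARM_NORM_TABLE##

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab40_67_cdef
SomeThreshold             :-66

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_34_cdef
SomeThreshold             :-55
somesetup                  :ABCD   :63.0933     :1     :0.5000
"""

def split_sections(text):
    """Return a list of (header, lines) pairs in file order."""
    sections = []
    current = None  # state: line list of the open section, or None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = []  # a header line opens a new section
            sections.append((match.group(1), current))
        elif current is not None and line.strip():
            current.append(line)  # non-blank line belongs to open section
    return sections

def threshold_for(text, header, my_name, key="SomeThreshold"):
    """Find `key` in the `header` section whose MyName equals `my_name`."""
    for name, lines in split_sections(text):
        if name != header:
            continue
        fields = {}
        for line in lines:
            k, _, v = line.partition(":")  # split at the first colon only
            fields[k.strip()] = v.strip()
        if fields.get("MyName") == my_name:  # validate the data set
            return fields.get(key)
    return None

print(threshold_for(SAMPLE, "SEP_OF_ME_IS_A_CONFIG_TABLE", "ab50_34_cdef"))
```

Keeping the sections in a list rather than a dictionary preserves repeated headers, which matters here because the same header name occurs several times.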

 

Using a similar approach to the above, you can extract the header names and their positions from the "Mode" line. Using this information, you can construct the regular expression I gave you with the appropriate positions for the data you want.
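The regular expression referred to here was posted as an image and is not reproduced in the text, so the pattern below is an assumed stand-in rather than the original. It is a Python sketch of the general idea: look up the column's position in the "Mode" line, then build a pattern that skips that many fields. `build_pattern` is a hypothetical name and the data is truncated to ten columns.

```python
import re

HEADER = ("Mode AThreshold Thresh1 Thresh2 Limit1 Limit2 "
          "Important ModeAvail AnotherMode SomethingImportant")
ROWS = """\
:somemore    :20    :0.36    :0.04    :10000 :11650    :-62    :3    :21    :-24
:thisistheone    :20    :0.36    :0.04    :10000 :13695    :-59    :3    :19.5    :-25.5
"""

def build_pattern(header_line, row_name, column):
    """Compile a regex capturing `column` on the row named `row_name`."""
    index = header_line.split().index(column)  # ValueError if not present
    if index < 1:
        raise ValueError("column 0 holds the row name itself")
    skip = index - 1  # fields between the row name and the wanted one
    # [ \t]+ rather than \s+ so the match cannot spill onto the next line.
    return re.compile(
        r"^:" + re.escape(row_name)
        + r"(?:[ \t]+:\S+){" + str(skip) + r"}[ \t]+:(\S+)",
        re.MULTILINE,
    )

for column in ("Important", "SomethingImportant"):
    match = build_pattern(HEADER, "thisistheone", column).search(ROWS)
    print(column, match.group(1))
```

Since the pattern is rebuilt from the header each time, a column that moves to a different position is still found by its name.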

 

I don't have the time at the moment to actually write up the code. If my explanation is not clear enough, let me know. I will see if I can actually put together an example.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

Message 49 of 150
(10,871 Views)

Thanks Mark,

 

I am working on a solution which sounds similar to what you are describing.  I am doing it in 2 sections.  The first part would be to match any character between

SEP_OF_ME_IS_A_CONFIG_TABLE
and

:ab50_34_cdef

 

from:

 

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_34_cdef
CIRCUS                       :34
MyFirmwareID             :1
MyMinThresh           :10
MyFreq            :3520833
SomeThreshold             :-55
AnotherThreshold         :20480
Mode       :None
somesetup                  :ABCD   :63.0933     :1     :0.5000

 

 

I would then use your code example to find "SomeThreshold" and its value (-55).
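For comparison, both steps can also be folded into one pattern. The following Python sketch assumes that MyName is the first line after the header, as in the mockup, and that '#' never occurs inside a block; `some_threshold` is an illustrative name.

```python
import re

SAMPLE = """\
##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab40_67_cdef
CIRCUS                       :67
SomeThreshold             :-66

##SEP_OF_ME_IS_A_CONFIG_TABLE##
MyName                :ab50_34_cdef
CIRCUS                       :34
SomeThreshold             :-55
AnotherThreshold         :20480
"""

def some_threshold(text, my_name):
    """Return SomeThreshold as an int for the block named `my_name`."""
    pattern = re.compile(
        r"##SEP_OF_ME_IS_A_CONFIG_TABLE##\s*"        # section header
        r"MyName[ \t]*:" + re.escape(my_name) + r"[ \t]*\r?\n"  # wanted name
        r"[^#]*?"                                    # stay inside this block
        r"^SomeThreshold[ \t]*:[ \t]*(-?\d+)",       # captured value
        re.MULTILINE,
    )
    match = pattern.search(text)
    return int(match.group(1)) if match else None

print(some_threshold(SAMPLE, "ab50_34_cdef"))
```

Because `[^#]*?` cannot cross a '#', a block that lacks SomeThreshold yields None instead of silently picking up the value from the next block.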

 

However, I am still experimenting with the regex to do what I described above.  Hopefully I will figure it out before you get to post how that's done.

 

I haven't started working on the 2nd portion, yet.

Message 50 of 150
(10,865 Views)