11-17-2023 01:09 PM
I have a string like this: token1; token2, (token3, token4), {token5; token6}.
I would like to split it into an array using commas or semicolons as delimiters, but I do not want to split the portions inside any kind of brackets (like {}, [], (), <>).
So, my output array should have four items:
[0]: token1
[1]: token2
[2]: (token3, token4)
[3]: {token5; token6}
"Scan String For Tokens" and similar approaches do the job perfectly when brackets don't matter, but I really need each bracketed portion to stay together within a single output token...
Are there any good ideas?
11-17-2023 01:41 PM
Have you considered first preprocessing the string through the "Match Pattern" function to pull out all instances of strings bounded by parentheses? It's easy enough if the order of the elements in the final array doesn't matter. It gets a bit tougher if you want to keep the same order as in the original string.
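For anyone outside LabVIEW, here is a minimal Python sketch of that two-step idea, with the standard `re` module standing in for Match Pattern. The name `split_with_preprocessing` is hypothetical, and the sketch assumes brackets are never nested. Bracketed groups are collected first and appended after the plain tokens, so the original order is not kept.

```python
import re

# One alternative per bracket type; each matches a single, non-nested group.
BRACKETED = re.compile(r"\([^()]*\)|\{[^{}]*\}|\[[^\[\]]*\]|<[^<>]*>")


def split_with_preprocessing(text: str) -> list[str]:
    """Hypothetical helper: extract bracketed groups, then split the rest."""
    # Step 1: pull out every bracketed group, in the order found.
    groups = BRACKETED.findall(text)
    # Step 2: delete those groups and split what remains on ',' or ';',
    # dropping the empty pieces left behind where groups used to be.
    remainder = BRACKETED.sub("", text)
    plain = [p.strip() for p in re.split(r"[;,]", remainder) if p.strip()]
    # Step 3: plain tokens first, then groups; the original interleaving is lost.
    return plain + groups


print(split_with_preprocessing("token1; token2, (token3, token4), {token5; token6}"))
```

For the example string the order happens to come out right, but for `"(a, b), c"` it returns `['c', '(a, b)']`, which is the ordering caveat in the reply.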
11-18-2023 11:23 AM
PCRE is a beast. No preprocessing or any other complicated stuff is needed. A not-so-simple regular expression will do the trick.
My quick solution is
"[;,](?=(?:(?:[^\(\)\{\}\[\]]*+(?<!\\)[\(\)\{\}\[\]]){2})*+[^\(\)\{\}\[\]]*+\Z)"
(without the quotation marks, of course)
Some work may be needed. You will have to test it with real data.
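To try the expression quickly outside LabVIEW, here is a Python sketch (the helper name is hypothetical). It uses the same pattern with the possessive `*+` quantifiers changed to plain `*`, because Python's `re` only accepts `*+` from 3.11 on; for this particular pattern that only changes backtracking, not what matches. The lookahead accepts a delimiter only when an even number of bracket characters follows it up to the end of the string, so it assumes brackets are balanced and not nested. Note that its character class does not include `<` and `>`.

```python
import re

# The posted pattern, with possessive "*+" relaxed to "*" so it also compiles
# on Python versions before 3.11. A ';' or ',' matches only when an even
# number of bracket characters follows it up to the end of the text, i.e. it
# is not inside a pair (assuming no backslash-escaped brackets).
SPLIT_RE = re.compile(
    r"[;,](?=(?:(?:[^\(\)\{\}\[\]]*(?<!\\)[\(\)\{\}\[\]]){2})*[^\(\)\{\}\[\]]*\Z)"
)


def split_outside_brackets(text: str) -> list[str]:
    """Hypothetical helper: split on unprotected delimiters, trim each piece."""
    return [part.strip() for part in SPLIT_RE.split(text)]


print(split_outside_brackets("token1; token2, (token3, token4), {token5; token6}"))
```

On the example string this yields the four items from the question: `token1`, `token2`, `(token3, token4)` and `{token5; token6}`.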
11-18-2023 01:53 PM - edited 11-18-2023 01:54 PM
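If performance and code readability are important (regexes are typically slow!), I would just rattle through the bytes one at a time, using the brackets to switch an "inside brackets" state on and off and splitting only on delimiters found while it is off.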
Of course, if your brackets are not well behaved (i.e., not strictly alternating between on and off), more code is needed. I'm not sure whether you also want to trim whitespace from the entries. I'm sure it needs a bit of fine-tuning.
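A text version of that byte-by-byte scan, as a Python sketch. One deliberate change from a plain on/off flag: it keeps a nesting-depth counter, so nested groups such as `[b, (c; d)]` also stay together. It assumes `([{<` open a group and `)]}>` close one, does not check that bracket types match, and trims whitespace from each entry; `split_top_level` is a hypothetical name.

```python
OPENERS = "([{<"
CLOSERS = ")]}>"
DELIMITERS = ",;"


def split_top_level(text: str) -> list[str]:
    """Split text on DELIMITERS that are not enclosed in any bracket pair."""
    parts = []
    depth = 0  # current nesting depth; 0 means outside all brackets
    start = 0  # index where the current token began
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)  # tolerate a stray closer instead of going negative
        elif ch in DELIMITERS and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())  # last token after the final delimiter
    return parts


print(split_top_level("token1; token2, (token3, token4), {token5; token6}"))
```

Treating `<` and `>` as brackets means something like `a < b` in the data would open a group, so drop them from `OPENERS` and `CLOSERS` if that can happen.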