FYI, I am still slowly reading this book:
Today I reached a page that gives an example of how to use a regular expression to split the string shown below:
"Acme, Inc","Excalibur Pte Ltd","Blackrock, Pvt"
into the collection of substrings shown below:
Acme, Inc
Excalibur Pte Ltd
Blackrock, Pvt
The regex pattern that the book gives is:
var rgx = new Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");
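To see the pattern in action, here is a small console sketch I put together (it is not from the book) that runs the pattern over the sample string with Regex.Matches and reads each field from capture group 2. The class name BookPatternDemo and the helper Split are my own hypothetical names. Note that the input uses straight double quotes; the pattern does not treat curly quotes as quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BookPatternDemo
{
    // The pattern from the book, unchanged. In a C# string literal, \" is just a double quote.
    static readonly Regex Rgx =
        new Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");

    // Hypothetical helper: collects capture group 2 (the field contents) from every match.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Rgx.Matches(line))
        {
            fields.Add(m.Groups[2].Value);
        }
        return fields;
    }

    public static void Main()
    {
        string input = "\"Acme, Inc\",\"Excalibur Pte Ltd\",\"Blackrock, Pvt\"";
        foreach (string field in Split(input))
        {
            Console.WriteLine(field);
        }
        // Prints:
        // Acme, Inc
        // Excalibur Pte Ltd
        // Blackrock, Pvt
    }
}
```

The useful value is group 2, not the whole match, because each whole match also includes the leading comma and the surrounding quotes.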
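When I saw this pattern, I was totally shell-shocked by how little of it I could understand. So here is my attempt to make sense of it. First, I'll break the pattern into its groups to make it easier to digest. Keep in mind that each \" is only C#'s escape for a double quote inside a string literal; the regex engine sees a plain ".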
(?:^|,)
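According to the documentation, (?:subPattern) is a non-capturing group. This part matches either the start of the string or a comma, but it does not create a numbered group. (Without RegexOptions.Multiline, ^ matches only at the start of the whole string, not at the start of every line.)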
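One way to convince myself that (?:...) really adds no group is to ask the regex how many groups it defines. This is a minimal sketch using Regex.GetGroupNumbers, which includes group 0 (the whole match); the pattern "(?:^|,)x" and the class NonCapturingDemo are hypothetical examples, not from the book.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Number of groups a pattern defines, including group 0 (the whole match).
    public static int GroupCount(string pattern) => new Regex(pattern).GetGroupNumbers().Length;

    public static void Main()
    {
        // (?:^|,) is non-capturing, so only group 0 exists.
        Console.WriteLine(GroupCount("(?:^|,)x")); // 1
        // (^|,) is capturing, so it adds group 1.
        Console.WriteLine(GroupCount("(^|,)x"));   // 2

        // The non-capturing group still matches: at the start of the string or at a comma.
        foreach (Match m in Regex.Matches("x,x", "(?:^|,)x"))
        {
            Console.WriteLine($"'{m.Value}' at index {m.Index}");
        }
        // 'x' at index 0
        // ',x' at index 1
    }
}
```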
(?=[^\"]|(\")?)
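(?=subPattern) is a zero-width positive lookahead assertion: it checks what comes next without consuming any characters, so the text it looks at is still available to the rest of the pattern. Here it peeks at the first character of the field. If that character is not a quote, the first alternative [^\"] succeeds. If it is a quote, the second alternative (\")? captures it into group 1. Group 1 therefore records whether the field starts with a quote, which the next part of the pattern relies on. (The ? also lets this alternative succeed when nothing follows, for example an empty field at the end of the string.)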
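Here is a small sketch that isolates just this lookahead, anchored with ^, to show that it consumes nothing yet can still fill group 1. It relies on .NET keeping captures made inside a positive lookahead. The class LookaheadDemo and the helper StartsQuoted are hypothetical names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The lookahead from the book's pattern, anchored at the start of the string.
    static readonly Regex Peek = new Regex("^(?=[^\"]|(\")?)");

    // Hypothetical helper: true when group 1 captured an opening quote.
    public static bool StartsQuoted(string field) => Peek.Match(field).Groups[1].Success;

    public static void Main()
    {
        Match quoted = Peek.Match("\"Acme, Inc\"");
        // Zero-width: the match has length 0, yet group 1 holds the quote.
        Console.WriteLine($"{quoted.Length} {quoted.Groups[1].Success} {quoted.Groups[1].Value}"); // 0 True "

        Match plain = Peek.Match("Acme");
        // The first alternative [^"] matched, so group 1 did not participate.
        Console.WriteLine($"{plain.Length} {plain.Groups[1].Success}"); // 0 False
    }
}
```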
\"?((?(1)[^\"]*|[^,\"]*))\"?
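This part consumes an optional opening quote, captures the field's contents into group 2, and then consumes an optional closing quote. The (?(1)yes|no) construct is a conditional: if group 1 captured a quote (a quoted field), it matches [^\"]*, any run of characters except quotes, so commas inside the quotes are kept; otherwise it matches [^,\"]*, which stops at the next comma. The quotes themselves stay outside group 2.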
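To see both branches of the conditional, here is a sketch that runs the book's pattern over a mixed line with quoted and unquoted fields and reports which branch each match took, based on whether group 1 succeeded. The input "abc,\"d,e\",f" and the names ConditionalDemo and Describe are my own hypothetical examples.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // The book's pattern, unchanged.
    static readonly Regex Rgx =
        new Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");

    // Hypothetical helper: labels each field with the branch of (?(1)...) that matched it.
    public static List<string> Describe(string line)
    {
        var result = new List<string>();
        foreach (Match m in Rgx.Matches(line))
        {
            // Group 1 succeeded only if the field started with a quote.
            string branch = m.Groups[1].Success ? "quoted" : "unquoted";
            result.Add($"{branch}: {m.Groups[2].Value}");
        }
        return result;
    }

    public static void Main()
    {
        foreach (string s in Describe("abc,\"d,e\",f"))
        {
            Console.WriteLine(s);
        }
        // unquoted: abc
        // quoted: d,e
        // unquoted: f
    }
}
```

The comma inside "d,e" survives only because the quoted branch [^\"]* does not stop at commas.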
(?=,|$)
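Finally, another zero-width positive lookahead (not a non-capturing group) checks that the field is followed by a comma or the end of the string. Because the comma is not consumed, the next match can start with it through (?:^|,).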
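A quick way to see that this final lookahead leaves the comma alone is to print the whole match (group 0) instead of group 2. In this sketch, with the hypothetical input "a,b,c" and hypothetical names TrailingLookaheadDemo and WholeMatches, each match ends just before a comma and the next match begins with it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrailingLookaheadDemo
{
    // The book's pattern, unchanged.
    static readonly Regex Rgx =
        new Regex("(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");

    // Hypothetical helper: returns each whole match (group 0) to show where matches start and end.
    public static List<string> WholeMatches(string line)
    {
        var result = new List<string>();
        foreach (Match m in Rgx.Matches(line))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string s in WholeMatches("a,b,c"))
        {
            Console.WriteLine($"[{s}]");
        }
        // [a]
        // [,b]
        // [,c]
    }
}
```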
That’s all, I hope it helps!