Kenny Woo

finding new ways to procrastinate.

Regex Sorcery
Feb 4, 2014

One of my personal development goals this quarter is to learn to use the sorcery that is known as regular expressions (regex).

To quote Wikipedia:

“In theoretical computer science and formal language theory, a regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. ‘find and replace’-like operations.”

Just for fun, check out this ridiculously long (script-generated) regular expression used to validate RFC822 email addresses.

When I see regexes I think to myself, “I really hope I don’t need to debug that”. But then what would be the point of them? It seems the general consensus is that, though regexes are ugly, they cut down on code that could potentially be much uglier. On top of that, regexes are relatively portable between various languages. So what I’m hearing is regexes provide a portable, powerful, and concise way of searching text. And that sounds pretty darned awesome.

Recently, I had to look through a 500+ line structure filled with pointers in order to free its memory from the heap. Having just begun to dabble in regexes, I figured this would be a fine exercise in using grep and regexes to find all the pointers within the struct definition.

Pointer declarations take the form:

type *var_name;

After about 10 minutes, I had refined my expression to the following:

\*[[:space:]]*[a-zA-Z0-9_]+;

Literally, this means: find an (escaped) star, then zero or more whitespace characters, then one or more alphanumeric characters or underscores, followed by a semicolon.

Used with grep in bash, it looks like the following:

egrep "\*[[:space:]]*[a-zA-Z0-9_]+;" "file.c"
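
To get a feel for what that command picks up, here is a small sketch run against a made-up struct. The struct, its member names, and the two helper functions (sample_struct and find_pointers) are hypothetical stand-ins for illustration, not the real 500+ line structure. It uses grep -E, which is the equivalent of egrep.

```shell
# Hypothetical sample: a tiny struct standing in for the real one.
sample_struct() {
    cat <<'EOF'
struct record {
    int count;
    char *name;
    struct record * next;
    double values[4];
    void *user_data;
};
EOF
}

# Same pattern as above: a literal star, optional whitespace,
# an identifier, then a semicolon. Reads from standard input.
find_pointers() {
    grep -E '\*[[:space:]]*[a-zA-Z0-9_]+;'
}

# Print only the lines that look like pointer declarations.
sample_struct | find_pointers
```

With this sample, only the name, next, and user_data lines are printed; count and values are skipped because they contain no star.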

And voilà! Every pointer declaration of that form is listed, without having to go through 500+ lines of code by hand to copy and paste each one. If that’s not black magic, I don’t know what is. Time for me to go Expecto Patronum some Dementors.
