Software Development Methods and Tools—CSCI-3308

Regular expressions

Download the slides.

We wrote a regular expression to parse obscured email addresses in lecture. It demonstrates character classes, capturing groups, repetition, and alternation.
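The expression from lecture is not reproduced here. As a rough sketch of the same four features, the shell function below uses sed -E to rewrite obscured addresses into their plain form. The function name deobscure_email, the variable names, and the set of obscured spellings it accepts ("at", "[at]", "(at)", and the matching "dot" forms) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical pattern pieces; none of these names come from the lecture.
user_re='([A-Za-z0-9._+-]+)'                    # character class + repetition, group 1
at_re='( +at +| *\[at\] *| *\(at\) *| *@ *)'    # alternation, group 2
host_re='([A-Za-z0-9-]+)'                       # group 3
dot_re='( +dot +| *\[dot\] *| *\(dot\) *| *\. *)'  # alternation, group 4
tld_re='([A-Za-z]{2,})'                         # bounded repetition, group 5

# Read text on stdin and rewrite each obscured address as user@host.tld.
# Groups 1, 3 and 5 are kept; the separators in groups 2 and 4 are replaced.
deobscure_email() {
  sed -E 's/'"$user_re$at_re$host_re$dot_re$tld_re"'/\1@\3.\5/g'
}

# Example usage:
printf '%s\n' 'Write to jane.doe at example dot com today' | deobscure_email
```

This sketch only handles a single "dot" in the domain, so a host with more than two labels would be rewritten only partially.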

Awk pipeline

Today I needed to add a prefix index to a file path column in MySQL. We store 1024 bytes of the path, but MySQL has a 767-byte index length limit, so we can only index the first n bytes.

To avoid guessing at the right length, I wanted to know how long the longest path name in our Git repository is, as a representative example.

Here’s the Unix pipeline I used to answer that question:

find . | awk '{print length}' | sort -n

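You can run the same command in your home directory to see the output. Because sort -n sorts numerically in ascending order, the longest length is the last line printed. Build the pipeline up one command at a time, adding each pipe in turn, to see how the output of each command is sent as input to the next.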
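As a sketch of that stage-by-stage exploration, the script below wraps the pipeline in a function and then shows each stage's output separately. The function name longest_path_length, the head and tail additions, and the LC_ALL=C setting (so that awk counts bytes rather than characters in multibyte locales) are assumptions layered on top of the original command.

```shell
#!/bin/sh
# Hypothetical helper: print the length of the longest path under directory $1.
# Paths are measured relative to that directory, including the leading "./".
longest_path_length() {
  (cd "$1" && find . | LC_ALL=C awk '{print length}' | sort -n | tail -n 1)
}

# Stage 1: find prints one path per line.
find . | head -n 5

# Stage 2: awk replaces each path with its length.
find . | LC_ALL=C awk '{print length}' | head -n 5

# Stage 3: sort -n orders the lengths numerically; tail keeps only the largest.
longest_path_length .
```

A file name that contains a newline would be counted as two shorter lines, which is rare enough to ignore for a quick estimate like this one.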

The longest file path name in the repository is 199 bytes, and I chose to make the MySQL index 255 bytes wide.
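One way to double-check a chosen width is to list any paths that exceed it. The sketch below assumes a hypothetical helper named paths_longer_than and takes the limit as an argument. It measures paths as find prints them, so the leading "./" adds two bytes compared with a path stored without that prefix.

```shell
#!/bin/sh
# Hypothetical helper: print every path under directory $1 that is longer
# than $2 bytes. No output means every path fits within the limit.
paths_longer_than() {
  (cd "$1" && find . | LC_ALL=C awk -v limit="$2" 'length($0) > limit')
}

# Example usage: check the current directory against a 255-byte width.
paths_longer_than . 255
```

If this prints nothing for the repository, a prefix of that width covers every full path in it.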