Lab 2—Regular expressions
Objectives
- Use regular expressions with common Unix commands
- Practice using useful Unix commands
- diff
- wc
- cut
- sed
- awk
- Practice creating and running bash shell scripts
- Practice using pipes
Exercises
For each step please record the commands (and options) that you used to complete the task. At the end you will receive credit for the lab by showing your TA these commands.
Step 1 - Download practice files
For today’s lab we will be using the following data files:
- fruitsOld.txt
- fruitsNew.txt
- testPasswd.txt
- grades.txt
- leetSpeak.txt
- regex_practice_data.txt
Copy the provided zip file to your home directory and decompress it using the unzip command:
curl -L https://gist.github.com/dgraham/acfdc4ffc2d6e74fd587/archive/f6f52f1d2a89d627cdee9f3ae76f23f4eefa24ce.zip > lab2.zip
unzip lab2.zip -d lab2
cd lab2
Check to make sure each of the above files was correctly unzipped into the lab2 directory.
Step 2 - Use the diff command
- Which “fruits” were added to or removed from fruitsOld.txt to produce fruitsNew.txt?
- What do the > and < characters mean at the beginning of each line in the output of diff?
- Try using the -c option. What does that do?
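Before running diff on the lab files, it may help to try it on two tiny files whose differences you already know. The sketch below uses made-up files (old.txt and new.txt are hypothetical names, not part of the lab data) in a temporary directory. Compare the markers in the output against the file contents to work out what they mean.

```shell
# Work in a throwaway directory so nothing in lab2 is touched.
tmp=$(mktemp -d)
printf 'red\ngreen\nblue\n' > "$tmp/old.txt"
printf 'red\nblue\nyellow\n' > "$tmp/new.txt"

# diff exits with status 1 when the files differ, so record the status
# instead of treating it as a failure.
diff_status=0
diff_out=$(diff "$tmp/old.txt" "$tmp/new.txt") || diff_status=$?
echo "$diff_out"
echo "exit status: $diff_status"

# The same comparison with the -c option.
diff -c "$tmp/old.txt" "$tmp/new.txt" || true

rm -r "$tmp"
```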
Step 3 - Use the wc command
- Find the number of lines in the testPasswd.txt file
- Find the number of characters in the testPasswd.txt file
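The counting options of wc can be tried on a small made-up file first. In this sketch, sample.txt is a hypothetical file. Note that -c counts bytes while -m counts characters; the two agree unless the file contains multibyte characters.

```shell
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmp/sample.txt"

# Reading from standard input keeps the file name out of the output.
line_count=$(wc -l < "$tmp/sample.txt")
byte_count=$(wc -c < "$tmp/sample.txt")
char_count=$(wc -m < "$tmp/sample.txt")

echo "lines: $line_count"
echo "bytes: $byte_count"
echo "characters: $char_count"

rm -r "$tmp"
```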
Step 4 - Use the cut command
- Print a list of the usernames from the testPasswd.txt file (print the first column only)
- Print out only the LN column and HW1 grade column from the grades.txt file
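cut selects fields with -f and takes the field separator from -d (a tab is the default). Here is a minimal sketch on made-up data, assuming a colon-separated layout like /etc/passwd. The file users.txt is hypothetical, and the delimiter and column numbers in the lab files may differ, so inspect those files before choosing options.

```shell
tmp=$(mktemp -d)
printf 'ann:x:1001:50:Ann A:/home/ann:/bin/bash\nbob:x:1002:60:Bob B:/home/bob:/bin/sh\n' > "$tmp/users.txt"

# -d sets the delimiter, -f picks the field(s) to keep.
first_col=$(cut -d: -f1 "$tmp/users.txt")
echo "$first_col"

# A comma-separated list selects several fields; they are printed in
# the order they appear in the file, joined by the same delimiter.
two_cols=$(cut -d: -f1,7 "$tmp/users.txt")
echo "$two_cols"

rm -r "$tmp"
```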
Step 5 - Practice using pipes
- Use cut, sort, and uniq to print the groups that users belong to in the testPasswd.txt file. (Each group is a number. Each group should be printed on its own line, with no duplicates.)
- Redirect the output of the above pipeline into a file in your home directory
- Use grep and cut to filter the testPasswd.txt file so that it displays only the usernames that start with m, w, or s, along with their home directories (sixth column)
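The general shape of these pipelines can be sketched on made-up data. The file items.txt and its name:number layout are assumptions for illustration only and do not match the lab files.

```shell
tmp=$(mktemp -d)
printf 'pear:3\napple:1\npear:3\nfig:2\napple:1\n' > "$tmp/items.txt"

# Each stage reads the previous stage's output: keep field 2, sort it
# numerically, then drop adjacent duplicates (uniq needs sorted input).
# The > at the end sends the final output to a file instead of the screen.
cut -d: -f2 "$tmp/items.txt" | sort -n | uniq > "$tmp/unique.txt"
unique_vals=$(cat "$tmp/unique.txt")
echo "$unique_vals"

# grep filters whole lines first; cut then trims the lines that survive.
# [af] matches one character that is either a or f.
filtered=$(grep '^[af]' "$tmp/items.txt" | cut -d: -f1)
echo "$filtered"

rm -r "$tmp"
```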
Step 6 - Use the sed command
- Using sed and regular expressions, try playing around with the leetSpeak.txt file.
- Remove all the letters
- Remove all the numbers
- Replace all numbers with an underscore (_)
- Create a script that pipes together multiple sed commands to replace each number with its matching character. (This can be done without piping. How? For this problem, please use pipes.)
- It is possible that you may want to reuse this script on another file. How can you make it so that the script does not have to change each time you want to run it on a different file?
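As a warm-up, here is a minimal sketch of chained sed substitutions on made-up text. The function name mark_up and the file words.txt are hypothetical, and the substitutions are deliberately different from the ones the exercise asks for.

```shell
# Hypothetical helper: reads text on standard input and applies three
# substitutions in sequence, one sed process per stage.
mark_up() {
  sed 's/a/A/g' | sed 's/e/E/g' | sed 's/[0-9]/#/g'
}

tmp=$(mktemp -d)
printf 'banana 42\ncherry 7\n' > "$tmp/words.txt"

# s/old/new/g replaces every match on a line; without g only the first
# match on each line is replaced.
marked=$(mark_up < "$tmp/words.txt")
echo "$marked"

rm -r "$tmp"
```

Because the helper reads standard input, the same pipeline can be pointed at any file without editing the sed commands.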
Step 7 - Use the awk command
- Using the grades.txt file, print the first and last name of each student, then calculate and print their current grade as a percentage (assuming equal weights for each assignment).
- Using the grades.txt file, calculate and print the class average for the lab1 assignment
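Both tasks follow two common awk patterns: compute something per line, and accumulate a total that is reported in an END block. A minimal sketch on made-up data follows. The file scores.txt, its header row, its whitespace-separated layout, and the 10-point maximum per quiz are all assumptions; check the actual columns and point values in grades.txt.

```shell
tmp=$(mktemp -d)
printf 'name q1 q2\nann 8 10\nbob 6 9\n' > "$tmp/scores.txt"

# NR > 1 skips the header row; $1, $2, $3 are whitespace-separated fields.
# Assumption: each quiz is out of 10 points, so 20 points are possible.
percentages=$(awk 'NR > 1 { printf "%s %.1f\n", $1, ($2 + $3) / 20 * 100 }' "$tmp/scores.txt")
echo "$percentages"

# Accumulate one column while reading, then report once in the END block.
q1_average=$(awk 'NR > 1 { sum += $2; n++ } END { print sum / n }' "$tmp/scores.txt")
echo "$q1_average"

rm -r "$tmp"
```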
Step 8 - More practice with regular expressions
For the following problems, use grep or egrep with the regex_practice_data.txt file.
- How many phone numbers are in the dataset?
- How many city of Boulder phone numbers (e.g. starting with 303-441-…)?
- How many email addresses?
- How many email addresses are from government domains (e.g. ‘.gov’)?
- How many email addresses are in ‘first.last’ name format AND involve someone whose first name starts with a letter in the first half of the alphabet?
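When counting, keep in mind that a line can contain more than one match. The sketch below uses made-up data and a pattern for HH:MM times (both are assumptions for illustration, unrelated to the lab dataset) to show the difference between counting matching lines and counting individual matches. egrep behaves like grep -E.

```shell
tmp=$(mktemp -d)
printf 'meet at 09:30 and 14:05\nno times here\nlunch 12:00\n' > "$tmp/notes.txt"

# Two digits, a colon, two digits. {2} is extended-regex syntax, hence -E.
pattern='[0-9]{2}:[0-9]{2}'

# -c counts the lines that contain at least one match.
matching_lines=$(grep -E -c "$pattern" "$tmp/notes.txt")
echo "matching lines: $matching_lines"

# -o prints every match on its own line, so wc -l counts the matches.
total_matches=$(grep -E -o "$pattern" "$tmp/notes.txt" | wc -l)
echo "total matches: $total_matches"

rm -r "$tmp"
```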
Credit
To get credit for this lab exercise, show the TA your code and run your program.
Lab material by Liz Boese.