Software Development Methods and Tools—CSCI-3308

Assignment 1—Bash shell scripts

Due date

February 2, 2017 at 6:00 pm

Objectives

Part 1 - Bash shell scripts

Submit two separate bash shell scripts that perform these steps:

  1. Accept a file name as the first command line argument
  2. Read in the data file
  3. Calculate the average of the scores
  4. Sort the output by last name then first name
  5. Format the output as shown below

The objective of writing two scripts is to see that there are multiple correct solutions to such problems. One solution should use awk, and the other should use bash commands.
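As a rough illustration of how the two approaches differ, the sketch below splits a single record into fields, first with the bash read builtin and then with awk. The variable names (id, first, last, s1, s2, s3) are made up for this example and are not required by the assignment; the sample line is taken from the example data file.

```shell
#!/bin/bash
# Sketch only: shows how each tool splits one record into fields.
# The sample line is copied from the example data file.
line="123456789 Lee Johnson 72 85 90"

# Pure bash: read splits on whitespace into the named variables.
read -r id first last s1 s2 s3 <<< "$line"
echo "bash: $last, $first [$id] scores: $s1 $s2 $s3"

# awk: the same fields are available as $1 through $6.
echo "$line" | awk '{ print "awk:  " $3 ", " $2 " [" $1 "] scores: " $4, $5, $6 }'
```

In a full script the same read call would typically sit in a while loop that takes its input from the file named on the command line.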

An example data file is shown below.

123456789 Lee Johnson 72 85 90
999999999 Jaime Smith 90 92 91
888111818 JC Forney 100 81 97
290010111 Terry Lee 100 99 100
199144454 Tracey Camp 77 84 84
299226663 Laney Camp 70 74 71
434401929 Skyler Camp 78 81 82
928441032 Jess Forester 85 80 82
928441032 Chris Forester 97 94 89

If the data above is stored in a file named grades.txt, we would execute the two scripts with:

$ grades.sh grades.txt
$ grades-awk.sh grades.txt

Output

71 [299226663] Camp, Laney
80 [434401929] Camp, Skyler
81 [199144454] Camp, Tracey
93 [928441032] Forester, Chris
82 [928441032] Forester, Jess
92 [888111818] Forney, JC
82 [123456789] Johnson, Lee
99 [290010111] Lee, Terry
91 [999999999] Smith, Jaime

Note: we will accept both rounded and truncated averages.
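The difference between the two kinds of average can be seen with bash integer arithmetic. This is a minimal sketch that assumes exactly three non-negative integer scores; the function names avg_trunc and avg_round are hypothetical and only serve the example.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative, not part of the assignment.

# Truncated average: bash integer division discards the remainder.
avg_trunc() {
    echo $(( ($1 + $2 + $3) / 3 ))
}

# Rounded average: adding 1 (half of 3, rounded down) before dividing
# rounds to the nearest integer for non-negative scores.
avg_round() {
    echo $(( ($1 + $2 + $3 + 1) / 3 ))
}

avg_trunc 70 74 71   # 215 / 3 truncates to 71
avg_round 70 74 71   # 215 / 3 = 71.67 rounds to 72
```

The sample output above matches the truncated values, for example 71 for Laney Camp.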

Part 2 - Regular expressions

Copy the provided zip file to your home directory and decompress it using the unzip command:

curl -L https://gist.github.com/dgraham/acfdc4ffc2d6e74fd587/archive/f6f52f1d2a89d627cdee9f3ae76f23f4eefa24ce.zip > hw1.zip
unzip hw1.zip -d hw1
cd hw1

Only the regex_practice_data.txt file is required for the assignment.

Write a shell script that uses regular expressions to answer the following questions.

Hint: grep and egrep are your friends (egrep treats { } differently than grep). Be sure to check for word boundaries in your answers using \b where appropriate. Pipe answers to wc -l to get the count.
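The points in the hint can be tried on a few made-up lines before running anything on the real data. The sketch below assumes GNU grep (where \b is supported) and uses invented sample text, not lines from regex_practice_data.txt; it is not an answer to any of the questions.

```shell
#!/bin/bash
# Sketch on made-up sample lines (not from regex_practice_data.txt).
sample=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'catalog 42')

# Without \b the pattern also matches inside longer words.
printf '%s\n' "$sample" | grep -E 'cat' | wc -l

# With \b on both sides only the whole word "cat" matches.
printf '%s\n' "$sample" | grep -E '\bcat\b' | wc -l

# Interval expression with grep -E (egrep): braces are written plainly.
printf '%s\n' "$sample" | grep -E '^[a-z]{3}$' | wc -l

# The same pattern with plain grep: braces must be escaped.
printf '%s\n' "$sample" | grep '^[a-z]\{3\}$' | wc -l
```

On this sample the first pattern matches all four lines, the second matches two, and the last two each match only the line "cat".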

There are multiple correct solutions for each of the questions below.

  1. How many lines end with a number?
  2. How many lines do not start with a vowel?
  3. How many lines consist of exactly 12 letters (alphabetic characters only)?
  4. How many phone numbers are in the dataset?

     format: _ _ _-_ _ _-_ _ _ _
    
  5. How many City of Boulder phone numbers are there?

     format: 303-_ _ _-_ _ _ _
    
  6. How many lines begin with a vowel and end with a number?
  7. How many email addresses are from geocities (i.e. end with geocities.com)?
  8. How many incorrect email addresses are there (lines that contain an @ but are formatted incorrectly)? An email address has a user id and domain names can consist of letters, numbers, periods, and dashes. An email address has to have a top-level domain (e.g. something.com).

Suggestions

Requirements

  1. Scripts must be bash files named
    • grades.sh
    • grades-awk.sh
    • regex-answers.sh
  2. The first line must be #!/bin/bash
  3. The second line in each file is a comment with your name (and your partner’s name if you pair program).
  4. For all scripts, read in the name of the data file from command line arguments. We will test with additional data files that have different names!
  5. Grades scripts
    • The data files for the grades scripts will be in the same format as shown, though they may have more or fewer lines. All students have three grades in the data files.
    • Print out the data with the average, the ID in square brackets, then the last name, comma, space, first name.
    • The output also needs to be sorted, first by last name. If the last names are the same, then sort by first name. If both the last name and first name are the same, then sort by ID. All IDs are unique in the file.
    • If the program is run without a file name as the single command line argument, print out the usage statement:

       Usage: grades.sh filename
       Usage: grades-awk.sh filename
      

    You may notice that this is how most Unix commands are set up.
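The three-level sort order described in the grades-script requirements above (last name, then first name, then ID) maps naturally onto the key options of the sort command. The following is only a sketch on raw input lines copied from the example data file; sort_records is a hypothetical helper name, and a real script may need to sort at a different stage of its pipeline.

```shell
#!/bin/bash
# Sketch only. Input fields: ID, first name, last name, three scores.
# sort_records is a hypothetical helper that reads records on stdin.
sort_records() {
    # -k3,3  last name, -k2,2  first name, -k1,1n  ID compared numerically
    sort -k3,3 -k2,2 -k1,1n
}

printf '%s\n' \
    '199144454 Tracey Camp 77 84 84' \
    '928441032 Jess Forester 85 80 82' \
    '299226663 Laney Camp 70 74 71' \
    '928441032 Chris Forester 97 94 89' |
    sort_records
```

Restricting each key to a single field (3,3 rather than just 3) keeps later fields such as the scores from influencing the order.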

  6. Regex program
    • Each line of output should map to a question. There are eight questions, so you should have exactly eight lines of output, each of which is the output from calling wc -l. If you do not know how to answer one of the questions, print out this placeholder so that the rest of your answers align in the output: echo "0"
    • If the program is run without a file name as the single command line argument, print out the usage statement:

       Usage: regex-answers.sh filename
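
The usage statements required for all three scripts come down to checking the argument count before doing any work. Below is a minimal sketch of that check; check_args is a hypothetical helper name, and printing the message to standard output (rather than standard error) is an assumption based on the wording "print out".

```shell
#!/bin/bash
# Sketch only: check_args is a hypothetical helper name.
# Succeeds when exactly one argument follows the script name;
# otherwise prints the usage statement and returns a failure status.
check_args() {
    local script_name=$1
    shift
    if [ "$#" -ne 1 ]; then
        echo "Usage: $script_name filename"
        return 1
    fi
}

# Demonstration with fixed arguments.
check_args grades.sh || echo "(no file name given)"
check_args grades.sh grades.txt && echo "(one file name given)"
```

Inside a real script the call would more likely be check_args "$(basename "$0")" "$@" || exit 1, so that the script's own name appears in the message and the script stops early.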
      

Credit

To receive credit for this assignment:

Assignment material by Liz Boese.