Assignment Goals

The goals of this assignment are:
1. To match text patterns using Regular Expressions

The Assignment

The purpose of this assignment is to practice matching patterns in texts using regular expressions.

Part 1: Warmup

The Python language provides a regular expression processing library called re that you can use to match and replace text in a string. Here’s how it works:

import re

# Read the entire file into a string so that it can be searched
with open("somefile.txt") as file:
    data = file.read()

offset = 0
result = []

# This will return a Match object for the first semicolon in the string
match = re.search(r";", data)

# This will print a tuple of the starting index (from 0) and the ending index (non-inclusive) of the match
print(match.span())

result.append(match.span()) # keep track of all the matching positions, relative to the prior match position

nextIndex = match.span()
nextIndex = nextIndex[1]

offset = offset + nextIndex # keep track of the absolute position of the most recent match

# This will search the rest of the string for the next instance
# ... and return None if no match is found
match = re.search(r";", data[offset:])


Create a text file called somefile.txt (or choose another name and modify your code to open that file instead). Modify the code example above to search for semicolons in your file in a loop, printing the span of each match, until offset reaches the end of the file data or until no match is found.
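One detail of the warmup is easy to miss: a span returned from a search on a slice is relative to the start of that slice, not to the original string. The sketch below illustrates a single step of that bookkeeping. The in-memory string and the names relative and absolute are assumptions made for illustration, and the loop itself is left for you to write.

```python
import re

# Hypothetical in-memory text standing in for the contents of somefile.txt
data = "a;bc;def"

first = re.search(r";", data)
print(first.span())            # span relative to the whole string: (1, 2)

offset = first.span()[1]       # absolute index just past the first match

# Searching a slice yields a span relative to the start of that slice
second = re.search(r";", data[offset:])
relative = second.span()

# Add the offset back to recover the position in the original string
absolute = (relative[0] + offset, relative[1] + offset)
print(relative, absolute)      # (2, 3) (4, 5)
```

Whether you store relative or absolute spans in your result list is a design choice; just be consistent, and remember which one you chose when you slice the original data later.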

Put this code into a function that returns a list of match spans for a given regular expression string and text data:

import re

def regexmatch(pattern, file):
    # You can append your match spans to this list
    result = []

    return result


Don’t forget to replace r";" with the variable parameter pattern when you merge into the function!

Part 2: Regular Expressions

Write a new program based on your warmup that searches for each of the following regular expressions and prints a list of all the spans matching each one. For each span that you find, print the substring (keep in mind that the ending index is non-inclusive, so the substring stops just before the ending index of each returned match).

1. Each substring ending with a semicolon: match the whole substring. Hint: here is the regular expression: r"\w*;". The r prefix before the string tells Python that this is a “raw” string, in which backslashes are taken literally instead of starting an escape sequence. Alternatively, you could use this: "\\w*;".
2. All characters inside a set of parentheses. You can use \( and \) to represent an opening and closing parenthesis in your regular expression string (don’t forget to mark the string as a raw string so you don’t have to also escape the backslash character!). You may use \w again to represent a character to match inside the parentheses.
3. Valid variable names. A valid variable name can be formed in one of two ways: first, a lowercase letter (not a number), followed by zero or more upper case letters, lower case letters, or numbers; alternatively, a variable name may be an upper case letter (not a number) followed by zero or more upper case letters or numbers (no lower case letters). So these are valid: aVariableName1, piTimesD, FALSE, but these are not: 2piR, AnotherVariable.
4. All phone numbers. Phone numbers have the following format: 1-215-555-1212; however, the leading “1-” is optional, and the separators may be either dashes or spaces.
5. All lines beginning with the word DEPOSIT:.
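A few mechanics from the list above can be checked on a small scale before you write your own patterns. The sketch below uses a made-up sample string and a made-up NOTE: keyword (neither comes from the assignment) to show that a raw string and its doubly escaped form are the same pattern, that slicing with a span reproduces the matched text, and that the re.MULTILINE flag lets ^ match at the start of every line.

```python
import re

# A raw string and a doubly escaped ordinary string denote the same pattern
print(r"\w*;" == "\\w*;")      # True

# Hypothetical sample text, not taken from the assignment
text = "total = count;\nNOTE: check this\nprint(total);\nNOTE: done"

# Slicing with a span reproduces the matched text (end index is non-inclusive)
m = re.search(r"\w*;", text)
start, end = m.span()
substring = text[start:end]
print(substring == m.group())  # True

# With re.MULTILINE, ^ matches at the start of every line, not only at the
# start of the whole string; "." stops at the end of the line
note_lines = re.findall(r"^NOTE:.*", text, re.MULTILINE)
print(note_lines)
```

Without the re.MULTILINE flag, the last search would find nothing in this sample, because the text as a whole does not begin with NOTE:.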

Part 3: Limitations of Regular Expressions

In your README, discuss why it would be impossible to write a single regular expression that ensures all parentheses are properly nested and balanced. In other words, why can no regular expression match these strings: (()), (()()(())), but not these: ((()(), ())? Note: do not attempt to write this regular expression, beyond identifying that it cannot be done!
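As a point of comparison for your discussion, here is a minimal hand-written checker that does not use a regular expression at all. The function name is_balanced is hypothetical; this is only one way to test the strings above. As you read it, consider what information the code has to remember as it scans, and how large that information can grow.

```python
def is_balanced(s):
    """Return True if the parentheses in s are balanced and properly nested."""
    depth = 0                      # number of currently open parentheses
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a close with no matching open
                return False
    return depth == 0              # every open must have been closed

# The four example strings from the paragraph above
for s in ["(())", "(()()(()))", "((()()", "())"]:
    print(s, is_balanced(s))
```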

Part 4: Replacements

The function re.sub(pattern, replacement, data) looks for the regular expression defined by pattern in the text specified by data, and returns a new string in which each match is replaced with the string replacement. Write a program or function to replace all instances of CS374 or CS 374 with Principles of Programming Languages.
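To see the mechanics of re.sub on something unrelated to the task, here is a small sketch. The sample text and the pattern are assumptions chosen only for illustration; the pattern for the course name is left to you.

```python
import re

# Hypothetical text and pattern, chosen only to show how re.sub behaves
data = "The colour grey and the color gray"

# "colou?r" matches both spellings because the "u" is optional
result = re.sub(r"colou?r", "hue", data)

print(result)   # The hue grey and the hue gray
print(data)     # the original string is unchanged
```

Note that re.sub does not modify data in place, so you need to keep the returned string.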

Submission

• Describe what you did, how you did it, what challenges you encountered, and how you solved them.
• Please answer any questions found throughout the narrative of this assignment.
• If collaboration with a buddy was permitted, did you work with a buddy on this assignment? If so, who? If not, do you certify that this submission represents your own original work?
• Please identify any and all portions of your submission that were not originally written by you (for example, code originally written by your buddy, or anything taken or adapted from a non-classroom resource). It is always OK to use your textbook and instructor notes; however, you are certifying that any portions not designated as coming from an outside person or source are your own original work.
• Approximately how many hours did it take you to finish this assignment? (I will not judge you for this at all; I am simply using it to gauge whether the assignments are too easy or too hard.)
• Your overall impression of the assignment. Did you love it, hate it, or were you neutral? One-word answers are fine, but if you have any suggestions for the future, let me know.