CS377: Database Design - Data Modeling and Normalization

Activity Goals

The goals of this activity are:
  1. To model database dependencies using a formal notation
  2. To normalize database schemas at varying degrees (first, second, and third normal forms)

The Activity

Directions

Consider the activity models and answer the questions provided. First reflect on these questions on your own briefly, before discussing and comparing your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Answer each question individually from the activity on the Class Activity Questions discussion board. After class, think about the questions in the reflective prompt and respond to those individually in your notebook. Report out on areas of disagreement or items for which you and your group identified alternative approaches. Write down and report out questions you encountered along the way for group discussion.

Model 1: Determinant / Dependent Relationships

\(SSN \longrightarrow Name\)
\(SSN, PhoneNumber \longrightarrow Name, PhoneNumber\)
\(SSN \longrightarrow Name\), \(Name \longrightarrow Address\)

Questions

  1. What is the determinant attribute and the dependent attribute above? In other words, if you know the determinant, you can look up the dependent value?
  2. What determinant / dependent relationships can you find from the data schema in the ER Modeling Activity?
  3. Is a primary key a determinant or a dependent? How about records with a foreign key?
  4. Describe, in your own words, the axiom of augmentation from the second rule above.
  5. The Phone Number field should not be part of the primary key; how can we break up this relationship to create a normalized schema?
  6. Describe the axiom of transitivity, showing that Address ultimately depends upon SSN from the third relationship above.

Model 2: Normalization and Normal Forms

Unnormalized data schema from https://en.wikipedia.org/wiki/Database_normalization under a Creative Commons Attribution-ShareAlike License
There are many normal forms, and you have been following unnormalized form already because you incorporate a primary key in your tables! We will explore first, second, and third normal forms.

Questions

  1. The database contains multiple values within one column (Subject). Normalize to first normal form (1NF) by re-designing this schema to make Subject a dependency in another table, and thus make the Subject value atomic (singular valued).
  2. What is the composite primary key of this table? To establish second normal form (2NF), establish a single column primary key. Move the composite key columns into their own table that you can link via a foreign key.
  3. To establish third normal form (3NF), identify any transitive dependencies, and create separate tables for each group of related columns. This way, no non-key field values depend on one another within the same table.
  4. Sixth normal form (6NF) states that each row contains a primary key and just one additional column! What is the major drawback that precludes 6NF in practice? What benefit might be obtained if 6NF were achieved?
  5. Suppose you have a table with a person's entire postal address in a single column. How would you normalize this to 1NF?

Submission

Submit your answers to the questions using the Class Activity Questions discussion board. You may also respond to questions or comments made by others, or ask follow-up questions there. Answer any reflective prompt questions in the Reflective Journal section of your OneNote Classroom personal section.