IDS 410: 4 The Enhanced E-R Model and Business Rules

Introduction

Logical database design is the process of transforming the conceptual data model (Ch3 & 4) into a logical data model. Although there are other data models, the relational data model has been emphasized for two reasons:

The relational data model is most commonly used in contemporary database applications
Some of the principles of logical database design for the relational model apply to the other logical models as well.

The objective of logical database design is to translate the conceptual design (which represents an organization's requirements for data) into a logical database design that can be implemented on a chosen database management system.

The Relational Data Model

The relational data model was first introduced in 1970 by E. F. Codd, then of IBM (Codd, 1970).

Basic Definitions

The relational data model represents data in the form of tables, and consists of the following three components:

Data Structure. Data are organized in the form of tables with rows and columns.
Data Manipulation. Powerful operations (using SQL) are used to manipulate data stored in the relations. (This forms the subject matter of Chapters 7, 8, and 10)
Data Integrity. Facilities are included to specify business rules that maintain the integrity of data when they are manipulated.

Relational Data Structure

Relation: A named, two-dimensional table of data. Each relation (or table) consists of a set of named columns and an arbitrary number of unnamed rows.

An attribute is a named column in a relation. Each row of a relation corresponds to a record that contains data (attribute) values for a single entity.

It is important to note that the term relation refers to structure and not content. In other words, any particular collection of rows and columns (records and attributes) is an instance of the relation (table) that holds it. We can express the structure of a relation by a shorthand notation, thus:

EMPLOYEE(Emp_ID,Name,Dept,Salary)

Relational Key

Primary Key: An attribute (or combination of attributes) that uniquely identifies each row in a relation. We designate a primary key by underlining the attribute name.

(The concept of primary key is related to the term "identifier" defined in Ch3.) The same attributes (or attributes) indicated as an entity's identifier in an E-R diagram may be the same attributes that compose the primary key in the relation representing that entity. There are, however, exceptions:

Associative entities do not have to have an identifier
The identifier of a weak entity forms only part of a weak entity's primary key
There may be several attributes of an entity that may serve as the associated relation's primary key

These situations are explored later.

Composite Key: A primary key that consists of more than one attribute.

Foreign Key: An attribute (possibly composite) in a relation of a database that serves as the primary key of another relation in that same database.

Properties of Relations

We have defined relations as two-dimensional tables of data. However, not all tables are relations. Relations have several properties that distinguish them from nonrelational tables:

Each relation (or table) in a database has a unique name.
An entry at the intersection of each row and column is atomic (or single-valued). There can be no multi-valued attributes in a relation
Each row is unique; no two rows in a relation are identical
Each attribute (column) within a table has a unique name
The sequence of columns (left to right) is insignificant. The columns of a relation can be interchanged without changing the meaning or use of the relation
The sequence of rows (top to bottom) is insignificant. The rows of a relation may be interchanged or stored in any sequence.

Integrity Constraints

The relational model includes several types of constraints, or business rules, whose purpose is to facilitate maintaining the accuracy and integrity of data in the database. The major types of integrity constraints are:

Domain Constraints

All the values that appear in a column of a relation must be taken from the same domain.

A domain is a set of values that may be assigned to an attribute. A domain specification usually consists of the following components: domain name, meaning, data type, size (or length), and allowable values or allowable range (if applicable).

Entity Integrity

The entity integrity rule is designed to assure that every relation has a primary key, and that the data values for that primary key are all valid. In particular, it guarantees that every primary key attribute is non-null. The entity integrity rule states that no primary key attribute (or component of a primary key attribute) may be null.

Null: A value that may be assigned to an attribute when no other value applies or when the applicable value is unknown. (In reality, a null is not a value but rather the absence of a value.)

Referential Integrity

Referential Integrity Constraint: Either each foreign key value must match a primary key value in another relationship or the foreign key value must be null. This rule maintains consistency among the rows of two (related) relations.

How do we know if a foreign key is allowed to be null?

Let's consider two relations, CUSTOMER and ORDER. We know that a customer can place an order and that an order must have been place by one and only one customer. We also know that the primary key of the CUSTOMER relation will be a foreign key in the ORDER relation.

Now, since every order has to have been placed by a customer, then the CUSTOMER foreign key in ORDER cannot be null. If the minimum cardinality had been zero, then the foreign key could be null.

Whether a foreign key can be null must be specified as the property of the foreign key attribute when the database is defined.

But what happens to order data if we choose to delete a customer who has submitted order? Three choices are possible:

Delete the associated orders (called a cascading delete), in which case we lose not only the customer but also the sales history associated with that customer
Prohibit deletion of a customer until all associated orders are first deleted (a safety check).
Place a null value in the foreign key (an exception that say although an order must have a customer_ID value when the order is created, Customer_ID can become null later if the associated customer is deleted).

We will see how these choices are implement in SQL later.

Action Assertions

There are various techniques for defining and enforcing action assertions (or rules). These are discussed later.

Well-Structured Relations

Well-Structured Relation: A relation that contains minimal redundancy and allows users to insert, modify, and delete the rows in a table without errors or inconsistencies.

Redundancies in a table may result in errors or inconsistencies (called anomalies) when a user attempts to update the data in the table.

Anomaly: An error or inconsistency that may result when a user attempts to update a table that contains redundant data. There are three types of anomalies: insertion anomaly, deletion anomaly, and modification anomaly.

Transforming EER Diagrams into Relations

Step 1: Map Regular Entities

Each regular entity type in an ER diagram is transformed into a relation.
The name given to the relation is generally the same as the entity type.
Each simple attribute of the entity type becomes an attribute of the relation.
The identifier of the entity type become the primary key of the corresponding relation. (Check to ensure that the primary key satisfies the desirableproperties of identifier outlined in Ch3.

Composite Attributes

When a regular entity type has a composite attribute, only the simple component attribute of the composite attribute are included in the new relation. For example, the Customer_Address attribute of the regular entity type CUSTOMER would be mapped (decomposed) to its simple components: Street, City, State and Zip in the corresponding relation.

Multivalued Attributes

When a regular attribute entity type contains a multivalued attribute, two new relations (rather than one) are created. The first relation contains all of the attributes of the entity type except the multivalued attribute. The second relation contains two attributes that form the primary key of the second relation. The first of these attributes is the primary key from the first relation, which becomes a foreign key in the second relation. The second is the multivalued attribute. The name of the second attribute should capture the meaning of the multivalued attribute.

(The fact that the second relation contains no nonkey attributes, or descriptors, provide an opportunity to suggest to users that new attributes be added to the relation.)

Step 2: Map Weak Entities

We recall that weak entities do not have complete identifiers, but must have an attribute called a partial identifier that permits distinguishing the various occurrences of the weak entity for each owner entity instance.

For each weak entity type, create a new relation and include all of the simple attributes (or simple components of composite attributes) as attributes of this relation.
Include the primary key of the identifying relation as a foreign key attribute in this new relation. (The primary key of the new relation is the combination of this primary key of the identifying and the partial identifier of the weak entity type.)

Step 3: Map Binary Relationships

The procedure for representing relationships depends on both the degree of the relationship (unary, binary, ternary) and the cardinalities of the relationship.

Map Binary One-to-Many Relationships

Create a relation for each of the two entity types participating in the relationship
Include the primary key attribute (or attributes) of the entity on the one-side of the relationship as a foreign key in the relation that is on the many-side of the relationship (The primary key migrates to the many side.)

Map Binary Many-to-Many Relationships

Create a relation for each of the two entity types participating in the relationship
Create a relation for the associative entity.
Include the primary key of each of the two participating entity types as foreign key attributes in the relation created for the associative entity. (These become the primary key of each of the two participating entity types, and any nonkey attribute that are associated with the M:N relationship are included in the relation that represents the associative entity.)

Map Binary One-to-One Relationships

Binary one-to-one relationships can be viewed as a special case of one-to-many relationships; there are two steps required in mapping such as a relationship

First, two relations are created, one for each of the participating entity types
Second, the primary key of one relation is included as a foreign key in the other relation

In a 1:1 relationship, the association in one direction is nearly always optional one, while the association in the other direction is mandatory one. We should include the foreign key of the entity type that has the mandatory participation in the 1:1 relationship in the relation on the optional side of the relationship . This approach will avoid the need to store null values in the foreign key attribute. Any attribute associated with the relationship itself is also included in the same relation as the foreign key. (See example below.)

Let's consider, for example, a 1:1 relationship between NURSE (mandatory side) and WARD (optional side) called In_charge. A Date_Assigned attribute will be neither an attribute of NURSE nor of WARD but of the relationship In_charge. So, we include Nurse_ID as a foreign key attribute in the WARD relation. We also include Date_Assigned in the WARD relation. This also underscores this fact that a non-M:N relationship could have an attribute.

Step 4: Map Associative Entities

The first step is to create three relations, one for each of the two participating entity types, and the third for the associative entity. We refer to the relation formed from the associative entity as the associative relation.
The second step depends on whether on the E-R diagram an identifier was assigned to the associative entity.

Identifier Not Assigned

In this case, the default primary key consists of the two primary key attributes from the other two relations.

Identifier Assigned

In this case, we have assigned a surrogate identifier to the associative entity type on the E-R diagram. We might have done this for two reasons:
1. The associative entity has a natural identifier that is familiar to end users
2. The default identifier (consisting of the identifiers for each of the participating entity types) may not uniquely identify instances of the associative entity.
Hence, we modify the process for mapping the associative entity in this case as follows:
- As before, we create a new (associative) relation to represent the associative entity
- However, the primary key for this relation is the identifier assigned on the E-R diagram (rather than the default key)
- The primary keys for the two participating entity types are then included as foreign keys in the associative relation

Step 5: Map Unary Relations

Unary relationships are also called recursive relationships.

Unary One-to-Many Relationships

We map the entity type in a unary relationship to a relation using the procedure described in Step 1.
Then a foreign key attribute is added within the same relation that references the primary key values (this foreign key must have the same domain as the primary key--though preferably a different name, and is called a recursive foreign key.)

Recursive Foreign Key: A foreign key in a relation that references the primary key values of that same relation.

Unary Many-to-Many Relationships

In this case, two relations are created: one to represent the entity type in the relationship, and the other an associative relation to represent the M:N relation itself. The primary key of the associative relation consists of two attributes. These attributes (which, preferably, should have different names) both take their values from the primary key of the other relation. Any nonkey attribute of the relationship is included in the associative relation.

Step 6: Map Ternary (and n-ary) Relationships

We recall from Ch3 that it is recommended that we convert a ternary relationship to an associative entity in order to represent participation constraints more accurately.

To map an associative entity type that links three regular entity types, we create a new associative relation. The default primary key consists of the three primary key attributes for the participating entity types (in some cases additional attributes are required to form a unique primary key). These attributes then serve as foreign keys that reference the individual primary keys of the participating entity types.

The foregoing apply to n-nary relationships.

Step 7: Map Supertype/Subtype Relationships

The relational data model (relational schema) does not yet directly support supertype/subtype relationships. There are however various strategies that database designers can use to represent those relationships with the relational data model (Chouinard, 1989). We will use the following strategy to map supertype/subtype relationships.

Create a separate relation for the supertype and for each of its subtypes.
Assign to the relation created for the supertype the attributes that are common to all members of the supertype, including the primary key
Assign to the relation for each subtype the primary key of the supertype, and only those attributes that are unique to that subtype
Assign one (or more) attributes of the supertype to function as the subtype discriminator. (See Ch4 for the discussion about discriminators.)

Introduction to Normalization

Normalization is a formal process for deciding which attributes should be grouped together in a relation. So far, we have used "common sense" to group attributes into entity types during conceptual data modeling, and we subsequently mapped the E-R diagrams developed in the conceptual phase into relations (relational schema) in the logical phase. Before proceeding with physical design however, we need a tool to validate and improve our logical design, so that it satisfies certain constraints that avoid unnecessary duplication of data.

Normalization: The process of decomposing relations with anomalies to produce smaller, well-structured relations.

Steps in Normalization

Normalization can be accomplished and understood in stages, each of which corresponds to a normal form.

Normal Form: A state of a relation that results from applying simple rules regarding functional dependencies (or relationships between attributes) to that relation.

There are six normal forms:

First normal form
Second normal form
Third normal form
Boyce/Codd normal form
Fourth normal form
Fifth normal form

Functional Dependencies and Keys

Normalization is based on the analysis of functional dependencies.

Functional Dependency: A constraint between two attributes or two sets of attributes. For any relationship R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B (Dutka and Hanson, 1989).

The functional dependency of B on A is represented by an arrow (A --> B). A attribute may be functionally dependent on two (or more) attributes.

Examples:

SSN --> Name, Address, Birthdate
VIN --> Make, Model, Color
ISBN --> Title, First_Author_Name

Determinants

Determinant: The attribute on the left-side of the arrow in a functional dependency.

Candidate Keys

Candidate Key: An attribute, or combination of attributes, that uniquely identifies a row in a relation.

A candidate key must have the following properties:

Unique identification: For every row, the value of the key must uniquely identify that row. This property implies that each nonkey attribute is functionally dependent on that key.
Nonredundancy: No attribute in the key can be deleted without destroying the property of unique identification.

The relationship between determinants and candidate keys can be summarized thus:

A candidate key is always a determinant, while a determinant may or may not be a candidate key. A candidate key is a determinant that uniquely identifies the remaining (nonkey) attributes in a relation. A determinant may be a candidate key, part of a composite candidate key, or a nonkey attribute.

The Basic Normal Forms

First Normal Form

First Normal Form (1NF): A relation that contains no multivalued attributes.

Second Normal Form

Second Normal Form (2NF): A relation in first normal form in which every nonkey attribute is fully functionally dependent on the primary key.

A relation that is in first normal form will be in second normal form if and only if any one of the following conditions applies:

The primary key consists of only one attribute.
No nonkey attribute exists in the relation (thus all the attributes in the relation are components of the primary key).
Every nonkey attribute is functionally dependent on the full set of primary key attributes.

Partial Functional Dependency: A functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.

Third Normal Form

Third Normal Form (3NF): A relation that is in second normal form and has no transitive dependencies present.

Transitive Dependency: A functional dependency between two (or more) nonkey attributes.

Merging Relations

As a part of the logical design process, normalized relations may have been created from a number of separate E-R diagram and (possibly) other user views. Some of the relations may be redundant; that is, they may refer to the same entities. If so, we should merge those relations (also called view integration). An understanding of how to merge relations is important for three reasons:

On large projects, the work of several subteams come together during logical design, so there is need to merge relations
Integrating existing databases with new information requirements often leads to the need to integrate different views
New data requirements may arise during the life cycle, so there is a need to merge any new relations with what has already been developed

View Integration Problems

Synonyms: Two (or more) attributes having different names but the same meaning, as when they describe the same characteristics of an entity.

When merging relations containing synonyms, we should obtain agreement (if possible) from users on a single, standardized name for the attribute and eliminate any other synonyms.

Often, there is a need to allow some database users to refer to the same data by different names.

Alias: An alternative name used to refer to an attribute.

Homonyms: An attribute that may have more than one meaning.

Transitive Dependencies

When two 3NF relations are merged to form a single relation, transitive dependencies may result. We recall that a transitive dependency implies that a relationship in not in 3NF. We create a 3NF relation by removing the transitive dependency.

Supertype/Subtype Relationships

These relationships may be hidden in user views or relations. The task of the analyst is to correctly identify these relationships within different user views.

5 Logical Database Design and the Relational Model