IDS 410

6 Physical Database Design and Performance

Charles E. Oyibo


Introduction

Our previous discussions about conceptual data modeling and the logical database design phases of the database development process (including E-R/EER notations and the relational data model and normalization) do not explain how data will be processed or stored. The purpose of physical database design is to translate the logical description of data into the technical specifications for storing and retrieving data. The goal is to create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability.

Physical Database Design Process

The information needed for physical file and database design includes the results of the earlier phases: the normalized relations, estimates of data volume and frequency of use, definitions of each attribute, and expectations for response time, data integrity, backup, recovery, and security.

Physical database design requires the following critical decisions:

  1. Choosing the storage format (called data type) for each attribute from the logical data model. The format is chosen to minimize storage space and maximize data integrity.
  2. Grouping attributes from the logical data model into physical records.
  3. Arranging similarly structured records in secondary memory so that individual and groups of records (called file organizations) can be stored, retrieved, and updated rapidly. Considerations should also be given to protecting data and recovering data after errors are found.
  4. Selecting structures (called indexes and database architectures) for storing and connecting files to make retrieving related data more efficient.
  5. Preparing strategies for handling queries against the database that will optimize performance and take advantage of the file organization and indexes that we have specified.

Data Volume and Usage Analysis

An easy way to show the statistics about data volumes and usage (access frequency) is to add notation to the EER diagram that represents the final set of normalized relations from logical database design. The volume and frequency statistics are generated during the systems analysis phase of the systems development process, when systems analysts are studying current and proposed data processing and business activities.

Fortunately, precise numbers are not necessary. What is crucial is the relative size of the numbers, which will suggest where the greatest attention needs to be given in order to achieve the best possible performance.

Designing Fields

Field: The smallest unit of named application data recognized by system software, such as a programming language or database management system.

A field corresponds to a simple attribute from the logical data model; each component of a composite attribute is therefore represented by a separate field.

Choosing Data Types

Data Type: A detailed coding scheme recognized by system software, such as a DBMS, for representing organizational data.

Selecting a data type involves four objectives that will have different relative importance for different applications.

  1. Minimize storage space
  2. Represent all possible values
  3. Improve data integrity
  4. Support all data manipulations
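The storage-space objective can be made concrete with Python's `struct` module, which reports the byte width of common fixed-size formats. This is only an illustration of the trade-off (the formats stand in for SMALLINT/INT/BIGINT-style column types, not any particular DBMS's):

```python
import struct

# Storage required for a quantity-on-hand field under different choices:
# a 2-byte, 4-byte, and 8-byte signed integer.
sizes = {fmt: struct.calcsize(fmt) for fmt in ("h", "i", "q")}
print(sizes)  # {'h': 2, 'i': 4, 'q': 8}

# A 2-byte integer holds values only up to 32,767; if quantities can exceed
# that, the smaller type fails the "represent all possible values" objective
# and a wider (costlier) type must be chosen.
assert struct.pack("h", 32767)  # largest value a 2-byte signed int holds
```

The point is that objectives 1 and 2 pull in opposite directions: the narrowest type that still represents every possible value is usually the right choice.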

Coding and Compression Techniques

A field with a limited number of possible values can be translated into a code that requires less space. By creating a code or translation table for the product Finish field (with possible values Birch, Maple, and Oak), each Finish field value can be replaced by a code, a cross-reference to the look-up table, similar to a foreign key. (Note that the code table would not appear in the conceptual or logical model; the code table is a physical construct to achieve data processing performance improvements, not a set of data with business value.)
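The Finish example above can be sketched with SQLite standing in for the DBMS (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Code (look-up) table: each Finish value is stored once, and PRODUCT
# stores only a one-character code that cross-references it, much like
# a foreign key.
cur.execute("CREATE TABLE finish_code (code CHAR(1) PRIMARY KEY, finish TEXT)")
cur.executemany("INSERT INTO finish_code VALUES (?, ?)",
                [("B", "Birch"), ("M", "Maple"), ("O", "Oak")])
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, finish CHAR(1))")
cur.execute("INSERT INTO product VALUES (1, 'B')")

# Decoding joins the short code back to the full value.
cur.execute("""SELECT p.product_id, f.finish
               FROM product p JOIN finish_code f ON p.finish = f.code""")
print(cur.fetchall())  # [(1, 'Birch')]
```

Each PRODUCT row now stores one character instead of a full string, at the cost of a join when the full value is needed.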

A form of a code table is used by data compression techniques, such as file zip routines. Also related to data compression techniques are encryption techniques.

Controlling Data Integrity

For many DBMSs, data integrity controls (i.e., controls on the possible values a field can assume) can be built into the physical structure of the fields and enforced by the DBMS on those fields. The data type enforces one type of data integrity control, since it limits the type of data and the length of a field value. Other integrity controls that a DBMS might support include a default value, a range control, a null value control, and referential integrity.
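Three of these controls can be sketched in one table definition, again using SQLite as a stand-in (the PRODUCT table and its columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table showing a null value control, a range control, and a default value.
cur.execute("""CREATE TABLE product (
    product_id     INTEGER PRIMARY KEY,
    description    TEXT NOT NULL,                    -- null value control
    standard_price REAL CHECK (standard_price > 0),  -- range control
    finish         TEXT DEFAULT 'Oak'                -- default value
)""")

cur.execute("INSERT INTO product (product_id, description) VALUES (1, 'End Table')")
cur.execute("SELECT finish FROM product WHERE product_id = 1")
print(cur.fetchone())  # ('Oak',) -- the default value was applied

try:
    cur.execute("INSERT INTO product VALUES (2, 'Chair', -10.0, 'Maple')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the range control rejects the negative price
```

Because the DBMS enforces these rules, every application that touches the table gets the same protection without repeating the logic in each program.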

Handling Missing Data

When a field may be null, simply entering no value may be sufficient. But if we needed to generate, say, a report of customers by zip code, how would we handle customers with unknown zip codes? Two options are covered by the data integrity controls mentioned above: use a default value or do not permit missing (null) values. Other possible methods for handling missing data are (Babad & Hoffer, 1984):

  1. Substitute an estimate for the missing value; however, such estimates must be marked so that users know that they are not actual values
  2. Track missing data so that special reports and other system elements cause people to quickly resolve unknown values. This can be done by setting up a trigger in the database definition.

Trigger: A routine that will automatically execute when some event occurs or time period elapses.

  3. Perform sensitivity testing so that missing data are ignored unless knowing a value might significantly change results. This is the most complex of the methods mentioned and hence requires the most sophisticated programming. Such routines for handling missing data may be written in application programs. (Many modern DBMSs now support more sophisticated programming capabilities, such as case expressions, user-defined functions, and triggers, so that such logic can be available in the database for all users without application-specific programming.)
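The tracking approach can be sketched with a SQLite trigger (the table, trigger, and log names are illustrative): each insert with a missing zip code is recorded in a log table that a follow-up report can draw on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, zip TEXT)")
cur.execute("CREATE TABLE missing_zip_log (customer_id INTEGER)")

# Trigger that tracks rows inserted with a missing zip code, so that a
# special report can prompt someone to resolve the unknown values.
cur.execute("""CREATE TRIGGER log_missing_zip AFTER INSERT ON customer
               WHEN NEW.zip IS NULL
               BEGIN
                   INSERT INTO missing_zip_log VALUES (NEW.customer_id);
               END""")

cur.execute("INSERT INTO customer VALUES (1, '60607')")
cur.execute("INSERT INTO customer VALUES (2, NULL)")
cur.execute("SELECT customer_id FROM missing_zip_log")
print(cur.fetchall())  # [(2,)] -- only the customer with the unknown zip
```

The trigger fires automatically on every insert, so no application program can forget to record the missing value.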

Designing Physical Records and Denormalization

In a logical data model, we group into a relation those attributes that are determined by the same primary key. In contrast, a

Physical record is a group of fields stored in adjacent memory locations and retrieved and written together as a unit by a DBMS.

The design of a physical record involves choosing the sequencing of fields into adjacent storage locations to achieve two goals: efficient use of secondary storage and data processing speed.

The efficient use of secondary storage is influenced by both the size of the physical record and the structure of secondary storage. Operating systems read data from hard disks in units called pages (not physical records) ...

Page: The amount of data read or written by an operating system in one secondary storage (disk) input or output operation. For I/O with a magnetic tape, the equivalent term is record block.

If page length is not an integer multiple of the physical record length, wasted space may occur at the end of the page.

Blocking Factor: The number of physical records per page.

If storage space is scarce and physical records cannot span pages, creating multiple physical records from one logical relation will minimize wasted storage space. Some DBMSs will also block multiple physical records into a data block; in this case, the DBMS manages the data block, whereas the OS manages a page.
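The blocking-factor arithmetic is a simple floor division; the page and record sizes below are assumed for illustration.

```python
# With a fixed page size and records that cannot span pages, the blocking
# factor is the floor of page size over record size, and the remainder is
# the wasted space at the end of each page.
PAGE_SIZE = 4096     # bytes per page (assumed)
RECORD_SIZE = 300    # bytes per physical record (assumed)

blocking_factor = PAGE_SIZE // RECORD_SIZE   # records that fit per page
wasted_per_page = PAGE_SIZE % RECORD_SIZE    # bytes unused per page
print(blocking_factor, wasted_per_page)  # 13 196
```

Here 13 records fit per page and 196 bytes are wasted; trimming the record to 256 bytes would raise the blocking factor to 16 with no waste, which is the kind of adjustment the text describes.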

Denormalization

Our preceding discussion focuses on efficient use of storage space. In most cases, the second goal of physical record design--efficient data processing--dominates the design process. Efficient processing of data depends on how close together related data are.

Often all the attributes that appear within a relation are not used together, and data from different relations are needed together to answer a query or produce a report. Thus although normalized relations solve data maintenance anomalies and minimize redundancies, normalized relations, if implemented one for one as physical records, may not yield efficient data processing.

Denormalization: The process of transforming normalized relations into unnormalized physical record specifications.

The following are some common denormalization opportunities:

  1. Two entities with a one-to-one relationship: If the access frequency between the two entity types is high, it may result in better processing efficiency to combine the two relations into one record definition.
  2. A many-to-many relationship (associative entity) with nonkey attributes: Rather than joining three files to extract data from the two basic entities in the relationship, processing efficiency might be achieved by combining attributes from one of the entities into the record representing the many-to-many relationship, thus avoiding one join operation in many data access modules. (This would be most advantageous if joining occurs often.)
  3. Reference data: Reference data exists in an entity on the one-side of a one-to-many relationship, and this entity participates in no other database relationships. We should consider merging the two entities in this situation into one record definition when there are few instances of the entity on the many-side for each entity instance on the one-side.

The opportunities listed above deal with combining tables to avoid doing joins. However, denormalization can also be used to create more tables by partitioning a relation into multiple tables.

Horizontal Partitioning

Horizontal Partitioning: Distributing the rows of a table into several separate files. (Breaking a relation into multiple record specifications by placing different rows in different records based upon common column values.) Example: breaking a customer relation into four regional customer files based on the value of a Region field.
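A minimal sketch of the regional-customer example, with SQLite standing in for the DBMS: rows of one logical CUSTOMER relation are routed into separate physical tables by the value of the Region field. (The table names and routing scheme are illustrative; real DBMSs such as Oracle do this declaratively, as the list below notes.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One physical table per region value -- four partitions of one relation.
for region in ("north", "south", "east", "west"):
    cur.execute(f"CREATE TABLE customer_{region} (customer_id INTEGER, name TEXT)")

# Each row is placed in the partition matching its Region value.
customers = [(1, "Alpha", "north"), (2, "Beta", "south"), (3, "Gamma", "north")]
for cid, name, region in customers:
    cur.execute(f"INSERT INTO customer_{region} VALUES (?, ?)", (cid, name))

cur.execute("SELECT COUNT(*) FROM customer_north")
print(cur.fetchone())  # (2,) -- a regional query touches only its partition
```

A query restricted to one region now scans only that partition's rows, which is the efficiency benefit the text describes.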

We should note that horizontal partitioning is very similar to creating supertype/subtype relationships.

The Oracle DBMS supports several forms of horizontal partitioning ... :

  1. Key range partitioning
  2. Hash partitioning
  3. Composite partitioning

Vertical Partitioning

Vertical Partitioning: Distributing the columns of a table into several separate physical records (repeating the primary key in each of the records). Example: breaking a part relation by placing the part number along with accounting-related part data in one record specification, the part number along with engineering-related part data in another record specification, and the part number along with sales-related part data in yet another record specification.

Advantages of Partitioning

  1. Efficiency: Data used together are stored close to one another and separate from data not used together
  2. Local optimization: Each partition of data can be stored to optimize performance for its own use
  3. Security: Data not relevant to one group of users can be segregated from data they are allowed to use
  4. Recovery and uptime: Smaller files will take less time to recover, and other files are still accessible if one file is damaged, so the effect of damage is isolated
  5. Load balancing: Files can be allocated to different storage areas, which minimizes contention for access to the same storage area or even allows for parallel access to different areas.

Disadvantages of Partitioning

  1. Inconsistent access speed: Different partitions may yield different access speeds, thus confusing users. Also, when data must be combined across partitions, users may have to deal with significantly slower response times than in a nonpartitioned approach.
  2. Complexity: Partitioning is usually not transparent to programmers, who will have to write more complex programs when combining data across partitions.
  3. Extra space and update time: Data may be duplicated across the partitions, taking extra storage space compared to storing all the data in normalized fields. Updates which affect data in multiple partitions can take more time than if one file were used.

(Denormalization by) Data Replication

In data replication, the same data are purposely stored in multiple places in the database in order to achieve increased processing (data retrieval speeds). (Improved speed is worthwhile only if the replicated data is frequently accessed in each of the relations it is replicated in, and if the costs for extra secondary storage and data maintenance are not prohibitive.)

Designing Physical Files

Physical File: A named portion of secondary memory (such as a magnetic tape or hard disk) allocated for the purpose of storing physical records.

An important structure for physical storage space in Oracle is a tablespace.

A tablespace is a named set of disk storage elements in which physical files from one or more database tables may be stored.

Oracle has responsibility for managing the storage of data inside a tablespace, whereas the operating system has many responsibilities for managing a tablespace as a whole as it would any operating system file (e.g. handling file level security, allocating space, and responding to disk read and write errors).

A tablespace may be spread over several extents (where an extent is a contiguous section of disk storage space). When a tablespace needs to enlarge to hold more data, it is assigned another extent ... Managing tablespaces, or physical database files, is a significant job of the database administrator in an Oracle environment.

Pointer

All files are organized by using two basic constructs to link one piece of data with another piece of data: sequential storage and pointers.

Pointer: A field of data that can be used to locate a related field or record of data. Generally, a pointer contains the address, or location, of the associated data.

(We define pointers here because a knowledge of what pointers are is necessary for understanding file organization, discussed below.)

File Organization

File Organization: A technique for physically arranging the records of a file on secondary storage devices.

With modern DBMSs, we do not have to design file organizations, though we may be allowed to select an organization and its parameters for a table or physical file. In choosing a file organization for a particular file in a database, we should consider seven important factors:

  1. Fast data retrieval
  2. High throughput for processing data input and maintenance transactions
  3. Efficient use of storage space
  4. Protection from failures or data loss
  5. Minimizing need for reorganization
  6. Accommodating growth
  7. Security from unauthorized use

Often these objectives conflict, and we must select a file organization that provides a reasonable balance among the criteria within the resources available. We now consider three families of basic file organizations.

Sequential File Organization

Sequential File Organization: The storage of records in a file in sequence according to a primary key value.

Because of their inflexibility (a program must normally scan a file from the beginning until the desired record is located), sequential files are not used in a database, but may be used for files that backup data from a database.

Indexed File Organizations

Indexed File Organization: The storage of records either sequentially or nonsequentially with an index that allows software to locate individual records.

An index is a table or other data structure used to determine the location of rows in a file that satisfy some condition.

Each index entry matches a key value with one or more records, and can point to unique records or to potentially more than one record. An index that allows each entry to point to more than one record is called a secondary key.

Secondary key: One field or combination of fields for which more than one record may have the same combination of values. Also called nonunique key.

When the terms primary and secondary index are used, there are four types of indexes:

  1. Unique primary index (UPI)
  2. Nonunique primary index (NUPI)
  3. Unique secondary index (USI)
  4. Nonunique secondary index (NUSI)

Indexes can be built on top of indexes, since indexes (being files themselves) can become so large as to benefit from indexing. Each index entry, then, has a key value and a pointer to another index or to a data record.

One of the most powerful capabilities of indexed file organizations is the ability to create multiple indexes (such as in a library, where there are indexes on title, author, subject, etc., pointing to the same set of books). Also, multiple indexes can be manipulated together (e.g., "find products with a Birch finish AND that cost under $500") to increase gains in processing efficiency. Logical AND, OR, and NOT operations can be handled by simply manipulating the results of index scans, thus avoiding the more costly access to data records that do not meet all the qualifications of the query.
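The index-manipulation idea can be sketched with plain Python sets; the index contents and record addresses below are made up for illustration.

```python
# Each secondary index maps a key value to the set of record addresses
# holding that value; AND/OR on query conditions become set
# intersection/union on the indexes, with no data records touched.
finish_index = {"Birch": {101, 104, 107}, "Oak": {102, 105}}
price_under_500_index = {True: {101, 102, 107}, False: {104, 105}}

# "Birch finish AND costs under $500" -- resolved entirely in the indexes.
matches = finish_index["Birch"] & price_under_500_index[True]
print(sorted(matches))  # [101, 107]
```

Only the two surviving records need to be fetched from disk; every record eliminated by the intersection was rejected without an I/O operation.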

The general structure of a hierarchical index is called a tree (albeit an upside-down one, with the root node at the top and leaf nodes at the bottom). The performance of a tree of index entries is influenced by the properties of the tree structure used. A common type of tree used by DBMSs to store indexes is a balanced tree, or B-tree.

Another type of index is called a bitmap index.

Bitmap index: A table of bits in which each row represents one of the distinct values of a key and each column is a bit that, when on, indicates that the record at that bit's column position has the associated field value.

A bitmap is ideal for attributes that have only a few possible values. In proper situations, because bit manipulation and searching is so fast, the speed of query processing with a bitmap can be 10 times faster than a conventional tree index.
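A minimal bitmap-index sketch for the Finish attribute, using Python integers as the bit vectors (the record values are made up for illustration):

```python
# One row of bits per distinct value; bit position i is on when record i
# has that value.
records = ["Birch", "Oak", "Birch", "Maple", "Oak"]  # field value per record

bitmaps = {}
for i, value in enumerate(records):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)

# Record positions with a Birch finish, recovered from the bitmap alone.
birch = bitmaps["Birch"]
positions = [i for i in range(len(records)) if birch & (1 << i)]
print(positions)  # [0, 2]

# Boolean queries are single machine-word operations on the bitmaps:
birch_or_oak = bitmaps["Birch"] | bitmaps["Oak"]
print(bin(birch_or_oak))  # 0b10111
```

Because ORing two bitmaps is one machine operation regardless of how many records the bits cover, this is why bitmaps excel for attributes with few distinct values.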

Another important type of index, especially in data warehousing and other decision support applications, is a join index.

A join index is an index on columns from two or more tables that come from the same domain of values.

Simply stated, a join says find rows in the same or different tables that have values that match some criterion.

Hashed File Organizations

Hashed file organization: A storage system in which the address of each record is determined using a hashing algorithm.

Hashing algorithm: A routine that converts a primary key value into a relative record number (or relative file address).

A typical hashing algorithm uses the technique of dividing each primary key value by a suitable prime number and then using the remainder of the division as the relative storage location. (Another technique must then be used to resolve the duplicates, or overflow, that can occur when two or more keys hash to the same address, known as "hash clash".)
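The division-remainder technique can be sketched in a few lines; the prime and key values are illustrative (real files use much larger primes), and overflow is resolved here by chaining colliding records in a list.

```python
# Relative address = primary key value mod a prime number of buckets.
PRIME = 7
buckets = [[] for _ in range(PRIME)]

def store(key):
    buckets[key % PRIME].append(key)  # chain overflow records in the bucket

for key in (1001, 1008, 1234, 4567):
    store(key)

print(1001 % PRIME, 1008 % PRIME)  # 0 0 -- these two keys hash clash
print(buckets[0])  # [1001, 1008] -- both chained at the same relative address
```

Retrieval recomputes the remainder to find the bucket directly, then scans the (usually short) chain, which is why hashing gives near one-access lookup by primary key.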

Hash index table: A file organization that uses hashing to map a key into a location in an index, where there is a pointer to the actual data record matching the hash key.

Summary on File Organizations

The DBMS will handle the management of the file organization. What is important for the DB designer is to understand the properties of the different file organizations so that we can choose the most appropriate one for the type of database processing required in the database and application we are designing.

Clustering Files

Some DBMSs allow adjacent secondary memory space to contain rows from several tables. In this case, a physical file does not contain records with identical structures. For example, in Oracle, rows from one, two, or more related tables that are often joined together can be stored in the same disk area. A cluster is defined by the table and the column or columns by which the tables are usually joined.

Clustering reduces the time to access related records compared to the normal allocation of different files to different areas of a disk. Time is reduced since related records are closer to each other than if the records were stored in separate files in separate areas of the disk.

Designing Controls for Files

One additional aspect of a database file about which we may have design options is the types of controls we can use to protect the file from destruction or contamination, or to reconstruct the file if it is damaged. The proprietary storage format provided by the DBMS offers a basic level of control, but we may require additional security controls on fields, files (tables), or databases. Backup procedures provide a copy of a file and of the transactions that have changed the file. When a file is damaged, the file copy or current file along with the log of transactions is used to recover the file to an uncontaminated state. Further, in terms of security, the most effective method is to encrypt the contents of the file so that only programs with access to the decryption routine will be able to see the file contents.

Using and Selecting Indexes

Indexes on a file can be created for either a primary or secondary key or both. The index is itself a table with two columns: the key and the address of the record or records that contain that key value. For a primary key, there will be only one entry in the index for each key value.

Creating a Unique Key Index

CREATE UNIQUE INDEX CUSTINDEX ON CUSTOMER(CUSTOMER_ID)

Creating a Secondary (Nonunique) Key Index

CREATE INDEX DESCINDX ON PRODUCT(DESCRIPTION)
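The two statements can be run as-is against SQLite (the CUSTOMER and PRODUCT tables below are illustrative). Note that the unique index doubles as an integrity control: a duplicate CUSTOMER_ID is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE product (product_id INTEGER, description TEXT)")

# The two index statements from the text.
cur.execute("CREATE UNIQUE INDEX custindex ON customer(customer_id)")
cur.execute("CREATE INDEX descindx ON product(description)")

cur.execute("INSERT INTO customer VALUES (1, 'Alpha')")
try:
    cur.execute("INSERT INTO customer VALUES (1, 'Beta')")
except sqlite3.IntegrityError:
    print("duplicate key rejected by unique index")
```

The nonunique index on DESCRIPTION imposes no such restriction; it exists only to speed retrieval on that column.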

When to Use Indexes

During physical database design, we must choose which attributes to use to create indexes. There is a trade-off between improved performance for retrievals through the use of indexes and degraded performance for inserting, deleting, and updating the records in a file. Thus, indexes should be used generously for databases intended primarily to support data retrieval, such as decision support and data warehouse applications, and should be used judiciously for databases that support transaction processing and other applications with heavy updating requirements, since the indexes impose additional overhead.

The following are some rules of thumb for choosing indexes for relational databases.

  1. Indexes are most useful for larger tables.
  2. Specify a unique index for the primary key of each table.
  3. Indexes are most useful for columns that frequently appear in WHERE clauses of SQL commands, either to qualify the rows to select or for linking (joining) tables.
  4. Use an index for attributes referenced by ORDER BY (sorting) and GROUP BY (categorizing) clauses. (Be sure that the DBMS will, in fact, use indexes on attributes listed in these clauses.)
  5. Use an index when there is significant variety in the values of an attribute. (Oracle suggests that indexing is not useful when there are fewer than 30 different values for an attribute, and an index is clearly useful when there are 100 or more different values for an attribute. Similarly, an index will be helpful only if the results of a query which uses that index do not exceed roughly 20 percent of the total number of records in the file (Schumacher, 1997).)
  6. Check the DBMS for the limit, if any, on the number of indexes allowable per table.
  7. Be careful indexing attributes that have null values. For many DBMSs, rows with a null value will not be referenced in the index (so they cannot be found from an index search of ATTRIBUTE = NULL). Such a search will have to be done by scanning the file.

RAID: Improving File Access Performance by Parallel Processing

The increasing feasibility of using multiple computer components (a result of the constant reduction in the cost and size of computer technologies) allows parallel processing of data--the result being that four I/O operations, when done in parallel, take only as long as one such operation.

Parallel database processing and fault tolerance can be accomplished by a database designer using hardware or software technology called Redundant Array of Inexpensive Disks (RAID).

Redundant Array of Inexpensive Disks (RAID): A set, or array, of physical disk drives that appear to the database user (and programs) as if they form one large logical storage unit. (Thus, RAID does not change the logical or physical structure of application programs or database queries.)

In order to maximize I/O performance of RAID, all the disk drives need to be kept busy. This is accomplished by striping--the balancing of the workload across the disk drives.

Stripe: The set of pages on all disks in a RAID that are the same relative distance from the beginning of the disk drive.
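A sketch of how striping balances the workload (the disk count and page numbers are illustrative): page n of a file is placed on disk n mod d, so a multi-page read touches every drive.

```python
# Round-robin page placement across the drives of a RAID.
NUM_DISKS = 4

def disk_for_page(page_number):
    return page_number % NUM_DISKS

placement = {p: disk_for_page(p) for p in range(8)}
print(placement)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}

# Reading pages 0..3 touches four different drives, so the four I/O
# operations can proceed in parallel, taking about as long as one.
```

Pages 0 through 3 form one stripe, pages 4 through 7 the next; any run of four consecutive pages keeps all four drives busy.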

The details of parallel processing are usually implemented at the OS level.

RAID has one important risk: the increased likelihood of a disk drive failure somewhere in the array. To cope with this risk and to make disk storage fault tolerant, many types of RAID technologies redundantly store data, so that at least one copy of the data is accessible when demanded, or store extra error correction codes so that damaged or lost data can be rebuilt.

Choosing Among RAID Levels

RAID-0

RAID-1

RAID 0+1

RAID-2

RAID-3

RAID-4

RAID-5

Designing Databases

Choosing Database Architecture

A particular database management or data warehousing system supports one of five different architectures:

  1. Hierarchical database model
  2. Network database model
  3. Relational database model
  4. Object-oriented database model
  5. Multidimensional database model

Optimizing for Query Performance



Charles E. Oyibo
IDS :: CBA :: UIC