Organizing your Data in

  D a t a b a s e s  

CPT 105 · Franklin College · Erich Prisner · 2002-2007

FAQs

Data and Information

Data is unprocessed items, whereas information is processed data, organized, meaningful and useful.

The importance of data for companies can hardly be overestimated. Data must

File Processing versus Databases

A database contains all data, which many programs or users share. This data is processed by some DBMS (Database Management System) into Information. The DBMS is also used to create and to maintain (adding, changing, deleting records) the database. Users may only work with the front-end, a program that interacts with the DBMS.

Before working with databases, the file processing approach was used. Each departmant had its own files. Data redundancy (wastes space and time and increases the probability for errors) and isolated data---the inability to match data maintained in different departments---were the disadvantages ofthis approach.

Some features of DBMSs

Relational databases

Nowadays the most commonly used database models are:

We will concentrate in the following on relational databases, that have been introduced by E.F. Codd in 1970 at IBM. A relational database is a collection of related tables. Each table has different records as rows and fields as columns. See here an example of a table, let's call it the book table:

Author's last name Author's first name Title Year Pages Publisher ISBN
Sayers Dorothy Gaudy Night 1935 446 Hodder and Stoughton

0450021548

Grimes Martha Biting the Moon 1999 436 Headline

0747268061

Bernhard William Deadly Justice 1993 306 Ballantine Books

0345380274

Bernhard William Cruel Justice 1996 466 Ballantine Books

0345408039

A primary key is any field that uniquely identifies the record. Though in this small table, the title field looks like a good candidate, the only primary key of the table is the ISBN number.

Now look at the author table:

Author's last name Author's first name Born at Author number
Sayers Dorothy 1893 1
Grimes Martha ? 2
Bernhard William ? 3
Grafton Sue 1940 4

Again we introduced a primary key---the author number.

How could we link these two tables? Adding an ISDN number field into the second table is not possible, but adding the author number field into the first is. If we do this, then the author number field in the book table is called a foreign key. Both tables are related, since they have a common field which is the primary key in one and a foreign key in the other.

Why do we use 2 related tables? Couldn't we squeeze the same information into one larger table? We could, but assume we have 100 authors, and assume each author has written 30 books. Assume both tables have 10 fields. Then the author table has 100*10 = 1000 data items and the book table has 3000*10 = 30,000 data items Together we have 31,000 items, compared to maybe 3000*19 = 57,000 items (remember, one field at least would be in common). Things would get much worse if we have more related tables.

Therefore one reason for using relational databases is to avoid data redundancy. The second reason is that if we have to change one data item value in one of the related tables (like second name), we would have to change many data items in the large supertable. Unless hierarchical or network databses, relational databases do not have to be application specific. They provide data independence---the ability to exist independently of specific applications.