Introduction
(read ch. 1)
A database management system (DBMS) is a collection of data and
an integrated set of programs that access that data.
Example: PSH keeps information about each student: name, social security
number, standing, address ...
- this data makes up the database
- programs (or parts of programs) are used to add new students, to change
a student's address or standing, to retrieve information about a student ...
Goals of a DBMS:
- manage large bodies of information
- provide convenient and efficient ways to store and access information
- secure information against system failure or tampering
- permit data to be shared among multiple users
Why are DBMS's needed? Alternative: store information in files, and write
programs as needed to modify these files and retrieve information.
A file processing system is a collection of files and programs that
access/modify these files
Problems with file processing systems:
- data redundancy and inconsistency
- difficulty of accessing data
- problems with concurrent access
Example: assume I'm paying for groceries with my MAC card at the same time
my pay check is being deposited (and my bank uses a file processing system)
withdrawal program | deposit program |
1. read balance from checking account file as $51 | |
| 2. read balance from checking account file as $51 |
3. subtract $50 (for groceries) | |
4. update checking account file (new balance: $1) | |
| 5. add $100 (my salary) |
| 6. update checking account file (new balance: $151) |
It is difficult to prevent such problems unless programs (example: withdrawal
and deposit) are coordinated or integrated.
- security problems
- not all users should have access to all data
- example: bank payroll personnel shouldn't know my checking account
balance
- difficult to enforce security in an ad hoc system
- integrity problems
- data may need to satisfy certain conditions, called consistency
constraints
- example: account balances should never fall below $0
- difficult to enforce/add/change consistency constraints in a file
processing system
DBMS's were developed to remedy these problems.
Return to Table of Contents
- abstraction
-
- (Random House College Dictionary) the act of taking away or
separating
- hiding details that aren't needed for a particular purpose
- data abstraction
- hiding details of how data is stored and maintained.
Levels of data abstraction (from low to high)
- physical level
- describes how data is actually stored on physical media
- example: files of records or structs
- needed by implementors and maintainers of DBMS's
- conceptual level
- describes what data is stored and how data is interrelated
- example: data is seen as tables or a graph or tree structure
- needed by database administrators (and "power users")
who must see all of the data
- view level
- describes some part or subset of the database
- simplifies the database for users who only need part
- useful for controlling access to data
- example: payroll personnel have a view that lets them see employee
data, but not customer data
- can have as many views as needed. example: one for payroll
personnel, one for tellers, one for officers ..
- views can overlap
- the conceptual level is like a view that encompasses the entire
database
Return to Table of Contents
A data model is a collection of conceptual tools for describing:
- data
- data relationships
- data semantics
- consistency constraints
Data models:
- provide a way of thinking about data that isn't linked to the
implementation of the database
- are used at the conceptual and view levels (roughly)
Kinds of data models:
- object-based logical models
- record-based logical models
- physical data models
object-based logical models
characteristics:
- structure of the database is flexible
- data constraints can be specified explicitly
- examples:
- the entity-relationship (ER) model
- object-oriented (OO) models
common characteristics of ER and OO models:
- data is viewed as sets of entities (objects) that represent things in
the real world
- entities (objects) in the system are distinct and uniquely identifiable
- example: customer entities can be uniquely identified by social security
number
- new kinds of objects (entities) can be constructed from old kinds
record-based logical models
characteristics:
- data is viewed as fixed-format records of various types
- example: one record type for customers, another for checking accounts, ...
examples of record-based logical models:
- relational model
- network model
- hierarchical model
network model
- data represented by collections of records
- relationships between data items are represented by links between
records
- example: each customer record is linked to the checking account record
of that customer
- records and links can form an arbitrary graph (can have unconnected
components and cycles)
hierarchical model
- again, data represented by collections of records and relationships by
links
- records and links form a tree (connected graph with no cycles)
The network and hierarchical models are low-level and mostly of historical
interest.
Return to Table of Contents
The data stored in the database at any given time is an instance
of the database.
The overall design of the database is the database scheme.
Example:
name | account number | balance |
Linda Null | 102 | 8725.38 |
Tim Wahls | 497 | 0.50 |
is an instance of a database with scheme (name, account number, balance)
Database systems have schemes (designs) at each level of abstraction:
- the physical scheme is the design at the physical level, i.e.
as a file of records of a particular type
- the conceptual scheme is the design at the conceptual level.
Example: (name, account number, balance) is a conceptual scheme.
- a subscheme is a design at the view level
- a subscheme is part of a scheme
- example: (name, account number) is a subscheme of (name, account number,
balance)
- the subscheme controls what information can be seen using the
corresponding view
- example: the previous subscheme could be used by someone preparing
ATM cards for the bank
As with levels of abstraction, there is:
- one physical scheme
- one conceptual scheme
- as many subschemes as needed
Return to Table of Contents
(different from section 1.5 of the text!)
- data independence
- the ability to modify a scheme definition at one level without
affecting the scheme at the next higher level
- physical data independence
- the ability to change the physical scheme without affecting the conceptual
scheme
- logical data independence
- the ability to change the conceptual scheme without affecting subschemes
- easy if information is added to the conceptual scheme
- often impossible if information used in some subscheme is deleted
from the conceptual scheme
Notes:
- physical data independence implies that programs that use the conceptual
level are unaffected by changes to the physical scheme
- logical data independence implies that programs that use some subscheme
are unaffected by changes to the conceptual scheme
- data indepence is like data abstraction in software engineering
Return to Table of Contents
The data definition language (DDL) is the language used to define and
modify the conceptual scheme of the database.
- often used to define/modify subschemes
- the definition of the conceptual scheme (written in the DDL) is compiled
into a file called the data dictionary
- the data dictionary is consulted for scheme information whenever data
is read or modified
- a separate language called the data storage and definition language
is used to access the physical scheme of the database
Return to Table of Contents
- data manipulation language (DML)
- the language used at the conceptual and view levels to retrieve, insert,
delete and modify information stored in the database
- query
- a request for information retrieval. Example (in English): List all
customers who have an account balance over $500
- query language
- the part of the DML used for retrieving information
The terms query language and data manipulation language are often used
interchangeably. Example: SQL is a DML, but is usually called a query
language.
DML's are classified by the kind of query language they include:
- procedural DML's require the user to specify what data is needed
and to provide an algorithm for getting the data. The algorithm is written
in the query language.
Examples: SQL, Quel
- nonprocedural DML's require only that the user specify what
data is needed
Examples: Datalog, QBE
Notes:
- like the distinction between procedural programming languages (C, C++, Ada)
and nonprocedural languages (Prolog)
- DDL's manipulate schemes, while DML's manipulate instances
Return to Table of Contents
The database manager is the program or program unit that provides the
interface between the physical level and the conceptual level.
Functions of the database manager:
- interaction with the file manager (file system)
- minimizing file reads and writes, as disk access is slower than main
memory acces
- translating DML commands to file operations
- integrity enforcement
- checking that consistency constraints are satisfied
- taking some action when they aren't
- security enforcement
- preventing unathorized access to data
- example: through a password and security classification system
- backup and recovery
- detect when information in the database or data dictionary is lost
or corrupted due to disk crash, power failure, software errors ...
- restore the database to a previous consistent state
- concurrency control - making sure that concurrent updates don't give
surprising or inconsistent results.
The database manager for a small system typically does not implement all of
these functions.
Return to Table of Contents
The database administrator is the person(s) in control of the DBMS.
One goal of DBMS's is to permit centralized control.
Typical responsibilities of the database administrator:
- scheme definition
- storage structure and access method definition
- scheme and physical organization modification
- granting authorization for data access
- integrity constraint specification
Return to Table of Contents
People who interact with the DBMS:
- application programmers
- application programmers write application programs in some
programming language (called the host language) which permits the
use of DML commands
- typical host languages: C, Cobol with special syntax to separate
DML commands from host language commands
- a DML precompiler handles embedded DML commands before the
program is compiled
- application programs are used to ease common or complicated interaction
with the DBMS
- examples: generating paychecks, doing account transfers, ...
- sophisticated users: interact with the DBMS using the DML directly.
The query processor translates DML commands into commands to the
database manager.
- specialized users: interact with the DBMS in specialized ways:
- CAD
- expert systems
- graphical or audio data
- temporal data
- ...
- naive users: interact with the database through application programs.
Return to Table of Contents
- file manager
- database manager
- query processor
- DML precompiler
- DDL compiler
Return to Table of Contents