CS352 Lecture: Integrity and Security Constraints revised 9/19/02
Materials:
1. Handout of SQL statements for creating example library database (from
previous lecture)
2. Handout of these statements modified to incorporate additional domain
integrity constraints
I. Introduction
- ------------
A. Persons responsible for databases needed to be concerned with preserving
the integrity and security of the data.
B. Data integrity is concerned with ensuring the ACCURACY of the data. In
particular, the concern is with protecting the data from ACCIDENTAL
inaccuracy, due to causes like:
1. Data entry errors
2. System crashes
3. Anomalies due to concurrent and/or distributed processing
C. Data security is concerned with ensuring only AUTHORIZED ACCESS to the
data. In particular, we don't want unauthorized persons to be able to
read sensitive data, and we don't want malicious persons to be able to
damage the data by unauthorized insertions, deletions, or updates.
II. Integrity
-- ---------
A. When designing a database, it is possible to specify various CONSTRAINTS
that data in the database must satisfy. As we shall see, SQL provides
a number of mechanisms that allow these constraints to be incorporated
in the system's metadata so that they can be enforced by the DBMS. We
will look at the following:
1. Domain integrity constraints
2. Entity integrity constraints
3. Referential integrity constraints
4. Use of assertions and triggers to specify more general constraints.
B. Domain Integrity Constraints constrain the values that can be stored in a
particular column of a table. These are called domain integrity
constraints because they constrain the domain of values from which a
given attribute can be drawn.
1. The SQL standard incorporates a number of facilities for doing so
that we will consider. In SQL, these are enforced whenever a new row
is inserted into a table, or an existing row is updated. Some of
these constraints are specified in the CREATE TABLE statement that
creates a given table; others are specified by explicitly creating
new domains.
2. As discussed in the book, SQL 92 has a CREATE DOMAIN statement
that allows the user to create a named domain which can then be
used to declare columns in tables. While DB2 does not support
this, it does have a similar facility called CREATE DISTINCT TYPE.
a. Example:
HANDOUT - library database creation modified to use named domains
b. An advantage of using this mechanism is that data types are
defined in terms of their SEMANTICS (meaning) - not just their
physical representation.
Example:
Both a telephone number (without area code) and a Gordon student
id are 7-digit numbers - however, it wouldn't make sense to
formulate a join like
student join borrower on student.id = borrower.phone!
i. A distinct domain (type) is not compatible with a different
domain - even if they have the same internal representation
(e.g. INTEGER or CHAR(x) )
ii. It is possible to explicitly cast a value from one type to
another. (The book discusses the SQL 92 sequence for this -
DB2's syntax is different but the idea is the same.)
iii. A downside of this is additional cumbersomeness when using
constants. (Bottom of handout.)
c. This type of constraint differs from all the others we will
discuss in that it can be checked as a matter of the SYNTAX of
a query - thus, if we are using static SQL, it can be checked at
compile time.
3. Another domain integrity constraint is the NOT NULL constraint, which
prohibits storing a null value into a given column.
A not null constraint is specified as part of the declaration
of a column.
Example: Handout - note columns that are and are not declared
not null and discuss reasoning for each
Note: Any attribute that is part of a primary key is declared
to be a candidate key MUST be declared not null - we discuss
these constraints later.
4. The standard SQL data types, because they are based on physical
representations for data, sometimes do not adequately restrict the
values that can appear in a given column.
a. Examples:
i. A letter grade in a course could be stored in a field
declared CHAR(2). In fact, though, only a very small number
of one or two character strings are valid grades.
ii. The first character in a call number has to be a letter.
b. To deal with situations like this, it is possible to specify a
check clause that tests a value about to be stored into a field.
A check clause can be specified as part of a table definition.
In SQL 92 it can also be specified in a domain declaration,
which is the preferred place since it then applies
i. Example: (Assume student_id_type and course_id_type already
defined)
create table enrolled_in (
student_id student_id_type,
course_id course_id_type,
grade char(2) check
(grade in ('A', 'A-', 'B+', 'B', 'B-', 'C+',
'C', 'C-', 'D+', 'D', 'D-', 'F') )
)
ii. Example: (Assume call_number_type is already defined)
create table book (
call_number call_number_type check
(left(cast(call_number as char(10)), 1)
between 'A' and 'Z'),
title char(30) not null,
author char(20)
)
Note need to cast call_number from a user-defined domain to
an ordinary string before applying left.
iii. The handout includes an example of incorporating the latter
into the book table, using db2 syntax for the type cast
c. When a table definition includes a check constraint, any data
being stored into the column(s) in question is checked to be
sure it satisfies the constraint whenever an insert or update
is done.
C. Recall that an entity is a member of an entity SET, and therefore must
be unique among all the elements of that set. This translates into
the notion of a "key" that we discussed earlier. ENTITY INTEGRITY
constraints are concerned with ensuring that each row in a table is
distinct from all other rows in the same table in the necessary
way(s).
1. We have already seen that the process of designing a relational
database should result in each table having a primary key. This
should be incorporated into the declaration for the table by means
of a PRIMARY KEY constraint. This has the following characteristics:
a. No two rows in the table will be allowed to have the same value(s)
in the specified column(s).
b. The PRIMARY KEY constraint requires the NOT NULL constraint for
all columns involved - no column that is part of the PRIMARY KEY
for a table can be null.
c. If the primary key is a single column, the constraint can be
specified as part of the declaration of the column.
Example: the various tables in the handout
d. If the primary key is composite (consists of more than one
column, then the constraint must be expressed as a table
constraint.
Example: Suppose (as is more often the case) that books have both
a call number and a copy number, and the two columns
together constitute the primary key. Then we could
define the book table as follows:
create table book (
call_number call_number_type not null,
copy_number smallint not null,
title char(30) not null,
author char(20),
primary key (call_number, copy_number)
)
(Note the presence of the comma after the declaration of
the author attribute, which signals that either a new
column or a table constaint is coming. In the absence of
the comma, the primary key constraint would be taken as
applying to the author column - but would be syntactically
invalid since columns are named explicitly.)
e. A given table can only have one primary key, of course.
2. A Related sort of constraint is the UNIQUE constraint.
a. Like the PRIMARY KEY constraint, the UNIQUE constraint specifies
that the same value or values cannot appear in two different rows
in the table in a particular column or set of columns.
b. Like the PRIMARY KEY constraint, the UNIQUE constraint can appear
either as a column constraint or a table constraint - and in the
latter case can specify any number of columns to be treated as
a unit.
Example: I chose to require that each borrower have a unique
name, and likewise each employee. That may not be a good
idea in general - but in a small library with only 4
borrowers it works! (Actually, I wanted to illustrate
the constraint.)
Note: It is a peculiarity of this example that the primary key
constraints all apply to only single columns, and so can be
expressed as column constraints, while the unique constraints
happen to have to be table constraints.
c. Unlike the PRIMARY KEY constraint, a table can have any number
of UNIQUE constraints defined for it.
d. The UNIQUE constraint is typically applied to any candidate key(s)
not chosen as the primary key.
D. Thus far, the constraints we have described all pertain to data within
a single table, and can be enforced by looking at that table alone.
The final sort of constraint we need to consider pertains to
REFERENTIAL INTEGRITY.
1. It is frequently the case that the logic of a system demands that
an entry cannot logically occur in one table without a related
entry occuring in another table:
Example: A checked_out row for a book should not occur unless
corresponding entries exist in the book table and the
borrower table.
2. Such situations most often arise because of the way we transform
E-R diagram relationships into tables - each row in the table
representing the relationship will contain the primary keys of
the entitites being related.
3. The requirement that a matching row occur in another table
for each value occurring in a certain column (or set of columns)
in a particular table is called REFERENTIAL INTEGRITY.
4. Referential integrity constraints are expressed in SQL by using a
FOREIGN KEY constraint, which is specified by the use of the
reserved words FOREIGN KEY and/or REFERENCES.
a. Again, can be either a column constraint or a table constraint.
i. If the former, a references clause is used as part of the
column definition, and applies to that column only.
ii. If the latter, FOREIGN KEY is separated off by commas from
other column definitions, and is followed by a parenthesized
list of the columns constrained.
b. A foreign key constraint always has a references clause.
i. May be just the name of another table - in which case the
value in the column(s) being constrained must occur in the
primary key column(s) of the referenced table.
ii. May explicitly list the column(s) in the referenced table
where the value is to be found - necessary if the foreign key is
not the primary key of the referenced table. (Columns can be
explicitly listed even if they are the primary key - no harm
done.)
iii. If the foreign key constraint is expressible as a column
constraint, the word references is all that is needed.
c. Examples:
i. Note references clauses in checked_out and reserve_book in
the handout.
ii. Suppose we stored both a call number and a copy number for
a book, and, as a result, it had a composite primary key -
declared as in an example above. Then our declaration for
checked_out would have to look like:
create table checked_out (
borrower_id borrower_id_type not null references borrower,
call_number call_number_type not null,
copy_number smallint not null,
date_due date,
foreign key (call_number, copy_number) references book
)
iii. Suppose we want to require that no one can be added to the
employee table unless already in the borrower table (perhaps
because employees automatically have borrower privileges).
Since last_name, first_name is NOT the primary key of
borrower, this would have to be written as follows:
create table employee (
ssn ssn_type not null primary key,
last_name name_type not null,
first_name name_type not null,
salary integer,
supervisor_ssn ssn_type,
foreign key (last_name, first_name)
references borrower(last_name, first_name)
)
(The fact that the columns happen to have the same names
in both tables is not essential. Moreover, the order of
the columns is important. If there references clause
were written (first_name, last_name), then we could only
add Emily Elephant as an employee if Elephant Emily were
a borrower!)
5. Support for referential integrity adds to new issues that need to
be addressed.
a. Note that changing that data in a REFERENCED table may cause an
error because of a constraint in a REFERRING table.
Example: checked_out has a foreign key constraint that references
the borrower_id column of borrowr. Thus, a change to
borrower could cause a violation of a constraint for an
existing row in checked_out
i. To cope with this possibility, the DBMS keeps a record of which
columns are referenced by constraints in other tables, and
checks updates of such a column (as well as deletions of an
entire row) in the referenced table to be sure doing so does
not cause a constraint violation in the referring table.
ii. Sometimes, it makes sense to allow a change in the referenced
table and handle the potential constraint violation by also
changing the referring table.
Example: Suppose we have a weak entity like fine. The
definition for a fine table might look like this:
create table fine (
borrower_id char(10) references borrower,
...
As it stands, we cannot delete a borrower who has
unpaid fines.
It might make sense to provide that if a borrower is
deleted from the borrower table, then any fine(s) that
the borrower owes are also automatically deleted.
In this case, the foreign key constraint in the fine
table could have a CASCADE clause:
create table fine (
borrower_id char(10) references borrower on delete cascade,
...
which specifies that if a row is delete from the
borrower table, then any rows referencing that row in
the fine table are also to be deleted.
iii. The book discusses the possibility of a similar option for
update, where the values in the referencing table are
automatically updated to reflect changes to the referenced
table. DB2 does not support this, however.
E. An additional features applies to most of the types of constraints we have
discussed thus far: A constraint may be given a name by preceeding the
constraint specification with CONSTRAINT constraint-name
example - the last example above
... constraint employee_borrower foreign key (last_name, first_name)
references borrower(last_name, first_name)
1. The constraint name is included by SQL in any error message
reporting that the constraint has been violated. This is
especially useful when using embedded SQL, since a program can
now determine which constraint was violated when an operation
fails.
2. If you don't specify a name for a constraint, SQL creates a
default name.
F. To specify more complex integrity rules, it is possible to store
general ASSERTIONS in the database - representing invariants that
are to be enforced whenever data in the database is modified.
1. The book gives an example of an assertion.
2. DB2 does not support assertions, so we won't discuss this further.
G. In the period between SQL92 and SQL99, many SQL implementations added
"triggers" - procedures to be executed whenever a specific event occurs.
Although these were included in the SQL99 standard, many DBMS's
use a syntax that is somewhat different from the standard because the
implementation added the facility before the standard was written.
1. A trigger is a statement that the DBMS is to execute whenever a
certain kind of modification to the database occurs.
2. The book noted some examples. Here's another one:
Faculty at Gordon are allowed to request that books be ordered for
the library. When the book comes in, the requesting faculty member
is notified. This is actually handled by using a multi-part book order
form, but could also be handled by a trigger on the book table, which
adds a row to a table listing people to be notified - e.g.
create trigger book_arrived after insert on book
-- appropriate action - syntax is implementation-specific
3. Key features of a trigger definition:
a. A trigger name - because a trigger is an object in the database
that can be subsequently altered or dropped.
b. One of the words "before" or "after" to specify whether the
triggered action is done before or after the operation in
question. The "before" option allows a trigger to prevent an
action from occurring if it fails.
c. A clause specifying the table whose modification will cause the
triggered action to occur - one of "insert on xxx", "update on xxx",
or "delete on xxx".
H. Some concepts we will consider later in the course are also closely
related to the idea of data integrity.
1. The concept of an ATOMIC TRANSACTION is really an integrity concept.
A properly-coded transaction preserves the integrity of the data by
ensuring that data that is consistent before the transaction starts
does not become inconsistent as a result of the transaction.
2. Crash control measures that we will explore later help to ensure
that data integrity is not damaged due to hardware or software failure.
3. Concurrency control measures that we will explore later help to
to protect the integrity of the data against anomalies arising from
simultaneous updates.
I. To summarize this section:
1. Data integrity is concerned with ensuring the ACCURACY of the data. In
particular, the concern is with protecting the data from ACCIDENTAL
inaccuracy, due to causes like:
a. Data entry errors
b. System crashes
c. Anomalies due to concurrent and/or distributed processing
2. SQL incorporates a number of mechanisms to permit the DBMS to help
preserve data integrity:
a. Facilities to help preserve domain integrity:
i. Not null constraints
ii. User-defined domains
iii. Check constraints
b. Facilities to help preserve entity integrity:
i. Primary key constraints
ii. Unique constraints
c. Facilities to help preserve referential integrity: foreign key
constraints
d. Facilities to enforce more complex requirements:
i. Assertions
ii. Triggers
J. We turn now to the topic of security - a distinct topic, but a clearly
related one since all the data integrity measures in the world cannot
protect an insecure system from malicious damage.
III. Security
--- --------
A. Persons responsible for databases needed to be concerned with preserving
both the integrity and security of the data. Data security is concerned
with ensuring only AUTHORIZED ACCESS to the data. In particular, we
don't want unauthorized persons to be able to read sensitive data, and
we don't want malicious persons to be able to damage the data by
unauthorized insertions, deletions, or updates.
B. System security is a HUGE topic - one that could easily be the subject
of multiple courses at the undergraduate or graduate level, as well
as being an ongoing focus of reserarch.
1. We assume, as a starting point that the DBMS is running in an
environment and under an operating system that provides a basic
foundation for security, including:
a. Appropriate physical security
b. Trustworthy system administrators and users (human security)
c. User authentication (e.g. login passwords)
d. Protection for files (operating system security)
e. Network security
2. The DBMS builds on this by allowing access to information in a single
file (or collection of files) - the database - based on user
identity - i.e. making it possible to limit a given user to
accessing a subset of the entire database. (A finer-grained level of
access control than the all-or-nothing sort of file access typically
provided by an operating system.)
3. We assume that the rest of the system has ensured that the database
can only be accessed through the DBMS, and that a person who accesses
the database is who he/she claims to be. We now consider SQL
mechanisms that allow the DBA to control what such a person may do
with the data.
C. There are three key concepts involved in understanding the security
mechanisms of SQL. (We will use the terminology used in IBM's DB2
documentation - other systems may use slightly different names for
the same concepts.)
1. "An authorization ID is a character string that is obtained by the
database manager when a connection is established between the
database manager and ... a process." (SQL Reference Manual page 72)
a. DB2 recognizes both individual authorization IDs and group
authorization IDs. (A given individual may belong to one or
more groups.) A privilege may be granted to an individual, or
to a group; an individual has a privilege if granted to him/her
or to a group to which he/she belongs.
a. On our systems, DB2 uses the login mechanism of Linux - thus
an authorization ID is typically a Linux user name or group name,
and the user authenticates himself/herself through knowing the
associated password.
i. Note that this is done on the SERVER - and so uses the user
names and passwords on the server.
ii. It is also possible to set up a database in which authentication
is done on the CLIENT (using its password database). This
means that the server trusts the client's claim that a certain
user is who the client says he/she is - which avoids the need
for the user to have an account on the server, but creates a
new possibility for penetration of the system.
` b. It is also possible, in principle, for a DBMS to have its own
authentication mechanism with authorization id's that are specific
to the DBMS. We will not pursue this further.
2. An object is a protectable entity, such as
a. The database instance
b. A specific database
c. A schema within a database
d. A table or view within a schema
e. A specific column within a table or view
These objects form a hierarchy - i.e. to access a table, a user
must first have rights to access the database in which it is
contained.
3. An authority (also called privilege) is the right to perform a certain
operation on an object. There are different kinds of privileges that
apply to objects at different levels.
D. SQL privileges for various kinds of objects.
1. The privilege names are standard SQL names - not specific to DB2
2. We will not attempt to exhaustively cover all privileges - just
selected important ones. In particular, we will only consider
privileges related to the logical and view level of database
access. There are a number of privileges that pertain to physical
level operations (allocation of space, creation of indices etc)
that we will not discuss now - but may refer to later.
3. Privileges that apply to the entire instance
a. SYSADM - the System Adminstrator Privilege. This is the highest
level of authority, and implicitly includes all other privileges.
i. One place where this authority is needed is to actually create
or drop individual databases within an instance.
ii. In DB2, this authority is granted to any user who is logged in
to the Linux account that actually owns the database instance
(e.g. db2cs352 in our case.)
b. SYSCTRL - the authority to manage system resources such as disk
space. This authority is like SYSADM in some ways, except that
it does not include any authority to actually see or alter data -
only space.
c. SYSMAINT - the authority to perform maintenance tasks such as
backup. It does not include any authority to actually see or
alter data.
4. Privileges that apply to a specific database
a. DBADM - adminstrative privileges within a particular database.
"A database administrator has all privileges against all objects
in the database and may grant these privileges to others"
(SQL Reference Manual page 986)
b. CONNECT - the privilege to connect to the database. (Obviously,
no other access to the database is possible without this.)
c. IMPLICIT_SCHEMA - the ability to create an implicit schema
within the database (with the same name as the authorization id
of the user creating it.) Only the database administrator can
create a schema with any other name.
d. CREATETAB - the privilege to create tables within the database.
5. Privileges that apply to a schema within a database
a. CREATEIN - the privilege to create objects within the schema.
(CREATETAB on the database is also needed to create tables.)
b. ALTERIN - the privilege to alter objects within the schema
c. DROPIN - the privilege to drop objects within the schema
6. Privileges that apply to a specific table or view
a. SELECT - the privilege to see the content of a table or view (by
using a select statement), and to create views based on a table.
b. INSERT - the privilege to insert rows in a table (by using an
insert statement on the table or an insertable view on the table)
c. UPDATE - the privilege to update rows in a table (by using an
update statement on the table or an updateable view on the table)
d. DELETE - the privilege to delete rows from a table (by using a
delete statement on the table or a deleteable view on the table)
e. ALTER - the privilege to add columns to a table or to add or drop
constraints on a table.
f. REFERENCE - the privilege to create a table that includes a
references constraint that refers to this table. (This is needed
because otherwise someone could prevent a deletion of a row
from the table by containing another table with a foreign key
constraint and store a value into a row that prevents deletion
of the matching row from this table.)
g. CONTROL - we will discuss this shortly
7. Privileges that apply to a specific column within a table or view.
a. The UPDATE privilege may be granted only on specific columns
within a table, rather than on the entire table.
b. The REFERENCES privilege may be granted only on specific columns
within a table, rather than on the entire table.
c. There is no form of the SELECT privilege that allows seeing only
specific columns within a table - this is not needed, because
a view can be used for this purpose.
d. There is obviously no column-specific form of INSERT or DELETE,
since these operations inevitably affect an entire row.
8. Note that there are no privileges that apply to a specific row within
a table - such privileges can be achieved through views.
E. Granting of authority
1. For each object, the DBMS maintains a record of what authorities
are granted to specific authorization id's. With the exception
of SYSADM (which is determined by one's login), these privileges are
stored in the system catalog.
2. For each object, the DBMS also maintains a record of who created
the object, who is therefore the owner of the object and receives
all appropriate privileges relative to that object.
Example: To create a table, once must have the CREATETAB privilege
within the database and the CREATEIN privilege within the
schema. Once one creates a table, he/she automatically
has SELECT, INSERT, UPDATE, DELETE, ALTER, CONTROL (and
several other privileges we haven't mentioned) on the table.
The same is true with views, except that CREATETAB is not
needed, and ALTER is not applicable.
3. A privilege can be granted by the SQL GRANT statement
a. This has the following general form:
GRANT privilege ON object TO recipient
Example: to give the user "aardvark" authority to look at the
data in table "foo" the following statement would be used:
grant select on foo to user aardvark
b. So who has the right to grant privileges on an object?
i. A holder of SYSADM or DBADM on the instance or database.
ii. The owner (creator) of the object.
iii. The holder of CONTROL privilege on that object.
iv. The grant statement for table-level privileges includes
a with grant option clause which allows the recipient of
a particular privilege on a particular object to grant the
same privilege to others.
c. Of course, there are often objects which every person (who has
the ability to connect to the database) may be entitled to access.
Explicitly granting privileges to each user is painful to say
the least, and breaks down if a user is added after privileges
are granted. It also makes the storage of privilege information
in the system tables unwieldy, to say the least.
For this reason, the grant statement allows privileges to be
granted to a group or to public, as well as to an individual
user.
Example: Let anyone who can connect to the database see the
content of table foo:
grant select on foo to public
This translates into a single entry in the system catalog that
grants select access to all users.
F. Revoking of Authority
1. The granter of a privilege may withdraw it by using the SQL REVOKE
statement. Note, though, that if the same person has been granted
a given privilege by two grantors, then if one revokes the privilege
it will still remain in force.
(This forms the basis of a security loophole in some SQL systems. If
user A grants access to some object to user B with grant option,
then user B can conspire with user C to retain that access even
if user A chooses to withdraw it, as follows: User B grants access
to user C with grant option, then user C grants access back to user
B. The resulting loop in the grants prevents some implementations
from revoking the grant if A now tries to revoke the access from B.)
2. DB2 handles this somewhat differently - to revoke a privilege,
one needs to have SYSADM or DBADM authority, or CONTROL authority
on the object in question - i.e. the original grantor of the
privilege is not an issue.
IV. Views
-- -----
A. We have introduced the notion of views as a very useful tool in the
design of relational databases.
B. One way that view can be used is to give very fine-grained access control
to a table.
1. Sometimes, a given individual must be given access to only a portion
of a table - e.g.
a. A given user may be allowed to only see certain columns. For
example, in a personnel database most users would be prohibited
from seeing the salary column.
b. A given user may be allowed to see only certain rows. Again, in
the case of a personnel database we may wish to let a department
manager look only at the records of employees in the department
he manages.
2. These needs can be addressed by defining an appropriate VIEW.
SQL allows the creator of a view to grant access to the view in the
same way that access can be granted to a table. In particular:
a. The creator of a view must have suitable access to the underlying
table (e.g. SELECT authority.)
b. Others may be given access to data through the view, even though
they don't have direct access to the underlying tables. (I.e.
the DBMS checks two authorizations when a view is accessed:
the authority of the accessor to access the view, and the
authority of the creator of the view to access the underlying
tables.)
C. Views can be used to give selective access for inserting, deleting,
or updating data as well as reading data.
1. That is, an insert, update, or delete operation can be performed
on a view, and will result in changes to the underlying table -
provided the person doing the operation is authorized to perform
the operation on the view, and the creator of the view is
authorized to perform the operation on the base table(s).
2. Modifying the database through views does, however, present some
interesting problems:
a. If a new row is inserted into a view, what happens to columns
that are not part of the view? Answer: they are given the
value NULL - which may not be desirable. (And the operation
will fail totally if any of these columns is part of a key
or is declared NOT NULL.)
b. There is the problem of DISAPPEARING ROWS. What if a user
inserts or updates a row through a view in such a way that
the row does not satisfy the conditions for being included
in the view? Answer: the row is still in the table, but the
one who put it there can't see it!
c. If the view involves a join or union, inserting into or deleting
from or even updating the view becomes problematical.
i. What if we insert into a view involving a join. Do we add
rows to both tables? What if one table has a row that could
participate but the other does not?
ii. What if we insert a row into a view involving a union. Into
which table does it go?
iii. What if we delete a row from a view involving a join. Do we
delete the corresponding rows from both tables? Or just one?
Which one?
iv. What if we update a column that is the basis of a join between
tables? Which table do we change?
v. Issues like these lead to most DBMS implementations either
severely restricting or forbidding insertions, deletion, and
even updating a view involving a join or union.
V. Additional Issues
- ---------- ------
A. Another important tool for maintaining security - particularly in
distributed systems - is ENCRYPTION.
1. Any time information is transmitted over a communication link from
one place to another, it is vulnerable to being intercepted and even
altered by unauthorized users.
2. Systems which can be accessed from remote sites are also potentially
vulnerable to penetration by unauthorized users, who may be able to
then obtain unauthorized access to read or motify the database by
masquerading as a legitimate user.
3. Encryption can be used to reduce the risk of both types of
unauthorized access.
a. Sensitive information being transmitted over a network should
certainly be encrypted in some suitable way.
` b. Stored data may also be encrypted to prevent unauthorized users
from understanding it if they do obtain illicit access to it.
c. Since encryption is discussed in other places in the curriculum
(e.g. operating systems, networks) we will not discuss it further
here.
B. One important class of security problems arises in conjunction with
STATISTICAL DATABASES.
1. For a variety of reasons, it may be desirable to allow wide access
to certain statistical functions (e.g. averages) of data for broad
classes of people, while forbidding acces to specific data for
individuals.
Example: The US Census bureau publishes data on median household
income for individual census tracts across the country.
However, data on the income for a specific individual
household is protected by law and cannot be released.
If the number of people living in a given census tract is
too small, however, the average for that tract is not
published. (E.g. if a person lived in a tract with only
two households, he could easily obtain his neighbor's
income given the published information plus knowledge of
his own.)
2. When such data is made available, care needs to be taken to make it
impossible for some one to use a series of legitimate queries to
obtain a piece of information that should remain confidential.
Example: Suppose it were possible to query a database to obtain the
average gpa for any desired list of 20 or more students
at Gordon. One might then obtain the average gpa for all
CS majors, or all students from Maine, or whatever. Given
that each such group must include at least 20 students, it
would seem hard to learn anything about the gpa of a specific
individual, which should certainly remain confidential.
However, a penetrator who wanted to obtain the gpa of
a given individual might do so in two queries - by asking
first for a group of 20 students that includes himself but
does NOT include the target invidual, and then for the same
group of 20 students except that it does not include himself
but DOES include the target individual. From the difference
of these two results, plus a knowledge of his own gpa, the
malicious individual can calculate the gpa of the target
person.
3. A strategy for dealing with problems like this is to impose rules
restricting the degree of overlap between different queries by the
same user. Even so, there are some subtleties involved, which will
be explored in a bonus problem on the homework.
Copyright ©2002 - Russell C. Bjork