SQLUnit14EntitiesAndRelationshipsOld

Data Management: Databases and
Organizations
Richard Watson
Summary of Chapters 3-6 prepared
by Kirk Scott
1
Data Modeling and SQL
• Chapter 3. The Single Entity
• Chapter 4. The One-to-Many Relationship
• Chapter 5. The Many-to-Many Relationship
• Chapter 6. One-to-One and Recursive Relationships
2
Introduction
• Large parts of these overheads will be somewhat
repetitive
• They cover in general terms some of the things that
were specifically illustrated by concrete SQL examples
• However, the repetition shouldn’t be harmful
• It should put the examples into a broader context, and
add new examples to flesh the ideas out
• The ultimate goal is for the basic concepts and
diagramming to be clear so that there will be no
trouble considering design questions in unit 14
3
Chapter 3. The Single Entity
• The author starts with the entity-relationship (ER) diagramming conventions and the concept of a single entity
• The author represents an entity as a box with the entity’s name in capital letters at the top
• The full field names are listed below the name in lowercase letters
• The primary key field is marked with an asterisk
4
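A minimal sketch of how a single-entity box becomes a table, using Python’s built-in sqlite3 module. The SHARE entity and its fields are hypothetical, chosen only for illustration. The field marked with an asterisk in the diagram becomes the PRIMARY KEY, and the database then refuses a second row with the same key value.

```python
import sqlite3

# In-memory database; the SHARE entity and its fields are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE share (
        shrcode  TEXT NOT NULL PRIMARY KEY,  -- *shrcode in the ER box
        shrfirm  TEXT NOT NULL,
        shrprice REAL,
        shrqty   INTEGER
    )
""")
conn.execute("INSERT INTO share VALUES ('AA', 'Example Firm', 27.5, 1000)")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO share VALUES ('AA', 'Another Firm', 1.0, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```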
5
6
7
8
9
• Different diagramming conventions are
perfectly acceptable, as long as you are
consistent
• The name of the entity may be given above
the box representing it
• You may choose to capitalize just the first
letter of the name
10
• In theory, you could qualify field names,
although this would be redundant, given the
entity name at the top
• You could also use short names for fields if
space is at a premium
• Primary keys could be marked with pk or
underlined
11
Chapter 4. The One-to-Many
Relationship
• The author uses the crow’s foot to mark a
one-to-many relationship in an ER diagram
• In a simple ER diagram the fields may be omitted, leaving just entity names and crow’s feet
• In a more complete diagram, fields can be
listed
12
• The author does not list the embedded pk/fk among the fields of the fk/many table, because the crow’s foot already makes it redundant
• I do not follow this convention
• I believe that in the interests of clarity it is
worthwhile to include the fk in the list of fields
13
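Listing the fk explicitly maps directly onto a CREATE TABLE statement. The sketch below assumes hypothetical NATION (the "one" side) and STOCK (the "many" side) entities. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# The "one" side (hypothetical entity).
conn.execute("""
    CREATE TABLE nation (
        natcode TEXT NOT NULL PRIMARY KEY,
        natname TEXT NOT NULL
    )
""")
# The "many" side: natcode is listed explicitly as the foreign key.
conn.execute("""
    CREATE TABLE stock (
        stkcode TEXT NOT NULL PRIMARY KEY,
        stkfirm TEXT NOT NULL,
        natcode TEXT NOT NULL REFERENCES nation(natcode)
    )
""")
conn.execute("INSERT INTO nation VALUES ('UK', 'United Kingdom')")
conn.executemany("INSERT INTO stock VALUES (?, ?, ?)",
                 [("AA", "Firm A", "UK"), ("BB", "Firm B", "UK")])

# A stock pointing at a nonexistent nation violates the foreign key.
try:
    conn.execute("INSERT INTO stock VALUES ('CC', 'Firm C', 'XX')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

(stocks_in_uk,) = conn.execute(
    "SELECT COUNT(*) FROM stock WHERE natcode = 'UK'").fetchone()
```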
The Book Doesn’t Show the Foreign
Key Field; It’s Implicit
14
15
An ER Diagram Plus a New Form of
Schema Diagram (with Explicit FK)
16
17
Chapter 5. The Many-to-Many
Relationship
• As you know, the many-to-many relationship is the most “complicated” of the relationships
• The book presents some interesting examples
that arise in real situations
• They illustrate ideas that are not immediately
apparent from the examples that have gone
before
• The first example is based on a bill of sale,
shown on the next overhead
18
The Bill of Sale Example: An
Interesting Case of a pk/fk Relationship
19
• The book analyzes this situation as consisting of two base entities: the sale and the items that are sold
• There is a many-to-many relationship between
these base entities because each sale can
consist of many items
• Also, each item can be present in many sales
• The book’s ER for this analysis is shown on the
next overhead
20
+ Means saleno is part of the primary
key of LINEITEM
21

• LINEITEM contains every line of every (bill of) sale
22
itemno identifies a kind of item, not an
individual item
23

• A given kind of item may appear on many different sales
24
itemno is not part of the primary key
of LINEITEM
25

• A given item may appear on many different lines of a given sale
• Practically speaking, more of a given item can be added to a sale by adding a new line to LINEITEM rather than by modifying an existing line
26

• LINEITEM is not a pk/fk-pk/fk table in the middle: its pk is not simply the concatenation of the two embedded fk’s
• This is a practical solution to a real-world data
modeling problem
• It is not a theoretically minimalist
representation of relationships
27
• When first introducing many-to-many
relationships, I referred to the table in the
middle
• More formally, the book refers to an
associative entity
• The associative entity is the table in the
middle that captures the relationship between
two base entities
28
• In the ER notation for this example the + sign
is used
• This has not been seen before
• For the purposes of understanding the book’s
example, it is important to know what this
means
29
• The + sign is shown over a crow’s foot
• It symbolizes the fact that the embedded fk is
part of the pk of the table it’s embedded in
• You have seen an example of a table in the
middle where the pk is the concatenation of
the two embedded fk’s
• This example is not the same as that
30
• In this example the saleno is the pk of the Sale
table
• It is embedded as a fk in the Lineitem table
• A saleno value will appear in the Lineitem table
as many times as there are separate lines
belonging to the sale
• These separate lines are identified by lineno’s
• The lineno’s are not embedded fk’s based on the
unique identifiers, itemno’s, of entries in the Item
table
31
• An alternative way of representing the relationship would be to list the fields of the table in the middle this way:
– saleno: pk, fk
– lineno: pk
– itemno: fk
– lineqty
– lineprice
• Note again that the saleno is both a pk and a fk, while the lineno is purely pk
32
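The field list above can be sketched as SQL, run here through Python’s sqlite3 module. The column names saleno, lineno, itemno, lineqty and lineprice come from the overhead; the column types, the saledate and itemname columns, and the sample rows are assumptions for illustration. The sketch shows the two consequences of the (saleno, lineno) key: the same kind of item may appear on two lines of one sale, but a line number cannot be reused within a sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sale (
        saleno   INTEGER NOT NULL PRIMARY KEY,
        saledate TEXT
    );
    CREATE TABLE item (
        itemno   INTEGER NOT NULL PRIMARY KEY,
        itemname TEXT NOT NULL
    );
    CREATE TABLE lineitem (
        saleno    INTEGER NOT NULL REFERENCES sale(saleno),  -- pk, fk
        lineno    INTEGER NOT NULL,                          -- pk
        itemno    INTEGER NOT NULL REFERENCES item(itemno),  -- fk only
        lineqty   INTEGER NOT NULL,
        lineprice REAL    NOT NULL,
        PRIMARY KEY (saleno, lineno)
    );
    INSERT INTO sale VALUES (1, '2024-01-15');
    INSERT INTO item VALUES (10, 'shovel');
    -- The same kind of item on two lines of one sale is allowed,
    -- because itemno is not part of the primary key.
    INSERT INTO lineitem VALUES (1, 1, 10, 2, 25.00);
    INSERT INTO lineitem VALUES (1, 2, 10, 1, 25.00);
""")

(lines_for_item,) = conn.execute(
    "SELECT COUNT(*) FROM lineitem WHERE saleno = 1 AND itemno = 10").fetchone()

# Reusing a line number within the same sale violates the (saleno, lineno) key.
try:
    conn.execute("INSERT INTO lineitem VALUES (1, 2, 10, 5, 25.00)")
    duplicate_line_rejected = False
except sqlite3.IntegrityError:
    duplicate_line_rejected = True
```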
• At first glance it may seem a little strange, but
the table in the middle contains every line of
every sale, listed separately
• It is the saleno and the lineno together which
uniquely identify the entries in the Lineitem
table
• This model actually reflects reality well
• It differs, in particular, from the car sale
example
33
• In the car sale example, there were individual
cars that were sold
• In the example database they were only
shown as being sold once
• In reality, the same car might be sold more
than once
• This could be modeled by making the
salesdate part of the pk of the Carsale table
34
• In the Sale, Lineitem, Item example, the items
are not actually individual items
• An item is a kind of item, like a screw or a
shovel or a microwave oven
• The seller may have many of each kind of item
in stock and doesn’t distinguish between
individual items
35
• Multiple instances of the same (kind of) item may
be sold to the same customer
• Also, the same (kind of) item can be sold to more
than one customer
• It is not a difficult point, but it is worth emphasizing that the itemno does appear in the table in the middle as a fk
• This records which item a given line of the sale refers to
• However, the itemno is not part of the pk of the
table in the middle
36
• In a perfect world, you might argue that each
item should appear on only one line of a sale
• If so, then you could dispense with individual
line numbers and use the itemno as part of
the pk instead
• However, reality makes the given solution
better
37
You Want to Support Customer
Decisions in Mid-Stream
• A data model should be flexible enough to accommodate every situation that can realistically arise
• Could a customer, in the middle of making a
purchase, decide that more instances of a
certain item were desired?
• If so, do you allow this, and how do you
support it?
38
• From a business point of view, few things are
more destructive than a computer system
whose model imposes artificial constraints on
the user (seller and customer)
• Of course, if a customer decides that more
instances are desired you want to sell them
39
• Have you ever heard things like these:
• “I’d like to let you buy more, but the computer
won’t allow it.”
• “I’d like to let you buy more, but it will be
necessary to start a completely new bill of
sale.”
• “I’d like to let you buy more, but it will be
necessary to go back and modify the earlier
line of the sale for that item.”
40
• In any of the previous scenarios, both the
customer and the salesperson want to scream
• The best scenario would go like this:
• “Oh, you want 20 instead of 10? We’ll just
add another line here at the bottom for
another 10.”
• Now everybody sighs with satisfaction…
41
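The “just add another line” scenario can be sketched as follows. The helper add_line is hypothetical, and choosing the next line number with MAX(lineno) + 1 is an assumption that ignores concurrent writers. The point is that the order grows by one INSERT, and the total quantity per item is recovered with SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lineitem (
        saleno  INTEGER NOT NULL,
        lineno  INTEGER NOT NULL,
        itemno  INTEGER NOT NULL,
        lineqty INTEGER NOT NULL,
        PRIMARY KEY (saleno, lineno)
    )
""")
# Sale 1 starts with 10 units of (hypothetical) item 42 on line 1.
conn.execute("INSERT INTO lineitem VALUES (1, 1, 42, 10)")

def add_line(conn, saleno, itemno, qty):
    """Append a new line to an existing sale instead of editing an old one."""
    (next_line,) = conn.execute(
        "SELECT COALESCE(MAX(lineno), 0) + 1 FROM lineitem WHERE saleno = ?",
        (saleno,)).fetchone()
    conn.execute("INSERT INTO lineitem VALUES (?, ?, ?, ?)",
                 (saleno, next_line, itemno, qty))
    return next_line

# "Oh, you want 20 instead of 10?" -> add another line for another 10.
new_line = add_line(conn, 1, 42, 10)

(total,) = conn.execute(
    "SELECT SUM(lineqty) FROM lineitem WHERE saleno = 1 AND itemno = 42"
).fetchone()
print(new_line, total)  # 2 20
```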
The Set and Logical Operators in SQL
Form an Algebra
• SQL has operators like AND, OR, NOT
• Similarly, there are set operators like UNION
• Although Microsoft Access SQL doesn’t
support INTERSECT, some implementations do
• Taken together, these elements form the basis
for an algebra
42
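A small sketch of the set operators, run in SQLite (which, unlike Microsoft Access SQL, supports INTERSECT and EXCEPT). The tables a and b are hypothetical. The comments note the logical operator each set operator corresponds to, and the last two queries show how a membership test can stand in for INTERSECT and EXCEPT in a system that lacks them (assuming no NULLs, which make NOT IN behave differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def values(sql):
    """Run a one-column query and return its values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

union     = values("SELECT x FROM a UNION SELECT x FROM b")      # like OR
intersect = values("SELECT x FROM a INTERSECT SELECT x FROM b")  # like AND
except_   = values("SELECT x FROM a EXCEPT SELECT x FROM b")     # like AND NOT

# Membership tests give the same results (no NULLs in this data).
intersect_alt = values("SELECT x FROM a WHERE x IN (SELECT x FROM b)")
except_alt    = values("SELECT x FROM a WHERE x NOT IN (SELECT x FROM b)")

print(union, intersect, except_)  # [1, 2, 3, 4] [2, 3] [1]
```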
The Cartesian Product is an Algebraic
Product, Which has an Inverse
• The Cartesian product represents a form of
multiplication for relations
• The results of a join operation are a subset of
the results of a product
• Relational algebra pairs this multiplication with a division operation that undoes it: dividing the product of R and a nonempty S by S gives back R
43
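The claim that a join is a subset of a product can be checked on a tiny example. The sale, item and lineitem tables and their rows below are hypothetical. The CROSS JOIN has |sale| × |item| rows, and every row produced by the join through lineitem is also one of those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY);
    CREATE TABLE lineitem (saleno INTEGER, itemno INTEGER);
    INSERT INTO sale VALUES (1), (2);
    INSERT INTO item VALUES (10), (20), (30);
    INSERT INTO lineitem VALUES (1, 10), (1, 20), (2, 30);
""")

# The Cartesian product: every sale paired with every item.
product = conn.execute(
    "SELECT s.saleno, i.itemno FROM sale s CROSS JOIN item i").fetchall()

# A join keeps only the pairs that are actually related through lineitem.
joined = conn.execute("""
    SELECT s.saleno, i.itemno
    FROM sale s
    JOIN lineitem l ON l.saleno = s.saleno
    JOIN item i     ON i.itemno = l.itemno
""").fetchall()

print(len(product))                 # 6
print(set(joined) <= set(product))  # True
```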
For All/Double Not Exists Accomplish
Relational Division (Invert the Product)
• As pointed out when doing the concrete SQL
examples, there is no FOR ALL operator
• However, double NOT EXISTS can accomplish
the same thing
• FOR ALL/double NOT EXISTS is roughly
analogous to division in a relational system
• Before we’re finished with SQL we will see
queries which are actually stated in terms of
division
44
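Before looking at the SQL, the logic can be sketched in plain Python with sets (the data is hypothetical). “For all items, the sale has a line” is equivalent to “there is no item for which the sale has no line”, which is the double negation that double NOT EXISTS expresses. The divide function is a hypothetical helper implementing relational division; dividing the full product by the items gives back all the sales, while dividing the actual lines gives only the sales that included every item.

```python
# Hypothetical data: sales, kinds of items, and which items appear on which sale.
sales = {1, 2}
items = {10, 20, 30}
lines = {(1, 10), (1, 20), (1, 30), (2, 10), (2, 30)}

# FOR ALL, stated directly: sales related to every item.
for_all = {s for s in sales if all((s, i) in lines for i in items)}

# Double NOT EXISTS: sales for which there is NO item
# for which there is NO matching line.
double_not = {s for s in sales
              if not any(not any(line == (s, i) for line in lines)
                         for i in items)}

def divide(pairs, divisor):
    """Relational division: the a's paired with every b in divisor."""
    candidates = {a for a, _ in pairs}
    return {a for a in candidates if all((a, b) in pairs for b in divisor)}

product = {(s, i) for s in sales for i in items}  # sales x items

print(for_all, double_not)     # {1} {1}
print(divide(product, items))  # {1, 2}
print(divide(lines, items))    # {1}
```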
• This is the point where Watson takes up the
case of double not exists
• The book shows an ER diagram of 3 tables capturing a many-to-many relationship
• This diagram is labeled generically, but it is of
the same structure as the Lineitem example
45
• It then outlines the double NOT EXISTS query
that could be written for it
• The fact that this models the Lineitem
example is not important
• The table in the middle could have a
completely concatenated primary key
• It could also have its own, separate primary
key
46
In a Three-Way Relationship, the Tables
are: Target, Target-Source, and Source
• The important point is that the base tables are
at the ends of the ER diagram
• The book refers to these as target and source,
respectively
• The table in the middle, the associative entity,
is labeled Target-Source by the book
47
A Diagram and Query for a LINEITEM-Like Design
48
The Order of the Tables in the Query
• If you want to find those rows of the target
which are in relation to all of the rows of the
source,
• Then in the double NOT EXISTS query:
– The target appears first, in the outermost query
– The source appears second, in the middle, in the
first nested subquery
– And the table in the middle appears last, in the
second nested subquery
49
Translation of the Query: Find the
Sales that Included Every Item
• If the table in the middle were a Cartesian
product, it would match every sale with every
item
• The table in the middle isn’t necessarily the
Cartesian product
• The query will find only those sales which
were matched with every item
50
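A runnable sketch of the query, following the table order given above: the target (sale) in the outermost query, the source (item) in the first subquery, and the target-source table (lineitem) in the innermost subquery. The data is hypothetical; sale 1 includes both items (one of them twice), while sales 2 and 3 each include only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY);   -- target
    CREATE TABLE item (itemno INTEGER PRIMARY KEY);   -- source
    CREATE TABLE lineitem (                           -- target-source
        saleno INTEGER, lineno INTEGER, itemno INTEGER,
        PRIMARY KEY (saleno, lineno)
    );
    INSERT INTO sale VALUES (1), (2), (3);
    INSERT INTO item VALUES (10), (20);
    INSERT INTO lineitem VALUES (1, 1, 10), (1, 2, 20), (1, 3, 10),
                                (2, 1, 10), (3, 1, 20);
""")

# Find the sales that included every item.
every_item_sales = [row[0] for row in conn.execute("""
    SELECT s.saleno FROM sale s              -- target: outermost
    WHERE NOT EXISTS (
        SELECT * FROM item i                 -- source: first subquery
        WHERE NOT EXISTS (
            SELECT * FROM lineitem l         -- target-source: innermost
            WHERE l.saleno = s.saleno
              AND l.itemno = i.itemno))
    ORDER BY s.saleno
""")]
print(every_item_sales)  # [1]
```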
Remember, this Example has been a
Review of a (Simple) Many-to-Many
Relationship
• The next example will illustrate the inclusion
of more relationships in a design
51
A Design with a Cycle
• The next diagram illustrates a design
containing a cycle
• Such designs will become especially important
when considering normalization, the theory of
correctness in designs
• For the time being simply note that there is
nothing preventing designs with cycles
52
Two Many-to-Many Relationships with
Others…
53
A Concatenated Key with Date
• The next example design is one where both of
the embedded foreign keys are part of the
primary key of a table in the middle
• However, it is more complicated than that
because a date field is also included in the
primary key
• This allows the same pair of base values to be
paired with each other more than once
54
Each Customer and Magazine Can Be
Paired with Each Other More Than Once
55
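A sketch of a concatenated key that includes a date. The overheads do not give the diagram’s field names here, so the subscription table and its columns (custno, magno, substart) are hypothetical. Because the date is part of the primary key, the same customer and magazine can be paired twice, but not twice on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (custno INTEGER PRIMARY KEY);
    CREATE TABLE magazine (magno  INTEGER PRIMARY KEY);
    CREATE TABLE subscription (
        custno   INTEGER NOT NULL REFERENCES customer(custno),  -- pk, fk
        magno    INTEGER NOT NULL REFERENCES magazine(magno),   -- pk, fk
        substart TEXT    NOT NULL,                              -- pk
        PRIMARY KEY (custno, magno, substart)
    );
    INSERT INTO customer VALUES (1);
    INSERT INTO magazine VALUES (7);
    -- The same customer and magazine, paired twice on different dates.
    INSERT INTO subscription VALUES (1, 7, '2022-01-01');
    INSERT INTO subscription VALUES (1, 7, '2024-01-01');
""")

(pairings,) = conn.execute(
    "SELECT COUNT(*) FROM subscription WHERE custno = 1 AND magno = 7").fetchone()

# The same pair on the same date would duplicate the whole key.
try:
    conn.execute("INSERT INTO subscription VALUES (1, 7, '2024-01-01')")
    same_date_rejected = False
except sqlite3.IntegrityError:
    same_date_rejected = True
```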
A Simple Concatenated Key
• The next design is actually somewhat simpler
• It also has two embedded pk/fk’s in the table
in the middle
• The table in the middle isn’t pure key though
• There is also a non-key attribute field for the
table in the middle
56
This is Actually Simpler: One Gift per
Donor per Year
57
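A sketch of the “one gift per donor per year” design, with hypothetical table and column names (donor, gift_year, gift, amount). The two embedded fk’s form the whole primary key, and amount is the non-key attribute. A second gift from the same donor in the same year is rejected, while a gift in a different year is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE donor     (donorno INTEGER PRIMARY KEY);
    CREATE TABLE gift_year (yearno  INTEGER PRIMARY KEY);
    CREATE TABLE gift (
        donorno INTEGER NOT NULL REFERENCES donor(donorno),     -- pk, fk
        yearno  INTEGER NOT NULL REFERENCES gift_year(yearno),  -- pk, fk
        amount  REAL    NOT NULL,                               -- non-key attribute
        PRIMARY KEY (donorno, yearno)
    );
    INSERT INTO donor VALUES (1);
    INSERT INTO gift_year VALUES (2023), (2024);
    INSERT INTO gift VALUES (1, 2023, 100.0);
    INSERT INTO gift VALUES (1, 2024, 150.0);
""")

# A second gift from the same donor in the same year is rejected.
try:
    conn.execute("INSERT INTO gift VALUES (1, 2024, 50.0)")
    second_gift_rejected = False
except sqlite3.IntegrityError:
    second_gift_rejected = True

(total_given,) = conn.execute(
    "SELECT SUM(amount) FROM gift WHERE donorno = 1").fetchone()
```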
The Music CD Library: A Larger
Database Design Example
• Some of the design examples given so far from
chapters 3 and 4 could be parts of a simple
database for a collection of music CD’s
• At the end of chapter 5, with the capability to model many-to-many relationships, the author expands this example
58
• On the next overhead an 8-entity design is shown
• Note that 4 of the 8 entities can be classified as associative entities
• The eight entities are: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Track, and Track
59

• You can say you understand the model if:
• You can define what each entity means
• You can define what each relationship means
60
61
Extending the Model to Match Reality
Better (This Could Make You Dizzy)
• The next overhead shows the music CD design
blossoming further
• The Person-Track table has been removed
• Recording and Person-Recording tables have been
added
• In the book, the new relationships are analyzed
• I will not list the analysis here
• The new design reflects additional assumptions and
capabilities
• The new design should be a better model of reality,
with fewer exceptions and more flexibility
62
63
Chapter 6. One-to-One and Recursive
Relationships
• The meaning of a one-to-one relationship should already be clear
• The book uses the term recursive relationship
for those cases where a table is in a
relationship with itself
64
One-to-One Relationships
• You may recall some of the different options
for capturing one-to-one relationships
• If the relationship is truly one-to-one in all cases at all times, then the two entities can be combined into a single relation
• Otherwise, you end up embedding the pk of
one entity as a fk in another
65
Model 1-1 as pk Embedded as fk and
Monitor Data Integrity
• Maintaining this as a one-to-one relationship
then becomes a question of data integrity
• When choosing which pk to embed as a fk,
you should take into consideration any
possible exceptions or changes in the
relationship in the future
• The book has a number of examples which
illustrate details of this concept
66
A Design Starting Point: A Diagram
Which is Not ER
• The book’s examples start with a company with a two-level management hierarchy
• There are bosses of departments and there is
an overall managing director
• The (non-ER) diagram on the following
overhead illustrates this
67
Departments Have Managers;
Managers Have a Boss
68
• Next the book shows an ER diagram
illustrating that departments have employees
and that departments have bosses
• A garden-variety crow’s foot doesn’t have to be labeled
• A one-to-one relationship should be labeled
69
One-to-One Arcs Have to be Labeled;
Which Way Does the Embedding Go?
70
• The foregoing diagram doesn’t explicitly show
whether the pk of Dept is embedded as a fk in
Emp or vice-versa
• In this case it is likely that the pk of Emp is
embedded as a fk in Dept
• This is because every department will normally have a boss, so an embedded fk in Dept would rarely be null
• However, few employees will be bosses
• There would be lots of nulls if there were a
“department which you’re the boss of” field in
Emp
71
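A sketch of the embedding described above: the pk of Emp is embedded as a fk in Dept, and a UNIQUE constraint is one way to have the database itself enforce the one-to-one rule (no employee is boss of two departments). The column names bossno, empname and deptname and the sample rows are hypothetical. Departments are inserted with a null boss first, because the two tables refer to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dept (
        deptname TEXT NOT NULL PRIMARY KEY,
        -- pk of Emp embedded in Dept; UNIQUE keeps the relationship one-to-one.
        bossno   INTEGER UNIQUE REFERENCES emp(empno)
    );
    CREATE TABLE emp (
        empno    INTEGER PRIMARY KEY,
        empname  TEXT NOT NULL,
        deptname TEXT REFERENCES dept(deptname)
    );
    INSERT INTO dept VALUES ('Marketing', NULL), ('Sales', NULL);
    INSERT INTO emp VALUES (1, 'Bob', 'Marketing'), (2, 'Ann', 'Sales');
    UPDATE dept SET bossno = 1 WHERE deptname = 'Marketing';
""")

# Making Bob the boss of a second department breaks the one-to-one rule.
try:
    conn.execute("UPDATE dept SET bossno = 1 WHERE deptname = 'Sales'")
    second_dept_rejected = False
except sqlite3.IntegrityError:
    second_dept_rejected = True

(marketing_boss,) = conn.execute("""
    SELECT e.empname FROM dept d JOIN emp e ON e.empno = d.bossno
    WHERE d.deptname = 'Marketing'
""").fetchone()
```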
A One-to-Many Recursive Relationship—A
Table in a Relationship with Itself
• Next, the book considers recording which
employee is which other employee’s boss
• This leads to what the book calls a recursive
relationship
• This is when there is a one-to-many relationship
between a table and itself
• Such a one-to-many relationship should be
labeled because the meaning of the embedding
would not necessarily be clear
• An ER diagram illustrating this follows
72
Many Employees Have One Other
Employee Who is Their Boss (The
Relationship is Labeled)
73
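A sketch of the recursive one-to-many relationship, with hypothetical column names: the labeled “boss” relationship becomes a bossno column that refers back to Emp’s own primary key, null only for the managing director. Reading it back needs a self-join, in which the same table appears twice under different aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE emp (
        empno   INTEGER PRIMARY KEY,
        empname TEXT NOT NULL,
        -- The labeled "boss" relationship: Emp's own pk embedded as a fk.
        bossno  INTEGER REFERENCES emp(empno)  -- NULL for the managing director
    );
    INSERT INTO emp VALUES (1, 'Alice', NULL);
    INSERT INTO emp VALUES (2, 'Bob', 1);
    INSERT INTO emp VALUES (3, 'Carol', 2);
    INSERT INTO emp VALUES (4, 'Dan', 2);
""")

# A self-join: the same table appears twice, once as worker, once as boss.
pairs = conn.execute("""
    SELECT wrk.empname, boss.empname
    FROM emp wrk JOIN emp boss ON wrk.bossno = boss.empno
    ORDER BY wrk.empno
""").fetchall()
print(pairs)  # [('Bob', 'Alice'), ('Carol', 'Bob'), ('Dan', 'Bob')]
```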
Question 1: Is an Employee’s Boss the
Boss of the Employee’s Department?
• The previous design may not be ideal
• If every employee is assigned to a department, it
would seem that the employee’s boss would be
the boss of that department
• At first glance, at the very least, this appears to
be redundant
• Redundancy means that information is repeated,
and it opens up the possibility of inconsistencies
between the repeated representations of the
same data
74
Question 2: Are Bosses Members of
the Departments They’re Bosses of?
• However, this is another problem that arises
from real life
• Ask yourself, what departments are the bosses
of departments assigned to?
• For example, if “Bob” is the head of Marketing
and his department is listed as Marketing, is
he his own boss?
• It should be apparent that his boss is the
managing director
75
The Design Contains Apparent
Redundancies, but is Flexible
• Other details that might be considered are split assignments and temporary assignments
• If an employee is split 50-50 between
departments, who is their boss?
• If an employee is only temporarily assigned to
a department, who is their boss?
• The apparently redundant design allows such
cases to be handled with full flexibility
76
A One-to-One Recursive Relationship
that Forms a Linked List
• The next example the book pursues is a little
artificial
• However, something like it might arise in real
life, and this provides an introduction to the
idea
• It is possible for there to be a one-to-one
relationship between a table and itself
77
The Monarchs of England
• The following overhead illustrates the idea
with the succession of monarchs
• The idea is that the pk of the monarch table is embedded as a fk in the same table
• Every monarch except the first has the
previous monarch recorded
• The problem could also be solved by simply
recording a numbering for the monarchs
78
Just Assigning a Reign Number Might
Be Clearer and Easier
79
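A sketch comparing the two approaches, with hypothetical column names (monname, premonname, reignno) and a short stretch of the succession as sample data. The UNIQUE predecessor column makes the self-relationship one-to-one, so the rows form a linked list that has to be walked link by link; the reign number gives the same order with a single ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE monarch (
        monname    TEXT NOT NULL PRIMARY KEY,
        -- Predecessor, embedded from the same table. UNIQUE makes the
        -- relationship one-to-one; NULL marks the first monarch recorded.
        premonname TEXT UNIQUE REFERENCES monarch(monname),
        reignno    INTEGER NOT NULL UNIQUE   -- the simpler alternative
    );
    INSERT INTO monarch VALUES ('Victoria', NULL, 1);
    INSERT INTO monarch VALUES ('Edward VII', 'Victoria', 2);
    INSERT INTO monarch VALUES ('George V', 'Edward VII', 3);
""")

def succession(conn):
    """Follow the predecessor links from the first monarch to the last."""
    order = []
    row = conn.execute(
        "SELECT monname FROM monarch WHERE premonname IS NULL").fetchone()
    while row is not None:
        order.append(row[0])
        row = conn.execute(
            "SELECT monname FROM monarch WHERE premonname = ?",
            (row[0],)).fetchone()
    return order

by_links = succession(conn)
by_number = [r[0] for r in conn.execute(
    "SELECT monname FROM monarch ORDER BY reignno")]
print(by_links == by_number)  # True
```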
A Many-to-Many Recursive
Relationship
• The next example considers a table in a many-to-many relationship with itself
• This is another example drawn from real life
which is very instructive about how relational
databases work
• It is helpful because it brings out one of the
limitations of relational databases
• It provides insight into the subject of object-oriented databases
80
• Whenever a table is in a relationship with itself,
the book refers to this as a recursive relationship
• As far as I’m concerned, the use of the term
recursive is optional, although descriptive
• I am just as happy in this context with saying “in a
relationship with itself”
• In any case, consider the ER diagram on the next
overhead and the explanatory remarks that
follow
81
This is Something Practical, Not
Artificial
82
• The idea is that the Product table contains entries for stand-alone products (possible sub-products) and for products (super-products) that consist of collections of other products
• Potentially the Product table might also
contain things (sub-products) which
themselves aren’t even individual products,
but which only exist as components of
finished products
83
• The Assembly table shows the relationship between products and sub-products (whether those sub-products have an independent existence or not)
• Notice that both of the crow’s feet in the diagram have + signs on them
• This means that the pk of an assembly is the concatenation of the embedded fk’s of a (super) product and a (sub) product
84
• In addition, the Assembly table has a quantity
field, telling how many of the sub-product
there are in the super-product
• If you assume that this is just a two-level hierarchy with super-products and sub-products, things seem relatively clear
• However, both from a database point of view
and a real life point of view, there is no need
for this restriction to apply
85
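A sketch of the Product/Assembly design, with hypothetical column names and sample data (a car with 4 doors, a door with 2 panels, a panel with 6 screws). Both embedded fk’s form the pk of Assembly, quantity is the non-key field, and a CHECK constraint rejects a product that directly contains itself. Note that this CHECK does not catch longer containment cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        prodid   INTEGER PRIMARY KEY,
        proddesc TEXT NOT NULL
    );
    CREATE TABLE assembly (
        superprodid INTEGER NOT NULL REFERENCES product(prodid),  -- pk, fk
        subprodid   INTEGER NOT NULL REFERENCES product(prodid),  -- pk, fk
        quantity    INTEGER NOT NULL CHECK (quantity > 0),
        PRIMARY KEY (superprodid, subprodid),
        -- Blocks only direct self-containment; longer cycles need other checks.
        CHECK (superprodid <> subprodid)
    );
    INSERT INTO product VALUES (1, 'car'), (2, 'door'), (3, 'panel'), (4, 'screw');
    INSERT INTO assembly VALUES (1, 2, 4), (2, 3, 2), (3, 4, 6);
""")

# The immediate sub-products of the car, with how many of each it contains.
direct = conn.execute("""
    SELECT p.proddesc, a.quantity
    FROM assembly a JOIN product p ON p.prodid = a.subprodid
    WHERE a.superprodid = 1
""").fetchall()

try:
    conn.execute("INSERT INTO assembly VALUES (2, 2, 1)")  # a door inside itself
    self_containment_rejected = False
except sqlite3.IntegrityError:
    self_containment_rejected = True

print(direct)  # [('door', 4)]
```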
• There is no reason why a given product might
not consist of several other (sub) products
• Each of these (sub) products, in turn, might be a super-product consisting of other sub-products, and so on
• Now the descriptiveness of the term recursion
becomes apparent
86
• There is no theoretical limit on how deeply
things might be related in this kind of “has-a”
relationship
• Practically speaking, the only limit is how
many rows there are in the Product table
• This last claim leads to one more observation
87
• Data integrity would require that no product
be a super-product or sub-product of itself
• Otherwise you would have a containment
cycle
• It seems apparent that in real life this
shouldn’t occur
88
This is the Relational DB Way of Capturing
a Tree-Like Set of Relationships
• The product-assembly relationship crops up
reasonably frequently in real life
• If you think about it, what’s really being
captured is a tree-like containment structure
• Manufacturing is a problem domain where
this is relevant
89
• Working from the top down, a car has various
components, including doors
• Doors may be made of a variety of panels,
among other things
• The panels may consist of various items,
including screws
• And so on, down the line
90
• The given relational design works, to a certain
extent, but it has shortcomings
• For example, it is not necessarily an easy way
to understand or a natural way to envision
tree-like relationships
• In particular, consider what you know about SQL and what kind of query you might like to execute against products and assemblies
91
An SQL Query Can Go One Level Down the
Tree; It Can’t Go Arbitrary Levels Down
• SQL is non-procedural
• For a given product you could ask for all of its
immediate sub-products or sub-assemblies
• However, in classic SQL (SQL-92, and Microsoft Access SQL) it is not possible to form a single query that retrieves all of the constituent parts of a given product, at every level
• Such SQL won’t allow you to travel arbitrarily far “down the tree”
92
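A hedged aside: the one-level limit described on this overhead applies to SQL-92-era dialects such as Microsoft Access SQL. The SQL:1999 standard added recursive common table expressions, and SQLite (used here through Python’s sqlite3 module) supports WITH RECURSIVE. The sketch reuses hypothetical car/door/panel/screw data: an ordinary join finds only the immediate sub-products, while the recursive query walks every level and multiplies quantities along the way. It assumes there are no containment cycles; with a cycle, this query would never finish.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (prodid INTEGER PRIMARY KEY, proddesc TEXT NOT NULL);
    CREATE TABLE assembly (
        superprodid INTEGER NOT NULL REFERENCES product(prodid),
        subprodid   INTEGER NOT NULL REFERENCES product(prodid),
        quantity    INTEGER NOT NULL,
        PRIMARY KEY (superprodid, subprodid)
    );
    INSERT INTO product VALUES (1, 'car'), (2, 'door'), (3, 'panel'), (4, 'screw');
    INSERT INTO assembly VALUES (1, 2, 4), (2, 3, 2), (3, 4, 6);
""")

# One level down: an ordinary join finds only the immediate sub-products.
immediate = [r[0] for r in conn.execute("""
    SELECT p.proddesc FROM assembly a JOIN product p ON p.prodid = a.subprodid
    WHERE a.superprodid = 1
""")]

# Every level down: a recursive CTE, multiplying quantities along each path.
all_parts = conn.execute("""
    WITH RECURSIVE parts(prodid, qty) AS (
        SELECT subprodid, quantity FROM assembly WHERE superprodid = 1
        UNION ALL
        SELECT a.subprodid, p.qty * a.quantity
        FROM assembly a JOIN parts p ON a.superprodid = p.prodid
    )
    SELECT pr.proddesc, SUM(parts.qty)
    FROM parts JOIN product pr ON pr.prodid = parts.prodid
    GROUP BY pr.proddesc
    ORDER BY pr.proddesc
""").fetchall()

print(immediate)  # ['door']
print(all_parts)  # [('door', 4), ('panel', 8), ('screw', 48)]
```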
Object-Oriented Databases Are
Inherently Tree-Like
• It is these problems that led, at least in part,
to the development of what are known as
object-oriented databases
• In essence, O-O databases are constructed
around tree-like containment
93
Object-Oriented Databases Are
Valuable, but They are a Niche Only
• Although extremely useful in some problem
domains, it is estimated that O-O db’s have
about 5% of the commercial market
• The remaining 95% is relational because
relational db’s are applicable and convenient
in so many other problem domains
94
The CD Music Library Again
• The chapter concludes with the latest version
of the CD music library
• It illustrates several points
• Although the ER diagram is useful for getting
the big picture, it’s becoming clear that
without written text explaining the problem
and the assumptions made, you haven’t
completely and clearly documented what’s
going on
95
• This example illustrates another point, which
is also relevant to the final project
• You might have thought that a CD music
library was a pretty simple, toy application
• Notice that it has grown to 13 tables, about twice as many as you’re required to have for your project
96
• It is likely that before you’re finished with your
project, you will be simplifying the problem
you tackled so that you meet the minimum
requirements without inviting too much
trouble for yourself
97
• The previous version of the design had these tables: CD, Composition, Label, Person, Person-CD, Person-Composition, Person-Recording, Recording, Track
• This latest version has these tables added to
it: Group, Group-CD, Group-Recording,
Person-Group
• The ER diagram is shown on the following
overhead
98
99
The End
100