Chapter 8: Top-down Relational Database Design: NORMALIZATION Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use What can happen when we combine relations/tables? Suppose we combine the tables instructor and department This creates redundancy (repetition of information): Database System Concepts - 6th Edition 8.2 ©Silberschatz, Korth and Sudarshan What can happen when we combine relations/tables? Another problem: UPDATE When any of the redundant info is changed, the changes hav eto be applied to multiple tuples! Database System Concepts - 6th Edition 8.3 ©Silberschatz, Korth and Sudarshan What can happen when we combine relations/tables? Another problem: INSERTION When a new department is created, there are no instructors associated with it yet → need to use NULL values! Database System Concepts - 6th Edition 8.4 ©Silberschatz, Korth and Sudarshan Is there any reason to combine relations/tables? Any query that involves a natural join between department and instructor will execute faster on the combined table! This is generally preferred in data mining. Database System Concepts - 6th Edition 8.5 ©Silberschatz, Korth and Sudarshan The “top-down” approach In this chapter, we look at the problem in the opposite direction: Starting with a large table that contains many columns and much redundant information, how can we split (decompose) it into tables with fewer columns and less redundancy? Database System Concepts - 6th Edition 8.6 ©Silberschatz, Korth and Sudarshan Top-down: Decomposition Suppose we start with the table inst_dept. How would we get the idea to decompose it into instructor and department? Naïve approach: spot redundancies in data … but it doesn’t work for two reasons! Database System Concepts - 6th Edition 8.7 ©Silberschatz, Korth and Sudarshan Spotting redundancies in data Problem 1: It’s costly Real-life DBs can have large amount of data (hundreds of columns, hundreds of millions of rows) Spotting redundancies requires consideration of combinations of elements from a set that is already large → → Combinatorial explosion (N-squared and worse) Database System Concepts - 6th Edition 8.8 ©Silberschatz, Korth and Sudarshan Spotting redundancies in data Problem 2: From data alone, it’s impossible to decide whether a pattern discovered is coincidence or not Is it the case that departments always reside in one building and have a unique budget? Database System Concepts - 6th Edition 8.9 ©Silberschatz, Korth and Sudarshan Solution: Examine not the data itself (a.k.a. syntax), but the meaning of the data, a.k.a. the semantics! The designer must be allowed to specify rules of the enterprise, a.k.a. functional dependencies, e.g. dept_name → building, budget Database System Concepts - 6th Edition 8.10 ©Silberschatz, Korth and Sudarshan dept_name building, budget What does it mean? “If several rows have the same value for dept_name, then they also have the same values for building and budget.” or “If there were a schema (dept_name, building, budget), then dept_name would be a candidate key.” Database System Concepts - 6th Edition 8.11 ©Silberschatz, Korth and Sudarshan dept_name building, budget “If there were a schema (dept_name, building, budget), then dept_name would be a candidate key.” Since in our table inst_dept dept_name is not a candidate key, the building and budget of a department may have to be repeated along with dept_name. This indicates the need to decompose inst_dept. Database System Concepts - 6th Edition 8.12 ©Silberschatz, Korth and Sudarshan dept_name building, budget “If there were a schema (dept_name, building, budget), then dept_name would be a candidate key.” This example also shows how functional dependencies (FD) are different from keys: a FD captures a rule that is in general more granular than a key. A key is a FD, but a FD is not always a key! Database System Concepts - 6th Edition 8.13 ©Silberschatz, Korth and Sudarshan Not all decompositions are good! Suppose we decompose employee(ID, name, street, city, salary) into employee1 (ID, name) employee2 (name, street, city, salary) Problem: we cannot reconstruct the original employee relation! Database System Concepts - 6th Edition 8.14 ©Silberschatz, Korth and Sudarshan A lossy decomposition Database System Concepts - 6th Edition 8.15 ©Silberschatz, Korth and Sudarshan But there are also lossless decompositions! Technically it’s called a lossless-join decomposition Decomposition of R = (A, B, C) R1 = (A, B) R2 = (B, C) A B C A B B C 1 2 1 2 A B A B A,B(r) r A (r) B (r) Database System Concepts - 6th Edition 1 2 A B C 1 2 B,C(r) A B 8.16 ©Silberschatz, Korth and Sudarshan How to avoid lossy decompositions? Database System Concepts - 6th Edition 8.17 ©Silberschatz, Korth and Sudarshan Goal — Devise a theory for the following Decide whether a particular relation R is in “good” form. When the relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that each relation is in good form the decomposition is a lossless-join decomposition Our theory is based on dependencies: functional dependencies multivalued dependencies The process outlined above is called NORMALIZATION Database System Concepts - 6th Edition 8.18 ©Silberschatz, Korth and Sudarshan 8.2 First Normal Form (1NF) A domain is atomic if its elements are treated by the DBMS as indivisible units. Examples of non-atomic domains: Names IDs that can be broken up into parts (e.g. CS401) Phone Any with first +middle + last numbers composite attributes! A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic. For now, we assume all relations to be in 1NF (but see Ch.22: Object-Based Databases) Database System Concepts - 6th Edition 8.19 ©Silberschatz, Korth and Sudarshan 1NF Atomicity is actually a property of how the elements of the domain are used! Example: Students are given roll numbers which are strings of the form CS0012 or EE1127 Strings would normally be considered indivisible … … but if the first two characters are extracted to find the dept., the domain of roll numbers is not atomic. Doing so is a bad idea: leads to encoding of information in the app. program rather than in the DB. Why is this bad? What should the DB designer do in this case? Database System Concepts - 6th Edition 8.20 ©Silberschatz, Korth and Sudarshan 8.3 Functional Dependencies (FD) FDs are constraints on the set of legal relations. Require that the value for a certain set of attributes determine uniquely the value for another set of attributes. A FD is a generalization of the concept of key: A key requires that the value for a certain set of attributes determine uniquely the value for all remaining attributes. Database System Concepts - 6th Edition 8.21 ©Silberschatz, Korth and Sudarshan Functional Dependencies Let R be a relation schema, and a, b two sets of attributes R and R The functional dependency holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes . That is, t1[] = t2 [] t1[ ] = t2 [ ] Example: Consider r(A,B ) with the following instance of r. 1 1 3 4 5 7 On this instance, A B does NOT hold, but B A does hold. Database System Concepts - 6th Edition 8.22 ©Silberschatz, Korth and Sudarshan QUIZ: Functional Dependencies Decide if the following FDs hold or not: AB BA {A, C} D {A, B, C} D Database System Concepts - 6th Edition 8.23 ©Silberschatz, Korth and Sudarshan FD vs. key K is a superkey for relation schema R if and only if K R K is a candidate key for R if and only if K R, and for no K, R FDs allow us to express constraints that cannot be expressed using (super)keys. Consider the schema: inst_dept (ID, name, salary, dept_name, building, budget ) We expect these FDs to hold: dept_name building ID building but would not expect the following FD to hold: dept_name salary Database System Concepts - 6th Edition 8.24 ©Silberschatz, Korth and Sudarshan QUIZ: FD vs. key Decide if the following are candidate keys of not: A B {A, C} {A, B, C} {A, B, C, D} D Database System Concepts - 6th Edition 8.25 ©Silberschatz, Korth and Sudarshan Uses for FDs Test relations to see if they are legal under a given set of FDs. If a relation r is legal under a set F of FDs, we say that r satisfies F. Specify constraints on the set of legal relations We say that F holds on R if all legal relations on R satisfy the set of FDs F. Note: A specific instance of a relation schema may satisfy a FD even if the FD does not hold on all legal instances. Example: a specific instance of instructor may, by chance, satisfy name ID. Database System Concepts - 6th Edition 8.26 ©Silberschatz, Korth and Sudarshan Trivial FD A functional dependency is trivial if it is satisfied by all instances of a relation Example: ID, name ID name name In general, Database System Concepts - 6th Edition is trivial if 8.27 ©Silberschatz, Korth and Sudarshan QUIZ: Trivial FDs Give 4 examples of trivial FDs in this relation. Database System Concepts - 6th Edition 8.28 ©Silberschatz, Korth and Sudarshan The Holy Grail: Closure of a set of FDs Given a set F of FDs, there are certain other FDs that are logically implied by F. For example: If A B and B C, then we can infer that A C The set of all FDs logically implied by F is the closure of F. We denote the closure of F by F+. F+ is a superset of F. Database System Concepts - 6th Edition 8.29 ©Silberschatz, Korth and Sudarshan 8.3.2 Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of FDs if for all FDs in F+ of the form (where R and R), at least one of the following is true: is trivial (i.e., ) is a superkey for R Database System Concepts - 6th Edition 8.30 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF at least one of the following holds: is trivial (i.e., ) is a superkey for R Is this schema in BCNF? instr_dept (ID, name, salary, dept_name, building, budget ) Database System Concepts - 6th Edition 8.31 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF at least one of the following holds: is trivial (i.e., ) is a superkey for R Is this schema in BCNF? instr_dept (ID, name, salary, dept_name, building, budget ) No, because dept_name building, budget holds, but dept_name is not a superkey (Why?) Database System Concepts - 6th Edition 8.32 ©Silberschatz, Korth and Sudarshan Extra-credit QUIZ: BCNF EOL1/3 Database System Concepts - 6th Edition 8.33 ©Silberschatz, Korth and Sudarshan Quiz: What is the difference between keys and functional dependencies (FD)? Database System Concepts - 6th Edition 8.34 ©Silberschatz, Korth and Sudarshan Quiz: What is the difference between keys and functional dependencies (FD)? A key is a FD, but a FD is not always a key! Database System Concepts - 6th Edition 8.35 ©Silberschatz, Korth and Sudarshan BCNF Decomposition Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF. We decompose R into: • (U ) •( R - ( - ) ) Database System Concepts - 6th Edition 8.36 ©Silberschatz, Korth and Sudarshan Decomposing a Schema into BCNF We decompose R into: • (U ) •( R - ( - ) ) In our example: = dept_name = building, budget and inst_dept is replaced by (U ) = ( dept_name, building, budget ) ( R - ( - ) ) = ( ID, name, salary, dept_name ) Database System Concepts - 6th Edition 8.37 ©Silberschatz, Korth and Sudarshan QUIZ 1: BCNF We decompose R into: • (U ) •( R - ( - ) ) Take = {A, B, C, D} = {C, D, E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H} What is the decomposition? Database System Concepts - 6th Edition 8.38 ©Silberschatz, Korth and Sudarshan QUIZ 2: BCNF We decompose R into: • (U ) •( R - ( - ) ) Take = {A, B} = {E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H} What is the decomposition? Database System Concepts - 6th Edition 8.39 ©Silberschatz, Korth and Sudarshan QUIZ 3: BCNF Is this relation in BCNF? Hint: Rename the attributes A, B, C, …. Database System Concepts - 6th Edition 8.40 ©Silberschatz, Korth and Sudarshan QUIZ 3: BCNF A: Not BCNF, b/c both FDs are violations! Decompose it to BCNF! Database System Concepts - 6th Edition 8.41 ©Silberschatz, Korth and Sudarshan QUIZ 3: BCNF Solution: Database System Concepts - 6th Edition 8.42 ©Silberschatz, Korth and Sudarshan Dependency Preservation Constraints, including FDs, are costly to check in practice unless they pertain to only one relation. If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving. Database System Concepts - 6th Edition 8.43 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation ER model of a bank: A customer can have more than 1 personal banker, but at most one at any given branch. A ternary relationshipset is needed: Database System Concepts - 6th Edition 8.44 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation Implementation: R = cust_banker_branch = (customer_id, employee_id, branch_name, type) FDs: FD1: employee_id branch_name FD2: (customer_id, branch_name) (employee_id, type) Is cust_banker_branch in BCNF? Database System Concepts - 6th Edition 8.45 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation Implementation: R = cust_banker_branch = (customer_id, employee_id, branch_name, type) FDs: FD1: employee_id branch_name FD2: (customer_id, branch_name) (employee_id, type) Apply the decomposition algorithm! Database System Concepts - 6th Edition 8.46 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation Implementation: R = cust_banker_branch = (customer_id, employee_id, branch_name, type) FDs: FD1: employee_id branch_name FD2: (customer_id, branch_name) (employee_id, type) Decomposition: R1 = (employee_id, branch_name) R2 = (customer_id, employee_id, type) Problem: FD2 is now “spread” across two relations! Database System Concepts - 6th Edition 8.47 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation Conclusion: BCNF is not dependency preserving (in general) Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form … Database System Concepts - 6th Edition 8.48 ©Silberschatz, Korth and Sudarshan Third Normal Form = 3NF A relation schema R is in third normal form (3NF) if for all: in F+ at least one of the following holds: is trivial (i.e., ) is a superkey for R Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Third condition is a minimal relaxation of BCNF to ensure dependency preservation. Database System Concepts - 6th Edition 8.49 ©Silberschatz, Korth and Sudarshan SKIP all other 3NF theory! The only facts about 3NF we cover are those on the previous slide! Database System Concepts - 6th Edition 8.50 ©Silberschatz, Korth and Sudarshan Whatever happened with 2NF? In a nutshell, it forbids attributes to depend on parts of keys. It is not of practical use anymore. See Second normal form - Wikipedia, the free encyclopedia for more details. Database System Concepts - 6th Edition 8.51 ©Silberschatz, Korth and Sudarshan Review of Normal Forms Database System Concepts - 6th Edition 8.52 ©Silberschatz, Korth and Sudarshan Updated list of Normalization Goals Let R be a relation scheme with a set F of FDs: Decide whether R is in “good” form. If R is not in “good” form, decompose it into a set of relation schemes {R1, R2, ..., Rn} such that : each relation scheme is in good form the decomposition is a lossless-join decomposition Preferably, the decomposition should be dependency preserving. Database System Concepts - 6th Edition 8.53 ©Silberschatz, Korth and Sudarshan 8.4 Functional-Dependency Theory This is the formal theory that tells us which functional dependencies are implied logically by a given set of functional dependencies. Database System Concepts - 6th Edition 8.54 ©Silberschatz, Korth and Sudarshan Remember: Closure of a Set of FDs Given a set F set of FDs, there are certain other FDs that are logically implied by F. E.g. transitivity: If A B and B C, then also A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by Database System Concepts - 6th Edition 8.55 F+. ©Silberschatz, Korth and Sudarshan Armstrong’s Axioms We can find F+, the closure of F, by repeatedly applying Armstrong’s Axioms: if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) These rules are sound (They generate only FDs that actually hold) complete (They generate all FDs that hold). Database System Concepts - 6th Edition 8.56 ©Silberschatz, Korth and Sudarshan QUIZ: Armstrong’s Axioms Write Armstrong’s Axioms: (reflexivity) (augmentation) (transitivity) Database System Concepts - 6th Edition 8.57 ©Silberschatz, Korth and Sudarshan QUIZ: Armstrong’s Axioms Write Armstrong’s Axioms: if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) Database System Concepts - 6th Edition 8.58 ©Silberschatz, Korth and Sudarshan Examples of use of A’s Axioms Given the following relation: and the set of FDs R = (A, B, C, G, H, I) F= { AB AC CG H CG I B H} Some other members of the closure F+ are: AH AG I by transitivity from A B and B H by augmenting A C with G, to get AG CG and then transitivity with CG I CG HI by augmenting CG I to infer CG CGI, and augmenting of CG H to infer CGI HI, and then transitivity Database System Concepts - 6th Edition 8.59 ©Silberschatz, Korth and Sudarshan Your turn! if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) Prove that if and only if Double implication: L-to-R and R-to-L! EOL 2/3 Database System Concepts - 6th Edition 8.60 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Consider the following relation: What non-trivial FDs exist? Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.61 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Let’s simplify the notation: AB → C and C → B. F1: Person, Shop Type → Nearest Shop F2: Nearest Shop → Shop Type Is the relation in BCNF? Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.62 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Let’s simplify the notation: AB → C and C → B. No, b/c C → B is a violation: C is not superkey. Is the relation in 3NF? Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.63 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Let’s simplify the notation: AB → C and C → B. No, b/c C → B is a violation: C is not superkey. Remember: 3NF has the following condition in addition to BCNF: Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.64 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Let’s simplify the notation: AB → C and C → B. B is part of the candidate key AB. This shows that C → B is not a 3NF violation, so the relation is in 3NF! Do the BCNF decomposition Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.65 ©Silberschatz, Korth and Sudarshan QUIZ: BCNF and 3NF Let’s simplify the notation: AB → C and C → B. R1 = {B, C} R2 = {A, C} Is the decomposition above dependency-preserving? Source: Database http://en.wikipedia.org/wiki/Boyce-Codd_normal_form System Concepts - 6th Edition 8.66 ©Silberschatz, Korth and Sudarshan More FD theorems, a.k.a. rules or results Exercise 8.4 Exercise 8.26 Exercise 8.5 Database System Concepts - 6th Edition 8.67 ©Silberschatz, Korth and Sudarshan Naïve Algorithm for Computing F+ Apply as many axioms and theorems to derive new FDs! Can you find 3 more FDs in this manner? Do you see a problem with this approach? Database System Concepts - 6th Edition 8.68 ©Silberschatz, Korth and Sudarshan Extra-credit QUIZ Database System Concepts - 6th Edition 8.69 ©Silberschatz, Korth and Sudarshan Algorithm for Computing F+ To compute the closure F+ of a set of FDs F: Assign F+ = F repeat for each FD f in F+ apply reflexivity and augmentation rules on f add the resulting FDs to F + for each pair of FDs f1and f2 in F + if f1 and f2 can be combined using transitivity, add the resulting FD to F + until F + does not change any further Database System Concepts - 6th Edition 8.70 ©Silberschatz, Korth and Sudarshan QUIZ: Algorithm for Computing F+ To compute the closure F+ of a set of FDs F: Assign F+ = F repeat for each FD f in F+ apply reflexivity and augmentation rules on f add the resulting FDs to F + for each pair of FDs f1and f2 in F + if f1 and f2 can be combined using transitivity, add the resulting FD to F + until F + does not change any further Apply the algorithm to R = {A, B, C}, with AB → C Database System Concepts - 6th Edition 8.71 ©Silberschatz, Korth and Sudarshan Closure of Attribute Sets Since computing the entire closure F+ is in general a formidable task, we set ourselves first a more modest goal: Given a set of attributes , define the closure of under F (denoted by +) as the set of attributes that are functionally determined by under F Algorithm to compute +, the closure of under F result := ; while (changes to result) do for each in F do begin if result then result := result end Database System Concepts - 6th Edition 8.72 ©Silberschatz, Korth and Sudarshan Example of Attribute Set Closure R = (A, B, C, G, H, I) F = {A B AC CG H CG I B H} (AG)+ 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH) Stop (Why?) Is AG a candidate key? Is AG a super key? 1. Does AG R? Yes, b/c (AG)+ = R. 2. Is any subset of AG a superkey? 1. 1. 2. Does A R? No, b/c (A)+ ≠ R Does G R? No, b/c (G)+ ≠ R Database System Concepts - 6th Edition 8.73 ©Silberschatz, Korth and Sudarshan Uses of Attribute Closure Algorithm Testing for superkey: To test if is a superkey, we compute +, and check if + contains all attributes of R Testing if certain FDs hold: To check if holds (is in F+), just check if + Another algorithm for computing closure F+ of F: For each R, find the closure + for each S +, we output the FD S Still very expensive, but at least we have a more systematic way of doing it! Database System Concepts - 6th Edition 8.74 ©Silberschatz, Korth and Sudarshan Uses of Attribute Closure Algorithm Testing for superkey: To test if is a superkey, we compute +, and check if + contains all attributes of R Testing if a certain FDs holds: To check if holds (is in F+), just check if + Another algorithm for computing closure F+ of F: For each R, find the closure + for each S +, we output the FD S Database System Concepts - 6th Edition 8.75 ©Silberschatz, Korth and Sudarshan QUIZ: Uses of Attribute Closure Alg. Testing for superkey: To test if is a superkey, we compute +, and check if + contains all attributes of R R = (A, B, C, D) F = {A BC, B C, A B, AB C, BC → D} Is AD a superkey? Is AD a candidate key? Database System Concepts - 6th Edition 8.76 ©Silberschatz, Korth and Sudarshan QUIZ: Uses of Attribute Closure Alg. Testing if a certain FDs holds: To check if holds (is in F+), just check if + R = (A, B, C, D) F = {A BC, B C, A B, AB C, BC → D} Does AC → D hold ? Database System Concepts - 6th Edition 8.77 ©Silberschatz, Korth and Sudarshan Why is closure important for BCNF? To check if a non-trivial dependency causes a violation of BCNF 1. compute + (the attribute closure of ), and 2. verify that it includes all attributes of R, that is, it is a superkey of R. Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. However, simplified test using only F is incorrect when testing a relation in a decomposition of R Consider R = (A, B, C, D, E), with F = { A B, BC D} Decompose R into R1 = (A, B) and R2 = (A, C, D, E) Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF. In fact, dependency AC D in F+ shows R2 is not in BCNF. Database System Concepts - 6th Edition 8.78 ©Silberschatz, Korth and Sudarshan SKIP: Canonical Cover and everything until … Database System Concepts - 6th Edition 8.79 ©Silberschatz, Korth and Sudarshan 8.8 Overall DB Design Process We have assumed that the schema R is given, but how does R appear in practice? R can be generated when converting E-R diagram to a set of tables. R can be a single relation containing all attributes that are of interest (called universal relation). Normalization then breaks R into smaller relations. R can be the result of some ad hoc design of relations, which we then test/convert to normal form. Database System Concepts - 6th Edition 8.80 ©Silberschatz, Korth and Sudarshan ER Model and Normalization When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, in a real (imperfect) design, there can be: FDs from non-key attributes of an entity set to other attributes of the same entity set, e.g.: employee entity with attributes including department_name and building, and the FD department_name building Good design would have made department a separate entity FDs from non-key attributes of a relationship set to other … It’s possible, but rare, since most relationships are binary. Database System Concepts - 6th Edition 8.81 ©Silberschatz, Korth and Sudarshan Denormalization for Performance May want to use non-normalized schema for performance For example, displaying prereqs along with course_id, and title requires join of course with prereq Alternative 1: Use denormalized relation containing attributes of course as well as prereq with all above attributes faster lookup extra space and extra execution time for updates extra coding work for programmer and possibility of error in extra code Alternative 2: use a materialized view defined as course prereq Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors Database System Concepts - 6th Edition 8.82 ©Silberschatz, Korth and Sudarshan Other Design Issues Some aspects of DB design are not caught by normalization. Examples of bad DB design, to be avoided: Instead of earnings (company_id, year, amount ), use Separate tables: earnings_2004, earnings_2005, earnings_2006, etc. All these tables are in BCNF, but: querying across years is difficult a new table needs to be created each year One table, but with a separate column for each year: company_year (company_id, earnings_2004, earnings_2005, earnings_2006) It’s also in BCNF, but also makes querying across years difficult and requires new attribute each year. Is an example of a crosstab, where values for one attribute become column names Used in spreadsheets, and other data analysis tools Database System Concepts - 6th Edition 8.83 ©Silberschatz, Korth and Sudarshan SKIP 8.9 Modeling Temporal Data Database System Concepts - 6th Edition 8.84 ©Silberschatz, Korth and Sudarshan Homework for Ch.8 8.4, 8.5 8.6 (Derive only 6 new FDs, using the closure algorithm in Fig. 8.7) 8.26 8.29 except (d) – 3NF decomposition Database System Concepts - 6th Edition 8.85 ©Silberschatz, Korth and Sudarshan The next slides are a collection of the algorithms we need to know from this chapter Database System Concepts - 6th Edition 8.86 ©Silberschatz, Korth and Sudarshan Algorithm for F+ Assign F+ = F repeat for each FD f in F+ apply reflexivity and augmentation rules on f add the resulting FDs to F + for each pair of FDs f1and f2 in F + if f1 and f2 can be combined using transitivity, add the resulting FD to F + until F + does not change any further Database System Concepts - 6th Edition 8.87 ©Silberschatz, Korth and Sudarshan Algorithm for + result := ; while (changes to result) do for each in F do begin if result then result := result end Database System Concepts - 6th Edition 8.88 ©Silberschatz, Korth and Sudarshan Decomposing a Schema into BCNF Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF. We decompose R into: (U ) (R-(-)) • • Database System Concepts - 6th Edition 8.89 ©Silberschatz, Korth and Sudarshan
© Copyright 2024