Lossless join decomposition

In database design, a lossless join decomposition is a decomposition of a relation R {\displaystyle R} into relations R 1 , R 2 {\displaystyle R_{1},R_{2}} such that a natural join of the two smaller relations yields back the original relation. This is central in removing redundancy safely from databases while preserving the original data.[1] Lossless join can also be called nonadditive.[2]

Criteria

Let R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} be a decomposition of a relation R {\displaystyle R} .

The decomposition is lossless if and only if the natural join of R 1 {\displaystyle R_{1}} and R 2 {\displaystyle R_{2}} results in the original relation R {\displaystyle R} (i.e., R 1 R 2 = R {\displaystyle R_{1}\bowtie R_{2}=R} ).[3][unreliable source?]

Equivalently, the decomposition is lossless if and only if one of the sub-relations (i.e. R 1 {\displaystyle R_{1}} or R 2 {\displaystyle R_{2}} ) is a subset of the closure of their intersection.[4] In other words, the decomposition of R {\displaystyle R} is lossless if either R 1 [ R 1 R 2 ] + {\displaystyle R_{1}\subseteq [R_{1}\cap R_{2}]^{+}} or R 2 [ R 1 R 2 ] + {\displaystyle R_{2}\subseteq [R_{1}\cap R_{2}]^{+}} is true.

Criteria for multiple sub-relations

Multiple sub-relations R 1 , R 2 , . . . , R n {\displaystyle R_{1},R_{2},...,R_{n}} have a lossless join if there is some way in which we can repeatedly perform lossless joins until all the relations have been joined into a single relation. Once we have a new sub-relation made from a lossless join, we are not allowed to use any of its isolated sub-relations to join with any of the other relations. For example, if we can do a lossless join on a pair of relations R i , R j {\displaystyle R_{i},R_{j}} to form a new relation R i , j {\displaystyle R_{i,j}} , we use this new relation (rather than R i {\displaystyle R_{i}} or R j {\displaystyle R_{j}} ) to form a lossless join with another relation R k {\displaystyle R_{k}} (which may already be joined (e.g., R k , l {\displaystyle R_{k,l}} )).

Examples

  • Let R = ( A , B , C , D ) {\displaystyle R=(A,B,C,D)} be the relation schema, with attributes A, B, C and D.
  • Let F = { A B C } {\displaystyle F=\{A\rightarrow BC\}} be the set of functional dependencies.
  • Decomposition into R 1 = ( A , B , C ) {\displaystyle R_{1}=(A,B,C)} and R 2 = ( A , D ) {\displaystyle R_{2}=(A,D)} is lossless under F because R 1 R 2 = A ) {\displaystyle R_{1}\cap R_{2}=A)} . A is a superkey in R 1 {\displaystyle R_{1}} , meaning we have a functional dependency { A B C } {\displaystyle \{A\rightarrow BC\}} .  In other words, now we have proven that ( R 1 R 2 R 1 ) F + {\displaystyle (R_{1}\cap R_{2}\rightarrow R_{1})\in F^{+}} .

[5][6]

References

  1. ^ Pohler, K (2015). "Lossless-Join Decomposition: applications in quantitative computing metrics". International Journal of Applied Computer Science. 21 (4): 190–212.
  2. ^ Elmasri, Ramez (2016). Fundamentals of database systems (Seventh ed.). Hoboken, NJ: Pearson. p. 461. ISBN 978-0133970777.
  3. ^ "Lossless Join Property". Stackoverflow.com. Retrieved 2016-02-07.
  4. ^ "Lossless Join Decomposition" (PDF). University at Buffalo. Jan Chomicki. Retrieved 2012-02-08.
  5. ^ "Lossless-Join Decomposition". Cs.sfu.ca. Retrieved 2016-02-07.
  6. ^ "www.data-e-education.com - Lossless Join Decomposition". Archived from the original on 2014-02-21. Retrieved 2014-02-12.