Relationships Inner Joins SWEN344 Web Engineering A Defining
Relationships & Inner Joins SWEN-344 Web Engineering
A Defining Feature ● Tracking & querying the relationships between data is what relational database systems were designed for ● Steps ○ ○ ○ (1) What relationships do we have? ■ Many-one, or “belongs to” ■ Many-many, ■ One-one (rare) ■. . . there rarely are others! e. g. OO-like inheritance, polymorphic (2) Relationships → Tables (3) Testng ● These relationships are defined by foreign keys ● Foreign keys allow for very fast joins
Primary Keys & Foreign Keys ● Primary key is almost always called id ○ ○ ○ ● The foreign key is conventionally called foo_id where foo is the name of the table it refers to. ○ ○ ● Usually an integer or UUID, or any column type Must uniquely identify the row You CAN use strings as primary keys (e. g. username), BUT! ■ Using strings as primary keys is a performance hit on both size and speed ■ If we need to rename an account ■ Not allowed to rename your account, ever? They probably used your username as a foreign key. You can use a foreign key constraint ensuring that city_id only gets matched with city. id In this class, we do not require foreign keys constraints Key design decision: Your users should never have to remember a primary key or foreign key. The system remembers for you foreign keys id team_name city_id 1 Lakers 1 2 Clippers 1 3 Magic 2 primary keys id city_name 1 Los Angeles 2 Orlando
Many-One ● ● ● Colloquially called “belongs to” or “has one” E. g. ○ ○ ● Every team belongs to a city, and a city can have many teams ○ Every order has one customer, and customers can make many orders ○ Every git commit has one author, but an author can make multiple commits Note column naming convention. “teams. team_name” (bad) → “teams. name” (shorter==better!) Often represents teams id name city_id 1 Lakers 1 2 Clippers 1 3 Magic 2 Authorship Location Defining feature ● How crazy do you want to get with this? Spoiler: don’t ○ ○ E. g. “Every city is in one state, and states can have many cities” …more on this in normalization. . . cities id name 1 Los Angeles 2 Orlando
Getting Our Data Back: Inner Join ● In SELECT’s FROM ○ ○ ● ● ○ ○ ○ ● Keeps the primary-foreign keys logic in the FROM clause more readable Can chain multiple tables in a single join In Postgres, just “join” means “inner join” Yes, there are outer joins! Coming Joins are commutative: order doesn’t matter Number rows <= max(n, m) ○ ○ WHERE orders. customer_id = customers. id AND orders. balance < 5. 0 is a subset of the cross-product is the subset where the foreign keys match with their primary -- Equivalent to above, easier on the eyes key counterparts SELECT * FROM customers Convenient syntax: “INNER JOIN. . ON” ○ ● ● When you list multiple tables, it applies the cross-product all -- join tables involved SELECT * FROM customers, orders Sometimes necessary, but not typical The inner join of two tables ○ ○ -- cross product SELECT * from customers, orders where n, m are the number of rows in each respective table Depends on how many actually join You can think of inner joins as “including data we referenced in another table” or like dereferencing a “pointer” (foreign key) INNER JOIN orders ON orders. customer_id = customers. id WHERE orders. balance < 5. 0 -- Also functionally equivalent to the above SELECT * FROM orders INNER JOIN customers ON orders. customer_id = customers. id WHERE orders. balance < 5. 0 -- Multiple tables at once SELECT * FROM customers INNER JOIN orders ON orders. customer_id = customers. id INNER JOIN store ON order. store_id=stores. id WHERE orders. balance < 5. 0
cities teams Practicing Inner Joins ● How many rows? What are the results? SELECT * FROM teams, cities id name city_id id name 1 Lakers 1 1 Los Angeles 2 Clippers 1 2 Orlando 3 Magic 2 Is this correct in this context?
cities teams id name city_id id name 1 Lakers 1 1 Los Angeles 2 Clippers 1 2 Orlando 3 Magic 2 Practicing Inner Joins ● How many rows? 2*3=6 What are the results? cross-product Is this correct in this context? no SELECT * FROM teams, cities teams. id teams. name city_id cities. name 1 Lakers 1 1 Los Angeles 1 Lakers 1 2 Orlando 2 Clippers 1 1 Los Angeles 2 Clippers 1 2 Orlando 3 Magic 2 1 Los Angeles 3 Magic 2 2 Orlando
cities teams id name city_id id name 1 Lakers 1 1 Los Angeles 2 Clippers 1 2 Orlando 3 Magic 2 Practicing Inner Joins ● How many rows? What are the results? Is this correct in this context? SELECT * FROM teams INNER JOIN cities ON teams. city_id=cities. id teams. name city_id cities. name 1 Lakers 1 1 Los Angeles 2 Clippers 1 1 Los Angeles 3 Magic 2 2 Orlando
cities teams id name city_id id name 1 Lakers 1 1 Los Angeles 2 Clippers 1 2 Orlando 3 Magic 2 Practicing Inner Joins ● How many rows? 3 What are the results? inner join Is this correct in this context? yes SELECT * FROM teams INNER JOIN cities ON teams. city_id=cities. id teams. name city_id cities. name 1 Lakers 1 1 Los Angeles 2 Clippers 1 1 Los Angeles 3 Magic 2 2 Orlando
players Many-Many ● e. g. Players and Teams ○ ○ Players can be on multiple teams in their career Teams have multiple players ● Solution: linking table ○ A table with 2+ foreign keys ● Design tip: name your linking tables if you can ○ ○ ○ If the relationship has no colloquial term, then use the two table names, e. g. “commits_filepaths” or “players_teams” BUT! Often we have words describing that relationship Can be a noun, or adjective phrase, e. g. “played_for”, “roster”, “enrollment” e. g. Students ←Enrollment →Class ● Linking tables can have columns too! e. g. start_year id name 1 Kobe Bryant 2 Shaquille O’Neal played_for id player_id team_id start_year end_year 1 1996 2016 2 2 2 1996 3 2 1 1996 2004 id name city_id 1 Lakers 1 2 Clippers 1 3 Magic 2
teams players Practicing Inner Joins id name city_id 1 Kobe Bryant 1 Lakers 1 2 Shaquille O’Neal 2 Clippers 1 3 Magic 2 Who has played for a single team in L. A. for 20 years in a single stretch? And how long? played_for id player_id team_id start_year end_year 1 1996 2016 2 2 2 1996 3 2 1 1996 2004 cities id name 1 Los Angeles 2 Orlando
teams players Practicing Inner Joins id name city_id 1 Kobe Bryant 1 Lakers 1 2 Shaquille O’Neal 2 Clippers 1 3 Magic 2 played_for Who has played for a single team in L. A. for 20 years in a single stretch? And how long? SELECT *, end_year - start_year AS tenure FROM players INNER JOIN played_for ON players. id=played_for. player_id INNER JOIN teams ON played_for. team_id=teams. id INNER JOIN cities ON teams. city_id=cities. id WHERE end_year - start_year >= 20 AND cities. name = 'Los Angeles' id player_id team_id start_year end_year 1 1996 2016 2 2 2 1996 3 2 1 1996 2004 cities id name 1 Los Angeles 2 Orlando players. id players. name played_for. id played_for. player_id played_for. team_id played_for. start_year played_for. end_year tenure cities. id cities. name 1 Kobe Bryant 1 1994 2016 20 1 Los Angeles (Now can you see why we say to avoid * in production code? We don’t need most of these columns…)
Key Decisions ● Scope creep and schemas are best friends. ● Do we truly need a many-many here? ○ ○ ○ Many-many will grow your schema quickly Make it harder to understand What business value do we get from more details? ● Do we truly need a separate relationship? ○ ○ Or are we okay with repeated data? (normalization - coming soon) e. g. Cities and States in addresses. It’s ok to just say “NY” in 1 million addresses. ● Should we store this column on the main table or the linking table? ○ e. g. Player position like Kobe Bryant being “Guard” ■ Rarely changes, but it can. ■ When does it change? Team to team? Game to game? ■ Do we need the historical details? Or just store “typical position” ■ Step back. Should we even track this?
Self-questions for JOIN queries (non-aggregation version) 1. 2. 3. 4. Question to yourself → what you are actually doing What columns do I need? What tables do I need? Which keys do I link up? Any criteria to filter for? → WHERE → find the schema of your results → finding your data in the schema → using your foreign/primary keys → filter out your rows → SQL → SELECT [*] → FROM → JOIN. . ON
- Slides: 14