04 Grouping and Aggregating Data Brian Alderman MCT
04 | Grouping and Aggregating Data Brian Alderman | MCT, CEO / Founder of Micro. Tech. Point Tobias Ternstrom | Microsoft SQL Server Program Manager
Querying Microsoft SQL Server 2012 Jump Start 01 | Introducing SQL Server 2012 SQL Server types of statements; other SQL statement elements; basic SELECT statements 02 | Advanced SELECT Statements DISTINCT, Aliases, scalar functions and CASE, using JOIN and MERGE; Filtering and sorting data, NULL values 03 | SQL Server Data Types Introduce data types, data type usage, converting data types, understanding SQL Server function types 04 | Grouping and Aggregating Data Aggregate functions, GROUP BY and HAVING clauses, subqueries; self-contained, correlated, and EXISTS; Views, inline-table valued functions, and derived tables | Lunch Break Eat, drink, and recharge for the afternoon session
Module Overview Aggregate functions GROUP BY and HAVING clauses Subqueries (self-contained, correlated, and EXISTS) Working with table functions
Aggregate Functions
Common built-in aggregate functions Common • SUM • MIN • MAX • AVG • COUNT_BIG Statistical • • STDEVP VARP Other • CHECKSUM_AGG • GROUPING_ID
Working with aggregate functions Aggregate functions: Return a scalar value (with no column name) Ignore NULLs except in COUNT(*) Can be used in SELECT, HAVING, and ORDER BY clauses Frequently used with GROUP BY clause SELECT COUNT (DISTINCT Sales. Order. ID) AS Unique. Orders, AVG(Unit. Price) AS Avg_Unit. Price, MIN(Order. Qty)AS Min_Order. Qty, MAX(Line. Total) AS Max_Line. Total FROM Sales. Order. Detail; Unique. Orders Avg_Unit. Price Min_Order. Qty Max_Line. Total -------------31465 465. 0934 1 27893. 619000
Using DISTINCT with aggregate functions Use DISTINCT with aggregate functions to summarize only unique values DISTINCT aggregates eliminate duplicate values, not rows (unlike SELECT DISTINCT) Compare (with partial results): SELECT Sales. Person. ID, YEAR(Order. Date) AS Order. Year, COUNT(Customer. ID) AS All_Custs, COUNT(DISTINCT Customer. ID) AS Unique_Custs FROM Sales. Order. Header GROUP BY Sales. Person. ID, YEAR(Order. Date); Sales. Person. ID Order. Year All_Custs Unique_custs -----------289 2006 84 48 281 2008 52 27 285 2007 9 8 277 2006 140 57
Using the GROUP BY clause GROUP BY creates groups for output rows, according to unique combination of values specified in the GROUP BY clause SELECT <select_list> FROM <table_source> WHERE <search_condition> GROUP BY <group_by_list>; GROUP BY calculates a summary value for aggregate functions in subsequent phases SELECT Sales. Person. ID, COUNT(*) AS Cnt FROM Sales. Order. Header GROUP BY Sales. Person. ID; Detail rows are “lost” after GROUP BY clause is processed
Demo Using Aggregate functions
GROUP BY and HAVING
GROUP BY and logical order of operations HAVING, SELECT, and ORDER BY must return a single value per group All columns in SELECT, HAVING, and ORDER BY must appear in GROUP BY clause or be inputs to aggregate expressions Logical Order Phase 5 SELECT 1 FROM 2 WHERE 3 GROUP BY 6 ORDER BY Comments Creates groups If a query uses GROUP BY, all subsequent phases operate on the groups, 4 HAVING Operates on groups not source rows
Using GROUP BY with aggregate functions Aggregate functions are commonly used in SELECT clause, summarize per group: SELECT Customer. ID, COUNT(*) AS cnt FROM Sales. Order. Header GROUP BY Customer. ID; Aggregate functions may refer to any columns, not just those in GROUP BY clause SELECT productid, MAX(Order. Qty) AS largest_order FROM Sales. Order. Detail GROUP BY productid;
Filtering grouped data using HAVING Clause HAVING clause provides a search condition that each group must satisfy HAVING clause is processed after GROUP BY SELECT Customer. ID, COUNT(*) AS Count_Orders FROM Sales. Order. Header GROUP BY Customer. ID HAVING COUNT(*) > 10;
Compare HAVING to WHERE clauses WHERE filters rows before groups created • Using a COUNT(*) expression in HAVING clause is useful to solve Controls which rows problems: are placed into groups common business HAVING filters groups which groups are passed to next logical phase • Show. Controls only customers that have placed more than one order: SELECT Customerid, COUNT(*) AS cnt FROM Sales. Customer AS Cust JOIN Sales. Order. Header AS Ord ON Customer. ID = ORD. Customer. ID GROUP BY Customer. ID HAVING COUNT(*) > 1; • Show only products that appear on 10 or more orders: SELECT Product. ID, COUNT(*) AS cnt FROM Production. Product AS Prod JOIN Sales. Order. Detail AS Ord ON Product. ID = Ord. Product. ID GROUP BY Product. ID HAVING COUNT(*) >= 10;
Demo Using GROUP BY and HAVING
Subqueries
Working with subqueries Subqueries are nested queries or queries within queries Results from inner query are passed to outer query Inner query acts like an expression from perspective of outer query Subqueries can be self-contained or correlated Self-contained subqueries have no dependency on outer query Correlated subqueries depend on values from outer query Subqueries can be scalar, multi-valued, or table-valued
Writing scalar subqueries Scalar subquery returns single value to outer query Can be used anywhere single-valued expression can be used: SELECT, WHERE, etc. SELECT Sales. Order. ID, Product. ID, Unit. Price, Order. Qty FROM Sales. Order. Detail WHERE Sales. Order. ID = (SELECT MAX(Sales. Order. ID) AS Last. Order FROM Sales. Order. Header); If inner query returns an empty set, result is converted to NULL Construction of outer query determines whether inner query must return a single value
Writing multi-valued subqueries Multi-valued subquery returns multiple values as a single column set to the outer query Used with IN predicate If any value in the subquery result matches IN predicate expression, the predicate returns TRUE SELECT Customer. ID, Sales. Order. Id, Territory. ID FROM Salesorder. Header WHERE Customer. ID IN ( SELECT Customer. ID FROM Sales. Customer WHERE Territory. ID = 10); May also be expressed as a JOIN (test both for performance)
Writing queries using EXISTS with subqueries The keyword EXISTS does not follow a column name or other expression. The SELECT list of a subquery introduced by EXISTS typically only uses an asterisk (*). SELECT Customer. ID, Person. ID FROM Sales. Customer AS Cust WHERE EXISTS ( SELECT * FROM Sales. Order. Header AS Ord WHERE Customer. ID = Ord. Customer. ID); SELECT Customer. ID, Person. ID FROM Sales. Customer AS Cust WHERE NOT EXISTS ( SELECT * FROM Sales. Order. Header AS Ord WHERE Customer. ID = Ord. Customer. ID);
Demo Using subqueries
Table Functions
Creating simple views Views are saved queries created in a database by administrators and developers Views are defined with a single SELECT statement ORDER BY is not permitted in a view definition without the use of TOP, OFFSET/FETCH, or FOR XML To sort the output, use ORDER BY in the outer query View creation supports additional options beyond the scope of this class CREATE VIEW Human. Resources. Employee. List AS SELECT Business. Entity. ID, Job. Title, Hire. Date, Vacation. Hours FROM Human. Resources. Employee; SELECT * FROM Human. Resources. Employee. List
Creating simple inline table-valued functions Table-valued functions are created by administrators and developers Create and name function and optional parameters with CREATE FUNCTION Declare return type as TABLE Define inline SELECT statement following RETURN CREATE FUNCTION Sales. fn_Line. Total (@Sales. Order. ID INT) RETURNS TABLE AS RETURN SELECT Sales. Order. ID, CAST((Order. Qty * Unit. Price * (1 - Special. Offer. ID)) AS DECIMAL(8, 2)) AS Line. Total FROM Sales. Order. Detail WHERE Sales. Order. ID = @Sales. Order. ID ;
Writing queries with derived tables Derived tables are named query expressions created within an outer SELECT statement Not stored in database – represents a virtual relational table When processed, unpacked into query against underlying referenced objects Allow you to write more modular queries SELECT <column_list> FROM ( <derived_table_definition> ) AS <derived_table_alias>; Scope of a derived table is the query in which it is defined
Guidelines for derived tables Derived Tables Must • Have an alias • Have names for all columns • Have unique names for all columns • Not use an ORDER BY clause (without TOP or OFFSET/FETCH) • Not be referred to multiple times in the same query Derived Tables May • Use internal or external aliases for columns • Refer to parameters and/or variables • Be nested within other derived tables
Passing arguments to derived tables Derived tables may refer to arguments Arguments may be: Variables declared in the same batch as the SELECT statement Parameters passed into a table-valued function or stored procedure DECLARE @emp_id INT = 9; SELECT orderyear, COUNT(DISTINCT custid) AS cust_count FROM ( SELECT YEAR(orderdate) AS orderyear, custid FROM Sales. Orders WHERE empid=@emp_id ) AS derived_year GROUP BY orderyear;
Creating queries with common table expressions Use WITH clause to create a CTE: Define the table expression in WITH clause Reference the CTE in the outer query Assign column aliases (inline or external) Pass arguments if desired WITH CTE_year AS ( SELECT YEAR(Order. Date) AS Order. Year, customer. ID FROM Sales. Order. Header ) SELECT orderyear, COUNT(DISTINCT Customer. ID) AS Cust. Count FROM CTE_year GROUP BY Order. Year;
Demo Table functions
Summary Aggregate functions are used in SELECT, HAVING, and ORDER By clauses, but are most frequently used with the GROUP BY clause and returns a scalar value Common built-in aggregate functions include Common • SUM • MIN • MAX • AVG • COUNT_BIG Statistical • • STDEVP VARP Other • CHECKSUM_AGG • GROUPING_ID
Summary Use DISTINCT with aggregate functions to only summarize the unique values as it will eliminate duplicate values, not rows GROUP BY creates groups for output rows, according to unique combination of values specified in the GROUP BY clause. GROUP BY also calculates a summary value for aggregate functions in subsequent phases HAVING clause provides a search condition that each group must satisfy and is processed after the GROUP BY clause
Summary Subqueries are nested queries or queries within queries where the results from inner query are passed to the outer query Type of subqueries include Scalar subqueries Multi-valued subqueries Subqueries with the EXISTS clause
Summary Views are named tables expressions with definitions stored in a database that can be referenced in a SELECT statement just like a table Views are defined with a single SELECT statement and then saved in the database as queries Table-valued functions are created with the CREATE FUNCTION. They contain a RETURN type of table Derived tables allow you to write more modular queries as named query expressions that are created within an outer SELECT statement. They represent a virtual relational table so are not stored in the database CTEs are similar to derived tables in scope and naming requirements but unlike derived tables, CTEs support multiple definitions, multiple references, and recursion
Course Topics Querying Microsoft SQL Server 2012 Jump Start 01 | Introducing SQL Server 2012 SQL Server types of statements; other SQL statement elements; basic SELECT statements 02 | Advanced SELECT Statements DISTINCT, Aliases, scalar functions and CASE, using JOIN and MERGE; Filtering and sorting data, NULL values 03 | SQL Server Data Types Introduce data types, data type usage, converting data types, understanding SQL Server function types 04 | Grouping and Aggregating data Aggregate functions, GROUP BY and HAVING clauses, subqueries; self-contained, correlated, and EXISTS; Views, inline-table valued functions, and derived tables | Lunch Break Eat, drink, and recharge for the afternoon session
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U. S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
- Slides: 35