Aggregate Functions and Collations Aggregate Functions Aggregators aggregate

  • Slides: 9
Download presentation
Aggregate Functions and Collations

Aggregate Functions and Collations

Aggregate Functions Aggregators (aggregate functions) take 0 or more values and return some form

Aggregate Functions Aggregators (aggregate functions) take 0 or more values and return some form of summary or those values. When seen many before: ◦ count(x) - returns the number of items in x ◦ max(x) - returns the largest value in x ◦. . . You can also create a custom aggregator with the Python sqlite 3 module ◦ Lets say we wanted an aggregate function called char_length that tallied the number of characters in the strings passed to it Example Usage: CREATE TABLE x (col TEXT); INSERT INTO x VALUES ('hi'), ('bye'); SELECT char_length(col) FROM x; -- returns 5

Custom Aggregator class Char. Counter: def __init__(self): self. char_count = 0 def step(self, value):

Custom Aggregator class Char. Counter: def __init__(self): self. char_count = 0 def step(self, value): self. char_count += len(value) def finalize(self): return self. char_count conn. create_aggregate("char_length", 1, Char. Counter)

create_aggregate Just like create_function, the create_aggregate method of the SQLite 3 Connection class has

create_aggregate Just like create_function, the create_aggregate method of the SQLite 3 Connection class has three arguments: ◦ name - the SQL name to call the aggregate function by ◦ num_params - the number of parameters for the aggregate function ◦ aggregate_class - the Python class to pass data to The aggregate class must have three methods: ◦ __init__(self) - the init method that initializes the class (usually an attribute to 0) ◦ step(self, value) - a method that is called for every values to be aggregated ◦ finalize(self) - a method that returns the value for the aggregate function/class

How is an aggregate function different from a normal function? Aggregates get called multiple

How is an aggregate function different from a normal function? Aggregates get called multiple times Aggregates can have multiple parameters Aggregates can store state and return summaries Aggregates are more complicated

Collations are the sort order and equality for a data type. SQLite has 3

Collations are the sort order and equality for a data type. SQLite has 3 built in collating functions (a. k. a. collating sequences): ◦ BINARY - Compares the binary representation of the two data types (default) ◦ NOCASE - Same as above except the 26 upper case letters are treated like lower case ◦ RTRIM - Same as BINARY except trailing whitespace is ignored You can specific that you want this sort order in two places: CREATE TABLE x (col TEXT COLLATE NOCASE); Or: SELECT * FROM x ORDER BY col COLLATE NOCASE;

Custom Collation Lets say we want a collation (named "LTRIM") that doesn't care about

Custom Collation Lets say we want a collation (named "LTRIM") that doesn't care about leading white space. ◦ " dog" == "dog" ◦ " cat" < "n dog" We can define a collation function that does that: def collate_ltrim(left, right): left, right = left. lstrip(), right. lstrip() if left == right: return 0 if left < right: return -1 else: return 1 conn. create_collation("LTRIM", collate_ltrim) conn. execute("SELECT * FROM students ORDER BY name COLLATE LTRIM; ")

create_collation The create_collation method of the Connection class takes two arguments: ◦ name -

create_collation The create_collation method of the Connection class takes two arguments: ◦ name - the name the collation is called in SQL queries ◦ callable - a function that is called to figure out if two values are greater then, less than, or equal to each other. The callable needs to take two values: ◦ If the values are equal, return 0 ◦ If the first value is smaller than the second, return -1 ◦ If the first value is greater than the second, return 1 If you need to remove a collation, just replace the callable with None: conn. create_collation("LTRIM", None)

Why would you want to define a collation as a column attribute instead of

Why would you want to define a collation as a column attribute instead of an ORDER BY clause? Makes no difference So that future ORDER BYs don't need to specify it explicitly If we want to change behavior of comparison operators (> < =. . . ) What's a collation?