MultiLanguage Extensibility in MS SQL Server James Hamilton
Multi-Language Extensibility in MS SQL Server James Hamilton James. RH@microsoft. com Microsoft SQL Server 2002. 06. 25
Agenda u Common Language Runtime (CLR) Integration Overview: Ø Ø u SQL Features Enabled: Ø Ø Ø Ø Ø u Basic infrastructure Design philosophy Ø 4 S’s: Safety, Security, Scalability, & Speed CLR assemblies Scalar functions Relational functions Aggregate functions Stored Procedures Triggers Types and Methods SQL Types Package Access methods Systems & Integration issues 2
Basic Infrastructure u CLR Hosted inside SQL Server: Ø Ø Ø u Competitive offerings: Ø u Oracle & DB 2: in-proc Java & JDBC Data access in process: Ø Ø u 4 S’s: safety, security, scalability, & speed Run verified, type-safe code in process. NET Framework languages (VB, C#, Java, …) Based on ADO. NET Same programming model as middle-tier SQLTypes support: Ø SQL type semantics in managed code on client & server tiers 3
Philosophy: Safety & Security u Safety: Ø Ø u User code does not compromise integrity of server process Ø Verifiable code Ø Leverage CLR’s code access security User code cannot call UI, create threads, synchronization, or call unmanaged code Security: Ø Ø Ø Access to SQL data from user code via SQL authorization model Access to system resources from user code via. NET Framework code permissions Administrators control permissions given to 4 assemblies
Philosophy: Security model u Resources Ø threads, files, unmanaged code access etc. u API Ø that exposes a resource introduces a demand for that permission u Stack-walk Ø protected by permissions based permission check: every assembly in call-stack has permission u Permissions granted to assemblies determined by machine and user-level security policy u Allows host to add another policy layer to further restrict permissions 5
Philosophy: 3 Permission Sets u SAFESQL Ø Ø u EXTERNAL ACCESS Ø Ø u Internal computation plus data access No access to resources outside SQL Server No unmanaged calls Must be verifiable SAFESQL + access to external resources Requires EXTERNAL ACCESS permission to create SQL Server will impersonate the caller Must be verifiable UNRESTRICTED Ø Ø No controls: can call unmanaged, & un-verifiable Only Sysadmin can create 6
Philosophy: Scalability & Speed u Scalability: Ø u As many concurrent users as TSQL Ø Integrated SQL and runtime threads Ø Collaboration between SQL and GC Speed: Ø Ø Ø Efficient data access in process Compiled user code, not interpreted as TSQL Fast transitions in/out of runtime 7
Philosophy: Functions speed u. NET functions approximating speed of TSQL inline expressions u. NET Framework functions much faster than TSQL functions for complex expressions 8
Agenda u CLR Integration Overview: Ø Ø u SQL Features Enabled: Ø Ø Ø Ø Ø u Basic infrastructure Design philosophy Ø 4 S’s: Safety, Security, Scalability, & Speed CLR assemblies Scalar functions Relational functions Aggregate functions Stored Procedures Triggers Types and Methods Access methods SQL Types Package Systems & Integration issues 9
Development Steps VB, C#, … VS. NET Project Build Runtime hosted inside SQL Queries: select sum(tax(sal, state) ) from Emp where county = ‘King’ Assembly: “Tax. Lib. dll” SQL Data Definition: create assembly … create function … create procedure … create trigger … create type … SQL Server 10
Assembly Registration CREATE ASSEMBLY lib_geom FROM ‘\m 1typesgeometry. dll’ WITH PERMISSION SET SAFE WITH AUTOREGISTER DROP ASSEMBLY lib_geom u Assemblies stored in DB Ø u Code permissions assigned per assembly Ø u Safe. SQL, external access, & unrestricted Autoregister functions Ø u Backup, restore, etc. with data Using. NET custom attributes Assembly benefits Ø Ø Self-describing metadata: types & file dependencies Unit of deployment: permissions & versioning 11
ALTER Assembly u Schema bound: Ø Ø Ø u Cannot invalidate persistent data or indexes No tables with columns of UDT from this assembly No indexes on functions of that assembly Packaging considerations Ø Place routines and types in different assemblies 12
Register A Function CREATE FUNCTION distance ( @x 1 int, @y 1 int, @x 2 int, @y 2 int ) RETURNS float EXTERNAL NAME ‘lib_geom: CPoint. Distance’ DETERMINISTIC RETURNS NULL ON NULL INPUT DROP FUNCTION distance u Functions called from queries: Ø Ø u u Static class functions Deterministic functions: Ø No SQL updates or access to global state Can be scalar or table-valued Using a function in a query: SELECT s. name FROM Supplier s WHERE dbo. distance( s. x, s. y, @x, @y ) < 3 13
Call Function SELECT name FROM Employee WHERE dbo. tax(salary) >= 50000 Project Filter Scan SQLSERVR. EXE Common Language Runtime double tax( double sal ) { if (sal < 50000. 0) return sal * 0. 15; if (sal >= 50000. 0 && sal <= 90000. 0) return sal * 0. 23; else return sal * 0. 35; } Employee Unmanaged code Managed code 14
Register A Procedure CREATE PROCEDURE check_inventory EXTERNAL NAME ‘lib_events: CInventory. check_level’ DROP PROCEDURE check_inventory u Procedures called directly Ø Ø Ø Can contain SQL queries, updates, or DDL Can return results directly to client Not directly callable (as functions are) from queries 15
Register A Trigger CREATE TRIGGER supplier_event ON supplier AFTER INSERT, UPDATE EXTERNAL NAME ‘lib_events: CNotif. Supp_Event’ DROP TRIGGER supplier_event u Similar to procedures Ø with access to inserted & deleted tables 16
UDT: Introduction u Extends SQL Server type system u Complex structure & associated behavior u Managed classes in any CLR language u Functionally subsume SQL-99 distinct types Ø equivalent to structured types u Other products: Oracle Cartridges, Informix Data. Blades, & IBM DB Extenders 17
UDT: Ordering values u Two kinds of ordering supported u Binary ordering indicated via CLR property: public const bool u Operator-based Ø Is. Byte. Ordered; ordering Overloaded comparison operators: public static SQLBool operator == (<type>, <type>); 18
UDT: Creating u Registered as a type from an already registered assembly CREATE ASSEMBLY Map. Lib FROM ‘\mysrvshareMap. Lib. dll’ CREATE TYPE Point EXTERNAL NAME ‘Map. Lib: Basetypes’ 19
UDT: Instantiating u Can be declared as column type: Create table Cities( Name varchar(20), State varchar(20), Location Point DEFAULT new Point(0, 0)) u Variables & params of T-SQL &. NET routines can UDTs 20
UDT: Read operations u Registered methods, properties, & public data members can be use in SQL queries u Methods assumed to be non-mutators unless marked using custom attribute u Only non-mutators allowed in SELECTs Declare @p Set @p = new point(32, 23) Select Location: : Distance(@p) From Cities 21
UDT: Ordering Operations u Indexing, UNIQUE, & PRIMARY KEY constraints: Requires binary ordering support Ø Inconsistent operator based ordering can cause corrupt indices, incorrect query results Ø u ORDER BY, GROUP BY, & comparison operators: If UDT supports binary ordering, always use binary ordering Ø Else use overloaded operators; rely on consistency and correctness of UDT implementation Ø 22
UDT: UDT Write Operations u Constructor operator can be called using new UPDATE Cities SET Location = new Point(12. 3, 46. 2) u Properties and public data members can be modified through assignment UPDATE Cities SET Location: : X = 23. 5, Location: : Y = 23. 5 WHERE Name = ‘Anchorage’ 23
UDT: Write Operations u Methods marked mutators callable in UPDATEs UPDATE Cities SET Location: : Set. XY(23. 5, 23. 5) WHERE Name = ‘Anchorage’ u INSERTs get values from string, binary form or from constructor invocation INSERT Cities(‘Anchorage’, ‘Alaska’, ’ 19. 3: 2. 3’) 24
UDAggs Required Methods u u Implement QP interface to aggregate values over a group Interface needed: Ø Ø Initialize a group Pass values in the group to accumulate the aggregation Merge multiple groups (for parallel plans) Finalize computation and retrieve result 25
Creating UDAggs u u Aggregates implemented as set of methods packaged in a. NET class UDAgg is bound to a class in existing assembly CREATE AGGREGATE Concat(nvarchar) RETURNS nvarchar EXTERNAL ‘My. Lib: Concat’ 26
UDAgg class definition e. g. Public class Concat { //Private store for accumulating results. private SQLString agg; //Optimizer properties public static bool Is. Null. On. Nulls. Set() {return true; } public static bool Is. Null. On. Empty. Set() {return false; } //Aggregation interface public void Init(); public void Accum(SQLString str); public void Merge(Concat otheragg); public SQLString Term(); } 27
Final Stage: Access Methods u u Complete extensibility solution requires user defined access methods: Informix supports adding access methods: Ø Ø Ø u Expose locking, concurrency control, recovery, etc. Hard to make work in general … few engineers able to write internal storage engine code well Many will try and they will have bugs … hard on product quality image Solution: Tailor existing access methods Ø Ø E. g. map spatial queries to queries over 2 Btrees or single Z-transform Query rewrite language such that system can rewrite a function as an appropriate set of operations over 1 or more tables/indexes 28
CLR SQL Types Package u u Defines C#, VB, & Java types corresponding SQL Server types Reduce impedance mismatch between programming language & data Consistent expression evaluation in mid- & server-tier programming SQL Types library Ø Ø Managed classes: system. Data. SQLTypes Provide SQL semantics Ø Nullability, three-valued logic Ø Precision & scale in operations 29
SQL Types Example u Tax function implemented with SQL types: using System; using System. Data. SQLTypes; public class my. Finances { public static SQLDouble tax( SQLDouble sal ) { if ( sal < 50000. 0 ) return sal * 0. 15; if ( sal >= 50000. 0 && sal <= 90000. 0 ) return sal * 0. 23 else return sal * 0. 35; } } 30
Agenda u CLR Integration Overview: Ø Ø u SQL Features Enabled: Ø Ø Ø Ø Ø u Basic infrastructure Design philosophy Ø 4 S’s: Safety, Security, Scalability, & Speed CLR assemblies Scalar functions Relational functions Aggregate functions Stored Procedures Triggers Types and Methods Access methods SQL Types Package Systems & Integration issues 31
Integration: In-Proc Data Access u Goals: Ø Ø u Symmetric model Ø Client, server, & mid-tier Fully supported by dev tools Ø VS dev environment including debug In same address space as server: Ø Ø Don’t marshal to TDS (tabular data stream) Don’t loop-back through network interface Avoid unnecessary copies Avoid unnecessary transitions across managed/unmanaged (into VM) interface 32
Integration: Process Model u Thread integration: Ø Ø Ø u Lazy preemptive: Ø Ø Ø u SQL Server: non-preemptive user-level thread scheduling Fibers multiplexed on O/S thread with thread migration Implication: TLS storage used by VM won’t work in SQL execution environment When SQL calls potentially blocking code it must “go preemptive” (allocate a thread) So thread required for each VM call – expensive Lazy pre-emptive: assume preemption not required and set timer to catch rare cases when thread allocation is needed Garbage collection blocks VM threads: Ø Ø Hard on multi-user DB throughput GC doesn’t directly suspend – calls DB to suspend and DB schedules other non-VM hosted work 33
Integration: Memory Management u Memory allocation from DB system (rather than via O/S): Ø Ø Ø Allows memory resources to be used for different purposes over time Garbage collection very general but DB systems often have more info: Ø e. g. free memory pool at end of statement Goal: Ø DB memory costing able to request memory from VM when needed Ø Per user memory consumption tracking & quotas 34
Integration: Remaining Issues u Assemblies (code) kept in database: Ø Ø u u Support attention propagation (user interrupt) into VM Track all VM resources Ø u Dispatched to VM through buffers Stream interface would be better yet O/S portability layer tailored to DB execution environment Error containment/minimize multi-user impact of failures: Ø Ø Out-of-proc execution Multiple VMs 35
Integration: Query optimization u Gather function cost statistics Ø u Reorder of predicate evaluation Ø u Value histograms, execution cost Based on cost of evaluation Function indexes Ø Ø Speed up expensive functions Extends computed column indexes and indexed (materialized) views 36
CLR Integration Summary u u DB extensibility without product stability risk Ability to add types and behavior w/o access to engine source code Ø u Rich server programming model Ø Ø u Scalable, safe, secure & fast Any CLR language Symmetric mid- & server-tier model Better development environment Ø Integrates SQL tools with Visual Studio 37
- Slides: 37