Kettle Database connections Tools and utilities Repositories Virtual
Конценции Kettle • ■■ Database connections • ■■ Tools and utilities • ■■ Repositories • ■■ Virtual File Systems • ■■ Parameters and variables • ■■ Visual programming
Transformation
row metadata Каждый step может содержать описание строк (rows), которые он записывает. Это определение строк также называют row metadata. В row metadata входит следующее: • ■■ Name: Уникальное в row название поля. • ■■ Data type: The data type of the field. • ■■ Length: The length of a String, or number of a Big. Number data type. • ■■ Precision: The decimal precision of a number of a Big. Number data type. • ■■ Mask: The representation format (or conversion mask). This will come into play if you convert numeric (Number, Integer, Big. Number) or Date data types to String. This happens, for example, during data preview in the user interface or during serialization to text or XML. • ■■ Decimal: The decimal symbol in a number. This symbol is culturally defined and is typically either a dot (. ) or a comma (, ). • ■■ Group: The grouping symbol. This symbol is also culturally defined and is typically either a comma (, ), a dot (. ), or a single quotation mark (‘). • ■■ Step origin: Kettle сохраняет путь к исходному полю в row.
Row set содержит набор из ноля или нескольких полей (fields), которые могут содержать данные следующих типов: • ■■ String: Any type of character data without any particular limit. • ■■ Number: A double precision floating point number. • ■■ Integer: A signed long integer (64 -bit). • ■■ Big. Number: A number with arbitrary (unlimited) precision. • ■■ Date: A date-time value with millisecond precision. • ■■ Boolean: A Boolean value can contain true or false. • ■■ Binary: Binary fields can contain images, sounds, videos, and other types of binary data.
Row set
Notes
Область видимости переменных • Valid in the parent job • Valid in the current job • Valid in the grand-parent job • Valid in the root job • Valid in the Java Virtual Machine (JVM)
Kettle ETL tool ■■ Connectivity ■■ Platform independence and scalability ■■ Design flexibility and component reuse ■■ Extensibility ■■ Data transformations ■■ Testing and debugging ■■ Lineage and impact analysis ■■ Logging and auditing
Connectivity • ■■ Connect to and get data from the most common relational database systems including Oracle, MS SQL Server, IBM DB/2, Ingres, My. SQL, or Postgre. SQL. • ■■ Read data from ASCII files in a delimited or fixed format. • ■■ Read data from XML files (XML is the lingua franca of data interchange). • ■■ Read data from popular Office formats such as Access databases or Excel spreadsheets. • ■■ Get files from external sites using FTP, SFTP, or SSH (preferably without scripting).
Средства для проверки доступа к данным • • • ■■ Test a repository connection. ■■ Ping a host to check whether it’s available. ■■Wait for a SQL command to return success/failure based on a row count condition. ■■ Check for empty folders. ■■ Check for the existence of a file, table, or column. ■■ Compare files or folders. ■■ Set a timeout on FTP and SSH connections. ■■ Create failure/success outputs on every available job step.
Data Profiling • ■■ Number of NULL or empty values • ■■ Number of distinct values • ■■Minimum, maximum, and average value (numeric fields) • ■■Minimum, maximum, and average length (string fields) • ■■ Patterns (for example, ###-#### for phone numbers) • ■■ Data distribution
Data Transformations • ■■ Slowly Changing Dimension support • ■■ Lookup values • ■■ Pivot and unpivot • ■■ Conditional split • ■■ Sort, merge, and join • ■■ Aggregate
Data Conversion Date to String Внутреннее представление Date содержит данные в виде date/time с точностью до миллисекунды. Полная информация о формате содержится в документации Sun Java API. Пример для даты December 6, 2009 at 21 hours, 6 minutes and 54. 321 seconds: Conversion Mask (format) Result • yyyy/MM/dd’T’HH: mm: ss. SSS 2009/12/06 T 21: 06: 54. 321 • h: mm a 9: 06 PM • HH: mm: ss 21: 06: 54 • M-d-yy 12 -6 -09
Data Conversion Numeric to String Преобразование Numeric в String описано в документации Java API. Примеры: Value Conversion Mask Decimal Grouping Result Symbol • 1234. 5678 #, ###. , 1, 234. 57 • 1234. 5678 000, 00000 , . 001. 234, 56780 • -1. 9 #. 00; -#. 00. , -1. 9 • 1. 9 #. 00; -#. 00. , 1. 9 • 12 00000; -00000 00012
Configuration Files and the. kettle Directory • ■■. spoonrc • ■■ jdbc. properties • ■■ kettle. pwd • ■■ repositories. xml • ■■ shared. xml
Литература • Pentaho® Kettle Solutions Building Open Source ETL Solutions with Pentaho Data Integration Matt Casters, Roland Bouman, Jos van Dongen • Pentaho 3. 2 Data Integration Beginner's Guide María Carina Roldán • Pentaho Data Integration 4 Cookbook Adrián Sergio Pulvirenti, María Carina Roldán • Pentaho® Solutions Business Intelligence and Data Warehousing with Pentaho and My. SQL® Roland Bouman, Jos van Dongen
- Slides: 43