Pentaho Data Integration – Kettle (CE)
I. Objectives
This course focuses on the use of Pentaho's ETL component, Pentaho Data Integration (PDI, also known as Kettle).
Upon completion of the course, the student will have sufficient knowledge to:
Install the product and the corresponding Java version.
Perform data transformation (ETL) from different data sources to different destinations.
II. Requirements
SQL knowledge.
Optional:
Knowledge of other ETL tools.
III. Duration
20 hours.
IV. Methodology
The course is developed through theoretical presentation accompanied by practical demonstrations and explanations of the results obtained.
Students complete hands-on exercises with the product for each concept explained. Different data sources are used, including text files, spreadsheets, and databases, mainly relational (MySQL).
Questions about the concepts presented are resolved as they arise.
V. Content
Introduction to Pentaho Community Edition and its components. Java requirements.
Java and PDI installation.
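As an orientation for the installation topic, the steps can be sketched as shell commands. This is a minimal sketch assuming Linux; the Java path, PDI archive name, and install directory are illustrative assumptions, not part of the course material.

```shell
# Minimal PDI CE setup sketch on Linux (paths and archive name are illustrative).
# PDI requires a suitable Java runtime; JAVA_HOME must point to it.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk   # assumed Java location
export PATH="$JAVA_HOME/bin:$PATH"

java -version                                   # verify Java is available

# Unpack the PDI CE archive (filename is hypothetical) and launch Spoon,
# the graphical designer for transformations and jobs.
unzip pdi-ce-9.4.0.0-343.zip -d ~/pentaho
cd ~/pentaho/data-integration
./spoon.sh
```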
PDI (Pentaho Data Integration).
Databases:
Connections to databases.
Using shared connections.
Basic components of transformations:
Creating transformations.
Importing and exporting using tables.
Importing and exporting from plain text files (CSV, Excel, XML, etc.).
Using the calculator and formulas.
Selecting columns and filtering data.
Using lookup, group by, split, and pivot.
Merge join.
Mapping.
Jobs, variables and properties:
Modifying Kettle properties.
Creating jobs.
Using variables.
Using parameters.
Job flow and error management.
Additional elements:
Using PDI from the terminal.
Running code in the database.
Dynamic processing of files.
Moving files in the operating system.
Writing to the PDI log.
Waiting for files (filewatcher).
Checking for file existence.
Checking for table existence (database).
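The terminal-usage topic above can be sketched with PDI's command-line runners: Pan executes transformations (.ktr) and Kitchen executes jobs (.kjb). The file paths and the parameter name below are hypothetical examples, not files provided by the course.

```shell
# Sketch of running PDI from the terminal, from the data-integration directory.
# File names and the INPUT_DIR parameter are hypothetical.

# Run a transformation (.ktr) with basic logging:
./pan.sh -file=/home/etl/load_customers.ktr -level=Basic

# Run a job (.kjb), passing a named parameter and writing the log to a file:
./kitchen.sh -file=/home/etl/nightly_load.kjb \
             -param:INPUT_DIR=/data/incoming \
             -level=Detailed -logfile=/var/log/etl/nightly.log

# The exit code indicates success (0) or failure (non-zero), which is
# useful when scheduling runs with cron.
echo $?
```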
Practices with transformations and jobs.