The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data

Paperback

Authors: Ralph Kimball, Joe Caserta

ISBN-10: 0764567578

ISBN-13: 9780764567575

Category: Data Warehousing & Mining

The single most authoritative guide on the most difficult phase of building a data warehouse

The extract, transform, and load (ETL) phase of the data warehouse development life cycle is far and away the most difficult, time-consuming, and labor-intensive phase of building a data warehouse. Done right, companies can maximize their use of data storage; if not, they can end up wasting millions of dollars storing obsolete and rarely used data. Bestselling author Ralph Kimball, along with Joe Caserta, shows you how a properly designed ETL system extracts the data from the source systems, enforces data quality and consistency standards, conforms the data so that separate sources can be used together, and finally delivers the data in a presentation-ready format.

Serving as a road map for planning, designing, building, and running the back-room of a data warehouse, this book provides complete coverage of proven, timesaving ETL techniques. Beginning with a quick overview of ETL fundamentals, it then looks at ETL data structures, both relational and dimensional. The authors show how to build useful dimensional structures, providing practical examples of techniques. Along the way you’ll learn how to:

Plan and design your ETL system
Choose the appropriate architecture from the many possible options
Build the development/test/production suite of ETL processes
Build a comprehensive data cleaning subsystem
Tune the overall ETL process for optimum performance

Pt. I Requirements, realities, and architecture
Ch. 1 Surrounding the requirements
Ch. 2 ETL data structures

Pt. II Data flow
Ch. 3 Extracting
Ch. 4 Cleaning and conforming
Ch. 5 Delivering dimension tables
Ch. 6 Delivering fact tables

Pt. III Implementation and operations
Ch. 7 Development
Ch. 8 Operations
Ch. 9 Metadata
Ch. 10 Responsibilities

Pt. IV Real time streaming ETL systems
Ch. 11 Real-time ETL systems
Ch. 12 Conclusions