Data Lakes For Dummies Cheat Sheet

Alan Simpson

Updated 2022-02-25 17:43:49

From the book Data Lakes For Dummies

A data lake is an enterprise-scale home for analytical data from all corners of your company or governmental agency. No matter what your analytical data landscape looks like today, your organization will benefit from building a data lake.

Five phases to building a data lake

Your data lake journey begins with a thorough understanding of today's analytics and data throughout your entire organization. From there, you methodically progress from conceptual, high-level activities into implementation activities. Follow these five phases, whose first letters spell out A LAKE:

ASSESS your current state, and score the results.
Prepare a LOFTY VISION for what your data lake will bring you, both technology-wise and in terms of business value.
Decide on your data lake ARCHITECTURE, starting at the conceptual level and then shifting into specific products and services.
Begin with your KICKOFF ACTIVITIES that will deliver the first end-to-end data pipelines that culminate in high-value analytics.
Progressively EXPAND your data lake through subsequent phases.

Three types of data for a data lake

If you’ve been working primarily with traditional data warehouses and data marts, you’re in for a treat. Not only will your data lake include the structured data that you’re used to working with, but you’ll also ingest, manage, and deliver:

Semi-structured data, such as tweets, blog posts, and email messages
Unstructured data, such as photos, videos, and audio files

Your next generation of analytics will be built from the fusion of these various types of data. Sometimes, the insights you need aren’t just in the numbers or the stats, but in what you can learn from these other forms of data.
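To make the three categories concrete, here is a minimal sketch of how an ingestion step might tag incoming files as structured, semi-structured, or unstructured. The extension-to-type table and the `classify` function are hypothetical illustrations, not part of any product. Many real pipelines inspect file content or source metadata rather than trusting extensions.

```python
from pathlib import PurePosixPath

# Hypothetical routing table: maps a file extension to one of the three data
# types described above. Real lakes often inspect content or metadata instead.
DATA_TYPE_BY_EXTENSION = {
    # Structured: rows and columns, like warehouse and data mart tables
    ".csv": "structured",
    ".parquet": "structured",
    # Semi-structured: self-describing but flexible (tweets, blog posts, email)
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".html": "semi-structured",
    ".eml": "semi-structured",
    # Unstructured: photos, videos, and audio files
    ".jpg": "unstructured",
    ".png": "unstructured",
    ".mp4": "unstructured",
    ".mp3": "unstructured",
}


def classify(path: str) -> str:
    """Return the data type for a file path, or 'unknown' if unmapped."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive match
    return DATA_TYPE_BY_EXTENSION.get(suffix, "unknown")


if __name__ == "__main__":
    for name in ["sales/2022-01.csv", "social/tweets.json", "media/promo.MP4", "notes.txt"]:
        print(f"{name}: {classify(name)}")
```

The "unknown" fallback matters in practice. Rather than rejecting unfamiliar files, a data lake can land them anyway and sort them out later.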

Four zones inside a data lake

From the 30,000-foot view, your data lake appears to be a large store of all types of data. When you peel the lid back, though, your data lake should be well organized into the following zones:

The bronze zone, where you ingest your raw data into inexpensive storage that is infinitely expandable . . . or at least pretty close to infinitely expandable!
The silver zone, where you store your formerly raw data that is now cleansed and enriched
The gold zone, where you store curated packages of data that are prepared to support users and analytical needs all across your enterprise
The sandbox, where you can quickly place data from elsewhere in your data lake — or even new data coming in from the outside — for experimental or short-term analysis
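The flow between zones can be sketched in a few lines. In this minimal, hypothetical sketch, in-memory lists and dicts stand in for the zones, which in a real lake would usually be separate areas of cloud object storage. The `DataLake` class, its method names, and the sales-record fields (`region_code`, `amount`) are illustrative assumptions. Bronze keeps everything as it arrived. Silver drops records that can't be repaired and enriches the rest. Gold holds a curated summary. The sandbox receives ad hoc copies for experiments.

```python
from dataclasses import dataclass, field
from typing import Optional


def to_amount(value) -> Optional[float]:
    """Parse a raw amount; return None if it is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for the four zones."""
    bronze: list = field(default_factory=list)   # raw data, exactly as ingested
    silver: list = field(default_factory=list)   # cleansed and enriched data
    gold: dict = field(default_factory=dict)     # curated packages for users
    sandbox: dict = field(default_factory=dict)  # short-term experiments

    def ingest(self, record: dict) -> None:
        # Bronze keeps everything, including records that will fail cleansing.
        self.bronze.append(dict(record))

    def refine(self, region_names: dict) -> None:
        # Silver: rebuild from bronze, skipping records we cannot repair.
        self.silver = []
        for raw in self.bronze:
            amount = to_amount(raw.get("amount"))
            code = raw.get("region_code")
            if amount is None or not code:
                continue
            # Enrichment: add a readable region name from a reference table.
            self.silver.append({
                "region_code": code,
                "region": region_names.get(code, "Unknown"),
                "amount": amount,
            })

    def curate(self) -> None:
        # Gold: one curated package, here total sales per region.
        totals: dict = {}
        for row in self.silver:
            totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
        self.gold["sales_by_region"] = totals

    def copy_to_sandbox(self, name: str, rows: list) -> None:
        # Sandbox: an independent copy, so experiments can't alter other zones.
        self.sandbox[name] = [dict(r) for r in rows]


if __name__ == "__main__":
    lake = DataLake()
    for rec in [{"region_code": "E", "amount": "120.50"},
                {"region_code": "W", "amount": "80"},
                {"region_code": "E", "amount": "oops"},
                {"amount": "10"}]:
        lake.ingest(rec)
    lake.refine({"E": "East", "W": "West"})
    lake.curate()
    print(lake.gold["sales_by_region"])  # {'East': 120.5, 'West': 80.0}
```

Silver is rebuilt from bronze rather than updated in place. Because bronze is never modified, you can always reprocess the raw data if your cleansing rules change.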

Supporting an entire analytics continuum

Your data lake will support a broad range of analytics in a coordinated, well-architected manner. Prepare to make use of:

Descriptive analytics, which tell you what happened in the past or what’s happening right now
Diagnostic analytics, which dig into your descriptive analytics and help you understand why something happened or is happening
Predictive analytics, which tell you what’s likely to happen
Discovery analytics, in which you turn your analytical power loose on mountains of data to surface interesting and important patterns and other insights, without your having to ask specific questions
Prescriptive analytics, which take all your other categories of analytics to the last mile and guide you to decision-making, present you with alternatives for taking action, and make a recommendation for your “best” course of action
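As a rough illustration of how the categories build on one another, the sketch below runs descriptive, diagnostic, predictive, and a toy prescriptive step over the same small data set. The `SALES` figures are made up. The least-squares trend line is just one simple forecasting technique, chosen here for brevity. The "recommend the weakest region" rule is a hypothetical stand-in for real prescriptive logic. Discovery analytics, which searches for patterns without a predefined question, is not shown.

```python
# Hypothetical monthly unit sales by region, for illustration only.
SALES = {
    "East": [100, 110, 120, 130],
    "West": [100, 95, 90, 85],
}


def descriptive(sales):
    """What happened: total units per month across all regions."""
    months = len(next(iter(sales.values())))
    return [sum(series[m] for series in sales.values()) for m in range(months)]


def diagnostic(sales):
    """Why it happened: each region's change from the first to the last month."""
    return {region: series[-1] - series[0] for region, series in sales.items()}


def predictive(series):
    """What is likely next: a least-squares linear trend, one month ahead."""
    n = len(series)
    if n < 2:
        return float(series[-1])  # too few points for a trend; repeat last value
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * n


def prescriptive(sales):
    """Toy recommendation: flag the region with the lowest forecast."""
    forecasts = {region: predictive(series) for region, series in sales.items()}
    return min(forecasts, key=forecasts.get), forecasts


if __name__ == "__main__":
    print(descriptive(SALES))   # [200, 205, 210, 215]
    print(diagnostic(SALES))    # {'East': 30, 'West': -15}
    print(prescriptive(SALES))  # ('West', {'East': 140.0, 'West': 80.0})
```

The descriptive totals rise steadily, and the diagnostic step shows that East's gains are masking West's decline. The forecast extends both trends, and the prescriptive step turns that forecast into a suggested place to act.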

About This Article

About the book author:

Alan Simpson is a web development professional and prolific tech author with more than 100 publications to his credit.

This article can be found in the category:

Databases


Explore Related content

Data Warehousing For Dummies

Database Development For Dummies

NoSQL For Dummies

Hadoop For Dummies

FileMaker Pro Design and Scripting For Dummies

Oracle 12c For Dummies

LINQ For Dummies

Records Management For Dummies
