Leveraging Singular Value Decomposition for Predictive Analytics

Mohamed Chaouchi

Tommy Jung

Anasse Bari

Updated

2016-11-29 16:00:18

From the book

Predictive Analytics For Dummies


You can leverage singular value decomposition for predictive analytics. Singular value decomposition (SVD) represents a dataset by eliminating its less important parts and generating an accurate approximation of the original dataset. In this regard, SVD and principal component analysis (PCA) are both methods of data reduction.

SVD will take a matrix as an input and decompose it into a product of three simpler matrices.

An m by n matrix M can be represented as a product of three other matrices as follows:

M = U * S * V^T

Here U is an m by r matrix, V is an n by r matrix, and S is an r by r diagonal matrix that holds the singular values of M, where r is the rank of M. The * represents matrix multiplication, and ^T indicates matrix transposition.
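The decomposition can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the matrix values and the threshold used to decide which singular values count as zero are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical 4-by-3 data matrix (values chosen only for illustration).
# Its third column is the sum of the first two, so its rank is 2.
M = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
])

# full_matrices=False returns the "thin" SVD; s_all is sorted in descending order.
U_all, s_all, Vt_all = np.linalg.svd(M, full_matrices=False)

# Treat singular values below a small relative threshold as zero (an assumption).
# The number of remaining singular values is the rank r.
r = int(np.sum(s_all > 1e-10 * s_all.max()))

U = U_all[:, :r]         # m by r
S = np.diag(s_all[:r])   # r by r, diagonal
Vt = Vt_all[:r, :]       # r by n, this is V^T

M_rebuilt = U @ S @ Vt   # U * S * V^T

print("rank r:", r)
print("shapes:", U.shape, S.shape, Vt.shape)
print("reconstruction matches M:", np.allclose(M, M_rebuilt))
```

Note that NumPy returns V^T directly (named `Vt` here) and returns the singular values as a one-dimensional array, which is why the sketch builds S with `np.diag`.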

When a few concepts can describe the data in a matrix, or can relate the matrix’s columns to its rows, SVD is a useful tool for extracting those concepts. For example, a dataset might contain book ratings, where the reviewers are the rows and the books are the columns. The books can be grouped by type or domain, such as literature and fiction, history, biographies, and children’s or teen books. Those groups are the concepts that SVD can help extract.
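To make the book-ratings example concrete, here is a small sketch under stated assumptions: the ratings are invented, books 0 and 1 stand in for one genre and books 2 and 3 for another, and the number of concepts k is fixed at 2. Each book and each row of the matrix is assigned to the concept on which it loads most strongly.

```python
import numpy as np

# Hypothetical ratings matrix: one row per reviewer, one column per book.
# Books 0-1 stand in for fiction and books 2-3 for history (assumed labels).
ratings = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 5.0, 5.0],
])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

k = 2  # number of concepts to keep (an assumption for this toy data)

# Row i of Vt[:k] holds how strongly each book loads on concept i.
book_concept = np.argmax(np.abs(Vt[:k, :]), axis=0)

# Column i of U[:, :k] holds how strongly each reviewer loads on concept i.
reviewer_concept = np.argmax(np.abs(U[:, :k]), axis=1)

print("singular values:", np.round(s, 2))
print("concept index for each book:", book_concept)
print("concept index for each reviewer:", reviewer_concept)
```

Real ratings data is rarely this cleanly separated, so the loadings would usually be mixed and the concepts would need interpretation rather than a simple argmax.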

These concepts must be meaningful and conclusive. If you keep only a few concepts or dimensions to describe a larger dataset, your approximation will be less accurate. That is why it's important to eliminate only the concepts that are less important and less relevant to the overall dataset.
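The trade-off between the number of concepts kept and the accuracy of the approximation can be measured directly. The sketch below is illustrative only: the matrix is made up, and `rank_k_approximation` is a hypothetical helper name, not part of NumPy. It keeps the k largest singular values and reports the Frobenius-norm error for each k.

```python
import numpy as np

def rank_k_approximation(M, k):
    """Rebuild M from only its k largest singular values (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Hypothetical 5-by-4 data matrix (values chosen only for illustration).
M = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 5.0, 1.0, 1.0],
])

singular_values = np.linalg.svd(M, compute_uv=False)

# Approximation error for k = 1, 2, ... concepts.
# np.linalg.norm on a 2-D array defaults to the Frobenius norm.
errors = []
for k in range(1, len(singular_values) + 1):
    errors.append(float(np.linalg.norm(M - rank_k_approximation(M, k))))

# Share of the total squared singular values captured by the first k concepts.
energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)

for k, (err, share) in enumerate(zip(errors, energy), start=1):
    print(f"k={k}: error={err:.4f}, share retained={share:.4f}")
```

The error shrinks as k grows and reaches zero (up to rounding) once every singular value is kept. A common heuristic, not prescribed by this article, is to pick the smallest k whose retained share passes a chosen threshold.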

Latent semantic indexing is a data mining and natural language processing technique used in document retrieval and word similarity. It employs SVD to group documents by concepts, where each concept can consist of different words found in those documents. The universe of words can be very large, and various words can be grouped into a single concept. SVD helps reduce the noisy correlation between those words and their documents, and it gives you a representation of that universe using far fewer dimensions than the original dataset.

It is easy to see that documents discussing similar topics can use different words to describe those topics. A document describing lions in Zimbabwe and another document describing elephants in Kenya should be grouped together. So you rely on concepts (wildlife in Africa, in this case), not words, to group these documents. The relation between documents and their words is established with those concepts or topics.
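The wildlife example can be sketched with a tiny term-document matrix. Everything here is an assumption for illustration: the vocabulary, the four documents, the shared context word "safari" that links the two wildlife documents, and the choice of k = 2 concepts. The sketch compares cosine similarity on raw word counts with cosine similarity in the reduced concept space.

```python
import numpy as np

# Hypothetical term-document counts: one row per term, one column per document.
# d0: lions in Zimbabwe, d1: elephants in Kenya, d2 and d3: finance documents.
terms = ["lion", "zimbabwe", "elephant", "kenya", "safari", "stock", "market", "bank"]
A = np.array([
    [1, 0, 0, 0],  # lion
    [1, 0, 0, 0],  # zimbabwe
    [0, 1, 0, 0],  # elephant
    [0, 1, 0, 0],  # kenya
    [1, 1, 0, 0],  # safari (shared context word, an assumption)
    [0, 0, 1, 0],  # stock
    [0, 0, 1, 1],  # market
    [0, 0, 1, 1],  # bank
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of concepts to keep (an assumption for this toy corpus)

# Each row is one document expressed in the k-dimensional concept space.
doc_concepts = (np.diag(s[:k]) @ Vt[:k, :]).T

raw_sim = cosine(A[:, 0], A[:, 1])                       # word space, d0 vs d1
concept_sim = cosine(doc_concepts[0], doc_concepts[1])   # concept space, d0 vs d1
cross_sim = cosine(doc_concepts[0], doc_concepts[2])     # concept space, d0 vs d2

print("d0 vs d1 on raw words:", round(raw_sim, 3))
print("d0 vs d1 on concepts:", round(concept_sim, 3))
print("d0 vs d2 on concepts:", round(cross_sim, 3))
```

On raw counts the two wildlife documents share only one of their three words, so their similarity is 1/3. In the two-concept space they land on the same concept axis, while the finance document lands on the other.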

SVD and PCA have been used in classification and clustering. Generating those concepts is just a form of classifying and grouping the data. Both techniques have also been used for collaborative filtering.

About This Article

About the book author:

Mohamed Chaouchi is a veteran software engineer who has conducted extensive research using data mining methods.

Tommy Jung is a software engineer with expertise in enterprise web applications and analytics.

Anasse Bari, Ph.D. is a data science expert and a university professor who has many years of predictive modeling and data analytics experience.

This article can be found in the category:

General Data Science
