Machine Learning For Dummies
The real world of machine learning relies heavily on huge datasets. Imagine trying to wend your way through the enormous amount of data generated just by the sales a large online retailer makes every day. The point is that you need products that help you manage these huge datasets in a manner that makes them easier to work with and faster to process. This is where Spark comes in. It relies on cluster computing: the work is split up and spread across a group of machines rather than handled by a single computer.

The emphasis of Spark is speed. When you visit the Spark site, you’re greeted by statistics, such as Spark’s claimed capability to process data in memory up to a hundred times faster than other products, such as Hadoop MapReduce. However, Spark also offers flexibility in that it works with Java, Scala, Python, and R, and it runs on both Windows and UNIX-like systems, either standalone or on a cluster manager such as Hadoop YARN or Kubernetes. You can even run Spark in the cloud if you want.

Spark works with huge datasets, so you need to know programming languages, database management, and other developer techniques to use it. As a result, the Spark learning curve can be quite steep, and you need to set aside time for the developers on your team to learn it. The simple examples at Spark’s website give you some idea of just what is involved. Notice that all the examples include some level of coding, so you really do need programming skills to use this option.
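
To give a rough feel for the kind of code involved, here is a minimal sketch in plain Python. It uses only the standard library and is not Spark's actual API. It mimics the pattern a cluster framework applies: split the data into partitions, process each partition independently, and then combine the partial results. The function names, the sample sales records, and the use of threads as stand-ins for cluster machines are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales records as (product, quantity); invented for illustration.
SALES = [
    ("book", 2), ("lamp", 1), ("book", 1),
    ("pen", 5), ("lamp", 3), ("pen", 1),
]


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def total_per_product(partition):
    """'Map' step: sum the quantities found in one partition."""
    totals = Counter()
    for product, quantity in partition:
        totals[product] += quantity
    return totals


def combine(partials):
    """'Reduce' step: merge the per-partition totals into one result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


def distributed_totals(records, num_partitions=3):
    """Split, process each partition concurrently, then combine."""
    partitions = split_into_partitions(records, num_partitions)
    # Threads stand in for cluster machines; a real cluster uses separate nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(total_per_product, partitions))
    return combine(partials)


if __name__ == "__main__":
    print(distributed_totals(SALES))
```

A real Spark program expresses the same split, process, and combine steps through Spark's own API and lets the framework decide how to distribute the partitions, which is where most of the learning effort goes.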

About This Article

This article is from the book: Machine Learning For Dummies

About the book authors:

John Paul Mueller is a prolific freelance author and technical editor. He's covered everything from networking and home security to database management and heads-down programming.

Luca Massaron is a data scientist who specializes in organizing and interpreting big data, turning it into smart data with data mining and machine learning techniques.
