Machine Learning For Dummies
Book image
Explore Book Buy On Amazon
The real world of machine learning relies heavily on huge datasets. Imagine trying to wend your way through the enormous data generated just by the sales made by Amazon.com every day. The point is that you need products that help you manage these huge datasets in a manner that makes them easier to work with and faster to process. This is where Spark comes in. It relies on a clustering technique.

The emphasis of Spark is speed. When you visit the site, you’re greeted by statistics, such as Spark’s capability to process data a hundred times faster than other products, such as Hadoop MapReduce (see the tutorial) in memory. However, Spark also offers flexibility in that it works with Java, Scala, Python, and R, and it runs on any platform that supports Apache. You can even run Spark in the cloud if you want.

Spark works with huge datasets, which means that you need to know programming languages, database management, and other developer techniques to use it. This means that the Spark learning curve can be quite high, and you need to provide time for developers on your team to learn it. The simple examples at Spark’s website give you some ideas of just what is involved. Notice that all the examples include some level of coding, so you really do need to have programming skills to use this option.

About This Article

This article is from the book:

About the book authors:

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE),??interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE),??interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.

This article can be found in the category: