Machine Learning: Using Spark to Deal with Massive Data

By: John Paul Mueller and Luca Massaron

Updated: 10-04-2016

From The Book: Machine Learning For Dummies

The real world of machine learning relies heavily on huge datasets. Imagine trying to wend your way through the enormous volume of data generated just by the sales that Amazon.com makes every day. You need products that help you manage these huge datasets in a manner that makes them easier to work with and faster to process. This is where Spark comes in. It relies on cluster computing: it spreads both the data and the processing across a group of networked computers.
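
To make the clustering idea concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use Spark's API; it only imitates the pattern of splitting a dataset into partitions, processing each partition independently, and merging the partial results. The function names, the sales records, and the use of threads in place of cluster machines are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_partition(partition):
    """Per-worker step: total the units sold per product in one partition."""
    totals = Counter()
    for product, units in partition:
        totals[product] += units
    return totals


def total_units_by_product(records, num_partitions=3):
    """Process each partition independently, then merge the partial totals."""
    partitions = split_into_partitions(records, num_partitions)
    # Assumption: threads stand in for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_totals = list(pool.map(count_partition, partitions))
    merged = Counter()
    for partial in partial_totals:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


# Hypothetical sales records: (product, units sold).
sales = [("book", 2), ("lamp", 1), ("book", 1), ("pen", 5), ("lamp", 3), ("pen", 1)]
totals = total_units_by_product(sales)
print(totals)
```

Because each partition is handled without reference to the others, the same logic could in principle run on separate machines, which is the property a cluster framework exploits at a much larger scale.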

The emphasis of Spark is speed. When you visit the site, you’re greeted by statistics, such as the claim that Spark can process data up to a hundred times faster than Hadoop MapReduce (see the Hadoop MapReduce tutorial) when working in memory. Spark also offers flexibility: it works with Java, Scala, Python, and R, and it runs anywhere a supported Java runtime is available. You can even run Spark in the cloud if you want.

Spark works with huge datasets, which means that you need to know programming languages, database management, and other developer techniques to use it. As a result, the Spark learning curve can be quite steep, and you need to give the developers on your team time to learn it. The simple examples at Spark’s website give you some idea of just what is involved. Notice that all the examples include some level of coding, so you really do need programming skills to use this option.

About This Article

This article is from the book:

Machine Learning For Dummies

About the book authors:

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE), interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.

This article can be found in the category:

Machine Learning