10 Hadoop Resources Worthy of a Bookmark

By Dirk deRoos

The following ten Hadoop resources are worth bookmarking in your browser. Together, they can help you build a lifelong learning plan for Hadoop.

Central nervous system: Apache.org

The Apache Software Foundation (ASF) is the central community for open source software projects. Not just any project can be an Apache project — many consensus-driven processes convert a piece of software from its initial designs and beta code (its incubator status) to full-fledged, generally available software.

The ASF isn’t just where projects like Hadoop are managed — it’s where they “live and breathe.” Today, there are hundreds of Apache projects. With this in mind, you should bookmark the Apache Hadoop page as one of your mainstay learning resources. This site is important because you can access the source code there.

Tweet this

Twitter isn’t the place to learn Hadoop per se — after all, you can’t easily master MapReduce programming in lessons that span only 140 characters. Be that as it may, quite a number of big data gurus are on Twitter, and they express opinions and point to resources that can make you a smarter Hadoop user.

A number of top-influencer lists in the Twitter landscape cover Hadoop and big data. Browsing these lists is the best way to find these Hadoop personalities and add them to your own Twitter lists.

Hortonworks University

Hortonworks University provides Hadoop training and certifications. The site offers Hadoop courses built for either administrator or developer practitioners with the option of a rigorous certification program. Hortonworks employs some of the deepest and most noted Hadoop experts in the world, so you’re assured of quality expertise behind the courseware.

Cloudera University

Cloudera University is similar in its business model and charter to Hortonworks University, providing a number of learning avenues that run the gamut from traditional text to video. Cloudera is a prominent fixture in the Hadoop world. (Doug Cutting, the “father” of Hadoop, is its chief architect.) The site offers an extensive set of courses, and more, which are based on Cloudera’s Distribution Including Apache Hadoop (CDH).



BigDataUniversity.com

BigDataUniversity.com (the case doesn’t matter when you enter the URL in your browser) is a fantastic resource for learning about, you guessed it, big data. Of course, big data isn’t just Hadoop, so you’ll find more than Hadoop resources at this site. This university has over 100,000 students enrolled and learning about Hadoop and big data every day.

planet Big Data Blog Aggregator

It’s great when the name of a site tells you exactly what it does — like planet Big Data Blog Aggregator: It’s an aggregator of blogs about big data, Hadoop, and other related topics on the planet (well, on Planet Earth anyway).

Both big names and no-names show up on the site, and that’s helpful: Cloudera, Hortonworks, IBM, and others are undoubtedly committed to Hadoop, but it’s often refreshing and valuable to include in your learning roadmap the thoughts and opinions of grassroots practitioners and communities not tied to a specific vendor.

Quora’s Apache Hadoop forum

The Quora Apache Hadoop forum is the cornerstone for anyone looking to find out more about Hadoop, or about big data in general, for that matter.

As in any forum, the range of questions and answers you can find at this site is dizzying, but they all lead you to what you’re looking for: knowledge. The site links to Hadoop and to its individual components; for example, it has specific forums for MapReduce, HDFS, Pig, HBase, and more.

The site also has associated Hadoop forums; for example, Cloudera and Hortonworks have specific discussion groups for their distributions — a testament to how popular this forum is.

The IBM Big Data Hub

The IBM Big Data Hub is an excellent place to learn about Hadoop and its ecosystem. Despite being owned and operated by IBM, this site’s content isn’t always linked with IBM products.

The IBM Big Data Hub offers enough material to quench any visitor’s thirst for big data knowledge. You’ll find all sorts of blogs, videos, analysts’ articles, use cases, infographics, presentations, and more. It’s truly a treasure trove of big data resources.

Conferences not to be missed

There are many Hadoop conferences, and even more big data conferences, but the Hadoop Summit and Strata Hadoop World are the quintessential conferences not to be missed. Typically, a distribution vendor co-sponsors these conferences. For example, Yahoo! and Hortonworks sponsor the Hadoop Summit, and Cloudera is the co-sponsor of Strata Hadoop World.

Both Strata Hadoop World and the Hadoop Summit are the gathering places of the brightest Hadoop minds in the business; these conferences attract a wide array of Hadoop-interested professionals, including decision makers, architects, developers, analysts, and more.

The Google papers that started it all

What is now known as Hadoop has its genesis in a number of papers written by Google employees who were focused on the problem of indexing the Web.

While the Apache Nutch project (an open source technology for crawling the Web) was turning its focus to scaling out in order to index higher volumes of web data, Google published a paper, “The Google File System” (October 2003), which greatly influenced Doug Cutting and his Nutch co-founder, Mike Cafarella. Shortly after, Google released its paper “MapReduce: Simplified Data Processing on Large Clusters” (December 2004).

Cutting and Cafarella took these two concepts, a distributed file system and a large-scale parallel processing framework, and used them together to develop Apache Hadoop. Of course, Cutting commercialized this work while at Yahoo!, and the rest, as they say, is history.