
Big Data For Dummies

By: Judith S. Hurwitz and Alan Nugent Published: 04-15-2013

Find the right big data solution for your business or organization

Big data management is one of the major challenges facing business, industry, and not-for-profit organizations. Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work.

  • Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals
  • Authors are experts in information management, big data, and a variety of solutions
  • Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more
  • Provides essential information in a no-nonsense, easy-to-understand, empowering style

Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.

Articles From Big Data For Dummies

116 results
Big Data For Dummies Cheat Sheet

Cheat Sheet / Updated 02-09-2022

To stay competitive today, companies must find practical ways to deal with big data — that is, to learn new ways to capture and analyze growing amounts of information about customers, products, and services. Data is becoming increasingly complex in structured and unstructured ways. New sources of data come from machines, such as sensors; social business sites; and website interaction, such as click-stream data. Meeting these changing business requirements demands that the right information be available at the right time.

Integrate Big Data with the Traditional Data Warehouse

Article / Updated 03-26-2016

While the worlds of big data and the traditional data warehouse will intersect, they are unlikely to merge anytime soon. Think of a data warehouse as a system of record for business intelligence, much like a customer relationship management (CRM) or accounting system. These systems are highly structured and optimized for specific purposes. In addition, these systems of record tend to be highly centralized. The diagram shows a typical approach to data flows with warehouses and marts.

Organizations will inevitably continue to use data warehouses to manage the type of structured and operational data that characterizes systems of record. These data warehouses will still provide business analysts with the ability to analyze key data, trends, and so on. However, the advent of big data is both challenging the role of the data warehouse and providing a complementary approach.

Think of the relationship between the data warehouse and big data as evolving into a hybrid structure. In this hybrid model, the highly structured, optimized operational data remains in the tightly controlled data warehouse, while data that is highly distributed and subject to change in real time is managed by a Hadoop-based (or similar NoSQL) infrastructure.

Operational and structured data will inevitably have to interact in the world of big data, where information sources have not necessarily been cleansed or profiled. Increasingly, organizations recognize a business requirement to combine their traditional data warehouses and historical business data with less structured, less vetted big data sources. A hybrid approach that supports both traditional and big data sources can help accomplish these business goals.
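To make the hybrid model concrete, here is a minimal Python sketch, assuming a small warehouse extract held in memory and a handful of raw clickstream events stored as JSON lines. The names `warehouse_customers`, `raw_events`, and `combine` are hypothetical; a real deployment would read from the warehouse and from a Hadoop-style store rather than from Python literals. The cleansed records stay authoritative, while the unvetted events are matched against them, and events that fail to match are counted instead of merged.

```python
import json
from collections import defaultdict

# Hypothetical warehouse extract: structured, cleansed customer records
# keyed by customer_id.
warehouse_customers = {
    101: {"name": "Acme Corp", "segment": "Enterprise"},
    102: {"name": "Beta LLC", "segment": "SMB"},
}

# Hypothetical raw clickstream events (JSON lines). Fields may be missing
# because this source has not been cleansed or profiled.
raw_events = [
    '{"customer_id": 101, "page": "/pricing"}',
    '{"customer_id": 101, "page": "/docs"}',
    '{"customer_id": 102, "page": "/pricing"}',
    '{"customer_id": 999, "page": "/home"}',
    '{"page": "/home"}',
]

def combine(customers, events):
    """Attach per-customer page-view counts to the warehouse records.

    Events whose customer_id is absent or unknown are tallied separately
    instead of being merged into the system of record.
    """
    views = defaultdict(int)
    unmatched = 0
    for line in events:
        event = json.loads(line)
        cid = event.get("customer_id")
        if cid in customers:
            views[cid] += 1
        else:
            unmatched += 1
    combined = {
        cid: {**record, "page_views": views[cid]}
        for cid, record in customers.items()
    }
    return combined, unmatched

combined, unmatched = combine(warehouse_customers, raw_events)
print(combined[101]["page_views"], unmatched)  # 2 2
```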

How to Analyze Big Data to Get Results

Article / Updated 03-26-2016

Big data is most useful if you can do something with it, but how do you analyze it? Companies like Amazon and Google are masters at analyzing big data, and they use the resulting knowledge to gain a competitive advantage. Just think about Amazon's recommendation engine. The company combines your buying history with what it knows about you, your buying patterns, and the buying patterns of people like you to come up with some pretty good suggestions. It's a marketing machine, and its big data analytics capabilities have made it extremely successful.

The ability to analyze big data provides unique opportunities for your organization as well. You'll be able to expand the kind of analysis you can do: instead of being limited to samples of large data sets, you can now use much more detailed and complete data. However, analyzing big data can also be challenging. With big data, even basic data analysis often requires changes in algorithms and technology.

The first question to ask yourself before you dive into big data analysis is: What problem are you trying to solve? You may not even be sure of what you are looking for. You know you have lots of data that you think can yield valuable insight, and patterns can certainly emerge from that data before you understand why they are there. If you think about it, though, you're sure to have an idea of what you're interested in. For instance, are you interested in predicting customer behavior to prevent churn? Do you want to analyze the driving patterns of your customers for insurance premium purposes? Are you interested in looking at your system log data to predict when problems might occur? The high-level problem you choose will drive the analytics you decide to use. Alternatively, if you're not exactly sure of the business problem you're trying to solve, look at areas in your business that need improvement.
Even an analytics-driven strategy targeted at the right area can provide useful results with big data. When it comes to analytics, you might consider a range of possible kinds, which are briefly outlined in the table.

Analysis type | Description
Basic analytics for insight | Slicing and dicing of data, reporting, simple visualizations, basic monitoring
Advanced analytics for insight | More complex analysis, such as predictive modeling and other pattern-matching techniques
Operationalized analytics | Analytics become part of the business process
Monetized analytics | Analytics are used to directly drive revenue
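As an illustration of the first row, basic analytics for insight, here is a minimal Python sketch of slicing (keeping only rows that match a filter) and dicing (aggregating a measure across one or more dimensions) over a few hypothetical sales records. The field names and the `slice_by` and `slice_and_dice` helpers are assumptions made for this example, not part of any particular analytics tool.

```python
from collections import defaultdict

# Hypothetical sales records; field names are illustrative only.
sales = [
    {"region": "East", "product": "A", "amount": 120.0},
    {"region": "East", "product": "B", "amount": 80.0},
    {"region": "West", "product": "A", "amount": 200.0},
    {"region": "West", "product": "A", "amount": 50.0},
]

def slice_and_dice(records, *dimensions, measure="amount"):
    """Sum a measure grouped by one or more dimensions."""
    totals = defaultdict(float)
    for row in records:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

def slice_by(records, **filters):
    """Keep only the rows that match every filter."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]

print(slice_and_dice(sales, "region"))
# {('East',): 200.0, ('West',): 250.0}
print(slice_and_dice(slice_by(sales, product="A"), "region"))
# {('East',): 120.0, ('West',): 250.0}
```

The same two helpers compose: filter first to narrow the question, then aggregate along whichever dimensions the report needs.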

Best Practices for Big Data Integration

Article / Updated 03-26-2016

Many companies are exploring big data problems and coming up with some innovative solutions. Now is the time to pay attention to some best practices, or basic principles, that will serve you well as you begin your big data journey.

In reality, big data integration fits into the overall process of integrating data across your company. Therefore, you can't simply toss aside everything you have learned from integrating traditional data sources. The same rules apply whether you are thinking about traditional data management or big data management. Keep these key issues at the top of your priority list for big data integration:

  • Keep data quality in perspective. Your emphasis on data quality depends on the stage of your big data analysis. Don't expect to be able to control data quality when you do your initial analysis on huge volumes of data. However, when you narrow down your big data to identify the subset that is most meaningful to your organization, that is when you need to focus on data quality. Ultimately, data quality becomes important if you want your results to be understood in context with your historical data. As your company relies more and more on analytics as a key planning tool, data quality can mean the difference between success and failure.
  • Consider real-time data requirements. Big data will bring streaming data to the forefront. Therefore, you need a clear understanding of how to integrate data in motion into your environment for predictable analysis.
  • Don't create new silos of information. While so much of the emphasis around big data is focused on Hadoop and on unstructured and semi-structured sources, remember that you have to manage this data in context with the business. You will therefore need to integrate these sources with your line-of-business data and your data warehouse.
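For the data quality point, here is a minimal Python sketch of checks you might apply once you have narrowed down to a meaningful subset. The `RULES` dictionary, its field names, and its thresholds are hypothetical; the idea is simply to count failures per field and keep only the rows that pass every rule.

```python
# Hypothetical quality rules for a narrowed-down subset of records.
# Field names and thresholds are assumptions for illustration.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "order_total": lambda v: isinstance(v, (int, float)) and 0 <= v < 1_000_000,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}

def quality_report(records, rules=RULES):
    """Count missing or invalid values per field and return the clean rows."""
    failures = {field: 0 for field in rules}
    clean = []
    for row in records:
        ok = True
        for field, check in rules.items():
            # A missing field counts as a failure for that field.
            if field not in row or not check(row[field]):
                failures[field] += 1
                ok = False
        if ok:
            clean.append(row)
    return failures, clean

subset = [
    {"customer_id": 1, "order_total": 250.0, "country": "US"},
    {"customer_id": -5, "order_total": 90.0, "country": "DE"},
    {"customer_id": 2, "order_total": 40.0},
]
failures, clean = quality_report(subset)
print(failures)    # {'customer_id': 1, 'order_total': 0, 'country': 1}
print(len(clean))  # 1
```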

Big Data Planning Stages

Article / Updated 03-26-2016

Four stages are part of the planning process that applies to big data. As more businesses begin to use the cloud to deploy new and innovative services to customers, the role of data analysis will explode. Therefore, extend your planning process by adding three more stages to your data cycle, for seven stages in all:

  • Stage 1: Planning with data. The only way to make sure that business leaders take a balanced perspective on all the elements of the business is to have a clear understanding of how data sources are related. The business needs a road map for determining what data is needed to plan for new strategies and new directions.
  • Stage 2: Doing the analysis. Executing on big data analysis requires learning a set of new tools and new skills. Many organizations will need to hire big data scientists who can take this massive amount of disparate data and begin to understand how all the data elements relate in the context of the business problem or opportunity.
  • Stage 3: Checking the results. Make sure you aren't relying on data sources that will take you in the wrong direction. Many companies use third-party data sources and may not take the time to vet the quality of the data, but you have to make sure that you are building on a strong foundation.
  • Stage 4: Acting on the plan. Each time a business initiates a new strategy, it is critical to establish a continuous big data business evaluation cycle. Acting on the results of big data analytics and then testing the results of executing the business strategy is the key to success.
  • Stage 5: Monitoring in real time. Big data analytics enables you to proactively monitor data in near real time. This can have a profound impact on your business. If you are a pharmaceutical company conducting a clinical trial, for example, you may be able to adjust or cancel a trial to avoid a lawsuit.
  • Stage 6: Adjusting the impact. When your company has the tools to monitor continuously, it can adjust processes and strategy based on data analytics. Being able to monitor quickly means that a process can be changed earlier, resulting in better overall quality.
  • Stage 7: Enabling experimentation. Combining experimentation with real-time monitoring and rapid adjustment can transform a business strategy. Experimentation carries less risk because you can change directions and outcomes more easily if you are armed with the right data.

The greatest challenge for the business is to look into the future and anticipate what might change and why. Companies want to make informed decisions faster and more efficiently, and to apply that knowledge to take action that can change business outcomes. Leaders also need to understand the nuances of business impacts across product lines and the partner ecosystem. The best businesses take a holistic approach to data.
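Stages 5 and 6 depend on noticing change quickly. As a rough illustration only, this Python sketch flags readings that stray too far from the average of the most recent values. The `RollingMonitor` class, its window size, and its tolerance are hypothetical choices; production monitoring would typically sit on a streaming platform and use more robust statistics.

```python
from collections import deque

class RollingMonitor:
    """Flag readings that deviate sharply from a recent rolling average.

    window and tolerance are hypothetical tuning parameters.
    """

    def __init__(self, window=5, tolerance=0.5):
        self.readings = deque(maxlen=window)  # oldest values drop off automatically
        self.tolerance = tolerance  # allowed relative deviation from the mean

    def observe(self, value):
        """Return True if value should trigger an alert, then record it."""
        alert = False
        # Only judge new values once the window is full.
        if len(self.readings) == self.readings.maxlen:
            mean = sum(self.readings) / len(self.readings)
            alert = abs(value - mean) > self.tolerance * abs(mean)
        self.readings.append(value)
        return alert

monitor = RollingMonitor(window=3, tolerance=0.5)
stream = [10, 11, 9, 10, 30, 10]
print([monitor.observe(v) for v in stream])
# [False, False, False, False, True, False]
```

An alert from a monitor like this is the trigger for Stage 6: the sooner the deviation is seen, the earlier the process can be adjusted.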

Ten Hot Big Data Trends

Article / Updated 03-26-2016

As you enter the world of big data, you'll need to absorb many new types of database and data-management technologies. Here are the top ten big data trends:

  • Hadoop is becoming the underpinning for distributed big data management. Hadoop is an open source framework built around a distributed file system (HDFS) that works in conjunction with MapReduce to process and analyze massive amounts of data, enabling the big data trend. Hadoop will be tightly integrated into data warehousing technologies so that structured and unstructured data can be integrated more effectively.
  • Big data makes it possible to leverage data from sensors to change business outcomes. More and more businesses are using highly sophisticated sensors on the equipment that runs their operations. New innovations in big data technology are making it possible to analyze all this data to get advance notification of problems that can be fixed to protect the business.
  • Big data can help turn a business initiative into real-time action that increases revenue. Companies in markets such as retail are using real-time streaming data analytics to keep track of customer actions and offer incentives to increase revenue per customer.
  • Big data can be integrated with historical data warehouses to transform planning. Big data can give a company a better understanding of massive amounts of data about its business. This information about the current state of the business can be combined with historical data to get a full view of the context for business change.
  • Big data can change the way diseases are managed by adding predictive analytics. Increasingly, healthcare practitioners are looking to big data solutions to gain insights into disease by comparing symptoms and test results to databases of results from hundreds of thousands of other cases. This allows practitioners to predict outcomes more quickly and save lives.
  • Cloud computing will transform the way data is managed in the future. Cloud computing is invaluable as a tool to support the expansion of big data. Increasingly, cloud services that are optimized for data will mean that many more services and delivery models will make big data more practical for companies of all sizes.
  • Security and governance will be the difference between success and failure for businesses leveraging big data. Big data can be a huge benefit, but it isn't risk-free. Companies will discover that, if they are not careful, big data analysis can expose private information. Companies need to balance the need to analyze results with best practices for security and governance.
  • Veracity, or truthfulness, of big data will become the most important issue for the coming year. Many companies can get carried away with the ability to analyze massive amounts of data and get back compelling results that predict business outcomes. Companies will find that the truthfulness of the data must become a top priority, or decision making will suffer.
  • As big data moves out of the experimental stage, more packaged offerings will be developed. Most big data projects initiated over the past few years have been experimental, as companies cautiously work with new tools and technology. Now big data is about to enter the mainstream, and lots of packaged big data offerings will flood the market.
  • Use cases and new, innovative ways to apply big data will explode. Early successes with big data in industries such as manufacturing, retail, and healthcare will lead many more industries to look for ways to leverage massive amounts of data to transform their businesses.
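The first trend mentions MapReduce. The following pure-Python sketch imitates the map, shuffle, and reduce phases of the classic word-count example on a single machine. It is not Hadoop code, and the function names are hypothetical; in a real cluster, the framework runs the map and reduce functions in parallel across many nodes and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Each string stands in for one input split stored on a different node.
splits = ["big data big value", "data in motion"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'in': 1, 'motion': 1}
```

Because each map call touches only its own split and each reduce call touches only one key, both phases can be spread across machines, which is what lets the pattern scale to massive data sets.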

Explore the Big Data Stack

Article / Updated 03-26-2016

To understand big data, it helps to see how it stacks up, that is, to lay out the components of the architecture. A big data management architecture must include a variety of services that enable companies to make use of myriad data sources in a fast and effective manner. Here's a closer look at what's in the image and the relationship between the components:

  • Interfaces and feeds: On either side of the diagram are interfaces and feeds into and out of both internally managed data and external data sources. To understand how big data works in the real world, start by understanding this necessity. What makes big data big is that it relies on picking up lots of data from lots of sources. Therefore, open application programming interfaces (APIs) will be core to any big data architecture. In addition, keep in mind that interfaces exist at every level and between every layer of the stack. Without integration services, big data can't happen.
  • Redundant physical infrastructure: The supporting physical infrastructure is fundamental to the operation and scalability of a big data architecture. Without the availability of robust physical infrastructures, big data would probably not have emerged as such an important trend. To support an unanticipated or unpredictable volume of data, a physical infrastructure for big data has to be different from that for traditional data. The physical infrastructure is based on a distributed computing model. This means that data may be physically stored in many different locations and linked together through networks, a distributed file system, and various big data analytic tools and applications.
  • Security infrastructure: The more important big data analysis becomes to companies, the more important it will be to secure that data. For example, if you are a healthcare company, you will probably want to use big data applications to determine changes in demographics or shifts in patient needs. This data about your constituents needs to be protected both to meet compliance requirements and to protect patients' privacy. You will need to take into account who is allowed to see the data and under what circumstances. You will need to be able to verify the identity of users as well as protect the identity of patients.
  • Operational data sources: When you think about big data, understand that you have to incorporate all the data sources that will give you a complete picture of your business and show how the data affects the way you operate it. Traditionally, an operational data source consisted of highly structured data managed by the line of business in a relational database. But as the world changes, operational data has to encompass a broader set of data sources.
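The security infrastructure component calls for controlling who sees which data and protecting patient identity. This Python sketch, assuming a hypothetical role-to-field policy (`FIELD_POLICY`) and a hypothetical secret key, shows two common building blocks: pseudonymizing an identifier with a keyed hash (HMAC-SHA-256 from the standard library) and filtering fields by role. It does not cover verifying the identity of users, which would be handled by a separate authentication system.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would live in a key-management
# system, never in source code.
PSEUDONYM_KEY = b"replace-with-managed-secret"

# Hypothetical policy: which fields each role may see.
FIELD_POLICY = {
    "analyst": {"patient_ref", "age_band", "diagnosis_code"},
    "clinician": {"patient_ref", "name", "age_band", "diagnosis_code"},
}

def pseudonymize(patient_id: str) -> str:
    """Replace a patient ID with a keyed hash so records can still be
    joined without exposing the real identifier."""
    digest = hmac.new(PSEUDONYM_KEY, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def view_record(record: dict, role: str) -> dict:
    """Return only the fields the caller's role is allowed to see."""
    allowed = FIELD_POLICY.get(role, set())  # unknown roles see nothing
    safe = {k: v for k, v in record.items() if k != "patient_id"}
    safe["patient_ref"] = pseudonymize(record["patient_id"])
    return {k: v for k, v in safe.items() if k in allowed}

record = {"patient_id": "P-0042", "name": "Jane Doe",
          "age_band": "40-49", "diagnosis_code": "E11"}
print(sorted(view_record(record, "analyst")))
# ['age_band', 'diagnosis_code', 'patient_ref']
```

Using a keyed hash rather than a plain hash means someone without the key cannot rebuild the mapping by hashing candidate patient IDs.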

Defining Big Data: Volume, Velocity, and Variety

Article / Updated 03-26-2016

Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time. To gain the right insights, big data is typically broken down by three characteristics:

  • Volume: How much data
  • Velocity: How fast data is processed
  • Variety: The various types of data

While it is convenient to simplify big data into the three Vs, doing so can be misleading and overly simplistic. For example, you may be managing a relatively small amount of very disparate, complex data, or you may be processing a huge volume of very simple data. That simple data may be all structured or all unstructured.

Even more important is the fourth V, veracity. How accurate is that data in predicting business value? Do the results of a big data analysis actually make sense? Data must be verifiable based on both accuracy and context. An innovative business may want to analyze massive amounts of data in real time to quickly assess the value of a customer and the potential to provide additional offers to that customer. It is necessary to identify the right amount and types of data that can be analyzed in real time to impact business outcomes.

Big data incorporates all the varieties of data, including structured data and unstructured data from e-mails, social media, text streams, and so on. This kind of data management requires companies to leverage both their structured and unstructured data.
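To make the three Vs slightly more tangible, here is a Python sketch that computes rough volume, velocity, and variety figures for a small batch of timestamped records. `profile_batch` is a hypothetical helper, and approximating variety by each payload's Python type is a deliberate simplification. It says nothing about veracity, which requires checking the data against accuracy and context.

```python
from datetime import datetime

def profile_batch(records):
    """Return rough volume, velocity, and variety figures for a batch.

    Each record is a (timestamp, payload) pair. Variety is approximated by
    the Python type of each payload, a deliberate simplification.
    """
    volume = len(records)
    times = sorted(ts for ts, _ in records)
    span = (times[-1] - times[0]).total_seconds() if volume > 1 else 0.0
    # Records per second over the batch; undefined if there is no time span.
    velocity = volume / span if span > 0 else None
    variety = {}
    for _, payload in records:
        kind = type(payload).__name__
        variety[kind] = variety.get(kind, 0) + 1
    return {"volume": volume, "velocity_per_sec": velocity, "variety": variety}

batch = [
    (datetime(2016, 3, 26, 12, 0, 0), {"order_id": 1, "total": 9.99}),  # structured
    (datetime(2016, 3, 26, 12, 0, 1), "Loved the product, shipping was slow"),  # text
    (datetime(2016, 3, 26, 12, 0, 2), b"\x89PNG"),  # binary, such as an image
    (datetime(2016, 3, 26, 12, 0, 4), {"order_id": 2, "total": 4.50}),  # structured
]
print(profile_batch(batch))
# {'volume': 4, 'velocity_per_sec': 1.0, 'variety': {'dict': 2, 'str': 1, 'bytes': 1}}
```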

Understanding Unstructured Data

Article / Updated 03-26-2016

Unstructured data differs from structured data in that its structure is unpredictable. Examples of unstructured data include documents, e-mails, blogs, digital images, videos, and satellite imagery. It also includes some data generated by machines or sensors. In fact, unstructured data accounts for the majority of data that's on your company's premises as well as external to your company in online private and public sources such as Twitter and Facebook.

In the past, most companies weren't able to either capture or store this vast amount of data. It was simply too expensive or too overwhelming. Even if companies were able to capture the data, they didn't have the tools to easily analyze it and use the results to make decisions. Very few tools could make sense of these vast amounts of data, and the tools that did exist were complex to use and did not produce results in a reasonable time frame. In the end, those who really wanted to go to the enormous effort of analyzing this data were forced to work with snapshots of data. This approach had the undesirable effect of missing important events that did not fall within a particular snapshot.

One approach that is becoming increasingly valued as a way to gain business value from unstructured data is text analytics: the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics, statistics, and other computer science disciplines.
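Text analytics can be sketched at toy scale in Python. Assuming small hand-written keyword lists (`PRODUCTS`, `POSITIVE`, and `NEGATIVE`, all hypothetical) and a simple regular expression for e-mail addresses, the `extract` function below turns a free-text review into a structured record. Real text analytics relies on much richer computational-linguistics and statistical techniques than keyword matching.

```python
import re

# Hypothetical lexicons; real systems use far richer models.
PRODUCTS = {"router", "modem", "thermostat"}
POSITIVE = {"great", "love", "fast", "reliable"}
NEGATIVE = {"slow", "broken", "refund", "disappointed"}

def extract(text: str) -> dict:
    """Turn one piece of unstructured text into a structured record."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Net sentiment: positive keyword hits minus negative keyword hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "products": sorted(set(tokens) & PRODUCTS),
        "sentiment": sentiment,
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text),
    }

review = "The router is great but the modem is slow. Contact me: pat@example.com"
print(extract(review))
# {'products': ['modem', 'router'], 'sentiment': 'neutral', 'emails': ['pat@example.com']}
```

Once text is reduced to records like these, it can be stored, counted, and joined like any other structured data.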

The Role of Traditional Operational Data in the Big Data Environment

Article / Updated 03-26-2016

Knowing what data is stored and where it is stored are critical building blocks in your big data implementation. Most large and small companies probably store most of their important operational information in relational database management systems (RDBMSs), which are built on one or more relations and represented by tables. The data is stored in database objects called tables, organized in rows and columns, and RDBMSs follow a consistent approach in the way that data is stored and retrieved. It's unlikely that you'll use RDBMSs for the core of your big data implementation, but it's very likely that you'll need to rely on the data stored in RDBMSs to create the highest level of value for the business.

To get the most business value from your real-time analysis of unstructured data, you need to understand that data in context with your historical data on customers, products, transactions, and operations. In other words, you will need to integrate your unstructured data with your traditional operational data.
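The following Python sketch uses the standard library's `sqlite3` module as a stand-in for an operational RDBMS. It assumes hypothetical `customers` and `email_sentiment` tables, where the second holds results already extracted from unstructured e-mails, and joins them so each finding is read in the context of the customer record.

```python
import sqlite3

# In-memory database standing in for an operational RDBMS (an assumption
# for illustration; any relational database could play this role).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme Corp", "East"), (2, "Beta LLC", "West")],
)

# Hypothetical results of analyzing unstructured text (support e-mails),
# already reduced to (customer_id, sentiment) pairs.
conn.execute("CREATE TABLE email_sentiment (customer_id INTEGER, sentiment TEXT)")
conn.executemany(
    "INSERT INTO email_sentiment VALUES (?, ?)",
    [(1, "negative"), (1, "negative"), (2, "positive")],
)

# Join structured operational data with the derived insights.
rows = conn.execute(
    """
    SELECT c.name, c.region, COUNT(*) AS negative_emails
    FROM customers AS c
    JOIN email_sentiment AS e ON e.customer_id = c.id
    WHERE e.sentiment = 'negative'
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
print(rows)  # [('Acme Corp', 'East', 2)]
conn.close()
```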
