Data Science For Dummies, 3rd Edition
Traditionally, big data is the term for data that has incredible volume, velocity, and variety. Conventional database technologies aren't capable of handling big data; more innovative, data-engineered solutions are required. To evaluate whether your project qualifies as a big data project, consider the following criteria:
  • Volume: Between 1 terabyte/year and 10 petabytes/year

  • Velocity: Between 30 kilobytes/second and 30 gigabytes/second

  • Variety: Combined sources of unstructured, semi-structured, and structured data
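The three criteria above can be turned into a quick screening check. The sketch below is a minimal illustration only, built on three assumptions the article does not spell out: byte units are decimal (SI), "variety" means all three kinds of data are combined, and a project must meet every criterion to qualify. The names `ProjectProfile`, `big_data_criteria`, and `qualifies_as_big_data` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Decimal (SI) byte units; an assumption, since the criteria don't specify.
KB, MB, GB = 10**3, 10**6, 10**9
TB, PB = 10**12, 10**15

ALL_KINDS = {"structured", "semi-structured", "unstructured"}


@dataclass
class ProjectProfile:
    """Hypothetical summary of a data project's characteristics."""
    bytes_per_year: int    # volume
    bytes_per_second: int  # velocity
    data_kinds: set[str]   # variety: which kinds of data are combined


def big_data_criteria(p: ProjectProfile) -> dict[str, bool]:
    """Report which of the three criteria the project meets."""
    return {
        "volume": 1 * TB <= p.bytes_per_year <= 10 * PB,
        "velocity": 30 * KB <= p.bytes_per_second <= 30 * GB,
        # Assumption: variety means all three kinds of data are present.
        "variety": ALL_KINDS <= p.data_kinds,
    }


def qualifies_as_big_data(p: ProjectProfile) -> bool:
    """Assumption: a project qualifies only if it meets every criterion."""
    return all(big_data_criteria(p).values())


profile = ProjectProfile(
    bytes_per_year=50 * TB,
    bytes_per_second=5 * MB,
    data_kinds={"structured", "semi-structured", "unstructured"},
)
print(big_data_criteria(profile))
print(qualifies_as_big_data(profile))  # → True
```

Because the article presents these ranges as criteria to consider rather than strict rules, treating them as hard cutoffs is a simplification.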

Data science and data engineering are not the same

Hiring managers tend to confuse the roles of data scientist and data engineer. While it is possible to find someone who does a little of both, each field is incredibly complex. It’s unlikely that you’ll find someone with robust skills and experience in both areas. For this reason, it’s important to be able to identify what type of specialist is most appropriate for helping you achieve your specific goals. The descriptions below should help you do that.
  • Data scientists: Data scientists use coding, quantitative methods (mathematical, statistical, and machine learning), and highly specialized expertise in their study area to derive solutions to complex business and scientific problems.

  • Data engineers: Data engineers use skills in computer science and software engineering to design systems for handling and manipulating big data sets, and to solve the problems that arise in doing so.

Data science and business intelligence are also not the same

Business-centric data scientists and business analysts who do business intelligence are like cousins. Both types of specialist use data to achieve the same business goals, but their approaches, technologies, and functions are different. The descriptions below spell out the differences between the two roles.
  • Business intelligence (BI): BI solutions are generally built using datasets generated internally, that is, from within an organization rather than from outside it. Common tools and technologies include online analytical processing (OLAP), extract, transform, and load (ETL), and data warehousing. Although BI sometimes involves forward-looking methods like forecasting, these methods are based on simple mathematical inferences from historical or current data.

  • Business-centric data science: Business-centric data science solutions are built using datasets that are both internal and external to an organization. Common tools, technologies, and skillsets include cloud-based analytics platforms, statistical and mathematical programming, machine learning, data analysis using Python and R, and advanced data visualization. Business-centric data scientists use advanced mathematical or statistical methods to analyze and generate predictions from vast amounts of business data.
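The business intelligence description above names extract, transform, and load (ETL) and data warehousing as common BI technologies. As a minimal sketch of that pattern, assuming Python's standard-library `csv` and `sqlite3` modules stand in for a real source system and data warehouse, the pipeline below extracts rows from an internal export, cleans them, loads them into a table, and then answers a simple question from the historical data. The sample records, the `sales` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an internal system (the "extract" source).
RAW_CSV = """order_id,region,amount
1,north, 120.50
2,south,80
3,north,not_available
4,south,45.25
"""


def extract(raw: str) -> list[dict]:
    """Extract: read rows from a CSV export."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: fix types and drop rows whose amount cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])  # float() tolerates surrounding spaces
        except ValueError:
            continue  # skip malformed records instead of loading them
        cleaned.append((int(row["order_id"]), row["region"].strip(), amount))
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Load: write cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A typical BI-style question answered from internal, historical data.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
))
print(totals)  # → {'north': 120.5, 'south': 125.25}
```

An in-memory SQLite database keeps the example self-contained; a production warehouse would use a dedicated platform, but the extract, transform, and load stages keep the same shape.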

About This Article

This article is from the book Data Science For Dummies, 3rd Edition.

About the book author:

Lillian Pierson is the CEO of Data-Mania, where she helps data professionals evolve into world-class leaders and entrepreneurs. She has trained more than 1 million individuals on the topics of AI and data science. Pierson has assisted global leaders in IT, major governmental and non-governmental entities, media corporations, and not-for-profit technology groups.
