Algorithms For Dummies
Scientists had been fighting against impressive amounts of data for years before anyone coined the term big data. At that point, the Internet didn’t produce the vast amounts of data that it does today. It’s useful to remember that big data is not simply a fad created by software and hardware vendors but has a basis in many scientific fields, including the following:
  • Astronomy: Consider the data received from spacecraft on a mission (such as Voyager or Galileo) and all the data received from radio telescopes, which are specialized antennas used to receive radio waves from astronomical bodies. A common example is the Search for Extraterrestrial Intelligence (SETI) project, which looks for extraterrestrial signals by observing radio frequencies arriving from space. The amount of data received and the computer power used to analyze a portion of the sky for a single hour is impressive. If aliens are out there, it’s very hard to spot them. (The movie Contact explores what could happen should humans actually intercept a signal.)
  • Meteorology: Think about trying to predict weather for the near term given the large number of required measures, such as temperature, atmospheric pressure, humidity, winds, and precipitation at different times, locations, and altitudes. Weather forecasting is one of the first problems in big data and still a relevant one. According to Weather Analytics, a company that provides climate data, more than 33 percent of worldwide gross domestic product (GDP) is determined by how weather conditions affect agriculture, fishing, tourism, and transportation, to name just a few sectors. As far back as the 1950s, the first supercomputers were used to crunch as much data as possible because, in meteorology, the more data, the more accurate the forecast. That’s the reason meteorological agencies keep amassing more storage and processing capacity, as the Korea Meteorological Administration has done for weather forecasting and studying climate change.
  • Physics: Consider the large amounts of data produced by experiments using particle accelerators in an attempt to determine the structure of matter, space, and time. For example, the Large Hadron Collider, the largest particle accelerator ever created, produces 15PB (petabytes) of data every year as a result of particle collisions.
  • Genomics: Sequencing a single DNA strand, which means determining the precise order of the many combinations of the four bases — adenine, guanine, cytosine, and thymine — that constitute the structure of the molecule, requires quite a lot of data. For instance, a single chromosome, a structure containing the DNA in the cell, may require from 50MB to 300MB. A human being has 46 chromosomes, and the DNA data for just one person consumes an entire DVD. Just imagine the massive storage required to document the DNA data of a large number of people or to sequence other life forms on earth.
  • Oceanography: Consider the data gathered by the many sensors placed in the oceans to measure temperature, currents, and, using hydrophones, even sounds. This acoustic monitoring serves both scientific purposes (learning about fish, whales, and plankton) and military defense purposes (finding sneaky submarines from other countries). It’s an old surveillance problem that is becoming more complex and more digital.
  • Satellites: Recording images from the entire globe and sending them back to Earth in order to monitor the planet’s surface and its atmosphere isn’t a new business (TIROS 1, the first satellite to send back images and data, dates back to 1960). Over the years, however, the world has launched more than 1,400 satellites that are still active and provide Earth observation. The amount of data arriving on Earth is astonishing and serves both military (surveillance) and civilian purposes, such as tracking economic development, monitoring agriculture, and monitoring changes and risks. A single European Space Agency satellite, Sentinel 1A, generates 5PB of data during two years of operation.
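The volumes in this list are easier to appreciate when you turn them into sustained rates. The following Python sketch isn’t from the book; it simply redoes the arithmetic for the Large Hadron Collider, Sentinel 1A, and human genome figures quoted above. It assumes decimal units (1PB = 10^15 bytes), a 365-day year, and data that arrives evenly over time. The function names are hypothetical helpers made up for this illustration.

```python
"""Back-of-the-envelope rates for the data volumes quoted in the list above."""

# Assumption: decimal (SI) units, so 1 MB = 10**6 bytes and 1 PB = 10**15 bytes.
MB = 10**6
GB = 10**9
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def average_rate_mb_per_s(total_bytes: float, years: float) -> float:
    """Return the average rate, in MB/s, needed to produce total_bytes in the given years."""
    return total_bytes / (years * SECONDS_PER_YEAR) / MB


def genome_size_range_gb(chromosomes: int, low_mb: float, high_mb: float) -> tuple[float, float]:
    """Return the (low, high) storage range in GB if each chromosome needs low_mb to high_mb."""
    return chromosomes * low_mb * MB / GB, chromosomes * high_mb * MB / GB


# Figures taken from the text: 15PB per year, 5PB in two years, 46 chromosomes at 50MB to 300MB.
lhc_rate = average_rate_mb_per_s(15 * PB, years=1)
sentinel_rate = average_rate_mb_per_s(5 * PB, years=2)
genome_low, genome_high = genome_size_range_gb(46, 50, 300)

print(f"Large Hadron Collider: about {lhc_rate:.0f} MB/s on average")
print(f"Sentinel 1A: about {sentinel_rate:.0f} MB/s on average")
print(f"Human genome: {genome_low:.1f} GB to {genome_high:.1f} GB")
```

Under these assumptions, the genome estimate spans roughly 2.3GB to 13.8GB, a range that includes the 4.7GB capacity of a single-layer DVD. Real instruments produce data in bursts, so the averages are only a rough guide to the storage and bandwidth involved.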
Alongside these older data trends, new volumes of data are now generated or carried by the Internet, creating new issues and requiring solutions in terms of both data storage and processing algorithms:
  • As reported by the National Security Agency (NSA), the amount of information flowing through the Internet every day from all over the world amounted to 1,826PB of data in 2013, and 1.6 percent of it consisted of emails and telephone calls. To assure national security, the NSA must verify the content of at least 0.025 percent of all emails and phone calls (looking for keywords that could signal something like a terrorist plot). That still amounts to 25PB per year, which equates to 37,500 CD-ROMs of data stored and analyzed every year (and that figure is growing).
  • The Internet of Things (IoT) is becoming a reality. You may have heard the term many times in the last 15 years, but now the number of things connected to the Internet is about to explode. The idea is to put sensors and transmitters on everything and use the data both to better control what happens in the world and to make objects smarter. Transmitting devices are getting tinier, cheaper, and less power demanding; some are already so small that they can be put everywhere. (Just look at the ant-sized radio developed by Stanford engineers.) Experts estimate that by 2020, there will be six times as many connected things on Earth as there will be people, but many research companies and think tanks are already revisiting those figures.
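Percentages of very large numbers are hard to judge by eye, so it can help to chain them explicitly. The following Python sketch isn’t from the book; it applies the two NSA percentages quoted above to the daily traffic figure. How the percentages combine is an assumption (here, 0.025 percent is taken of the 1.6 percent share, every day of a 365-day year), and so is the 700MB CD-ROM capacity, which the text doesn’t state. Read this way, the result need not match the yearly totals quoted above, which may rest on different assumptions.

```python
"""Chain the NSA percentages quoted in the text into a reviewed data volume."""

# Figures quoted in the text.
DAILY_TRAFFIC_PB = 1826          # Internet traffic per day in 2013, in PB
SHARE_EMAIL_AND_CALLS = 0.016    # 1.6 percent
SHARE_REVIEWED = 0.00025         # 0.025 percent

# Assumptions that are NOT in the text.
CD_ROM_MB = 700                  # assumed capacity of one CD-ROM
DAYS_PER_YEAR = 365
TB_PER_PB = 1000                 # decimal units
MB_PER_PB = 10**9


def reviewed_tb_per_day(daily_pb: float, share: float, reviewed: float) -> float:
    """Return the reviewed volume in TB per day when the two shares are multiplied together."""
    return daily_pb * share * reviewed * TB_PER_PB


daily_tb = reviewed_tb_per_day(DAILY_TRAFFIC_PB, SHARE_EMAIL_AND_CALLS, SHARE_REVIEWED)
yearly_pb = daily_tb * DAYS_PER_YEAR / TB_PER_PB
cd_roms_per_year = yearly_pb * MB_PER_PB / CD_ROM_MB

print(f"Reviewed per day:  {daily_tb:.2f} TB")
print(f"Reviewed per year: {yearly_pb:.2f} PB")
print(f"CD-ROMs per year:  {cd_roms_per_year:,.0f}")
```

Changing any one of the assumed constants shifts the result by orders of magnitude, which is why figures like these are best treated as rough estimates.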

About This Article

This article is from the book: Algorithms For Dummies

About the book authors:

John Paul Mueller has produced 102 books and more than 600 articles to date on topics ranging from networking to machine learning. Luca Massaron is a data scientist specializing in organizing and interpreting big data and transforming it into smart data by means of the simplest and most effective data mining and machine learning techniques.