Data Science Programming All-in-One For Dummies
The data you use for data science programming initiatives comes from a number of sources. The most common source is information that humans enter at some point. Even when a system collects shopping-site data automatically, humans initially enter the information.

[Image: data collection. ©Shutterstock/ra2 studio]

A human clicks various items, adds them to a shopping cart, specifies characteristics (such as size) and quantity, and then checks out. Later, after the sale, the human gives the shopping experience, product, and delivery method a rating and makes comments. In short, every shopping experience becomes a data-collection exercise as well.

You can’t sit by each shopper’s side and provide instructions on how to enter data consistently. Consequently, the data you receive is inconsistent and nearly unusable at times. By reviewing forms of successful online stores, however, you can see how to provide a virtual self to assist the shopper in making consistent entries.

The forms you provide for entering information have a great deal to do with the data you collect. When a form contains fewer handwritten entries and more check boxes, it tends to provide a better experience for the customer and a more consistent data source for you.
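To make the difference concrete, here is a minimal Python sketch that compares free-text entries with entries restricted to a fixed list of options, as a check box or drop-down list would enforce. The sample values, the SIZE_OPTIONS list, and the validate_choice helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical free-text entries: seven shoppers typing a size by hand.
free_text_sizes = ["Large", "large ", "L", "lg", "LARGE", "Medium", "med"]

# Hypothetical fixed options that a check box or drop-down list would offer.
SIZE_OPTIONS = ("S", "M", "L", "XL")


def validate_choice(value, options=SIZE_OPTIONS):
    """Accept a value only if it is one of the predefined options."""
    if value not in options:
        raise ValueError(f"{value!r} is not one of {options}")
    return value


# The same seven shoppers, this time picking from the fixed options.
checkbox_sizes = [validate_choice(v) for v in ["L", "L", "L", "L", "L", "M", "M"]]

# Count how many distinct spellings each approach produces.
free_text_counts = Counter(free_text_sizes)
checkbox_counts = Counter(checkbox_sizes)

print(len(free_text_counts), "distinct free-text values")
print(len(checkbox_counts), "distinct check-box values")
```

In this made-up sample, the free-text field yields seven different spellings for what are really two sizes, while the constrained field yields exactly two values that need no cleanup before analysis.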

Many data sources today rely on input gathered from people, and some of that input is entered manually on their behalf. For example, you call or go into an office to make an appointment with a professional. A receptionist then gathers the information needed for the appointment. This manually collected data eventually ends up in a dataset somewhere for analysis purposes.

By providing training on proper data entry techniques, you can improve the consistency of input that the receptionist provides. In addition, you’re unlikely to have just one receptionist providing input, so training can also help the entire group of receptionists provide consistent input despite individual differences in perspective.

Some forms of regulated data entry have become so complex that the people doing it, such as medical data entry personnel, require a formal education. The point is that the industry, as a whole, is generally moving toward trained data entry people, so your organization should make use of this trend to improve the consistency of the data you receive.

Data is also collected from sensors, and these sensors can take almost any form. For example, many organizations base physical data collection, such as the number of people viewing an object in a window, on cellphone detection. Facial recognition software could potentially detect repeat customers.

Sensors can create datasets from almost anything. The weather service, for example, relies on datasets created by sensors that monitor environmental conditions such as rain, temperature, humidity, and cloud cover.

Robotic monitoring systems help correct small flaws in robotic operation by constantly analyzing data collected by monitoring sensors. A sensor, combined with a small AI application, could tell you when your dinner is cooked to perfection tonight. The sensor collects data, but the AI application uses rules to help define when the food is properly cooked.
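The split between data collection and rules can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the is_cooked function, the 74.0 degree Celsius target, and the three-sample hold are assumptions chosen for the example.

```python
def is_cooked(readings_c, target_c=74.0, hold_samples=3):
    """Apply a simple rule to temperature readings from a probe.

    The food counts as cooked once the most recent `hold_samples`
    readings are all at or above `target_c`. Both values are
    illustrative assumptions, not cooking guidance.
    """
    if len(readings_c) < hold_samples:
        return False  # Not enough data collected yet to apply the rule.
    recent = readings_c[-hold_samples:]
    return all(reading >= target_c for reading in recent)


# The sensor only collects numbers; the rule decides what they mean.
still_cooking = is_cooked([60.0, 70.0, 73.0, 75.0, 76.0])
done = is_cooked([60.0, 70.0, 74.0, 75.0, 76.0])
print(still_cooking, done)
```

Requiring several consecutive readings rather than one is a design choice that keeps a single noisy sample from triggering the rule.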

Of the forms of data collection, sensor data should be the easiest to make consistent. In practice, however, sensor data is often inconsistent from one vendor to the next because vendors keep adding functionality as a means of differentiation.

The solution to this problem is better data standards so that vendors must adhere to certain specifics when creating data. Standards efforts are ongoing, but it pays to make sure that the sensors you use to collect data all rely on the same standards so that you obtain consistent input.
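When sensors from different vendors are already in place, one common workaround is to convert every reading to a single internal format before analysis. The following Python sketch shows the idea under stated assumptions: the two vendor record layouts, the Reading class, and the converter functions are hypothetical and do not describe any real vendor's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One internal format for every sensor reading (hypothetical schema)."""
    sensor_id: str
    temperature_c: float


def from_vendor_a(record):
    """Hypothetical vendor A reports Fahrenheit in a flat record."""
    celsius = (record["temp_f"] - 32) * 5 / 9
    return Reading(record["id"], round(celsius, 2))


def from_vendor_b(record):
    """Hypothetical vendor B reports Kelvin in a nested record."""
    temperature = record["temperature"]
    if temperature["unit"] != "K":
        raise ValueError(f"unexpected unit: {temperature['unit']!r}")
    return Reading(record["sensor"], round(temperature["value"] - 273.15, 2))


# Two differently shaped records end up in the same format.
readings = [
    from_vendor_a({"id": "a1", "temp_f": 68.0}),
    from_vendor_b({"sensor": "b7", "temperature": {"value": 293.15, "unit": "K"}}),
]
for reading in readings:
    print(reading.sensor_id, reading.temperature_c)
```

Each new vendor then needs only one small converter, and the analysis code works against a single schema.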

About This Article

This article is from the book: Data Science Programming All-in-One For Dummies

About the book authors:

John Mueller has published more than 100 books on technology, data, and programming. John has a website and blog where he writes articles on technology and offers assistance alongside his published books.

Luca Massaron is a data scientist specializing in insurance and finance. A Google Developer Expert in machine learning, he has been involved in quantitative analysis and algorithms since 2000.
