Machine Learning: Creating Your Own Features in Data

By John Paul Mueller, Luca Massaron

Sometimes the raw data you obtain from various sources won’t have the features needed to perform machine learning tasks. When this happens, you must create your own features in order to obtain the desired result. Creating a feature doesn’t mean creating data from thin air. You create new features from existing data.

Understanding the need to create features

One major limitation of machine learning algorithms is that they may be unable to find a formula that links your response to the features you’re using. Sometimes this happens because the information you have available simply can’t map the response (meaning that you don’t have the right information). In other cases, the information is present, but the form in which you provide it doesn’t help the algorithm learn properly.

For instance, if you’re modeling the price of real estate properties, the surface area of the land is quite predictive because larger properties tend to cost more. But if instead of the surface, you provide your machine learning algorithm with the length of the sides of the land (or the latitude and longitude coordinates of its corners), your algorithm may not figure out what to do with the information you provided. Some algorithms will manage to find the relationship between the features, but most algorithms won’t.

The answer to this problem is feature creation. Feature creation is the part of machine learning that is considered more an art than a science because it requires human intervention in creatively combining the existing features. You perform this task by means of addition, subtraction, multiplication, and division (ratios) to generate new derived features with more predictive power than the originals.

Knowing the problem well and figuring out how a human being would solve it is part of feature creation. So, connecting to the previous example, the fact that land surface relates to the property price is common knowledge. If surface is missing from your features when trying to guess the value of a property, you can recover that information from the existing data, and doing so improves the predictions.
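As a minimal sketch of the real estate example, the following Python snippet derives a surface feature from side lengths. The field names (width_m, depth_m, surface_m2) and the values are hypothetical, and the sketch assumes rectangular plots, which the original example doesn’t specify.

```python
# Hypothetical records: each plot is described only by its side lengths in meters.
# Assumption: plots are rectangular, so surface = width * depth.
plots = [
    {"width_m": 20.0, "depth_m": 30.0},
    {"width_m": 15.0, "depth_m": 40.0},
    {"width_m": 50.0, "depth_m": 12.0},
]


def add_surface(plot):
    """Return a copy of the record with a derived surface feature."""
    enriched = dict(plot)
    enriched["surface_m2"] = plot["width_m"] * plot["depth_m"]
    return enriched


enriched_plots = [add_surface(p) for p in plots]

for p in enriched_plots:
    print(p)
```

The three plots have very different side lengths yet the same derived surface, which suggests why the sides alone can be hard for an algorithm to use while the product is directly informative.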

Regardless of whether you rely on common sense, common knowledge, or specialized expertise, you can do a lot for your machine learning algorithm if you first figure out what information should work best for the problem and then try to make it available or derive it from your existing features.

Creating features automatically

You can create some new features automatically. One way to achieve automatic feature creation is to use polynomial expansion. Both R and Python offer specific ways to perform polynomial expansion so that you create features automatically. For the time being, you need to grasp the concepts behind polynomial expansion.

In polynomial expansion, you automatically create interactions between features as well as create powers (for instance, computing the square of a feature). Interactions rely on multiplication of the features. Creating a new feature using multiplication helps to keep track of how features tend to behave as a whole. Therefore, it helps to map complex relationships between your features that can hint at special situations.

A great example of an interaction is the noise emitted from a car and the price of the car. Consumers don’t appreciate noisy cars unless they buy a sports car, in which case the engine noise is a plus that reminds the owner of the car’s power. It also makes bystanders notice the cool car, so noise plays a great role in showing off because noise will certainly attract others’ attention. On the other hand, noise when driving a family car is not all that cool.

In a machine learning application that tries to predict the rate of preference for a certain car, features such as noise and the price of the car are predictive by themselves. However, multiplying the two values and adding the product to the set of features can strongly hint to a learning algorithm that the target is a sports car (when you multiply high noise levels by a high price).
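The car example can be sketched in a few lines of Python. The car names, the feature values, and the 0-to-1 scaling below are all made up for illustration; only the idea of multiplying noise by price comes from the text.

```python
# Hypothetical cars with both features scaled to the 0-1 range
# (1 = loudest / most expensive). All values are invented.
cars = {
    "family_car": {"noise": 0.2, "price": 0.3},
    "old_clunker": {"noise": 0.9, "price": 0.1},
    "luxury_sedan": {"noise": 0.1, "price": 0.9},
    "sports_car": {"noise": 0.9, "price": 0.9},
}


def add_interaction(features):
    """Return a copy of the features plus the noise-by-price interaction."""
    out = dict(features)
    out["noise_x_price"] = features["noise"] * features["price"]
    return out


expanded = {name: add_interaction(f) for name, f in cars.items()}

# The interaction is large only when BOTH noise and price are high.
highest_interaction = max(expanded, key=lambda name: expanded[name]["noise_x_price"])

for name, feats in expanded.items():
    print(name, round(feats["noise_x_price"], 2))
print("highest interaction:", highest_interaction)
```

In this toy data, neither noise nor price alone separates the sports car from the clunker or the sedan, but the product does.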

Powers help by creating nonlinear relations between the response and the features, hinting at specific situations.

As another example, imagine that you have to predict a person’s yearly expenses. Age is a good predictor because as people grow older and mature, their life and family situation change, too. Students start out poor but then find work and can build a family. In general, expenses tend to grow with age up to a certain point. Retirement usually marks a point at which expenses tend to diminish. Age contains such information, but it’s a feature that only grows, and relating expenses directly to its growth doesn’t help to describe the inversion that occurs at a certain age.

Adding the squared feature helps create a counter effect to age itself, which is small at the beginning but grows quickly with age. The final effect is a parabola: expenses grow at first, reach a peak at a certain age, and then decrease.
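To make the parabola concrete, here is a small Python sketch. The coefficients b0, b1, and b2 are hypothetical values chosen only so that the curve peaks at a plausible age; they are not estimated from any data.

```python
# Hypothetical model: expenses = b0 + b1 * age + b2 * age**2
# The negative b2 is the "counter effect" of the squared feature.
b0, b1, b2 = 5000.0, 1200.0, -10.0


def predicted_expenses(age):
    """Predict yearly expenses from age and its derived squared feature."""
    age_squared = age ** 2  # the new power feature
    return b0 + b1 * age + b2 * age_squared


# The vertex of the parabola is where expenses stop growing and start shrinking.
peak_age = -b1 / (2 * b2)

for age in (30, 60, 80):
    print(age, predicted_expenses(age))
print("peak age:", peak_age)
```

With age alone, a linear model could only rise or fall; the squared term is what lets the prediction turn around at the peak.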

As mentioned initially, knowing such dynamics in advance (noise and sports cars, expenses and old age) can help you create the right features. But if you don’t know these dynamics in advance, polynomial expansion will automatically create them for you because, given a certain order, it will create interactions and powers of that order. The order determines the number of features multiplied together and the maximum power to apply to the existing features.

So a polynomial expansion of order 2 raises all the features to the second power and multiplies every single feature by all the others. (You get the multiplication of all the combinations of two features.) Clearly, the higher the order, the more new features will be created, but many of them will be redundant and will just contribute to making your machine learning algorithm overfit the data.
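An order-2 expansion as just described can be sketched with the Python standard library. The function name expand_order_2 and the feature names x1, x2, x3 are hypothetical; dedicated libraries provide their own implementations, and this version only illustrates the idea.

```python
from itertools import combinations


def expand_order_2(row):
    """Expand a dict of feature name -> value with squares and pairwise products."""
    expanded = dict(row)  # keep the original features
    # Powers: square every original feature.
    for name, value in row.items():
        expanded[f"{name}^2"] = value ** 2
    # Interactions: multiply every unique pair of original features.
    for (a, va), (b, vb) in combinations(row.items(), 2):
        expanded[f"{a}*{b}"] = va * vb
    return expanded


sample = {"x1": 2.0, "x2": 3.0, "x3": 5.0}
result = expand_order_2(sample)

for name, value in result.items():
    print(name, value)
```

Three original features become nine: the three originals, three squares, and three pairwise products.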

When using polynomial expansion, you have to pay attention to the explosion of features you are creating. Powers increase linearly, so if you have five features and you need an expansion of order 2, each feature is raised up to the second power. Increasing the order by one just adds a new power feature for each original feature. Interactions, by contrast, increase based on the combinations of the features up to that order.

In fact, with five features and a polynomial expansion of order 2, all ten unique pairings of the features are created. Increasing the order to 3 requires creating all the unique combinations of two variables plus all the unique combinations of three variables, that is, 10 + 10 = 20 interaction features.
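The counts above can be checked with a short Python sketch. The function name count_new_features is hypothetical, and the count follows the simplified scheme in the text: pure powers of single features and products of distinct features only. A full polynomial expansion also includes mixed terms, such as a squared feature multiplied by another feature, which this count leaves out.

```python
from math import comb


def count_new_features(n_features, order):
    """Count the power and interaction features added up to a given order.

    Simplified scheme: pure powers of single features plus products of
    distinct features; mixed terms such as x1**2 * x2 are not counted.
    """
    # One new power per original feature for each order above 1.
    powers = n_features * (order - 1)
    # Unique combinations of 2, 3, ..., `order` distinct features.
    interactions = sum(comb(n_features, k) for k in range(2, order + 1))
    return powers, interactions


for order in (2, 3, 4):
    powers, interactions = count_new_features(5, order)
    print(f"order {order}: {powers} powers, {interactions} interactions")
```

For five features, the power count grows by five with each extra order, while the interaction count depends on how many combinations exist at each size.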