Data Science Programming All-in-One For Dummies
Data science programming begins with the language you choose. The most common languages for data science programming are Python and R. Every data form in Python and R begins with a scalar — a single item of a particular type. Precisely how you define a scalar depends on how you want to view objects within your code and the definitions of scalars for your language.

For example, R provides these native, simple data types:

  • Character
  • Numeric (real or decimal)
  • Integer
  • Logical
  • Complex
R has no separate string type: its character type holds text of any length, so even a single string such as "hello" is a character vector of length one. The difference is important when thinking about how R works with scalars. You can read more about character vectors and strings in R at gastonsanchez.com.

Python provides these native, simple data types:

  • Boolean
  • Integer
  • Float
  • Complex
  • String
Note that Python doesn’t include a character data type because it works with strings, not with characters. Yes, you can create a string containing a single character and you can interact with individual characters in a string, but there isn’t an actual character type. To see this fact for yourself, try this code:
anA = chr(65)
print(type(anA))
The output will be <class 'str'>, not a dedicated character type of the kind that languages such as C or Java provide. Consequently, a string is a scalar in Python but a vector in R. Keeping language differences in mind will help as you perform analysis on your data.
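As a minimal sketch of the same point, the following snippet indexes into a string and checks the type of the result. The variable names (letter, word, first) are illustrative only and do not come from the book.

```python
# A one-character string is still a str; Python has no separate char type.
letter = chr(65)      # build "A" from its Unicode code point
word = "Data"
first = word[0]       # indexing a string yields another string

print(type(letter).__name__)   # name of the type of a single character
print(type(first).__name__)    # name of the type of an indexed character
print(len(first))              # length of the one-character string
print(ord(first))              # code point of the character
```

Both type checks report str, which suggests that Python treats a single character as a string of length one rather than as a distinct type.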

Most languages also support what you might call semi-native data types. For example, Python supports a Fraction data type that you create by using code like this:

from fractions import Fraction
x = Fraction(2, 3)
print(x)
print(type(x))
The fact that you must import Fraction means that it’s not available all the time, as something like complex or int is. The tip-off that this is not a built-in class is the class output of <class 'fractions.Fraction'>. However, you get Fraction with your Python installation, which means that it’s actually a part of the language (hence, semi-native).
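One way to see the distinction, sketched here under the assumption that "always available" means "present in the builtins module", is to check where each name lives. The variable names are illustrative only.

```python
import builtins
from fractions import Fraction

# int lives in the always-available builtins namespace; Fraction does not.
int_is_builtin = hasattr(builtins, "int")
fraction_is_builtin = hasattr(builtins, "Fraction")

# Fraction still behaves like a number once imported: 2/3 + 1/6 = 5/6.
total = Fraction(2, 3) + Fraction(1, 6)

print(int_is_builtin, fraction_is_builtin)
print(total)
print(Fraction.__module__)   # the module that defines the class
```

The exact arithmetic (no rounding to a float) is the usual reason to reach for Fraction in the first place.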

External libraries that define additional scalar data types are available for most languages. Access to these additional scalar types is important in some cases, because native Python provides just one data type in any particular category.

For example, if you need to create a variable that represents a number without a decimal portion, you use the integer data type. Using a generic designation like this is useful because it simplifies code and gives the developer a lot less to worry about.
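A short sketch of that generic integer type follows. It assumes nothing beyond the standard interpreter, and the variable names are illustrative only.

```python
# Python uses a single int type regardless of how large the value is.
small = 15
huge = 2 ** 100

print(type(small) is type(huge))   # both values share one type
print(huge)                        # the full value, with no overflow
print(huge.bit_length())           # bits needed to represent the value
```

Because the interpreter sizes the integer as needed, you never pick a width yourself, which is convenient but gives you no control over memory layout.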

However, in scientific calculations, you often need better control over how data appears in memory, which means having more data types — something that numpy provides for you.

For example, you might need to define a particular scalar as a short (a value that is 16 bits long on most platforms). Using numpy, you could define it as myShort = np.short(15). You could define a variable of exactly 16 bits using the np.int16 function. You can discover more about the scalars provided by the NumPy library for Python. Most languages also provide a means of extending the native types (see the articles at Python.org and greenteapress.com for additional details).
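The following sketch compares the two NumPy scalar types mentioned above. It assumes NumPy is installed and imported as np; the snake_case variable names are illustrative, and the size of np.short follows the platform's C short, so it is reported rather than assumed.

```python
import numpy as np

# np.short maps to the platform's C short; np.int16 is always 16 bits.
my_short = np.short(15)
my_int16 = np.int16(15)

# itemsize reports how many bytes each scalar occupies in memory.
print(my_short.dtype, my_short.itemsize)
print(my_int16.dtype, my_int16.itemsize)

# iinfo exposes the value range that a fixed-width integer can hold.
limits = np.iinfo(np.int16)
print(limits.min, limits.max)
```

Choosing np.int16 over np.short is one way to make the intended width explicit when the exact memory layout matters.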

About the book authors:

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS). Luca Massaron, a Google Developer Expert (GDE), interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.