Pandas¶

The pandas package provides us with the DataFrame data type for working with structured data, or in other words, data that is organized in a tabular format with defined rows and columns.

We have previously seen two tools that can be used for working with structured data: 2D Numpy arrays and dictionaries. DataFrames provide the following advantages over these other data structures:

  1. The elements of a Numpy array must all be of the same data type. While individual columns of a pandas DataFrame must have a consistent data type, different columns can contain different types.

  2. Unlike a Numpy array, we can assign names to the columns and (less importantly) to the rows of a DataFrame.

  3. We have seen that we can use dictionaries to represent tabular data. For example, we can set the values in a dictionary to be lists representing columns, in which case the keys will represent column names. However, there would be no explicit concept of a row in such a setup. DataFrames explicitly define both rows and columns.

It is a common convention to import pandas under the alias pd.

import pandas as pd
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>
----> 1 import pandas as pd

ModuleNotFoundError: No module named 'pandas'