Pandas¶
The pandas package provides us with the DataFrame data type for working with structured data, or in other words, data that is organized in a tabular format with defined rows and columns.
We have previously seen two tools that can be used for working with structured data: 2D Numpy arrays and dictionaries. DataFrames provide the following advantages over these other data structures:
The elements of a Numpy array must all be of the same data type. While individual columns of a pandas DataFrame must have a consistent data type, different columns can contain different types.
Unlike a Numpy array, we can assign names to the columns and (less importantly) to the rows of a DataFrame.
We have seen that we can use dictionaries to represent tabular data. For example, we can set the values in a dictionary to be lists representing columns, in which case the keys will represent column names. However, there would be no explicit concept of a row in such a setup. DataFrames explicitly define both rows and columns.
It is a common convention to import pandas under the alias pd
.
import pandas as pd
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>
----> 1 import pandas as pd
ModuleNotFoundError: No module named 'pandas'