Introduction
In the previous chapter, we saw how to use pandas to transform the data and attributes obtained from raw sources into the attributes and values we expect. Once data is structured in tabular form, with each field containing correct and clean values, it is prepared for further analysis, which means using it to solve business problems. To get the best outcome from a project, we need to be clear about the scope of the data, the questions it can answer, and the problems it can help solve before we draw any inference from it.
To do that, we need to understand not only the kind of data we have, but also how attributes relate to one another, which attributes are useful to us, and how they vary across the data provided. Analyzing data in this way and exploring how it can be used is not a straightforward task. We have to perform several initial exploratory tests on our data, interpret their results, and possibly create and analyze more statistics and visualizations before we can make a statement about the scope or analysis of the dataset. In data science pipelines, this process is referred to as Exploratory Data Analysis.
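As a rough illustration of what such initial exploratory tests can look like, the following minimal sketch uses pandas to check the kind of data present, how numeric attributes vary, and how attributes relate to each other. The DataFrame and its column names (region, ad_spend, revenue) are hypothetical and invented only for this example; they are not taken from any dataset used in this chapter.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "East"],
    "ad_spend": [100.0, 150.0, 200.0, 250.0, 300.0, 350.0],
    "revenue": [1000.0, 1400.0, 2100.0, 2400.0, 3100.0, 3400.0],
})

# What kind of data do we have? Column types and missing values per column.
column_types = df.dtypes
missing_counts = df.isna().sum()

# How do the numeric attributes vary? Count, mean, spread, and quartiles.
summary = df.describe()

# How are the numeric attributes related? Pairwise Pearson correlation.
correlations = df[["ad_spend", "revenue"]].corr()

# How does one attribute vary across the groups of a categorical attribute?
revenue_by_region = df.groupby("region")["revenue"].mean()

print(column_types)
print(missing_counts)
print(summary)
print(correlations)
print(revenue_by_region)
```

Each result is only a starting point: a strong correlation or a large difference between groups suggests a question to investigate further, typically with additional statistics or a plot, rather than a conclusion in itself.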
In this chapter, we will go through techniques to explore and analyze data by solving problems that are critical for businesses, such as identifying attributes useful for marketing, analyzing key performance indicators, performing comparative analyses, and generating insights and visualizations. We will use the pandas, Matplotlib, and seaborn libraries in Python to solve these problems.