Data Visualization

Ofili Lewis
3 min readJan 26, 2023
Photo by Luke Chesser on Unsplash

Data visualization is the process of representing data in a graphical or pictorial format. It is a powerful tool that allows data scientists and analysts to communicate complex information in a simple and intuitive way. Effective data visualization can help to identify patterns and trends in the data, making it easier to understand and interpret.

There are several popular libraries for data visualization in Python, including matplotlib, seaborn, and plotly. One of the most commonly used plots in data visualization is the scatter plot, which can be used to visualize the relationship between two variables. For example, the code below uses the scatter() function from matplotlib to create a scatter plot of two columns in a DataFrame:

import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")
plt.scatter(df["column_1"], df["column_2"])
plt.xlabel("Column 1")
plt.ylabel("Column 2")
plt.show()

Another popular plot is the line plot, which is used to visualize the change in a variable over time. For example, the code below uses the line() function from matplotlib to create a line plot of a column in a DataFrame:

import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")
plt.line(df["date"], df["column"])
plt.xlabel("Date")
plt.ylabel("Column")
plt.show()

Bar plots and histograms are also commonly used to visualize data in Python. Bar plots are used to compare the values of different groups or categories, while histograms are used to visualize the distribution of a variable. For example, the code below uses the bar() function from matplotlib to create a bar plot of a column in a DataFrame:

import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")
plt.bar(df["category"], df["column"])
plt.xlabel("Category")
plt.ylabel("Column")
plt.show()

Another popular data visualization library in Python is seaborn. It is built on top of matplotlib and provides additional functionality for creating more complex plots. For example, the code below uses the lmplot() function from seaborn to create a scatter plot with a fitted line of best fit:

import seaborn as sns

df = pd.read_csv("data.csv")
sns.lmplot(x="column_1", y="column_2", data=df)
plt.show()

In SQL, data visualization can be achieved using a combination of SQL commands and data types. For example, to create a scatter plot, the following command can be used:

WITH CTE AS (
SELECT column_1, column_2
FROM table_name
)
SELECT column_1, column_2
FROM CTE

This command uses the Common Table Expression (CTE) to select the two columns and then plot them using any visualization tool such as Tableau, PowerBI or Excel.

Another popular plot is the line plot, which is used to visualize the change in a variable over time. In SQL, a line plot can be created by using the GROUP BY clause to group the data by a specific time interval, such as day or month, and then selecting the relevant columns to be plotted. For example, the following command can be used to create a line plot of total sales by month:

WITH CTE AS (
SELECT DATE_TRUNC('month', date) AS month, SUM(sales) as total_sales
FROM table_name
GROUP BY month
)
SELECT month, total_sales
FROM CTE

Bar plots can also be created in SQL by using the GROUP BY clause to group the data by a specific category and then selecting the relevant columns to be plotted. For example, the following command can be used to create a bar plot of total sales by product category:

WITH CTE AS (
SELECT category, SUM(sales) as total_sales
FROM table_name
GROUP BY category
)
SELECT category, total_sales
FROM CTE

Histograms can also be created in SQL by using the GROUP BY clause to group the data into bins and then selecting the relevant columns to be plotted. For example, the following command can be used to create a histogram of total sales by price range:

WITH CTE AS (
SELECT CASE
WHEN price < 10 THEN '0-10'
WHEN price BETWEEN 10 AND 20 THEN '10-20'
WHEN price > 20 THEN '20+'
ELSE 'NA'
END AS price_range,
SUM(sales) as total_sales
FROM table_name
GROUP BY price_range
)
SELECT price_range, total_sales
FROM CTE

In conclusion, data visualization is a powerful tool that allows data scientists and analysts to communicate complex information in a simple and intuitive way. By using the above techniques, data scientists can create effective data visualizations that can help to identify patterns and trends in the data, making it easier to understand and interpret. Data visualization is not only limited to python, SQL also plays a crucial role in creating visually appealing plots that can communicate insights effectively.

--

--

Ofili Lewis

Transforming and making data more accessible so that organizations can use it to evaluate and optimize performance.