Visualizing Climate Change
In this blog, we will make visualization of climate change using historic temperature data from The Global Historical Climatology Network data set, compiled by the National Centers for Environmental Information of the US National Oceanic and Atmospheric Administration. We will be using the sqlite3, pandas, plotly express libraries.
First, we will import the relevant data from csv files.
import sqlite3
import pandas as pd
import plotly.express as px
#establish a connection
conn = sqlite3.connect("temps.db")
#read in the csv
df1 = pd.read_csv("temps_stacked.csv")
df2 = pd.read_csv("countries.csv")
#recode the columns to get rid of space
df2 = df2.rename(columns= {"FIPS 10-4": "code"})
df3 = pd.read_csv("station-metadata.csv")
Let’s inspect the three files.
df1.head()
| ID | Year | Month | Temp | |
|---|---|---|---|---|
| 0 | ACW00011604 | 1961 | 1 | -0.89 |
| 1 | ACW00011604 | 1961 | 2 | 2.36 |
| 2 | ACW00011604 | 1961 | 3 | 4.72 |
| 3 | ACW00011604 | 1961 | 4 | 7.73 |
| 4 | ACW00011604 | 1961 | 5 | 11.28 |
df2.head()
| code | ISO 3166 | Name | |
|---|---|---|---|
| 0 | AF | AF | Afghanistan |
| 1 | AX | - | Akrotiri |
| 2 | AL | AL | Albania |
| 3 | AG | DZ | Algeria |
| 4 | AQ | AS | American Samoa |
df3.head()
| ID | LATITUDE | LONGITUDE | STNELEV | NAME | |
|---|---|---|---|---|---|
| 0 | ACW00011604 | 57.7667 | 11.8667 | 18.0 | SAVE |
| 1 | AE000041196 | 25.3330 | 55.5170 | 34.0 | SHARJAH_INTER_AIRP |
| 2 | AEM00041184 | 25.6170 | 55.9330 | 31.0 | RAS_AL_KHAIMAH_INTE |
| 3 | AEM00041194 | 25.2550 | 55.3640 | 10.4 | DUBAI_INTL |
| 4 | AEM00041216 | 24.4300 | 54.4700 | 3.0 | ABU_DHABI_BATEEN_AIR |
Next, we will convert the pandas dataframe to tables in a database.
#convert to database
df1.to_sql("temperatures", conn, if_exists="replace", index=False)
df2.to_sql("countries", conn, if_exists="replace", index=False)
df3.to_sql("stations", conn, if_exists="replace", index=False)
#close the connection
conn.close()
C:\Users\justi\anaconda3\lib\site-packages\pandas\core\generic.py:2872: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
sql.to_sql(
Now, let’s write a function to output a country’s yearly climate data.
def query_climate_database(country, year_begin, year_end, month):
"""
A function that takes four arguments and returns a Pandas dataframe by making a query
"""
conn = sqlite3.connect("temps.db")
#command line that gets the relevant info
cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
FROM temperatures T \
LEFT JOIN stations S ON T.id = S.id\
LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
WHERE C.Name='{country}' AND T.year >= {year_begin} \
AND T.year<={year_end} AND T.month = {month}"
#converts to Pandas dataframe
df = pd.read_sql_query(cmd, conn)
conn.close()
return df
Let’s test our function.
query_climate_database(country = "India",
year_begin = 1980,
year_end = 2020,
month = 1)
| NAME | LATITUDE | LONGITUDE | Name | Year | Month | Temp | |
|---|---|---|---|---|---|---|---|
| 0 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1980 | 1 | 23.48 |
| 1 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1981 | 1 | 24.57 |
| 2 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1982 | 1 | 24.19 |
| 3 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1983 | 1 | 23.51 |
| 4 | PBO_ANANTAPUR | 14.583 | 77.633 | India | 1984 | 1 | 24.81 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3147 | DARJEELING | 27.050 | 88.270 | India | 1983 | 1 | 5.10 |
| 3148 | DARJEELING | 27.050 | 88.270 | India | 1986 | 1 | 6.90 |
| 3149 | DARJEELING | 27.050 | 88.270 | India | 1994 | 1 | 8.10 |
| 3150 | DARJEELING | 27.050 | 88.270 | India | 1995 | 1 | 5.60 |
| 3151 | DARJEELING | 27.050 | 88.270 | India | 1997 | 1 | 5.70 |
3152 rows × 7 columns
We can then make our second function that visualizes climate change.
We will answer this question : How does the average yearly change in temperature vary within a given country?
We will use the sklearn library to calculate the estimated yearly increase in temperature in every station. To compute the avearage yearly increase, we’ll first write a function to calculate the average change in temperature per year using linear regression.
from sklearn.linear_model import LinearRegression
def coef(data_group):
"""
A function that returns the slope of the best-fit linear regression line
"""
X = data_group[["Year"]]
y = data_group["Temp"]
LR = LinearRegression()
LR.fit(X, y)
slope = LR.coef_[0]
return slope
def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
"""
This function takes in country, year_begin, year_end, and month, as the previous sql function does. It also
takes in a minimum observation argument, which dictates the minimum amount of observation for any station's
data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to
style the plot.
"""
#call the query function to get the dataframe
df = query_climate_database(country, year_begin, year_end, month)
#filters out any station not meeting the min_obs requirement
df['len'] = df.groupby(["NAME"])["Temp"].transform(len)
#print(df.head())
df = df[df['len'] >= min_obs]
#adds esitmated yearly increase column using our coef function
coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE"]).apply(coef)
coefs = coefs.reset_index()
#print(coefs.head())
coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
#plots the data
fig = px.scatter_mapbox(coefs, # data for the points you want to plot
lat = "LATITUDE", # column name for latitude informataion
lon = "LONGITUDE", # column name for longitude information
hover_name = "NAME", # what's the bold text that appears when you hover over
color = "Estimated Yearly Increase (°C)",
**kwargs) # map style
return fig
Now let’s test our function.
from plotly.io import write_html
color_map = px.colors.diverging.RdGy_r # choose a colormap
fig = temperature_coefficient_plot("India", 1980, 2020, 1,
min_obs = 10,
zoom = 2,
mapbox_style="carto-positron",
color_continuous_scale=color_map,
title = "Estimates of Yearly Increase in Temperature in January for Stations in India, years 1980 - 2020",
color_continuous_midpoint = 0,
width = 1000,
height = 600
)
fig.show()
#saves our plot
write_html(fig, "India.html")
Let’s look at another country.
color_map = px.colors.diverging.RdGy_r # choose a colormap
fig = temperature_coefficient_plot("China", 1980, 2020, 1,
min_obs = 10,
zoom = 2,
mapbox_style="carto-positron",
color_continuous_scale=color_map,
title = "Estimates of Yearly Increase in Temperature in January for Stations in China, years 1980 - 2020",
color_continuous_midpoint = 0,
width = 1000,
height = 600
)
fig.show()
#saves our plot
write_html(fig, "China.html")
Finally, more interesting visualizations!
Let’s look at Antartica, and how climate change has impacted the coldest continent on our planet. We will use the sns library to graph the distribution of temperature changes in stations of Antartica.
First, let’s modify our query function. We’ll look at every month of each year. We’ll use the latitude parameter to limit our search to Antartica stations (<-80).
def query_climate_database_2(year_begin, year_end, lat):
"""
A function that takes three arguments and returns a Pandas dataframe by making a query
"""
conn = sqlite3.connect("temps.db")
#command line that gets the relevant info
cmd = f"SELECT S.name, T.year, T.month, T.temp \
FROM temperatures T \
LEFT JOIN stations S ON T.id = S.id\
WHERE T.year >= {year_begin} \
AND T.year<={year_end} AND S.latitude < {lat}"
#converts to Pandas dataframe
df = pd.read_sql_query(cmd, conn)
conn.close()
return df
def plot_antartica(year_begin, year_end, lat, **kwargs):
#use the same coefs function to determine the avearge yearly change
df = query_climate_database_2(year_begin, year_end, lat)
coefs = df.groupby(["NAME", "Month"]).apply(coef)
#coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
coefs = coefs.reset_index()
#plot the data points
fig = px.scatter(data_frame = coefs,
x = 0,
y = "NAME",
**kwargs
)
return fig
from plotly.io import write_html
#call our function with predetermined parameters
fig = plot_antartica(1980, 2020, -80,
title = "Average Yearly Change in Temperature\nat Antarctic Climate Stations\n(Each dot is one month of the year)",
labels={
"NAME": "STATION NAME",
"0": "Average Yearly Change in Temperature",
},
)
fig.show()
#saves our plot
write_html(fig, "antartica.html")
It appears that some stations in Antartica have seen significant increases in temperature. Still, most dots fall witin +-0.25 degrees per year, except one significant outlier in the Brianna station, which should be disregarded.
For the last plot, I want to look at the yearly change in temperature of the same country across different times of the year. Does climate change affect summer and winter differently?
Let’s look at India. We’ll look at January, June, and September to get a comprehensive coverage of the year.
#First, let's define a new function to select the data.
def query_climate_database_month(country, year_begin, year_end, month1, month2, month3):
"""
A function that takes four arguments and returns a Pandas dataframe by making a query
"""
conn = sqlite3.connect("temps.db")
#command line that gets the relevant info
cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
FROM temperatures T \
LEFT JOIN stations S ON T.id = S.id\
LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
WHERE C.Name='{country}' AND T.year >= {year_begin} \
AND T.year<={year_end} AND T.month = {month1} OR T.month = {month2} OR T.month = {month3} "
#converts to Pandas dataframe
df = pd.read_sql_query(cmd, conn)
conn.close()
return df
def temperature_coefficient_plot_month(country, year_begin, year_end, month1, month2, month3, min_obs, **kwargs):
"""
This function takes in countries, year_begin, year_end, and month, as the previous sql function does. It also
takes in a minimum observation argument, which dictates the minimum amount of observation for any station's
data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to
style the plot.
"""
#call the query function to get the dataframe
df = query_climate_database_month(country, year_begin, year_end, month1, month2, month3)
#filters out any station not meeting the min_obs requirement
df['len'] = df.groupby(["NAME", "Month"])["Temp"].transform(len)
#print(df.head())
df = df[df['len'] >= min_obs]
#adds esitmated yearly increase column using our coef function
coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE", "Month"]).apply(coef)
coefs = coefs.reset_index()
#print(coefs.head())
coefs.rename(columns = {0:'Estimated Yearly Increase (°C)', "Name":"Country"}, inplace = True)
#plots the data
fig = px.box(data_frame=coefs, # data for the points you want to plot
y = "Estimated Yearly Increase (°C)", # column name for longitude information
facet_col = "Month",
**kwargs) # map style
return fig
#call our function
from plotly.io import write_html
fig = temperature_coefficient_plot_month("India", 1990, 2020, 1, 6, 9, 10,
title = "Estimates of Yearly Increase in Temperature in January, June, and September for Stations in India, years 1990 - 2020",
width = 1000,
height = 600)
fig.show()
write_html(fig, "month.html")
As the box plot shows, there is a negligible difference in how climate change affects different seasons, disregarding outliers. All three months appear to be centered around zero, while January shows less variance in temperature readings. This could be related to India’s specific climate.