Visualizing Climate Change

In this blog, we will make visualization of climate change using historic temperature data from The Global Historical Climatology Network data set, compiled by the National Centers for Environmental Information of the US National Oceanic and Atmospheric Administration. We will be using the sqlite3, pandas, plotly express libraries.

First, we will import the relevant data from csv files.

import sqlite3
import pandas as pd
import plotly.express as px

#establish a connection
conn = sqlite3.connect("temps.db")

#read in the csv
df1 = pd.read_csv("temps_stacked.csv")

df2 = pd.read_csv("countries.csv")
#recode the columns to get rid of space
df2 = df2.rename(columns= {"FIPS 10-4": "code"})
df3 = pd.read_csv("station-metadata.csv")

Let’s inspect the three files.

df1.head()
ID Year Month Temp
0 ACW00011604 1961 1 -0.89
1 ACW00011604 1961 2 2.36
2 ACW00011604 1961 3 4.72
3 ACW00011604 1961 4 7.73
4 ACW00011604 1961 5 11.28
df2.head()
code ISO 3166 Name
0 AF AF Afghanistan
1 AX - Akrotiri
2 AL AL Albania
3 AG DZ Algeria
4 AQ AS American Samoa
df3.head()
ID LATITUDE LONGITUDE STNELEV NAME
0 ACW00011604 57.7667 11.8667 18.0 SAVE
1 AE000041196 25.3330 55.5170 34.0 SHARJAH_INTER_AIRP
2 AEM00041184 25.6170 55.9330 31.0 RAS_AL_KHAIMAH_INTE
3 AEM00041194 25.2550 55.3640 10.4 DUBAI_INTL
4 AEM00041216 24.4300 54.4700 3.0 ABU_DHABI_BATEEN_AIR

Next, we will convert the pandas dataframe to tables in a database.

#convert to database
df1.to_sql("temperatures", conn, if_exists="replace", index=False)

df2.to_sql("countries", conn, if_exists="replace", index=False)

df3.to_sql("stations", conn, if_exists="replace", index=False)

#close the connection
conn.close()
C:\Users\justi\anaconda3\lib\site-packages\pandas\core\generic.py:2872: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  sql.to_sql(

Now, let’s write a function to output a country’s yearly climate data.

def query_climate_database(country, year_begin, year_end, month):
    """
    A function that takes four arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
            WHERE C.Name='{country}' AND T.year >= {year_begin} \
            AND T.year<={year_end} AND T.month = {month}"
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df

Let’s test our function.

query_climate_database(country = "India", 
                       year_begin = 1980, 
                       year_end = 2020,
                       month = 1)
NAME LATITUDE LONGITUDE Name Year Month Temp
0 PBO_ANANTAPUR 14.583 77.633 India 1980 1 23.48
1 PBO_ANANTAPUR 14.583 77.633 India 1981 1 24.57
2 PBO_ANANTAPUR 14.583 77.633 India 1982 1 24.19
3 PBO_ANANTAPUR 14.583 77.633 India 1983 1 23.51
4 PBO_ANANTAPUR 14.583 77.633 India 1984 1 24.81
... ... ... ... ... ... ... ...
3147 DARJEELING 27.050 88.270 India 1983 1 5.10
3148 DARJEELING 27.050 88.270 India 1986 1 6.90
3149 DARJEELING 27.050 88.270 India 1994 1 8.10
3150 DARJEELING 27.050 88.270 India 1995 1 5.60
3151 DARJEELING 27.050 88.270 India 1997 1 5.70

3152 rows × 7 columns

We can then make our second function that visualizes climate change.

We will answer this question : How does the average yearly change in temperature vary within a given country?

We will use the sklearn library to calculate the estimated yearly increase in temperature in every station. To compute the avearage yearly increase, we’ll first write a function to calculate the average change in temperature per year using linear regression.

from sklearn.linear_model import LinearRegression

def coef(data_group):
    """
    A function that returns the slope of the best-fit linear regression line
    """
    X = data_group[["Year"]]
    y = data_group["Temp"]
    LR = LinearRegression()
    LR.fit(X, y)
    slope = LR.coef_[0]
    return slope
def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
    """
    This function takes in country, year_begin, year_end, and month, as the previous sql function does. It also
    takes in a minimum observation argument, which dictates the minimum amount of observation for any station's 
    data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to 
    style the plot. 
    """
    #call the query function to get the dataframe
    df = query_climate_database(country, year_begin, year_end, month)
    
    #filters out any station not meeting the min_obs requirement
    df['len'] = df.groupby(["NAME"])["Temp"].transform(len)
    #print(df.head())
    df = df[df['len'] >= min_obs]
    
    #adds esitmated yearly increase column using our coef function
    coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE"]).apply(coef)
    coefs = coefs.reset_index()
    #print(coefs.head())
    coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
    
    #plots the data
    fig = px.scatter_mapbox(coefs, # data for the points you want to plot
                        lat = "LATITUDE", # column name for latitude informataion
                        lon = "LONGITUDE", # column name for longitude information
                        hover_name = "NAME", # what's the bold text that appears when you hover over
                        color = "Estimated Yearly Increase (°C)",
                        **kwargs) # map style
    return fig

Now let’s test our function.

from plotly.io import write_html
color_map = px.colors.diverging.RdGy_r # choose a colormap

fig = temperature_coefficient_plot("India", 1980, 2020, 1, 
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map,
                                   title = "Estimates of Yearly Increase in Temperature in January for Stations in India, years 1980 - 2020",
                                   color_continuous_midpoint = 0,
                                   width = 1000,
                                   height = 600
                                  )

fig.show()
#saves our plot
write_html(fig, "India.html")

Let’s look at another country.

color_map = px.colors.diverging.RdGy_r # choose a colormap

fig = temperature_coefficient_plot("China", 1980, 2020, 1, 
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map,
                                   title = "Estimates of Yearly Increase in Temperature in January for Stations in China, years 1980 - 2020",
                                   color_continuous_midpoint = 0,
                                   width = 1000,
                                   height = 600
                                  )

fig.show()
#saves our plot
write_html(fig, "China.html")

Finally, more interesting visualizations!

Let’s look at Antartica, and how climate change has impacted the coldest continent on our planet. We will use the sns library to graph the distribution of temperature changes in stations of Antartica.

First, let’s modify our query function. We’ll look at every month of each year. We’ll use the latitude parameter to limit our search to Antartica stations (<-80).

def query_climate_database_2(year_begin, year_end, lat):
    """
    A function that takes three arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            WHERE T.year >= {year_begin} \
            AND T.year<={year_end} AND S.latitude < {lat}"
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df
def plot_antartica(year_begin, year_end, lat, **kwargs):
    #use the same coefs function to determine the avearge yearly change
    df = query_climate_database_2(year_begin, year_end, lat)
    coefs = df.groupby(["NAME", "Month"]).apply(coef)
    #coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
    coefs = coefs.reset_index()

    #plot the data points
    fig = px.scatter(data_frame = coefs, 
                x = 0, 
                y = "NAME",
                **kwargs
          )
    return fig
from plotly.io import write_html
#call our function with predetermined parameters
fig = plot_antartica(1980, 2020, -80, 
                     title = "Average Yearly Change in Temperature\nat Antarctic Climate Stations\n(Each dot is one month of the year)",
                     labels={
                     "NAME": "STATION NAME",
                     "0": "Average Yearly Change in Temperature",
                 },
                    )
fig.show()

#saves our plot
write_html(fig, "antartica.html")

It appears that some stations in Antartica have seen significant increases in temperature. Still, most dots fall witin +-0.25 degrees per year, except one significant outlier in the Brianna station, which should be disregarded.

For the last plot, I want to look at the yearly change in temperature of the same country across different times of the year. Does climate change affect summer and winter differently?

Let’s look at India. We’ll look at January, June, and September to get a comprehensive coverage of the year.

#First, let's define a new function to select the data.
def query_climate_database_month(country, year_begin, year_end, month1, month2, month3):
    """
    A function that takes four arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
            WHERE C.Name='{country}' AND T.year >= {year_begin} \
            AND T.year<={year_end} AND T.month = {month1} OR T.month = {month2} OR T.month = {month3} "
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df
def temperature_coefficient_plot_month(country, year_begin, year_end, month1, month2, month3, min_obs, **kwargs):
    """
    This function takes in countries, year_begin, year_end, and month, as the previous sql function does. It also
    takes in a minimum observation argument, which dictates the minimum amount of observation for any station's 
    data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to 
    style the plot. 
    """
    #call the query function to get the dataframe
    df = query_climate_database_month(country, year_begin, year_end, month1, month2, month3)
    
    #filters out any station not meeting the min_obs requirement
    df['len'] = df.groupby(["NAME", "Month"])["Temp"].transform(len)
    #print(df.head())
    df = df[df['len'] >= min_obs]
    
    #adds esitmated yearly increase column using our coef function
    coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE", "Month"]).apply(coef)
    coefs = coefs.reset_index()
    #print(coefs.head())
    coefs.rename(columns = {0:'Estimated Yearly Increase (°C)', "Name":"Country"}, inplace = True)
    
    #plots the data
    fig = px.box(data_frame=coefs, # data for the points you want to plot
                        y = "Estimated Yearly Increase (°C)", # column name for longitude information
                        facet_col = "Month",
                        **kwargs) # map style
    return fig
#call our function
from plotly.io import write_html
fig = temperature_coefficient_plot_month("India", 1990, 2020, 1, 6, 9, 10, 
                                   title = "Estimates of Yearly Increase in Temperature in January, June, and September for Stations in India, years 1990 - 2020",
                                   width = 1000,
                                   height = 600)

fig.show()
write_html(fig, "month.html")

As the box plot shows, there is a negligible difference in how climate change affects different seasons, disregarding outliers. All three months appear to be centered around zero, while January shows less variance in temperature readings. This could be related to India’s specific climate.

Written on April 14, 2022