Visualizing Climate Change

In this blog, we will make visualization of climate change using historic temperature data from The Global Historical Climatology Network data set, compiled by the National Centers for Environmental Information of the US National Oceanic and Atmospheric Administration. We will be using the sqlite3, pandas, plotly express libraries.

First, we will import the relevant data from csv files.

import sqlite3
import pandas as pd
import plotly.express as px

#establish a connection
conn = sqlite3.connect("temps.db")

#read in the csv
df1 = pd.read_csv("temps_stacked.csv")

df2 = pd.read_csv("countries.csv")
#recode the columns to get rid of space
df2 = df2.rename(columns= {"FIPS 10-4": "code"})
df3 = pd.read_csv("station-metadata.csv")

Let’s inspect the three files.

df1.head()

	ID	Year	Month	Temp
0	ACW00011604	1961	1	-0.89
1	ACW00011604	1961	2	2.36
2	ACW00011604	1961	3	4.72
3	ACW00011604	1961	4	7.73
4	ACW00011604	1961	5	11.28

df2.head()

	code	ISO 3166	Name
0	AF	AF	Afghanistan
1	AX	-	Akrotiri
2	AL	AL	Albania
3	AG	DZ	Algeria
4	AQ	AS	American Samoa

df3.head()

	ID	LATITUDE	LONGITUDE	STNELEV	NAME
0	ACW00011604	57.7667	11.8667	18.0	SAVE
1	AE000041196	25.3330	55.5170	34.0	SHARJAH_INTER_AIRP
2	AEM00041184	25.6170	55.9330	31.0	RAS_AL_KHAIMAH_INTE
3	AEM00041194	25.2550	55.3640	10.4	DUBAI_INTL
4	AEM00041216	24.4300	54.4700	3.0	ABU_DHABI_BATEEN_AIR

Next, we will convert the pandas dataframe to tables in a database.

#convert to database
df1.to_sql("temperatures", conn, if_exists="replace", index=False)

df2.to_sql("countries", conn, if_exists="replace", index=False)

df3.to_sql("stations", conn, if_exists="replace", index=False)

#close the connection
conn.close()

C:\Users\justi\anaconda3\lib\site-packages\pandas\core\generic.py:2872: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  sql.to_sql(

Now, let’s write a function to output a country’s yearly climate data.

def query_climate_database(country, year_begin, year_end, month):
    """
    A function that takes four arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
            WHERE C.Name='{country}' AND T.year >= {year_begin} \
            AND T.year<={year_end} AND T.month = {month}"
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df

Let’s test our function.

query_climate_database(country = "India", 
                       year_begin = 1980, 
                       year_end = 2020,
                       month = 1)

	NAME	LATITUDE	LONGITUDE	Name	Year	Month	Temp
0	PBO_ANANTAPUR	14.583	77.633	India	1980	1	23.48
1	PBO_ANANTAPUR	14.583	77.633	India	1981	1	24.57
2	PBO_ANANTAPUR	14.583	77.633	India	1982	1	24.19
3	PBO_ANANTAPUR	14.583	77.633	India	1983	1	23.51
4	PBO_ANANTAPUR	14.583	77.633	India	1984	1	24.81
...	...	...	...	...	...	...	...
3147	DARJEELING	27.050	88.270	India	1983	1	5.10
3148	DARJEELING	27.050	88.270	India	1986	1	6.90
3149	DARJEELING	27.050	88.270	India	1994	1	8.10
3150	DARJEELING	27.050	88.270	India	1995	1	5.60
3151	DARJEELING	27.050	88.270	India	1997	1	5.70

3152 rows × 7 columns

We can then make our second function that visualizes climate change.

We will answer this question : How does the average yearly change in temperature vary within a given country?

We will use the sklearn library to calculate the estimated yearly increase in temperature in every station. To compute the avearage yearly increase, we’ll first write a function to calculate the average change in temperature per year using linear regression.

from sklearn.linear_model import LinearRegression

def coef(data_group):
    """
    A function that returns the slope of the best-fit linear regression line
    """
    X = data_group[["Year"]]
    y = data_group["Temp"]
    LR = LinearRegression()
    LR.fit(X, y)
    slope = LR.coef_[0]
    return slope

def temperature_coefficient_plot(country, year_begin, year_end, month, min_obs, **kwargs):
    """
    This function takes in country, year_begin, year_end, and month, as the previous sql function does. It also
    takes in a minimum observation argument, which dictates the minimum amount of observation for any station's 
    data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to 
    style the plot. 
    """
    #call the query function to get the dataframe
    df = query_climate_database(country, year_begin, year_end, month)
    
    #filters out any station not meeting the min_obs requirement
    df['len'] = df.groupby(["NAME"])["Temp"].transform(len)
    #print(df.head())
    df = df[df['len'] >= min_obs]
    
    #adds esitmated yearly increase column using our coef function
    coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE"]).apply(coef)
    coefs = coefs.reset_index()
    #print(coefs.head())
    coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
    
    #plots the data
    fig = px.scatter_mapbox(coefs, # data for the points you want to plot
                        lat = "LATITUDE", # column name for latitude informataion
                        lon = "LONGITUDE", # column name for longitude information
                        hover_name = "NAME", # what's the bold text that appears when you hover over
                        color = "Estimated Yearly Increase (°C)",
                        **kwargs) # map style
    return fig

Now let’s test our function.

from plotly.io import write_html
color_map = px.colors.diverging.RdGy_r # choose a colormap

fig = temperature_coefficient_plot("India", 1980, 2020, 1, 
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map,
                                   title = "Estimates of Yearly Increase in Temperature in January for Stations in India, years 1980 - 2020",
                                   color_continuous_midpoint = 0,
                                   width = 1000,
                                   height = 600
                                  )

fig.show()
#saves our plot
write_html(fig, "India.html")

Let’s look at another country.

color_map = px.colors.diverging.RdGy_r # choose a colormap

fig = temperature_coefficient_plot("China", 1980, 2020, 1, 
                                   min_obs = 10,
                                   zoom = 2,
                                   mapbox_style="carto-positron",
                                   color_continuous_scale=color_map,
                                   title = "Estimates of Yearly Increase in Temperature in January for Stations in China, years 1980 - 2020",
                                   color_continuous_midpoint = 0,
                                   width = 1000,
                                   height = 600
                                  )

fig.show()
#saves our plot
write_html(fig, "China.html")

Finally, more interesting visualizations!

First, let’s modify our query function. We’ll look at every month of each year. We’ll use the latitude parameter to limit our search to Antartica stations (<-80).

def query_climate_database_2(year_begin, year_end, lat):
    """
    A function that takes three arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            WHERE T.year >= {year_begin} \
            AND T.year<={year_end} AND S.latitude < {lat}"
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df

def plot_antartica(year_begin, year_end, lat, **kwargs):
    #use the same coefs function to determine the avearge yearly change
    df = query_climate_database_2(year_begin, year_end, lat)
    coefs = df.groupby(["NAME", "Month"]).apply(coef)
    #coefs.rename(columns = {0:'Estimated Yearly Increase (°C)'}, inplace = True)
    coefs = coefs.reset_index()

    #plot the data points
    fig = px.scatter(data_frame = coefs, 
                x = 0, 
                y = "NAME",
                **kwargs
          )
    return fig

from plotly.io import write_html
#call our function with predetermined parameters
fig = plot_antartica(1980, 2020, -80, 
                     title = "Average Yearly Change in Temperature\nat Antarctic Climate Stations\n(Each dot is one month of the year)",
                     labels={
                     "NAME": "STATION NAME",
                     "0": "Average Yearly Change in Temperature",
                 },
                    )
fig.show()

#saves our plot
write_html(fig, "antartica.html")

It appears that some stations in Antartica have seen significant increases in temperature. Still, most dots fall witin +-0.25 degrees per year, except one significant outlier in the Brianna station, which should be disregarded.

For the last plot, I want to look at the yearly change in temperature of the same country across different times of the year. Does climate change affect summer and winter differently?

Let’s look at India. We’ll look at January, June, and September to get a comprehensive coverage of the year.

#First, let's define a new function to select the data.
def query_climate_database_month(country, year_begin, year_end, month1, month2, month3):
    """
    A function that takes four arguments and returns a Pandas dataframe by making a query
    """
    conn = sqlite3.connect("temps.db")
    
    #command line that gets the relevant info    
    cmd = f"SELECT S.name, S.latitude, S.longitude, C.Name, T.year, T.month, T.temp \
            FROM temperatures T \
            LEFT JOIN stations S ON T.id = S.id\
            LEFT JOIN countries C ON C.code = SUBSTRING(T.id, 1, 2)\
            WHERE C.Name='{country}' AND T.year >= {year_begin} \
            AND T.year<={year_end} AND T.month = {month1} OR T.month = {month2} OR T.month = {month3} "
    
    #converts to Pandas dataframe
    df = pd.read_sql_query(cmd, conn)
    conn.close()
    return df

def temperature_coefficient_plot_month(country, year_begin, year_end, month1, month2, month3, min_obs, **kwargs):
    """
    This function takes in countries, year_begin, year_end, and month, as the previous sql function does. It also
    takes in a minimum observation argument, which dictates the minimum amount of observation for any station's 
    data to be accepted. Anything less will be filtered out. It also accepts additional keywords arguments to 
    style the plot. 
    """
    #call the query function to get the dataframe
    df = query_climate_database_month(country, year_begin, year_end, month1, month2, month3)
    
    #filters out any station not meeting the min_obs requirement
    df['len'] = df.groupby(["NAME", "Month"])["Temp"].transform(len)
    #print(df.head())
    df = df[df['len'] >= min_obs]
    
    #adds esitmated yearly increase column using our coef function
    coefs = df.groupby(["NAME", "LATITUDE", "LONGITUDE", "Month"]).apply(coef)
    coefs = coefs.reset_index()
    #print(coefs.head())
    coefs.rename(columns = {0:'Estimated Yearly Increase (°C)', "Name":"Country"}, inplace = True)
    
    #plots the data
    fig = px.box(data_frame=coefs, # data for the points you want to plot
                        y = "Estimated Yearly Increase (°C)", # column name for longitude information
                        facet_col = "Month",
                        **kwargs) # map style
    return fig

#call our function
from plotly.io import write_html
fig = temperature_coefficient_plot_month("India", 1990, 2020, 1, 6, 9, 10, 
                                   title = "Estimates of Yearly Increase in Temperature in January, June, and September for Stations in India, years 1990 - 2020",
                                   width = 1000,
                                   height = 600)

fig.show()
write_html(fig, "month.html")

As the box plot shows, there is a negligible difference in how climate change affects different seasons, disregarding outliers. All three months appear to be centered around zero, while January shows less variance in temperature readings. This could be related to India’s specific climate.

Written on April 14, 2022