COVID-19 Maps
These maps are no longer updated, as the effects of COVID have drastically reduced. The data is frozen as of 14/05/2023.
I felt that the COVID-19 maps available at the time did not account for population when reporting confirmed cases. I think this misrepresents which areas are most affected and hides important trends.
So I used Folium to create Leaflet choropleth maps that visualise confirmed COVID-19 cases over time and confirmed COVID-19 cases divided by population.
The cases divided by population map shows that residents of affluent suburbs (Woollahra, Waverley, Hunters Hill, Mosman) were affected more than others.
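The per-capita calculation itself is not part of the sample code in this README. A minimal sketch of the idea, assuming a dataframe that already holds a case count and a population per area, might look like this. The area names, figures and column names are hypothetical and are not real NSW data.

```python
import pandas as pd

# Hypothetical figures for two made-up areas; not real NSW data.
cases = pd.DataFrame({
    'lga': ['Area A', 'Area B'],
    'case_count': [50, 120],
    'population': [25_000, 200_000],
})

# Cases per 100,000 residents puts areas of different sizes on one scale.
cases['cases_per_100k'] = cases['case_count'] * 100_000 / cases['population']

# Area A has fewer cases in total but a higher rate per resident.
print(cases.sort_values('cases_per_100k', ascending=False))
```

Ranking by the rate rather than the raw count is what lets a small area with relatively many cases stand out on the map.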
The maps show tooltips containing confirmed COVID-19 cases, the number of tests and Local Government Area statistics.
Code
I have also shared some of my code so that anyone can replicate what I have done here.
The code below is a sample of what I wrote to create the time slider map and the cases map in one view.
import numpy as np
import pandas as pd
import os
import branca.colormap as cm
import folium  # builds interactive Leaflet maps
import geopandas as gpd
from folium.plugins import TimeSliderChoropleth
I like to increase the console output size so that dataframes are more readable.
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 140)
Open the COVID-19 dataset provided by the NSW Ministry of Health.
df = pd.read_csv(
"https://data.nsw.gov.au/data/dataset/5424aa3b-550d-4637-ae50-7f458ce327f4/"
"resource/227f6b65-025c-482c-9f22-a25cf1b8594f/download/"
"covid-19-tests-by-date-and-location-and-result.csv")
Remove missing data and select relevant columns.
df = df[~df['lga_code19'].isna()]
df = df[['test_date','lga_name19','result']]
Parse the dates with the day first, as in the Australian date format.
df['test_date'] = pd.to_datetime(df['test_date'], dayfirst=True)
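To illustrate what the dayfirst flag changes, here is a small sketch using two ambiguous date strings. The strings are made up for the example and do not come from the dataset.

```python
import pandas as pd

# Made-up ambiguous dates: each could be read as day/month or month/day.
ambiguous = pd.Series(['03/04/2020', '11/12/2020'])

day_first = pd.to_datetime(ambiguous, dayfirst=True)     # 3 April, 11 December
month_first = pd.to_datetime(ambiguous, dayfirst=False)  # 4 March, 12 November

print(day_first.dt.strftime('%d %B %Y').tolist())
print(month_first.dt.strftime('%d %B %Y').tolist())
```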
Sort the values by date in ascending order so the first test conducted is at the top.
df = df.sort_values('test_date')
Read the shapefile that provides the geometry of Local Government Areas, then select relevant columns and drop duplicates.
lga_coord = gpd.read_file('./nsw_lga_polygon_shp-gda2020/NSW_LGA_POLYGON_shp.shp')
lga_coord = lga_coord[['NSW_LGA__3','geometry']]
lga_coord = lga_coord.drop_duplicates('NSW_LGA__3')
Create a label encoding of the categorical variable 'result'.
df['result'] = [1 if i == 'Case - Confirmed' else 0 for i in df['result']]
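On a large dataset the same encoding can be done without a Python-level loop. The sketch below assumes the same 'Case - Confirmed' label; the other label is a placeholder for the example.

```python
import pandas as pd

# 'Some other result' is a placeholder value for illustration only.
results = pd.Series(['Case - Confirmed', 'Some other result', 'Case - Confirmed'])

# The comparison yields booleans, which astype(int) turns into 1 and 0.
encoded = (results == 'Case - Confirmed').astype(int)
print(encoded.tolist())
```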
Group by LGA and, for each test record, obtain the number of cases, the number of tests and the cumulative number of cases.
df['case_count'] = df.groupby(['lga_name19'], as_index=False)['result'].transform('sum')
df['test_count'] = df.groupby(['lga_name19'], as_index=False)['result'].transform('count')
df['cum_count'] = df.groupby(['lga_name19'], as_index=False)['result'].transform('cumsum')
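The difference between the three transforms is easiest to see on a tiny frame. This sketch uses made-up rows with the same column names as above.

```python
import pandas as pd

# Made-up test records: 1 is a confirmed case, 0 is any other result.
toy = pd.DataFrame({
    'lga_name19': ['A', 'A', 'A', 'B', 'B'],
    'result': [1, 0, 1, 0, 1],
})

by_lga = toy.groupby('lga_name19')['result']
toy['case_count'] = by_lga.transform('sum')    # total cases, repeated on every row of the LGA
toy['test_count'] = by_lga.transform('count')  # total tests, repeated on every row of the LGA
toy['cum_count'] = by_lga.transform('cumsum')  # running case total within the LGA, in row order

print(toy)
```

Because the running total follows row order, sorting by date beforehand is what makes cum_count a cumulative count over time.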
Group the data by LGA and, using the 'result' column, obtain the sum (the number of confirmed cases) and the count (the number of tests conducted).
grouped_data = df.groupby(['lga_name19'], as_index=False)['result'].sum()
grouped_data_tmp = df.groupby(['lga_name19'], as_index=False)['result'].count()
Rename the new columns and copy the test count from the temporary dataframe to the main one.
grouped_data = grouped_data.rename({'result': 'case_count'}, axis='columns')
grouped_data['test_count'] = grouped_data_tmp['result'].copy()
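As an alternative sketch, pandas named aggregation can produce both columns in one step without a temporary dataframe. The rows below are made up for the example.

```python
import pandas as pd

# Made-up test records: 1 is a confirmed case, 0 is any other result.
toy = pd.DataFrame({
    'lga_name19': ['A', 'A', 'A', 'B', 'B'],
    'result': [1, 0, 1, 0, 1],
})

# Each keyword names an output column and gives (source column, aggregation).
grouped = toy.groupby('lga_name19', as_index=False).agg(
    case_count=('result', 'sum'),
    test_count=('result', 'count'),
)
print(grouped)
```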
Edit the LGA names column to match the shapefile.
# Strip the cities and areas tag from the LGA names (The suffix part in brackets)
grouped_data['lga_name19'] = [name.split('(')[0].rstrip() for name in grouped_data['lga_name19']]
# Make a backup of the LGA names
grouped_data['NSW_LGA__3_orig'] = grouped_data['lga_name19'].copy()
# Rename the column to match the shapefile
grouped_data = grouped_data.rename(columns={'lga_name19': 'NSW_LGA__3'})
# Convert the LGA names to uppercase to match the shapefile
grouped_data['NSW_LGA__3'] = [name.upper() for name in grouped_data['NSW_LGA__3']]
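# Strip the suffix and uppercase the names in the full dataframe too, so they match the grouped data
df['NSW_LGA__3'] = [name.split('(')[0].rstrip().upper() for name in df['lga_name19']]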
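Since the same clean-up is needed wherever LGA names are compared with the shapefile, it can help to put it in one function. The helper name below is hypothetical, and the example names are only illustrations of the bracketed suffix format.

```python
def normalise_lga_name(name: str) -> str:
    """Drop a bracketed suffix such as '(C)' or '(A)' and uppercase the rest."""
    return name.split('(')[0].strip().upper()


# Example inputs in the 'Name (suffix)' style.
print(normalise_lga_name('Blacktown (C)'))
print(normalise_lga_name('Hunters Hill (A)'))
```

Applying one function to every dataframe that carries LGA names keeps the join keys consistent.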
Join the shapefile with the grouped data.
latest_df = grouped_data.merge(lga_coord, on='NSW_LGA__3')
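An inner join silently drops any LGA whose name does not match the shapefile. A quick way to check for that, sketched here with made-up names and placeholder geometry strings, is a left join with the indicator flag.

```python
import pandas as pd

# Made-up names; the geometry values are placeholder strings, not real polygons.
cases = pd.DataFrame({'NSW_LGA__3': ['ALPHA', 'BETA', 'GAMMA'], 'case_count': [5, 2, 7]})
shapes = pd.DataFrame({'NSW_LGA__3': ['ALPHA', 'GAMMA'], 'geometry': ['polygon-a', 'polygon-g']})

# indicator=True adds a '_merge' column saying where each row was found.
check = cases.merge(shapes, on='NSW_LGA__3', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'NSW_LGA__3'].tolist()

print(unmatched)  # names present in the case data but missing from the shapes
```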
Create a colour scale using the cumulative case counts.
max_colour = max(df['cum_count'])
min_colour = min(df['cum_count'])
cmap = cm.linear.YlOrRd_09.scale(min_colour, max_colour)
latest_df['colour'] = latest_df['case_count'].map(cmap)
df['colour'] = df['cum_count'].map(cmap)
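Conceptually, a linear colour scale places a value between the minimum and maximum and blends colours in proportion. The sketch below shows that idea with only two colours and plain Python. It is not how branca implements its colormaps (YlOrRd_09 blends through nine colours), and the function name is hypothetical.

```python
def lerp_colour(value, vmin, vmax, low=(255, 255, 204), high=(128, 0, 38)):
    """Blend two RGB colours according to where value sits in [vmin, vmax]."""
    if vmax == vmin:
        t = 0.0
    else:
        # Clamp so out-of-range values take the end colours.
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return '#{:02x}{:02x}{:02x}'.format(*rgb)


print(lerp_colour(0, 0, 100))    # pale yellow end of the scale
print(lerp_colour(100, 0, 100))  # dark red end of the scale
```

Because the scale above is built from the range of cum_count, the same value always maps to the same colour in both the slider map and the latest-cases layer.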
Make a style dictionary for the slider map. For each LGA, it maps every date to a colour and an opacity.
lga_list = latest_df['NSW_LGA__3'].tolist()
# Assumption: start_date and curr_date are not defined in this sample;
# here they are taken as the first and last test dates in the data.
start_date = df['test_date'].min()
curr_date = df['test_date'].max()
style_dict = {}
for i, lga in enumerate(lga_list):
    result = df[df['NSW_LGA__3'] == lga]
    result = result.drop_duplicates('test_date', keep='last')  # keep the last row for each date
    # set the date as the index
    result.index = pd.DatetimeIndex(result['test_date'])
    # choose relevant columns and discard others
    result = result[['NSW_LGA__3', 'cum_count', 'case_count', 'colour']]
    # recreate the index by filling missing dates with empty rows
    idx = pd.date_range(start_date, curr_date)
    result = result.reindex(idx, fill_value=np.nan)
    # Fill the missing data: carry values forward, then handle the leading gap
    result = result.ffill()
    result['cum_count'] = result['cum_count'].fillna(0)
    result['case_count'] = result['case_count'].bfill()
    result['NSW_LGA__3'] = result['NSW_LGA__3'].bfill()
    result['colour'] = result['colour'].fillna(cmap(0))
    # Convert the dates to epoch seconds as strings, the format used for the time slider keys
    datetime_index = pd.DatetimeIndex(result.index)
    dt_index_epochs = datetime_index.astype('int64') // 10**9
    dt_index = np.array(dt_index_epochs).astype('U10')
    result['dt_index'] = dt_index
    # Create the style dictionary: {epoch: {'color': ..., 'opacity': ...}}
    inner_dict = {}
    for _, r in result.iterrows():
        inner_dict[r['dt_index']] = {'color': r['colour'], 'opacity': 0.5}
    # The key is the row position, which matches the feature id in the GeoJSON
    style_dict[str(i)] = inner_dict
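The loop above does a lot per LGA, so here is a small self-contained sketch of the same steps for one made-up area: fill the missing dates, carry the last known values forward, and key the styles by epoch seconds. The dates, counts and colours are invented for the example, and the epoch conversion uses timedelta division rather than the integer cast above.

```python
import pandas as pd

# Made-up observations for one area on two of five days.
observed = pd.DataFrame(
    {'cum_count': [1, 3], 'colour': ['#ffeda0', '#feb24c']},
    index=pd.to_datetime(['2020-03-02', '2020-03-04']),
)

# Insert the missing days, then carry the last known values forward.
full_range = pd.date_range('2020-03-01', '2020-03-05')
filled = observed.reindex(full_range).ffill()
filled['cum_count'] = filled['cum_count'].fillna(0).astype(int)  # no cases before the first record
filled['colour'] = filled['colour'].fillna('#ffffcc')            # colour for zero cases

# Seconds since 1970-01-01, as strings, become the dictionary keys.
epochs = (filled.index - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
inner_dict = {
    str(epoch): {'color': colour, 'opacity': 0.5}
    for epoch, colour in zip(epochs, filled['colour'])
}
style_dict = {'0': inner_dict}  # '0' stands for the first feature id

print(filled['cum_count'].tolist())
print(list(inner_dict)[:2])
```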
Create a GeoDataFrame that holds only the geometry of each LGA, with a fresh index.
lga_df = latest_df[['geometry']]
lga_gdf = gpd.GeoDataFrame(lga_df).reset_index().drop('index', axis=1)
Create the base map zoomed in on Sydney.
slider_map = folium.Map(location=[-33.88, 151.10], height=600, zoom_start=12, min_zoom=None,
                        max_bounds=True, tiles='cartodbpositron')
Add data and create the time slider.
TimeSliderChoropleth(data=lga_gdf.to_json(),
                     styledict=style_dict,
                     name='Confirmed NSW cases over time',
                     ).add_to(slider_map)
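TimeSliderChoropleth looks up each feature's styles by its id, so every feature id in the GeoJSON needs a matching key in the style dictionary. A small check like the one below can catch a mismatch before the map is built. The function name is hypothetical and the GeoJSON is a stripped-down stand-in with no real geometry.

```python
import json


def missing_style_ids(geojson_text, style_dict):
    """Return feature ids from the GeoJSON that have no entry in style_dict."""
    features = json.loads(geojson_text)['features']
    feature_ids = {str(feature['id']) for feature in features}
    return sorted(feature_ids - set(style_dict))


# Stand-in GeoJSON with two features and no geometry, for illustration only.
toy_geojson = json.dumps({
    'type': 'FeatureCollection',
    'features': [
        {'id': '0', 'type': 'Feature', 'properties': {}, 'geometry': None},
        {'id': '1', 'type': 'Feature', 'properties': {}, 'geometry': None},
    ],
})
toy_styles = {'0': {'1583020800': {'color': '#ffffcc', 'opacity': 0.5}}}

print(missing_style_ids(toy_geojson, toy_styles))  # ids that would have no style
```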
Add the legend to the map showing the colour scale.
_ = cmap.add_to(slider_map)
cmap.caption = "Number of Confirmed COVID-19 Cases"
Overlay a second layer that displays tooltips containing the relevant data. Its fill opacity is set to zero so that it is transparent and the slider map underneath stays visible.
confirmed_series = latest_df.set_index('NSW_LGA__3')['case_count']
lga_df = latest_df[['geometry','case_count','test_count','NSW_LGA__3_orig']]
lga_gdf = gpd.GeoDataFrame(lga_df).reset_index().drop('index', axis=1)
# Assumption: curr_date_str is not defined in this sample; here it is the latest test date as text.
curr_date_str = df['test_date'].max().strftime('%d/%m/%Y')
def style_function_latest(feature):
    # case_count is carried in each feature's properties by lga_gdf.to_json()
    cntr = feature['properties'].get('case_count')
    return {
        'fillOpacity': 0,  # transparent fill so the slider map shows through
        'weight': 0,
        'opacity': 0.2,
        'fillColor': 'black' if cntr is None else cmap(cntr)
    }
folium.GeoJson(
    lga_gdf.to_json(),
    name='Confirmed Cases To Date',
    style_function=style_function_latest,
    highlight_function=lambda x: {'weight': 3, 'color': 'black'},  # outline the area under the cursor
    smooth_factor=2.0,
    tooltip=folium.GeoJsonTooltip(
        fields=['NSW_LGA__3_orig', 'case_count', 'test_count'],
        aliases=['Council Area', 'Confirmed Cases as of ' + curr_date_str, 'Tests as of ' + curr_date_str],
        localize=True,
    )
).add_to(slider_map)
Add labels for NSW suburbs to prettify the map.
folium.map.CustomPane('labels').add_to(slider_map)
folium.TileLayer('CartoDBPositronOnlyLabels',
pane='labels').add_to(slider_map)
Save the map locally as an HTML file.
slider_map.save(r'./map.html')