Mike Conway, 04 Jan 2022
This notebook uses and extends code written by Cole Citrenbaum.
We will compare Google Trends search interest in the following less stigmatizing terms:
"substance user", "substance users", "substance use"
with the corresponding stigmatizing terms:
"substance abuser", "substance abusers", "substance abuse"
Three plots are generated below.
First, import the relevant Python 3 libraries:
from pytrends.request import TrendReq
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
%matplotlib inline
Second, query Google Trends through the pytrends library with the relevant terms and load the result into a pandas DataFrame:
pytrends = TrendReq(hl='en-US', tz=360) # trend request
kw_list = ["\"substance abuser\" + \"substance abusers\" + \"substance abuse\"", "\"substance user\" + \"substance users\" + \"substance use\"" ] # keyword list
pytrends.build_payload(kw_list, cat=0, timeframe='2004-01-01 2021-12-31', geo='US', gprop='') # build pytrends data
trendinfo = pd.DataFrame(pytrends.interest_over_time()) # interest over time, dataframe format
# SHOWS ALL ROWS
#pd.set_option('display.max_rows',trendinfo.shape[0]+1)
# SHOWS HEAD (first 5 rows in dataframe)
trendinfo.head()
date | "substance abuser" + "substance abusers" + "substance abuse" | "substance user" + "substance users" + "substance use" | isPartial |
---|---|---|---|
2004-01-01 | 76 | 4 | False |
2004-02-01 | 87 | 0 | False |
2004-03-01 | 100 | 7 | False |
2004-04-01 | 70 | 4 | False |
2004-05-01 | 88 | 2 | False |
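Each entry in kw_list above is a single query string in which every phrase is wrapped in double quotes and the phrases are joined with " + ", which Google Trends appears to treat as an OR over exact phrases (the plot labels later in this notebook read the same way). A minimal sketch of building those strings from plain term lists, rather than hand-escaping the quotes, might look like the following. The helper name build_trends_query is hypothetical and is not part of pytrends.

```python
def build_trends_query(terms):
    """Quote each term and join with ' + ' (hypothetical helper, not a pytrends function)."""
    return " + ".join(f'"{term}"' for term in terms)

# Term groups copied from the notebook text above.
abuse_terms = ["substance abuser", "substance abusers", "substance abuse"]
use_terms = ["substance user", "substance users", "substance use"]

# Same order as the original kw_list: stigmatizing terms first.
kw_list = [build_trends_query(abuse_terms), build_trends_query(use_terms)]
print(kw_list[0])
```

The resulting kw_list holds the same two strings as the hand-written version, so it could be passed to pytrends.build_payload unchanged, and the strings also match the DataFrame column names used in the plotting step.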
Third, plot both series:
# define variables for plotting
indices = trendinfo.index # in datetime format
x = np.array(trendinfo['"substance abuser" + "substance abusers" + "substance abuse"']).reshape((-1, 1)) # "substance abuse" series as a column vector
y = np.array(trendinfo['"substance user" + "substance users" + "substance use"']).reshape((-1, 1)) # "substance use" series as a column vector
fig, ax = plt.subplots()
ax.scatter(indices, y, color='g', s=1, label='"substance use" OR "substance user" OR "substance users"')
ax.scatter(indices, x, color='r', s=1, label='"substance abuse" OR "substance abuser" OR "substance abusers"')
plt.legend(fontsize=8.9, bbox_to_anchor=(0, 1), loc='upper left', borderaxespad=0.)
plt.ylabel('Search Interest')
plt.xlabel('Year')
fig.savefig("substance_use___substance_abuse.png", dpi=300) # save before show() so the file is written regardless of backend
plt.show()
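The reshape((-1, 1)) calls in the plotting cell turn each one-dimensional series into a single-column 2-D array. The scatter plot does not need this, so the reshape is presumably there for the scikit-learn imports at the top of the notebook, since its estimators expect 2-D feature arrays; that reading is an assumption. A small sketch of what the reshape does, using the first five "substance abuse" values from the trendinfo.head() output:

```python
import numpy as np

# First five monthly values from the trendinfo.head() output above.
abuse_head = [76, 87, 100, 70, 88]

flat = np.array(abuse_head)      # 1-D array, shape (5,)
column = flat.reshape((-1, 1))   # 2-D column vector; -1 lets NumPy infer the row count

print(flat.shape, column.shape)
```

The values are unchanged by the reshape; only the array's shape differs.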