gsvi.connection
Holds the GoogleConnection class.
This module provides the interface to Google Trends (GT) via the GoogleConnection class.
It interacts with GT's time series widget via the get_timeseries() method and with the related-queries widget via get_related_queries().

class gsvi.connection.GoogleConnection(language='en-US', timezone=0, timeout=5.0, verbose=False)
Connection to Google Trends.
Offers the interface to Google Trends. For now, it connects to the time series widget and the related-queries widget.

language – The language, defaults to 'en-US'.
timezone – The timezone offset in minutes, defaults to 0.
timeout – The timeout for the GET requests.
verbose – Whether to print the request URLs.
Raises: requests.exceptions.RequestException

get_related_queries(queries, category=<CategoryCodes.NONE: 0>)
Makes the related-queries request to Google Trends for the specified queries. This method performs only very basic input checks, because validation is handled by the objects that use the connection.
Parameters:
- queries – The queries as a list of dicts with ranges as tuples of datetime objects. At most 5 queries are supported. Example:
[{'key': 'apple', 'geo': 'US', 'range': (start, end)}, {'key': 'orange', 'geo': 'US', 'range': (start, end)}]
- category – The category for the query, defaults to CategoryCodes.NONE.
Returns: A dict keyed by keyword, holding pd.DataFrames with the top and rising queries for each key.
Raises: ValueError, requests.exceptions.RequestException
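The query format above can be checked locally before any request is sent. The sketch below builds the documented example list and runs a hypothetical check_queries() helper (not part of gsvi) that enforces the 5-query limit and the presence of 'key', 'geo' and 'range'; which checks gsvi itself performs is not specified here.

```python
import datetime
from typing import Dict, List, Tuple, Union

# Query shape as documented: 'key' and 'geo' are strings, 'range' is a datetime tuple.
Query = Dict[str, Union[str, Tuple[datetime.datetime, datetime.datetime]]]

MAX_QUERIES = 5  # documented maximum per request


def check_queries(queries: List[Query]) -> None:
    """Hypothetical pre-flight check; raises ValueError on malformed input."""
    if not 1 <= len(queries) <= MAX_QUERIES:
        raise ValueError(f"expected 1 to {MAX_QUERIES} queries, got {len(queries)}")
    for query in queries:
        missing = {"key", "geo", "range"} - set(query)
        if missing:
            raise ValueError(f"query {query!r} lacks {sorted(missing)}")
        start, end = query["range"]
        if start >= end:
            raise ValueError(f"range of {query['key']!r} is empty or reversed")


start = datetime.datetime(2019, 1, 1)
end = datetime.datetime(2019, 6, 30)
queries = [
    {"key": "apple", "geo": "US", "range": (start, end)},
    {"key": "orange", "geo": "US", "range": (start, end)},
]
check_queries(queries)  # passes silently
```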
get_timeseries(queries: List[Dict[str, Union[str, Tuple[datetime.datetime, datetime.datetime]]]], category=<CategoryCodes.NONE: 0>, granularity='DAY') → List[pandas.core.series.Series]
Makes the time series request to Google Trends for the specified queries. This method performs only very basic input checks, because validation is handled by the objects that use the connection.
Parameters:
- queries – The queries as a list of dicts with ranges as tuples of datetime objects. At most 5 queries are supported. Example:
[{'key': 'apple', 'geo': 'US', 'range': (start, end)}, {'key': 'orange', 'geo': 'US', 'range': (start, end)}]
- category – The category for the query, defaults to CategoryCodes.NONE.
- granularity – The step length of the requested series: 'DAY', 'MONTH' or 'HOUR'. Defaults to 'DAY'. Depending on the query ranges, the granularity returned by GT may differ; see the SVSeries docs for details.
Returns: A list of pd.Series, one per query. Trends normalizes the values so that the maximum over all queries is 100.
Raises: ValueError, requests.exceptions.RequestException
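Because Trends scales all queries of one request by a single common factor, the returned series are comparable with each other. The sketch below mimics that on made-up raw numbers; normalize_jointly() is hypothetical, GT performs this step server-side, and rounding to integers is an assumption about its output.

```python
from typing import Dict, List


def normalize_jointly(raw: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """Scale all series by one shared factor so the overall maximum becomes 100."""
    peak = max(max(values) for values in raw.values())
    if peak == 0:
        # No search volume at all: everything stays 0.
        return {key: [0] * len(values) for key, values in raw.items()}
    return {key: [round(100 * v / peak) for v in values] for key, values in raw.items()}


raw = {"apple": [20.0, 40.0, 80.0], "orange": [8.0, 36.0, 20.0]}
print(normalize_jointly(raw))
# {'apple': [25, 50, 100], 'orange': [10, 45, 25]}
```

Note that "orange" peaks at 45, not 100: only the single largest value across all queries is pinned to 100.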
gsvi.timeseries
Holds the time series request structure for Google Trends.
The SVSeries class implements an algorithm to get arbitrary-length
time series with values in [0, 100] from GT in the get_data() method.
This algorithm ensures that GT itself handles the normalization, thus
making the series easier to compare.
It can fetch uni- and multivariate queries.
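Example usage:
import datetime
from gsvi.connection import GoogleConnection
from gsvi.timeseries import SVSeries

gc = GoogleConnection(timeout=10)
start = datetime.datetime(year=2017, month=1, day=1)
end = datetime.datetime(year=2019, month=9, day=30)
# granularity is a keyword argument of multivariate(), not a positional one
series = SVSeries.multivariate(gc,
                               [{'key': 'apple', 'geo': 'US'},
                                {'key': 'microsoft', 'geo': 'US'}],
                               start, end, granularity='DAY')
data = series.get_data()
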
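How get_data() fragments a long range is not spelled out here. One plausible building block, given that get_timeseries() accepts up to five query dicts that each carry their own 'range', is to cut a long range into windows GT can serve at daily resolution and group them five per request so GT normalizes them together. The helpers split_range() and batch() below are hypothetical and are not gsvi's actual algorithm; the 269-day window length is taken from the CAUTION note on SVSeries.

```python
import datetime
from typing import Dict, List, Tuple

Range = Tuple[datetime.datetime, datetime.datetime]


def split_range(start: datetime.datetime, end: datetime.datetime,
                days: int) -> List[Range]:
    """Cut [start, end] into consecutive windows of at most `days` days.

    Windows are anchored at `end`, so only the earliest one may be shorter;
    adjacent windows share their boundary date.
    """
    step = datetime.timedelta(days=days)
    windows: List[Range] = []
    hi = end
    while hi > start:
        lo = max(start, hi - step)
        windows.append((lo, hi))
        hi = lo
    windows.reverse()  # chronological order
    return windows


def batch(windows: List[Range], size: int = 5) -> List[List[Range]]:
    """Group windows so each group fits into one request (at most 5 queries)."""
    return [windows[i:i + size] for i in range(0, len(windows), size)]


start = datetime.datetime(2017, 1, 1)
end = datetime.datetime(2019, 9, 30)
windows = split_range(start, end, 269)
# One query dict per window, in the shape get_timeseries() documents.
queries: List[Dict[str, object]] = [
    {"key": "apple", "geo": "US", "range": w} for w in windows
]
print(len(windows), [len(b) for b in batch(windows)])  # 4 [4]
```

The 1002-day example range needs four windows, which fit into a single request.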
class gsvi.timeseries.SVSeries(connection: gsvi.connection.GoogleConnection, queries: List[Dict[str, str]], bounds: Tuple[datetime.datetime, datetime.datetime], **kwargs)
Container for uni- or multivariate Google search volume time series.
The main purpose of this class is to get arbitrary-length time series data from Google Trends for one or more keywords.

connection – The connection to Google Trends.
queries – The user-specified query dicts as a list: [{'key': 'word', 'geo': 'country'}, …].
bounds – The date range for the time series. Depending on the location of the maximum and the granularity, the lower bound may not hold (see get_data()).
category – The category for the search volume. Possible categories are in the CategoryCodes enum.
granularity – The series granularity, either 'DAY', 'HOUR' or 'MONTH'.
data – The search volume data after the get_data() call.
request_structure – The query fragments in levels after the get_data() call, showing how the optimum was obtained.
is_consistent – Flag indicating whether the data is still consistent with the other attributes of the instance. This is set to True when get_data() runs successfully.
CAUTION: Take care when specifying certain time span/granularity combinations. Google Trends switches from returning weekly to monthly data when the span is >= 1890 days (63 months); SVSeries handles this by extending the lower boundary date if necessary. The same happens with daily data when the span is longer than 269 days and not a multiple of 269 days. For hourly data, GT switches to minute data when the span is shorter than 3 days. This behavior has changed in the past and might change again in the future! See get_data() for more on how this problem is handled.

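The boundary rule in the note above can be written down directly. extend_start() below is a hypothetical helper, not gsvi's implementation: it moves the start date back so that a 'MONTH' request spans at least 1890 days and a 'DAY' request longer than 269 days spans a multiple of 269 days. Measuring the span as (end - start).days is an assumption; whether GT counts the end date inclusively is not settled here.

```python
import datetime

MONTHLY_MIN_DAYS = 1890  # spans >= 1890 days come back as monthly data
DAILY_WINDOW_DAYS = 269  # daily data for longer spans needs a multiple of 269 days


def extend_start(start: datetime.datetime, end: datetime.datetime,
                 granularity: str) -> datetime.datetime:
    """Hypothetical: return a start date that keeps GT at the requested granularity."""
    span = (end - start).days
    if granularity == "MONTH" and span < MONTHLY_MIN_DAYS:
        # Too short for monthly data: stretch the span to the minimum.
        return end - datetime.timedelta(days=MONTHLY_MIN_DAYS)
    if granularity == "DAY" and span > DAILY_WINDOW_DAYS and span % DAILY_WINDOW_DAYS:
        # Round the span up to the next multiple of 269 days (ceiling division).
        padded = -(-span // DAILY_WINDOW_DAYS) * DAILY_WINDOW_DAYS
        return end - datetime.timedelta(days=padded)
    return start  # span already acceptable


start = datetime.datetime(2017, 1, 1)
end = datetime.datetime(2019, 9, 30)
print(extend_start(start, end, "DAY"))  # 2016-10-19 00:00:00
```

With the range from the module example, the 1002-day span is padded to 4 × 269 = 1076 days, so 74 extra days precede the requested start.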
get_data(delay=10, force_truncation=False) → Union[pandas.core.frame.DataFrame, pandas.core.series.Series]
Builds the request structure for the queries and sends requests to Google Trends such that the resulting time series values are normalized to [0, 100]. The returned data might extend beyond the lower bound specified in the query. This is necessary because GT returns data at different intervals depending on the specified range and granularity. One can enforce the correct length, but might then get data that is no longer normalized to [0, 100] if the maximum falls into the truncated part.
Parameters: - delay – Number of seconds to wait between requests, to avoid getting banned.
- force_truncation – Truncate to the specified bounds even if the maximal volume (100) falls into the truncated part. By default, the data is not truncated in that case.
Returns: The normalized time series as pd.Series (univariate) or pd.DataFrame (multivariate).
Raises: requests.exceptions.RequestException
Warns: UserWarning – if truncation is not forced and the maximum falls into the area to be truncated.

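The interplay of force_truncation and the UserWarning can be sketched on a plain list of (date, value) pairs. truncate() below is hypothetical and only mirrors the rule stated above; gsvi works on pandas objects and its actual code may differ.

```python
import datetime
import warnings
from typing import List, Tuple

Point = Tuple[datetime.date, int]


def truncate(points: List[Point], lower: datetime.date,
             force: bool = False) -> List[Point]:
    """Hypothetical: cut a normalized series to `lower`, guarding the maximum."""
    peak_date = max(points, key=lambda p: p[1])[0]
    if peak_date < lower and not force:
        # Cutting would drop the 100 point, so keep the extended series and warn.
        warnings.warn("maximum lies before the lower bound; not truncated", UserWarning)
        return points
    return [p for p in points if p[0] >= lower]


series = [
    (datetime.date(2020, 1, 1), 100),  # extended part, holds the maximum
    (datetime.date(2020, 1, 2), 40),
    (datetime.date(2020, 1, 3), 60),
]
lower = datetime.date(2020, 1, 2)
print(truncate(series, lower, force=True))  # peaks at 60, no longer reaches 100
```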
classmethod multivariate(connection: gsvi.connection.GoogleConnection, queries: List[Dict[str, str]], start: datetime.datetime, end: datetime.datetime, **kwargs)
Builds a multivariate search volume series. Initially, the series holds no data; call get_data() to fill it.
Parameters: - connection – The GoogleConnection to use for the requests.
- queries – A list of query dicts.
- start – The start of the series, >= 2004/01/01.
- end – The end of the series, <= now.
Keyword Arguments: - granularity – The granularity of the series (‘DAY’, ‘HOUR’ or ‘MONTH’). Defaults to ‘DAY’ if not given.
- category – Volume for a specific search category (see gsvi.catcodes). Defaults to CategoryCodes.NONE if not given.
Returns: An SVSeries with empty data.
Raises: ValueError

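multivariate() is documented to raise ValueError, and the parameter notes above give the constraints it presumably enforces. check_bounds() below is a hypothetical stand-in that validates those documented constraints (start on or after 2004-01-01, end not in the future, a known granularity), plus start before end as an assumed sanity check; the exact checks gsvi performs are not specified here.

```python
import datetime

GRANULARITIES = {"DAY", "HOUR", "MONTH"}
EARLIEST = datetime.datetime(2004, 1, 1)  # documented lower limit for start


def check_bounds(start: datetime.datetime, end: datetime.datetime,
                 granularity: str = "DAY") -> None:
    """Hypothetical validation of the documented parameter constraints."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity {granularity!r}")
    if start < EARLIEST:
        raise ValueError("start must be on or after 2004-01-01")
    if end > datetime.datetime.now():
        raise ValueError("end must not lie in the future")
    if start >= end:  # assumed sanity check, not stated in the docs
        raise ValueError("start must precede end")


check_bounds(datetime.datetime(2017, 1, 1), datetime.datetime(2019, 9, 30))  # OK
```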
classmethod univariate(connection: gsvi.connection.GoogleConnection, query: Dict[str, str], start: datetime.datetime, end: datetime.datetime, **kwargs)
Builds a univariate search volume series. Initially, the series holds no data; call get_data() to fill it.
Parameters: - connection – The GoogleConnection to use for the requests.
- query – The query dict.
- start – The start of the series, >= 2004/01/01.
- end – The end of the series, <= now.
Keyword Arguments: - granularity – The granularity of the series (‘DAY’, ‘HOUR’ or ‘MONTH’). Defaults to ‘DAY’ if not given.
- category – Volume for a specific search category (see gsvi.catcodes). Defaults to CategoryCodes.NONE if not given.
Returns: An SVSeries with empty data.
Raises: ValueError