In today's data-driven world, understanding the ebb and flow of web traffic can be the linchpin in a successful marketing strategy. Whether you're a seasoned marketer or a newcomer, mastering the art of time series can elevate your insights and strategic decisions to new heights. This powerful technique unravels the complex tapestry of web analytics, allowing you to pinpoint underlying trends, seasonal patterns, and unexpected anomalies that affect your website's performance.
Starting with the extraction of web traffic data from Google Analytics 4 (GA4) through BigQuery, we'll show you how to leverage SQL to efficiently pull and process large datasets.
Once the data is extracted, we'll transition into Python, where we'll employ libraries such as Pandas and Numpy for data cleaning and manipulation.
Finally, armed with this refined data and insights, we'll dive into forecasting future web traffic trends using Prophet. This not only enhances your ability to predict future trends but also gives you actionable insights that can drive conversion rates and optimize user engagement. Join us on this journey to transform your approach to digital marketing with data insights that were once hidden in plain sight.
The term time-series analysis (TSA) refers to the statistical approach to time-series data, above all the analysis of trend and seasonality. It is often an ad hoc exploration that involves visualizing distributions, trends, cyclic patterns, and relationships between features, and between features and the target(s).
More generally, we can say TSA is roughly exploratory data analysis (EDA) specific to time-series data. This comparison can be misleading, however, since TSA can include both descriptive and exploratory elements.
Therefore, TSA is the initial investigation of a dataset with the goal of discovering patterns, especially trend and seasonality, and obtaining initial insights, testing hypotheses, and extracting meaningful summary statistics.
Definition: Time-Series Analysis (TSA) is the process of extracting a summary and other statistical information from time-series, most importantly, the analysis of trend and seasonality.
A part of TSA is collecting and reviewing data, examining the distribution of variables (and variable types), and checking for errors, outliers, and missing values. Some errors, variable types, and anomalies can be corrected, so EDA (exploratory data analysis) is often performed hand in hand with preprocessing and feature engineering, where columns and fields are selected and transformed. The whole process, from data loading to machine learning, is highly iterative and may involve multiple instances of TSA at different points.
Here are a few crucial steps for working with time-series: collecting and loading the data, cleaning it (handling errors, outliers, and missing values), exploring distributions, trend, and seasonality, engineering features, and finally modeling and forecasting.
The steps belonging to TSA and leading to preprocessing (feature engineering) and machine learning are highly iterative, and can be visually appreciated in the following time-series machine learning flywheel:
This flywheel emphasizes the iterative nature of the work. For example, data cleaning often comes right after loading the data, but it will come up again whenever we make a new discovery about our variables.
We'll typically look at histograms or distribution plots. For assessing relationships between features and target variables, we'd calculate correlations and visualize them as a correlation heatmap, where the correlation strength between variables is mapped to colors.
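As a quick illustration, here is a minimal, hedged sketch of such a heatmap. It assumes a pandas DataFrame named df that already contains numeric metrics like total_users, sessions, engagement_rate, and conversions (we build exactly these columns later in this post):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumed: df holds one row per day/channel with numeric metric columns
metrics = ['total_users', 'sessions', 'engagement_rate', 'conversions']
corr = df[metrics].corr()

# Map correlation strength to colors; annotate each cell with its value
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation heatmap of web metrics')
plt.tight_layout()
plt.show()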
Time-series analysis (TSA) offers powerful tools to uncover insights hidden within the vast amounts of data generated by websites every day. In the context of web analytics, TSA helps us move beyond basic descriptive metrics and explore deeper patterns, trends, and relationships that drive user behavior. By applying TSA to web analytics, we can better understand how users interact with a site over time, identify opportunities for growth, and optimize for better performance.
TSA becomes particularly valuable in web analytics because most web data is inherently time-based. Metrics such as page views, session durations, conversion rates, and bounce rates all change over time, influenced by a range of factors like seasonality, marketing efforts, or external events. Here's how TSA can be applied to various aspects of web analytics:
1. Analyzing Traffic Patterns
One of the primary applications of TSA in web analytics is to understand traffic patterns. By using TSA, you can identify both long-term trends (e.g., a steady increase in traffic due to improved SEO) and seasonal patterns (e.g., a drop in traffic every weekend). This information helps in making data-driven decisions, like when to launch a new campaign or what content to promote at specific times.
2. Optimizing Conversion Rates
TSA can also be used to optimize conversion rates by examining how they fluctuate over time. Understanding these fluctuations allows us to identify periods of low conversion, which may correlate with issues in the customer journey or external factors, and high conversion periods, which might indicate the success of specific campaigns. By detecting these patterns, we can adjust strategies and allocate resources more effectively.
3. Detecting Anomalies and Outliers
Websites often experience sudden spikes or drops in metrics like traffic, bounce rates, or conversions. TSA helps detect these anomalies quickly by identifying deviations from expected patterns. Whether it's a sudden traffic drop due to site downtime or a spike caused by a viral post, TSA provides a framework to understand these events and react accordingly.
4. Understanding User Engagement
TSA can also uncover patterns in user engagement metrics such as session duration, pages per session, and return visits. By analyzing these metrics over time, we can discover periods of high or low engagement and adjust the content or design to improve user retention. For example, if engagement drops during certain hours or days, you might experiment with content scheduling or interface changes.
5. Forecasting Future Trends
Forecasting is another powerful application of TSA in web analytics. By using models like Prophet or ARIMA, you can predict future trends in key metrics such as traffic, sales, or conversion rates. This helps in planning and resource allocation, ensuring that the website is prepared for expected surges in traffic or potential downturns.
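The full Prophet workflow comes later in this post, but as a hedged preview, a minimal forecast looks roughly like the sketch below. It assumes a hypothetical DataFrame named daily with one row per day and columns date and total_users; Prophet only requires those to be renamed to ds and y:

from prophet import Prophet  # pip install prophet

# Assumed: `daily` has one row per day with 'date' and 'total_users' columns
prophet_df = daily.rename(columns={'date': 'ds', 'total_users': 'y'})

model = Prophet()          # default trend plus weekly/yearly seasonality
model.fit(prophet_df)

future = model.make_future_dataframe(periods=30)   # forecast 30 days ahead
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())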
Applying TSA to web analytics data often involves a mix of tools and techniques. Python libraries like Prophet and statsmodels offer robust options for time-series forecasting and analysis. Web analytics platforms, such as Google Analytics, also provide basic time-series analysis capabilities, although more customized and in-depth analysis typically requires exporting data and using specialized tools.
When using TSA in web analytics, it’s important to choose the right technique for your data and objectives. Decomposition techniques can help separate seasonality and trend from noise, while forecasting methods can provide actionable insights into future behavior.
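To make the decomposition idea concrete, here is a minimal sketch using statsmodels; the series name traffic and the weekly period are assumptions for illustration, not part of the workflow built later in this post:

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Assumed: `traffic` is a daily pandas Series (e.g. total users) with a DatetimeIndex
result = seasonal_decompose(traffic, model='additive', period=7)  # weekly seasonality
result.plot()   # separate panels for observed, trend, seasonal, and residual components
plt.show()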
While TSA is a powerful approach, applying it to web data is not without challenges. Issues like missing data, irregular time intervals, and outliers can complicate analysis. However, by carefully preprocessing data, choosing appropriate models, and validating results, these challenges can be managed effectively to ensure reliable insights.
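As a rough illustration of that preprocessing, the hedged sketch below regularizes a daily series, fills short gaps, and flags outliers. The series name daily_traffic, the window sizes, and the thresholds are all assumptions, not part of the original pipeline:

import pandas as pd

# Assumed: `daily_traffic` is a pandas Series of daily totals with a DatetimeIndex
s = daily_traffic.asfreq('D')      # make missing days explicit NaNs
s = s.interpolate(limit=3)         # fill short gaps only

# Flag outliers: points far from a centered rolling median
rolling_median = s.rolling(28, center=True, min_periods=7).median()
deviation = (s - rolling_median).abs()
outliers = deviation > 3 * deviation.rolling(28, center=True, min_periods=7).median()
print(s[outliers])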
To leverage time-series analysis effectively, we first need to get our hands on the right data. Google Analytics 4 (GA4) provides a treasure trove of information about user behavior on your website, but working with raw data directly within GA4 can be limiting. That's where BigQuery comes in.
BigQuery, a powerful data warehouse tool provided by Google Cloud, enables us to query and analyze large datasets with speed and flexibility. By exporting GA4 data to BigQuery, we gain the ability to run custom SQL queries, extract exactly the data we need, and prepare it for time-series analysis.
Note on BigQuery Setup
Unfortunately, setting up BigQuery and configuring the GA4 integration is beyond the scope of this blog. To effectively perform time-series analysis, you will need to have your GA4 data streaming into BigQuery and a sufficient amount of historical data collected on a daily basis. If you haven’t set this up yet, you may need to consult the Google Analytics 4 documentation to enable BigQuery export and ensure that your dataset is being populated with the necessary data for analysis.
Let's begin.
To prepare the data, the query starts with a WITH clause, creating a temporary table named prep. This table is designed to clean and process the raw GA4 data, computing important metrics for each session. The first step is to convert the event_date string into a proper date format using min(parse_date('%Y%m%d', event_date)) as date, which finds the earliest date associated with each session. The user_pseudo_id uniquely identifies each user within GA4 and serves as the basis for user-level metrics.
with prep as (
select
    min(parse_date('%Y%m%d', event_date)) as date,
    user_pseudo_id,
    concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'source') ignore nulls order by event_timestamp)[safe_offset(0)],'(direct)') as source,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'medium') ignore nulls order by event_timestamp)[safe_offset(0)],'(none)') as medium,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'campaign') ignore nulls order by event_timestamp)[safe_offset(0)],'(not set)') as campaign,
    max((select value.int_value from unnest(event_params) where event_name = 'session_start' and key = 'ga_session_number')) as session_number,
    max((select value.string_value from unnest(event_params) where key = 'session_engaged')) as session_engaged,
    max((select value.int_value from unnest(event_params) where key = 'engagement_time_msec')) as engagement_time_msec,
    count(event_name) as event_count,
    countif(event_name = 'click') as specific_event_count,
    countif(event_name = 'purchase') as conversions,
    sum(ecommerce.purchase_revenue) as total_revenue
from
    `PUT-YOUR-TABLE-HERE.events_*`
where
    _table_suffix between '20230101' and format_date('%Y%m%d', date_sub(current_date(), interval 1 day))
group by
    user_pseudo_id,
    session_id)
min(parse_date('%Y%m%d', event_date)) as date converts the event_date string into a proper date format and finds the earliest date for each session.
user_pseudo_id serves as the unique identifier for each user in GA4.
concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id creates a unique session ID by combining the user ID with the session ID extracted from the nested event_params.
coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'source') ignore nulls order by event_timestamp)[safe_offset(0)],'(direct)') as source, along with the similar statements for medium and campaign, extracts the first non-null value from the event_params array for each parameter and provides a default value if none exists.
max(...) extracts the maximum values of metrics such as session_number, session_engaged, and engagement_time_msec, using nested selects to handle data stored in nested fields.
count(event_name) calculates the total count of events for each session.
countif(event_name = 'click') as specific_event_count and countif(event_name = 'purchase') as conversions count specific events like clicks and purchases to provide targeted metrics.
sum(ecommerce.purchase_revenue) as total_revenue aggregates the total revenue from purchases.
The where clause filters the data to include only events from '20230101' up to one day before the current date.
group by user_pseudo_id, session_id groups the data by user and session, preparing it for aggregation and further analysis.
The main query retrieves data from the prep table and performs further transformations.
select
    date,
    case
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned'
    end as session_default_channel_group,
    source as session_source,
    medium as session_medium,
    concat(source,' / ',medium) as session_source_medium,
    campaign as session_campaign,
    count(distinct user_pseudo_id) as total_users,
    count(distinct case when engagement_time_msec > 0 or session_engaged = '1' then user_pseudo_id end) as active_users,
    count(distinct session_id) as sessions,
    count(distinct case when session_engaged = '1' then session_id end) as engaged_sessions,
    safe_divide(sum(engagement_time_msec/1000),count(distinct case when session_engaged = '1' then session_id end)) as average_engagement_time_seconds,
    safe_divide(count(distinct case when session_engaged = '1' then session_id end),count(distinct user_pseudo_id)) as engaged_sessions_per_user,
    safe_divide(count(distinct case when session_engaged = '1' then session_id end),count(distinct session_id)) as engagement_rate,
    safe_divide(sum(event_count),count(distinct session_id)) as events_per_session,
    sum(event_count) as total_event_count,
    sum(specific_event_count) as specific_event_count,
    sum(conversions) as conversions,
    safe_divide(sum(conversions),count(distinct session_id)) as session_conversion_rate,
    ifnull(sum(total_revenue),0) as total_revenue
from
    prep
group by
    date,
    session_default_channel_group,
    session_source,
    session_medium,
    session_source_medium,
    session_campaign
order by
    total_users desc
date carries over the session date computed in the prep table (the earliest event date for each session).
case ... end as session_default_channel_group classifies sessions into channel groups (like 'Direct', 'Paid Search', or 'Organic Social') based on the values of source, medium, and campaign, using regular expressions and conditional logic.
source as session_source, medium as session_medium, and concat(source, ' / ', medium) as session_source_medium provide the source and medium attributes, along with a combined string for easier reference.
count(distinct user_pseudo_id) as total_users calculates the total number of unique users for each combination of date and channel group.
count(distinct case when engagement_time_msec > 0 or session_engaged = '1' then user_pseudo_id end) as active_users counts users with positive engagement time or flagged as engaged.
count(distinct session_id) as sessions and count(distinct case when session_engaged = '1' then session_id end) as engaged_sessions compute the total number of sessions and engaged sessions.
safe_divide(...) calculates various performance metrics, including average_engagement_time_seconds, engaged_sessions_per_user, engagement_rate, events_per_session, and session_conversion_rate, ensuring safe division to avoid errors when the denominator is zero.
sum(...) aggregates totals such as total_event_count, specific_event_count, conversions, and total_revenue to provide a comprehensive summary for each channel group.
group by segments the results by date, channel group, and the other session attributes.
order by total_users desc sorts the output by the total number of users in descending order, highlighting the most active groups.
Full Query for Reference
Here’s the complete query, all in one place, for you to easily copy and use:
with prep as (
select
    min(parse_date('%Y%m%d', event_date)) as date,
    user_pseudo_id,
    concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'source') ignore nulls order by event_timestamp)[safe_offset(0)],'(direct)') as source,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'medium') ignore nulls order by event_timestamp)[safe_offset(0)],'(none)') as medium,
    coalesce(array_agg((select value.string_value from unnest(event_params) where key = 'campaign') ignore nulls order by event_timestamp)[safe_offset(0)],'(not set)') as campaign,
    max((select value.int_value from unnest(event_params) where event_name = 'session_start' and key = 'ga_session_number')) as session_number,
    max((select value.string_value from unnest(event_params) where key = 'session_engaged')) as session_engaged,
    max((select value.int_value from unnest(event_params) where key = 'engagement_time_msec')) as engagement_time_msec,
    count(event_name) as event_count,
    -- change event_name to the event(s) you want to count
    countif(event_name = 'click') as specific_event_count,
    -- change event_name to the conversion event(s) you want to count
    countif(event_name = 'purchase') as conversions,
    sum(ecommerce.purchase_revenue) as total_revenue
from
    `your_own_dataset_here.events_*`
where
    -- Below you can select the date frame you need or want to explore
    _table_suffix between '20230101' and format_date('%Y%m%d', date_sub(current_date(), interval 1 day))
group by
    user_pseudo_id,
    session_id)

select
    date,
    case
        when source = '(direct)' and (medium in ('(not set)','(none)')) then 'Direct'
        when regexp_contains(campaign, 'cross-network') then 'Cross-network'
        when (regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
            and regexp_contains(medium, '^(.*cp.*|ppc|paid.*)$') then 'Paid Shopping'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Search'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            and regexp_contains(medium,'^(.*cp.*|ppc|paid.*)$') then 'Paid Video'
        when medium in ('display', 'banner', 'expandable', 'interstitial', 'cpm') then 'Display'
        when regexp_contains(source,'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
            or regexp_contains(campaign, '^(.*(([^a-df-z]|^)shop|shopping).*)$') then 'Organic Shopping'
        when regexp_contains(source,'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
            or medium in ('social','social-network','social-media','sm','social network','social media') then 'Organic Social'
        when regexp_contains(source,'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
            or regexp_contains(medium,'^(.*video.*)$') then 'Organic Video'
        when regexp_contains(source,'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
            or medium = 'organic' then 'Organic Search'
        when regexp_contains(source,'email|e-mail|e_mail|e mail')
            or regexp_contains(medium,'email|e-mail|e_mail|e mail') then 'Email'
        when medium = 'affiliate' then 'Affiliates'
        when medium = 'referral' then 'Referral'
        when medium = 'audio' then 'Audio'
        when medium = 'sms' then 'SMS'
        when medium like '%push'
            or regexp_contains(medium,'mobile|notification') then 'Mobile Push Notifications'
        else 'Unassigned'
    end as session_default_channel_group,
    source as session_source,
    medium as session_medium,
    concat(source,' / ',medium) as session_source_medium,
    campaign as session_campaign,
    count(distinct user_pseudo_id) as total_users,
    count(distinct case when engagement_time_msec > 0 or session_engaged = '1' then user_pseudo_id end) as active_users,
    count(distinct session_id) as sessions,
    count(distinct case when session_engaged = '1' then session_id end) as engaged_sessions,
    safe_divide(sum(engagement_time_msec/1000),count(distinct case when session_engaged = '1' then session_id end)) as average_engagement_time_seconds,
    safe_divide(count(distinct case when session_engaged = '1' then session_id end),count(distinct user_pseudo_id)) as engaged_sessions_per_user,
    safe_divide(count(distinct case when session_engaged = '1' then session_id end),count(distinct session_id)) as engagement_rate,
    safe_divide(sum(event_count),count(distinct session_id)) as events_per_session,
    sum(event_count) as total_event_count,
    sum(specific_event_count) as specific_event_count,
    sum(conversions) as conversions,
    safe_divide(sum(conversions),count(distinct session_id)) as session_conversion_rate,
    ifnull(sum(total_revenue),0) as total_revenue
from
    prep
group by
    date,
    session_default_channel_group,
    session_source,
    session_medium,
    session_source_medium,
    session_campaign
order by
    total_users desc
Now that you’ve run the query, you should see results similar to the table below. This output captures all the key metrics we’ve calculated, giving you a comprehensive view of your website's traffic and user behavior patterns.
With our data now ready in BigQuery, the next step is to take it further by using Python in Google Colab. Here, we’ll perform data cleaning, conduct exploratory data analysis (EDA), and create insightful visualizations. Finally, we’ll move into advanced time-series forecasting with Prophet to make machine learning-driven predictions. Let’s get started by connecting Google Colab to BigQuery and diving into some hands-on Python work!
To kick off our analysis, we first need to open Google Colab. If you've already done this before, great! All you need is a Gmail account to get started. If not, here's a quick guide: go to colab.research.google.com, sign in with your Google account, and click New notebook to create a blank notebook.
Your notebook should look something like this when you open it up—clean, fresh, and ready for action!
Let's start.
To connect your Google Colab notebook with BigQuery, we'll start by installing the necessary libraries and authenticating your Google account. This will allow us to access and query the data stored in BigQuery directly from our Python environment, setting the stage for all our upcoming data cleaning, visualization, and analysis.
To successfully connect your data from BigQuery to Google Colab, you need the job information from your BigQuery result. To get it, click on the "Job Information" tab in BigQuery.
From the job information you will need to grab your project name, location, and job ID; you can find all of this in the Job ID shown there.
Initializing
# @title Setup
from google.cloud import bigquery
from google.colab import data_table
import bigframes.pandas as bpd

project = 'your_project_here'  # Project ID inserted based on the query results selected to explore
location = 'US'                # Location inserted based on the query results selected to explore
client = bigquery.Client(project=project, location=location)
data_table.enable_dataframe_formatter()
from google.colab import auth
auth.authenticate_user()
After authenticating your account, you will need to extract the SQL from your earlier BigQuery job. Use client.get_job() and the job's query attribute to return the SQL syntax from the job. This can be copied from the output cell below to edit the query now or in the future.
# Running this code will display the query used to generate your previous job
job = client.get_job('bquxjob_60673b09_191b40daf90') # Job ID inserted based on the query results selected to explore
print(job.query)
This should print the query we built earlier in BigQuery.
Now we will convert the output from BigQuery into a DataFrame so we can start working with our Python libraries.
Converting the BigQuery output into a DataFrame:
job = client.get_job('bquxjob_567ad7a7_191b966f30e') # Job ID inserted based on the query results selected to explore
df = job.to_dataframe()
df
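Before going further, it's worth a quick (optional) sanity check of what came back from BigQuery, inspecting the shape, column types, and missing values:

# Quick look at what the export contains
print(df.shape)         # rows, columns
print(df.dtypes)        # column types (the date column may come back as object/dbdate)
print(df.isna().sum())  # missing values per column
df.head()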
Next, we'll import some of the Python libraries we'll be using:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Renaming the column for easier manipulation:
df = df.rename(columns={'session_default_channel_group': 'channel'})
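If you want to see which channel groups came back before picking one to analyze, a quick optional check looks like this:

# List the channel groups returned by the query and how many rows each has
print(df['channel'].unique())
print(df['channel'].value_counts())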
While the code we've developed so far is versatile enough to analyze overall traffic and any marketing channel, we’re going to zoom in on one channel in particular: Organic Search. Why Organic Search? Because it often holds untapped insights about user behavior and engagement. But don’t worry—everything we’ll cover can be easily applied to any other channel. The principles and techniques we use to dissect Organic Search traffic are fully adaptable, allowing you to unlock meaningful insights from whichever channel matters most to you!
To kick things off, let’s start with a simple yet powerful visualization to get an initial sense of how our Organic Search traffic looks over time. We’ll use Python to create a time-series plot that displays the daily total users from this channel. Remember, while we’re focusing on Organic Search, you can easily adapt this code to visualize data for any other channel by modifying the filter conditions.
import matplotlib.pyplot as plt

# Filter the data for the 'Organic Search' channel
organic_search_data = df[df['channel'] == 'Organic Search'].copy()

# Aggregate to one value per day (the query returns one row per source/medium/campaign)
daily_organic_users = organic_search_data.groupby('date')['total_users'].sum()

# Create a plot for total users from Organic Search
plt.figure(figsize=(20, 6))
plt.plot(daily_organic_users, color='#1f77b4', linewidth=2, marker='o', markersize=6, linestyle='-', label='Total Users')

# Add titles and labels
plt.title('Daily Total Users from Organic Search', fontsize=18, fontweight='bold', color='#333')
plt.xlabel('Date', fontsize=14, fontweight='bold')
plt.ylabel('Total Users', fontsize=14, fontweight='bold')

# Customize the grid
plt.grid(visible=True, linestyle='--', linewidth=0.5, alpha=0.7)

# Add a legend
plt.legend(loc='upper right', fontsize=12)

# Show the plot
plt.tight_layout()
plt.show()
This code snippet filters the data to focus specifically on the Organic Search channel, extracting daily totals for user visits. It uses Matplotlib to create a clean, simple time-series plot that displays daily user counts with enhanced visual elements, like color, markers, and gridlines. The goal here is to get a quick visual sense of the traffic trends and identify any noticeable patterns, spikes, or drops in user behavior.
Feel free to replicate this approach for any other channel by changing the filter criteria, and you'll have a ready-to-use visualization tailored to your specific needs!
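If you plan to repeat this for several channels, one possible way to avoid copy-pasting is to wrap the plot in a small helper function. This is just a suggested convenience, not part of the original workflow:

def plot_channel_users(data, channel_name):
    """Plot daily total users for the given channel group."""
    channel_data = data[data['channel'] == channel_name]
    daily_users = channel_data.groupby('date')['total_users'].sum()

    plt.figure(figsize=(20, 6))
    plt.plot(daily_users, linewidth=2, marker='o', markersize=6, label='Total Users')
    plt.title(f'Daily Total Users from {channel_name}', fontsize=18, fontweight='bold')
    plt.xlabel('Date', fontsize=14)
    plt.ylabel('Total Users', fontsize=14)
    plt.grid(visible=True, linestyle='--', linewidth=0.5, alpha=0.7)
    plt.legend(loc='upper right', fontsize=12)
    plt.tight_layout()
    plt.show()

# Example: the same visualization for another channel
plot_channel_users(df, 'Paid Search')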
To perform any serious time-series analysis, it's crucial to ensure that our date data is in the correct format. By converting the date column to a datetime object, we enable Python to recognize it as a proper date, which is essential for accurate plotting, resampling, and other time-based manipulations. Here's how to do it:
# Convert the 'date' column to datetime format
organic_search_data['date'] = pd.to_datetime(organic_search_data['date'])
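With the dates parsed, one natural optional next step (not shown in the snippet above) is to index the data by date and resample it into a regular daily series, which most time-series tools, including Prophet, expect:

# Build a regular daily series of total users from Organic Search
organic_daily = (organic_search_data
                 .set_index('date')['total_users']
                 .resample('D').sum())
organic_daily.head()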