Cloud/GCP Big Data Analytics
This project implements a cloud-based data analytics and machine learning pipeline using Google Cloud Platform (GCP). The solution processes large-scale e-commerce data using Cloud Storage and BigQuery, performs analytical querying and predictive modelling with BigQuery ML, and visualises insights through Looker Studio dashboards.
The system demonstrates how cloud-native services can be used to build scalable, cost-effective analytics workflows without managing infrastructure, while supporting on-demand querying of BigQuery results through a Cloud Run web service.
Problem Statement
Organisations handling large-scale transactional data require scalable platforms to store, analyse, and extract insights efficiently. Traditional local data processing approaches struggle with performance, cost, and maintainability at scale.
This project addresses these challenges by designing a cloud-native analytics solution that enables efficient querying, customer behaviour analysis, and purchase prediction using managed GCP services.
Approach
Raw and processed datasets were stored in Google Cloud Storage.
Analytical queries were executed using BigQuery, including custom SQL for customer segmentation and revenue analysis.
A BigQuery ML logistic regression model was trained to predict future purchase behaviour based on historical activity.
Insights were visualised using Looker Studio dashboards for business interpretation.
A lightweight Cloud Run (Flask) service was deployed to query BigQuery tables and expose results through a web interface.
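The model-training step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the dataset, table, model, and label names passed to these functions are hypothetical placeholders, and the training table is assumed to already hold one row per customer with feature columns plus a label column. Running `train_and_predict` requires the `google-cloud-bigquery` package and GCP credentials; the two SQL builders work offline.

```python
"""Sketch: train and apply a BigQuery ML logistic regression model.

All dataset, table, model, and column names are hypothetical placeholders.
"""


def build_training_sql(dataset: str, model: str, source_table: str, label: str) -> str:
    """Return a CREATE MODEL statement for a logistic regression classifier."""
    return f"""
    CREATE OR REPLACE MODEL `{dataset}.{model}`
    OPTIONS (
      model_type = 'LOGISTIC_REG',
      input_label_cols = ['{label}']
    ) AS
    SELECT *
    FROM `{dataset}.{source_table}`
    """


def build_prediction_sql(dataset: str, model: str, scoring_table: str) -> str:
    """Return an ML.PREDICT query that scores rows with the trained model."""
    return f"""
    SELECT *
    FROM ML.PREDICT(
      MODEL `{dataset}.{model}`,
      TABLE `{dataset}.{scoring_table}`
    )
    """


def train_and_predict(dataset, model, source_table, scoring_table, label):
    """Run both statements; needs google-cloud-bigquery and GCP credentials."""
    # Imported lazily so the SQL builders above can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # Training runs as an ordinary query job; result() blocks until it finishes.
    client.query(build_training_sql(dataset, model, source_table, label)).result()
    rows = client.query(build_prediction_sql(dataset, model, scoring_table)).result()
    return [dict(row.items()) for row in rows]
```

For example, `build_training_sql("ecommerce", "purchase_model", "customer_features", "will_purchase")` produces a statement that trains on every column of the assumed `customer_features` table, treating `will_purchase` as the label.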
Results
Identified gender-based purchasing patterns across product categories, supporting targeted marketing strategies.
Built a predictive model to estimate the likelihood of future purchases using historical customer behaviour.
Designed interactive Looker Studio dashboards presenting revenue, profit, and prediction confidence metrics.
Enabled on-demand querying of analytics results through a Cloud Run web service.
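The on-demand querying service could look something like the sketch below. It is a minimal example rather than the deployed code: the `/revenue` route, the `RESULTS_TABLE` environment variable, the default table name, and the row cap are all assumptions introduced for illustration. It needs the `flask` and `google-cloud-bigquery` packages plus GCP credentials once `create_app` is called.

```python
"""Sketch: a small Flask service that serves BigQuery results on Cloud Run.

The table name, route, and environment variable are hypothetical placeholders.
"""
import os

# Hypothetical results table; replace with a real `project.dataset.table`.
RESULTS_TABLE = os.environ.get("RESULTS_TABLE", "my-project.ecommerce.category_revenue")
MAX_ROWS = 100


def parse_limit(raw, default=10):
    """Turn a user-supplied query parameter into a safe row limit."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    # Clamp to the range 1..MAX_ROWS so a request cannot pull the whole table.
    return max(1, min(value, MAX_ROWS))


def build_query(limit: int) -> str:
    """Build the SELECT statement; limit must already be a validated integer."""
    return f"SELECT * FROM `{RESULTS_TABLE}` LIMIT {int(limit)}"


def create_app():
    """Build the Flask app; needs flask, google-cloud-bigquery, and credentials."""
    # Imported lazily so the helpers above can be used without these packages.
    from flask import Flask, jsonify, request
    from google.cloud import bigquery

    app = Flask(__name__)
    client = bigquery.Client()

    @app.route("/revenue")
    def revenue():
        limit = parse_limit(request.args.get("limit"))
        rows = client.query(build_query(limit)).result()
        return jsonify([dict(row.items()) for row in rows])

    return app
```

Cloud Run passes the listening port in the `PORT` environment variable, so an entry point would typically call `create_app().run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))` or hand the app to a production WSGI server. Validating the limit before building the SQL keeps user input out of the query text.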
Tech Stack
Google Cloud Platform (GCP) · BigQuery · BigQuery ML · Google Cloud Storage · Looker Studio · Cloud Run · Python · Flask · SQL