Advanced Transaction Analytics and Customer Segmentation for Optimized Sales Strategies

The main purpose of this project is to perform customer segmentation and generate personalized product recommendations based on transaction data. By leveraging RFM analysis, clustering techniques, and association rule mining, the code aims to identify key customer segments, understand their purchasing behavior, and recommend products tailored to each segment. Additionally, it provides a comprehensive overview of the sales data, uncovering trends and patterns that can inform business decisions.

Here’s a breakdown of everything I did:

Data Loading and Initial Exploration

Loaded Dataset: I began by loading the dataset from a MySQL database using SQLAlchemy. After establishing the connection, I read the required data into a pandas DataFrame and exported it to a CSV file for further analysis.

Libraries Imported: I imported essential libraries, including numpy, pandas, seaborn, matplotlib, and plotly, for data manipulation, visualization, and analysis.

Data Import: I loaded the dataset using pd.read_csv() from the data.csv file, which includes sales transaction details.
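
The load path described above (database to DataFrame to CSV, then pd.read_csv()) might look roughly like the sketch below. An in-memory SQLite database stands in for MySQL so the example runs without a server; the table name transactions, the sample rows, and the helper load_transactions are all hypothetical. For a real MySQL server the engine URL would typically look like mysql+pymysql://user:password@host/dbname, which is an assumption about the driver in use.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine, text


def load_transactions(conn, query, csv_path):
    """Run a query, return the rows as a DataFrame, and save a CSV copy."""
    df = pd.read_sql(text(query), conn)
    df.to_csv(csv_path, index=False)
    return df


# Assumption: SQLite in memory replaces the MySQL server used in the project.
engine = create_engine("sqlite:///:memory:")
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")

with engine.begin() as conn:
    # Hypothetical table and rows, only here to make the sketch runnable.
    conn.execute(text(
        "CREATE TABLE transactions "
        "(InvoiceNo TEXT, CustomerID INTEGER, UnitPrice REAL)"
    ))
    conn.execute(text(
        "INSERT INTO transactions VALUES "
        "('536365', 17850, 2.55), ('C536379', 14527, 27.5)"
    ))
    df_db = load_transactions(conn, "SELECT * FROM transactions", csv_path)

# Later analysis steps start from the CSV copy.
df_csv = pd.read_csv(csv_path)
print(df_csv)
```

Keeping a CSV snapshot means the rest of the analysis can be rerun without a live database connection.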

Basic Inspection: I used functions like df.head(10), df.describe().T, and a custom summary() function to inspect the dataset’s structure, identify missing values, and compute basic statistics.
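
The custom summary() function is not shown in this write-up, so the version below is only a guess at what such a helper could report: one row per column with its dtype, missing-value count, missing percentage, and number of unique values. The sample DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd


def summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column dtype, missing values, and unique counts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),  # NaN is not counted as a unique value
    })


# Made-up rows with a few gaps, standing in for the real sales data.
sample = pd.DataFrame({
    "CustomerID": [17850.0, np.nan, 13047.0, 17850.0],
    "Description": ["MUG", "LANTERN", None, "MUG"],
    "UnitPrice": [2.55, 3.39, 2.75, 2.55],
})

print(sample.head(10))       # first rows
print(sample.describe().T)   # numeric statistics, one row per column
report = summary(sample)
print(report)
```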

Data Cleaning and Feature Engineering

Handling Missing Values: I dropped rows with missing values in critical columns such as CustomerID and Description.

Transaction Status: I created a new feature, Transaction_Status, to categorize transactions as ‘Cancelled’ or ‘Completed’ based on the InvoiceNo.
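
A minimal sketch of these two steps follows. The exact rule for deriving Transaction_Status is not stated here, so the code assumes that a cancelled invoice has an InvoiceNo beginning with "C"; the function name add_transaction_status and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd


def add_transaction_status(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing key fields, then label each row by invoice type."""
    cleaned = df.dropna(subset=["CustomerID", "Description"]).copy()
    # Assumption: invoice numbers starting with "C" mark cancellations.
    is_cancelled = cleaned["InvoiceNo"].astype(str).str.startswith("C")
    cleaned["Transaction_Status"] = np.where(is_cancelled, "Cancelled", "Completed")
    return cleaned


raw = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536366", "536367"],
    "CustomerID": [17850.0, 14527.0, np.nan, 13047.0],
    "Description": ["MUG", "DISCOUNT", "LANTERN", None],
})
labelled = add_transaction_status(raw)
print(labelled)
```

The last two sample rows are dropped because one lacks a CustomerID and the other lacks a Description.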

Correcting Stock Codes: I filtered out anomalous stock codes by identifying codes with too few digits.

Removing Non-Positive Prices: I removed rows with a non-positive UnitPrice to ensure accuracy in subsequent analyses.
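
One way to express both filters is sketched below. The cutoff for "too few digits" is not given above, so min_digits=5 is an assumed threshold, and filter_codes_and_prices is a hypothetical helper name.

```python
import pandas as pd


def filter_codes_and_prices(df: pd.DataFrame, min_digits: int = 5) -> pd.DataFrame:
    """Keep rows whose StockCode has enough digits and whose UnitPrice is positive."""
    # Count numeric characters in each code; service codes such as "POST" have none.
    digit_count = df["StockCode"].astype(str).str.count(r"\d")
    keep = (digit_count >= min_digits) & (df["UnitPrice"] > 0)
    return df[keep].copy()


raw = pd.DataFrame({
    "StockCode": ["85123A", "POST", "22423", "M", "71053"],
    "UnitPrice": [2.55, 18.0, 0.0, 1.25, 3.39],
})
filtered = filter_codes_and_prices(raw)
print(filtered)
```

Here "POST" and "M" fail the digit check, while "22423" is removed because its price is zero.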

Exploratory Data Analysis (EDA)

Top Products and Sales: I identified and visualized top-selling products and total sales by country using Plotly to create interactive charts.
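
The aggregations behind such charts could be computed as in the sketch below. It assumes "top-selling" means highest total Quantity and that sales are Quantity times UnitPrice; the column names Quantity and Country, both helper names, and the sample rows are assumptions.

```python
import pandas as pd


def top_products(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total units sold per product, largest first."""
    totals = df.groupby("Description")["Quantity"].sum()
    return totals.sort_values(ascending=False).head(n)


def sales_by_country(df: pd.DataFrame) -> pd.Series:
    """Total revenue (Quantity * UnitPrice) per country, largest first."""
    revenue = df["Quantity"] * df["UnitPrice"]
    return revenue.groupby(df["Country"]).sum().sort_values(ascending=False)


sales = pd.DataFrame({
    "Description": ["MUG", "LANTERN", "MUG", "CLOCK"],
    "Country": ["United Kingdom", "France", "France", "United Kingdom"],
    "Quantity": [6, 2, 4, 1],
    "UnitPrice": [2.5, 10.0, 2.5, 20.0],
})
best_sellers = top_products(sales, n=2)
country_totals = sales_by_country(sales)
print(best_sellers)
print(country_totals)
```

Either result can then be passed to a bar-chart call such as plotly.express.bar to get the interactive view.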

Customer Behavior: I performed RFM (Recency, Frequency, Monetary) analysis to segment customers based on their purchasing behavior.
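
As a rough illustration of the RFM computation: recency is the number of days since a customer's last purchase, frequency is the number of distinct invoices, and monetary is total spend. The snapshot date, the InvoiceDate and Quantity column names, and the helper compute_rfm are assumptions made for this sketch.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with Recency (days), Frequency, and Monetary."""
    with_revenue = df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    return with_revenue.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )


orders = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceNo": ["A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-08", "2011-11-09"]),
    "Quantity": [2, 1, 10],
    "UnitPrice": [5.0, 20.0, 1.5],
})
# Assumption: recency is measured from a fixed reference date.
rfm = compute_rfm(orders, snapshot_date=pd.Timestamp("2011-12-10"))
print(rfm)
```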

Product Recommendations: I developed a recommendation system based on customer segmentation to suggest products to customers.

Customer Segmentation and Clustering

RFM Analysis: I segmented customers using RFM metrics to understand their purchasing behavior.

K-Means Clustering: I applied K-Means clustering to group customers into distinct segments, determining the optimal number of clusters using silhouette scores and the elbow method.
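
The cluster-count search can be sketched as follows, using scikit-learn. Standardizing the features first is an assumption (the write-up does not say whether scaling was applied), the synthetic RFM-like data is generated only so the example runs, and evaluate_k is a hypothetical helper. Inertia feeds the elbow plot, while the silhouette score gives a direct comparison across k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(features, k_values, random_state=42):
    """Fit K-Means for each k and record inertia (elbow) and silhouette score."""
    # Assumption: features are standardized so no single metric dominates.
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(scaled)
        scores[k] = {
            "inertia": model.inertia_,
            "silhouette": silhouette_score(scaled, model.labels_),
        }
    return scores


# Synthetic (Recency, Frequency, Monetary) rows drawn around three centers.
rng = np.random.default_rng(0)
centers = np.array([[10.0, 20.0, 2000.0], [100.0, 6.0, 500.0], [300.0, 2.0, 100.0]])
noise_scale = np.array([5.0, 0.4, 20.0])
features = np.vstack([c + rng.normal(0.0, noise_scale, size=(30, 3)) for c in centers])

scores = evaluate_k(features, range(2, 6))
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print(best_k, scores[best_k])
```

On real customer data the two criteria do not always agree, so the final choice of k usually also weighs how interpretable the segments are.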

Customer Profiling: I analyzed clusters to identify behavior patterns and highlighted the top products for each segment.
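
The per-segment top products can also drive the recommendation step mentioned earlier. The sketch below shows one simple approach, not necessarily the one used in this project: rank products by total quantity within a customer's cluster and suggest those the customer has not bought yet. The Cluster column, both function names, and the sample data are assumptions.

```python
import pandas as pd


def top_products_by_cluster(transactions, clusters, n=3):
    """Top-n products by total quantity within each cluster."""
    merged = transactions.merge(clusters, on="CustomerID")
    totals = (
        merged.groupby(["Cluster", "Description"])["Quantity"].sum()
        .reset_index()
        .sort_values(["Cluster", "Quantity"], ascending=[True, False])
    )
    return totals.groupby("Cluster").head(n)


def recommend(customer_id, transactions, clusters, n=3):
    """Popular products in the customer's cluster that they have not bought."""
    cluster = clusters.loc[clusters["CustomerID"] == customer_id, "Cluster"].iloc[0]
    merged = transactions.merge(clusters, on="CustomerID")
    in_cluster = merged[merged["Cluster"] == cluster]
    ranked = in_cluster.groupby("Description")["Quantity"].sum().sort_values(ascending=False)
    already = set(transactions.loc[transactions["CustomerID"] == customer_id, "Description"])
    return [product for product in ranked.index if product not in already][:n]


transactions = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "Description": ["MUG", "LANTERN", "MUG", "CLOCK", "CANDLE", "TOY"],
    "Quantity": [5, 2, 3, 4, 1, 7],
})
clusters = pd.DataFrame({"CustomerID": [1, 2, 3], "Cluster": [0, 0, 1]})

profile = top_products_by_cluster(transactions, clusters, n=2)
suggestions = recommend(1, transactions, clusters)
print(profile)
print(suggestions)
```

Customer 1 already owns the cluster's best seller, so the suggestions fall through to the next most popular items.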

Time-Series Analysis

Sales Trends: I conducted a time-series analysis of daily sales using seasonal decomposition to identify trends, seasonality, and residuals.

Final Insights

Pareto Analysis: I applied the Pareto principle to identify the small percentage of customers contributing to the majority of sales.
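
To make the Pareto step concrete, the sketch below ranks customers by revenue and finds the smallest group whose cumulative share reaches a threshold. The 80% threshold, the helper name pareto_customers, and the sample figures are assumptions.

```python
import pandas as pd


def pareto_customers(df: pd.DataFrame, threshold: float = 0.8):
    """Return the top customers reaching `threshold` of revenue and their share of all customers."""
    revenue = (df["Quantity"] * df["UnitPrice"]).groupby(df["CustomerID"]).sum()
    revenue = revenue.sort_values(ascending=False)
    cumulative_share = revenue.cumsum() / revenue.sum()
    # Customers still below the threshold, plus the one that crosses it.
    n_needed = int((cumulative_share < threshold).sum()) + 1
    return revenue.index[:n_needed], n_needed / len(revenue)


orders = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Quantity": [70, 15, 10, 3, 2],
    "UnitPrice": [10.0, 10.0, 10.0, 10.0, 10.0],
})
top_customers, customer_fraction = pareto_customers(orders)
print(list(top_customers), customer_fraction)
```

In this made-up data, two of the five customers account for 85% of revenue.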

RFM Merging: I merged the results of RFM analysis with customer purchase data to create a comprehensive view of customer segments.
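
The merge itself could be as small as the sketch below. A left join on CustomerID is assumed so that every purchase row is kept even when a customer has no RFM record; the column names and sample rows are illustrative.

```python
import pandas as pd

# Illustrative inputs: one RFM row per customer, many purchase rows per customer.
rfm = pd.DataFrame({
    "CustomerID": [1, 2],
    "Recency": [2, 31],
    "Frequency": [2, 1],
    "Monetary": [30.0, 15.0],
    "Cluster": [0, 1],
})
purchases = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Description": ["MUG", "LANTERN", "CLOCK", "TOY"],
    "Quantity": [5, 2, 4, 7],
})

# validate= raises an error if the RFM table unexpectedly repeats a CustomerID.
combined = purchases.merge(rfm, on="CustomerID", how="left", validate="many_to_one")
print(combined)
```

Customer 3 has no RFM row here, so its RFM columns come back as NaN and are easy to spot.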

Visualization and Reporting: I created various visualizations to summarize key insights, including bar charts for sales by country, daily sales trends, and customer segmentation.