ML/AI

Airline Fare Prediction System

ML system predicting airline ticket prices using ensemble models trained on 50,000+ fare records. Includes booking lead time analysis and interactive Streamlit interface.

Python, Scikit-Learn, Streamlit, Pandas, RandomForest, GradientBoosting

01 — Problem

The Challenge

Airline ticket pricing is dynamic and opaque. Prices fluctuate with route, time, booking lead time, airline, and seat availability. Travelers lack tools to objectively assess whether a current fare is reasonable or likely to change, which leads to poorly timed purchases.

For frequent travelers and budget-conscious individuals, even a 15-20% improvement in fare prediction accuracy translates to meaningful savings. The problem is also well defined, and public fare datasets are available, which makes it a good fit for ML.

02 — Approach

How I Approached It

Trained ensemble ML models (RandomForest and GradientBoosting) on a dataset of 50,000+ historical fare records. Engineered features including booking lead time, route characteristics, day-of-week, and airline class. Built a Streamlit interface for interactive fare prediction with confidence intervals.
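The feature engineering described above could look roughly like the sketch below. It is illustrative only and not the project's actual code: the column names (`booking_date`, `departure_date`, `origin`, `destination`), the bucket edges, and the helper name `add_fare_features` are all assumptions.

```python
import pandas as pd


def add_fare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lead-time, temporal, and route features (hypothetical schema)."""
    out = df.copy()
    out["booking_date"] = pd.to_datetime(out["booking_date"])
    out["departure_date"] = pd.to_datetime(out["departure_date"])

    # Booking lead time in whole days between purchase and departure.
    out["lead_days"] = (out["departure_date"] - out["booking_date"]).dt.days

    # Assumed bucket edges; right=False gives intervals [0, 7), [7, 14), ...
    out["lead_bucket"] = pd.cut(
        out["lead_days"],
        bins=[0, 7, 14, 30, 60, float("inf")],
        labels=["0-6", "7-13", "14-29", "30-59", "60+"],
        right=False,
    )

    # Temporal features: day of week (Monday=0) and a weekend flag.
    out["departure_dow"] = out["departure_date"].dt.dayofweek
    out["is_weekend"] = out["departure_dow"].isin([5, 6])

    # Simple route key; a real pipeline would encode this before training.
    out["route"] = out["origin"] + "-" + out["destination"]
    return out


sample = pd.DataFrame({
    "booking_date": ["2024-01-01", "2024-01-10"],
    "departure_date": ["2024-01-06", "2024-03-15"],
    "origin": ["DEL", "BOM"],
    "destination": ["BOM", "BLR"],
})
features = add_fare_features(sample)
print(features[["route", "lead_days", "lead_bucket", "is_weekend"]])
```

Bucketing lead time alongside the raw day count lets tree models pick up both coarse booking windows and finer within-window effects.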

Architecture

01 Data ingestion and cleaning pipeline using Pandas
02 Feature engineering: lead time buckets, route encoding, temporal features
03 Model training: RandomForest and GradientBoosting with cross-validation
04 Ensemble combination: weighted average of model predictions
05 Streamlit web interface for interactive prediction and visualization
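Steps 03 and 04 can be sketched as follows. This is a minimal illustration on synthetic data, not the project's training code: the two synthetic features, the 0.4/0.6 weights, and the hyperparameters are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the fare table: column 0 behaves like lead time in
# days, column 1 like a generic numeric route feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 60, size=(300, 2))
y = 3000 + 4000 * np.exp(-X[:, 0] / 15) + 20 * X[:, 1] + rng.normal(0, 100, 300)

X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

models = {
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "gb": GradientBoostingRegressor(random_state=0),
}

# Cross-validate each model on the training portion, then fit it.
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()
    model.fit(X_train, y_train)

# Weighted average of the two models' predictions (weights are assumed).
weights = {"rf": 0.4, "gb": 0.6}
pred_rf = models["rf"].predict(X_test)
pred_gb = models["gb"].predict(X_test)
ensemble_pred = weights["rf"] * pred_rf + weights["gb"] * pred_gb

ensemble_rmse = float(np.sqrt(np.mean((ensemble_pred - y_test) ** 2)))
print(cv_rmse, ensemble_rmse)
```

scikit-learn's `VotingRegressor` accepts a `weights` argument and would produce the same kind of weighted average in a single estimator; the manual version is shown here only because it makes the combination step explicit.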

03 — Technology

Technology Choices and Why

RandomForest

Handles non-linear fare relationships well; robust to outliers in pricing data

GradientBoosting

Captures complex feature interactions; often outperforms individual decision trees on tabular pricing data

Streamlit

Rapid prototyping of interactive ML interfaces without frontend engineering overhead

Pandas

Efficient manipulation of 50K+ row dataset; rich API for feature engineering operations

04 — Challenges

Obstacles and Solutions

High variance in fare data

Applied a log transformation to the price target; RMSE improved by approximately 23% compared with predicting the raw price
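One way to apply a log-transformed target in scikit-learn is `TransformedTargetRegressor`, which trains on the transformed target and inverts the transform at prediction time. The sketch below uses synthetic, right-skewed fares and is an assumption about how this could be wired up, not the project's code; it does not reproduce the 23% figure.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic right-skewed fares that rise sharply close to departure.
rng = np.random.default_rng(42)
lead_days = rng.integers(0, 90, size=400)
fare = np.exp(8.0 + np.exp(-lead_days / 10) + rng.normal(0, 0.3, 400))
X = lead_days.reshape(-1, 1)

# The regressor is fit on log1p(fare); predictions are mapped back with expm1,
# so they come out in the original price units.
model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(random_state=0),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X[:320], fare[:320])

pred = model.predict(X[320:])
rmse = float(np.sqrt(np.mean((pred - fare[320:]) ** 2)))
print(f"RMSE in price units: {rmse:.1f}")
```

Because the inverse transform is applied automatically, RMSE here is measured in price units and can be compared directly against a model trained on the raw price.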

Feature leakage risk

Used a strict temporal train/test split and validated that no features derived from future dates entered the training set
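A temporal split of this kind can be sketched as below. The helper name `temporal_split`, the `booking_date` column, and the cutoff date are hypothetical; the point is only that every training row precedes every test row in time.

```python
import pandas as pd


def temporal_split(df: pd.DataFrame, date_col: str, cutoff: str):
    """Rows strictly before `cutoff` go to train; the rest go to test."""
    dates = pd.to_datetime(df[date_col])
    cutoff_ts = pd.Timestamp(cutoff)
    train = df.loc[dates < cutoff_ts]
    test = df.loc[dates >= cutoff_ts]
    return train, test


fares = pd.DataFrame({
    "booking_date": ["2024-03-05", "2024-01-10", "2024-02-20", "2024-04-01"],
    "price": [5200, 4100, 4800, 6100],
})
train, test = temporal_split(fares, "booking_date", "2024-03-01")

# Leakage check: the latest training date must precede the earliest test date.
latest_train = pd.to_datetime(train["booking_date"]).max()
earliest_test = pd.to_datetime(test["booking_date"]).min()
assert latest_train < earliest_test
```

A random shuffle would mix later fares into training and tend to overstate accuracy, which is why the cutoff is applied on the date column rather than on row order.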

Model interpretability

Added SHAP value analysis to explain the top features driving individual predictions, which helps when communicating model behavior

05 — Results

Outcomes

—Prediction accuracy within 15% of actual fare for 78% of test cases
—Trained on 50,000+ fare records across 6 major Indian routes
—Booking lead time identified as highest-impact feature for fare variance
—Interactive Streamlit UI with confidence interval display
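The "within 15% of actual fare" figure is a tolerance-based hit rate. A minimal sketch of how such a metric can be computed is shown below; the function name and the four sample fares are made up for illustration and are not the project's test data.

```python
def fraction_within_tolerance(actual, predicted, tolerance=0.15):
    """Share of predictions whose absolute error is within tolerance * actual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(abs(p - a) <= tolerance * a for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up fares: relative errors are 7.5%, 18.2%, 2.8%, and 16.1%.
actual = [4000.0, 5500.0, 7200.0, 3100.0]
predicted = [4300.0, 4500.0, 7000.0, 3600.0]
share = fraction_within_tolerance(actual, predicted)
print(share)  # 0.5: two of the four predictions fall within 15%
```

Unlike RMSE, this metric is relative to each fare, so a given rupee error counts for more on a cheap ticket than on an expensive one.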

06 — Learnings

What I Learned

—Target variable transformation (log-price) often matters more than model selection for skewed financial data
—Feature engineering contributes more to performance than hyperparameter tuning in most tabular ML problems
—Model interpretability is a product requirement, not just a nice-to-have

Skills Used

Machine Learning, Python, Data Engineering, Statistical Analysis, Scikit-Learn
