AI-ML · Python · pandas · scikit-learn · Plotly

SpaceX Falcon 9 Landing Prediction

Predicts Falcon 9 first-stage landing success from launch parameters using scraped flight data and classical ML.

Problem

A Falcon 9 launch is dramatically cheaper than competing launches largely because SpaceX recovers and reuses the first stage, so predicting whether a booster will land successfully is, in effect, a proxy for predicting launch cost. Given only pre-launch parameters (payload mass, orbit type, launch site, booster version, and the number of previous flights of that core), how accurately can landing success be predicted? The project covers the entire data science lifecycle: there is no clean dataset to download, so it has to be assembled, audited, explored, and modeled from scratch.

Architecture

Launch records come from two sources — the SpaceX REST API for structured launch data and BeautifulSoup scraping of Wikipedia launch tables for what the API lacks. After wrangling and one-hot encoding in pandas, exploratory analysis runs through both SQL queries (SQLite) and visual EDA, including Folium maps of launch-site geography and landing outcomes. Four classifiers are trained and compared, and the results feed an interactive Plotly Dash dashboard deployed on Render.
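As a rough illustration of the hand-off from wrangling to EDA, the sketch below loads a few invented launch rows into an in-memory SQLite database, runs a SQL aggregation of landing success by site, and one-hot encodes the categorical columns with pd.get_dummies. The column names (LaunchSite, Orbit, PayloadMass, Class), the table name, and the 0/1 label convention are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

import pandas as pd

# Invented toy rows for illustration only; not real flight records.
launches = pd.DataFrame(
    {
        "FlightNumber": [1, 2, 3, 4],
        "LaunchSite": ["CCAFS SLC 40", "KSC LC 39A", "CCAFS SLC 40", "VAFB SLC 4E"],
        "Orbit": ["LEO", "GTO", "ISS", "PO"],
        "PayloadMass": [1000.0, 2000.0, 3000.0, 4000.0],
        "Class": [0, 1, 1, 0],  # assumed label: 1 = first stage landed, 0 = did not
    }
)

# SQL-based EDA: copy the frame into an in-memory SQLite database and aggregate.
conn = sqlite3.connect(":memory:")
launches.to_sql("launches", conn, index=False)
site_rates = pd.read_sql_query(
    """
    SELECT LaunchSite,
           COUNT(*)   AS n_launches,
           AVG(Class) AS success_rate
    FROM launches
    GROUP BY LaunchSite
    ORDER BY LaunchSite
    """,
    conn,
)
conn.close()

# Model-ready features: drop the label, one-hot encode the categorical columns.
features = pd.get_dummies(
    launches.drop(columns=["Class"]),
    columns=["LaunchSite", "Orbit"],
    dtype=int,
)
print(site_rates)
print(features.columns.tolist())
```

The encoded frame keeps the numeric columns as they are and adds one 0/1 column per launch site and per orbit, which is the shape the classifiers expect.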

Tech decisions & trade-offs

Why two data sources instead of one

The SpaceX API is authoritative but incomplete for older flights, and Wikipedia's launch tables carry outcomes the API omits — neither alone covers the full flight history. Merging them means dealing with mismatched names, dates, and booster designations, which is exactly the unglamorous work that dominates real data science projects. The trade-off is a more fragile ingestion step (scrapers break when page structure changes) in exchange for a dataset that's actually complete enough to model.
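A minimal sketch of the reconciliation step, assuming the API returns ISO-8601 UTC timestamps and bare core serials while the scraped table carries long-form dates and fuller booster designations. All rows, column names, and formats below are invented for illustration. Both sides are normalized to a calendar date plus a four-digit "B" serial, then outer-joined so that flights present in only one source stay visible instead of silently disappearing.

```python
import pandas as pd

# Invented rows (not real flight records) mimicking typical cross-source mismatches:
# ISO-8601 UTC timestamps vs. long-form dates, bare core serials vs. full designations.
api = pd.DataFrame(
    {
        "date_utc": ["2020-01-15T10:00:00.000Z", "2020-02-20T22:30:00.000Z"],
        "core_serial": ["B1050", "B1060"],
        "payload_mass_kg": [4000.0, None],  # a gap the other source might fill
    }
)
wiki = pd.DataFrame(
    {
        "Date": ["15 January 2020", "20 February 2020", "1 March 2012"],
        "Version, Booster": ["F9 B5 B1050.1", "F9 B5 B1060.2", "F9 v1.0 B0003"],
        "Booster landing": ["Success", "Failure", "No attempt"],
    }
)

SERIAL = r"(B\d{4})"  # assumed join key: a "B" followed by four digits

# Normalize both sides to (calendar date in UTC, core serial).
api_norm = api.assign(
    launch_date=pd.to_datetime(api["date_utc"], utc=True).dt.date,
    booster=api["core_serial"].str.extract(SERIAL, expand=False),
)[["launch_date", "booster", "payload_mass_kg"]]

wiki_norm = wiki.assign(
    launch_date=pd.to_datetime(wiki["Date"], format="%d %B %Y").dt.date,
    booster=wiki["Version, Booster"].str.extract(SERIAL, expand=False),
)[["launch_date", "booster", "Booster landing"]]

# Outer join keeps flights found in only one source; _merge records each row's origin.
merged = api_norm.merge(
    wiki_norm, on=["launch_date", "booster"], how="outer", indicator=True
)
print(merged)
```

The _merge column added by indicator=True labels each row as both, left_only, or right_only, which turns the completeness audit into a single value_counts call. Joining on the calendar date assumes both sources report UTC dates; a launch near midnight listed in local time would fail to match and surface as two one-sided rows.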

Why compare four classifiers instead of picking one

Logistic regression, SVM, KNN, and decision trees were each tuned with GridSearchCV over cross-validation folds rather than crowning a favorite upfront. On a small dataset (~90 flights), model variance is high and the ranking between algorithms is genuinely unstable — the honest approach is to measure, not assume. The comparison also surfaces what matters more than the algorithm choice: feature engineering (booster flight count, orbit grouping) moved accuracy more than swapping models ever did.
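A sketch of the comparison loop, using scikit-learn's GridSearchCV on a synthetic 90-sample dataset from make_classification as a stand-in for the real flight features. The hold-out split, the scaling pipelines, and the parameter grids are assumptions; the project's actual search spaces are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a ~90-flight feature matrix (not the real data).
X, y = make_classification(n_samples=90, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grids; scale-sensitive models get a StandardScaler in front.
candidates = {
    "logreg": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1.0, 10.0]},
    ),
    "knn": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]},
    ),
    "tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 6, None], "criterion": ["gini", "entropy"]},
    ),
}

# Same folds for every model so the cross-validated scores are comparable.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
        "best_params": search.best_params_,
    }

# Rank by mean CV accuracy rather than by the tiny hold-out set.
for name, r in sorted(results.items(), key=lambda kv: -kv[1]["cv_accuracy"]):
    print(f"{name:7s} cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f}")
```

With a 20% hold-out of roughly 90 flights, the test set holds about 18 flights, so a single misclassification moves test accuracy by 1/18 ≈ 5.6 percentage points. That coarse granularity is one reason the ranking between models can flip from one split to the next.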

Why a deployed dashboard instead of a notebook

Notebooks demonstrate analysis; dashboards demonstrate communication. The Dash app on Render lets anyone slice launch outcomes by site, payload range, and booster version without reading a single cell of code. The cost is keeping a deployment alive (Render's free tier spins services down when idle, so the first visit after a quiet period waits on a cold start), but a recruiter clicking a link and exploring the data in ten seconds is worth more than a perfect README.
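The slicing the dashboard offers boils down to a filter over the launch table. A minimal sketch, assuming hypothetical column names (LaunchSite, PayloadMass, BoosterVersion, Class) and a hypothetical helper, filter_launches, that a Dash callback would call with the current dropdown and slider values; the Dash wiring itself is omitted, and the rows are invented.

```python
import pandas as pd


def filter_launches(df, site=None, payload_range=(0.0, float("inf")), boosters=None):
    """Return the launches matching the dashboard controls (hypothetical helper).

    site: a launch-site name, or None for all sites.
    payload_range: inclusive (low, high) payload mass in kg.
    boosters: iterable of booster-version labels, or None for all.
    """
    low, high = payload_range
    mask = df["PayloadMass"].between(low, high)  # inclusive on both ends
    if site is not None:
        mask &= df["LaunchSite"] == site
    if boosters is not None:
        mask &= df["BoosterVersion"].isin(list(boosters))
    return df[mask]


def success_rate_by_site(df):
    """Share of successful landings per site (assumed label: Class 1 = landed)."""
    return df.groupby("LaunchSite")["Class"].mean()


# Invented toy rows for illustration only.
launches = pd.DataFrame(
    {
        "LaunchSite": ["CCAFS SLC 40", "KSC LC 39A", "CCAFS SLC 40", "VAFB SLC 4E"],
        "PayloadMass": [500.0, 5000.0, 9000.0, 3000.0],
        "BoosterVersion": ["v1.1", "FT", "B5", "FT"],
        "Class": [0, 1, 1, 0],
    }
)

mid_payload = filter_launches(launches, payload_range=(1000.0, 6000.0))
ccafs_b5 = filter_launches(launches, site="CCAFS SLC 40", boosters=["B5"])
rates = success_rate_by_site(launches)
print(mid_payload)
print(rates)
```

Keeping the filter as a plain function means it can be unit-tested without starting the server, and the callback stays a thin adapter between the UI components and pandas.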

Repositories