I Almost Picked the Wrong Dataset for My Project

Starting the search

I wanted to start a new project — something with enough data for real EDA, some charts, maybe a bit of modeling. So I went to Kaggle and started looking through tourism datasets.

The first option: real, but limited

The first one I found was a UNWTO dataset: 6 votes, real tourism numbers. It looked good at first. But the data stopped at 2022, and I wanted to look at recovery after COVID. 2022 felt too early to show much of it.

Then I found something better-looking

Next: "Global Tourism & Travel Trends Dataset (2019-2024)." 24 upvotes, 345 downloads. 2019 to 2024 was exactly the range I wanted — before COVID, during, and after.

I almost picked it on the spot.

The catch

Then I went back and actually read the description properly. Turns out it's synthetic — 10,000 generated records, not real recorded numbers, though the creator says it's "calibrated against UNWTO, Statista, and Booking.com reports."

That changes the whole project. I can't call this a "recovery" analysis anymore, because generated records don't show real recovery, just realistic-looking patterns. But honestly, the dataset is still solid for what I actually want to practice: it has 33 features and zero missing values, and it covers spending, trip satisfaction, eco-friendly choices, carbon footprint, and transport modes. Plenty to dig into.
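
Numbers like "33 features" and "zero missing values" come from the dataset description, so they are worth checking once the file is downloaded. Here is a minimal sketch of that check using only Python's standard `csv` module. The helper name `summarize_csv` and the sample column names are made up for illustration; they are not taken from the actual dataset.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row count, column count, missing-values-per-column) for a CSV file object."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = dict.fromkeys(columns, 0)
    n_rows = 0
    for row in reader:
        n_rows += 1
        for col in columns:
            # Treat empty or whitespace-only cells (and absent cells) as missing.
            if not (row.get(col) or "").strip():
                missing[col] += 1
    return n_rows, len(columns), missing


# Tiny made-up sample standing in for the real file; column names are hypothetical.
sample = io.StringIO(
    "year,transport_mode,trip_satisfaction\n"
    "2019,Air,4\n"
    "2021,Train,\n"
    "2024,Car,5\n"
)
n_rows, n_cols, missing = summarize_csv(sample)
print(n_rows, n_cols, missing)
```

On the real download, the same function can be called on a file opened with `open(path, newline="")`. If the description is accurate, it should report 10,000 rows, 33 columns, and a count of zero for every column.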

Where I landed

The project has a new name now — "Travel Behavior & Satisfaction Trends (2019-2024)" instead of "Travel Recovery Analysis." Same data, just a more honest title.

Next: opening it up and seeing what's actually in there.

#dataanalysis #python #datascience #buildinpublic #kaggle
