I Almost Picked the Wrong Dataset — Here's What Changed My Mind

A quick lesson in reading dataset descriptions properly before you commit to a project name

I'm Zeba — an AI & Data Science specialist from Lahore, Pakistan 🇵🇰 I build ML models, NLP systems, and full-stack AI applications. Here I write about what I build, what I learn, and how I think about data and AI. Projects: Travel AI Chatbot, Sentiment Analytics Dashboard, LSTM Energy Forecasting & more.

I wanted to start a new project — something with enough data for real EDA, some charts, maybe a bit of modeling. So I went to Kaggle and started looking through tourism datasets.

The first option: real, but limited

The first one I found was a UNWTO dataset: 6 votes, real tourism numbers. It looked good at first, but the data stopped at 2022, and I wanted to look at recovery after COVID. 2022 felt too early for that.

Then I found something better-looking

Next: "Global Tourism & Travel Trends Dataset (2019-2024)." 24 upvotes, 345 downloads. 2019 to 2024 was exactly the range I wanted — before COVID, during, and after.

I almost picked it on the spot.

The catch

Then I went back and actually read the description properly. Turns out it's synthetic — 10,000 generated records, not real recorded numbers, though the creator says it's "calibrated against UNWTO, Statista, and Booking.com reports."

That changes the whole project. I can't call this a "recovery" analysis anymore: generated records can't show what actually happened after COVID, only realistic-looking patterns. But honestly, the dataset is still solid for what I actually want to practice. It has 33 features and zero missing values, and it covers spending, trip satisfaction, eco-friendly choices, carbon footprint, and transport modes. Plenty to dig into.
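Since I almost got caught out by not reading the description, it makes sense to check the description's own numbers (10,000 records, 33 features, zero missing values) once the file is open. Here is a rough sketch of how that check could look. It uses only the standard library so it runs anywhere, and the helper name `summarize_csv`, the sample rows, and the column names are all made up for illustration, not taken from the real dataset.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row_count, column_count, missing_cells) for an open CSV file."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    rows = 0
    missing = 0
    for record in reader:
        rows += 1
        # Count empty or whitespace-only cells as missing.
        # DictReader fills cells absent from a short row with None.
        missing += sum(
            1
            for name in columns
            if record.get(name) is None or not record[name].strip()
        )
    return rows, len(columns), missing


# Tiny made-up sample standing in for the real file (hypothetical columns).
sample = io.StringIO(
    "year,transport_mode,trip_satisfaction\n"
    "2019,Train,4\n"
    "2021,,3\n"
    "2024,Flight,5\n"
)

rows, cols, missing = summarize_csv(sample)
print(rows, cols, missing)  # 3 3 1
```

On the real download, the same function would be called on the opened CSV file, and if the description holds it should come back as 10,000 rows, 33 columns, and 0 missing cells. In pandas the equivalent quick check would be `df.shape` and `df.isna().sum().sum()`.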

Where I landed

The project has a new name now — "Travel Behavior & Satisfaction Trends (2019-2024)" instead of "Travel Recovery Analysis." Same data, just a more honest title.

Next: opening it up and seeing what's actually in there.

#dataanalysis #python #datascience #buildinpublic #kaggle

Travel Behavior & Satisfaction Trends

Part 1 of 2

A behind-the-scenes look at building a travel data analysis project — from dataset selection and planning to exploration, visualization, and final insights. Following the full process from start to finish.

Up next

5 Things I Wish I Knew Before Doing EDA on a 33-Feature Dataset

Lessons from analyzing 10,000 travel records — mid-project, messy, and honest
