Kaggle Competitions & Machine Learning

Today I’m working on a predictive data analysis and machine learning project using Jupyter Notebooks, Python, pandas and scikit-learn. The goal is to “…build a predictive model that answers the question: ‘what sorts of people were more likely to survive (the Titanic)?’” It’s a small practice project for a Python programming course I’m taking at General Assembly, and it’s been a great way to learn these techniques.

About the Challenge:

“The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (ie name, age, gender, socio-economic class, etc).”


The project covers exploratory data analysis, followed by training, tuning and ensembling different machine learning models to produce the final predictions.
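To make those steps concrete, here is a minimal sketch of the explore, train, tune and ensemble sequence in pandas and scikit-learn. It is not my actual notebook: the data is a small synthetic table with Titanic-like column names (Pclass, Sex, Age, Survived), the survival rule is made up so the models have something to learn, and the choice of models and parameter grids is just an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the passenger data (NOT the real Kaggle file).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),            # classes 1, 2, 3
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 70, size=n).round(),
})
# Made-up rule for illustration only: women and first-class passengers survive.
toy["Survived"] = ((toy["Sex"] == "female") | (toy["Pclass"] == 1)).astype(int)

# Exploratory step: survival rate per group.
rate_by_sex = toy.groupby("Sex")["Survived"].mean()

# Turn the text column into a number so the models can use it.
X = toy[["Pclass", "Age"]].assign(IsFemale=(toy["Sex"] == "female").astype(int))
y = toy["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Tuning: a small cross-validated grid search for each model.
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [2, 4]},
    cv=3,
)
forest_search.fit(X_train, y_train)

logit_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    cv=3,
)
logit_search.fit(X_train, y_train)

# Ensembling: average the predicted probabilities of the two tuned models.
ensemble = VotingClassifier(
    estimators=[
        ("forest", forest_search.best_estimator_),
        ("logit", logit_search.best_estimator_),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(rate_by_sex)
print(f"Held-out accuracy: {accuracy:.2f}")
```

With the real competition data, the same shape should apply, but missing values (Age, for example) would need handling before any model is fit.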

As part of the exercise, I’m also submitting my predictions to Kaggle to see how my results rank on the leaderboard. As my understanding of the data improves, I tweak my predictive models, resubmit my latest results, and repeat the whole cycle. It’s been a great way to learn more about Kaggle while spending more time programming in Python and getting to know these libraries.
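The submission step itself is small. Here is a rough sketch, assuming the competition expects a CSV with a PassengerId column and a 0/1 Survived column; the helper name build_submission, the IDs and the predictions below are placeholders, not real results.

```python
import pandas as pd


def build_submission(passenger_ids, predictions):
    """Pair each passenger ID with a 0/1 prediction (hypothetical helper)."""
    return pd.DataFrame({
        "PassengerId": list(passenger_ids),
        "Survived": [int(p) for p in predictions],
    })


# Placeholder IDs and predictions, just to show the shape of the file.
submission = build_submission([892, 893, 894], [0, 1, 0])

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a filename such as "submission.csv" to to_csv would write the file to upload; each resubmission is then just a matter of regenerating that file from the latest model.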

That’s it for today; gotta get back to learning and practicing. 🙂