Stop using Kaggle for your Data Science projects

Edward Johnson
3 min read · Jun 16, 2022
Photo by Boitumelo Phetla on Unsplash

You’ve finished your data science bootcamp and want to show off your first projects. You’ve used Kaggle datasets but the interviewers aren’t impressed.

You say, “But everyone uses Kaggle, so why shouldn’t I?”
The reason is that interviewers have seen too many projects based on Kaggle.

In fact, using Kaggle may actually prevent your projects from standing out from the rest. The interviewers just can’t see how you are different from the hundreds of other job candidates all striving for that first machine learning position.

The answer is to have an approach which makes you, your portfolio and your project stand out as unique.

So how do you do this?

  1. Find one of your authentic passions, or even a hobby.
    Everyone has something even if they don’t realize it. Something that you enjoy, or you daydream about. Stuck? Here’s a list from Wikipedia.
    https://en.wikipedia.org/wiki/List_of_hobbies
  2. Think about the data science area or skill that is important to you. Maybe it’s an area that comes up in the jobs you are applying for, like NLP, computer vision or time series. Or perhaps you want to broaden your data science skills with other techniques.
  3. Think about the data aspects of your chosen passion/hobby
    Imagine you were building up a dataset in order to do some analysis, answer a question or test a hypothesis. Let’s take one of my hobbies, street photography, as an example. My 8,000 pictures of Toronto and London are a great resource for demonstrating my knowledge of clustering, GANs, image recognition, big data processing, outlier detection and much more. Another example is the sport of baseball, with its huge quantity of stats, social media, video and imaging data.
  4. Build your unique project around points 1, 2 and 3
    Ideally you should be answering a business or social question.
    Either way, the fact that this is about YOUR passion/hobby will keep you interested and motivated. The process of building your own dataset(s) will stimulate you and stretch you to use those data preparation and EDA skills. Of course you can augment your own dataset with external data sources (even from Kaggle), but the key point here is that your data and project will be unique.
  5. Think of great questions that need to be answered
    Again, the beauty of this approach is that you can decide on the question.
    For example, I created a project based around Toronto City Open Data on parking tickets issued over a 10-year period. The data was perfect for a time series analysis combined with geographical aspects. But I decided to make it more interesting by augmenting my datasets with highly granular socio-economic data and asking the question “Are parking tickets issued more intensively in socially underprivileged areas of the city than in others at the same times of day?”
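
To make point 5 a little more concrete, here is a minimal Python sketch of how a question like the parking ticket one could be framed once the data is joined. It is only an illustration and not the code from my project: the sample rows, the neighbourhood labels, the `underprivileged` flag and the `tickets_per_1000` helper are all invented for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (issue timestamp, neighbourhood). Not real Toronto data.
TICKETS = [
    ("2015-03-02 08:15", "A"),
    ("2015-03-02 08:40", "A"),
    ("2015-03-02 17:05", "A"),
    ("2015-03-02 08:20", "B"),
    ("2015-03-02 17:30", "B"),
]

# Hypothetical socio-economic lookup: population and an assumed deprivation flag.
AREAS = {
    "A": {"population": 1000, "underprivileged": True},
    "B": {"population": 2000, "underprivileged": False},
}


def tickets_per_1000(tickets, areas):
    """Return {(underprivileged, hour): tickets per 1,000 residents}."""
    # Count tickets for each (deprivation flag, hour of day) pair.
    counts = defaultdict(int)
    for stamp, area in tickets:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[(areas[area]["underprivileged"], hour)] += 1

    # Total population on each side of the comparison.
    population = defaultdict(int)
    for info in areas.values():
        population[info["underprivileged"]] += info["population"]

    # Normalise by population so large and small areas are comparable.
    return {key: 1000 * n / population[key[0]] for key, n in counts.items()}


rates = tickets_per_1000(TICKETS, AREAS)
for (flag, hour), rate in sorted(rates.items()):
    print(f"underprivileged={flag} hour={hour:02d} rate={rate:.1f}")
```

Normalising by population and comparing the same hour of day on both sides is what turns a pile of ticket records into an answer to the question. A real project would of course need far more data and a proper statistical test before drawing any conclusion.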

And don’t forget: when you are building that portfolio, try to make it fun as well.
