Data science competitions can be daunting for someone who has never participated in one. Some of them have hundreds of competitors with top-notch industry knowledge and a splendid track record in such hackathons.
Thus, a lot of beginners are apprehensive about getting started with these hackathons.
The three most commonly asked questions are:
Is it even worth it if I have minimal chance of winning?
How do I start?
How can I improve my rank in the future?
Let’s answer the first question before we go further.
Is participating in hackathons worth it?
Hackathons differ in some ways from "typical" data science work, but they still provide valuable experience and help you learn new skills by tackling a variety of problems:
No need for data collection: You do not have to work with databases and combine them to define the problem statement, because the organizers do that for you.
Practice: The basic premise of a hackathon is that you learn by doing: by building a model, exploring the dataset, and so on.
Community Support: Each hackathon has its own discussion forum, which gives you an excellent opportunity to peek into the thought processes of other data scientists.
To help you answer the next two questions, Analytics Vidhya brings you its newest offering, designed to help you take that leap and participate in a live hackathon. It goes step by step through how to approach a hackathon problem statement, make your first submission, and improve your performance to crack the top positions on the leaderboard!
So what’s the plan?
Just like last weekend, we are back with a new problem statement. This time we will work on a regression problem and go through the steps used to solve a regression-based ML hackathon.
Live Hackathon Learning Experience!
During the extended weekend starting 2nd October, we will run 2 live streams led by top hackers from Analytics Vidhya, with the following plan:
First Live Stream: Build your first model & make that first Submission! (2nd October)
Problem Statement, Data Dictionary & Hypothesis Generation: The first step in a hackathon or any data science project is to understand the problem statement and the possible hypotheses related to the target variable.
Exploratory Data Analysis: The ability to load, navigate, and plot your data (i.e. exploratory analysis) is the second step in data science because it informs the various decisions you'll make throughout model training.
Basic Rule Based Benchmark Models & making your first submission: A benchmark prediction algorithm provides a set of predictions that you can evaluate as you would any predictions for your problem, using a metric such as classification accuracy or RMSE. The scores from these algorithms provide the required point of comparison when evaluating all other machine learning algorithms on your problem.
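To make the idea of a rule based benchmark concrete, here is a minimal sketch using only the Python standard library. The toy data, the "store type" grouping column, and the helper name rmse are assumptions made up for illustration; they are not the hackathon's dataset or its official metric. It compares two benchmarks on a held-out set: predicting the overall training mean, and predicting the mean of each row's group.

```python
import math
from collections import defaultdict
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical toy data: (store_type, sales) pairs standing in for real rows.
train = [("A", 10.0), ("A", 14.0), ("B", 30.0), ("B", 34.0)]
valid = [("A", 12.0), ("B", 31.0)]

# Benchmark 1: predict the overall training mean for every row.
global_mean = mean(y for _, y in train)

# Benchmark 2: a simple rule, predict the training mean of the row's group.
groups = defaultdict(list)
for key, y in train:
    groups[key].append(y)
group_mean = {key: mean(values) for key, values in groups.items()}

y_valid = [y for _, y in valid]
pred_global = [global_mean for _ in valid]
# Fall back to the global mean for groups never seen in training.
pred_group = [group_mean.get(key, global_mean) for key, _ in valid]

rmse_global = rmse(y_valid, pred_global)
rmse_group = rmse(y_valid, pred_group)
print(f"global-mean RMSE: {rmse_global:.3f}")
print(f"group-mean RMSE:  {rmse_group:.3f}")
```

Any model you build later should beat both numbers on the same validation rows; if it does not, the extra complexity is probably not paying for itself.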
Second Live Stream: Get Serious and do feature engineering to improve performance and set final submission (3rd October)
Recap from Stream 1
Basic Preprocessing and building first ML Model: Data preprocessing is an integral step in machine learning, because the quality of the data and the useful information that can be derived from it directly affect the ability of our model to learn. In this step we preprocess our data before feeding it into our model.
Identify feature engineering ideas and do feature Selection to check performance: The features in your data directly influence the predictive models you use and the results you can achieve. The better the features you prepare and choose, the better the results you will achieve. Here, the mentor will discuss various ways of thinking about engineering features that might give you better performance, and then test them out to do feature selection.
Build Multiple Models and Do Grid Search to find the best set of hyperparameters: In this step, the mentor will discuss various ways of selecting the right model for the problem and cover how you can use grid search or similar methods to build improved models and move up the leaderboard.
Ensemble Model to improve performance: Rarely do we see a winning solution that does not use ensemble modeling, which is simply combining multiple diverse base models to predict an outcome.
Make and Set Final Submission with Code file: Learn how to choose your final submission and submit your code file to complete your participation in the hackathon.
What’s Next & QnA: There is always scope for improvement in machine learning. Here, the mentor will share some tips on how to go forward and ways to improve the model even further.
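The modelling steps of the second stream (preprocessing, grid search over hyperparameters, and a simple ensemble) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the mentors' actual solution: the data is synthetic, the two models (Ridge and a random forest), the small parameter grids, and the equal-weight averaging are all choices made here for brevity.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the hackathon data (an assumption, not the real dataset).
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing lives inside the pipeline, so the scaler is refit on the
# training part of every cross-validation fold and never sees held-out rows.
ridge_search = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("model", Ridge())]),
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
forest_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, None]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
ridge_search.fit(X_train, y_train)
forest_search.fit(X_train, y_train)


def rmse(y_true, y_pred):
    """Root mean squared error as a plain float."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# After fitting, each search predicts with its best model refit on all of X_train.
pred_ridge = ridge_search.predict(X_valid)
pred_forest = forest_search.predict(X_valid)

# Simplest possible ensemble: an equal-weight average of the two models.
pred_blend = (pred_ridge + pred_forest) / 2.0

scores = {
    "ridge": rmse(y_valid, pred_ridge),
    "forest": rmse(y_valid, pred_forest),
    "blend": rmse(y_valid, pred_blend),
}
print("best ridge params: ", ridge_search.best_params_)
print("best forest params:", forest_search.best_params_)
for name, score in scores.items():
    print(f"{name:>6} validation RMSE: {score:.2f}")
```

Whether the blend beats the individual models depends on the data, so compare all three scores on a validation set before choosing a final submission.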
The only prerequisites are a basic understanding of the Python data science stack, such as Pandas and sklearn, and a basic understanding of ML algorithms. For a super beginner-friendly, short course on Python, you may enrol here.
The live stream links will be updated on this page itself when the hackathon goes live. Stay tuned!
1. Where can I find the dataset and the problem statement for the hackathon?
The contest and the live session will start on the designated contest start date and time. A timer at the top of this page shows the remaining time before the contest goes live. Once it does, you can access the problem statement and datasets from the problem statement tab and
2. Can I share my approach/code?
Absolutely. You are encouraged to share your approach and code file with the community. There is even a facility at the leaderboard to share the link to your code/solution description.
3. I am facing a technical issue with the platform/have a doubt regarding the problem statement. Where can I get support?
You may use the discuss tab to post technical issues or any other issues with the problem statement.