Transitions into data science are tough, even scary! And it is not because you need to learn maths, statistics, and programming. You need to do that, but you must also battle the myths you hear from people around you and find your own path through them! I had my share of these “popular” myths, and they made my life difficult. Here are the few I have heard:
I had promised myself that once I had seen them through – I’d help others by debunking these myths. And there are multiple myths floating around that add a false aura around data science roles. Don’t fall for them!
These myths often make you feel like only geniuses can work in data science. This is just not true. Whether you’re a recent graduate, an experienced professional, or a leader, it’s important to understand how data science works and you will find your place in the industry. From mastering data analysis to collaborating with data engineers and handling raw data, there are various roles and tasks within data science that cater to a wide range of skills and expertise.
This article is my revenge on those myths!
If you’re looking for a role in data science and are struggling to break through, make sure you check out this awesome Ascend Pro program! It’s a perfect blend of learning from experienced instructors and practical hands-on projects – an unmissable opportunity. This program covers essential topics such as data science methods, data engineering, and data modeling, providing you with the skills and knowledge needed to excel in the field.
Holding a Ph.D. degree is an amazing achievement. It requires years of hard work and dedication. I have utmost respect for people who are willing to put in that amount of effort.
But is it compulsory to do a Ph.D in order to become a data scientist? This is a heavily role dependent question. There are several layers to peel off here so let’s get down to it.
To understand this, let’s broadly divide the role of a data scientist into two categories:
It’s important to understand the distinction between these two roles. The Applied Data Science is primarily about working with existing algorithms and understanding how they work. In other words, it’s all about applying these techniques in your project. You DO NOT need a Ph.D for this role.
Most folks fit into the above category. Most of the openings and job descriptions you see or hear about are for these roles.
But what if you are more interested in a research role? Then yes, you might need a Ph.D. Creating new algorithms from scratch, researching them, writing scientific papers, etc. – these fit a Ph.D candidate’s mindset. It also helps if the Ph.D adds to the domain you want to work in. For example, a Ph.D in linguistics will be immensely helpful for a career in NLP.
Another misunderstood aspect of a Ph.D is the opportunity cost. It’s a massive commitment from your side – both mentally and financially. Rachel Thomas wrote about this question here and I suggest taking a look. It’s a well-balanced point of view from a leading researcher in this field.
As Rachel mentions in her post, there are tons of data science pioneers who don’t hold a Ph.D:
So which role do you see yourself in? That’s a critical question to answer before you jump into data science.
Much like the Ph.D dilemma, this is another myth I’ve seen aspiring data science professionals fall for. With the amount of interest data science has generated in the last few years, how would you differentiate yourself from the competition? Splashing out money to get a degree seems like a good starting point. It’s an understandable reaction.
Here’s the good news – this is a myth perpetuated predominantly by institutes.
In a vast and complex field like data science, practical experience is king. There are numerous projects you can pick up and work on right now. Or find a problem you are passionate about solving and see if data science techniques can be applied there.
There are plenty of resources available online for learning. Books, MOOCs, blogs, videos, and so on. You can download the data science roadmap which will give you a broad structure for the milestones in Data Science. You can also start from Analytics Vidhya’s learning path. It’s completely free, comprehensive in nature (contains all the aforementioned resources), and provides a structure around your learning – an invaluable feature.
Because of the lack of formal education in this field, transitioning into data science boils down to sheer hard work, discipline and practical experience. Those are the differentiating factors a recruiting manager will look at.
Greg Brockman, the co-founder and CTO of OpenAI, doesn’t even hold a college degree!
You can also check out my article on the 13 common mistakes amateur data scientists make. It’s full of resources so you don’t want to miss that!
You have a solid 5-10 years of experience in *some* industry. You are a well-respected professional who’s calling the cards. But you’ve recently become enamoured with data science and all it can do for your business and career. You can’t wait to bring all that experience to your new field.
Sounds familiar? Good. But if you are of the thought that your entire experience will translate to your new role, I suggest you re-think.
There are two sides to this story:
Let’s understand the implications of each of these points.
If you are changing your domain entirely (for example, coming into data science after years of software testing), your work experience will most likely count for nothing. Not only are you switching to an entirely new line of work, you are also looking for a new role. When the recruiter looks at your resume, the first thought is – “what value will he/she add to the organization/project?”. Unfortunately, the answer in this case is usually close to zero.
Why? Because as a newcomer, you don’t have any experience of how that domain works. When you’re given real-world data, how will you work with it if you don’t understand how those features impact the final decision?
This is a reality most people skim over or prefer not to face. It’s an entirely wrong way to go about your career switch and will only end up harming your prospects. Understand the situation, talk to people who have made this switch, and align your expectations accordingly. Going in blind into such a big decision is a one-way ticket to failure.
Coming to scenario #2 – what should you expect if you’re staying in the same domain but switching to data science? Then your prospects look a lot sunnier. You have the added advantage of knowing the industry. You should already be aware of the nuances present in the domain so you will understand the data you’re working with. That is a MASSIVE benefit. The recruiting manager will factor that into the final decision.
I highly recommend the second scenario if at all possible. Stay in the same field you have always worked with and understand how you can apply data science there.
This article by Kunal Jain contains plenty of practical tips and tricks to help you overcome a lack of experience in this field.
This is essentially a continuation of the first two myths we covered. Most of the folks you’ll come across in data science will have an engineering/computer science background. They’ll have experience in at least have one, if not more, of the below fields:
Does this mean someone coming from a completely different background can’t get into data science?
Read these articles to start your career transition:
Absolutely not! Trust me, I speak from experience. I spent 5 years working in a learning and development role before transitioning into data science. It can be done.
However, there are certain things you’ll need to consider that people coming from this background already have. Data science is a nuanced field comprising of several aspects. As a beginner, you will be required to learn concepts from scratch. It will often be a frustrating experience. Your technical colleagues might know more. They might be ahead of you initially at every turn.
And that’s where the dedication and discipline characteristics we spoke about earlier come into play.
For example, folks with a computer science background will already have a handle on how programming works. They can switch between languages relatively easily. I, on the other hand, struggled initially with learning R. I didn’t know left from right when it came to coding. But I persisted and eventually broke through.
There is nothing to suggest you can’t do that too. Once you have decided this is the field for you, you should pull out all the stops. Still sceptical? Then check out these inspiring stories from folks with a non-technical background who successfully made the transition:
Python or R – which tool should you learn? If I got a penny each time I came across this question..
There is a widely held belief that mastering data science is about learning how to apply techniques in Python or R. Or any other tool. That tool has become the central point around which all other data science functions revolve.
The assumption (or myth) is that being able to write code using existing libraries (numpy, scikit-learn, caret, etc.) should be enough to label yourself an expert. Why aren’t the job offers rolling in yet?
That one really drives recruiting managers right up the wall.
Data science requires a combination of multiple skills. Programming is not at the centre of the data science spectrum – it is just one part of a whole. Let’s divide the spectrum of skills into two parts:
Understanding how a certain technique works will help you become a better data scientist. This is why we encourage everyone to learn algorithms from scratch. Learn how changing a certain parameter will impact the final model. This will eventually pay off when you’re working on a large-scale project in the industry.
The margin for error and experimentation is slim where stakeholders come into the picture. We have plenty of articles on our blog explaining machine learning and deep learning techniques from the ground up. Go through them and try to understand and replicate the code yourself.
It will be an invaluable addition to your skillset.
Soft skills often get overlooked by aspiring data scientists. They certainly aren’t taught in any online courses or offline classrooms. And yet these are qualities interviewers look for.
How can you pick up these skills?! By following a disciplined approach and practising every chance you get. Here are a few resources for your perusal:
Deep learning seems to propagate more myths than any other branch of data science I have come across, including:
But the most common myth I’ve heard – you need a considerable amount of hardware to perform deep learning tasks. When I first heard about deep learning, I pictured a room full of IBM supercomputers being operated by dozens of data scientists.
Don’t get me wrong, a deep learning model will always perform more efficiently when it has a powerful hardware setup to run on. But you don’t need a supercomputer to work with deep learning. It just might take longer than expected to train the model on your machine.
What if the data you’re working with is huge? Running it locally might not work. Google, as always, has the answer to that. Google Colab is a free cloud service that doubles as a coding notebook. But here’s the best part – Colab comes with FREE GPU support!
That’s right, you can leverage the power of a GPU for free and run your deep learning model there. Colab runs through your web browser, so there’s no computational cost to your machine. What else could a deep learning enthusiast ask for?
Here are a couple of resources to acquaint you with aspects of deep learning most people won’t teach:
Hollywood has done its best to showcase AI systems in the form of robots who possess human-level intelligence. Movies like Blade Runner and Terminator became cult classics for this very reason. The consensus they portray is that once an AI system has been built, it will continue to work and evolve by itself.
So once you build a model for, say, fraud detection, utilizing advanced machine learning models and predictive modeling techniques, the model will adapt to any changes thrown at it. If the entire financial landscape changed, or new features were added to the data, it’s expected that the system will continue to function equally well. This adaptability is crucial for data science companies relying on predictive modeling to maintain the effectiveness of their solutions over time.
This state, that AI systems evolve by themselves, is called Artificial General Intelligence (AGI). Unfortunately, we are not at that state yet. Not even close.
We are very much in the narrow stage right now. The models we build cannot generalize to other tasks, or even to big changes in the data. Let’s take the example of last year’s GDPR policy. Would an existing system adapt to the changes without manual human intervention? Are the systems we build smart enough to incorporate the ethics aspect? Can an AI system, built to recommend products to the customer, integrate a new product without any prior information about it?
That’s why labelling images in an object detection problem is such a crucial task. That level of intelligence isn’t available to machines yet.
Yes, DeepMind and other similar high-level research organizations have definitely made progress in this regard. You might have come across the news about neural networks creating their own neural networks. But these developments are way too few and far between. And we haven’t figured out how to get them into commercial applications.
I strongly recommend watching the below video by Michael B. Jordan regarding the state of AI currently and the challenges we face. It’s a great reference point if you can bring it up in an interview situation. 🙂
What comes to your mind when you think about the perfect artificial intelligence team? The most common answer I have heard is to get as many top data scientists as possible. Who wouldn’t like a bunch of top talent working on the same project?
But that just isn’t a practical solution for most of us. One “unicorn” data scientist is difficult to source, let alone tons of them. And will your team of data scientists be aware of the non-AI parts of a project? Do they understand how hardware works?
Simply put – an artificial intelligence project has a universe of jobs attached to it. It isn’t only limited to the role of a data scientist.
Applied artificial intelligence is a complex field. It requires working with different disciplines across the length and breadth of the project. A plethora of interdisciplinary roles exist:
Note that the roles and number of folks staffed will vary depending on the project. The idea I’m trying to get across is that AI isn’t a cut and dry field. It’s not a straightforward path. If someone tries to sell you on a project that is staffed with just data scientists, it might be time to sound the alarm bells.
This is especially relevant for people operating in a senior role (team leaders, managers, CxOs, etc.). It’s VERY important to understand each role in order to create a successful project.
You can also read about the jobs that might be impacted as AI continues to grow.
Being able to predict an event is a powerful thing. And that’s what stands out to newcomers in data science. Building models that can predict what a customer will buy next sounds like a must-have skill, right?
In fact, when I describe data science or machine learning to a non-technical person, their first reaction is quite similar. The hype around this field is unprecedented. Apparently, a data scientist is only building predictive models all day at work.
Is that not what DJ Patil meant when he described the role of a data scientist as the “sexiest job of the 21st century”? Well, not quite.
There are multiple layers in a data science project. The model building part is just a speck in the overall data science lifecycle (I’ll cover the different roles in data science in the next section). To give you a general idea, the steps involved in a typical data science lifecycle are:
Nothing is as straightforward as they teach you in a classroom or a course. Experience is the best way to learn how a project works. Try talking to someone who has seen the end-to-end process. Even better, get an internship and get a first-hand account of what makes a data science project tick.
Additionally, data science isn’t limited to simply making predictions. I’m sure you’ve come across the market-basket analysis concept. It’s a combination of clustering techniques and association rules. Or how about anomaly detection? The ability to figure out outliers in the data. There is so much to learn!
Data science competitions are an excellent stepping stone in your data science career journey. You get to practice your skills on a dataset, explore how data scientist works, showcase your expertise to the world, and immerse yourself in the dynamic data science environment. Plus, there’s always the thrilling possibility of winning prizes as a testament to your proficiency and innovation.
These hackathons and competitions have increased multi-fold in the last 4-5 years as more and more people want a piece of the data science cake. Most aspiring data science professionals include these competitions on their resume.
The problem from an interview standpoint? Recruiters have started paying less and less attention to this aspect of your portfolio.
There are plenty of reasons why recruiters don’t consider your competition experience. I’ll laser that down to one:
Real-world projects are an entirely different beast compared to what you see in competitions.
Data science competitions have clean and almost spotless datasets. If there are missing values, you can impute them using a plethora of techniques. What matters is the accuracy of your model, not the way you got there.
Real-world projects have end-to-end pipelines which involve working with a bunch of people. Most of us will always have to work with messy and untidy data. The old saying about spending 70-80% of your time just collecting and cleaning data is true. Tasks like data cleaning and feature engineering will take up the majority of your time.
This LinkedIn post is an excellent read on the standard methodology one can use for analytical models. You can also refer to the section above where we spoke about the different stages involved in a typical data science project.
Additionally, we can’t just build a stacked complex ensemble model. Clients demand transparency so the simpler model usually wins out. Interpretability is key in a corporate setting. The project is accountable for the model behaving poorly.
As I mentioned in this article, getting a good score on a competition leaderboard is excellent for measuring your learning progress, but interviewers will want to know how you can optimize your algorithm for impact, not for the sake of increasing accuracy. Talk to data science experts, try to understand how these projects work, build your network in the domain of your choice, and try to structure your thoughts to align accordingly.
We’ll wrap up this article with another myth around building models. This is a conversation I had with a fresher data scientist recently:
- Pranav Dar: What’s your favorite part of a data science role apart from designing models?
- Fresher DS: I like the feature engineering part.
- PD: Sounds fair. How do you usually collect data for your projects?
- Fresher DS: Um, I usually just download it from one of the open-source platforms.
- PD: OK..but what if the data is skewed or biased? How do you verify the identity of the data? And what will you do when you’re asked to collect data from multiple sources that require database skills?
- Fresher DS: I hadn’t thought about that..
That, unfortunately, is a conversation I have on a far too regular basis. Most experienced data science professionals are well aware of this situation as well. Expect to be tested on this subject thoroughly in an interview.
Data is being generated at an unprecedented pace but collecting and cleaning it isn’t getting any easier. Without building a pipeline to collect the data, your data science project is going nowhere. Typically, this is the role of a data engineer (but data scientists are expected to know this function as well).
I cannot overstate the importance of the data collection step. Collecting honest and accurate data is imperative to your final model working well. As Wikipedia puts it, “The goal for all data collection is to capture quality evidence that allows analysis to lead to the formulation of convincing and credible answers to the questions that have been posed”.
There are too many sources of data available. How do you connect to each? What data format do you receive from each? What’s the cost of data collection from each of these sources? This is a microcosm of the kind of questions you’ll need to ask in a real-world setting.
Roles like database manager, database architect and data engineer have taken on a new level of importance. Maintaining the integrity of the data and the aforementioned pipeline is as important as any other task that succeeds it.
What began as a concise endeavor transformed into a substantial exploration, highlighting the paramount significance of dispelling these pervasive data science myths. Beyond the myths dissected here, many misperceptions loom on the data science horizon.
Your insights on this discourse are invaluable. Feel free to share your perspectives on this article and any additional myths you’ve encountered or once subscribed to. Let’s collectively equip the emerging cohort of data science professionals with the tools to navigate past the pitfalls we once encountered.
As we set forth to debunk myths, expanding our knowledge base is pivotal. The Analytics Vidhya Blackbelt Program is a formidable resource for honing data science expertise. Enroll today, and empower yourself to challenge myths and unravel the true potential of data science. Together, let’s foster a more informed and empowered data science community.
A. Ensuring data quality and reliability remains a substantial challenge in data science, affecting the accuracy and validity of insights and predictions.
A. Balancing the interdisciplinary demands of data science, encompassing mathematics, programming, domain knowledge, and communication, can be challenging.
A. Yes, hackers leverage data science techniques for malicious purposes, such as identifying vulnerabilities, launching attacks, and manipulating data to breach systems and networks.
A. While data science holds immense potential, it’s occasionally overhyped. Some portrayals overlook the complexities and uncertainties inherent in real-world applications, leading to inflated expectations.
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
A very well written article which has nicely documented facts about some of the most important doubts that beginners to the field face. I also like the fact that you have subtly covered what to expect when coming from different backgrounds.
Thanks, Sanad. :)
Great read, Thanks Pranav!
All doubts are perfectly cleared! Thankyou! Looking forward for more insights!
Excellent article! What I find the most challenging is if you are brand new to this field, it's tough to break into.
That's one of the toughest things when it comes to data science. Thanks for reading, Julia!
This is builds some confidence to become Data Scientist. Thank for the article
Very nice detail
Nice article, it really depends what work you are doing. I have seen many companies do API fitting, copy paste some code and get the things done, without understanding the math behind it. I strongly believe the lead should be Ph.D. or highly experienced postgraduate > undergraduate. Coz postgraduate is the first class of research. Course a curriculum at the graduate level is not enough and you can't change/escape basic things.
Hi Raj, Thanks for reading the article and for your insight. I tend to agree with your thoughts. The one thing I'll add is that a Ph.D will face certain challenges intrgrating into a corporate/industry setting if he/she spent 4-5 years in research. That kind of mindset to lead a team can be challenging for everyone concerned. Like you mentioned, experience is the key to unlocking the potential of any data science team.
All the myths are very well illustrated. Thanks for sharing.
Many thanks for your very helpful article!
Thank you so much for your Inputs, this is the most relevant article for all the block I had in mind.
Very Very Eye opening lookout. I underestimated importance of quality data collection and cleaning.
A well written article and very informative. I salute you for taking time and consideration to share your experience. Thank you so much for doing that!!
Glad you liked it, Desbois!
Hi Pranav..I have done civil engineering from a tier 1 college (2008-2012). But my experience in private IT sector is only 1.5 years.. Since, last 5 years I am in a government job..But I want to switch back to corporate jobs in the data science role. I have no experience in data science role..But I have good knowledge of mathematics of undergraduate level. How difficult would it be for me to make the switch..Or is it even worth considering the switch. I am ready to do all the hard work required and if needed I am ready to enroll in some course. But the question boils down to finding a job given I would be over 30 by the time I gain some knowledge about data science. Please provide some insights. I am
Nice article, it's very helpful.. Your article is well documented for any of the fresher in this field... Thank you so mush for this...
It’s been 4 weeks now since I started reading extensively on data science and analytics as I’m working on a career change. Thank God I didn’t misunderstand the myth about prior experience. I’m seeking to apply my data science skills in public policy and Econ-tech spaces for Africa. Now to the salient point: I gave myself the goal of reading 3-4 articles a day on data science but as time went on I begun to see poorly written articles Making baseless claims and often wondered if this is all I’ll be reading. I also found very good ones which helped me to understand the relationship between data science and analytics. To become a great data scientists one has to be curious, have storytelling skills and love analytics and all the preliminary work that comes with sourcing data. Last week, I decided that I will start looking for only high quality articles to read because “garbage in, garbage out” as a data practitioner. Thank God that google suggested your article to me today. I’ve read and saved it, shared it with some good friends and also emailed it to myself to read again tomorrow morning. Repetition they say “is the mother of learning”. I’ll follow all the links in your article to see where it leads me for the next two weeks. Thanks.
:) Thanks for reading it, Kingsley. I'm glad you found it useful!