How to Become a Data Scientist

I am a great fan of Quora, and I spend some time every day checking what is going on in my field and related areas. Over the past two days, I have been running into questions about data scientists. I would like to share some answers that I loved and found very useful.

But before getting into what it takes to become a data scientist, I would like to define data science and the data scientist:

Data science is the art of helping people by analyzing data and building things based on that analysis. Data scientists are people with some mix of coding and statistical skills who work on making data useful in various ways. In general, there are two types of data scientists:

Type A data scientists work mostly on the analysis side. A Type A data scientist is very similar to a statistician (and may be one) but knows all the practical details of working with data that aren’t taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualization, deep knowledge of a particular domain, writing well about data, and so on. They can code well enough to get the work done but are not necessarily expert programmers. They may, however, have significant skills in experimental design, forecasting, modeling, statistical inference, or other things typically taught in statistics departments.

Type B data scientists work mostly on the building side. They share some statistical background with Type A, but they are also very strong coders and may be trained software engineers. The Type B data scientist is mainly interested in using data “in production.” They build models that interact with users, often serving recommendations (products, people you may know, ads, movies, search results).

How can we become one of the types discussed above?


  • Math, Algorithms and Databases:
    • Calculus-3, Linear Algebra, Algorithms, Database Systems
  • Statistics:
    • Probability and Statistics
    • Data Analysis
  • Programming:
    • R programming
    • Scientific Python
    • pandas library

Acquire and Scrub Data:

Filter and Mine Data:

Represent and Refine Data:
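
As a rough illustration of the three steps above (acquire and scrub, filter and mine, represent and refine), here is a minimal Python sketch using only the standard library. The restaurant data, the function names, and the `min_amount` threshold are all made up for the example; on a real dataset you would more likely reach for pandas.

```python
import csv
import io
import statistics

# Hypothetical raw data: restaurant transactions with messy values.
RAW_CSV = """restaurant,amount
Cafe A,12.50
Cafe A,
Cafe B,30.00
cafe b ,18.00
Cafe A,not available
Cafe B,24.00
"""


def acquire_and_scrub(text):
    """Parse CSV text, normalize names, and drop rows with unusable amounts."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        name = record["restaurant"].strip().title()
        try:
            amount = float(record["amount"])
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric amounts
        rows.append((name, amount))
    return rows


def filter_and_mine(rows, min_amount=0.0):
    """Keep amounts >= min_amount and compute count and mean per restaurant."""
    grouped = {}
    for name, amount in rows:
        if amount >= min_amount:
            grouped.setdefault(name, []).append(amount)
    return {
        name: {"count": len(values), "mean": statistics.mean(values)}
        for name, values in grouped.items()
    }


def represent(summary):
    """Render the per-restaurant means as simple text bars, one line each."""
    lines = []
    for name in sorted(summary):
        mean = summary[name]["mean"]
        bar = "#" * round(mean / 2)  # one '#' per two currency units
        lines.append(f"{name:<8} {bar} {mean:.2f}")
    return lines


if __name__ == "__main__":
    cleaned = acquire_and_scrub(RAW_CSV)
    summary = filter_and_mine(cleaned)
    for line in represent(summary):
        print(line)
```

The point is the shape of the workflow, not the specific functions: clean first, then aggregate, then present the result in a form a reader can take in at a glance.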

Domain Knowledge:

This skill is developed through experience working in an industry. Each dataset is different and comes with its own assumptions and industry knowledge. For example, a data analyst who specializes in stock market data would need time to build up the knowledge required to analyze transactional data for restaurants.

Combining all the above:

Data Literacy Course — IAP
UC Berkeley Introduction to Data Science
Coursera-Introduction to Data Science
Teach Data Science-Syracuse University

Apply the knowledge:
Harvard Data Science Course Homework
Kaggle: The Home of Data Science
Analyzing Big Data with Twitter
Analyzing Twitter Data with Apache Hadoop


Thanks to Pronojit Saha for this amazing answer on Quora.

