Should you teach Python or R for data science?

Here is an amazing post from R-bloggers that I thought was worth thinking about…

Comparing Python and R is very hard when it comes to their use in data science. I personally love working in Python, but R has some amazing features that make it faster than Python for solving certain problems. Let's compare them from a few different perspectives.

Is familiarity important?

If you have some programming experience, learning Python may be the better choice, because its syntax is somewhat similar to other languages (not by much, but more so than R's). R has very weird syntax rules that give you cramps most of the time, but once you get used to it, I assure you, you will love it. If you don't have programming experience and you want to start from the very beginning, R would be good, because once you get used to the syntax, you can carry your knowledge over to other high-level programming languages. Lastly, as a Python lover, I would like to point out that Python is the best programming language for beginners. Most Ivy League schools have changed their introductory language to Python…

Which area do you want to work in? (Academia or industry)

In academia, R is more widely used than Python because of its roots in statistics. But in industry, productivity is very important, and Python tends to get the job done faster.

 

Machine Learning or Statistical Learning?

The line between these two terms is blurry, but machine learning is concerned primarily with predictive accuracy over model interpretability, whereas statistical learning places a greater priority on interpretability and statistical inference. scikit-learn, by far the most popular machine learning package for Python, is more concerned with predictive accuracy. Thus, R is probably the better choice if you are teaching statistical learning, though Python also has a nice package for statistical modeling (Statsmodels) that duplicates some of R’s functionality.
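To make the distinction a little more concrete, here is a minimal sketch that fits one straight line and then looks at it from both points of view. It uses only the Python standard library rather than scikit-learn or Statsmodels, and the data (hours studied versus exam score) is made up for illustration.

```python
import math
from statistics import mean

# Made-up data for illustration: x is hours studied, y is exam score.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0]

# Ordinary least squares for the line y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# "Machine learning" view: how close are the predictions?
# (Computed on the training data here to keep the sketch short;
# in practice this would be measured on held-out data.)
predictions = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predictions)]
mse = mean(r ** 2 for r in residuals)

# "Statistical learning" view: how uncertain is the slope estimate?
n = len(x)
residual_variance = sum(r ** 2 for r in residuals) / (n - 2)
slope_std_error = math.sqrt(residual_variance / sxx)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"prediction error (MSE)={mse:.3f}")
print(f"standard error of slope={slope_std_error:.3f}")
```

The model is the same in both cases. What changes is the question: the first asks how well it predicts, the second asks how much you can trust the estimated relationship.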

Are you looking for your language to look sexy?

If so, R is not a very sexy language. As I said above, R's syntax is weird and looks creepy. It feels old, and its website looks like it was created around the time the web was invented. Python is the “new kid” on the data science block, and has far more sex appeal. From a marketing perspective, Python may be the better choice simply because it will attract more students.

And more information…

Installing both languages is quite simple.

Installing R is a simple process, and installing RStudio (the de facto IDE for R) is just as easy. Installing new packages or upgrading existing packages from CRAN (R’s package management system) is a trivial process within RStudio, and even installing packages hosted on GitHub is a simple process thanks to the devtools package.

By comparison, Python itself may be easy to install, but installing individual Python packages can be much more challenging. In my classroom, we encourage students to use the Anaconda distribution of Python, which includes nearly every Python package we use in the course and has a package management system similar to CRAN. However, Anaconda installation and configuration problems are still common in my classroom, whereas these problems were much more rare when using R and RStudio. As such, R may be the better choice if your students are not computer savvy.

Data cleaning (also known as “data munging”) is the process of transforming your raw data into a more meaningful form. I find data cleaning to be easier in Python because of its rich set of data structures, as well as its far superior implementation of regular expressions (which are often necessary for cleaning text).
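As a small illustration of this kind of text cleaning, here is a minimal sketch using Python's built-in `re` module. The sample records and the helper name `clean_record` are made up for this example and do not come from any real dataset.

```python
import re

# Hypothetical raw records with inconsistent spacing, casing, and phone formats.
raw_records = [
    "  Alice SMITH ; (555) 123-4567 ",
    "bob  jones;555.987.6543",
]

def clean_record(record):
    """Split a 'name ; phone' string and normalize both parts."""
    name, phone = record.split(";")
    # Collapse runs of whitespace, trim the ends, and use title case.
    name = re.sub(r"\s+", " ", name).strip().title()
    # Keep only the digits of the phone number.
    digits = re.sub(r"\D", "", phone)
    return {"name": name, "phone": digits}

cleaned = [clean_record(r) for r in raw_records]
print(cleaned)
```

Lists, dictionaries, and regular expressions work together here with very little ceremony, which is a large part of why cleaning text can feel easy in Python.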

The pandas package in Python is an extremely powerful tool for data exploration, though its power and flexibility can also make it challenging to learn. R’s dplyr is more limited in its capabilities than pandas (by design), though I find that its more focused approach makes it easier to figure out how to accomplish a given task. As well, dplyr’s syntax is more readable and thus is easier for me to remember. Although it’s not a clear differentiator, I would consider R a slightly easier environment for getting started in data exploration due to the ease of learning dplyr.
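For a feel of what data exploration in pandas looks like, here is a short sketch that filters, groups, and summarizes a tiny table. The data and column names are made up for illustration. The chain is roughly the pandas counterpart of a `filter()`, `group_by()`, `summarise()` pipeline in dplyr.

```python
import pandas as pd

# Made-up data purely for illustration.
df = pd.DataFrame({
    "species": ["cat", "dog", "cat", "dog", "cat"],
    "weight_kg": [4.0, 20.0, 5.0, 30.0, 6.0],
    "age_years": [2, 5, 3, 7, 1],
})

# Keep animals aged two or older, then compute the mean weight per species.
summary = (
    df[df["age_years"] >= 2]
    .groupby("species")["weight_kg"]
    .mean()
)
print(summary)
```

The same result can be reached in pandas in several other ways, such as `query()` or `agg()`. That is the flexibility mentioned above, and also part of what makes pandas harder to learn.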

R’s ggplot2 is an excellent package for data visualization. Once you understand its core principles (its “grammar of graphics”), it feels like the most natural way to build your plots, and it becomes easy to produce sophisticated and attractive plots. Matplotlib is the de facto standard for scientific plotting in Python, but I find it tedious both to learn and to use. Alternatives like Seaborn and pandas plotting still require you to know some Matplotlib, and the alternative that I find most promising (ggplot for Python) is still early in development. Therefore, I consider R the better choice for data visualization.

We chose it because we deal with huge amounts of data. Besides, it sounds really cool.

Larry Page, co-founder of Google

I hope you love the topic… Leave a comment below and share your thoughts with me; I will be pleased to hear them.

One thought on “Should you teach Python or R for data science?”

  1. I would keep most of my analysis in R. The nice thing about it is that simple code can easily get you to amazing results, with good graphical explanations in reports for people who do not know statistics (econometrics in my case). The packages are extensive and solve almost every need, from health sciences to forex!
