
R Language – Newbie tricks

Disclaimer: I have just completed the Data Analysis course on Coursera, so these tricks may look very naive.

The first trick is to use RStudio.

The main reason is that RStudio makes it much easier to see what you are doing. Plot something and the plot is right before your eyes. Load data and you see a list of your variables. Invoke help and voilà: the help page will not go away just because you typed a few commands. Most important for me is the data view: when you can see your data, you can feel it. A few more things to consider:

  1. Type View(yourData) to open the data view. If there is too much data, you can always use something like View(yourData[1:200,]);
  2. Ctrl+1 and Ctrl+2 switch focus between the script editor and the console;
  3. There is a window with the history of your commands.

The second trick is not to repeat the data frame name in models.

It is kind of unreadable when your model looks like:

lm(superData$AgeOfTheElephant ~ superData$SizeOfTheTail + superData$NumberOfLegs + superData$Location)

Better to use:

attach(superData)
lm(AgeOfTheElephant ~ SizeOfTheTail + NumberOfLegs + Location)

Use detach(superData) to undo the attach. Attach works well and you can use it with any operation, but I do not like tricks that rely so heavily on side effects. My favorite is simply to use:

lm(AgeOfTheElephant ~ SizeOfTheTail + NumberOfLegs + Location, data=superData)
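
To make the comparison concrete, here is a minimal sketch that uses the built-in mtcars data set as a stand-in for superData. The columns mpg, wt and hp belong to mtcars and are not part of the elephant example, and with() is shown only as one more option that avoids attach.

```r
# mtcars ships with R and stands in for superData here (an assumption
# made only for illustration)

# Verbose form: the data frame name is repeated for every variable
fit_verbose <- lm(mtcars$mpg ~ mtcars$wt + mtcars$hp)

# Preferred form: name the data frame once via the data argument
fit_data <- lm(mpg ~ wt + hp, data = mtcars)

# with() evaluates an expression inside the data frame without
# changing the search path the way attach() does
fit_with <- with(mtcars, lm(mpg ~ wt + hp))

# The fits estimate the same coefficients; only the labels differ
all.equal(unname(coef(fit_verbose)), unname(coef(fit_data)))
```

A practical bonus of the data argument is that predict(fit_data, newdata = ...) works as expected, which is not the case for the verbose form.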

The third trick is to use help for everything.

In the console, just type “?lm” to get help for lm. If nothing appears, try “??lm”, which runs a full-text search of the help pages. Data sets that come with the packages you load are documented as well. In most cases the help page describes the structure of the data, its origin, or even its history.
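
The same lookups can be written as ordinary function calls, which is handy in scripts. This is only a sketch: mtcars is used as an example of a documented data set, and the search phrase in the last comment is an arbitrary choice.

```r
# Exact-match help for a function (same as typing ?lm in the console)
h <- help("lm")

# Data sets that ship with R are documented too (same as ?mtcars)
d <- help("mtcars")

# Printing either object opens the help page in an interactive session:
# h
# d

# Full-text search across installed help pages (same as ??"linear model"):
# help.search("linear model")
```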

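Trick #4, use RMD.

You might know what Markdown is. If you don’t, it is a simple syntax for formatted text that is easy to write and read. RMD (R Markdown) is Markdown with R extensions: R code chunks embedded in the text are executed and their output is included in the result. It is very useful for generating a report of your analysis.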
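
To show what such a file looks like, the sketch below writes a tiny RMD document from R. The file name, the report text and the model inside the chunk are all made up for illustration. Rendering is left as a comment because it assumes the rmarkdown package is installed.

```r
# Three backticks, built with strrep() so this example can itself be
# embedded in a Markdown page
fence <- strrep("`", 3)

# A minimal R Markdown document: a title, some text and one R chunk
rmd_lines <- c(
  "# Fuel economy report",
  "",
  "Heavier cars tend to use more fuel:",
  "",
  paste0(fence, "{r}"),
  "summary(lm(mpg ~ wt, data = mtcars))",
  fence
)

# Write it to a temporary file (hypothetical name, for illustration)
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# With the rmarkdown package installed, this would build the report:
# rmarkdown::render(rmd_path)
```

In RStudio you would normally create the file from the menu and press the Knit button instead of calling render yourself.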

Trick #5, take a look at ProjectTemplate, but do not use it.

This is a project scaffolding system for R. It looks very good, but for me it is too heavy at the newbie stage. RStudio also has its own project system, and honestly I do not understand how these two things play together. I hope to play more with this stuff later.

Trick #6, use str.

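It is the most important function: it shows the structure of any R object, for example the column names, types and first few values of a data frame.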
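
A quick sketch of str in action, using the built-in mtcars data set (chosen here only as an example):

```r
# mtcars is a small example data frame that ships with R
str(mtcars)

# str() works on any object, not just data frames; max.level limits
# how deep it descends into nested lists
fit <- lm(mpg ~ wt, data = mtcars)
str(fit, max.level = 1)
```

For a data frame, the first line of the output reports the number of observations and variables, followed by one line per column.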

Trick #7, use http://stats.stackexchange.com/.

Ask questions there or on Stack Overflow. There is plenty of information about the R language on the internet.

Trick #8, complete some courses :)

Right now it is Computing for Data Analysis on Coursera (the deadline for the first week is Mon 13 Jan). In two weeks Stanford will launch a related course, StatLearning: Statistical Learning.

Misc

This is just a suggestion, but pick a code style and stick to it. R code in the wild mixes many naming styles, which looks quite messy. You can follow Google’s R language code style guide as a base.

Also, you might need a .gitignore file for R: here you are.
