Quantitative finance collector
Sep 6

R Reshape Package

Posted by abiao at 08:08 | Code » R/Splus
Some of you may already know the R reshape package. I started playing with it after the post Handling Large CSV Files in R, and it is an excellent package that deserves a post of its own.

What is the reshape package? It describes itself as "reshape: Flexibly reshape data. Reshape lets you flexibly restructure and aggregate data using just two functions: melt and cast." In other words, it lets us massage and reorganize our data into whatever hierarchy we need in just two steps: first melt the data into a form suitable for casting, then cast the molten data frame into the reshaped or aggregated form you want. Sounds like a tongue twister? A small example will make it clearer.

Suppose you have a data frame of bond data, such as:
(Figure: original data for melt)
You are interested in the total amount of bonds for each rating, each industry, or each time to maturity. How should you proceed? You may be thinking of lapply, sapply, or even a for loop. That works, but at a cost in efficiency (both coding time and running time) and with room for error (personally, I often have to revise my sapply code twice before it works, sad...).
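To make the example concrete, here is a small hypothetical stand-in for the bond table (the original data is not reproduced here, so the values are invented, and TOTAL_AMT is assumed to be the only measured column), together with the per-group sapply route just described. Note that each new grouping needs another call.

```r
# Hypothetical toy bond data: column names follow the post,
# values are invented, and TOTAL_AMT is assumed to be the only measure
bonds <- data.frame(
  RATING           = c("AAA", "AA", "AAA", "BBB", "AA", "BBB"),
  TIME_TO_MATURITY = c(1, 5, 5, 10, 1, 10),
  INDUSTRY_CODE    = c(11, 11, 12, 12, 13, 11),
  BOND_TYPE        = c("CORP", "CORP", "GOV", "CORP", "GOV", "CORP"),
  TOTAL_AMT        = c(100, 250, 300, 50, 120, 80)
)

# Manual route: split the amounts by one grouping column, then sum each piece.
# Every additional grouping requires another call like these.
by_industry <- sapply(split(bonds$TOTAL_AMT, bonds$INDUSTRY_CODE), sum)
by_rating   <- sapply(split(bonds$TOTAL_AMT, bonds$RATING), sum)
by_maturity <- sapply(split(bonds$TOTAL_AMT, bonds$TIME_TO_MATURITY), sum)

print(by_industry)
print(by_rating)
print(by_maturity)
```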

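It becomes much easier with the R reshape package (after loading it with library(reshape)):
first, melt the data: newdata <- melt(data, id=c("RATING", "TIME_TO_MATURITY", "INDUSTRY_CODE", "BOND_TYPE")). Every column not listed in id is treated as a measured variable and stacked into the molten data's variable and value columns;
second, cast the data according to your needs. For instance, to get the total amount for each industry, cast(newdata, INDUSTRY_CODE ~ variable, sum) returns a data.frame like this: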
(Figure: data after cast)
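For readers curious about what the two steps compute, the sketch below mimics them in base R on a hypothetical toy table (invented values; TOTAL_AMT assumed to be the only measured column). It is not how the reshape package implements melt and cast, only an illustration of the same idea: stack every measured column into variable/value pairs, then sum value over the grouping columns named in the formula.

```r
# Hypothetical toy bond data (values invented for illustration)
bonds <- data.frame(
  RATING           = c("AAA", "AA", "AAA", "BBB", "AA", "BBB"),
  TIME_TO_MATURITY = c(1, 5, 5, 10, 1, 10),
  INDUSTRY_CODE    = c(11, 11, 12, 12, 13, 11),
  BOND_TYPE        = c("CORP", "CORP", "GOV", "CORP", "GOV", "CORP"),
  TOTAL_AMT        = c(100, 250, 300, 50, 120, 80)
)

id_cols      <- c("RATING", "TIME_TO_MATURITY", "INDUSTRY_CODE", "BOND_TYPE")
measure_cols <- setdiff(names(bonds), id_cols)  # every non-id column is a measure

# Step 1, like melt(): one row per (id combination, measured column),
# with the column name in `variable` and the number in `value`
molten <- do.call(rbind, lapply(measure_cols, function(m) {
  data.frame(bonds[id_cols], variable = m, value = bonds[[m]])
}))

# Step 2, like cast(newdata, INDUSTRY_CODE ~ variable, sum):
# total `value` for each industry and each measured variable
by_industry <- aggregate(value ~ INDUSTRY_CODE + variable, data = molten, FUN = sum)
print(by_industry)
```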

That's it: easy to use and efficient, right? Download the R reshape package at http://cran.r-project.org/web/packages/reshape/index.html, or install it from within R with install.packages("reshape").

Why not just use R's very own tapply?

tapply(data$TOTAL_AMT, data$INDUSTRY_CODE, sum)
Hi BrenoNeri, thanks for your comment. Yes, for this example tapply works well, but reshape pulls ahead when more reorganization is required. For example, if I further need the total of industry 11 broken down by rating, the tapply call becomes less obvious to write, whereas with reshape we need only cast(newdata, RATING ~ INDUSTRY_CODE ~ variable, sum).
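For comparison, here is a base-R sketch of the same two-way total (rating by industry) on a hypothetical toy table with invented values, trimmed to the columns it needs: tapply accepts a list of grouping factors, and xtabs builds the table from a formula. One difference worth knowing is that tapply leaves rating/industry combinations with no bonds as NA, while xtabs fills them with 0.

```r
# Hypothetical toy bond data, trimmed to the columns this table needs
bonds <- data.frame(
  RATING        = c("AAA", "AA", "AAA", "BBB", "AA", "BBB"),
  INDUSTRY_CODE = c(11, 11, 12, 12, 13, 11),
  TOTAL_AMT     = c(100, 250, 300, 50, 120, 80)
)

# tapply with two grouping factors: a rating x industry matrix of sums,
# with NA where a combination has no bonds
by_rating_industry <- tapply(bonds$TOTAL_AMT,
                             list(bonds$RATING, bonds$INDUSTRY_CODE), sum)
print(by_rating_industry)

# xtabs gives the same sums as a contingency table, with 0 for empty cells
xt <- xtabs(TOTAL_AMT ~ RATING + INDUSTRY_CODE, data = bonds)
print(xt)

# Industry 11 broken down by rating, the case discussed in the reply
print(xt[, "11"])
```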