Quantitative finance collector
Nov 10
Long ago I shared a post on nearest correlation matrix calculation, the main aim being to compute the nearest correlation matrix to an approximate one, e.g. when the given correlation matrix is not positive semidefinite. However, the Matlab code in that post requires a call to a C++ function, specifically eig_mex(), which causes problems for some users.

Therefore I re-introduce a Matlab-only file for the nearest correlation matrix, in case you are interested, at http://www.math.nus.edu.sg/~matsundf/#Codes
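If an R alternative is handy, the nearPD() function in the Matrix package computes a similar nearest positive (semi)definite correlation matrix. Below is a minimal sketch; the 3x3 matrix is made up purely for illustration and is not related to the Matlab code above.

library(Matrix)

# an "approximate" correlation matrix that is not positive semidefinite
# (the numbers are made up purely for illustration)
A <- matrix(c(1.0, 0.9, 0.7,
              0.9, 1.0, 0.3,
              0.7, 0.3, 1.0), nrow = 3, byrow = TRUE)
min(eigen(A)$values)          # negative, so A is not a valid correlation matrix

# nearest correlation matrix (unit diagonal, no negative eigenvalues)
fit <- nearPD(A, corr = TRUE)
C   <- as.matrix(fit$mat)
min(eigen(C)$values)          # now non-negative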
Oct 27
I am a newbie at MySQL and have googled for two hours without a convincing answer: could you please recommend a good way to import a large CSV file into MySQL, say a 6GB file with 40 million rows? Either client software or a simple command line would do.

Cheers,
Biao


PS: Nick, thanks a lot for your reply. I tried your way and it took 1 hour and 16 minutes to import my 40-million-line CSV into MySQL on my humble laptop. That's great. My next task is to check the performance of the RMySQL package.
-- bulk-load the CSV into an existing table: comma-separated fields,
-- quoted with ", Windows-style line endings, header row skipped
LOAD DATA INFILE 'data.csv' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;
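Since the next step is RMySQL, here is a minimal sketch of issuing the same bulk load and checking the table from R. The connection details are placeholders, data.csv and tbl_name are the same names as above, and LOAD DATA LOCAL INFILE requires local_infile to be enabled on the server.

library(RMySQL)

# connection details are placeholders -- replace with your own
con <- dbConnect(MySQL(), user = "user", password = "password",
                 dbname = "mydb", host = "localhost")

# the same bulk load issued from R (LOCAL reads the file on the client side)
dbSendQuery(con, "LOAD DATA LOCAL INFILE 'data.csv' INTO TABLE tbl_name
                  FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                  LINES TERMINATED BY '\\r\\n'
                  IGNORE 1 LINES")

# pull back a small sample to check the import
head_rows <- dbGetQuery(con, "SELECT * FROM tbl_name LIMIT 10")
dbDisconnect(con)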

Tags: mysql, large csv, load, performance
Oct 26
Handling large datasets in R, especially CSV data, was briefly discussed before at Excellent free CSV splitter and Handling Large CSV Files in R. My file at that time was around 2GB, with 30 million rows and 8 columns. Recently I started to collect and analyze US corporate bond tick data from 2002 to 2010, and the CSV file I got is 6.18GB with 40 million rows, even after removing biased records as in Biases in TRACE Corporate Bond Data.

How to proceed efficiently? There is an excellent presentation on handling large datasets in R by Ryan Rosario at http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/; a short summary of the presentation:
1. R has a few packages for big data support; the presentation covers bigmemory and ff, plus some uses of parallelism to accomplish the same goal with Hadoop and MapReduce;
2. the dataset used in the presentation is an 11GB comma-separated file with 120 million rows and 29 columns;
3. for datasets up to around 10GB, bigmemory and ff handle themselves well (a short R sketch follows this list);
4. for larger datasets, use Hadoop.
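A minimal sketch of the two on-disk approaches from the slides, assuming a file named bonds.csv with a header row; the file name, backing file names and chunk size are placeholders.

library(bigmemory)
library(ff)

# bigmemory: file-backed, memory-mapped matrix (numeric columns only)
big <- read.big.matrix("bonds.csv", header = TRUE, type = "double",
                       backingfile = "bonds.bin", descriptorfile = "bonds.desc")
dim(big)

# ff: chunked data frame kept on disk, mixed column types allowed
ffd <- read.csv.ffdf(file = "bonds.csv", header = TRUE,
                     next.rows = 500000)   # read in 500k-row chunks
nrow(ffd)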
Oct 25
Interactive Brokers via Matlab was mentioned in the old post Matlab trading code; IBrokers: R API to Interactive Brokers Trader Workstation is the R package I found for an algo trading API. Should you also be interested, you can watch the following short video about algo trading in R; you may start from 15:00 directly.
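For a first feel of the package, a minimal IBrokers sketch, assuming Trader Workstation (or IB Gateway) is running locally with API access enabled; the ticker is just an example.

library(IBrokers)

# connect to a running Trader Workstation / IB Gateway on the default port
tws <- twsConnect()

# daily bars for the last month of an example stock
contract <- twsSTK("AAPL")
bars <- reqHistoricalData(tws, contract, barSize = "1 day", duration = "1 M")

twsDisconnect(tws)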
Oct 21
This post is for researchers using TRACE US corporate bond data, as I do. NASD introduced TRACE (Trade Reporting and Compliance Engine) in July 2002 in an effort to increase price transparency in the U.S. corporate debt market. The system captures and disseminates consolidated information on secondary market transactions in publicly traded TRACE-eligible securities (investment grade, high yield and convertible corporate debt), representing all over-the-counter market activity in these bonds.

However, the more I use the data, the more I realize its problems. One of the big issues is repetitive records with the same amount and price, which definitely brings trouble when the data is used for trading volume calculation, such as for Amihud's liquidity measure. Besides the duplicate issue, reversals and same-day corrections are two major errors in TRACE data, as noted in the paper Liquidity biases in TRACE by Jens Dick-Nielsen.
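As an illustration of the duplicate problem only, a minimal R sketch that drops exact repeats of the same bond, day, price and amount before aggregating volume; the column names are assumptions rather than the official TRACE field names, and proper cleaning of reversals and corrections follows Dick-Nielsen's procedure.

# toy TRACE-style extract; column names are assumptions, not the official fields
trades <- data.frame(
  cusip  = c("123456AB1", "123456AB1", "123456AB1"),
  date   = c("2008-01-02", "2008-01-02", "2008-01-02"),
  price  = c(101.25, 101.25, 100.90),
  volume = c(500000, 500000, 250000)
)

# drop exact repeats of the same bond, day, price and amount before
# summing volume (e.g. as input to Amihud's |return| / dollar volume measure)
clean <- trades[!duplicated(trades[, c("cusip", "date", "price", "volume")]), ]
daily_volume <- aggregate(volume ~ cusip + date, data = clean, FUN = sum)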