Quantitative Finance Collector is a blog on quantitative finance analysis and methods in mathematical finance, focusing on derivative pricing, quantitative trading and quantitative risk management.
Jan
6
I once wrote a small VBA function to download multiple stock quotes from Yahoo Finance. Here is a Matlab alternative pointed out by sjev; you can download the M-file at http://quantum.meplaza.nl/get_yahoo_quote.m. It downloads quotes from Yahoo and returns the data in a struct. The function supports multiple symbols and special tags but, unlike the VBA function, returns only the latest record.
For instance, data = get_yahoo_quote({'MSFT','IBM','GOOG','GE'}) returns
1x4 struct array with fields:
symbol
desc
lastTrade
lastTradeTime
dividendDate
open
lastClose
Nov
22
The book "Data Mining with R, Learning with Case Studies" received wonderful feedback from the blogger at http://intelligenttradingtech.blogspot.com/2010/11/finally-practical-r-book-on-data-mining.html, especially the chapter on 'Predicting Stock Market Returns'. I have been looking for an R book on data mining with detailed examples and code for a long time, and after viewing its chapters I feel this may be the one.
Unlike several other books that focus on the theoretical side, this book follows a "learn by doing" approach to data mining. It presents a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Most of the main data mining processes and techniques are covered through four detailed case studies:
Predicting algae blooms
Predicting stock market returns
Detecting fraudulent transactions
Classifying microarray samples
I just bought Data Mining with R: Learning with Case Studies and haven't received it yet. If you are also interested, read the content outline and download the accompanying R code from the author's main page: http://www.liaad.up.pt/~ltorgo/DataMiningWithR/
Nov
18
This is an off-topic post. As a researcher I have always worried about my collected data, papers and program code; it would be an absolute nightmare if my data were suddenly lost one day, given the huge effort I have put in. How to protect yourself then? Below are three excellent tools that make me feel safe.
1. Dropbox
This is the best backup tool I have ever used. It syncs the files in a pre-specified folder almost instantly, and any change to a file is synced automatically. It is free and fast and, even better, you can install it on a computer where you have no administrative privileges, such as the one in your office. So I simply store all my recent files in that folder. The first thing I do every morning in the office is run Dropbox, and I open Dropbox on my own computer when I get home in the evening; all the changes I made that day are automatically updated on both computers.
Every user gets 2GB of free space after signing up, and you can get extra space for free by inviting your friends: for every friend who joins Dropbox, you and your friend both get 250MB of bonus space (up to a limit of 8GB)! So if you don't mind, please sign up using my referral link :)
Oct
26
Handling large datasets in R, especially CSV data, was briefly discussed before in Excellent free CSV splitter and Handling Large CSV Files in R. My file at that time was around 2GB, with 30 million rows and 8 columns. Recently I started to collect and analyze US corporate bond tick data from 2002 to 2010, and the CSV file I got is 6.18GB with 40 million rows, even after removing the biased records as described in Biases in TRACE Corporate Bond Data.
How to proceed efficiently? Below is an excellent presentation on handling large datasets in R by Ryan Rosario at http://www.bytemining.com/2010/08/taking-r-to-the-limit-part-ii-large-datasets-in-r/. A short summary of the presentation:
1, R has a few packages for big data support. The presentation covers bigmemory and ff, and also some uses of parallelism to accomplish the same goal using Hadoop and MapReduce;
2, The data used in the presentation is an 11GB comma-separated values file with 120 million rows and 29 columns;
3, For datasets of around 10GB, bigmemory and ff cope well;
4, For larger datasets, use Hadoop.
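The idea the summary above relies on, working through a file that is too large to hold in memory, can be illustrated with a small sketch. This is written in Python rather than R and uses neither bigmemory nor ff; it only shows the simplest out-of-core pattern, a single streaming pass that keeps running totals instead of the full table. The function name and the column names are hypothetical.

```python
import csv
from collections import defaultdict


def streaming_group_means(path, group_col, value_col):
    """Mean of value_col per distinct group_col value, computed in one pass.

    Only one row and the running totals are held in memory at a time,
    so the file size is limited by disk rather than by RAM.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):  # rows are read lazily, one by one
            key = row[group_col]
            sums[key] += float(row[value_col])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With a hypothetical file of trades that has `bond` and `price` columns, `streaming_group_means("trades.csv", "bond", "price")` would return the average traded price per bond without ever loading the whole file.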
Oct
21
This post is for researchers who, like me, use TRACE US corporate bond data. NASD introduced TRACE (Trade Reporting and Compliance Engine) in July 2002 in an effort to increase price transparency in the U.S. corporate debt market. The system captures and disseminates consolidated information on secondary market transactions in publicly traded TRACE-eligible securities (investment grade, high yield and convertible corporate debt), representing all over-the-counter market activity in these bonds.
However, the more I use the data, the more I notice its problems. One big issue is repeated trade reports with the same amount and price, which definitely causes trouble when the data are used to calculate trading volume, for example for Amihud's liquidity measure. Besides the duplicates, reversals and same-day corrections are two major sources of error in TRACE data, as noted in the paper Liquidity biases in TRACE by Jens Dick-Nielsen.
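To see why duplicates matter for Amihud's measure (the average over days of absolute return divided by dollar volume), here is a minimal Python sketch for a single bond. It is not the filter from Dick-Nielsen's paper: it only drops records that match exactly on bond, timestamp, price and amount, and it does not handle reversals or corrections. The field names and the assumption that prices are quoted per 100 of par are mine.

```python
from collections import defaultdict


def drop_duplicate_trades(trades):
    """Keep the first of any records sharing bond, date, time, price and amount."""
    seen = set()
    unique = []
    for trade in trades:
        key = (trade["bond"], trade["date"], trade["time"],
               trade["price"], trade["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(trade)
    return unique


def daily_dollar_volume(trades):
    """Dollar volume per day, assuming price is quoted per 100 of par amount."""
    volume = defaultdict(float)
    for trade in trades:
        volume[trade["date"]] += trade["price"] / 100.0 * trade["amount"]
    return dict(volume)


def amihud_illiquidity(daily_returns, daily_volume):
    """Average of |return| / dollar volume over days with positive volume."""
    ratios = [abs(ret) / daily_volume[day]
              for day, ret in daily_returns.items()
              if daily_volume.get(day, 0.0) > 0.0]
    return sum(ratios) / len(ratios) if ratios else float("nan")
```

Because duplicated records inflate the daily volume in the denominator, the measure computed on raw data comes out lower than on the de-duplicated data, so the bond looks more liquid than it is.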




