Ed Seykota, 2003 - 2004 ... Write for permission to reprint.

Ed Seykota's

Frequently Asked Questions



Data Verification


Many Data Sources


These days, traders can obtain a reasonably accurate historical database from various vendors for about $1,000 and keep it current by subscription for about $500 per year.  Traders can also find many free sources of historical data on the net.


Like many traders, I use various sources, including subscription data services.  For this study I am using data from CRB.



Scan The Data to Verify Consistency


Before I begin system testing, I like to verify that my data is OK.  I have a scan program that checks my data for basic problems such as missing days, an open or close outside the high/low range, and an open interest change that exceeds volume.
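A scan of this kind might look like the minimal sketch below.  It is an illustration, not the actual scan program: the function name scan_bars, the dictionary layout for each bar and the sample numbers are all hypothetical.  It also assumes every weekday is a trading day, so it flags exchange holidays as missing days unless you add a holiday calendar.

```python
from datetime import date, timedelta


def scan_bars(bars):
    """Return (date, problem) tuples for basic consistency problems.

    Each bar is a dict with keys: date, open, high, low, close,
    volume, open_interest. Bars must be in ascending date order.
    """
    problems = []
    prev = None
    for bar in bars:
        d = bar["date"]
        if bar["high"] < bar["low"]:
            problems.append((d, "high below low"))
        for field in ("open", "close"):
            if not bar["low"] <= bar[field] <= bar["high"]:
                problems.append((d, f"{field} outside high/low range"))
        if prev is not None:
            # Assumption: every weekday is a trading day (holidays ignored).
            day = prev["date"] + timedelta(days=1)
            while day < d:
                if day.weekday() < 5:
                    problems.append((day, "missing day"))
                day += timedelta(days=1)
            change = abs(bar["open_interest"] - prev["open_interest"])
            if change > bar["volume"]:
                problems.append((d, "open interest change exceeds volume"))
        prev = bar
    return problems


# Hypothetical sample data, not taken from any vendor.
bars = [
    {"date": date(2004, 1, 5), "open": 100, "high": 102, "low": 99,
     "close": 101, "volume": 500, "open_interest": 1000},
    {"date": date(2004, 1, 6), "open": 101, "high": 103, "low": 100,
     "close": 104, "volume": 50, "open_interest": 1100},
    {"date": date(2004, 1, 8), "open": 102, "high": 105, "low": 101,
     "close": 103, "volume": 400, "open_interest": 1200},
]
problems = scan_bars(bars)
for when, what in problems:
    print(when, what)
```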


Most futures data has inconsistencies between volume and open interest, particularly in the early days of a delivery when the trading is thin.


Sometimes firms under-report or over-report open interest.  For example, if a firm has 10 lots long for one client and 10 lots short for another, it reports open interest of 10 lots.  If a firm has 10 lots long and 10 lots short for the same client, it reports open interest as zero.  Thus, for the same position, the firm may report various values for open interest.
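The arithmetic in this example can be sketched as follows.  This is a simplified, hypothetical model and not any exchange's or firm's actual reporting rule: it assumes the firm reports its long-side total, and the flag net_within_client decides whether the firm offsets a client's long and short lots before adding them up.

```python
def reported_open_interest(positions, net_within_client):
    """Long-side open interest a firm would report under this toy model.

    positions: list of (client, long_lots, short_lots) tuples.
    net_within_client: if True, offset each client's longs against shorts.
    """
    long_total = 0
    for _client, long_lots, short_lots in positions:
        if net_within_client:
            long_total += max(long_lots - short_lots, 0)
        else:
            long_total += long_lots
    return long_total


two_clients = [("A", 10, 0), ("B", 0, 10)]
one_client = [("A", 10, 10)]

print(reported_open_interest(two_clients, net_within_client=True))
print(reported_open_interest(one_client, net_within_client=True))
print(reported_open_interest(one_client, net_within_client=False))
```

Under this model the same 10-long, 10-short position for one client yields either zero or 10 lots, depending only on whether the firm nets it.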


When firms mis-report or correct a previous mis-report, open interest can change without any corresponding volume.  The CME reports volume and open interest once per day and does not go back and correct prior numbers.



Cross-Check With Other Data Sources


Different data vendors use different conventions for reporting Open, High and Low. Some vendors report all-session range while others report day-session only. This can impact systems that signal on new highs and lows.
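To see how the session convention can change a signal, consider the sketch below.  The function new_high_days, the 3-day lookback and both price series are hypothetical.  The two series are identical except for one overnight spike that appears only in the all-session highs.

```python
def new_high_days(highs, lookback):
    """Return indexes of days whose high exceeds the prior `lookback` highs."""
    signals = []
    for i in range(lookback, len(highs)):
        if highs[i] > max(highs[i - lookback:i]):
            signals.append(i)
    return signals


# Hypothetical highs for the same five days from two conventions.
day_session = [100.0, 101.0, 100.5, 101.5, 101.2]
all_session = [100.0, 101.0, 102.0, 101.5, 101.2]  # overnight spike on day 2

day_signals = new_high_days(day_session, lookback=3)
all_signals = new_high_days(all_session, lookback=3)
print("day-session signals:", day_signals)
print("all-session signals:", all_signals)
```

In this example the day-session series signals a new high on day 3, while the all-session series gives no signal at all, because the overnight high already sits above it.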


Different vendors have different conventions for computing the open from the opening range and/or creating an open if no trades occur during the open.  This can impact systems that compute volatility from the distance between the opening price and a later price.
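As a rough illustration, suppose a system estimates volatility as the average absolute move from open to close.  That formula is an assumption for this sketch, as are the function name and all of the numbers.  Two vendors that agree on every close but differ on one open then give different volatility readings.

```python
def open_to_close_volatility(opens, closes):
    """Average absolute open-to-close move (a hypothetical volatility measure)."""
    moves = [abs(c - o) for o, c in zip(opens, closes)]
    return sum(moves) / len(moves)


closes = [101.0, 100.0]
vendor_a_opens = [100.0, 101.0]  # e.g. reports the first trade
vendor_b_opens = [100.5, 101.0]  # e.g. reports the middle of the opening range

vol_a = open_to_close_volatility(vendor_a_opens, closes)
vol_b = open_to_close_volatility(vendor_b_opens, closes)
print(vol_a, vol_b)
```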


Gibbons Burke compares CRB data against MJK data.  His study shows the difference between all-session and day-session-only series.  For similar sessions, he gets a good match, even down to the same volume / open interest inconsistencies.
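A cross-check between two sources can be sketched along the following lines.  This is not the method used in the studies named here.  The function compare_series, the record layout and the sample values are hypothetical, and the sample values are not real CRB or MJK data.

```python
def compare_series(a, b, fields=("open", "high", "low", "close"), tolerance=0.0):
    """Compare two price series keyed by ISO date string.

    a, b: dicts mapping date string -> dict of field values.
    Returns dates found in only one source and field-level mismatches.
    """
    report = {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "mismatches": [],
    }
    for d in sorted(set(a) & set(b)):
        for f in fields:
            if abs(a[d][f] - b[d][f]) > tolerance:
                report["mismatches"].append((d, f, a[d][f], b[d][f]))
    return report


# Hypothetical records from two vendors.
vendor_a = {
    "2004-01-05": {"open": 100.0, "high": 102.0, "low": 99.0, "close": 101.0},
    "2004-01-06": {"open": 101.0, "high": 103.0, "low": 100.0, "close": 102.0},
}
vendor_b = {
    "2004-01-05": {"open": 100.0, "high": 102.5, "low": 99.0, "close": 101.0},
    "2004-01-07": {"open": 102.0, "high": 104.0, "low": 101.0, "close": 103.0},
}
report = compare_series(vendor_a, vendor_b)
print(report)
```

A mismatch confined to the high or low, with matching closes, suggests that one series covers all sessions and the other covers the day session only.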


Jake Carriker and David Meyer also provide comparison studies.