Data Comparison


July 19, 2005


Comparison of Ed's Data with Other Sources


From: Gibbons Burke


Attached is the spreadsheet highlighting the differences between the CRB and MJK data. Instead of using the pit-only series that I had been using before, I grabbed the data for the composite, which is alleged to contain the combined session range. I am pleased to see that in the Close column there was only one point of difference, and the same is true of the Volume column. The open interest data matches yours exactly. I removed your Saturday and Sunday dates, and changed [no data] values to zeros for matching purposes.
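The clean-up described above (dropping weekend dates and turning [no data] cells into zeros) could be sketched in Python roughly as follows. This is only an illustration: the row layout, the field names and the sample values are assumptions, not the contents of the actual spreadsheet.

```python
from datetime import date

NO_DATA = "[no data]"  # placeholder text as quoted in the message above

def normalize(rows):
    """Drop weekend rows and turn [no data] cells into 0.0.

    rows: list of (date, {field: value}) pairs. This layout is hypothetical.
    """
    cleaned = []
    for day, fields in rows:
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        cleaned.append(
            (day, {k: 0.0 if v == NO_DATA else float(v) for k, v in fields.items()})
        )
    return cleaned

# Made-up sample rows, for illustration only.
sample = [
    (date(2005, 7, 15), {"close": "58.25", "volume": "1200"}),   # Friday
    (date(2005, 7, 16), {"close": NO_DATA, "volume": NO_DATA}),  # Saturday
    (date(2005, 7, 18), {"close": "57.75", "volume": NO_DATA}),  # Monday
]
result = normalize(sample)
```

Note that replacing [no data] with zero is only a matching convenience; a zero produced this way looks the same as a genuine zero-volume day.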

I added another sheet which takes the spread between the cells in the two underlying data sheets to observe more directly the point differences between the two data sets. I double checked the closing price difference I found against another source and it agrees with CRB. In the case of the volume difference, the other source agrees with MJK.
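The same cell-by-cell spread can be computed outside a spreadsheet. Here is a minimal sketch, assuming each series is a dictionary keyed by date with the six fields discussed here; the names crb, mjk and spread, and all of the numbers, are made up for illustration.

```python
FIELDS = ("open", "high", "low", "close", "volume", "open_interest")

def spread(a, b, fields=FIELDS):
    """Return {date: {field: a - b}} for every nonzero difference on shared dates."""
    diffs = {}
    for day in sorted(a.keys() & b.keys()):
        row = {f: a[day][f] - b[day][f] for f in fields if a[day][f] != b[day][f]}
        if row:
            diffs[day] = row
    return diffs

# Hypothetical two-day sample; the second day has no trading.
crb = {
    "2005-07-14": dict(open=57.5, high=58.25, low=57.0, close=57.75,
                       volume=1200, open_interest=5000),
    "2005-07-15": dict(open=57.75, high=58.0, low=57.5, close=57.75,
                       volume=0, open_interest=5000),
}
mjk = {
    "2005-07-14": dict(open=57.5, high=58.25, low=57.0, close=57.75,
                       volume=1200, open_interest=5000),
    "2005-07-15": dict(open=57.75, high=57.75, low=57.75, close=57.75,
                       volume=0, open_interest=5000),
}
diffs = spread(crb, mjk)
```

Only the dates and fields that disagree survive in diffs, which is the job the extra sheet does by eye.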

My analysis of what we are seeing in the open-high-low fields is that it reflects a difference in vendor reporting conventions. MJK is the primary source for our data, so our two series flow from the exchanges by different routes.

I believe MJK assigns the official settlement price for the day to all fields for those days where there is no trading, whereas CRB records the quotes in the bid-ask spread in its calculation of the high-low range for those days. This belief is supported by the observation that in the comparison report the MJK data has the same value for all the fields.
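One way to test this belief against a whole series is to look only at the zero-volume days and count how many have identical open, high, low and close. A rough sketch, with hypothetical field names and made-up bars:

```python
def is_flat(bar):
    """True when open, high, low and close are all the same value."""
    return bar["open"] == bar["high"] == bar["low"] == bar["close"]

def zero_volume_conventions(series):
    """Count flat versus ranged bars among the zero-volume days only."""
    counts = {"flat": 0, "ranged": 0}
    for bar in series:
        if bar["volume"] == 0:
            counts["flat" if is_flat(bar) else "ranged"] += 1
    return counts

# Made-up bars: one traded day and one no-trade day per vendor.
mjk_bars = [
    dict(open=57.5, high=58.25, low=57.0, close=57.75, volume=1200),
    dict(open=57.75, high=57.75, low=57.75, close=57.75, volume=0),
]
crb_bars = [
    dict(open=57.5, high=58.25, low=57.0, close=57.75, volume=1200),
    dict(open=57.75, high=58.0, low=57.5, close=57.75, volume=0),
]
mjk_counts = zero_volume_conventions(mjk_bars)
crb_counts = zero_volume_conventions(crb_bars)
```

If the conventions are as described, the MJK zero-volume days should all land in the "flat" count, while some of the CRB zero-volume days should land in "ranged".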

In my opinion the MJK convention is more "accurate" in the sense that the day's range, which is zero, exactly reflects the volatility of that contract on that day, given it saw no trading. The "phantom" range reported by CRB may indicate quote interest where there was no activity, and is subject to manipulation - anyone can put in a bid or an offer and it doesn't have to be hit to be reflected in the day's range.

The MJK policy on the Open price is also well considered. On days where there is trading (as opposed to the zero-volume days, where the open reflects the settlement price), they use the first number in the opening range reported by the exchange, for those exchanges that report a range rather than a single number for the open. Good analysts I know have found this to be the better reflection of the first printed trade of the day.

Regarding the volume and open-interest change anomalies you have found, it is significant that your series matches our series so closely in the volume and open interest fields. This fact suggests to me that the likely source of those anomalies is the exchange itself. Is it possible that the OI changes are recorded after the volume is published, and reflect out-trade settlements, which take place the following morning?






The Smoking Spreadsheet


During an email exchange with Gibbons Burke, I mention that I normally keep contributors to the FAQ anonymous and only publish private names with permission.


Gibbons points out that Microsoft products now include file-tracking technology to detect the author of an Excel spreadsheet.



This screenshot from the UNIX more command clearly identifies Gibbons Burke as the creator of the smoking spreadsheet.