- Why are your B/M and momentum long-short portfolios different than HML and momentum factors from Ken French’s website?
- Why does BMdec (Book to market using December ME following Fama-French 1992) have updates in both June and December? Shouldn’t it only update in June, as FF 1992 say they “form portfolios each June?”
- Why do some signals have data going out more than a year into the future?
- Fama-French define “Other industry” as 4950-4959, 4960-4961, 4970-4971, 4990-4991, but ffind.ado also includes 900, 3990, 6797, 9995, 9997 as “Other.” Why is that?
- Why do some portfolio data files have three portfolios instead of the usual five or ten?
- Why are there missing values in the portfolio return data?
- Why do some signal returns (e.g. idiosyncratic volatility) have the wrong sign?
- In the data, the date is in the following format: “1986m1.” Should I assume it is based on the end or beginning of Jan in 1986?
- How are signal data categories (Cat.Data in signaldoc.csv) defined?
Q: Why are your B/M and momentum long-short portfolios different than HML and momentum factors from Ken French’s website?
A: Our B/M long-short portfolios are simple single sorts based on Fama-French’s (1992) univariate regressions, rather than the size-balanced 2 x 3 two-step construction from Fama-French (1993). That is, FF 1993 form HML by (1) sorting stocks into 6 portfolios on size and book-to-market and then (2) going long the two high book-to-market portfolios and short the two low book-to-market portfolios, weighting the small and big portfolios equally. We have implemented this alternative construction, and you can find those 2 x 3 portfolios here.
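To make the two-step construction concrete, here is a minimal sketch of a FF (1993)-style HML computed from stock-level data for a single month. It is not our production code. The dict keys `me`, `bm`, and `ret` and the function names are hypothetical. The breakpoints are passed in because FF (1993) compute them from NYSE stocks only (median size, 30th and 70th percentile book-to-market). Within each of the six portfolios, returns are value-weighted.

```python
def value_weighted(stocks):
    """Value-weighted average return of a list of stock dicts."""
    total_me = sum(s["me"] for s in stocks)
    return sum(s["me"] * s["ret"] for s in stocks) / total_me


def hml_2x3(stocks, size_bp, bm_low_bp, bm_high_bp):
    """Sketch of a 2 x 3 HML from stock-level data (one month).

    stocks: list of dicts with hypothetical keys "me", "bm", "ret".
    size_bp, bm_low_bp, bm_high_bp: breakpoints supplied by the caller.
    """
    groups = {}
    for s in stocks:
        # Step 1a: split on size at the size breakpoint.
        size = "small" if s["me"] <= size_bp else "big"
        # Step 1b: split on book-to-market into growth / neutral / value.
        if s["bm"] <= bm_low_bp:
            style = "growth"
        elif s["bm"] >= bm_high_bp:
            style = "value"
        else:
            style = "neutral"
        groups.setdefault((size, style), []).append(s)

    # Value-weighted return of each of the (up to) six portfolios.
    r = {key: value_weighted(members) for key, members in groups.items()}

    # Step 2: long the two value portfolios, short the two growth portfolios,
    # weighting small and big equally. Neutral portfolios are not used.
    return 0.5 * (r[("small", "value")] + r[("big", "value")]) - 0.5 * (
        r[("small", "growth")] + r[("big", "growth")]
    )
```

Our single-sort B/M portfolios skip the size split entirely, which is one reason they differ from HML.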
Nevertheless, our data is highly consistent with Ken French’s, once you look into the details of the implementation:
To replicate this figure, use this self-contained code.
Large deviations between implementations should not be surprising, since anomaly performance varies considerably with the details of the implementation (Fama and French 2008, Novy-Marx and Velikov 2016, Chen and Velikov 2020, etc.).
To replicate HML exactly, you can use this code. The HML factor that it produces aligns with HML on Kenneth French’s website:
Q: Why does BMdec (Book to market using December ME following Fama-French 1992) have updates in both June and December? Shouldn’t it only update in June, as FF 1992 say they “form portfolios each June?”
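A: BMdec is constructed from the most recent book equity whose datadate is at least six months old and from market capitalization as of the most recent December. BMdec therefore updates in December, when a new December market capitalization becomes available. It can also update in months other than June because many firms have fiscal year-end dates other than December, so their book equity becomes available on a different schedule.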
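The timing can be sketched as follows. This is a hypothetical helper, not our construction code: it assumes a fixed 6-month lag from datadate and assumes the new December market cap is picked up in December itself. Months are `(year, month)` tuples.

```python
def bmdec_inputs(year, month, datadates, lag=6):
    """Return (book-equity datadate, market-equity December) used in a month.

    datadates: list of (year, month) fiscal period ends for one firm (hypothetical).
    """
    t = year * 12 + (month - 1)  # month index for the signal month
    # Book equity: most recent datadate that is at least `lag` months old.
    eligible = [d for d in datadates if d[0] * 12 + (d[1] - 1) <= t - lag]
    be_datadate = max(eligible) if eligible else None
    # Market equity: the most recent December (assumed usable in December itself).
    me_december_year = year if month == 12 else year - 1
    return be_datadate, (me_december_year, 12)
```

For a firm with a March fiscal year end, the book-equity input changes in September and the market-equity input changes in December, so neither update falls in June.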
Q: Why do some signals have data going out more than a year into the future?
A: This happens because we follow the Fama-French (1992) convention of being very conservative in assumptions about data availability. We implement FF’s convention by lagging annual accounting data by 6 months past the datadate and then using the variable for the following 12 months. This implies that an accounting number with a datadate of today (April 2021) could be used to predict returns 18 months from today (October 2022).
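The month arithmetic behind the April 2021 example can be checked with a short sketch. It assumes the first usable month is exactly datadate + 6 months and that a signal observed at the end of month t predicts the return in month t + 1; both are simplifying assumptions, and the function names are hypothetical.

```python
def add_months(year, month, k):
    """Shift a (year, month) pair by k months."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def usage_window(datadate, lag=6, hold=12):
    """For an annual datadate, return (first month the signal is available,
    last month it is used, last month of returns it predicts)."""
    year, month = datadate
    first = add_months(year, month, lag)
    last = add_months(year, month, lag + hold - 1)
    predicted = add_months(*last, 1)  # signal at end of t predicts month t + 1
    return first, last, predicted
```

With a datadate of April 2021, the signal is available from October 2021 through September 2022 and predicts returns as late as October 2022, which is why a signal file built today can contain months well beyond the next year.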
This kind of convention may not be the best nowadays, given how much faster information moves today than in 1992. Indeed, Bowles, Reed, Ringgenberg, and Thornock (2020) show that anomaly profits are increasingly concentrated in the days after earnings announcements. But our reading of the literature is that there is not yet a convention to replace the Fama-French one.
Q: Fama-French define “Other industry” as 4950-4959, 4960-4961, 4970-4971, 4990-4991, but ffind.ado also includes 900, 3990, 6797, 9995, 9997 as “Other.” Why is that?
A: via @jacaskey: “The 48-industry file on Kenneth French’s website does not define these industries, at all, so the code I wrote put them in the ‘other’ category rather than leave them unassigned to any of the industry groupings.”
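The rule can be sketched as a small Python check. This is a hypothetical helper for illustration, not the Stata code in ffind.ado, and it assumes the extra codes 900, 3990, 6797, 9995, and 9997 are single SIC codes rather than ranges.

```python
# SIC ranges that the Fama-French 48-industry file assigns to "Other".
FF48_OTHER_RANGES = [(4950, 4959), (4960, 4961), (4970, 4971), (4990, 4991)]

# Codes the FF48 file leaves undefined, which ffind.ado also places in "Other"
# (treated here as single codes, an assumption).
FFIND_EXTRA_OTHER = {900, 3990, 6797, 9995, 9997}


def is_other_industry(sic, include_ffind_extras=True):
    """Return True if a SIC code falls in the "Other" group (sketch only)."""
    in_ff48_other = any(lo <= sic <= hi for lo, hi in FF48_OTHER_RANGES)
    return in_ff48_other or (include_ffind_extras and sic in FFIND_EXTRA_OTHER)
```

Setting `include_ffind_extras=False` mimics the strict FF48 definition, under which those five codes would stay unassigned.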
Q: Why do some portfolio data files have three portfolios instead of the usual five or ten?
A: For our baseline data, we follow the original paper in how it splits stocks into portfolios; please see the paper for details. In addition, we provide portfolios based on standardized quantiles; please see the Data release notes. The table that describes each signal’s split, and how that split was chosen, is at https://github.com/OpenSourceAP/CrossSection/blob/master/SignalDocumentation.xlsx
Q: Why are there missing values in the portfolio return data?
A: A common reason for missing values is that the stock-level characteristic is not well-behaved enough to sort into a particular set of quantiles. For example, Excluded Expenses (ExclExp) is the difference between non-GAAP and GAAP earnings, and for many firms there is no difference, leading to a large mass at zero. As a result, several breakpoints of a quintile sort can coincide, leaving the interior quintiles poorly defined. We did not implement any special tiebreaker rules, so the interior portfolios can have missing values. However, we use non-strict inequalities for the extreme quantiles to help ensure that the long-short portfolios are well-behaved.
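The effect of a large mass at zero can be reproduced with a toy quintile sort. This sketch uses made-up values, linear-interpolation percentiles, non-strict bounds for the extreme quintiles, and (as an assumption about the interior) strict bounds on both sides for quintiles 2 to 4. It is not our portfolio code.

```python
import math


def linear_percentile(sorted_vals, p):
    """Percentile with linear interpolation between order statistics."""
    pos = p / 100 * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quintile_counts(values):
    """Return (breakpoints, number of stocks in each quintile)."""
    s = sorted(values)
    b = [linear_percentile(s, p) for p in (20, 40, 60, 80)]
    counts = [0] * 5
    for x in values:
        if x <= b[0]:
            counts[0] += 1  # bottom quintile: non-strict inequality
        elif x >= b[3]:
            counts[4] += 1  # top quintile: non-strict inequality
        else:
            for k in range(1, 4):  # interior quintiles: strict (assumption)
                if b[k - 1] < x < b[k]:
                    counts[k] += 1
                    break
    return b, counts


# Hypothetical ExclExp-like values: 60% of firms sit exactly at zero.
values = [-5, -4, -3, -2] + [0] * 12 + [1, 2, 3, 4]
breakpoints, counts = quintile_counts(values)
```

Here the 40th and 60th percentile breakpoints are both zero, so quintiles 2 to 4 are empty (missing) while the extreme quintiles, and hence the long-short portfolio, remain populated. In this toy version the firms at exactly zero are left unassigned.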
Q: Why do some signal returns (e.g. idiosyncratic volatility) have the wrong sign?
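A: We sign portfolio returns so that each long-short portfolio has a positive mean return in the original paper. For idiosyncratic volatility, for example, the original finding is that high-volatility stocks earn lower returns, so the long-short portfolio is long low-volatility stocks and short high-volatility stocks. Please see the sign column in the AddInfo sheet in our Signal Documentation.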
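A minimal sketch of the signing step, assuming the unsigned spread is defined as top-quantile minus bottom-quantile return and that `sign` plays the role of the sign column (+1 or -1). The function name and the return numbers are hypothetical.

```python
def signed_long_short(quantile_returns, sign):
    """Long-short return from quantile returns ordered from lowest to highest
    signal value; sign is +1 or -1 (stand-in for the documentation's sign column)."""
    return sign * (quantile_returns[-1] - quantile_returns[0])


# Hypothetical monthly returns for idiosyncratic volatility quintiles (low to high).
ivol_quintiles = [0.010, 0.009, 0.008, 0.006, 0.002]
ls_ivol = signed_long_short(ivol_quintiles, sign=-1)
```

With `sign=-1` the portfolio is effectively long low-volatility and short high-volatility stocks, so its mean return is positive even though returns fall as the raw signal rises.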
Q: In the data, the date is in the following format: “1986m1.” Should I assume it is based on the end or beginning of Jan in 1986?
A: Data are always end-of-month, i.e. you are able to trade on a signal by the end of the month in the year-month column.
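If you need an explicit date, a label like “1986m1” can be mapped to the last calendar day of that month. This hypothetical helper uses the calendar month end, not the last trading day, which is a simplifying assumption.

```python
import calendar
from datetime import date


def month_end(label):
    """Convert a Stata-style monthly label such as '1986m1' to its month-end date."""
    year_str, month_str = label.split("m")
    year, month = int(year_str), int(month_str)
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, last_day)
```

For example, `month_end("1986m1")` gives January 31, 1986, the date by which the signal is tradable.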
Q: How are signal data categories (Cat.Data in signaldoc.csv) defined?
A: The signaldoc.csv Cat.Data entries categorize each predictor by the type of data involved into the following categories:
- 13F: Based on 13F institutional holdings data from LSEG (formerly Refinitiv, formerly Thomson Reuters).
- Accounting: Based on accounting variables from Compustat. Includes accounting valuations (f(accounting variables) / [market equity]).
- Analyst: Based on analyst forecast and recommendation data from IBES. Includes analyst-based valuations (f(analyst variables) / [market equity]).
- Event: Based on discrete firm events like dividend initiation, IPO, exchange switches.
- Options: Based on option market data (including option volume) from OptionMetrics.
- Other: Based on assorted other data sources, such as BEA input-output tables and Patent Office data.
- Price: Based on past and current stock market prices.
- Trading: Based on volume, positioning, and microstructure data.