We analyse three US equity factors from the Global Factor Data repository — gross profits-to-assets, tax expense surprise, and asset growth. Using monthly value-weighted returns, we interpret their economic link to investment and profitability, measure risk and return with rolling 36-month windows (including pre- and post-publication subsamples), and compare them to Fama–French benchmarks.
Result. all three earn positive, relatively low-volatility premia; gross profitability has the best risk-adjusted profile; the quality pair correlates positively with each other and negatively with asset growth, as expected. Spanning regressions show that CAPM and the Fama–French three-factor model leave economically large alphas, while the Hou et al. (2021) q5 model explains all three — the appropriate benchmark for factors built on profitability and investment.
packages <- c(
"tidyverse", # data wrangling + ggplot2
"lubridate", # dates
"zoo", # rolling windows
"broom", # tidy() / glance() on models
"sandwich", # Newey-West HAC covariance
"lmtest", # coeftest() with a supplied vcov
"scales", # percent labels
"knitr", # kable tables
"kableExtra" # nicer tables (optional, degrades gracefully)
)
for (p in packages) {
if (!requireNamespace(p, quietly = TRUE)) {
install.packages(p, repos = "https://cloud.r-project.org")
}
}
invisible(lapply(setdiff(packages, "kableExtra"), library, character.only = TRUE))
have_kex <- requireNamespace("kableExtra", quietly = TRUE)
# A small wrapper so tables look the same everywhere.
show_tbl <- function(df, caption = NULL, digits = 3) {
k <- knitr::kable(df, caption = caption, digits = digits, format.args = list(big.mark = ","))
if (have_kex) kableExtra::kable_styling(k, full_width = FALSE, bootstrap_options = c("striped", "hover")) else k
}
Three statistics recur throughout, so we define them once. Factor returns are fat-tailed and path-dependent, so skewness, excess kurtosis and maximum drawdown matter as much as the mean and volatility.
# Largest peak-to-trough loss of $1 compounded through the series.
max_drawdown <- function(x) {
wealth <- cumprod(1 + ifelse(is.na(x), 0, x))
drawdown <- wealth / cummax(wealth) - 1
min(drawdown, na.rm = TRUE)
}
skewness <- function(x) {
x <- x[!is.na(x)]; m <- mean(x); s <- sd(x)
mean(((x - m) / s)^3)
}
excess_kurtosis <- function(x) {
x <- x[!is.na(x)]; m <- mean(x); s <- sd(x)
mean(((x - m) / s)^4) - 3
}
We work with monthly returns, all expressed as decimals (0.01 = 1%). There are three blocks:
A note on signs. The JKP
retcolumn is already signed so that each factor is long its historically higher-returning leg (thedirectioncolumn records the sign that was applied). We therefore useretas-is and never flip it. This is the single most common mistake in this assignment.
# CSVs live in ../data relative to this code/ folder. Override if needed.
DATA_DIR <- if (dir.exists("../data")) "../data" else "."
data_file <- "group_20_data.RData"
if (file.exists(data_file)) {
load(data_file)
message("Loaded cleaned data from ", data_file)
} else {
## 1. JKP factor returns ------------------------------------------------
factor_files <- file.path(DATA_DIR, c(
"[usa]_[gp_at]_[monthly]_[vw_cap].csv", # Gross profits-to-assets
"[usa]_[tax_gr1a]_[monthly]_[vw_cap].csv", # Tax expense surprise
"[usa]_[at_gr1]_[monthly]_[vw_cap].csv" # Asset growth
))
factors <- factor_files %>%
map_dfr(read_csv, show_col_types = FALSE) %>%
mutate(date = as.Date(date), ret = as.numeric(ret)) %>%
select(date, name, ret) %>%
pivot_wider(names_from = name, values_from = ret) %>%
arrange(date)
## 2. Fama-French factors ----------------------------------------------
# The CSV has header text + an annual block; keep only the YYYYMM rows.
ff_lines <- read_lines(file.path(DATA_DIR, "F-F_Research_Data_Factors.csv"))
ff_monthly_lines <- ff_lines[str_detect(ff_lines, "^[0-9]{6},")]
ff_factors <- read_csv(
I(ff_monthly_lines),
col_names = c("date", "Mkt_RF", "SMB", "HML", "RF"),
col_types = cols(.default = "c"), show_col_types = FALSE
) %>%
mutate(
date = ceiling_date(ymd(paste0(date, "01")), "month") - days(1),
across(c(Mkt_RF, SMB, HML, RF), ~ as.numeric(.) / 100) # % -> decimal
)
## 3. Hou et al. q5 factors --------------------------------------------
q5_factors <- read_csv(file.path(DATA_DIR, "q5_factors_monthly_2024.csv"),
show_col_types = FALSE) %>%
mutate(
date = ceiling_date(ymd(paste0(year, "-", month, "-01")), "month") - days(1),
across(c(R_F, R_MKT, R_ME, R_IA, R_ROE, R_EG), ~ as.numeric(.) / 100)
) %>%
select(date, R_F, R_MKT, R_ME, R_IA, R_ROE, R_EG)
save(factors, ff_factors, q5_factors, file = data_file)
message("Built and saved ", data_file)
}
stopifnot(exists("factors"), exists("ff_factors"), exists("q5_factors"))
A quick sanity check — coverage and missing values. The three factors start in the early 1950s; q5 starts in 1967, which will define the common sample for the head-to-head model comparison later.
tibble(
series = c("gp_at", "tax_gr1a", "at_gr1", "FF (Mkt/SMB/HML)", "q5"),
start = c(format(min(factors$date)), format(min(factors$date[!is.na(factors$tax_gr1a)])),
format(min(factors$date[!is.na(factors$at_gr1)])),
format(min(ff_factors$date)), format(min(q5_factors$date))),
end = c(format(max(factors$date)), format(max(factors$date)), format(max(factors$date)),
format(max(ff_factors$date)), format(max(q5_factors$date)))
) %>% show_tbl(caption = "Sample coverage")
| series | start | end |
|---|---|---|
| gp_at | 1950-11-30 | 2025-12-31 |
| tax_gr1a | 1951-11-30 | 2025-12-31 |
| at_gr1 | 1951-11-30 | 2025-12-31 |
| FF (Mkt/SMB/HML) | 1926-07-31 | 2026-04-30 |
| q5 | 1967-01-31 | 2024-12-31 |
colSums(is.na(factors))
## date gp_at tax_gr1a at_gr1
## 0 0 12 12
Group 20 factors, with their JKP codes and source papers.
| Factor | JKP code | Type | Source paper |
|---|---|---|---|
| Gross profits-to-assets | gp_at |
Profitability / quality | Novy-Marx (2013) |
| Tax expense surprise | tax_gr1a |
Earnings-quality / news | Thomas & Zhang (2011) |
| Asset growth | at_gr1 |
Investment | Cooper, Gulen & Schill (2008) |
Gross profits-to-assets (gp_at).
Novy-Marx (2013) shows that gross profitability — revenue minus
cost of goods sold, scaled by assets — is “the other side of value”.
Gross profit sits at the top of the income statement, so it is the
cleanest measure of true economic productivity, before the accounting
noise of depreciation, R&D and special items further down.
Productive firms earn higher average returns; the strategy is long
high-productivity, short low-productivity firms. Economically it is a
quality premium: a claim that the market under-prices
durable operating efficiency.
Tax expense surprise (tax_gr1a). Thomas
& Zhang (2011) document that the year-on-year change in tax expense
predicts future returns. The intuition is an information /
earnings-quality story: taxes are hard to manage, so a jump in
the tax accrual is a credible signal that taxable income — and
therefore true profitability — is rising faster than the headline
earnings number suggests. Investors under-react to this less-salient
line item, and the mispricing corrects over the following months. It is
a cousin of post-earnings-announcement drift, using the tax line as the
surprise.
Asset growth (at_gr1). Cooper, Gulen
& Schill (2008) show that firms which aggressively expand their
asset base subsequently under-perform. The factor is therefore
long low-growth, short high-growth
firms (the JKP sign is set accordingly). Two complementary explanations:
(i) a q-theory / investment story — firms optimally invest more
when their cost of capital (expected return) is low, so high investment
mechanically lines up with low future returns; and (ii) a
behavioural story — managers over-invest and markets
over-extrapolate growth, so empire-building is punished ex-post.
Are they related? Yes, and in an economically
coherent way. Profitable, well-run firms (gp_at,
tax_gr1a) tend to be financially conservative and to grow
their asset base slowly — exactly the firms the asset-growth factor is
long. We therefore expect the two profitability/quality factors
to be positively correlated with each other and
negatively correlated with the raw asset-growth return
series. Correlations confirms this. The
deeper point, which drives the whole story in spanning regressions: all three are essentially
investment and profitability phenomena, which is
precisely what the q-factor model is built to price.
Annualised figures (mean × 12, volatility × √12). Read the table as a risk profile, not just a ranking of returns.
summary_stats <- factors %>%
pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
group_by(factor) %>%
summarise(
n_months = sum(!is.na(ret)),
ann_mean = mean(ret, na.rm = TRUE) * 12,
ann_vol = sd(ret, na.rm = TRUE) * sqrt(12),
sharpe = ann_mean / ann_vol,
skewness = skewness(ret),
excess_kurtosis = excess_kurtosis(ret),
max_drawdown = max_drawdown(ret),
.groups = "drop"
) %>%
arrange(desc(sharpe))
summary_stats %>%
mutate(across(c(ann_mean, ann_vol, max_drawdown), ~ percent(.x, 0.01)),
across(c(sharpe, skewness, excess_kurtosis), ~ round(.x, 2))) %>%
show_tbl(caption = "Annualised risk-return profile, full sample")
| factor | n_months | ann_mean | ann_vol | sharpe | skewness | excess_kurtosis | max_drawdown |
|---|---|---|---|---|---|---|---|
| gp_at | 902 | 3.49% | 7.90% | 0.44 | 0.02 | 0.92 | -33.56% |
| at_gr1 | 890 | 2.68% | 9.18% | 0.29 | 1.21 | 13.54 | -34.33% |
| tax_gr1a | 890 | 1.65% | 6.32% | 0.26 | -0.20 | 1.68 | -28.35% |
Story. gp_at is the standout on a
risk-adjusted basis — the highest Sharpe ratio with the lowest
volatility and near-zero skew, i.e. a steady quality premium.
at_gr1 earns a higher raw mean but pays for it with the
highest volatility and a large positive skew and fat tails
(high excess kurtosis): the asset-growth strategy occasionally delivers
big positive months (think sharp reversals when over-extended growth
firms crash), which is attractive but lumpy. tax_gr1a is
the quietest factor — modest mean, lowest volatility, mild
negative skew — consistent with a slow-drip information
premium. All three are positive-mean, low-volatility long–short
strategies, as expected of published anomalies.
Growth of $1, log scale (a straight line = constant compound rate). We align all three to the first month where every series is available.
cumulative <- factors %>%
filter(if_all(c(gp_at, tax_gr1a, at_gr1), ~ !is.na(.))) %>%
transmute(date,
`Gross profits-to-assets` = cumprod(1 + gp_at),
`Tax expense surprise` = cumprod(1 + tax_gr1a),
`Asset growth` = cumprod(1 + at_gr1)) %>%
pivot_longer(-date, names_to = "factor", values_to = "wealth")
ggplot(cumulative, aes(date, wealth, colour = factor)) +
geom_line(linewidth = 0.8) +
scale_y_log10() +
labs(title = "Cumulative performance of the three factors",
subtitle = "$1 invested in each long-short strategy, log scale",
x = NULL, y = "Cumulative wealth (log)", colour = NULL) +
theme_minimal() + theme(legend.position = "bottom")
The slopes echo the table: gp_at compounds most
smoothly, at_gr1 is the most jagged, and the gentle upward
drift of all three over 70+ years is the visual signature of a genuine
risk/mispricing premium rather than noise.
A 60-month (5-year) rolling window shows that “the premium” is not constant — it has regimes. This directly addresses the task’s request to examine “periodic breaks in performance”.
roll_w <- 60
rolling <- factors %>%
filter(if_all(c(gp_at, tax_gr1a, at_gr1), ~ !is.na(.))) %>%
pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
group_by(factor) %>% arrange(date) %>%
mutate(
roll_mean = rollapply(ret, roll_w, \(x) mean(x) * 12, fill = NA, align = "right"),
roll_vol = rollapply(ret, roll_w, \(x) sd(x) * sqrt(12), fill = NA, align = "right"),
roll_sharpe = roll_mean / roll_vol
) %>% ungroup() %>%
mutate(factor = recode(factor,
gp_at = "Gross profits-to-assets", tax_gr1a = "Tax expense surprise",
at_gr1 = "Asset growth"))
ggplot(rolling, aes(date, roll_sharpe, colour = factor)) +
geom_hline(yintercept = 0, linewidth = 0.3, colour = "grey50") +
geom_line(linewidth = 0.8, na.rm = TRUE) +
labs(title = "60-month rolling Sharpe ratio",
subtitle = "Risk-adjusted performance is regime-dependent, not constant",
x = NULL, y = "Sharpe ratio", colour = NULL) +
theme_minimal() + theme(legend.position = "bottom")
ggplot(rolling, aes(date, roll_mean, colour = factor)) +
geom_hline(yintercept = 0, linewidth = 0.3, colour = "grey50") +
geom_line(linewidth = 0.8, na.rm = TRUE) +
scale_y_continuous(labels = percent) +
labs(title = "60-month rolling annualised mean return",
x = NULL, y = "Annualised mean", colour = NULL) +
theme_minimal() + theme(legend.position = "bottom")
Story. Every factor spends multi-year stretches below zero — there is no free lunch. Note the cluster of weakness for the quality/profitability factors around the late-1990s tech run-up (when junk and growth dominated) and again in the 2010s, the period highlighted by the “replication crisis” literature (Jensen, Kelly & Pedersen 2023): many anomalies decayed after they became widely known. We test that decay formally next.
If a factor is genuine compensation for risk it should survive
publication; if it was a mispricing, arbitrage should erode it once the
paper is out. We split each series at its publication year and test
whether the post-publication mean is significantly different using a
post dummy with Newey–West (HAC, 6 lags) standard errors —
appropriate because monthly factor returns are mildly
autocorrelated.
pub_dates <- tibble(
factor = c("gp_at", "tax_gr1a", "at_gr1"),
publication_date = as.Date(c("2013-01-01", "2011-01-01", "2008-01-01"))
)
pre_post <- factors %>%
pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
left_join(pub_dates, by = "factor") %>%
filter(!is.na(ret)) %>%
mutate(period = factor(if_else(date < publication_date, "Pre", "Post"),
levels = c("Pre", "Post")))
pre_post %>%
group_by(factor, period) %>%
summarise(n = n(), ann_mean = mean(ret) * 12, ann_vol = sd(ret) * sqrt(12),
sharpe = ann_mean / ann_vol, .groups = "drop") %>%
mutate(across(c(ann_mean, ann_vol), ~ percent(.x, 0.01)), sharpe = round(sharpe, 2)) %>%
show_tbl(caption = "Performance before vs after publication")
| factor | period | n | ann_mean | ann_vol | sharpe |
|---|---|---|---|---|---|
| at_gr1 | Pre | 674 | 2.92% | 9.65% | 0.30 |
| at_gr1 | Post | 216 | 1.96% | 7.51% | 0.26 |
| gp_at | Pre | 746 | 3.44% | 7.89% | 0.44 |
| gp_at | Post | 156 | 3.74% | 7.95% | 0.47 |
| tax_gr1a | Pre | 710 | 1.75% | 6.74% | 0.26 |
| tax_gr1a | Post | 180 | 1.26% | 4.26% | 0.30 |
# Formal test: is the post-publication mean different?
pre_post %>%
mutate(post = as.integer(period == "Post")) %>%
group_by(factor) %>%
group_modify(~ broom::tidy(lmtest::coeftest(
lm(ret ~ post, .x),
vcov. = sandwich::NeweyWest(lm(ret ~ post, .x), lag = 6, prewhite = FALSE)))) %>%
filter(term == "post") %>%
transmute(factor, delta_ann = percent(estimate * 12, 0.01),
t = round(statistic, 2), p_value = round(p.value, 3)) %>%
show_tbl(caption = "Change in mean after publication (post dummy)")
| factor | delta_ann | t | p_value |
|---|---|---|---|
| at_gr1 | -0.96% | -0.37 | 0.714 |
| gp_at | 0.30% | 0.11 | 0.915 |
| tax_gr1a | -0.49% | -0.32 | 0.747 |
Story. Means drift a little after publication — down
for at_gr1 and gp_at’s Sharpe actually edges
up — but for none of the three is the change
statistically significant (all p-values well above 0.10). So we cannot
reject that these premia survived publication. That is mild evidence for
a risk-based rather than purely-arbitraged interpretation, though the
short post-publication windows limit the test’s power.
The natural benchmark is the Fama–French market excess return
Mkt_RF. It is comparable to our factors because all four
are zero-cost / excess-return series.
fac_mkt <- factors %>%
inner_join(ff_factors %>% select(date, Mkt_RF), by = "date") %>%
filter(if_all(c(gp_at, tax_gr1a, at_gr1, Mkt_RF), ~ !is.na(.)))
fac_mkt %>%
pivot_longer(c(gp_at, tax_gr1a, at_gr1, Mkt_RF), names_to = "series", values_to = "ret") %>%
group_by(series) %>%
summarise(ann_mean = mean(ret) * 12, ann_vol = sd(ret) * sqrt(12),
sharpe = ann_mean / ann_vol, max_dd = max_drawdown(ret), .groups = "drop") %>%
mutate(series = recode(series, gp_at = "Gross profits-to-assets",
tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth",
Mkt_RF = "Market excess return"),
across(c(ann_mean, ann_vol, max_dd), ~ percent(.x, 0.01)), sharpe = round(sharpe, 2)) %>%
arrange(desc(sharpe)) %>%
show_tbl(caption = "Factors vs the market, common sample")
| series | ann_mean | ann_vol | sharpe | max_dd |
|---|---|---|---|---|
| Market excess return | 7.86% | 15.00% | 0.52 | -55.71% |
| Gross profits-to-assets | 3.74% | 7.86% | 0.48 | -33.56% |
| Asset growth | 2.68% | 9.18% | 0.29 | -34.33% |
| Tax expense surprise | 1.65% | 6.32% | 0.26 | -28.35% |
fac_mkt %>%
transmute(date,
`Gross profits-to-assets` = cumprod(1 + gp_at),
`Tax expense surprise` = cumprod(1 + tax_gr1a),
`Asset growth` = cumprod(1 + at_gr1),
`Market excess` = cumprod(1 + Mkt_RF)) %>%
pivot_longer(-date, names_to = "series", values_to = "wealth") %>%
ggplot(aes(date, wealth, colour = series)) +
geom_line(linewidth = 0.8) + scale_y_log10() +
labs(title = "Factors vs market excess return",
subtitle = "Common sample, log scale", x = NULL,
y = "Cumulative wealth (log)", colour = NULL) +
theme_minimal() + theme(legend.position = "bottom")
Story. The market earns the highest raw
return but at ~15% volatility and a brutal ~56% drawdown. Our long–short
factors deliver lower raw returns with a fraction of the volatility and
half the drawdown, so gp_at actually edges out the market
on a Sharpe basis. The key diversification point: these
are market-neutral by construction, so their value to a
portfolio is not the standalone Sharpe but their low (even negative)
correlation with market risk — which is exactly what the spanning
regressions quantify.
corr <- factors %>% select(gp_at, tax_gr1a, at_gr1) %>% cor(use = "pairwise.complete.obs")
corr %>% as.data.frame() %>% rownames_to_column("factor") %>%
pivot_longer(-factor, names_to = "factor_2", values_to = "rho") %>%
mutate(across(c(factor, factor_2), ~ recode(.,
gp_at = "Gross profits-to-assets", tax_gr1a = "Tax expense surprise",
at_gr1 = "Asset growth"))) %>%
ggplot(aes(factor, factor_2, fill = rho)) +
geom_tile() + geom_text(aes(label = round(rho, 2))) +
scale_fill_gradient2(low = "#c0392b", mid = "white", high = "#2c3e50",
midpoint = 0, limits = c(-1, 1)) +
labs(title = "Correlation matrix of factor returns", x = NULL, y = NULL, fill = "ρ") +
theme_minimal() + theme(axis.text.x = element_text(angle = 25, hjust = 1))
Story. Exactly the pattern predicted in (b) Factor interpretation: the two profitability/quality factors are positively correlated with each other (ρ ≈ +0.52) and both are negatively correlated with the asset-growth series (ρ ≈ −0.40 and −0.64). Profitable, high-quality firms are the financially conservative, slow-growing firms the asset-growth factor is long. The three are not redundant, but they are clearly facets of one underlying investment-and-profitability theme.
We regress each factor on three competing models and read the intercept (alpha) as the part of the premium the model fails to explain, and R² as how much of the factor’s variation it captures. All t-statistics use Newey–West (6 lags) standard errors. JKP factors are already excess/long–short, so we do not subtract RF.
To compare the models fairly we estimate all three on the same sample (the q5 sample, which starts in 1967).
reg <- factors %>%
inner_join(ff_factors %>% select(date, Mkt_RF, SMB, HML), by = "date") %>%
inner_join(q5_factors %>% select(date, R_MKT, R_ME, R_IA, R_ROE, R_EG), by = "date") %>%
filter(if_all(c(gp_at, tax_gr1a, at_gr1, Mkt_RF, SMB, HML,
R_MKT, R_ME, R_IA, R_ROE, R_EG), ~ !is.na(.)))
cat("Common sample:", format(min(reg$date)), "to", format(max(reg$date)),
" (", nrow(reg), "months )\n")
## Common sample: 1967-01-31 to 2024-12-31 ( 696 months )
nw <- function(m) lmtest::coeftest(m, vcov. = sandwich::NeweyWest(m, lag = 6, prewhite = FALSE))
fit_all <- function(fname) {
forms <- list(
CAPM = reformulate("Mkt_RF", fname),
FF3 = reformulate(c("Mkt_RF", "SMB", "HML"), fname),
q5 = reformulate(c("R_MKT", "R_ME", "R_IA", "R_ROE", "R_EG"), fname)
)
imap_dfr(forms, function(f, model) {
m <- lm(f, reg)
a <- nw(m)["(Intercept)", ]
tibble(factor = fname, model = model,
alpha_ann = a["Estimate"] * 12, t_alpha = a["t value"],
p_alpha = a["Pr(>|t|)"], r2 = summary(m)$r.squared)
})
}
spanning <- bind_rows(fit_all("gp_at"), fit_all("tax_gr1a"), fit_all("at_gr1")) %>%
mutate(model = factor(model, levels = c("CAPM", "FF3", "q5")))
spanning %>%
mutate(factor = recode(factor, gp_at = "Gross profits-to-assets",
tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth"),
alpha_ann = percent(alpha_ann, 0.01),
across(c(t_alpha, p_alpha, r2), ~ round(.x, 3))) %>%
show_tbl(caption = "Spanning regressions: annualised alpha, t-stat, p-value, R²")
| factor | model | alpha_ann | t_alpha | p_alpha | r2 |
|---|---|---|---|---|---|
| Gross profits-to-assets | CAPM | 3.00% | 2.465 | 0.014 | 0.035 |
| Gross profits-to-assets | FF3 | 4.88% | 5.426 | 0.000 | 0.361 |
| Gross profits-to-assets | q5 | 1.07% | 0.926 | 0.355 | 0.308 |
| Tax expense surprise | CAPM | 0.73% | 0.817 | 0.414 | 0.040 |
| Tax expense surprise | FF3 | 2.13% | 2.661 | 0.008 | 0.315 |
| Tax expense surprise | q5 | 1.10% | 1.336 | 0.182 | 0.496 |
| Asset growth | CAPM | 5.37% | 3.418 | 0.001 | 0.160 |
| Asset growth | FF3 | 2.68% | 3.201 | 0.001 | 0.642 |
| Asset growth | q5 | -0.22% | -0.180 | 0.857 | 0.597 |
spanning %>%
mutate(factor = recode(factor, gp_at = "Gross profits-to-assets",
tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth"),
sig = p_alpha < 0.05) %>%
ggplot(aes(model, alpha_ann, fill = sig)) +
geom_col() + facet_wrap(~ factor) +
geom_hline(yintercept = 0, linewidth = 0.3) +
scale_y_continuous(labels = percent) +
scale_fill_manual(values = c(`TRUE` = "#c0392b", `FALSE` = "#95a5a6"),
labels = c(`TRUE` = "Sig. at 5%", `FALSE` = "Not sig.")) +
labs(title = "Annualised alpha by model",
subtitle = "Red = the model fails to explain the factor (alpha ≠ 0)",
x = NULL, y = "Annualised alpha", fill = NULL) +
theme_minimal() + theme(legend.position = "bottom")
Story — this is the punchline of the assignment.
gp_at ≈ 4.9%, t ≈ 5.4). FF3
has no profitability or investment leg, so it cannot
price a profitability or investment anomaly — value (HML) is the wrong
tool, and controlling for it can even sharpen the residual premium.gp_at ≈ 1.1%,
p ≈ 0.36; tax_gr1a ≈ 1.1%, p ≈ 0.18; at_gr1 ≈
−0.2%, p ≈ 0.86), with R² comparable to or above FF3. This is
exactly what theory predicts: gp_at and
tax_gr1a load on the q5 profitability legs
(ROE, EG) and at_gr1 loads on the
investment leg (IA). Our three characteristics are, in
the language of Hou et al. (2021), manifestations of the investment and
profitability dimensions that the q-factor model is purpose-built to
span.Which model is most suitable? The augmented q5 model. It both fits the data (high R²) and explains the premium (alpha ≈ 0). FF3 fits the comovement but mis-prices the level; CAPM does neither. The economically right control for an investment/profitability factor is an investment/profitability model — not a size/value one.
| factor | best model (alpha→0) | FF3 alpha | q5 alpha | q5 p-value |
|---|---|---|---|---|
| Asset growth | q5 | 2.68% | -0.22% | 0.857 |
| Gross profits-to-assets | q5 | 4.88% | 1.07% | 0.355 |
| Tax expense surprise | q5 | 2.13% | 1.10% | 0.182 |
gp_at is the
best risk-adjusted standalone bet. Premia are regime-dependent but
survive publication (no significant post-publication
decay).tibble(
file = c("group_20_code.R", "group_20_data.RData", "group_20_slides.pdf"),
status = c("knit/export this Rmd to .R via knitr::purl",
if (file.exists(data_file)) "present" else "run once to build",
"to be built from this analysis (next iteration)")
) %>% show_tbl()
| file | status |
|---|---|
| group_20_code.R | knit/export this Rmd to .R via knitr::purl |
| group_20_data.RData | present |
| group_20_slides.pdf | to be built from this analysis (next iteration) |