We analyse three US equity factors from the Global Factor Data repository — gross profits-to-assets, tax expense surprise, and asset growth. Using monthly value-weighted returns, we interpret their economic link to investment and profitability, measure risk and return with rolling 36-month windows (including pre- and post-publication subsamples), and compare them to Fama–French benchmarks.

Result. all three earn positive, relatively low-volatility premia; gross profitability has the best risk-adjusted profile; the quality pair correlates positively with each other and negatively with asset growth, as expected. Spanning regressions show that CAPM and the Fama–French three-factor model leave economically large alphas, while the Hou et al. (2021) q5 model explains all three — the appropriate benchmark for factors built on profitability and investment.

1 Packages and helper functions

packages <- c(
  "tidyverse",  # data wrangling + ggplot2
  "lubridate",  # dates
  "zoo",        # rolling windows
  "broom",      # tidy() / glance() on models
  "sandwich",   # Newey-West HAC covariance
  "lmtest",     # coeftest() with a supplied vcov
  "scales",     # percent labels
  "knitr",      # kable tables
  "kableExtra"  # nicer tables (optional, degrades gracefully)
)

for (p in packages) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org")
  }
}
invisible(lapply(setdiff(packages, "kableExtra"), library, character.only = TRUE))
have_kex <- requireNamespace("kableExtra", quietly = TRUE)

# A small wrapper so tables look the same everywhere.
show_tbl <- function(df, caption = NULL, digits = 3) {
  k <- knitr::kable(df, caption = caption, digits = digits, format.args = list(big.mark = ","))
  if (have_kex) kableExtra::kable_styling(k, full_width = FALSE, bootstrap_options = c("striped", "hover")) else k
}

Three statistics recur throughout, so we define them once. Factor returns are fat-tailed and path-dependent, so skewness, excess kurtosis and maximum drawdown matter as much as the mean and volatility.

# Largest peak-to-trough loss of $1 compounded through the series.
max_drawdown <- function(x) {
  wealth   <- cumprod(1 + ifelse(is.na(x), 0, x))
  drawdown <- wealth / cummax(wealth) - 1
  min(drawdown, na.rm = TRUE)
}

skewness <- function(x) {
  x <- x[!is.na(x)]; m <- mean(x); s <- sd(x)
  mean(((x - m) / s)^3)
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]; m <- mean(x); s <- sd(x)
  mean(((x - m) / s)^4) - 3
}

2 Data

We work with monthly returns, all expressed as decimals (0.01 = 1%). There are three blocks:

  1. Our three factors — from the Global Factor Data repository (Jensen, Kelly & Pedersen 2023): USA, monthly, capped value-weighted long–short portfolios.
  2. Fama–French factors — Ken French’s data library, for CAPM and the FF3 model.
  3. Hou et al. q5 factors — for the augmented q-factor spanning regression.

A note on signs. The JKP ret column is already signed so that each factor is long its historically higher-returning leg (the direction column records the sign that was applied). We therefore use ret as-is and never flip it. This is the single most common mistake in this assignment.

# CSVs live in ../data relative to this code/ folder. Override if needed.
DATA_DIR  <- if (dir.exists("../data")) "../data" else "."
data_file <- "group_20_data.RData"
if (file.exists(data_file)) {
  load(data_file)
  message("Loaded cleaned data from ", data_file)
} else {
  ## 1. JKP factor returns ------------------------------------------------
  factor_files <- file.path(DATA_DIR, c(
    "[usa]_[gp_at]_[monthly]_[vw_cap].csv",     # Gross profits-to-assets
    "[usa]_[tax_gr1a]_[monthly]_[vw_cap].csv",  # Tax expense surprise
    "[usa]_[at_gr1]_[monthly]_[vw_cap].csv"     # Asset growth
  ))

  factors <- factor_files %>%
    map_dfr(read_csv, show_col_types = FALSE) %>%
    mutate(date = as.Date(date), ret = as.numeric(ret)) %>%
    select(date, name, ret) %>%
    pivot_wider(names_from = name, values_from = ret) %>%
    arrange(date)

  ## 2. Fama-French factors ----------------------------------------------
  # The CSV has header text + an annual block; keep only the YYYYMM rows.
  ff_lines <- read_lines(file.path(DATA_DIR, "F-F_Research_Data_Factors.csv"))
  ff_monthly_lines <- ff_lines[str_detect(ff_lines, "^[0-9]{6},")]

  ff_factors <- read_csv(
    I(ff_monthly_lines),
    col_names = c("date", "Mkt_RF", "SMB", "HML", "RF"),
    col_types = cols(.default = "c"), show_col_types = FALSE
  ) %>%
    mutate(
      date = ceiling_date(ymd(paste0(date, "01")), "month") - days(1),
      across(c(Mkt_RF, SMB, HML, RF), ~ as.numeric(.) / 100)  # % -> decimal
    )

  ## 3. Hou et al. q5 factors --------------------------------------------
  q5_factors <- read_csv(file.path(DATA_DIR, "q5_factors_monthly_2024.csv"),
                         show_col_types = FALSE) %>%
    mutate(
      date = ceiling_date(ymd(paste0(year, "-", month, "-01")), "month") - days(1),
      across(c(R_F, R_MKT, R_ME, R_IA, R_ROE, R_EG), ~ as.numeric(.) / 100)
    ) %>%
    select(date, R_F, R_MKT, R_ME, R_IA, R_ROE, R_EG)

  save(factors, ff_factors, q5_factors, file = data_file)
  message("Built and saved ", data_file)
}

stopifnot(exists("factors"), exists("ff_factors"), exists("q5_factors"))

A quick sanity check — coverage and missing values. The three factors start in the early 1950s; q5 starts in 1967, which will define the common sample for the head-to-head model comparison later.

tibble(
  series = c("gp_at", "tax_gr1a", "at_gr1", "FF (Mkt/SMB/HML)", "q5"),
  start  = c(format(min(factors$date)), format(min(factors$date[!is.na(factors$tax_gr1a)])),
             format(min(factors$date[!is.na(factors$at_gr1)])),
             format(min(ff_factors$date)), format(min(q5_factors$date))),
  end    = c(format(max(factors$date)), format(max(factors$date)), format(max(factors$date)),
             format(max(ff_factors$date)), format(max(q5_factors$date)))
) %>% show_tbl(caption = "Sample coverage")
Sample coverage
series start end
gp_at 1950-11-30 2025-12-31
tax_gr1a 1951-11-30 2025-12-31
at_gr1 1951-11-30 2025-12-31
FF (Mkt/SMB/HML) 1926-07-31 2026-04-30
q5 1967-01-31 2024-12-31
colSums(is.na(factors))
##     date    gp_at tax_gr1a   at_gr1 
##        0        0       12       12

3 (a) The three factors

Group 20 factors, with their JKP codes and source papers.

Factor JKP code Type Source paper
Gross profits-to-assets gp_at Profitability / quality Novy-Marx (2013)
Tax expense surprise tax_gr1a Earnings-quality / news Thomas & Zhang (2011)
Asset growth at_gr1 Investment Cooper, Gulen & Schill (2008)

4 (b) Factor interpretation and intuition

Gross profits-to-assets (gp_at). Novy-Marx (2013) shows that gross profitability — revenue minus cost of goods sold, scaled by assets — is “the other side of value”. Gross profit sits at the top of the income statement, so it is the cleanest measure of true economic productivity, before the accounting noise of depreciation, R&D and special items further down. Productive firms earn higher average returns; the strategy is long high-productivity, short low-productivity firms. Economically it is a quality premium: a claim that the market under-prices durable operating efficiency.

Tax expense surprise (tax_gr1a). Thomas & Zhang (2011) document that the year-on-year change in tax expense predicts future returns. The intuition is an information / earnings-quality story: taxes are hard to manage, so a jump in the tax accrual is a credible signal that taxable income — and therefore true profitability — is rising faster than the headline earnings number suggests. Investors under-react to this less-salient line item, and the mispricing corrects over the following months. It is a cousin of post-earnings-announcement drift, using the tax line as the surprise.

Asset growth (at_gr1). Cooper, Gulen & Schill (2008) show that firms which aggressively expand their asset base subsequently under-perform. The factor is therefore long low-growth, short high-growth firms (the JKP sign is set accordingly). Two complementary explanations: (i) a q-theory / investment story — firms optimally invest more when their cost of capital (expected return) is low, so high investment mechanically lines up with low future returns; and (ii) a behavioural story — managers over-invest and markets over-extrapolate growth, so empire-building is punished ex-post.

Are they related? Yes, and in an economically coherent way. Profitable, well-run firms (gp_at, tax_gr1a) tend to be financially conservative and to grow their asset base slowly — exactly the firms the asset-growth factor is long. We therefore expect the two profitability/quality factors to be positively correlated with each other and negatively correlated with the raw asset-growth return series. Correlations confirms this. The deeper point, which drives the whole story in spanning regressions: all three are essentially investment and profitability phenomena, which is precisely what the q-factor model is built to price.

5 (c) Historic performance and risk

5.1 Summary statistics

Annualised figures (mean × 12, volatility × √12). Read the table as a risk profile, not just a ranking of returns.

summary_stats <- factors %>%
  pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
  group_by(factor) %>%
  summarise(
    n_months        = sum(!is.na(ret)),
    ann_mean        = mean(ret, na.rm = TRUE) * 12,
    ann_vol         = sd(ret, na.rm = TRUE) * sqrt(12),
    sharpe          = ann_mean / ann_vol,
    skewness        = skewness(ret),
    excess_kurtosis = excess_kurtosis(ret),
    max_drawdown    = max_drawdown(ret),
    .groups = "drop"
  ) %>%
  arrange(desc(sharpe))

summary_stats %>%
  mutate(across(c(ann_mean, ann_vol, max_drawdown), ~ percent(.x, 0.01)),
         across(c(sharpe, skewness, excess_kurtosis), ~ round(.x, 2))) %>%
  show_tbl(caption = "Annualised risk-return profile, full sample")
Annualised risk-return profile, full sample
factor n_months ann_mean ann_vol sharpe skewness excess_kurtosis max_drawdown
gp_at 902 3.49% 7.90% 0.44 0.02 0.92 -33.56%
at_gr1 890 2.68% 9.18% 0.29 1.21 13.54 -34.33%
tax_gr1a 890 1.65% 6.32% 0.26 -0.20 1.68 -28.35%

Story. gp_at is the standout on a risk-adjusted basis — the highest Sharpe ratio with the lowest volatility and near-zero skew, i.e. a steady quality premium. at_gr1 earns a higher raw mean but pays for it with the highest volatility and a large positive skew and fat tails (high excess kurtosis): the asset-growth strategy occasionally delivers big positive months (think sharp reversals when over-extended growth firms crash), which is attractive but lumpy. tax_gr1a is the quietest factor — modest mean, lowest volatility, mild negative skew — consistent with a slow-drip information premium. All three are positive-mean, low-volatility long–short strategies, as expected of published anomalies.

5.2 Cumulative performance

Growth of $1, log scale (a straight line = constant compound rate). We align all three to the first month where every series is available.

cumulative <- factors %>%
  filter(if_all(c(gp_at, tax_gr1a, at_gr1), ~ !is.na(.))) %>%
  transmute(date,
            `Gross profits-to-assets` = cumprod(1 + gp_at),
            `Tax expense surprise`    = cumprod(1 + tax_gr1a),
            `Asset growth`            = cumprod(1 + at_gr1)) %>%
  pivot_longer(-date, names_to = "factor", values_to = "wealth")

ggplot(cumulative, aes(date, wealth, colour = factor)) +
  geom_line(linewidth = 0.8) +
  scale_y_log10() +
  labs(title = "Cumulative performance of the three factors",
       subtitle = "$1 invested in each long-short strategy, log scale",
       x = NULL, y = "Cumulative wealth (log)", colour = NULL) +
  theme_minimal() + theme(legend.position = "bottom")

The slopes echo the table: gp_at compounds most smoothly, at_gr1 is the most jagged, and the gentle upward drift of all three over 70+ years is the visual signature of a genuine risk/mispricing premium rather than noise.

5.3 Rolling-window analysis

A 60-month (5-year) rolling window shows that “the premium” is not constant — it has regimes. This directly addresses the task’s request to examine “periodic breaks in performance”.

roll_w <- 60
rolling <- factors %>%
  filter(if_all(c(gp_at, tax_gr1a, at_gr1), ~ !is.na(.))) %>%
  pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
  group_by(factor) %>% arrange(date) %>%
  mutate(
    roll_mean   = rollapply(ret, roll_w, \(x) mean(x) * 12,  fill = NA, align = "right"),
    roll_vol    = rollapply(ret, roll_w, \(x) sd(x) * sqrt(12), fill = NA, align = "right"),
    roll_sharpe = roll_mean / roll_vol
  ) %>% ungroup() %>%
  mutate(factor = recode(factor,
    gp_at = "Gross profits-to-assets", tax_gr1a = "Tax expense surprise",
    at_gr1 = "Asset growth"))

ggplot(rolling, aes(date, roll_sharpe, colour = factor)) +
  geom_hline(yintercept = 0, linewidth = 0.3, colour = "grey50") +
  geom_line(linewidth = 0.8, na.rm = TRUE) +
  labs(title = "60-month rolling Sharpe ratio",
       subtitle = "Risk-adjusted performance is regime-dependent, not constant",
       x = NULL, y = "Sharpe ratio", colour = NULL) +
  theme_minimal() + theme(legend.position = "bottom")

ggplot(rolling, aes(date, roll_mean, colour = factor)) +
  geom_hline(yintercept = 0, linewidth = 0.3, colour = "grey50") +
  geom_line(linewidth = 0.8, na.rm = TRUE) +
  scale_y_continuous(labels = percent) +
  labs(title = "60-month rolling annualised mean return",
       x = NULL, y = "Annualised mean", colour = NULL) +
  theme_minimal() + theme(legend.position = "bottom")

Story. Every factor spends multi-year stretches below zero — there is no free lunch. Note the cluster of weakness for the quality/profitability factors around the late-1990s tech run-up (when junk and growth dominated) and again in the 2010s, the period highlighted by the “replication crisis” literature (Jensen, Kelly & Pedersen 2023): many anomalies decayed after they became widely known. We test that decay formally next.

5.4 Pre- vs post-publication

If a factor is genuine compensation for risk it should survive publication; if it was a mispricing, arbitrage should erode it once the paper is out. We split each series at its publication year and test whether the post-publication mean is significantly different using a post dummy with Newey–West (HAC, 6 lags) standard errors — appropriate because monthly factor returns are mildly autocorrelated.

pub_dates <- tibble(
  factor = c("gp_at", "tax_gr1a", "at_gr1"),
  publication_date = as.Date(c("2013-01-01", "2011-01-01", "2008-01-01"))
)

pre_post <- factors %>%
  pivot_longer(c(gp_at, tax_gr1a, at_gr1), names_to = "factor", values_to = "ret") %>%
  left_join(pub_dates, by = "factor") %>%
  filter(!is.na(ret)) %>%
  mutate(period = factor(if_else(date < publication_date, "Pre", "Post"),
                         levels = c("Pre", "Post")))

pre_post %>%
  group_by(factor, period) %>%
  summarise(n = n(), ann_mean = mean(ret) * 12, ann_vol = sd(ret) * sqrt(12),
            sharpe = ann_mean / ann_vol, .groups = "drop") %>%
  mutate(across(c(ann_mean, ann_vol), ~ percent(.x, 0.01)), sharpe = round(sharpe, 2)) %>%
  show_tbl(caption = "Performance before vs after publication")
Performance before vs after publication
factor period n ann_mean ann_vol sharpe
at_gr1 Pre 674 2.92% 9.65% 0.30
at_gr1 Post 216 1.96% 7.51% 0.26
gp_at Pre 746 3.44% 7.89% 0.44
gp_at Post 156 3.74% 7.95% 0.47
tax_gr1a Pre 710 1.75% 6.74% 0.26
tax_gr1a Post 180 1.26% 4.26% 0.30
# Formal test: is the post-publication mean different?
pre_post %>%
  mutate(post = as.integer(period == "Post")) %>%
  group_by(factor) %>%
  group_modify(~ broom::tidy(lmtest::coeftest(
    lm(ret ~ post, .x),
    vcov. = sandwich::NeweyWest(lm(ret ~ post, .x), lag = 6, prewhite = FALSE)))) %>%
  filter(term == "post") %>%
  transmute(factor, delta_ann = percent(estimate * 12, 0.01),
            t = round(statistic, 2), p_value = round(p.value, 3)) %>%
  show_tbl(caption = "Change in mean after publication (post dummy)")
Change in mean after publication (post dummy)
factor delta_ann t p_value
at_gr1 -0.96% -0.37 0.714
gp_at 0.30% 0.11 0.915
tax_gr1a -0.49% -0.32 0.747

Story. Means drift a little after publication — down for at_gr1 and gp_at’s Sharpe actually edges up — but for none of the three is the change statistically significant (all p-values well above 0.10). So we cannot reject that these premia survived publication. That is mild evidence for a risk-based rather than purely-arbitraged interpretation, though the short post-publication windows limit the test’s power.

5.5 Comparison with the market

The natural benchmark is the Fama–French market excess return Mkt_RF. It is comparable to our factors because all four are zero-cost / excess-return series.

fac_mkt <- factors %>%
  inner_join(ff_factors %>% select(date, Mkt_RF), by = "date") %>%
  filter(if_all(c(gp_at, tax_gr1a, at_gr1, Mkt_RF), ~ !is.na(.)))

fac_mkt %>%
  pivot_longer(c(gp_at, tax_gr1a, at_gr1, Mkt_RF), names_to = "series", values_to = "ret") %>%
  group_by(series) %>%
  summarise(ann_mean = mean(ret) * 12, ann_vol = sd(ret) * sqrt(12),
            sharpe = ann_mean / ann_vol, max_dd = max_drawdown(ret), .groups = "drop") %>%
  mutate(series = recode(series, gp_at = "Gross profits-to-assets",
           tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth",
           Mkt_RF = "Market excess return"),
         across(c(ann_mean, ann_vol, max_dd), ~ percent(.x, 0.01)), sharpe = round(sharpe, 2)) %>%
  arrange(desc(sharpe)) %>%
  show_tbl(caption = "Factors vs the market, common sample")
Factors vs the market, common sample
series ann_mean ann_vol sharpe max_dd
Market excess return 7.86% 15.00% 0.52 -55.71%
Gross profits-to-assets 3.74% 7.86% 0.48 -33.56%
Asset growth 2.68% 9.18% 0.29 -34.33%
Tax expense surprise 1.65% 6.32% 0.26 -28.35%
fac_mkt %>%
  transmute(date,
            `Gross profits-to-assets` = cumprod(1 + gp_at),
            `Tax expense surprise`    = cumprod(1 + tax_gr1a),
            `Asset growth`            = cumprod(1 + at_gr1),
            `Market excess`           = cumprod(1 + Mkt_RF)) %>%
  pivot_longer(-date, names_to = "series", values_to = "wealth") %>%
  ggplot(aes(date, wealth, colour = series)) +
  geom_line(linewidth = 0.8) + scale_y_log10() +
  labs(title = "Factors vs market excess return",
       subtitle = "Common sample, log scale", x = NULL,
       y = "Cumulative wealth (log)", colour = NULL) +
  theme_minimal() + theme(legend.position = "bottom")

Story. The market earns the highest raw return but at ~15% volatility and a brutal ~56% drawdown. Our long–short factors deliver lower raw returns with a fraction of the volatility and half the drawdown, so gp_at actually edges out the market on a Sharpe basis. The key diversification point: these are market-neutral by construction, so their value to a portfolio is not the standalone Sharpe but their low (even negative) correlation with market risk — which is exactly what the spanning regressions quantify.

5.6 Correlations

corr <- factors %>% select(gp_at, tax_gr1a, at_gr1) %>% cor(use = "pairwise.complete.obs")

corr %>% as.data.frame() %>% rownames_to_column("factor") %>%
  pivot_longer(-factor, names_to = "factor_2", values_to = "rho") %>%
  mutate(across(c(factor, factor_2), ~ recode(.,
    gp_at = "Gross profits-to-assets", tax_gr1a = "Tax expense surprise",
    at_gr1 = "Asset growth"))) %>%
  ggplot(aes(factor, factor_2, fill = rho)) +
  geom_tile() + geom_text(aes(label = round(rho, 2))) +
  scale_fill_gradient2(low = "#c0392b", mid = "white", high = "#2c3e50",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Correlation matrix of factor returns", x = NULL, y = NULL, fill = "ρ") +
  theme_minimal() + theme(axis.text.x = element_text(angle = 25, hjust = 1))

Story. Exactly the pattern predicted in (b) Factor interpretation: the two profitability/quality factors are positively correlated with each other (ρ ≈ +0.52) and both are negatively correlated with the asset-growth series (ρ ≈ −0.40 and −0.64). Profitable, high-quality firms are the financially conservative, slow-growing firms the asset-growth factor is long. The three are not redundant, but they are clearly facets of one underlying investment-and-profitability theme.

6 (d) Spanning regressions

We regress each factor on three competing models and read the intercept (alpha) as the part of the premium the model fails to explain, and as how much of the factor’s variation it captures. All t-statistics use Newey–West (6 lags) standard errors. JKP factors are already excess/long–short, so we do not subtract RF.

  • CAPM. \(r = \alpha + \beta\,\text{Mkt\_RF} + \varepsilon\)
  • FF3 (Fama & French 1993): adds SMB, HML.
  • q5 (Hou, Mo, Xue & Zhang 2021): market, size (ME), investment (IA), profitability (ROE), expected growth (EG).

To compare the models fairly we estimate all three on the same sample (the q5 sample, which starts in 1967).

reg <- factors %>%
  inner_join(ff_factors %>% select(date, Mkt_RF, SMB, HML), by = "date") %>%
  inner_join(q5_factors %>% select(date, R_MKT, R_ME, R_IA, R_ROE, R_EG), by = "date") %>%
  filter(if_all(c(gp_at, tax_gr1a, at_gr1, Mkt_RF, SMB, HML,
                  R_MKT, R_ME, R_IA, R_ROE, R_EG), ~ !is.na(.)))

cat("Common sample:", format(min(reg$date)), "to", format(max(reg$date)),
    "  (", nrow(reg), "months )\n")
## Common sample: 1967-01-31 to 2024-12-31   ( 696 months )
nw <- function(m) lmtest::coeftest(m, vcov. = sandwich::NeweyWest(m, lag = 6, prewhite = FALSE))

fit_all <- function(fname) {
  forms <- list(
    CAPM = reformulate("Mkt_RF", fname),
    FF3  = reformulate(c("Mkt_RF", "SMB", "HML"), fname),
    q5   = reformulate(c("R_MKT", "R_ME", "R_IA", "R_ROE", "R_EG"), fname)
  )
  imap_dfr(forms, function(f, model) {
    m  <- lm(f, reg)
    a  <- nw(m)["(Intercept)", ]
    tibble(factor = fname, model = model,
           alpha_ann = a["Estimate"] * 12, t_alpha = a["t value"],
           p_alpha = a["Pr(>|t|)"], r2 = summary(m)$r.squared)
  })
}

spanning <- bind_rows(fit_all("gp_at"), fit_all("tax_gr1a"), fit_all("at_gr1")) %>%
  mutate(model = factor(model, levels = c("CAPM", "FF3", "q5")))

spanning %>%
  mutate(factor = recode(factor, gp_at = "Gross profits-to-assets",
           tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth"),
         alpha_ann = percent(alpha_ann, 0.01),
         across(c(t_alpha, p_alpha, r2), ~ round(.x, 3))) %>%
  show_tbl(caption = "Spanning regressions: annualised alpha, t-stat, p-value, R²")
Spanning regressions: annualised alpha, t-stat, p-value, R²
factor model alpha_ann t_alpha p_alpha r2
Gross profits-to-assets CAPM 3.00% 2.465 0.014 0.035
Gross profits-to-assets FF3 4.88% 5.426 0.000 0.361
Gross profits-to-assets q5 1.07% 0.926 0.355 0.308
Tax expense surprise CAPM 0.73% 0.817 0.414 0.040
Tax expense surprise FF3 2.13% 2.661 0.008 0.315
Tax expense surprise q5 1.10% 1.336 0.182 0.496
Asset growth CAPM 5.37% 3.418 0.001 0.160
Asset growth FF3 2.68% 3.201 0.001 0.642
Asset growth q5 -0.22% -0.180 0.857 0.597
spanning %>%
  mutate(factor = recode(factor, gp_at = "Gross profits-to-assets",
           tax_gr1a = "Tax expense surprise", at_gr1 = "Asset growth"),
         sig = p_alpha < 0.05) %>%
  ggplot(aes(model, alpha_ann, fill = sig)) +
  geom_col() + facet_wrap(~ factor) +
  geom_hline(yintercept = 0, linewidth = 0.3) +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c(`TRUE` = "#c0392b", `FALSE` = "#95a5a6"),
                    labels = c(`TRUE` = "Sig. at 5%", `FALSE` = "Not sig.")) +
  labs(title = "Annualised alpha by model",
       subtitle = "Red = the model fails to explain the factor (alpha ≠ 0)",
       x = NULL, y = "Annualised alpha", fill = NULL) +
  theme_minimal() + theme(legend.position = "bottom")

Story — this is the punchline of the assignment.

  • CAPM explains almost nothing (R² ≈ 0.04–0.16). These are not market-beta bets; large, significant alphas remain. Unsurprising — the market factor knows nothing about profitability or investment.
  • FF3 lifts R² sharply (to ~0.36–0.64) because SMB and HML are correlated with our characteristics. But look at the alphas: for the profitability factors FF3 alpha actually gets larger and more significant (gp_at ≈ 4.9%, t ≈ 5.4). FF3 has no profitability or investment leg, so it cannot price a profitability or investment anomaly — value (HML) is the wrong tool, and controlling for it can even sharpen the residual premium.
  • q5 is the model that fits. It drives every alpha toward zero and statistically insignificant (gp_at ≈ 1.1%, p ≈ 0.36; tax_gr1a ≈ 1.1%, p ≈ 0.18; at_gr1 ≈ −0.2%, p ≈ 0.86), with R² comparable to or above FF3. This is exactly what theory predicts: gp_at and tax_gr1a load on the q5 profitability legs (ROE, EG) and at_gr1 loads on the investment leg (IA). Our three characteristics are, in the language of Hou et al. (2021), manifestations of the investment and profitability dimensions that the q-factor model is purpose-built to span.

Which model is most suitable? The augmented q5 model. It both fits the data (high R²) and explains the premium (alpha ≈ 0). FF3 fits the comovement but mis-prices the level; CAPM does neither. The economically right control for an investment/profitability factor is an investment/profitability model — not a size/value one.

7 Takeaways

One-line summary per factor
factor best model (alpha→0) FF3 alpha q5 alpha q5 p-value
Asset growth q5 2.68% -0.22% 0.857
Gross profits-to-assets q5 4.88% 1.07% 0.355
Tax expense surprise q5 2.13% 1.10% 0.182
  1. Economics (b). all three are facets of one investment-and-profitability theme — quality firms invest conservatively — which their correlation structure confirms (+0.52 between the quality factors, negative against asset growth).
  2. Performance (c). all three are positive-premium, low-volatility, market-neutral strategies; gp_at is the best risk-adjusted standalone bet. Premia are regime-dependent but survive publication (no significant post-publication decay).
  3. Spanning (d). CAPM and FF3 leave large alphas; the Hou et al. (2021) q5 model spans all three factors (alphas insignificant), making it the most suitable model — because it contains the investment and profitability legs the factors are built on.

8 Deliverable checklist

tibble(
  file = c("group_20_code.R", "group_20_data.RData", "group_20_slides.pdf"),
  status = c("knit/export this Rmd to .R via knitr::purl",
             if (file.exists(data_file)) "present" else "run once to build",
             "to be built from this analysis (next iteration)")
) %>% show_tbl()
file status
group_20_code.R knit/export this Rmd to .R via knitr::purl
group_20_data.RData present
group_20_slides.pdf to be built from this analysis (next iteration)

8.1 References

  • Cooper, M. J., Gulen, H., & Schill, M. J. (2008). Asset growth and the cross-section of stock returns. Journal of Finance, 63(4), 1609–1651.
  • Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  • Hou, K., Mo, H., Xue, C., & Zhang, L. (2021). An augmented q-factor model with expected growth. Review of Finance, 25(1), 1–41.
  • Jensen, T. I., Kelly, B., & Pedersen, L. H. (2023). Is there a replication crisis in finance? Journal of Finance, 78(5), 2465–2518.
  • Novy-Marx, R. (2013). The other side of value: The gross profitability premium. Journal of Financial Economics, 108(1), 1–28.
  • Thomas, J., & Zhang, F. X. (2011). Tax expense momentum. Journal of Accounting Research, 49(3), 791–821.