Emily L. Norton - Population Attributable Fraction (PAF)

1. The set-up

In the last section, we discussed individual risk and realized that it was really hard to estimate! Fortunately, we found that population-level estimates were much easier to make. We also discussed how risk estimates were based on a model with specific covariates comprising a set of risk factors and potential confounding variables. In this page, we ask questions surrounding how risk for a disease would change at a population level if a risk factor were eliminated - e.g. imagine a world where no one smokes, everyone drinks clean water, we all exercise regularly, etc. Let’s explore this concept called “population attributable fraction” or PAF.

Show the code

# Turn off axes
Noax <- list(    
  title = "",
  zeroline = FALSE,
  showline = FALSE,
  showticklabels = FALSE,
  showgrid = FALSE
)

# Set up vector for diseased individuals to be colored differently
N <- 100
diseased_inds <- union(union(seq(8,100,10),seq(9,100,10)),seq(10,100,10))
diseased_vec <- vector(mode = "character", length = 100)
diseased_vec[diseased_inds] <- 'rgb(255, 65, 54)'
diseased_vec[-diseased_inds] <- 'rgb(66, 209, 209)'
risk_factor_inds <- c(1:20)
risk_factor_vec <- vector(mode = "character", length = 100)
risk_factor_vec[diseased_inds] <- 'rgb(255, 65, 54)'
risk_factor_vec[-diseased_inds] <- 'rgb(66, 209, 209)'
risk_factor_vec[risk_factor_inds] <- 'rgb(0,0,0)'  #'rgb(145, 0, 0)'


# Set up text vector
z_text <- vector(mode = "character", length = 100)
z_text[diseased_inds] <- "sick"
z_text[-diseased_inds] <- "healthy"
z_text[risk_factor_inds] <- 'risk factor'

data_100_diseased_RF <- data.frame(x = rep(seq(1,10,1), each = 10), y = rep(seq(1,10,1), times = 10), z = z_text)


fig_100_diseased_RF <- plot_ly(data_100_diseased_RF, x = ~x, y = ~y, text = ~z, type = 'scatter', mode = 'markers', 
        marker = list(size = 20, 
                      opacity = 0.7, 
                      color = diseased_vec, 
                      line = list(color = risk_factor_vec,
                                  width = 3)),
        hoverinfo = 'text')   # TODO: don't show location; change color

fig_100_diseased_RF <- fig_100_diseased_RF %>% 
  layout(title = 'What proportion of this disease is attributable to the risk factor?',
         xaxis = Noax,
         yaxis = Noax) %>%
  config(displayModeBar = FALSE)

fig_100_diseased_RF

2. What is population attributable fraction?

Background

Developed by Levin (1953),¹ the concept of population attributable fraction (PAF) was originally used to estimate the impact of smoking on lung cancer risk. PAF is known by many names, including “population attributable risk”, “population attributable risk proportion”, “population attributable risk percentage”, “etiologic fraction”, “excess fraction”, or sometimes just “attributable risk”.^2,3

Mathematical definition

\[ PAF = \frac{\Pr(E)(RR - 1)}{\Pr(E)(RR-1) + 1} \tag{1}\] where \(\Pr(E)\) is the probability of exposure and \(RR = \frac{\Pr(D | E)}{\Pr(D | \bar{E})}\) is the relative risk.

Interpretations

The PAF has two main interpretations. The first is the fractional reduction in average disease risk in a population for a given time interval that would be achieved by eliminating a (known disease-causing) exposure while all other risk factors stayed the same. A slightly more intuitive interpretation is the proportion of disease cases that would be prevented in a population during a specified time interval if no one were exposed to a given risk factor.³

Assumptions

The fundamental assumptions of PAF are:¹ ⁴

Causation: The exposure (aka risk factor) causes the disease.
Stable Unit Treatment Values (SUTVA): No interference between exposures and no hidden variations of exposure.
No confounding: We assume that the unexposed and exposed groups do not have underlying differences other than their exposure status. Thus, if an individual who is exposed had not been, they would have the same risk as an unexposed individual.

Utility

PAF is an epidemiologic measure of the potential impact a public health intervention could have on a disease by eliminating a specific risk factor.⁵ It is important to remember that PAF is the proportion that the disease burden could be reduced - not a direct measure of the number of cases that could be prevented.

3. Formulae for PAF

Now we will investigate several common equations used to calculate PAF (see derivations in Section 8). I have attempted to put them in common terms to aid understanding and cross-connections between them. Also, I have estimated the PAF for three disease/risk factor pairs where possible to help orient the reader and to demonstrate how the formulae may differ: BRCA mutation and breast cancer, smoking and lung cancer, and poor air quality and mortality.^2,6–11

Original equation

The original formula was developed by Levin (1953)¹ in a paper that discussed the relationship between smoking and lung cancer during a time in history when the causal relationship was just starting to take shape. Levin developed an index \(S\) to quantify the “indicated [proportion] of all lung cancer attributable to smoking.” The index, now known as the PAF, was derived as follows:

\[\begin{align} S &= \frac{\text{incidence of lung cancer attributable to smoking}}{\text{total incidence of lung cancer}} \notag \\ &= \frac{rXb \frac{r-1}{r}}{rXb + X(1-b)} \notag \\ &= \frac{rb\frac{r-1}{r}}{rb + (1-b)} \notag \\ &= \frac{b(r-1)}{b(r-1) + 1} \end{align}\]

where \[\begin{align*} X &= \text{incidence of lung cancer in non-smokers} \\ rX &= \text{incidence of lung cancer in smokers} \\ b &= \text{proportion of general population classified as smokers} \\ 1-b &= \text{proportion of general population classified as non-smokers} \end{align*}\]

Thus, a few useful quantities can be defined: \[\begin{align*} rXb + X(1-b) &= \text{incidence of lung cancer in the total population} \\ r = \frac{rX}{X} &= \frac{\text{incidence of lung cancer in smokers}}{\text{incidence of lung cancer in non-smokers}} \\ \frac{r-1}{r} &= \text{proportion of lung cancer in smokers attributable to smoking} \end{align*}\]

Notice that these are all estimable quantities (i.e. statistics) rather than theoretical quantities. However, we can easily translate the final expression to the theoretical quantities underpinning this index, and generalize them as follows:¹²

\[ PAF = \frac{\Pr(E)(RR - 1)}{\Pr(E)(RR-1) + 1} \] where \(\Pr(E)\) is the probability of exposure and \(RR = \frac{\Pr(D | E)}{\Pr(D | \bar{E})}\) is the relative risk.

This quantity can be estimated using observable quantities:

\[ \widehat{PAF} = \frac{(\text{prevalence of the exposure})(\widehat{RR} - 1)}{(\text{prevalence of the exposure})(\widehat{RR} - 1) + 1} \] where \(\widehat{RR} = \frac{\text{incidence in exposed group}}{\text{incidence in unexposed group}}\), usually estimated in a prospective cohort study or approximated by an odds ratio derived from a case-control study when a disease is rare. The prevalence of an exposure is often estimated using cross-sectional studies.

Show the code

# Build example disease dataframe

deaths_air_quality <- data.frame(disease = "mortality", exposure = "poor air quality", RR = 1.062, Prev_exp = 1, rate_pop = 1, rate_unexp = 1, pd = 1, num_prev_cases = 3279857)  # European Env Agency: https://www.eea.europa.eu/publications/assessing-the-risks-to-health#:~:text=Relative%20risks%20capture%20the%20increase,be%20assigned%20to%20specific%20individuals; num_prev_cases = total mortality in US in 2022 (https://www.cdc.gov/nchs/fastats/deaths.htm)
breast_cancer_BRCA <- data.frame(disease = "breast cancer", exposure = "BRCA mutation", RR = 30, Prev_exp = 0.0025, rate_pop = 0.13, rate_unexp = 0.1261, pd = 0.03, num_prev_cases = 240000)  # Table 3 (and Discussion), Antoniou et al., 2003; Nelson et al., 2013 - risk in women; rate_pop = lifetime incidence of BC = 13% in US women, and 3% of cancers are in women with BRCA mutation; num_prev_cases = breast cancer diagnosed per year in women in US,
lung_cancer_smoking <- data.frame(disease = "lung cancer", exposure = "smoking", RR = 7, Prev_exp = 0.12, rate_pop = 0.0625, rate_unexp = 0.0125, pd = 0.85, num_prev_cases = 238300) # RR: O'Keeffe et al., 2018; Prev_exp: CDC Fast Facts; rate_pop: SEER*Stat; pd: NCI: https://www.cancer.gov/news-events/press-releases/2021/lung-cancer-never-smokers#:~:text=Lung%20cancer%20is%20the%20leading,lung%20cancer%20have%20never%20smoked; rate_pop = lifetime incidence of lung cancer = 1/16, 20% of which occur in non-smokers; num_prev_cases =  estimated number of lung cancer cases each year (see calc: 
# 47,660 non-smokers diagnosed with lung cancer each year, accounting for 20% of lung cancer cases (https://www.lungcancerresearchfoundation.org/lung-cancer-facts/?gad_source=1&gclid=CjwKCAjwl4yyBhAgEiwADSEjePEa6HgTX7PzQYb8SNjQNxA6RIU4jvzLIr5VY7pT7uihX02s9tEI4hoCZBUQAvD_BwE)
# total_lung <- 47660/0.2  # --> 238300 total lung cancer cases each year?? 

example_diseases <- rbind(deaths_air_quality,breast_cancer_BRCA, lung_cancer_smoking) %>%
  mutate(descr = paste0(disease, " due to \n", exposure))

Show the code

# Equation 1
RR <- 1:35
Prev_exp <- seq(0,1,0.01)

calc_PAF_eqn1 <- function(RR, Prev_exp) {
  PAF <- (Prev_exp*(RR - 1))/(1 + Prev_exp*(RR-1))
  return(PAF)}

paf_df_eqn1 <- data.frame(RR = rep(RR, times = length(Prev_exp)), Prev_exp = rep(Prev_exp, each = length(RR)))

# Calculate PAF for example diseases
example_diseases <- example_diseases %>%
  mutate(PAF_eqn1 = calc_PAF_eqn1(RR,Prev_exp))
example_diseases$x_anno_1 <- c(5.5, 29, 10)
example_diseases$y_anno_1 <- c(0.94, 0.1, 0.17)

# Make contour plot with points for example diseases
paf_df_eqn1 %>%
  ggplot(aes(RR, Prev_exp, z = calc_PAF_eqn1(RR,Prev_exp))) +
  geom_contour_filled() +
  geom_point(data = example_diseases, shape = 21, size = 3, color = "black",  fill = "white") +
  annotate("text", x = example_diseases$x_anno_1, y = example_diseases$y_anno_1, label = example_diseases$descr, color = "white") +
  labs(x = "Relative risk", y = "Prevalence of exposure", title = "PAF: Original formula", fill = "PAF") +
  theme_minimal() +
  scale_fill_viridis(discrete=TRUE, guide = guide_legend(reverse=TRUE))

Common alternative formula

Using Bayes Theorem and some algebra, we can alternatively express the original equation in another commonly used form:³

\[ PAF = \frac{\Pr(D) - \Pr(D|\bar{E})}{\Pr(D)} \tag{2}\]

where \(\Pr(D)\) is the probability of disease in a population which may contain unexposed and exposed individuals, and \(\Pr(D|\bar{E})\) is the hypothetical probability of disease in that same population if no one were exposed to the risk factor. This quantity can be estimated using observable incidence rates, assuming the unexposed population is representative of the entire population:

\[ \widehat{PAF} = \frac{\text{Incidence}_{\text{population}} - \text{Incidence}_{\text{unexposed}}}{\text{Incidence}_{\text{population}}} \]

Show the code

# Equation 2
rate_pop <- seq(0,1,0.01)
rate_unexp <- seq(0,1,0.01)

calc_PAF_eqn2 <- function(rate_pop, rate_unexp) {
  PAF <- (rate_pop - rate_unexp)/rate_pop
  return(PAF)
}

paf_df_eqn2 <- data.frame(rate_pop = rep(rate_pop,times = length(rate_unexp)), rate_unexp = rep(rate_unexp, each=length(rate_pop)))

my_breaks <- c(-Inf, seq(0,1,0.1))

# Calculate PAF for example diseases
example_diseases <- example_diseases %>%
  mutate(PAF_eqn2 = calc_PAF_eqn2(rate_pop,rate_unexp))
example_diseases$x_anno_2 <- c(0.88, 0.2, 0.16)
example_diseases$y_anno_2 <- c(0.94, 0.22, 0.05)

paf_df_eqn2 %>%
  ggplot(aes(rate_pop, rate_unexp, z = calc_PAF_eqn2(rate_pop,rate_unexp))) +
  geom_contour_filled(breaks = my_breaks) +
  geom_point(data = example_diseases, shape = 21, size = 3, color = "black",  fill = "white") +
  annotate("text", x = example_diseases$x_anno_2, y = example_diseases$y_anno_2, label = example_diseases$descr, color = "white") +
  labs(x = "Incidence in population", y = "Incidence in unexposed", title = "PAF: Common alternative formula", fill = "PAF") +
  theme_minimal() +
  scale_fill_viridis(discrete=TRUE, guide = guide_legend(reverse=TRUE))

Rosen’s formula

Rosen (2013) aimed to develop a more “intuitive” formula to estimate \(PAF\), which relies on only two quantities:⁵

\[ \widehat{PAF} = \frac{A}{N_d} \tag{3}\] where \(A\) is the number of people who developed the disease because of the exposure and \(N_d\) is the total number of people with the disease. While this is a more intuitive formulation, \(A\) cannot be estimated directly.

Show the code

# Equation 3 (Rosen et al., 2013)

# Total number of people with the disease
N_d <- 1:1000  # roughly the population of the US

# People who developed the disease because of the exposure
A <- 1:1000   # roughly the population of the US

calc_PAF_eqn3 <- function(A,N_d) {
  PAF <- A/N_d
  return(PAF)
}

paf_df_eqn3 <- data.frame(x = rep(N_d,times = length(A)), y = rep(A, each=length(N_d)))

paf_df_eqn3 %>%
  ggplot(aes(x, y, z = calc_PAF_eqn3(y,x))) +
 geom_contour_filled(breaks = my_breaks) +
  labs(x = "Total number of cases", y = "Number of cases due to exposure", title = "PAF: Rosen's formula", fill = "PAF") +
  theme_minimal() +
  scale_fill_viridis(discrete=TRUE, guide = guide_legend(reverse=TRUE))

Miettinen’s formula

Another expression developed by Miettinen (1974) relies on the prevalence of the exposure only within the diseased population:¹³

\[ PAF = \Pr(E|D) \left( \frac{RR - 1}{RR} \right) \tag{4}\] where \(\Pr(E|D)\) is the probability of exposure among the cases, and \(RR\) is relative risk. Thus,

\[ \widehat{PAF} = (\text{proportion of cases who were exposed}) \times \left( \frac{\widehat{RR} - 1}{\widehat{RR}} \right) \]

Show the code

# Equation 4
# Relative risk
RR <- 1:35
  
# Proportion of cases exposed to risk factor
pd <- seq(0,1,0.01)

calc_PAF_eqn4 <- function(pd,RR) {
  PAF <- (pd*(RR - 1))/(RR)
  return(PAF)
}

paf_df_eqn4 <- data.frame(RR = rep(RR,times = length(pd)), pd = rep(pd, each=length(RR)))


example_diseases <- example_diseases %>%
  mutate(PAF_eqn4 = calc_PAF_eqn4(pd,RR))

example_diseases$x_anno_4 <- c(5.5, 29, 12)
example_diseases$y_anno_4 <- c(0.94, 0.12, 0.8)

  
paf_df_eqn4 %>%
  ggplot(aes(RR, pd, z = calc_PAF_eqn4(pd,RR))) +
  geom_contour_filled(breaks = my_breaks) + 
   geom_point(data = example_diseases, shape = 21, size = 3, color = "black",  fill = "white") +
  annotate("text", x = example_diseases$x_anno_4, y = example_diseases$y_anno_4, label = example_diseases$descr, color = "white") +
  labs(x = "Relative risk", y = "Proportion of cases exposed", title = "PAF: Miettinen's formula", fill = "PAF") +
  theme_minimal() +
  scale_fill_viridis(discrete=TRUE, guide = guide_legend(reverse=TRUE))

Elasticity formula

Finally, for people with a background in economics, it may be useful to think of PAF as the elasticity of incidence (\(\Pr(D)\)) for a change in prevalence of a risk factor (\(\Pr(E)\)). Elasticity here indicates how disease incidence changes, relative to a baseline incidence, for a given change in the prevalence of a risk factor, relative to a baseline prevalence for that risk factor:¹⁴

\[ PAF = \frac{\Pr(E)}{\Pr(D)}\left( \frac{d\Pr(D)}{d\Pr(E)} \right) \tag{5}\]

where \[ \begin{align*} & d \cdot = \text{derivative} \\ & \Pr(E) = \text{prevalence of exposure (or risk factor) in the population} \\ & \Pr(D) = \text{incidence of disease in the population} \end{align*} \] This formulation is not easily represented graphically.

4. Extensions of PAF

Multi-category risk factors

PAF can also be partitioned into different levels or categories of exposure, and then summed across all (disjoint) categories to get the total PAF. This is known as the “distributive property” of the PAF.¹⁵ Equation 4 can thus be adapted to estimate category-specific PAF for multiple levels of exposure:

\[ PAF_i = \Pr(E_i|D) \left( \frac{RR_i - 1}{RR_i} \right) \]

where \(i\) is the \(i^{th}\) disjoint category of exposure, and \(RR_i\) is the relative risk for the \(i^{th}\) category relative to the reference category, which by definition has \(RR_0 = 0\). By countable additivity, we can then calculate the total PAF as:

\[ \begin{align*} PAF &= \sum_i PAF_i \\ &= \sum_i \Pr(E_i|D) \left( \frac{RR_i - 1}{RR_i} \right) \\ &= 1 - \sum_i \frac{\Pr(E_i|D)}{RR_i} \end{align*} \]

Confounding and adjusted relative risks

If the primary assumption of no confounding is violated (see Section 2.4), then the PAF must be adjusted for confounders.^4,12 Equation 4 can be modified to include adjusted relative risks that account for potential confounders:¹⁶

\[ PAF_a = \Pr(E|D,Z) \left( \frac{RR_a - 1}{RR_a} \right) \\ \] where \(Z\) is the set of confounders that have been used to estimated the adjusted relative risk \(\widehat{RR_a}\).

Generalized attributable fraction

This extension allows a different baseline distribution for the risk factor rather than assuming that it is eliminated completely:¹⁷

\[ GIF = \frac{\Pr(D) - \Pr(D^*)}{\Pr(D)} \]

where \(\Pr(D)\) and \(\Pr(D^*)\) are the probability of disease under the current and modified distributions of exposure, respectively. The \(GIF\) allows for a more realistic real-world estimate of what proportion of diseases would be prevented if, for example, half of the overweight population reduced their BMI by a certain percent, fewer people smoked, etc.

Multiple risk factors

It is commonly recognized that most diseases have multiple risk factors, and if at least one could be eliminated, the disease would not develop. Thus, when you want to calculate \(PAF\) for several risk factors, it is best to calculate \(PAF\) when all of the risk factors are considered simultaneously, otherwise you could erroneously sum \(PAF\) for individual risk factors to obtain a total \(PAF > 1\).³ The only conditions where the total \(PAF = \sum_i PAF_i\) are when 1) no individuals are exposed to multiple risk factors, or 2) when the risks factors have an additive effect on disease incidence.⁴

Variance estimation

Variance can be estimated using the Delta method, and recent advances with bootstrapping and jackknifing have been made.¹²

5. Caution

Although there are relatively few assumptions required in the theoretical PAF, it is often challenging to estimate the it in practice. For example, rarely does one study estimate all of the necessary quantities, so studies often synthesize estimates from different sources. Under this scenario, portability of results becomes a real issue.¹⁸ Researchers must be vigilant to ensure that the definition of the exposure and outcome is consistent and that the relative risks are estimated without confounding, or in the case of adjusted relative risks, that the same covariates were adjusted for across studies.

From a utility standpoint, PAF is most useful to estimate for a risk factor that is modifiable.³ For example, calculating the PAF for a disease based on a risk factor of age, sex, race/ethnicity, genetics, other unchangeable quantities, or diagnostic results is not very helpful for preventing the disease.

6. Summary and Conclusions

PAF is a useful index to estimate the proportion of cases attributable to a specific risk factor. PAF has many common formulae. Although these equations are relatively simple and rely on few theoretical assumptions, researchers must be diligent in implementing them carefully using estimable quantities, often gleaned from diverse sources. Extensions have been developed to accommodate scenarios where multiple exposure levels exist or to deal with confounding by using adjusted relative risk estimates. Since PAF is a proportion, both common diseases with a low PAF or rare diseases with a high PAF could have a significant number of cases prevented by a public health intervention. Thus, in some situations, its useful to estimate the absolute number of preventable cases by multiplying the PAF by the number of prevalent cases.

Show the code

# TODO: can we make the plots interactive so that we get the PAF when we hover over?
PAF <- seq(0,1,0.01)
num_prev_cases <- seq(1,4000000,1000)

calc_abs_lives <- function(PAF, Prev_disease) {
  abs_lives <- PAF * Prev_disease
}

abs_lives_df <- data.frame(PAF = rep(PAF, times = length(num_prev_cases)), num_prev_cases = rep(num_prev_cases, each=length(PAF)))

# Calculate abs cases prevented for example diseases
example_diseases <- example_diseases %>%
  mutate(PAF = PAF_eqn1)
example_diseases$x_anno_cases <- c(0.2, 0.2, 0.55)
example_diseases$y_anno_cases <- c(3010857, 500000, 400000)

my_breaks <- c(0,200000,seq(500000,4000000,500000))

abs_lives_df %>%
  ggplot(aes(PAF, num_prev_cases, z = calc_abs_lives(PAF,num_prev_cases))) +
 geom_contour_filled(breaks = my_breaks) +
  geom_point(data = example_diseases, shape = 21, size = 3, color = "black",  fill = "white") +
  annotate("text", x = example_diseases$x_anno_cases, y = example_diseases$y_anno_cases, label = example_diseases$descr, color = "white") +
  labs(x = "Population attributable fraction", y = "Number of prevalent cases", title = "Absolute number of cases prevented", fill = "Cases prevented") +
  theme_minimal() +
  scale_fill_batlow(discrete=TRUE, guide=guide_legend(reverse=TRUE))

7. Acknowledgements

Thanks to Grace Ringlein, Ciprian Crainiceanu, and Nilanjan Chatterjee for discussion and feedback on this project.

8. Derivations

Equation 2

\[\begin{align*} PAF &= \frac{\Pr(E)(RR - 1)}{\Pr(E)(RR-1) + 1} \\ &= \frac{\Pr(E)(\frac{\Pr(D|E)}{\Pr(D|\bar{E})} - 1)}{\Pr(E)(\frac{\Pr(D|E)}{\Pr(D|\bar{E})}-1) + 1} \\ &= \frac{\frac{\Pr(D,E)}{\Pr(D|\bar{E})} - \Pr(E)}{\frac{\Pr(D,E)}{\Pr(D|\bar{E})} - \Pr(E) + 1} \times \frac{\Pr(D|\bar{E})}{\Pr(D|\bar{E})}\\ &= \frac{\Pr(D,E) - \Pr(D|\bar{E}) \Pr(E)}{\Pr(D,E) - \Pr(D|\bar{E}) \Pr(E) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(D,E) - \Pr(D|\bar{E}) (1-\Pr(\bar{E}))}{\Pr(D,E) - \Pr(D|\bar{E}) (1-\Pr(\bar{E})) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(D,E) - \Pr(D|\bar{E}) +\Pr(D,\bar{E})}{\Pr(D,E) - \Pr(D|\bar{E}) + \Pr(D,\bar{E}) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(D) - \Pr(D|\bar{E})}{\Pr(D)} \end{align*}\]

Equation 3

\[\begin{align*} PAF &= \frac{\Pr(E)(RR - 1)}{\Pr(E)(RR-1) + 1} \\ &= \frac{\Pr(E) \left( \frac{\Pr(D|E)}{\Pr(D|\bar{E})} - 1 \right)}{\Pr(E) \left( \frac{\Pr(D|E)}{\Pr(D|\bar{E})}-1\right) + 1} \times \frac{\Pr(D|\bar{E})}{\Pr(D|\bar{E})}\\ &= \frac{\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E}))}{\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E})) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E}))}{\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E})) + (1-\Pr(E) + \Pr(E))\Pr(D|\bar{E})} \times \frac{N}{N}\\ &= \frac{N \Pr(E)(\Pr(D|E) - \Pr(D|\bar{E}))}{N\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E})) + N\Pr(E))\Pr(D|\bar{E}) + N(1-\Pr(E))\Pr(D|\bar{E})} \end{align*}\]

We can estimate this theoretical expression using observable quantities:

\[\begin{align*} \widehat{PAF} &= \frac{A}{A + B + C} \\ &= \frac{A}{N_d} \end{align*}\]

where \(N\) is the total number of individuals in the population, \(A\) is the number of exposed individuals whose disease can be attributed to their exposure, \(B\) is the number of exposed individuals whose disease is not attributed to their exposure, \(C\) is the number of non-exposed individuals with the disease, and \(N_d\) is the total number of individuals with disase.

Equation 4

\[\begin{align*} PAF &= \frac{\Pr(E)(RR - 1)}{\Pr(E)(RR-1) + 1} \\ &= \frac{\Pr(E)(RR - 1)}{\Pr(E) \left( \frac{\Pr(D|E)}{\Pr(D|\bar{E})}-1 \right) + 1} \times \frac{\Pr(D|\bar{E})}{\Pr(D|\bar{E})} \\ &= \frac{\Pr(D|\bar{E})\Pr(E)(RR - 1)}{\Pr(E)(\Pr(D|E) - \Pr(D|\bar{E})) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(D|\bar{E})\Pr(E)(RR - 1)}{\Pr(D, E) - \Pr(D|\bar{E})) + \Pr(D, \bar{E}) + \Pr(D|\bar{E})} \\ &= \frac{\Pr(D|\bar{E})\Pr(E)(RR - 1)}{\Pr(D)} \times \frac{\frac{\Pr(D|E)}{\Pr(D|\bar{E})}}{\frac{\Pr(D|E)}{\Pr(D|\bar{E})}} \\ &= \frac{\Pr(D|E)\Pr(E)(RR - 1)}{\Pr(D) \frac{\Pr(D|E)}{\Pr(D|\bar{E})}} \\ &= \Pr(E|D) \left( \frac{RR - 1}{RR} \right) \end{align*}\]

Equation 5

First notice that disease incidence (\(\Pr(D)\)) can be stated as a function of the prevalance of exposure to a risk factor (\(\Pr(E)\)):

\[\begin{align*} \Pr(D) &= \Pr(D,E) + \Pr(D,\bar{E}) \\ &= \Pr(D|E) \Pr(E) + \Pr(D|\bar{E}) \Pr(\bar{E}) \\ &= \Pr(D|E) \Pr(E) + \Pr(D|\bar{E}) (1-\Pr(E)) \\ &= \Pr(D|\bar{E}) + \Pr(E) (\Pr(D|E) - \Pr(D|\bar{E})) \end{align*}\]

Then we can take the derivative of incidence with respect to prevalence of exposure:

\[\begin{align*} \frac{d\Pr(D)}{d\Pr(E)} &= \frac{d(\Pr(D|\bar{E}) + \Pr(E) (\Pr(D|E) - \Pr(D|\bar{E})))}{d\Pr(E)} \\ &= \Pr(D|E) - \Pr(D|\bar{E}) \end{align*}\]

Thus, we can derive the elasticity equation as follows:

References

Levin, M. L. The occurrence of lung cancer in man. Acta-Unio Internationalis Contra Cancrum. 9, 531–41 (1953).

Northridge, M. E. Public health methods–attributable risk as a link between causality and public health action. Am J Public Health 85, 1202–1204 (1995).

Rockhill, B., Newman, B. & Weinberg, C. Use and misuse of population attributable fractions. Am J Public Health 88, 15–19 (1998).

Walter, S. D. Effects of interaction, confounding and observational error on attributable risk estimation. Am J Epidemiol 117, 598–604 (1983).

Rosen, L. An intuitive approach to understanding the attributable fraction of disease due to a risk factor: The case of smoking. Int J Environ Res Public Health 10, 2932–2943 (2013).

O’Keeffe, L. et al. Smoking as a risk factor for lung cancer in women and men: A systematic review and meta-analysis. BMJ Open 8, (2018).

Antoniou, A. et al. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: A combined analysis of 22 studies. Am J Hum Genet 72, 1117–1130 (2003).

Nelson, H. D. et al. Risk assessment, genetic counseling, and genetic testing for BRCA-related cancer: Systematic review to update the u.s. Preventive services task force recommendation. (2013).

EEA. Assessing the risks to health from air pollution. https://www.eea.europa.eu/publications/assessing-the-risks-to-health#:~:text=Relative%20risks%20capture%20the%20increase,be%20assigned%20to%20specific%20individuals (2018).

10.

NCI. Surveillance, epidemiology, and end results program.

11.

CDC. Lung cancer statistics.

12.

Benichou, J. A review of adjusted estimators of attributable risk. Stat Methods Med Res 10, 195–216 (2001).

13.

Miettinen, O. S. Proportion of disease caused or prevented by a given exposure, trait or intervention. American journal of epidemiology 99 5, 325–32 (1974).

14.

Hill, G. B. The value of the population attributable risk percentage. Am J Public Health 86, 1483 (1996).

15.

Wacholder, S., Benichou, J., Heineman, E. F., Hartge, P. & Hoover, R. N. Attributable risk: Advantages of a broad definition of exposure. Am J Epidemiol 140, 303–309 (1994).

16.

Greenland, S. Bias in methods for deriving standardized morbidity ratio and attributable fraction estimates. Stat Med 3, 131–141 (1984).

17.

Morgenstern, H. & Bursic, E. S. A method for using epidemiologic data to estimate the potential impact of an intervention on the health status of a target population. J Community Health 7, 292–309 (1982).

18.

Steenland, K. & Armstrong, B. An overview of methods for calculating the burden of disease due to specific risk factors. Epidemiology 17, 512–519 (2006).