Wilcoxon-Mann-Whitney Archives • Statisticelle

Introduction

In a previous blogpost, I described how complex estimation of U-statistic variance can be simplified using a “structural component” approach introduced by Sen (1960). The structural component approach is very similar to the leave-one-out jackknife. Essentially, the idea behind both of these approaches is that we decompose the statistic into individual contributions. Here, these are referred to as “structural components,” and in the LOO jackknife, these are referred to as “pseudo-values” or sometimes “pseudo-observations.” Construction of these individual quantities differs conceptually somewhat, but in another blogpost, I discuss their one-to-one relationship for specific cases. We can then take the sample variance of these individual contributions to estimate the variance of the statistic.

Estimators for increasingly popular win measures, including the win probability, net benefit, win odds, and win ratio, are obtained using large-sample two-sample U-statistic theory. Variance estimators are complex for these measures, requiring the calculation of multiple joint probabilities.

Here, I demonstrate how variance estimation for win measures can be practically estimated in two-arm randomized trials using a structural component approach. Results and estimators are provided for the win probability, the net benefit, and the win odds. For simplicity, only a single outcome is considered. However, extension to hierarchical composite outcomes is immediate with use of an appropriate kernel function.

Consider a two-armed study comparing a placebo and treatment. In general, the probabilistic index (PI) is defined as,

$\text{PI} = P(X < Y) + \frac{1}{2} P(X = Y)$

and is interpreted as the probability that a subject in the treatment group will have an increased response compared to a subject in the placebo group. The probabilistic index is a particularly useful effect measure for ordinal data, where effects can be difficult to define and interpret owing to absence of a meaningful difference. However, it can also be used for continuous data, noting that when the outcome is continuous, $P(X = Y) = 0$ and the PI reduces to $P(X < Y)$ .

$PI = 0.5$ suggests an increased outcome is equally likely for subjects in the placebo and treatment group, while $PI > 0.5$ suggests an increased outcome is more likely for subjects in the treatment group compared to the placebo group, and the opposite is true when $PI < 0.5$ .

Simulation

Suppose $X \sim N(\mu_X, \sigma^{2}_{X})$ and $Y \sim N(\mu_Y, \sigma^{2}_{Y})$ represent the independent outcomes in the placebo and treatment groups, respectively and an increased value of the outcome is the desired response.

We simulate $n_X = n_Y = 50$ observations from each group such that treatment truly increases the outcome and the variances within each group are equal such that $\sigma^{2}_{X} = \sigma^{2}_{Y}$ .

# Loading required libraries
library(tidyverse)
library(gridExtra)

# Setting seed for reproducibility
set.seed(12345)

# Simulating data
n_X = n_Y = 50
sigma_X = sigma_Y = 1
mu_X = 5; mu_Y = 7

outcome_X = rnorm(n = n_X, mean = mu_X, sd = sigma_X)
outcome_Y = rnorm(n = n_Y, mean = mu_Y, sd = sigma_Y)

df <- data.frame(Group = c(rep('Placebo', n_X), rep('Treatment', n_Y)),
                 Outcome = c(outcome_X, outcome_Y))

Examining side-by-side histograms and boxplots of the outcomes within each group, there appears to be strong evidence that treatment increases the outcome as desired. Thus, we would expect a probabilistic index close to 1 as most outcomes in the treatment group appear larger than those of the placebo group.

# Histogram by group
hist_p <- df %>%
  ggplot(aes(x = Outcome, fill = Group)) +
    geom_histogram(position = 'identity', alpha = 0.75, bins = 10) + 
    theme_bw() +
    labs(y = 'Frequency')

# Boxplot by group
box_p <- df %>%
  ggplot(aes(x = Outcome, fill = Group)) +
    geom_boxplot() + 
    theme_bw() +
    labs(y = 'Frequency')

# Combine plots
grid.arrange(hist_p, box_p, nrow = 2)

plot of chunk unnamed-chunk-3

Continue reading The Probabilistic Index for Two Normally Distributed Outcomes

Tag: Wilcoxon-Mann-Whitney

Practical inference for win measures via U-statistic decomposition

Introduction

The Probabilistic Index for Two Normally Distributed Outcomes

Simulation