# Minor musings on measurement

This is the life story before a recipe that you can skip

Last year I hit my New Year’s resolution of writing 3 blog posts, but just barely. I think this year is going to be very similar. I’ve written 1 out of 3 blog posts, ahhh! Mid-November means I better get hustling.

There is a lot of writing I do for my research that never ends up seeing the light of day. And that’s a shame because the perfectionist in me agonized over it just the same. A lot of this cast-aside writing is the result of trying to teach myself about topics or methods that are already well-accepted and known. So, nothing novel that deserves publication, really…

… but it makes great blog post material! It seems to be reaching people as well, and that is cool. From my blog analytics, I can see that certain posts seem to be particularly handy from mid-October to early November. Hmmm… intro statistics, machine learning, or math stats midterms, methinks? Anyhow, here’s another lil’ ditty along those same lines. Yes, I am phoning in these last few blog posts of the year.

Overview

This blog post is about how we translate real-world phenomena into numbers that we can gain insight from. In other words, measurement! I think that measurement is both widely taken for granted and still hotly debated on several fronts. Measurement is something that has been around for much of human history in different forms, yet also something that is presently at the forefront of modern psychological and medical sciences (think patient-reported outcome measures, etc.). An interesting topic with a rich history, to say the least.

I’m going to focus specifically on different ways of classifying measurements that we commonly see in statistics. These classifications are not always widely accepted and are, again, a source of debate in some cases. I also probably won’t get everything right. But it’s useful to think about what implications measurement carries for analysis, interpretation, and beyond. Hopefully, this inspires you to dig deeper into this topic! And if you disagree, promptly locate the comments section at the bottom of this page.

Stevens’ scale typology: nominal, ordinal, interval, and ratio scales

Formally, measurement is the enumeration of, or assignment of numeric labels to, objects, events, or characteristics according to a set of rules. Not all measurements are created equal, with different rules resulting in different scales with different mathematical and statistical properties. Several authors have tried to classify these rules or scales to make explicit: 1) the rules of enumeration; 2) permissible mathematical operations or transformations; and 3) applicable statistical methods.

The most famous of these attempts is probably Stevens’ scale typology. In 1946, S. S. Stevens arranged four measurement scales into a hierarchy ranging from least to most informative: nominal, ordinal, interval, and ratio. Traversing through the hierarchy, the rules of enumeration become more rigorous, ranging from being almost completely arbitrary to requiring the definition of both a meaningful distance and a true zero.

Nominal

Nominal means “existing in name only.” Nominal variables consist of arbitrary labels used to identify individuals (e.g., numbering of football jerseys) or denote group membership (e.g., 1 = Democrat, 2 = Republican, 3 = Independent). As nominal values capture descriptive aspects of participants or objects, they are also often referred to as “qualitative variables.” The rules of enumeration for nominal variables are few, only requiring that the same label is not assigned to different groups and that different labels are not assigned to the same group. When labels are numeric, this implies that any one-to-one function may be applied without losing information.

Of the four measurement scales considered, nominal variables have the fewest rules for enumeration. As a result of this freedom, however, one can only really assess the equality of two nominal observations. That is, we can tell if two people share the same characteristic or belong to the same group. Thus, statistical methods for variables on a nominal scale are also very limited. Statistically, it is possible to: 1) determine the frequency associated with each label; 2) determine the mode; and 3) test hypotheses regarding the distribution of the labels (e.g., chi-square goodness-of-fit test).
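Here is a minimal sketch of those three operations, using only the Python standard library. The party codes are made up for illustration, and `chi_square_gof` is a hypothetical helper that computes only the test statistic against equal expected counts (no p-value). The relabeling at the end is one arbitrary one-to-one function, included to show that nothing of substance changes.

```python
from collections import Counter
from statistics import mode

# Made-up party codes: 1 = Democrat, 2 = Republican, 3 = Independent
codes = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2, 2, 1]


def chi_square_gof(values):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Assumes every category of interest appears at least once in `values`.
    """
    counts = Counter(values)
    expected = len(values) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


frequencies = Counter(codes)        # 1) frequency of each label
most_common_label = mode(codes)     # 2) the mode
statistic = chi_square_gof(codes)   # 3) goodness-of-fit statistic

# An arbitrary one-to-one relabeling: the new numbers mean as little as the old
relabel = {1: 30, 2: 10, 3: 20}
recoded = [relabel[c] for c in codes]
recoded_statistic = chi_square_gof(recoded)

print(dict(frequencies), most_common_label, statistic)
print(recoded_statistic == statistic)
```

The mode changes its label after recoding (2 becomes 10), but it still points at the same group, and the frequencies and the test statistic are untouched.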

Ordinal

Ordinal means “relating to a thing’s position in a series.” Ordinal scales correspond to named and ordered values, inheriting the properties of nominal variables and then some. Many scales used in psychology, and now commonly in medicine, are ordinal scales. The 5-point Likert scale is one example, with levels such as: 1 = very unhappy, 2 = unhappy, 3 = neutral, 4 = happy, and 5 = very happy. The labels of ordinal variables are also somewhat arbitrary but must maintain the natural ordering of the observations. For example, A < B < C, 1 < 2 < 3, and 2 < 5 < 6 are all appropriate enumerations of an ordinal variable with three ordered levels. Any order-preserving transformation f: x > y => f(x) > f(y) may be applied to ordinal scales without losing information, including the common log and square-root transforms when the labels are positive.

In addition to assessing the equality of two observations, variables on an ordinal scale also permit the assessment of “greater” or “less.” But it is often not possible to assess the distance or calculate a meaningful difference between two ordinal values. Without further validation, we can’t be sure what the distance between 1 = very unhappy and 2 = unhappy is and whether that distance is equivalent to the distance between 4 = happy and 5 = very happy. In fact, a one-point change often varies in meaning across an ordinal scale. When we can map an ordinal scale to the true underlying phenomenon, the relationship is often nonlinear or S-shaped, with values in the middle of the scale being more similar than those at the extremes.

As a result of these properties, addition or subtraction with ordinal scales is often neither consistent nor meaningful. It follows that computing the mean or standard deviation is also ill-advised. Furthermore, inference should also be invariant to order-preserving transformations of ordinal scales, as the information contained by the scale does not change. This is not always true of mean-based statistical methods. Statistically, it is therefore possible to use all methods applicable to nominal scales as well as: 1) compute medians or percentiles; and 2) use rank-based or nonparametric methods.
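To make the invariance point concrete, here is a small sketch with made-up Likert responses for two hypothetical groups. The relabeling keeps the order of the five levels intact and only moves the top label from 5 to 10, which is a perfectly legal enumeration of an ordinal scale. The group names and data are assumptions for illustration only.

```python
from statistics import mean, median

# Made-up 5-point Likert responses for two hypothetical groups
group_a = [1, 1, 5]
group_b = [2, 3, 3]

# An order-preserving relabeling: only the top label moves (5 -> 10)
relabel = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
a_new = [relabel[v] for v in group_a]
b_new = [relabel[v] for v in group_b]

# Does each summary say that group A sits lower than group B?
mean_before = mean(group_a) < mean(group_b)
mean_after = mean(a_new) < mean(b_new)
median_before = median(group_a) < median(group_b)
median_after = median(a_new) < median(b_new)

print("mean comparison:  ", mean_before, "->", mean_after)
print("median comparison:", median_before, "->", median_after)
```

The comparison of means flips direction after relabeling, while the comparison of medians stays put, even though the information in the data has not changed.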

Interval

Interval means “a space between two things.” In addition to the features of ordinal scales, interval scales feature a meaningful, proportionate distance between any two observations. However, interval scales do not have an “absolute zero.” Zero is only defined as a matter of convention, so the ratio of two observations is not meaningful. The most commonly used example of an interval measurement scale is temperature in degrees Celsius (°C): 20 °C is ten degrees warmer than 10 °C, but 20 °C is not twice as hot as 10 °C.

For any interval observation x, any linear transformation of the form ax + c with a > 0 can be applied, shifting zero by c units but preserving the relative distance between observations, scaled by a factor of a. We can apply a linear transform to convert temperature from Celsius to Fahrenheit: F = (9/5)*C + 32. As distances are proportionate and meaningful, addition and subtraction are valid operations for variables on the interval scale, making them well-suited to traditional mean-based inference methods. Statistically, it is possible to use all methods applicable to ordinal scales as well as: 1) compute (central) moments, including the mean and variance; and 2) use traditional mean-based or parametric methods.
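A quick sketch of what survives the Celsius-to-Fahrenheit conversion and what does not. The three temperatures are arbitrary illustrative values, and `celsius_to_fahrenheit` is just the formula above wrapped in a function.

```python
def celsius_to_fahrenheit(c):
    """Apply the linear transform F = (9/5) * C + 32."""
    return c * 9 / 5 + 32


celsius = [10.0, 20.0, 30.0]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]

# Ratios of differences are preserved by the transform...
diff_ratio_c = (celsius[2] - celsius[1]) / (celsius[1] - celsius[0])
diff_ratio_f = (fahrenheit[2] - fahrenheit[1]) / (fahrenheit[1] - fahrenheit[0])

# ...but ratios of the values themselves are not
value_ratio_c = celsius[1] / celsius[0]
value_ratio_f = fahrenheit[1] / fahrenheit[0]

print(fahrenheit)
print(diff_ratio_c, diff_ratio_f)
print(value_ratio_c, value_ratio_f)
```

"Twice as hot" in Celsius turns into roughly 1.36 times in Fahrenheit, which is why ratios of interval-scaled values should not be interpreted.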

Ratio

Ratio means “the relative relationship between two things.” In addition to a meaningful distance, ratio scales feature a meaningful absolute zero. For any ratio scaled observation x, a transformation of the form ax with a > 0 is applicable. As with interval scales, nonlinear transformations are not permissible for ratio scales. Counts are a common example of data on the ratio scale, with zero corresponding to the absence of objects and “twice as many” being a meaningful statement. Other examples include height and weight. Statistically, it is possible to use all methods applicable to interval scales as well as: 1) compute ratios; and 2) compute statistics that stay meaningful under rescaling, such as the coefficient of variation (which is unchanged by scaling) and the geometric mean (which scales by the same factor a as the data).
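Here is a small sketch of how ratio-scale summaries behave under a change of units. The weights are made up, `KG_TO_LB` is an approximate conversion factor, and `coefficient_of_variation` is a hypothetical helper built on the sample standard deviation. It needs Python 3.8 or later for `statistics.geometric_mean`.

```python
import math
from statistics import geometric_mean, mean, stdev

# Made-up weights in kilograms
weights_kg = [50.0, 60.0, 80.0, 100.0]

KG_TO_LB = 2.20462  # approximate conversion factor, i.e. the "a" in ax
weights_lb = [w * KG_TO_LB for w in weights_kg]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


cv_kg = coefficient_of_variation(weights_kg)
cv_lb = coefficient_of_variation(weights_lb)

# The ratio of two observations does not depend on the unit
ratio_kg = weights_kg[3] / weights_kg[0]
ratio_lb = weights_lb[3] / weights_lb[0]

# The geometric mean is rescaled by the same factor as the data
gm_factor = geometric_mean(weights_lb) / geometric_mean(weights_kg)

print(math.isclose(cv_kg, cv_lb))
print(math.isclose(ratio_kg, ratio_lb))
print(math.isclose(gm_factor, KG_TO_LB))
```

The coefficient of variation and the ratio of two observations come out the same in either unit, while the geometric mean simply picks up the conversion factor.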

Continuous and discrete variables, and somewhere in between

Continuous

An outcome, or variable, is considered continuous if there are infinitely many possible realizations, or observations, within a given interval. In essence, continuous variables are defined on a subset of the real line R = (-inf, +inf) such that there are an infinite number of points between any two observations x and y. Given the infinite number of possible realizations, the probability of realizing any specific value x is zero such that P(X = x) = 0, and likewise, ties between two independent observations X and Y occur with probability zero such that P(X = Y) = 0. Continuous variables are interval or ratio scaled.

Discrete

A variable is considered discrete if it can only assume specific, named values. Discrete variables are commonly, but not always, restricted to subsets of the non-negative integers N0 = {0, 1, 2, 3, …}, with no possible realizations between consecutive integers. As the possible realizations can be listed one by one (there are finitely or countably many of them), each possible value x is realized with positive probability, unlike continuous variables, such that P(X = x) > 0. Furthermore, ties between two independent observations X and Y can also occur with positive probability such that P(X = Y) > 0. Discrete variables are typically nominal or ordinal scaled but may also be ratio scaled, as is the case with counts.

Discretized continuous

A common example of a continuous variable is weight. In theory, it would be possible to measure the weight of any two objects with arbitrary precision, yielding, for example, x = 1.1112334… kg and y = 1.1123567… kg. However, the precision of continuous variables is often limited by the instruments used to measure them. Instead, it is much more common to observe weights such as x = 1.11 kg and y = 1.11 kg. In such an instance, variables are referred to as “discretized continuous,” and any specific value x or a tie x = y may have positive probability. However, discretized continuous variables are often treated as continuous variables in practice.
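A tiny sketch of how limited precision creates ties. The first two weights reuse the digits shown above, the other two are made up, and rounding to two decimals stands in for an instrument that only reads to the nearest 0.01 kg. `has_ties` is a hypothetical helper.

```python
from collections import Counter

# "True" weights in kg; the first two reuse the digits from the text above,
# the last two are made up for illustration
true_weights_kg = [1.1112334, 1.1123567, 1.1187001, 1.1234999]

# A scale that only reads to the nearest 0.01 kg
recorded_kg = [round(w, 2) for w in true_weights_kg]


def has_ties(values):
    """Return True if any value occurs more than once."""
    return len(set(values)) < len(values)


print(has_ties(true_weights_kg))
print(recorded_kg, has_ties(recorded_kg))
print(Counter(recorded_kg))
```

Four distinct weights collapse into two recorded values, so ties that were impossible in theory show up in the recorded data.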

Metric and nonmetric variables

Metric

Brunner, Bathke, and Konietschke (2018) present an alternative classification of variables as “metric” or “nonmetric,” which serves as a means of readily identifying which variables are conducive to mean-based statistical methods and which are not. Simply, a variable is considered metric if a meaningful distance metric can be constructed and used to order observations. Thus, addition and subtraction are valid operations for metric variables, and moments such as the mean and standard deviation can be calculated and interpreted. Traditional mean- or moment-based statistical methods, such as Z- and t-tests, can then be applied to metric variables. Interval and ratio scaled continuous variables are metric, as is a subset of discrete variables, such as binary, count, and discretized continuous variables.

Nonmetric

If a meaningful distance metric cannot be constructed, the variable is considered nonmetric and does not permit the use of mean-based statistical methods. Variables on a nominal scale can only be used to assess the equality of two observations and the distance between any two observations is undefined. Thus, nominal variables are nonmetric. Variables on an ordinal scale can be used to assess whether one observation is “less than,” “equal to,” or “greater than” another so that ranking is possible. However, ordinal variables often feature disproportionate or indeterminate spacing between observations, so that distance metrics are ill-defined for ordinal variables. In this case, nonparametric or rank-based statistical methods are generally suggested for analysis.
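As one illustration of a rank-based summary, the sketch below estimates P(X < Y) + 0.5 P(X = Y), a quantity sometimes called the relative effect or probabilistic index. Choosing this particular summary is my assumption about what a rank-based analysis might report here, and the data, the group names, and the `relative_effect` helper are all made up. It only uses "less than" and "equal to" comparisons, so no distance between labels is needed.

```python
def relative_effect(xs, ys):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all pairs (x, y)."""
    score = 0.0
    for x in xs:
        for y in ys:
            if x < y:
                score += 1.0
            elif x == y:
                score += 0.5  # ties count as half
    return score / (len(xs) * len(ys))


# Made-up 5-point ordinal responses for two hypothetical groups
control = [1, 2, 2, 3, 4]
treatment = [2, 3, 4, 4, 5]

p = relative_effect(control, treatment)

# An arbitrary order-preserving relabeling of the five levels
relabel = {1: 10, 2: 20, 3: 25, 4: 70, 5: 100}
p_relabeled = relative_effect(
    [relabel[v] for v in control],
    [relabel[v] for v in treatment],
)

print(p, p_relabeled)
```

Because the estimate depends only on the ordering of the observations, it is identical before and after relabeling, which is exactly the invariance that mean-based summaries lack on ordinal data.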

Some concluding thoughts

The classifications presented in this blog post themselves range from broader to more specific. These are also just a few examples that I’ve run into myself during my studies. The nonmetric and metric classification is probably the most obscure of the bunch, while discrete and continuous variables and Stevens’ typology are commonly covered in introductory statistics courses. Stevens’ typology can also be collapsed into “qualitative” versus “quantitative” measurements. There are certainly other classifications that exist, and there are also measurements that don’t seem to fit neatly into these classifications. If you know of any, please leave a comment below telling me about it!

The current debate seems to revolve around whether ordinal measurements can or should be analyzed using means. Serious research has been conducted into the conditions under which ordinal measurements can be treated as interval. Even so, if we apply the mean, what are we to do about interpretation? For example, what does it mean for the average response to increase by half a point for patients receiving treatment compared to control? Work on this issue is also ongoing and encompasses topics like the minimal clinically important difference or the application of rank-based treatment effects in medicine. What do you think is the solution?

Cheers to 2 out of 3! Stay tuned… 