# Overview

title: Subtleties in controlling confounds in inferential statistics
subtitle: some surprising, some obvious in retrospect
author: Phillip M. Alday
institute: University of South Australia
date: 29 April 2016, Uni Adelaide

# When all you have is a hammer ...

Animate and inanimate words chosen as stimulus materials did not differ in word frequency ($p$ > 0.05).

Controls and aphasics did not differ in age ($p$ > 0.05).

## Control and ecological validity in conflict

• Experimental control not possible in the brain & behavioural sciences in the same way as in the physical sciences:
• truly artificial stimuli problematic (novel as types & tokens, etc.)
• natural stimuli strongly confounded, with unclear causality and primacy
• Still, we try to match our stimulus, participant, etc. groups

(Sassenhagen & Alday, under review at B&L)

## What's the problem?

Animate and inanimate words chosen as stimulus materials did not differ in word frequency ($p$ > 0.05).

Controls and aphasics did not differ in age ($p$ > 0.05).

## Where do I start?

. . .

### Philosophy

1. You can't accept the null in NHST, only fail to reject it.

. . .

### Statistics

1. You're violating testing assumptions because by design you did not randomly sample.
2. You're performing inferences on a population you don't care about.

. . .

### Pragmatics

1. You're failing to perform the inference you actually care about.

## Philosophy: Accepting the null.

. . .

• Simply put, NHST doesn't have the notion of 'accepting' hypotheses, especially not the null.
• You only reject a hypothesis as having a likelihood (probability conditional on your data model) that is too low to be taken seriously.

## Statistics: Getting useless information.

### Random sampling

• You just aren't doing it by any stretch of the imagination.
• You are actively trying to distort measures of both central location and spread.

### Populations vs. samples

• Inferential statistics, including statistical testing, draw conclusions from the data present about the data absent.
• The absent data are things we don't care about:
• The set of all animate vs. all inanimate nouns
• The set of all possible patients vs. all possible controls
• Alternatively, we have a completely sampled population and there are no absent data.
• So just use descriptive statistics and make sure they match!
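If the stimulus set really is the whole population, matching reduces to comparing descriptions. A minimal sketch of that idea, using hypothetical (made-up) log word frequencies for two stimulus sets:

```python
import statistics

# Hypothetical log word frequencies for two fully sampled stimulus sets.
# With a completely sampled "population" of stimuli there are no absent
# data, so descriptive statistics are all the matching we need.
animate_freq = [2.1, 2.4, 2.8, 3.0, 3.3, 3.5]
inanimate_freq = [2.0, 2.5, 2.7, 3.1, 3.2, 3.6]

def describe(xs):
    """Central location and spread -- the things matching should compare."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "sd": statistics.stdev(xs),
        "range": (min(xs), max(xs)),
    }

print(describe(animate_freq))
print(describe(inanimate_freq))
```

If the means, spreads, and ranges line up, the sets are matched for that feature; no test, no $p$-value, no inference about a population we never sampled.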

## Pragmatics: Testing what you care about.

• Even if we could
• accept the null and
• pretend that we're sampling randomly
• from a population we care about
• we're still answering a boring question:

do these two populations differ systematically in the given feature?

• when we actually care about:

is the variance observed in my manipulation better (or at least partially) explained by differences in the given feature?

## What to do, what to do

• Stop using inferential tests for confound control.
• Try to match groups as closely as possible using purely descriptive statistics (reduce confounds and collinearity).
• If you can (and this is a should, not just a could!), explicitly model these confounds as covariates
• Painful with ANOVA / ANCOVA / other 1970s statistics
• Not a problem with modern (explicit) regression techniques like mixed-effects models
• Which you really should be using anyway for many BBS designs [cf. @clark1973a; @judd.westfall.etal:2012pp; @westfallkennyjudd2014a]

. . .

• And thus you correctly use statistics to answer questions you care about.
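In practice "model the confound as a covariate" can be as simple as adding one column to the design matrix. A minimal simulated sketch (plain OLS via numpy rather than a full mixed model; all numbers are invented): a hypothetical animacy manipulation whose items also differ in frequency.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

animacy = rng.integers(0, 2, n)                  # hypothetical condition (0/1)
freq = rng.normal(3.0, 0.5, n) + 0.3 * animacy   # confound correlated with condition
rt = 500 - 20 * animacy - 30 * freq + rng.normal(0, 10, n)  # simulated RTs

# Model WITHOUT the confound: the animacy estimate absorbs frequency's effect.
X0 = np.column_stack([np.ones(n), animacy])
b0, *_ = np.linalg.lstsq(X0, rt, rcond=None)

# Model WITH frequency as a covariate: the animacy estimate is adjusted.
X1 = np.column_stack([np.ones(n), animacy, freq])
b1, *_ = np.linalg.lstsq(X1, rt, rcond=None)

print("animacy effect, no covariate:  ", b0[1])
print("animacy effect, with covariate:", b1[1])
```

The adjusted estimate lands near the true −20; the unadjusted one is biased by the confounded frequency path. In a mixed-effects model (e.g. `lme4` or `MixedModels`) the covariate enters the fixed-effects formula the same way.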

# I scream, you scream ...

## Fresh off the presses


(DOI: 10.1371/journal.pone.0152719)

## Arrows show true causal relationships

(All model diagrams from John Myles White)

. . .

## Conditional probabilities and modelling (are so hard for frequentists)

• Conditioning effectively "blocks" a given path
• Non-blocked paths allow for "spurious" correlations and false positives
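The "blocking" can be simulated directly. A hypothetical common-cause graph $X \leftarrow Z \rightarrow Y$ (no direct path between $X$ and $Y$): marginally the open path produces a spurious correlation; conditioning on $Z$ blocks it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

z = rng.normal(size=n)       # common cause
x = z + rng.normal(size=n)   # z -> x
y = z + rng.normal(size=n)   # z -> y (no direct x -> y path)

# The open path x <- z -> y produces a "spurious" marginal correlation.
r_marginal = np.corrcoef(x, y)[0, 1]

# Conditioning on z (here: partialling it out) blocks that path.
rx = x - z * (np.cov(x, z)[0, 1] / np.var(z))
ry = y - z * (np.cov(y, z)[0, 1] / np.var(z))
r_conditional = np.corrcoef(rx, ry)[0, 1]

print("marginal r:   ", r_marginal)
print("conditional r:", r_conditional)
```

The marginal correlation sits near 0.5 while the conditional one is near zero; had $Z$ been a collider ($X \rightarrow Z \leftarrow Y$) the pattern would reverse, with conditioning *opening* the path and creating the false positive.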

. . .

## Simulated for typical data

(DOI: 10.1371/journal.pone.0152719.g002)

## But all of this follows directly from the GLM

### Standard GLM applications have a "vertical" error/variance term.

$$Y = \beta_0 + \beta_1 X_1 + \varepsilon$$

$$\varepsilon \sim N(0,\sigma)$$

. . .

In other words, we assume:

. . .

1. (Measurement) error/variance only occurs in the dependent variable.
2. We manipulate the independent variables directly and without error.
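When assumption 2 fails, the consequence is classical attenuation bias: measurement error in a predictor shrinks its estimated slope toward zero. A minimal simulated sketch (invented numbers, true slope 2, equal variances for signal and noise so the attenuation factor is 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000
beta = 2.0

x_true = rng.normal(size=n)
y = beta * x_true + rng.normal(size=n)    # "vertical" error only, as the GLM assumes

x_noisy = x_true + rng.normal(size=n)     # measurement error in X instead

def slope(x, y):
    """OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x)

# Error only in Y: OLS recovers beta. Error in X: the slope attenuates
# toward zero by the factor var(x_true) / (var(x_true) + var(noise)) = 0.5.
print(slope(x_true, y), slope(x_noisy, y))
```

This is exactly why the "error-free predictors" assumption matters for measured covariates like word frequency: they are estimates, not manipulations.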

# The end is near ...

## So what do we do in practice?

• Mind your covariates and latent variables!

. . .

• @westfall.yarkoni:2016p recommend structural equation modelling (SEM).
• Check out the online app: http://jakewestfall.org/ivy/.
• "Traditional" and "modern" regression can still work nicely when we (can) accommodate the correct structure and conditioning in our model.
• PCA, ICA, residualisation, etc. may not bring as much as you hope (unless you're just reducing dimensionality / collinearity) if/because they don't add any structure to the model.
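The point about residualisation can be made concrete with the Frisch–Waugh–Lovell theorem: regressing on a residualised predictor recovers the same coefficient as the full multiple regression, so residualisation by itself adds no causal structure. A simulated sketch with two hypothetical correlated predictors (think word length and frequency; all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000

freq = rng.normal(size=n)
length = -0.6 * freq + rng.normal(size=n)           # correlated predictors
rt = 1.5 * length + 0.8 * freq + rng.normal(size=n)  # simulated outcome

# Multiple regression with both predictors in the model:
X = np.column_stack([np.ones(n), length, freq])
b_multi = np.linalg.lstsq(X, rt, rcond=None)[0][1]

# Residualise length on freq, then regress rt on the residual:
res = length - freq * (np.cov(length, freq)[0, 1] / np.var(freq))
b_resid = np.cov(res, rt)[0, 1] / np.var(res)

print(b_multi, b_resid)  # essentially the same coefficient
```

Same answer either way: the residualisation step is a reparameterisation of the regression, not a statement about which variable causes which.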

<!-- causal structure of word-length and frequency?-->

## As always ...

You can find my stuff online: palday.bitbucket.org