Datasets for mixture models

Real-valued data (for Gaussian likelihoods)

  • Galaxy

Velocities (km/second) of 82 galaxies in a survey of the Corona Borealis region.

http://www.stats4stem.org/r-galaxy-data.html

Source: Roeder, K. (1990) Density estimation with confidence sets exemplified by superclusters and voids in galaxies. Journal of the American Statistical Association, 85, 617–624.

Example analysis: http://projecteuclid.org/download/pdf_1/euclid.aos/1016120364

  • Old Faithful

Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

http://www.stat.cmu.edu/~larry/all-of-statistics/=data/faithful.dat

Source (??): http://www.jstor.org/stable/2347385

Example analysis: See Bishop's textbook, Pattern Recognition and Machine Learning (PRML).

Potential extension using lots more data: http://www.geyserstudy.org/geyser.aspx?pGeyserNo=OLDFAITHFUL

  • Flea Beetles

Measurements of flea beetles from the genus Chaetocnema, covering three species: concinna (Con), heikertingeri (Hei), and heptapotamica (Hep). The width and angle of the aedeagus were recorded for each beetle. The goal of the original study was to form a classification rule to distinguish the three species.

http://www.dm.unibo.it/~simoncin/FleaBeetles.html

Source: Lubischew, A.A. (1962) On the use of discriminant functions in taxonomy. Biometrics, 18, 455–477.

Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, 254–255.
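
The datasets above are small numeric tables, so one way to get them into a form suitable for a mixture model is to parse the downloaded text into rows of floats. The following is a minimal sketch, not part of bnpy: the function name `parse_numeric_table` is hypothetical, and the file layout (free-text header lines, an optional leading row-index column, whitespace-separated values) is an assumption that should be checked against each file. The inline sample uses placeholder rows that stand in for a downloaded file.

```python
def parse_numeric_table(text, n_cols):
    """Parse whitespace-delimited text into a list of float rows.

    Hypothetical helper. Keeps only the last n_cols fields of each line,
    so an optional leading row-index column is dropped. Lines whose last
    n_cols fields are not all numeric (titles, column names, blank lines)
    are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < n_cols:
            continue  # blank or too-short line
        try:
            rows.append([float(f) for f in fields[-n_cols:]])
        except ValueError:
            continue  # header or description line
    return rows


# Placeholder text standing in for a downloaded file (assumed layout).
sample = """\
Example header line
    eruptions waiting
1       3.600      79
2       1.800      54
3       3.333      74
"""

X = parse_numeric_table(sample, n_cols=2)
print(len(X), X[0])
```

A description line that happens to end in `n_cols` numbers would be read as data, so it is worth checking the row count against the documented size of each dataset (for example, 82 rows for Galaxy with `n_cols=1`).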
