Firoj Alam confidence score

Created by Firoj Alam last modified
Confidence score
The scores icsiboost outputs are not calibrated probabilities. One paper studies this question: "Obtaining Calibrated Probabilities from Boosting" by Niculescu-Mizil and Caruana.
They propose three calibration methods:
Logistic Correction
Platt Calibration
Isotonic regression
The first one transforms the scores with the formula 1/(1+exp(-2*n*score)), where n is the number of weak learners.
icsiboost implements it through the --posteriors option. For an example of the result on the adult dataset, see:
https://code.google.com/p/icsiboost/wiki/OutputProbabilities
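The formula above can be sketched in R as follows. This is only an illustration of the arithmetic: the function name, argument names, and example scores are made up here and are not part of icsiboost.

```r
# Logistic correction: map a raw boosting score into (0, 1).
# `score` and `n_learners` are illustrative names, not icsiboost identifiers.
logistic_correction <- function(score, n_learners) {
  1 / (1 + exp(-2 * n_learners * score))
}

scores <- c(-0.02, 0, 0.02)              # made-up raw scores
probs <- logistic_correction(scores, n_learners = 100)
print(probs)                             # a score of 0 maps to 0.5
```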

How to compute a confidence interval in R?
1. confidence interval for proportions
>phat=576/1000
# find the critical value
>z=qnorm(0.025,mean=0,sd=1,lower.tail=FALSE)
>ME=z*sqrt(phat*(1-phat)/1000)
>phat+c(-ME,+ME)

2nd method (requires the binom package):
>library(binom)
>binom.confint(576,1000,conf.level=0.95,methods='asymptotic')
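If the binom package is not installed, the same normal-approximation (Wald) interval can be wrapped in a small base-R helper. This is a minimal sketch; `wald_ci` is a hypothetical name introduced here, not a function from any package.

```r
# Wald (normal-approximation) confidence interval for a proportion.
# x = number of successes, n = number of trials.
wald_ci <- function(x, n, conf.level = 0.95) {
  phat <- x / n
  z <- qnorm((1 - conf.level) / 2, lower.tail = FALSE)  # critical value
  me <- z * sqrt(phat * (1 - phat) / n)                 # margin of error
  c(lower = phat - me, upper = phat + me)
}

ci <- wald_ci(576, 1000)
print(ci)
```

The interval is symmetric around phat = 0.576, which is a quick sanity check on the result.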

2. Sample size: 
>p=0.5
>ME=0.03
>z=qnorm(0.025,mean=0,sd=1,lower.tail=FALSE)
>z^2*p*(1-p)/ME^2
[1] 1067.072
# round up: we need a sample size of 1068
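The sample-size calculation can be wrapped in a helper that also rounds up, since a fractional observation is not possible. A minimal sketch, with `sample_size` as a hypothetical function name:

```r
# Smallest n for which the margin of error of a proportion is at most `me`.
# p = 0.5 is the most conservative choice (it maximises p * (1 - p)).
sample_size <- function(me, p = 0.5, conf.level = 0.95) {
  z <- qnorm((1 - conf.level) / 2, lower.tail = FALSE)
  ceiling(z^2 * p * (1 - p) / me^2)
}

n_needed <- sample_size(0.03)
print(n_needed)  # 1068
```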

3. confidence interval for mean
> plastic = read.table('agechange.txt')
> agechange = plastic$V1
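> xbar = mean(agechange)
> s2 = var(agechange)      # sample variance (named s2 to avoid masking var())
> n = length(agechange)    # 60 observations in this data set
> tcrit = qt(0.025, df=n-1, lower.tail=FALSE)
> ME = tcrit * sqrt(s2/n)
> xbar + c(-ME, +ME)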
[1] 6.415091 7.938242

2nd method:
> t.test(agechange, conf.level=0.95)            # full test output
> t.test(agechange, conf.level=0.95)$conf.int   # interval only
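To check that the two methods agree, the manual calculation can be compared with t.test() on any numeric vector. The sketch below uses simulated data as a stand-in for agechange.txt, which is not available here; `mean_ci` is a hypothetical helper name.

```r
# t-based confidence interval for a mean, computed by hand.
mean_ci <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  t_crit <- qt((1 - conf.level) / 2, df = n - 1, lower.tail = FALSE)
  me <- t_crit * sd(x) / sqrt(n)
  c(mean(x) - me, mean(x) + me)
}

set.seed(1)
x <- rnorm(60, mean = 7, sd = 3)   # simulated stand-in data, not the real file
manual <- mean_ci(x)
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)
print(all.equal(manual, builtin))  # TRUE
```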


Statistical test for significance
Hypothesis test for proportions:
# 576 successes out of 1000 trials
>binom.test(576, 1000, p=0.5, alternative='two.sided')
Hypothesis test for mean:
> bodytemp = read.table('TempData.txt')
> temp = bodytemp$V1
# temp holds the sample values; mu is the hypothesized mean to test against
> t.test(temp, mu=37, alternative='two.sided')
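For reference, the statistic and p-value that t.test() reports can be reproduced by hand. The sketch below uses simulated temperatures because TempData.txt is not available here; the variable names are illustrative.

```r
set.seed(2)
temp_demo <- rnorm(50, mean = 36.8, sd = 0.4)  # simulated stand-in data
mu0 <- 37                                      # hypothesized mean

n <- length(temp_demo)
# t statistic: distance of the sample mean from mu0 in standard errors
t_stat <- (mean(temp_demo) - mu0) / (sd(temp_demo) / sqrt(n))
# two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

res <- t.test(temp_demo, mu = mu0, alternative = "two.sided")
print(all.equal(t_stat, unname(res$statistic)))  # TRUE
print(all.equal(p_value, res$p.value))           # TRUE
```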

One sample t-test on the differences
#Read data
skeleton = read.table('SkeletonsMatchedPairsData.txt', header=TRUE)

#Attach variable names
attach(skeleton)

#Take a look at the first few observations
head(skeleton)

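#Recompute the differences and check them against the Difference column
mydifference = SucheyBrooksError - DiGangiError
cbind(mydifference, Difference)
mydifference == Difference

#Summary statistics of the differences, ignoring missing values
mean(Difference, na.rm=TRUE)
sd(Difference, na.rm=TRUE)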

#One-sample t-test on the differences
t.test(Difference, mu=0, alternative='two.sided')


Two sample paired t-test
> t.test(SucheyBrooksError, DiGangiError, paired=TRUE, mu=0, alternative='two.sided')
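The paired t-test and the one-sample t-test on the differences are the same test. A small sketch with simulated data (the skeleton data file is not available here, and `method_a` and `method_b` are made-up names) shows that both calls give identical results:

```r
set.seed(3)
method_a <- rnorm(30, mean = 10, sd = 4)            # simulated errors, method A
method_b <- method_a - rnorm(30, mean = 1, sd = 2)  # simulated errors, method B

paired <- t.test(method_a, method_b, paired = TRUE, mu = 0,
                 alternative = "two.sided")
one_sample <- t.test(method_a - method_b, mu = 0,
                     alternative = "two.sided")

print(all.equal(unname(paired$statistic), unname(one_sample$statistic)))  # TRUE
print(all.equal(paired$p.value, one_sample$p.value))                      # TRUE
```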

#Comparing two proportions
> n1 = 1050
> n2 = 1046
> phat1 = 0.57
> phat2 = 0.42
> #Number of successes
> x1 = round(n1*phat1, 0)
> x1
[1] 598
> x2 = round(n2*phat2, 0)
> x2
[1] 439
> prop.test(c(x1,x2), c(n1,n2), alternative='two.sided', correct=FALSE)
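Without the continuity correction, the X-squared statistic reported by prop.test() is the square of the usual pooled two-proportion z statistic. A minimal sketch of that calculation, using the success counts from above:

```r
x1 <- 598; n1 <- 1050
x2 <- 439; n2 <- 1046

p1 <- x1 / n1
p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 == p2

# z statistic for the difference of two proportions
z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-value

res <- prop.test(c(x1, x2), c(n1, n2), alternative = "two.sided",
                 correct = FALSE)
print(all.equal(z^2, unname(res$statistic)))  # TRUE
```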
