Commits

Anonymous committed a7845e8

Examples to 3.11 paragraph

Comments (0)

Files changed (38)

StatSimulationBased.txt

+Content-Type: text/x-zim-wiki
+Wiki-Format: zim 0.4
+Creation-Date: 2011-08-07T21:42:01+03:00
+
+====== StatSimulationBased ======
+Created Sunday 07 August 2011
+
+=== About ===
+	The Foundations of Statistics: A Simulation-based Approach,
+	 Shravan Vasishth, Michael Broe: Books http://amzn.to/oGOBSs
+	{{./Screenshot - 08202011 - 03:31:26 PM.png}}
+
+
+=== Content ===
+	**1. Getting Started.**
+	**2. Randomness and Probability.**
+	**3. The Sampling Distribution of the Sample Mean.**
+	4. Power.
+	5. Analysis of Variance (ANOVA).
+	6. Bivariate Statistics and Linear Models.
+	7. An Introduction to Linear Mixed Models.
+
+
+=== Some simple commands in NumPy ===
+#creating array
+>>> import numpy as np
+>>> scores = np.int_([99, 97, 72, 56, 88, 80, 74, 95, 66, 57, 89])
+
+#minimum and maximum values of given array
+>>> scores.max()
+99
+>>> scores.min()
+56
+
+Mean:
+>>> scores.mean()
+79.36363636363636
+
+Variance - it tells you how far away the individual scores are from the mean score on average, and it's defined as follows:
+
+{{./equation001.png?type=equation}}
+>>> scores.var()
+219.68595041322314
+
+* **Standard Deviation**
+	{{./equation.png?type=equation}}
+	* **Why do we divide by n-1 and not n?**
+		1. The sum of deviations from the mean is always zero, so if we know n-1 of the deviations, the last one is already determined.
+	* The independent numbers that go into computing the mean and st.dev are also called the degrees of freedom
+>>> scores.std()
+14.821806583990467
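+
+NumPy's default std() divides by n; passing ddof=1 gives the sample standard deviation s with the n-1 divisor discussed above (a quick check of my own, not from the book):
+>>> scores.std(ddof=1)        #about 15.55 here, slightly larger than the value above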
+
+* **Median** - the midpoint of the distribution's values, sorted in increasing order
+>>> np.median(scores)
+80.0
+
+* **Quartiles/Percentiles**
+	* quartiles Q1 and Q3 are measures of spread around the median: Q1 is the median of the observations below the 'grand' median, and Q3 is the median of the observations above it
+	* **Interquartile range (IQR)**: Q3-Q1
+	* 5-number summary: **MIN, Q1, MEDIAN, Q3, MAX**
+>>> np.percentile(scores, 25)
+69.0
+>>> np.percentile(scores, [25, 75])
+[69.0, 92.0]
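+
+The IQR then follows directly from the quartiles above (a one-liner of my own, not in the book's session):
+>>> q1, q3 = np.percentile(scores, [25, 75])
+>>> q3 - q1
+23.0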
+
+=== Graphical summaries ===
+
+* **Boxplot** - essentially shows the 5-number summary. The box in the middle has a line going through it: that's the median. The lower and upper ends of the box are Q1 and Q3 respectively, and the two 'whiskers' at either end of the box extend to the minimum and maximum values.
+>>> import matplotlib.pyplot as plt
+>>> plt.boxplot(scores)
+* **Histogram** - shows the number of scores that occur within particular ranges.
+>>> plt.hist(scores)
+
+=== Randomness and probability ===
+
+Many random phenomena have the following property:
+	while they are unpredictable in specific individual cases, they follow predictable laws in the aggregate.
+* **The sum and product rules**
+	* **Probability mass**// - the total 'weight' of an event over all the logically possible outcomes.//
+	1. **Sum Rule: **The probability that one of several mutually exclusive events occurs is the sum of the probabilities of the individual events.
+
+	2. **Product Rule: **When 2 or more events are independent, the probability of all of them occurring is the product of their individual probabilities.
+
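+A tiny numeric illustration of both rules (my own numbers, not the book's): for a fair die, P(1 or 2) = 1/6 + 1/6 by the sum rule; for two independent fair-coin flips, P(two heads) = 0.5 * 0.5 by the product rule.
+>>> 1.0/6 + 1.0/6        #sum rule: mutually exclusive outcomes, about 0.333
+>>> 0.5 * 0.5            #product rule: independent events, 0.25
+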
+#generating 10 random binomial values:
+#one-stone example:
+>>> import scipy.stats as stats
+>>> stats.binom.rvs(1, 0.5, size = 10)
+>>> np.sum(stats.binom.rvs(1, 0.5, size = 10)) * 1.0 / 10
+0.40000000000000002
+>>> np.sum(stats.binom.rvs(1, 0.5, size = 1000)) * 1.0 / 1000
+0.48199999999999998
+#40 stones
+>>> stats.binom.rvs(40, 0.5, size = 10)
+array([12, 15, 29, 20, 18, 21, 19, 20, 17, 19])
+#plotting 1000 experiments
+>>> results = stats.binom.rvs(40, 0.5, size = 1000)
+>>> plt.hist(results, bins = 40)
+
+**The Binomial Distribution**
+The binomial theorem allows us to compute the probability of k Right-stone hits (success) when we make n observations (trials), when the probability of a Right-stone hit (success) is p:
+		{{./equation002.png?type=equation}}
+The binomial theorem can be applied whenever there are only 2 possible primitive outcomes, the number of trials n is fixed and the trials are mutually independent, and the probability p of a 'success' is the same for each trial.
+
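+For example, the probability of exactly 3 successes in n = 4 trials with p = 0.5 is C(4,3) * 0.5^3 * 0.5^1 = 0.25; scipy returns the same value directly (a quick check of my own, not from the book):
+>>> stats.binom.pmf(3, 4, 0.5)        #matches the formula: 4 * 0.125 * 0.5 = 0.25
+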
+#The number of ways we can arrange 3 R's in 4 positions - aka finding the **Binomial Coefficient**
+>>> import scipy.misc
+>>> scipy.misc.comb(4, 3)
+array(4.000000000000001)
+>>> scipy.misc.comb(4, [1, 4])
+array([ 4.,  1.])
+>>> outcomes = scipy.misc.comb(40, [x for x in xrange(0,40)])
+>>> plt.plot(outcomes)
+
+=== 2.3 Practical example: Balls in a box ===
+//Suppose we have 12,000 balls in a big box, and we know that 9000 (3/4) are Red, the others White. We say we have a population of 12,000. Suppose we take a RANDOM SAMPLE of 100 balls from these 12,000. We'd expect to draw about 75 Red balls. What's the probability of getting exactly 75? source: //[[./ex2_box.py|Balls in box]]
+
+A number that describes some aspect of a sample is called **a statistic**. The particular statistic we are computing here is the **sample count**, and if we plot the results we will be able to get an idea of the **sampling distribution** of this statistic. 
+
+**As the sample size goes up, the probability of the most likely sample count goes down. The spread, or standard deviation, decreases as we increase sample size.**
+Demo in file: [[./ex2-1_probintervals.py]]
+* In the binomial distribution - most of the probability is clustered around the mean
+* Most important conceptual steps in statistical inference:
+	//If the sample count is within 6 of the mean 95% of the time, then 95% of the time the mean is within 6 of the sample count.//
+* The accuracy of the confidence interval increases with sample size.
+* **Statistic** describes some aspect of a sample, a **Parameter** describes some aspect of a population.
+* The spread, or **standard deviation**, decreases as we increase sample size.
+* Mean minimizes variance.
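+
+A one-line check of the last point (my own illustration, using the scores array from the NumPy section): the sum of squared deviations is smaller around the mean than around any other value, e.g. the median.
+>>> np.sum((scores - scores.mean())**2) < np.sum((scores - np.median(scores))**2)
+True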
+
+=== The binomial versus the Normal Distribution ===
+normal distribution:
+{{./equation003.png?type=equation}}
+
+One important difference between the normal and binomial distributions is that the former refers to a continuous dependent variable, whereas the latter refers to a discrete binomial variable.
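+
+One way to see how close the two get (a sketch of my own, not the book's code; the book's normal-density plot is in [[./ex2-2BinomNorm.py]]): overlay the Binomial(n=40, p=0.5) probabilities with a normal density that has the same mean n*p = 20 and standard deviation sqrt(n*p*(1-p)), about 3.16.
+>>> ks = np.arange(0, 41)
+>>> plt.bar(ks, stats.binom.pmf(ks, 40, 0.5))
+>>> plt.plot(ks, stats.norm.pdf(ks, loc=20, scale=np.sqrt(10)), color='r')
+>>> plt.show()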
+
+==== Chapter 3. The sampling distribution of the sample mean. ====
+* Standard deviation of the distribution of means gets smaller as we increase sample size.
+* As the sample size is increased, the mean of the sample means comes closer and closer to the population mean mu_x.
+* There is a lawful relationship between the standard deviation sigma of the population and the standard deviation of the distribution of means (a quick simulation follows this list):
+	{{./equation004.png?type=equation}}
+* **Central limit theorem:** Provided the sample size is large enough, the sampling distribution of the sample mean will be close to normal irrespective of what the population's distribution looks like.
+* The sampling distributions of various statistics (the sampling distribution of the sample mean, or sample proportion, or sample count) are nearly normal. The normal distribution implies that a sample statistic that is close to the mean has a higher probability than one that is far away.
+* The mean of the sampling distribution of the sample mean is the same as the population mean.
+* It follows from the above two facts that the mean of a sample is more likely to be close to the population mean than not.
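+
+A minimal simulation of the sigma/sqrt(n) relationship mentioned above (my own sketch; the book's fuller version is in [[./ex3.py]]): draw 1000 samples of size 40 from a Normal(60, 4) population and compare the spread of the sample means with 4/sqrt(40), about 0.63.
+>>> sample_means = stats.norm.rvs(loc=60, scale=4, size=(1000, 40)).mean(axis=1)
+>>> sample_means.std()        #should come out close to 4/np.sqrt(40), i.e. about 0.63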
+
+=== s is an unbiased estimator of σ ===
+	* source: [[./ex3#5sample_sds.py]]
+	* we'll see that any one sample's standard deviation s is more likely than not to be close to the population standard deviation σ.
+	* If we use s as an estimator of σ, we're more likely than not to get close to the right value: we say s is an unbiased estimator of σ. This is true even if the population is not normally distributed.
+	* Notice that the Standard Error will vary from sample to sample, since the estimate s of the population parameter σ will vary from sample to sample. And of course, as the sample size increases the estimate s becomes more accurate, as does the SE, suggesting that the uncertainty introduced by this extra layer of estimation will be more of an issue for smaller sample sizes.
+	* If we were to derive some value v for the SE, and simply plug this in to the normal distribution for the sample statistic, this would be equivalent to claiming that v really was the population parameter σ. What we require is a distribution whose shape has greater uncertainty built into it than the normal distribution.
+
+
+=== The t-distribution ===
+
+	* source: [[./ex3-6tdist.py]]
+	* In the limit, if the sample were the size of the entire population, the t-distribution would be the normal distribution, so the t-curve becomes more normal as sample size increases.
+	* This distribution is formally defined by the degrees of freedom and has more of the total probability located in the tails of the distribution. It follows that the probability of a sample mean being close to the true mean is slightly lower when measured by this distribution, reflecting our greater uncertainty.
+	* standard error: {{./equation005.png?type=equation}}
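+	For the scores array from the NumPy section this works out to roughly s/sqrt(n) = 15.55/sqrt(11), about 4.7 (my own numeric example, not from the book):
+>>> scores.std(ddof=1) / np.sqrt(len(scores))        #about 4.7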
+
+=== The one-sample t-test ===
+
+	* source:  [[./ex3-7tsample.py]]
+	* q: How many SE's do we need to go to the left and right of the sample mean, within the appropriate t-distribution, to be 95% sure that the population mean lies in that range?
+	* A: In the pre-computing days, people used to look up a table that told you, for n-1 degrees of freedom, how many SE's you need to go around the sample mean to get a 95% CI.
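+	These days the table lookup is a one-liner (a sketch; [[./ex3-7tsample.py]] does the same thing with scipy.stats.t.ppf):
+>>> stats.t.ppf(0.975, 10)        #critical t for n = 11, i.e. 10 degrees of freedom: about 2.23 SEs for a 95% CI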
+
+=== Some observations on Confidence Intervals ===
+
+	* source: [[./ex3-8.py]]
+	* One important point to notice is that the range defined by the confidence interval will vary with each sample even if the sample size is kept constant. The reason is that the sample mean will vary each time, and the standard deviation will vary too.
+	* The sample mean and standard deviation are likely to be close to the population mean and standard deviation, but they are ultimately just estimates of the true parameters.
+	* A '95%' confidence interval means: it is a statement about the probability that the hypothetical confidence intervals (that would be computed from the hypothetical repeated samples) will contain the population mean.
+	* When we compute a 95% confidence interval for a particular sample, we have only one interval. Strictly speaking, that particular interval does not mean that the probability that the population mean lies within that interval is 0.95. For that statement to be true, it would have to be the case that the population mean is a random variable.
+	* The population mean is a single point value that cannot have a multitude of possible values and is therefore not a random variable. If we relax this assumption, that the population mean is a point value, and assume instead that the population mean is in reality a range of possible values, then we could say that any one 95% confidence interval represents the range within which the population mean lies with probability 0.95.
+
+=== Sample SD, degrees of freedom, unbiased estimators ===
+
+	* source: [[./ex3-9biasedestim.py]]
+	* **Sample standard deviation** s is just the square root of the variance; roughly, it measures the average distance of the numbers in the list from their mean. {{./equation006.png?type=equation}}
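+	A direct check of the formula against NumPy's built-in (my own illustration; ddof=1 gives the n-1 divisor):
+>>> s_manual = np.sqrt(np.sum((scores - scores.mean())**2) / (len(scores) - 1))
+>>> np.allclose(s_manual, scores.std(ddof=1))
+True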
+
+=== Summary of the sampling process ===
+	* summary of the notation used
+	**the sample statistic	#	an unbiased estimate of**
+	sample mean 	{{./equation007.png?type=equation}}		#	population mean µ 
+	sample SD s			#	population SD σ 
+	standard error SE_x	#	SD of the sampling distribution {{./equation008.png?type=equation}}
+	
+	* **statistical inference** involves a single sample value but assumes knowledge of the sampling distribution which provides probabilities for all possible sample values.
+	* The **statistic** (e.g. the mean) in a random sample is more likely than not to be close to the **population parameter** (e.g. the population mean). This follows from the normal distribution of the sample means.
+	* In the limit, the **mean of the sampling distribution** is equal to the population parameter.
+	* The further away a **sample statistic** is from the mean of the sampling distribution, the lower the probability that such a sample will occur.
+	* The standard deviation of the sampling distribution {{./equation009.png?type=equation}} is partially determined by the inherent variability σ in the population, and partially determined by the sample size. It tells us how steeply the probability falls off from the center. 
+		* If {{./equation010.png?type=equation}} is small, then the fall-off in probability is steep: //random samples are more likely to be very close to the mean, samples are better indicators of the population parameters, and inference is more certain//.
+		* If {{./equation011.png?type=equation}} is large, then the fall-off in probability from the center is gradual: //random samples far from the true mean are more likely, samples are not such good indicators of the population parameters, and inference is less certain.//
+	* While we do not know {{./equation012.png?type=equation}} , we can estimate it using SE_x and perform inference using a distribution that is almost normal, but reflects the increase in uncertainty arising from this estimation: **the t-distribution**.
+
+=== 3.11 Significance Tests ===
+	* source:

StatSimulationBased/Screenshot - 08202011 - 03:31:26 PM.png

Added
New image

StatSimulationBased/equation.png

Added
New image

StatSimulationBased/equation.tex

+s = \sqrt{ \frac{\sum_{i = 1}^{n} (x_i - \bar{x})^2 }{n-1}}

StatSimulationBased/equation001.png

Added
New image

StatSimulationBased/equation001.tex

+variance ~ = ~ \frac{(x_1 - \bar{x})^2 +
+	(x_2 - \bar{x})^2 + ... +
+	(x_n - \bar{x})^2}{n-1} = 
+	\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

StatSimulationBased/equation002.png

Added
New image

StatSimulationBased/equation002.tex

+P(k) = \binom{n}{k} p^{k}(1-p)^{n-k}

StatSimulationBased/equation003.png

Added
New image

StatSimulationBased/equation003.tex

+f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}

StatSimulationBased/equation004.png

Added
New image

StatSimulationBased/equation004.tex

+\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

StatSimulationBased/equation005.png

Added
New image

StatSimulationBased/equation005.tex

+SE_{\bar{x}} = \frac{s}{\sqrt{n}}

StatSimulationBased/equation006.png

Added
New image

StatSimulationBased/equation006.tex

+s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2

StatSimulationBased/equation007.png

Added
New image

StatSimulationBased/equation007.tex

+ \bar{x}

StatSimulationBased/equation008.png

Added
New image

StatSimulationBased/equation008.tex

+\sigma_{\bar{x}}

StatSimulationBased/equation009.png

Added
New image

StatSimulationBased/equation009.tex

+\sigma_{\bar{x}}

StatSimulationBased/equation010.png

Added
New image

StatSimulationBased/equation010.tex

+\sigma_{\bar{x}}

StatSimulationBased/equation011.png

Added
New image

StatSimulationBased/equation011.tex

+\sigma_{\bar{x}}

StatSimulationBased/equation012.png

Added
New image

StatSimulationBased/equation012.tex

+\sigma_{\bar{x}}

StatSimulationBased/ex2-1_probintervals.py

+# -*- coding: utf-8 -*-
+"""
+Created on Wed Aug 10 19:43:19 2011
+
+@author: -
+
+We first focus on how to compute the probability of such an interval;
+we then examine how this probability behaves as we alter the sample size.
+To explore this approach, consider an alternative (simpler) scenario where
+we have 12,000 Red and White balls, and exactly half are Red 
+(p = 0.5).
+We take a sample of 40 balls, and calculate the probability of 
+getting 1. . . 40 Reds:
+
+"""
+#import numpy
+import scipy
+import scipy.stats as stats
+import matplotlib.pyplot as plt
+
+n = 40
+p = 0.5
+probs = stats.binom.pmf(scipy.r_[0:n+1], n, p)   #probabilities of 0, 1, ..., 40 reds
+print probs
+#what is the probability of getting 19, 20 or 21 red balls in a sample of 40
+print "margin+1: ", scipy.sum(probs[19:22])
+#and for margin +2
+print "margin+2: ", scipy.sum(probs[18:23])
+
+#calculate the interval probability for every margin around the mean count
+mean_index = 20   #probs[20] is the probability of exactly 20 reds
+intervals = scipy.zeros(20)
+for i in scipy.r_[1:20]:
+    indices = probs[(mean_index - i):(mean_index + i + 1)]
+    intervals[i] = scipy.sum(indices)
+
+print "head of intervals:", intervals[0:9]
+
+fig = plt.figure()
+pt1 = fig.add_subplot(111)
+pt1.set_title("Intervals with margin")
+plt.plot(intervals)
+
+#mark the margin (about +/-6) at which the interval probability reaches roughly 0.95
+plt.axvline(6.1, color = 'r')
+plt.axhline(0.95, color = 'r')
+
+plt.show()

StatSimulationBased/ex2-2BinomNorm.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 12 21:18:53 2011
+
+Examples for chapter: 
+    Binomial vs Normal distribution
+"""
+import scipy
+import matplotlib.pyplot as plt
+
+def normal_dist(x, mu, sigma):
+    '''
+    x - item of given sample
+    mu - mean of sample
+    sigma - standard deviation
+    '''
+    return 1.0 / (scipy.sqrt(2 * scipy.pi) * sigma) * scipy.exp(
+        -((x - mu)**2) / (2.0 * sigma**2))
+
+N = 40 
+mu = 20
+sigma = 3
+vals = [normal_dist(x, mu, sigma) for x in scipy.r_[1:N]]
+plt.plot(vals, color ='r')
+#generate vals using built in function
+#vals = scipy.stats.norm.pdf(scipy.r_[1:N], mu, sigma)
+#plt.plot(vals)
+plt.show()
+

StatSimulationBased/ex2_box.py

+# -*- coding: utf-8 -*-
+"""
+Practical example 2.3 Picking Balls
+
+"""
+import scipy
+import scipy.stats
+import numpy
+import matplotlib.pyplot as plt
+
+#helper functions
+def binom_test(k, n, p):
+    """
+    k - number of elements taken
+    n - size of sample
+    p - probability of success
+    """
+    binom_coef = scipy.misc.comb(n, k, exact = 0)
+    result = binom_coef * (p**k) * (1-p)**(n-k)
+    return result
+
+#experiment
+#we code RED as 1, and WHITE as 0
+n_balls = 12000
+n_red = 9000
+n_white = 3000
+n_trials = 1000
+sample_size = 100
+
+reds = numpy.ones(n_red)
+whites = numpy.zeros(n_white)
+box = numpy.append(reds, whites)
+
+results = numpy.zeros(n_trials, dtype = int)
+for i in xrange(n_trials):
+    selected_balls = numpy.random.randint(0, n_balls, sample_size)
+    results[i] = numpy.sum(box[selected_balls])
+
+plt.hist(results, bins = 10)
+scipy.stats.relfreq(results)
+plt.show()
+print "probability k=75, n=100, p = 3/4", binom_test(75, 100, 0.75)
+print "Same with scipy function", scipy.stats.binom_test(75, 100, 0.75)

StatSimulationBased/ex3#5sample_sds.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 19 18:43:41 2011
+
+sample deviations and population deviation
+
+@author: -
+"""
+import scipy
+import numpy
+import scipy.stats
+import matplotlib.pyplot as plt
+
+trials = 1000
+sample_size = 40
+mean = 60
+sd = 4
+
+trial_sds = scipy.zeros(trials, dtype = numpy.float)
+norm_gener = scipy.stats.norm(loc = mean,
+                              scale = sd)
+for i in scipy.r_[0:trials]:
+    sample = norm_gener.rvs(size = sample_size)
+    trial_sds[i] = numpy.std(sample, ddof = 1)  #sample SD s, computed with the n-1 divisor
+
+print trial_sds[0:10]
+plt.hist(trial_sds)
+plt.show()
+
+print '''
+If we use s as an estimator of σ, we're more likely than not to
+get close to the right value: we say s is an unbiased estimator
+of σ. This is true even if the population is not normally
+distributed.\nLet's demo it using the exponential distribution.
+'''
+
+exp_gener = scipy.stats.expon(loc = mean, scale = sd)
+trial_sds1 = scipy.zeros(trials, dtype = numpy.float)
+for i in scipy.r_[0:trials]:
+    sample = exp_gener.rvs(size = sample_size)
+    trial_sds1[i] = numpy.std(sample, ddof = 1)
+    
+plt.hist(trial_sds1)
+plt.show()
+

StatSimulationBased/ex3-6tdist.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 19 20:25:31 2011
+t-distribution
+@author: -
+"""
+import scipy
+import scipy.stats
+import matplotlib.pyplot as plt
+
+start = -4
+stop = 4
+step = 0.01
+deg_freedoms = scipy.array([2, 5, 15, 20])
+
+values = scipy.arange(start, stop, step)
+
+for i in range(len(deg_freedoms)):
+    plt.figure(i + 1)
+    plt.subplot(211)
+    plt.title("Degrees of freedom: %d" % deg_freedoms[i])
+    plt.plot(values, scipy.stats.norm.pdf(values),
+             color = 'r')
+    plt.plot(values, scipy.stats.t.pdf(values, deg_freedoms[i]),
+             color = 'g')
+
+
+plt.show()
+

StatSimulationBased/ex3-7tsample.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 19 21:18:59 2011
+
+one sample t-test
+
+@author: -
+"""
+
+import scipy
+import scipy.stats
+
+mean = 60
+stdev = 4
+sample_size = 11
+
+sample = scipy.stats.norm.rvs(loc = mean, scale = stdev,
+                              size = sample_size)
+                              
+[t_value,p_value] = scipy.stats.ttest_1samp(sample,0)
+
+df = sample_size -1 #degree of freedom
+t_crit = scipy.stats.t.ppf(0.975,df)
+
+'''
+The 95% confidence interval is the range between
+t_crit standard errors below the mean, and t_crit ste's above.
+'''
+ste = scipy.std(sample, ddof = 1) / scipy.sqrt(sample_size)  # ste stands for Standard Error of the Mean
+confidence_interval = scipy.mean(sample) + t_crit * ste * scipy.array([-1, 1])
+
+print confidence_interval
+

StatSimulationBased/ex3-8.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 19 21:54:39 2011
+
+Some observations on Confidence Interval
+@author: -
+"""
+
+import scipy
+import scipy.stats
+#ste as Standard Error
+ste = lambda x: scipy.std(x)/scipy.sqrt(len(x))
+
+def confidence_interval(sample):
+    mean = scipy.mean(sample)
+    stderr = ste(sample)
+    n = len(sample)
+    lower = mean + scipy.stats.t.ppf(0.025,n - 1)*stderr
+    upper = mean + scipy.stats.t.ppf(0.975,n - 1)*stderr
+    return (lower, upper)
+
+trials = 100
+sample_size = 100
+mean = 60
+stdev = 4
+
+trial_vals = scipy.zeros((trials,2))
+for i in scipy.r_[:trials]:
+    sample = scipy.stats.norm.rvs(size = sample_size,
+                                  loc = mean,
+                                  scale = stdev)
+    trial_vals[i] = confidence_interval(sample)
+
+print trial_vals[1:10]

StatSimulationBased/ex3-9biasedestim.py

+# -*- coding: utf-8 -*-
+"""
+Created on Sat Aug 20 13:45:16 2011
+Why does the stdev use n-1 and not n?
+@author: -
+"""
+import scipy
+import scipy.stats
+import matplotlib.pyplot as plt
+
+def new_var(sample):
+    mean = scipy.mean(sample)
+    return scipy.sum((sample - mean)**2)/len(sample)
+
+def new_std(sample):
+    return scipy.sqrt(new_var(sample))
+
+trials = 1000
+mean = 0
+stdev = 1
+sample_size = 10
+
+correct = scipy.zeros(trials)
+incorrect = scipy.zeros(trials)
+
+norm_gen = scipy.stats.norm(loc = mean, scale = stdev)
+for i in scipy.r_[0:trials]:
+    sample = norm_gen.rvs(size = sample_size)
+    correct[i] = scipy.std(sample, ddof = 1) # now divisor is n-1
+    incorrect[i] = new_std(sample)
+    #print correct[i], incorrect[i]
+    
+plt.figure(1)
+plt.subplot(211)
+plt.title("Correct stdev, mean %f" % (scipy.mean(correct)))
+plt.hist(correct)
+plt.subplot(212)
+plt.title("Incorrect stdev, mean %f" % (scipy.mean(incorrect)))
+plt.hist(incorrect)
+
+plt.show()

StatSimulationBased/ex3.py

+# -*- coding: utf-8 -*-
+"""
+Created on Fri Aug 12 22:03:06 2011
+==== The sampling distribution of the sample mean. ====
+
+@author: -
+"""
+import scipy
+import scipy.stats
+import scipy.integrate
+
+import matplotlib.pyplot as plt
+
+#probability that a standard normal value lies within 2 standard deviations of the mean
+norm_dist = scipy.stats.norm(loc = 0, scale = 1.0)
+val = scipy.integrate.quad(norm_dist.pdf, -2, 2)
+print "P(-2 < Z < 2):", val[0]
+
+#mean40
+N = 1000
+mu = 60 #mean
+sigma = 4 #stdev
+sample_size = 40
+
+norm_dist = scipy.stats.norm(loc = mu, scale = sigma)
+#plt.hist(norm_dist.rvs(40))
+means = scipy.zeros(N)
+for i in scipy.r_[0:N]:
+    sample = norm_dist.rvs(size = sample_size)
+    means[i] = scipy.mean(sample)
+
+print "mean of means:", scipy.mean(means)
+print "stdev of means:", scipy.std(means)
+
+#histogram of sample.100 exponential vals
+vals = scipy.stats.expon(loc=mu).rvs(100)
+#plt.hist(vals)
+#plt.show()
+
+sample_size = 100
+#calculate the mean and stdev of 1000 sample means drawn from the exponential population
+exp_dist = scipy.stats.expon(loc = mu)
+for i in scipy.r_[0:N]:
+    sample = exp_dist.rvs(sample_size)
+    means[i] = scipy.mean(sample)
+
+print "mean of means (exponential population):", scipy.mean(means)
+#the histogram of the sample means is close to normal even though the population is skewed
+plt.hist(means)
+plt.show()
+This repository includes pythonized examples from "The Foundations of Statistics: A Simulation-based Approach".
+
+I tried to keep the code examples as close as possible to the author's, and in most cases that was possible.
+
+To view the short summary notes, use Zim with the LaTeX plugin installed.