3bb2f6b
Files changed (24)

+48 −1 StatSimulationBased.txt
+0 −0 StatSimulationBased/equation029.png
+1 −0 StatSimulationBased/equation029.tex
+0 −0 StatSimulationBased/equation030.png
+1 −0 StatSimulationBased/equation030.tex
+0 −0 StatSimulationBased/equation031.png
+2 −0 StatSimulationBased/equation031.tex
+0 −0 StatSimulationBased/equation032.png
+5 −0 StatSimulationBased/equation032.tex
+0 −0 StatSimulationBased/equation033.png
+4 −0 StatSimulationBased/equation033.tex
+0 −0 StatSimulationBased/equation034.png
+1 −0 StatSimulationBased/equation034.tex
+0 −0 StatSimulationBased/equation035.png
+3 −0 StatSimulationBased/equation035.tex
+0 −0 StatSimulationBased/equation036.png
+3 −0 StatSimulationBased/equation036.tex
+0 −0 StatSimulationBased/equation037.png
+5 −0 StatSimulationBased/equation037.tex
+28 −0 StatSimulationBased/ex52variance.py
+54 −0 StatSimulationBased/ex53analyzevar.py
+53 −0 StatSimulationBased/ex53msbetween.py
+59 −0 StatSimulationBased/ex53mswithinnonequal.py
+25 −0 StatSimulationBased/ex5anova.py
StatSimulationBased.txt
+ * Gauss noticed that observational error had a particular distribution: there were more observations close to the truth than far from it, and errors overshot and undershot with equal probability. The errors in fact follow a normal distribution. Thus if we average the observations, the errors tend to cancel out.
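This cancellation is easy to see in a small simulation (a sketch using only Python's standard library; the true value 10.0 and noise sd 2.0 are made-up numbers):

```python
import random
import statistics

random.seed(1)

truth = 10.0      # the unknown quantity being measured (hypothetical)
noise_sd = 2.0    # spread of the normally distributed observational error

# A single observation is the truth plus a normal error.
one_obs = truth + random.gauss(0, noise_sd)

# Averaging many observations lets the overshoots and undershoots cancel.
many_obs = [truth + random.gauss(0, noise_sd) for _ in range(10_000)]
estimate = statistics.mean(many_obs)

print(abs(one_obs - truth))   # error of a single observation
print(abs(estimate - truth))  # error of the mean: much smaller
```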
+ * Characterizing a sample mean as an error about a population mean is perhaps the simplest possible example of building a STATISTICAL MODEL:
+ * This allows us to compare statistical models and decide which one better characterizes the data (a powerful idea).
+ * If an effect α_j is present, the variation between groups increases because of the systematic differences between groups: //the between-group variation is due to error variation plus variation due to α_j. So the null hypothesis becomes://
+ * As the sample size goes up, the sample means get tighter and the variance of the sampling distribution goes down, but it will always be positive and right-skewed, and thus the mean of this sampling distribution will always overestimate the true parameter.
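The positivity and right skew of the sampling distribution of the sample variance can be checked directly (a sketch; the population sd of 1.0 and sample size of 5 are arbitrary choices):

```python
import random
import statistics

random.seed(2)

pop_sd = 1.0
n = 5  # small samples make the skew easy to see

# Sampling distribution of the sample variance: draw many samples,
# compute the (n - 1 denominator) variance of each.
variances = [
    statistics.variance([random.gauss(0, pop_sd) for _ in range(n)])
    for _ in range(20_000)
]

variances.sort()
mean_var = statistics.mean(variances)
median_var = variances[len(variances) // 2]

# All values are positive, and the right skew pulls the mean above the median.
print(mean_var, median_var)
```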
+ * That is, the difference between any value and the grand mean is equal to the sum of (I) the difference between that value and its group mean and (II) the difference between its group mean and the grand mean.
+ * To get to the variances within and between each group, we simply need to divide each SS by the appropriate degrees of freedom.
+ * The DFtotal and DFbetween are analogous to the case for the simple variance {{./equation034.png?type=equation}}
+ //The number of scores minus the number of parameters estimated gives the degrees of freedom for each variance://
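The decomposition and the division by degrees of freedom can be verified numerically (a sketch with made-up scores; the three groups are illustrative only):

```python
import statistics

# Three hypothetical groups of scores.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 10.0],
]

scores = [x for g in groups for x in g]
grand_mean = statistics.mean(scores)

# Between: each group mean vs the grand mean, weighted by group size.
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
)
# Within: each score vs its own group mean.
ss_within = sum(
    (x - statistics.mean(g)) ** 2 for g in groups for x in g
)
ss_total = sum((x - grand_mean) ** 2 for x in scores)

# The decomposition: SStotal = SSbetween + SSwithin.
print(ss_total, ss_between + ss_within)

# Dividing each SS by its degrees of freedom gives the mean squares.
n, k = len(scores), len(groups)
ms_between = ss_between / (k - 1)   # DFbetween = k - 1
ms_within = ss_within / (n - k)     # DFwithin = n - k
print(ms_between, ms_within)
```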
+ * The null hypothesis amounts to saying that there is no effect of α_j: that any between-group variance we see is completely attributable to within-group variance:
+ * The key idea of ANOVA: //when the groups' means are in fact identical, both of these variances are very close to the population variance.//
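A simulation sketch of this key idea, assuming four equal-sized groups drawn from one normal population (the group count, sizes, and sd are all arbitrary choices):

```python
import random
import statistics

random.seed(3)

k, n, sd = 4, 20, 2.0   # groups, per-group size, common population sd
true_var = sd ** 2      # 4.0

ms_between_vals, ms_within_vals = [], []
for _ in range(2_000):
    # Null hypothesis true: every group has the same (zero) mean.
    groups = [[random.gauss(0, sd) for _ in range(n)] for _ in range(k)]
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)  # equal group sizes, so this is the grand mean
    ms_between_vals.append(n * sum((m - grand) ** 2 for m in means) / (k - 1))
    ms_within_vals.append(statistics.mean(statistics.variance(g) for g in groups))

# Both mean squares hover around the population variance (4.0).
print(statistics.mean(ms_between_vals), statistics.mean(ms_within_vals))
```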
+ * The F-STATISTIC is precisely analogous to a t-statistic, and the accompanying sampling distribution, the F-distribution, can be used precisely like a t-curve to compute the p-value for a result.
+ * The F-distribution is defined as F(DFa, DFb), where DFa is the degrees of freedom of the MSbetween (numerator) and DFb is the degrees of freedom of the MSwithin (denominator).
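In the simulation-based spirit of these notes, the upper tail of F(DFa, DFb) can be approximated by simulating null datasets rather than consulting a table (a sketch; the observed F of 4.0 and the group sizes are hypothetical):

```python
import random
import statistics

random.seed(4)

def f_stat(groups):
    """F ratio MSbetween / MSwithin for equal-sized groups."""
    k, n = len(groups), len(groups[0])
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = statistics.mean(statistics.variance(g) for g in groups)
    return ms_between / ms_within

k, n = 3, 10        # F(DFa, DFb) = F(k - 1, k * (n - 1)) = F(2, 27)
observed_f = 4.0    # a hypothetical observed F ratio

# Simulate the null (all groups from one population) and count how
# often the simulated F exceeds the observed one.
sims = 10_000
exceed = sum(
    f_stat([[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]) >= observed_f
    for _ in range(sims)
)
p_value = exceed / sims
print(p_value)  # approximates the upper tail of F(2, 27) at 4.0
```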
+ * **MSwithin** measures the spread about the mean within each sample: the location of the mean in that sample is irrelevant. As long as the population variances remain identical, MSwithin will always estimate this variance in an unbiased manner.
+ * If the null hypothesis is in fact false (if the population means differ), then it's highly likely that MSbetween is greater than MSwithin, and that the F-ratio is significantly greater than 1.
+ * Even when the population means actually differ, it is possible for a given sample that MSbetween is lower and MSwithin higher than the population variance.
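Both points can be illustrated at once: when the null is false the F-ratio is usually well above 1, yet some individual samples still yield a small F (a sketch; the group mean shifts of 0, 0.5, and 1.0 are made-up):

```python
import random
import statistics

random.seed(5)

def f_stat(groups):
    """F ratio MSbetween / MSwithin for equal-sized groups."""
    k, n = len(groups), len(groups[0])
    means = [statistics.mean(g) for g in groups]
    grand = statistics.mean(means)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = statistics.mean(statistics.variance(g) for g in groups)
    return ms_between / ms_within

n = 10
shifts = [0.0, 0.5, 1.0]  # the null is false: group means differ

fs = [
    f_stat([[m + random.gauss(0, 1) for _ in range(n)] for m in shifts])
    for _ in range(2_000)
]

# Typically F is well above 1 when the means differ...
print(statistics.mean(fs))
# ...but individual samples can still produce a small F.
print(min(fs))
```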
+ * A common rule of thumb is that //the results of ANOVA will be approximately correct if the largest standard deviation is less than twice the smallest standard deviation.//
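The rule of thumb can be wrapped in a small helper (a sketch; `equal_variance_ok` is a hypothetical name, not one of the exercise scripts):

```python
import statistics

def equal_variance_ok(groups):
    """Rule of thumb: largest sd less than twice the smallest sd."""
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) < 2 * min(sds)

print(equal_variance_ok([[1, 2, 3], [2, 4, 6]]))  # sds 1.0 and 2.0 -> False
print(equal_variance_ok([[1, 2, 3], [2, 3, 5]]))  # sds 1.0 and ~1.53 -> True
```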