Materials developed for Texas A&M University - Commerce course CSci 502: Statistics for Computational Science and Analysis. These materials were initially developed in the Summer of 2017. The materials in this course repository are lecture notes and problem sets derived from the following course textbooks and online resources:
- Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2012). Probability and statistics for engineers and scientists (9th ed). New York: Prentice Hall.
- Akritas, M. (2016). Probability and statistics with R for engineers and scientists. New York: Pearson.
- Wickham, H. & Grolemund, G. (2017). R for data science. O'Reilly Media.
- Software Carpentry: Programming with R, R for Reproducible Scientific Analysis
Hi, and welcome to Statistics for Computational Science and Analysis. My name is Derek Harter; I am a professor in the Department of Computer Science and will be conducting our class sessions. I have given a few students permission to follow the course remotely, so I will post instructions and notes about our face-to-face sessions here in e-college. I am also thinking about setting up a Google Hangout (or something similar) with a webcam for our in-class sessions, so that remote students can listen in and participate. If the remote students think this would be useful, let me know and I'll send instructions and links for the hangout.
This session I am going to do two things differently from previous versions of this course. First, I am going to use R for teaching, illustration and assignments in the course. I give some hints and instructions for setting up R and RStudio on a Windows system below. Second, especially since we will probably only have 3 or 4 students in our face-to-face classes, I am not going to give a traditional stand-up lecture. Instead, we are going to run a reverse-structured course: I will assign readings from our textbook and will assume that you do the readings before attending class. In class we will work through problems and examples from our textbook. I have given a schedule of the problems I suggest we work through in class, though we won't necessarily do all of them. If students find other problems or questions they want to work on together, we will be flexible and work on what those attending need most. You can find the homework and in-class problem schedule on the left sidebar and in our course syllabus.
About our textbook: we still require you to get and read the Walpole, Myers, Myers and Ye "Probability and Statistics for Engineers and Scientists" (9th ed) text. I am also going to provide reading notes and use problems and materials from the new text by Akritas, "Probability and Statistics with R for Engineers and Scientists" (1st ed). If you have the means, it would be good to obtain both texts, though I know they are expensive. If you only get one text, stick with the Walpole et al. text. The syllabus has a link to the website for a course taught by Akritas, where an older version of his text is available for free. I suggest you use that version for additional reading, though you might find my reading notes on the Akritas text sufficient.
Getting Started (Week 1 activities)
I have made a video in which I demonstrate all of the following steps on a Windows 10 system. The video gives more detailed hints and help on how to successfully install the Git, MiKTeX, R and RStudio software you will need to get a working R/RMarkdown ecosystem running on your own computer.
You need to perform the following tasks. I will be helping students with these in our first class meeting. If you don't attend our class meeting, you need to finish these tasks by Tuesday, and also complete all of the readings and the suggested R programming tutorials in the first week.
- Get the Walpole, Myers, Myers & Ye textbook. Read Chapter 1.
- Install git on your system.
- Clone the repository for this course to your system.
- Install a TeX distribution for RMarkdown documents: MiKTeX (Windows) or TeX Live (Mac OS X).
- Install R and RStudio on your system.
- Inside of RStudio install all of the packages needed for our course, in order to process and make RMarkdown documents.
- Do the R for Reproducible Scientific Analysis tutorials from the Software Carpentry website. For the first week you should complete tutorials 1-9 (from Introduction up to Vectorization).
Here are more detailed instructions and links. I am mostly giving links to help students install things on Windows-based systems. If you have Windows version 7 or higher, you should be able to install and use everything mentioned below for free. If you are a Mac or Linux user and need help, let me know, but installation of R and RStudio is usually much easier on those systems (the built-in package managers usually have good versions you can install).
1. Get the Course Textbook
The course textbook is:
Probability and Statistics for Engineers and Scientists, 9th Ed. by Walpole, Myers, Myers and Ye, Prentice Hall. ISBN-13: 978-0-321-62911-1
Get the textbook this week, and read Chapter 1.
2. Install git on your system
There is a git repository associated with this class, which I will use to distribute materials and assignments. You will not really need to learn git (though I encourage you to learn a bit about it and how to use it). We will not be collaborating on the project; you will simply use it to download the materials to read and use for the class.
- Install git on your system. Git can be downloaded and installed from: https://git-scm.com/download
- When installing on Windows, you should be able to accept all the default options.
3. Clone the Repository for this Course
- Clone our course repository from Bitbucket to your system. Our course repository overview page is here: https://bitbucket.org/dharter/stats-compsci-analysis
- There are several ways to clone a repository using git. I usually open a DOS prompt (command-line terminal). From the command line, enter the following to clone the course repository:
git clone https://bitbucket.org/dharter/stats-compsci-analysis.git
- This will create a directory named 'stats-compsci-analysis' on your system. Locate this directory as it contains the R/RStudio project you will be working with for the class.
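Once R is installed (step 5), a quick check along the following lines can confirm that git is on your PATH and that the clone succeeded. This is only a minimal sketch: it assumes your R working directory is the folder in which you ran `git clone`, and the variable names are illustrative rather than part of the course materials.

```r
# Sys.which() returns an empty string when a program is not on the PATH.
git_path <- unname(Sys.which("git"))
git_found <- nzchar(git_path)

# Assumption: the clone was made in the current working directory, so the
# repository lives in ./stats-compsci-analysis and contains a .git folder.
repo_dir <- "stats-compsci-analysis"
repo_cloned <- dir.exists(file.path(repo_dir, ".git"))

message("git on PATH:       ", git_found)
message("repository cloned: ", repo_cloned)
```

If `repo_cloned` is `FALSE`, check your working directory with `getwd()` before assuming the clone failed.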
4. Install a TeX Distribution for RMarkdown Documents
In order to knit RMarkdown documents to PDF files, we need a working TeX distribution.
For Windows, it is recommended to download and install the MiKTeX distribution: https://miktex.org/2.9/setup. Note that with MiKTeX we need the Complete rather than the Basic distribution, so don't download the Basic Installer. Instead, download the Net Installer, which allows you to download and install a complete LaTeX system. You will have to run the installer twice. On the first run, choose to download the MiKTeX distribution (select the complete distribution); this only downloads the files, and the installer then exits. Run the installer again and choose to install.
For Mac OS X, it is recommended to use MacTeX, which is based on the TeX Live distribution: https://tug.org/mactex
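After installing a TeX distribution, you may want to confirm that R can see it. The sketch below looks for a few common LaTeX engines on the PATH; the list of engine names is an assumption chosen for illustration, and finding one does not guarantee that every LaTeX package needed for knitting is installed.

```r
# Common LaTeX engines to look for (an illustrative list, not exhaustive).
engines <- c("pdflatex", "xelatex", "lualatex")

# Sys.which() returns a named vector; entries are "" for programs not found.
found <- Sys.which(engines)
tex_available <- any(nzchar(found))

if (tex_available) {
  message("Found LaTeX engine(s): ",
          paste(names(found)[nzchar(found)], collapse = ", "))
} else {
  message("No LaTeX engine found on the PATH; knitting to PDF will likely fail.")
}
```

You may need to restart RStudio after installing TeX so that it picks up the updated PATH.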
5. Install R and RStudio on Your System
R and RStudio should install without issues on a Windows system (and, as mentioned, are easily obtainable on Mac and Linux).
Download and install R (R-3.5.2 is the most recent version as of Spring 2019). The URL to download the installer for R is: https://cran.rstudio.com/bin/windows/base/
Download and install RStudio. The URL to download the installer for RStudio is: https://www.rstudio.com/products/rstudio/download2/
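To confirm which version of R you ended up with, you can run something like the following in the RStudio console. Treating 3.5.2 as a minimum is an assumption made only for this sketch; the notes above simply name it as the most recent version at the time.

```r
# Version mentioned in these notes (R-3.5.2, Spring 2019); using it as a
# minimum is an assumption made only for this sketch.
reference_version <- "3.5.2"

# getRversion() returns the running R version as a comparable object.
version_ok <- getRversion() >= reference_version

message("Running ", R.version.string)
if (!version_ok) {
  message("This is older than R ", reference_version, "; consider updating.")
}
```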
6. Install Needed R Packages for R/RStudio
You need to install some additional R packages for this course. The easiest way to do this is to start RStudio, which you just installed, and install the additional packages from there. The Tools -> Install Packages menu item will let you do this, or you can install packages from the console. The list of packages you need for the course can be found in our repository in the file named config/global.dcf (the libraries property). Basically, from the console, you can run:
install.packages(c('reshape', 'plyr', 'dplyr', 'ggplot2', 'stringr', 'lubridate', 'tidyverse', 'latex2exp', 'scatterplot3d', 'igraph', 'gtools'))
We will need additional packages besides these in order to knit RMarkdown documents into PDF files. However, the first time you try to knit a .Rmd file, RStudio will report that packages are missing and ask if you want to install them. When this happens, answer yes, and you will then have all the packages you need for the course.
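If you need to repeat this step later (for example on a second computer), a variation like the following installs only the packages that are still missing. It is a sketch rather than part of the course repository: the package list is copied from the command above, and the `interactive()` guard is an added assumption so that the code only downloads packages when run from a console such as RStudio's.

```r
# Packages listed in the install.packages() command for this course.
course_packages <- c("reshape", "plyr", "dplyr", "ggplot2", "stringr",
                     "lubridate", "tidyverse", "latex2exp", "scatterplot3d",
                     "igraph", "gtools")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
already_installed <- rownames(installed.packages())
missing_packages <- setdiff(course_packages, already_installed)

if (length(missing_packages) == 0) {
  message("All course packages are already installed.")
} else if (interactive()) {
  # Only download when run from an interactive console.
  install.packages(missing_packages)
} else {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
}
```

Running it a second time is harmless: once everything is installed, `missing_packages` is empty and nothing is downloaded.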
7. R for Reproducible Scientific Analysis Tutorials on the Software Carpentry Website
The link for the Software Carpentry site is given in the syllabus. Here it is again: https://software-carpentry.org/lessons/
I suggest you do the first 9 tutorials/lessons in R for Reproducible Scientific Analysis, from 1. Introduction to R and RStudio through 9. Vectorization. The other lessons will be useful as well, but these should be enough for the first week or two.