Account Setup for DNN Training and Testing on Cheetah

Task: Installing Scientific Python Stack using Anaconda

The first step is to install Python and needed scientific libraries. We will use the Anaconda distribution installer for Linux/Unix machines. Links to the Continuum Anaconda install scripts: https://repo.continuum.io/archive/

We should usually choose the most recent version (5.0.1 at the time of writing these instructions), and pick either the Python 2 or the Python 3 installer as needed. These instructions install Python 3. Choose the Linux 64-bit version of the installer:

Python 2.7: https://repo.continuum.io/archive/Anaconda2-5.0.1-Linux-x86_64.sh
Python 3.6: https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh

Step 1: Download and start the installer

The installer for Linux is a simple bash shell script. After you ssh into your cheetah account, run the following to download it and begin the installation from the command line.

dnnex@cheetah:~$ wget -c https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
dnnex@cheetah:~$ bash Anaconda3-5.0.1-Linux-x86_64.sh

The installation script will ask several questions. Perform the following tasks with the installer:

Accept the license agreement
Confirm the default install location, which will be /home/username/anaconda3
Answer yes to the question to prepend the Anaconda3 install location to your PATH. This will add the Python interpreter and anaconda environment to your path, so that you can run python scripts from command line using this installation/environment.

The final step adds the python interpreter to your PATH environment variable, but the change has not yet taken effect in your current shell. Either log out and log back in to your account, or source your .bashrc by hand like this to update the variable:

dnnex@cheetah:~$ . .bashrc

Either way, ensure that your PATH has /home/username/anaconda3/bin in it, and that you can find and run the python interpreter you just installed:

dnnex@cheetah:~$ echo $PATH
/home/dnnex/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
dnnex@cheetah:~$ which python
/home/dnnex/anaconda3/bin/python
dnnex@cheetah:~$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.version)
3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0]
>>> Ctrl-D

Note: You can use Ctrl-D to exit from the python interpreter on Linux
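The same check can be done from inside Python. The following is a minimal sketch, not part of the original instructions; the function name interpreter_report is hypothetical. It reports which interpreter is actually running and which python executable is found first on your PATH, so you can confirm both point into your anaconda3 directory.

```python
import shutil
import sys


def interpreter_report():
    """Return the running interpreter, the first `python` on PATH, and the version."""
    return {
        "running": sys.executable,           # interpreter executing this script
        "on_path": shutil.which("python"),   # what typing `python` would start (None if not found)
        "version": sys.version.split()[0],   # e.g. 3.6.3
    }


for key, value in interpreter_report().items():
    print("%s: %s" % (key, value))
```

If the install worked, both paths should be under /home/username/anaconda3/bin.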

Step 2: Update and Check Installation

Let's make sure conda and anaconda are completely up to date (even though we installed the most recent version, newer updates may be available).

dnnex@cheetah:~$ conda update conda
dnnex@cheetah:~$ conda update anaconda

As a final check that the distribution has all of the scientific libraries needed, create the following python script file and run it. You will need to create the file in a text editor. You can use vi or nano. Alternatively, learn how to set up sftp (see task below), and create and edit the files using the editor on your local machine.

dnnex@cheetah:~$ vi versions.py
dnnex@cheetah:~$ cat versions.py 
# scipy
import scipy
print('scipy: %s' % scipy.__version__)

# numpy
import numpy
print('numpy: %s' % numpy.__version__)

# matplotlib
import matplotlib
print('matplotlib: %s' % matplotlib.__version__)

# pandas
import pandas
print('pandas: %s' % pandas.__version__)

# statsmodels
import statsmodels
print('statsmodels: %s' % statsmodels.__version__)

# scikit-learn
import sklearn
print('sklearn: %s' % sklearn.__version__)

dnnex@cheetah:~$ python versions.py 
scipy: 0.19.1
numpy: 1.13.3
matplotlib: 2.1.0
pandas: 0.20.3
statsmodels: 0.8.0
sklearn: 0.19.1
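The script above stops at the first library that fails to import. As a rough alternative sketch (the names PACKAGES and package_versions are made up for illustration), a loop over module names can report every library and flag the missing ones instead of stopping:

```python
import importlib

# Module names checked by versions.py above.
PACKAGES = ["scipy", "numpy", "matplotlib", "pandas", "statsmodels", "sklearn"]


def package_versions(names):
    """Map each module name to its __version__, or None if the import fails."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Some modules do not define __version__ at all.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in package_versions(PACKAGES).items():
    print("%s: %s" % (name, version if version is not None else "NOT INSTALLED"))
```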

Task: Install TensorFlow and Keras

The Anaconda distribution comes preinstalled with many scientific python libraries, but by default it does not include TensorFlow and Keras for deep learning. We want to ensure that we install the gpu compiled and enabled version of TensorFlow, in order to make use of the NVidia gpus on cheetah. Install TensorFlow and Keras using the following commands:

dnnex@cheetah:~$ conda install tensorflow
dnnex@cheetah:~$ conda install tensorflow-gpu
dnnex@cheetah:~$ conda install keras

Once installed, test that you can access tensorflow and keras, check the versions, and see which computing devices are available.

dnnex@cheetah:~$ python
Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 13 2017, 12:02:49) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> print(tensorflow.__version__)
1.4.1
>>> import keras
Using TensorFlow backend.
>>> print(keras.__version__)
2.1.3
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-02-14 09:09:59.313097: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-02-14 09:10:03.278596: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:10:03.279669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:0d:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:10:03.528430: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:10:03.529458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:10:03.779963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:10:03.780764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:10:04.010110: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:10:04.011043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:87:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:10:04.251445: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:10:04.252275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 4 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:88:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:10:04.258346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-02-14 09:10:04.258507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 4 
2018-02-14 09:10:04.258522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N N N N 
2018-02-14 09:10:04.258530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y Y Y Y 
2018-02-14 09:10:04.258538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2:   N Y Y Y Y 
2018-02-14 09:10:04.258546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3:   N Y Y Y Y 
2018-02-14 09:10:04.258553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 4:   N Y Y Y Y 
2018-02-14 09:10:04.258585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:0d:00.0, compute capability: 6.1)
2018-02-14 09:10:04.258599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
2018-02-14 09:10:04.258609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1)
2018-02-14 09:10:04.258619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:87:00.0, compute capability: 6.1)
2018-02-14 09:10:04.258628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: GeForce GTX 1080 Ti, pci bus id: 0000:88:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4079706712034997739
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10968950375
locality {
  bus_id: 1
}
incarnation: 4489520959231759110
physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:0d:00.0, compute capability: 6.1"
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 10968950375
locality {
  bus_id: 1
}
incarnation: 1431643417137254847
physical_device_desc: "device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1"
, name: "/device:GPU:2"
device_type: "GPU"
memory_limit: 10968950375
locality {
  bus_id: 1
}
incarnation: 1814434061480581082
physical_device_desc: "device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1"
, name: "/device:GPU:3"
device_type: "GPU"
memory_limit: 10968950375
locality {
  bus_id: 1
}
incarnation: 12826478425154617298
physical_device_desc: "device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:87:00.0, compute capability: 6.1"
, name: "/device:GPU:4"
device_type: "GPU"
memory_limit: 10968950375
locality {
  bus_id: 1
}
incarnation: 14587447460965616845
physical_device_desc: "device: 4, name: GeForce GTX 1080 Ti, pci bus id: 0000:88:00.0, compute capability: 6.1"
]

The last python command uses a function from tensorflow to list the computing devices that are currently accessible to tensorflow. It is important that you see 5 NVidia GTX 1080Ti devices (numbered 0 to 4) on the list. This means that tensorflow and keras installed correctly and are able to detect the gpu devices on cheetah for use in training/testing.
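The device list is long, so it can help to reduce it to the one field that identifies each card. The sketch below is illustrative only: parse_device_desc and gpu_bus_ids are hypothetical helper names, and the code works on the physical_device_desc strings shown in the output above rather than calling tensorflow itself.

```python
def parse_device_desc(desc):
    """Split a 'key: value, key: value' descriptor string into a dict."""
    fields = {}
    for part in desc.split(", "):
        # Split on the first ': ' only, so bus ids like 0000:0d:00.0 stay whole.
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def gpu_bus_ids(descs):
    """Return the pci bus id of every descriptor, in list order."""
    return [parse_device_desc(d)["pci bus id"] for d in descs]


# Two descriptor strings copied from the list_local_devices() output above.
descs = [
    "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:0d:00.0, compute capability: 6.1",
    "device: 4, name: GeForce GTX 1080 Ti, pci bus id: 0000:88:00.0, compute capability: 6.1",
]
print(gpu_bus_ids(descs))
```

With tensorflow installed, the descriptor strings could be collected from the entries of device_lib.list_local_devices() whose device_type is "GPU"; on cheetah you would expect five of them.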

It might also be a good idea to add the above checks to the versions.py script, so that you can run them and check whenever needed:

dnnex@cheetah:~$ vi versions.py 
dnnex@cheetah:~$ cat versions.py 
# scipy
import scipy
print('scipy: %s' % scipy.__version__)

# numpy
import numpy
print('numpy: %s' % numpy.__version__)

# matplotlib
import matplotlib
print('matplotlib: %s' % matplotlib.__version__)

# pandas
import pandas
print('pandas: %s' % pandas.__version__)

# statsmodels
import statsmodels
print('statsmodels: %s' % statsmodels.__version__)

# scikit-learn
import sklearn
print('sklearn: %s' % sklearn.__version__)

# tensorflow
import tensorflow
print('tensorflow: %s' % tensorflow.__version__)

# keras
import keras
print('keras: %s' % keras.__version__)

# tensorflow/keras gpu devices
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

dnnex@cheetah:~$ python versions.py

Task: Setup and Test GPU Utilization

Step 1: Check NVidia GPU status on cheetah

First of all, make sure you can run the nvidia-smi command from the command line. This command lists the current status of the NVidia gpu cards on the system. You can see the utilization of each card, and below that, which people/processes are running tasks using the gpus (in the example below, there are currently no tasks running on the gpus).

dnnex@cheetah:~$ nvidia-smi
Wed Feb 14 08:40:40 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0D:00.0 Off |                  N/A |
| 20%   28C    P0    59W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 20%   27C    P0    59W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:84:00.0 Off |                  N/A |
| 20%   28C    P0    59W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:87:00.0 Off |                  N/A |
| 20%   31C    P0    59W / 250W |      0MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 108...  Off  | 00000000:88:00.0 Off |                  N/A |
| 20%   29C    P0    53W / 250W |      0MiB / 11172MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
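nvidia-smi also has a query mode (--query-gpu with --format=csv) that is easier to read from a script than the table above. The following is a sketch under assumptions: the helper names and the idle threshold are made up for illustration, and SAMPLE is hand-written to mirror the table above rather than captured from cheetah.

```python
import csv
import subprocess

QUERY = "index,pci.bus_id,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse the csv,noheader,nounits output of nvidia-smi into a list of dicts."""
    rows = []
    for record in csv.reader(text.strip().splitlines(), skipinitialspace=True):
        index, bus_id, util, used, total = record
        rows.append({
            "index": int(index),
            "bus_id": bus_id,
            "util_percent": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per gpu (requires nvidia-smi on PATH)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_csv(out)


def idle_gpus(rows, max_util=5):
    """Return indexes of gpus with no memory in use and utilization <= max_util."""
    return [r["index"] for r in rows
            if r["util_percent"] <= max_util and r["memory_used_mib"] == 0]


# Hand-written sample in the shape nvidia-smi prints; on cheetah call query_gpus() instead.
SAMPLE = "0, 00000000:0D:00.0, 0, 0, 11172\n4, 00000000:88:00.0, 3, 0, 11172\n"
print(idle_gpus(parse_gpu_csv(SAMPLE)))
```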

Step 2: Test Keras execution on CPU and GPU

We will use the simple training example on the MNIST training set from chapter 2 of the Chollet textbook to do a quick test of cpu and gpu usage with keras. First of all, create the following python script. This script will train with the MNIST data for 25 epochs.

dnnex@cheetah:~$ vi keras-mnist-train.py
dnnex@cheetah:~$ cat keras-mnist-train.py 
# This simply sets environment variable used by tensorflow to select which
# devices it can see.  This needs to be done before importing keras and
# tensorflow
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3' #  leave empty to use cpu, or use 0, 1, 2, 3, 4 to use corresponding gpu

# Import needed functions and classes from keras for training
from keras.datasets import mnist
from keras.utils import to_categorical
from keras import models
from keras import layers
import time

# load the train and test data from the mnist data set
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# create a simple fully connected feed forward network with a single hidden layer of 512 units
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

# compile the model to a tensorflow flow graph
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

# prepare the image data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

# prepare the labels (use one-hot encoding)
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# overtrain for 25 epochs
start_time = time.time()
network.fit(train_images, train_labels, epochs=25, batch_size=128)

# determine accuracy on the test data
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

elapsed_time = time.time() - start_time
print('elapsed time:', elapsed_time, ' seconds')

As given, this script demonstrates setting the CUDA_VISIBLE_DEVICES environment variable, which seems to be reliably used by tensorflow to select which gpu device to use for computations. If this is blank, no gpu is used and calculations are performed on the cpu instead.
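Editing the script every time you switch devices is easy to get wrong. As one possible sketch, the device could instead be taken from the command line. The helper names device_from_args and select_gpu are hypothetical and not part of keras or tensorflow; the only real mechanism used is the CUDA_VISIBLE_DEVICES environment variable described above.

```python
import os
import sys


def device_from_args(argv):
    """Return the device given as the first argument, or '' (cpu only) if absent."""
    return argv[1] if len(argv) > 1 else ""


def select_gpu(device):
    """Expose only `device` to CUDA; an empty string hides every gpu."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Hypothetical usage: python keras-mnist-train.py 3
chosen = select_gpu(device_from_args(sys.argv))
print("CUDA_VISIBLE_DEVICES = %r" % chosen)

# Import keras and tensorflow only after this point, as in the script above.
```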

If you run the script on the cpu, you should see something like the following. You should also log into another terminal session and run nvidia-smi and top to confirm that your script is indeed using the cpu and not the gpu.

dnnex@cheetah:~$ python keras-mnist-train.py 
Using TensorFlow backend.
Epoch 1/25
2018-02-14 09:48:57.617029: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-02-14 09:49:01.366100: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-02-14 09:49:01.366203: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: cheetah
2018-02-14 09:49:01.366214: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: cheetah
2018-02-14 09:49:01.366331: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.111.0
2018-02-14 09:49:01.366365: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.111  Tue Dec 19 23:51:45 PST 2017
GCC version:  gcc version 7.0.1 20170407 (experimental) [trunk revision 246759] (Ubuntu 7-20170407-0ubuntu2) 
"""
2018-02-14 09:49:01.366387: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.111.0
2018-02-14 09:49:01.366403: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.111.0
60000/60000 [==============================] - 8s 126us/step - loss: 0.2566 - acc: 0.9264
Epoch 2/25
60000/60000 [==============================] - 4s 61us/step - loss: 0.1021 - acc: 0.9699

Epoch 25/25
60000/60000 [==============================] - 4s 66us/step - loss: 7.4265e-04 - acc: 0.9998
10000/10000 [==============================] - 0s 48us/step
test_acc: 0.9822
elapsed time: 98.96917152404785  seconds

When running on the cpu only, you should see that:

There is no mention of a gpu device being used before training begins
Each epoch will typically take about 4 seconds (60 us/step), though the first epoch will usually be longer (because some extra setup happens in the first epoch).
The total elapsed time will be about 100 seconds usually on the cpu (may be more if multiple people are using cpu on cheetah at the same time)

You should also run the script using a gpu instance. Students in the CSci 560 class should have been assigned a particular gpu instance for the class. If you don't know yours or have not been given one, please ask the instructor first. Edit the script to specify the number of the gpu instance to use and run the script again.

dnnex@cheetah:~$ python keras-mnist-train.py 
Using TensorFlow backend.
Epoch 1/25
2018-02-14 09:52:42.632954: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2018-02-14 09:52:46.631949: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-02-14 09:52:46.632560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-02-14 09:52:46.632605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1)
60000/60000 [==============================] - 7s 110us/step - loss: 0.2558 - acc: 0.9269
Epoch 2/25
60000/60000 [==============================] - 2s 34us/step - loss: 0.1027 - acc: 0.9691


Epoch 24/25
60000/60000 [==============================] - 2s 34us/step - loss: 7.8992e-04 - acc: 0.9998
Epoch 25/25
60000/60000 [==============================] - 2s 34us/step - loss: 6.5304e-04 - acc: 0.9999
10000/10000 [==============================] - 1s 57us/step
test_acc: 0.9829
elapsed time: 56.991036891937256  seconds

Again you should examine nvidia-smi and top to ensure that you are indeed using the expected gpu. Some things to note:

If you are using a gpu, tensorflow will give a message about creating a TensorFlow device on a GPU
tensorflow will always call this /device:GPU:0 in the output here. This does not mean it is using gpu 0. When you say, for example, that only gpu 3 is visible, tensorflow considers it the one and only gpu. You can double check by verifying that the output shows the pci bus id of the gpu you were expecting
Training time should be about 1/2 of the time needed for the cpu. Typically 2s (34 us/step) for each epoch and about 56 seconds elapsed time overall.
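As a quick worked check of the "about 1/2" figure, the arithmetic below uses only the numbers from the two sample runs above (the variable names are arbitrary).

```python
# Elapsed times reported by the two sample runs above.
cpu_seconds = 98.96917152404785
gpu_seconds = 56.991036891937256

# Per-step times reported for epoch 25 of each run, in microseconds.
cpu_us_per_step = 66
gpu_us_per_step = 34

overall_speedup = cpu_seconds / gpu_seconds
step_speedup = cpu_us_per_step / gpu_us_per_step

print("overall speedup:  %.2fx" % overall_speedup)
print("per-step speedup: %.2fx" % step_speedup)
```

The overall figure comes out lower than the per-step figure. A plausible reason is that the elapsed time also includes the slower first epoch and the evaluation on the test data, which do not shrink by the same factor.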

Step 3: Set Default GPU Instance

If I have given you an account for cheetah, I may have assigned you a particular gpu to use by default on the machine. We don't have any automatic job submission system set up on cheetah that would first look at gpu utilization, and allocate a gpu for you that is not being used. If you are supposed to be using a particular gpu instance as your default, please perform the following steps.

You can set the environment variable CUDA_VISIBLE_DEVICES in your .bashrc. Add the following line to your .bashrc file, and log out and back in or reload the .bashrc config file. Of course change the device number to the gpu instance you were asked to use as your default:

dnnex@cheetah:~$ cat .bashrc
...
# set default GPU instance
export CUDA_VISIBLE_DEVICES="4"

dnnex@cheetah:~$ . .bashrc
dnnex@cheetah:~$ echo $CUDA_VISIBLE_DEVICES
4

Once you have set this environment variable, any python scripts or jupyter server instances you run on cheetah that use tensorflow for gpu access should use the indicated gpu device by default.
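Note that a script which assigns os.environ['CUDA_VISIBLE_DEVICES'] directly, like the training script earlier, overrides the .bashrc default. One way a script could respect the default is sketched here; the function name effective_device is hypothetical. It only supplies a value when the variable is not already set.

```python
import os


def effective_device(environ, fallback=""):
    """Return the visible-device string, keeping any value that is already set."""
    # setdefault only writes `fallback` when the key is missing.
    return environ.setdefault("CUDA_VISIBLE_DEVICES", fallback)


# With the .bashrc line above in effect, this keeps '4'; otherwise it falls back to '' (cpu only).
print("using CUDA_VISIBLE_DEVICES = %r" % effective_device(os.environ))
```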

Task: Set up and Run Jupyter Notebooks from Cheetah

The jupyter notebook system and server should already be installed as a standard part of the anaconda distribution, if you installed python using anaconda. You can check that the jupyter server is in your path:

dnnex@cheetah:~$ which jupyter
/home/dnnex/anaconda3/bin/jupyter

By default jupyter is configured to only allow connections from the localhost to a running jupyter instance. If you want to run a server you need to change the configuration a bit to allow for remote connections from other machines to be accepted. Also by default the jupyter notebook server will try and run using port 8888 (though if that port is already in use, it may choose another one). If you are in the CSci 560 class, in order to try and minimize interference a bit, I will probably have also assigned you a port number to use for your jupyter sessions. The default port number to use can also be specified in the jupyter configuration.

First of all, we need to generate an explicit local configuration file for jupyter. This will create the file in your home directory under the ~/.jupyter directory.

dnnex@cheetah:~$ jupyter notebook --generate-config
Writing default config to: /home/dnnex/.jupyter/jupyter_notebook_config.py

You will need to edit that config file and change 2 settings:

Uncomment and change the setting from localhost to *, to allow connections from any (remote) host: c.NotebookApp.ip = '*'
Uncomment and change the setting from 8888 to the port number you were told to use. Here I change the default to be 8900: c.NotebookApp.port = 8900

dnnex@cheetah:~$ vi ~/.jupyter/jupyter_notebook_config.py 
dnnex@cheetah:~$ grep c.NotebookApp.ip ~/.jupyter/jupyter_notebook_config.py 
c.NotebookApp.ip = '*'
dnnex@cheetah:~$ grep c.NotebookApp.port ~/.jupyter/jupyter_notebook_config.py 
c.NotebookApp.port = 8900
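The grep commands above also match commented-out lines if any remain in the file. As an illustrative sketch (the names SETTING and active_settings are made up, and SAMPLE_CONFIG is a hand-written excerpt rather than the real generated file), the active settings can be picked out in Python like this:

```python
import re

# Matches uncommented lines such as: c.NotebookApp.port = 8900
SETTING = re.compile(r"^\s*(c\.NotebookApp\.\w+)\s*=\s*(.+?)\s*$")


def active_settings(config_text):
    """Return {setting name: value text} for every uncommented NotebookApp line."""
    settings = {}
    for line in config_text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Hand-written excerpt; on cheetah, read ~/.jupyter/jupyter_notebook_config.py instead.
SAMPLE_CONFIG = """\
#c.NotebookApp.ip = 'localhost'
c.NotebookApp.ip = '*'
#c.NotebookApp.port = 8888
c.NotebookApp.port = 8900
"""
print(active_settings(SAMPLE_CONFIG))
```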

This would be a good point to also clone a copy of our class repository, so that we have some jupyter notebooks on cheetah we can run. Create a directory called repos and clone the class repository to it first.

dnnex@cheetah:~$ mkdir repos
dnnex@cheetah:~$ cd repos
dnnex@cheetah:~/repos$ git clone https://dharter@bitbucket.org/dharter/neural-network-class.git
Cloning into 'neural-network-class'...
remote: Counting objects: 981, done.
remote: Compressing objects: 100% (290/290), done.
remote: Total 981 (delta 132), reused 0 (delta 0)
Receiving objects: 100% (981/981), 35.89 MiB | 17.33 MiB/s, done.
Resolving deltas: 100% (490/490), done.
dnnex@cheetah:~/repos$ cd neural-network-class/
dnnex@cheetah:~/repos/neural-network-class$ ls
assignments  cluster  notebooks  notes.md  README.md  run.py

Now you should be able to run a jupyter notebook server on cheetah and connect to it.

dnnex@cheetah:~/repos/neural-network-class$ jupyter notebook
[W 10:15:29.051 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 10:15:29.077 NotebookApp] JupyterLab alpha preview extension loaded from /home/dnnex/anaconda3/lib/python3.6/site-packages/jupyterlab
JupyterLab v0.27.0
Known labextensions:
[I 10:15:29.078 NotebookApp] Running the core application with no additional extensions or settings
[I 10:15:29.081 NotebookApp] Serving notebooks from local directory: /home/dnnex/repos/neural-network-class
[I 10:15:29.082 NotebookApp] 0 active kernels 
[I 10:15:29.082 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8900/?token=650fec7d120b2bd2fd1dd72dda3120ba497773102c86bdfb
[I 10:15:29.082 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 10:15:29.082 NotebookApp] No web browser found: could not locate runnable browser.
[C 10:15:29.082 NotebookApp] 

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8900/?token=650fec7d120b2bd2fd1dd72dda3120ba497773102c86bdfb

Some things to note:

The last line shows the url you need to paste into your web browser.
The port number should be the port number you entered in your configuration file for the default
You should see just above that a message about [all ip addresses on your system] if you correctly set the configuration to allow connections from external machines/ip addresses
As the message says, copy and paste the URL into your browser. However, you need to change localhost to be cheetah (or you can use cheetah's ip address: 10.19.0.113)
For example, I would paste the following in my browser to connect to the above jupyter server: http://cheetah:8900/?token=650fec7d120b2bd2fd1dd72dda3120ba497773102c86bdfb or http://10.19.0.113:8900/?token=650fec7d120b2bd2fd1dd72dda3120ba497773102c86bdfb
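The host substitution described in the notes above can be sketched with the standard library as follows. The function name remote_url is hypothetical; the URL is the one printed by the sample server run above.

```python
from urllib.parse import urlsplit, urlunsplit


def remote_url(local_url, host="cheetah"):
    """Replace the host in the URL printed by jupyter, keeping port and token."""
    parts = urlsplit(local_url)
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


printed = "http://localhost:8900/?token=650fec7d120b2bd2fd1dd72dda3120ba497773102c86bdfb"
print(remote_url(printed))
print(remote_url(printed, host="10.19.0.113"))
```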

Task: Using sftp to Move Files to/from Cheetah

sftp on Windows 10

It is convenient to learn how to be able to access and move files to/from a remote machine over a secure ftp (sftp) connection. Since you have secure shell access, you can use sftp to browse and move files between your own machine and your cheetah account.

On Mac and Linux machines, the file browsers usually have built-in support for sftp. For example, on Ubuntu Linux, I can select "Other Locations" and enter the following address in the "Connect to Server" field: "sftp://dnnex@cheetah/home/dnnex"

This is a typical URL-like specification: using the sftp protocol, connect to the machine named cheetah as user dnnex, starting in the directory /home/dnnex. With this I can browse and move files in my file explorer between my local machine and cheetah. I believe that the Mac file browser also supports sftp connections and browsing natively.
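To make the parts of that address explicit, here is a small illustration using Python's standard URL parser. It only splits the string; it does not open an sftp connection.

```python
from urllib.parse import urlsplit

parts = urlsplit("sftp://dnnex@cheetah/home/dnnex")

print("protocol:", parts.scheme)    # which protocol to speak
print("username:", parts.username)  # account to log in as
print("host:    ", parts.hostname)  # machine to connect to
print("path:    ", parts.path)      # starting directory on that machine
```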

Unfortunately, Windows still does not support the sftp protocol in the native Windows file explorer. You can use a program like FileZilla or WinSCP, which give gui file browser interfaces and allow you to specify sftp locations:

FileZilla: https://filezilla-project.org/
WinSCP: https://winscp.net/eng/index.php

But on windows, I personally prefer the following option using Swish, which allows you to add support for sftp connections to the normal Windows file explorer.

Instructions: https://www.howtogeek.com/165893/how-to-integrate-a-remote-sftp-directory-into-windows-explorer/
The link in that set of instructions appears to be dead. Here is the correct location to download Swish from: https://sourceforge.net/projects/swish/

Some notes on setting up Swish:

The installer is a standard installer; accept the license agreement and install it in the default location.
Once installed, if you go to "This PC" in the Windows Explorer, there will be a Swish device. Double click on this.
From here you can "Add SFTP Connection". You will get a dialog. Use cheetah for the host, your username, and specify /home/username for the path.
Once created, you can then log in. It will prompt for your password the first time you open the connection.
This allows you to move files easily from/to cheetah. And you can even use an editor like notepad directly to edit files in the file explorer.

Editing Files over sftp on Windows

You can use the above instructions to browse files over sftp, and to download and upload them through the standard windows browser. However, you cannot open up files directly in an editor on your local machine to edit them using this method.

One recommended method if you want to use an editor on your local machine is to use notepad++: https://notepad-plus-plus.org/download/v7.5.4.html

Once you have the notepad++ editor installed and running, you need to add the NppFTP plugin to the editor.

Download site for NppFTP plugin: https://sourceforge.net/projects/nppftp/
Instructions for installing and using plugin to edit files over sftp: https://blog.sleeplessbeastie.eu/2015/07/27/how-to-edit-files-using-notepad-plus-plus-over-ssh-file-transfer-protocol/
