triNNity is a C++ library implementing a collection of CNN primitives. Supported platforms include
How do I get set up?
All you need is a C++ compiler. The library has been tested with
A PKGBUILD file is included for Arch Linux. To build the package, run `make package`. You can then install the library system-wide with `pacman -U`, passing the generated package file.
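On Arch Linux, the build-and-install steps above might look like the following. The exact package filename is produced by the build, so the wildcard shown here is only a placeholder:

```
# Build the Arch package from the included PKGBUILD
make package

# Install the generated package system-wide
# (placeholder name: use the actual package file the build produced)
sudo pacman -U triNNity-*.pkg.tar.*
```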
Please see the triNNity-demos project for examples of several popular CNNs.
Each module in the library exports both a low-level and a high-level interface. The low-level interface exposes the library operations via template functions, while the high-level interface exposes the library operations as
Layer objects. The principal difference is that the low-level interface does no memory management at all, while the high-level interface will manage your kernel buffers for you.
To use either interface, simply
#include the relevant header file - for the low-level interface this is
<triNNity/module/mcmk.h>, and for the high-level interface it is
<triNNity/module/layer.h>. You can use more than one module at once - for example, if you are building a mixed dense-sparse network on CPU using the high-level interface, simply do:
```cpp
#include <triNNity/dense/cpu/layer.h>
#include <triNNity/sparse/cpu/layer.h>
```
We use namespaces to logically group all of the library operations. For example, to use the low-level interface for the dense/cpu module, say
`using namespace triNNity::dense::cpu`. To use the high-level interface, say
`using namespace triNNity::layer`. You can of course use both interfaces at once, if you would like to use high-level code to implement some parts of your network but need the precision of the low-level interface in other parts.
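Putting the pieces together, a file that mixes both interfaces for the dense/cpu module might begin as follows. This is a sketch only: it assumes triNNity is installed, the low-level header path follows the `<triNNity/module/mcmk.h>` pattern described above, and no Layer construction is shown since that depends on your network:

```cpp
// Sketch: pulling in both interfaces for the dense/cpu module.
#include <triNNity/dense/cpu/mcmk.h>   // low-level template functions
#include <triNNity/dense/cpu/layer.h>  // high-level Layer objects

// Low-level interface: no memory management is done for you.
using namespace triNNity::dense::cpu;

// High-level interface: Layer objects manage kernel buffers.
using namespace triNNity::layer;
```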
Using a GPU/Accelerator
We support the use of GPUs and other accelerator devices. When using such devices, the issue of offload becomes important: to maximize performance, the copying of data between the host and the device usually needs to be minimized.
We support two modes for offload of data to accelerators. The first mode simply pushes and pulls arrays to and from the device whenever necessary. This is useful when prototyping an application, because no code changes are needed for the CPU version to make use of the accelerator. You can use this mode by creating
Layer objects from a cpu module (e.g.
dense/cpu), while specifying that the work should be done on the accelerator.
The second mode assumes that your data lives on the accelerator device. For this mode, simply switch from using a cpu module to the corresponding module for your accelerator.
Layer objects from accelerator modules presume that their inputs and outputs are stored in device memory, and allocate all their intermediate buffers in device memory.