This project is a front-end for the triNNity DNN optimization approach.
Build and Installation
The currently supported optimization modes are:

heuristic mode has a number of heuristics which can be used to generate
an instantiation of a network. To see a list of these heuristics, say
triNNity-optimizer heuristic --help
pbqp mode uses the topology of the network and the provided cost models
to generate a large PBQP query, suitable for solving with an external PBQP solver.
The result of the optimization process is a solution file, which specifies what algorithm to use for each convolution in the network.
For convenience, the optimizer can apply that solution to generate the code for an instantiation of the network. At present, only the triNNity primitive library is supported, but support for other frameworks can be added fairly easily.
For more information, say
triNNity-optimizer instantiate --help.
User Interface and Input Files
The optimizer uses several simple plaintext file formats to represent the necessary features of networks and of the hardware.
Network Topology Description File
This file describes the topology of the network, in terms of the layers. At present, we only model convolution layers because all other layers are inconsequential in terms of performance. This may change in future.
The first line of the file must contain three whitespace-separated integers: the
number of nodes (N), the number of parameters (P) which define a node, and the number
of edges (E). For example, a first line of
5 7 4 indicates N=5 nodes, each defined by P=7 parameters, and E=4 edges.
The following N lines must contain P whitespace-separated integers, defining each node. The subsequent E lines must contain exactly 2 whitespace-separated integers, giving the source and sink node indices for the edge.
While nodes and edges in the DNN graph may be specified in any order, they are referred to by their index in the order specified in the topology file.
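As a sketch, a topology file matching the 5 7 4 header above might look like the following. The seven parameter values per node here are placeholders, not the actual parameter order used by the tool; consult the shipped example topologies (e.g. examples/networks/alexnet/alexnet.topology) for the real format.

```
5 7 4
3 224 224 11 11 4 96
96 55 55 5 5 1 256
256 27 27 3 3 1 384
384 13 13 3 3 1 384
384 13 13 3 3 1 256
0 1
1 2
2 3
3 4
```

The first line declares five nodes of seven parameters each and four edges; the five node lines follow, then the four edge lines, referring to nodes by their zero-based position in the file.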
Network Constraints Description File
The constraints file describes constraints on network nodes that must hold irrespective of the solving method. For example, if some node must accept CHW input, or some node must be implemented by a specific algorithm, et cetera, this is the place to record that information.
The file should consist of N comma-and-whitespace separated lines. Each line should consist of an integer, naming the graph node to which the constraint applies, and three constraint fields, which are strings. The first field is the input data layout, the second is the algorithm name to use, and the third is the output data layout. You can use an asterisk to indicate a wild-card field.
For example, if the second convolution in our network should be implemented
by the frobnicate algorithm, the constraints file would contain a line like:
1, *, frobnicate, *
If the seventh convolution should produce output in HCW format (perhaps because of a special nonlinearity, or another operation requiring this layout), the constraints file would contain a line like:
6, *, *, hcw
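Putting these together, a complete constraints file for a hypothetical seven-node network, fully wild-carded apart from the two constraints above, might read:

```
0, *, *, *
1, *, frobnicate, *
2, *, *, *
3, *, *, *
4, *, *, *
5, *, *, *
6, *, *, hcw
```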
Algorithm and Data Layout Description File
This file defines the set of algorithms available to implement the graph nodes. Each line has three comma-separated fields: algorithm name, input layout identifier, and output layout identifier. This file is the source of all information about what algorithms and data layouts exist; when you name a data layout or algorithm in the network constraints file, it is the algorithm names and layout names in this file that you are matching against.
This file typically only needs to be written once for each backend library that you would like to connect to the optimizer. It is a static descriptor of the algorithms contained in that library.
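As an illustration, a descriptor file might contain lines like the following. These algorithm and layout names are invented for the example and are not the actual triNNity identifiers.

```
direct-sum2d, chw, chw
im2col-gemm, chw, chw
winograd-3x3, hwc, hwc
```

Each line names one available algorithm together with the data layout it consumes and the layout it produces.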
Both the heuristic and solver-based optimization modes produce a solution which specifies how the network should be configured. This solution file has a single integer per line, specifying in order, for each graph node listed in the topology, the index of the algorithm to use to implement that node. Each index returned is the algorithm index in the Algorithm and Data Layout Description File.
Solver-Specific Files
For solver-based optimization, cost model data must be provided so that the optimizer can make decisions based on real benchmark measurements from the hardware where you intend to run your network. Two cost model files must be produced: one for nodes (convolutions) and one for edges (data layout transformations).
Algorithm Cost Model
The convolution algorithm cost file has a straightforward layout. Each column corresponds to one set of convolution parameters which has been benchmarked. Each row corresponds to a convolution algorithm.
The first P rows are special: they must list, in order, the parameters which define the convolution being benchmarked. These are the same parameters from the network topology file.
The first column is also special: it must contain the name of the algorithm being benchmarked, or, in the first P rows, the name of the convolution parameter on that row.
All other fields should contain floating point numbers. The value zero is
specially treated to mean that the algorithm in question cannot be used for
a convolution with the specified parameters. For example, Winograd convolution
cannot be used with
k == 1.
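As a sketch, assuming comma-separated fields (as the .csv extension of the shipped cost models suggests) and with made-up parameter names and timings, a cost file with P=2 parameters and two benchmarked convolutions might look like:

```
channels, 3, 64
kernel, 3, 1
direct-sum2d, 4.2, 1.1
winograd-3x3, 2.9, 0
```

Note the zero in the last row: the second benchmarked convolution has kernel size 1, so the Winograd algorithm is marked unusable for it, as described above.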
Data Layout Transformation Cost Model
The cost file for data layout transformations is also straightforward. Each column
corresponds to the edge with the same index in the network topology file. There
are only 3 special rows at the start, for the number of input channels, input
width, and input height of the graph edge. The first column is again special,
containing the name of the transformation.
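Following the same column-per-edge layout as the algorithm cost file, a sketch with invented transformation names, edge dimensions, and timings might be:

```
channels, 96, 256
width, 55, 27
height, 55, 27
chw-to-hwc, 0.3, 0.1
hwc-to-chw, 0.3, 0.1
```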
It is often not possible to microbenchmark every possible scenario in a large project. For example, to produce a full fine-grained cost model for ResNet-152 using the triNNity library as the back-end, 152 * 71 = 10792 microbenchmarks would have to be run. If each benchmark took one minute, that's almost 180 hours of benchmarking!
The optimizer supports inferred costs, and can optimize networks with convolutions with unseen parameters. When a convolution does not appear in microbenchmarking data, the optimizer will compute an approximate cost by interpolation (currently using the L2 norm) from similar convolutions that do appear in the cost model.
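The source does not specify the interpolation scheme beyond its use of the L2 norm, but the idea can be sketched as a nearest-neighbour lookup over the parameter vectors. The parameter choices and cost values below are invented for illustration.

```python
import math

def interpolate_cost(params, cost_model):
    """Return the benchmarked cost for `params` if present; otherwise
    borrow the cost of the nearest benchmarked parameter vector under
    the L2 (Euclidean) norm."""
    if params in cost_model:
        return cost_model[params]
    nearest = min(cost_model, key=lambda p: math.dist(p, params))
    return cost_model[nearest]

# Hypothetical cost model: (channels, width, height) -> cost in milliseconds.
cost_model = {
    (3, 224, 224): 4.2,
    (64, 56, 56): 1.1,
    (128, 28, 28): 0.7,
}

# (64, 58, 58) was never benchmarked; its nearest neighbour is (64, 56, 56).
print(interpolate_cost((64, 58, 58), cost_model))  # -> 1.1
```

A production implementation might instead blend the costs of several nearby convolutions weighted by distance, but the nearest-neighbour form captures the essential idea.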
This tool ships with example network descriptions and cost model data to allow you to quickly test integration with your toolchain without having to first produce your entire benchmarking data set.
triNNity-optimizer heuristic -x LOCAL_OPTIMAL \
  -t examples/networks/alexnet/alexnet.topology \
  -a examples/libraries/triNNity/triNNity-algorithms.csv \
  -n examples/cost-models/Intel\(R\)\ Core\(TM\)\ i5-2500K\ CPU\ @\ 3.30GHz/hcy.csv
This command runs the optimizer in heuristic mode, selecting the
LOCAL_OPTIMAL heuristic. We pass in the AlexNet topology description, the
algorithm names and data layout descriptions for the triNNity library
(triNNity-algorithms.csv), and finally the cost
model to use, generated with microbenchmarking. The output (on standard output)
should be as follows:
0
50
44
44
44
The heuristic suggests we use algorithms with index 0, 50, 44, 44, and 44 for
the respective layers of AlexNet. Looking at
examples/libraries/triNNity/triNNity-algorithms.csv, we can see that these
indices correspond to concrete algorithms in the triNNity library, such as
winograd-2x2-3x3-vec-4.
This output is in the "solution" file format described earlier, and can be redirected to a file to save for later use.
list-heuristics can be used to show a list of the available heuristics.
triNNity-optimizer pbqp -t examples/networks/alexnet/alexnet.topology \
  -a examples/libraries/triNNity/triNNity-algorithms.csv \
  -n examples/cost-models/Intel\(R\)\ Core\(TM\)\ i5-2500K\ CPU\ @\ 3.30GHz/hcy.csv \
  -e examples/cost-models/Intel\(R\)\ Core\(TM\)\ i5-2500K\ CPU\ @\ 3.30GHz/hcy-transforms.csv \
  -s examples/networks/alexnet/alexnet.constraints
PBQP mode requires the network constraints file and the edge costs file in addition to the node costs. The optimizer traverses the graph structure of the network building a query that covers all possible instantiations of the network under the supplied constraints and with the supplied costs.
The output (on standard output) is a plaintext query prepared for input to a PBQP solver. We don't reproduce the output here because the PBQP query is quite large, but it can be viewed by copying and pasting the command given into your terminal. This query can be redirected to a file for later use, or piped directly into the input of your preferred PBQP solver, if it supports reading from standard input.
The PBQP solver should process the query and give back a solution that looks just like the one in the heuristic example above (though probably with different methods selected, and some solvers append metadata, such as projected total costs, that can be ignored). The exact means of producing the solution file will vary from solver to solver, and may require the user to do a little plumbing work on the command line.
triNNity-optimizer instantiate -t examples/networks/alexnet/alexnet.topology \
  -a examples/libraries/triNNity/triNNity-algorithms.csv \
  -m examples/libraries/triNNity/triNNity-methods.csv \
  -s /tmp/test.solution \
  -i TRINNITY \
  -l examples/networks/alexnet/alexnet.layers
Instantiate mode is a very simple tool that turns a solution into a more useful
format: for example, a header file which contains
#define statements that
specify the algorithm to use for each layer using enum values from a library.
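The exact names emitted depend on the backend; as a purely illustrative sketch (these macro and enum names are invented, not the actual TRINNITY backend output), such a header might look like:

```c
/* Illustrative only: macro and enum names are hypothetical. */
#define LAYER_CONV1_METHOD TRINNITY_DIRECT_SUM2D
#define LAYER_CONV2_METHOD TRINNITY_IM2COL_GEMM
#define LAYER_CONV3_METHOD TRINNITY_WINOGRAD_2X2_3X3
```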
The example shown uses the backend
-i TRINNITY, which generates a header file
suitable for use with the example networks which ship with the triNNity
primitive library.
The auxiliary file
alexnet.layers contains the mapping from the layers as they
are specified in the topology file to logical identifiers that can be used when
emitting code (for backend frameworks that identify layers with a string key).