Computer arithmetic for DNN acceleration, or how to compute right, with errors, and fast
Numerical algorithms rarely use exact arithmetic to perform their computations; instead, they employ more efficient floating-point or fixed-point representations. These finite-precision numerical formats introduce computational errors that affect the result. On the other hand, the choice of precision influences the performance (latency, memory usage) of the implemented algorithm.
In search of a sweet spot between accuracy and efficiency, we first establish the relation between precision and accuracy for a given algorithm, and then optimise the arithmetic parameters (data formats, function approximations, hardware operators) so that the accuracy requirements are satisfied at minimal cost. Ideally, this process should be automatic and applicable to large-scale systems.
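As a hedged illustration of this precision/accuracy trade-off (this is our own sketch, not code from the talk): rounding the inputs of a dot product to a narrower floating-point format saves memory and bandwidth, but introduces a representation error that precision tuning must keep within the accuracy budget.

```python
import numpy as np

# Illustrative sketch: the same dot product with inputs rounded to
# narrower formats. Evaluating in float64 after rounding isolates the
# representation error contributed by each format.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
w = rng.standard_normal(4096)

exact = np.dot(x, w)  # float64 reference

err32 = abs(np.dot(x.astype(np.float32).astype(np.float64),
                   w.astype(np.float32).astype(np.float64)) - exact)
err16 = abs(np.dot(x.astype(np.float16).astype(np.float64),
                   w.astype(np.float16).astype(np.float64)) - exact)

print(f"float32 representation error: {err32:.3e}")
print(f"float16 representation error: {err16:.3e}")
```

The float16 error is markedly larger than the float32 one; the tuning problem is to pick, per tensor, the cheapest format whose error still meets the end-to-end accuracy requirement.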
In this talk we showcase a range of tools and techniques for the arithmetic optimisation of DNN inference. We act on three levels:
- Automatic analysis of numerical quality of a DNN model and establishing the minimal accuracy requirement for each layer of a network;
- Building custom low-precision approximations for activation functions with guaranteed accuracy;
- Building custom hardware operators for dot products and employing the power of Integer Linear Programming to optimise resource usage.
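The second bullet can be sketched as follows (our own simplified illustration; the function, range, degree, and error check are assumptions, and an empirical maximum over a grid stands in for the formally guaranteed bound the talk refers to): approximate an activation function by a low-degree polynomial on a bounded input range and measure the worst observed error.

```python
import numpy as np

def build_approx(f, lo, hi, degree, samples=10001):
    """Fit a degree-`degree` Chebyshev approximation to f on [lo, hi]
    and report the maximum error observed on a dense sample grid."""
    xs = np.linspace(lo, hi, samples)
    approx = np.polynomial.Chebyshev.fit(xs, f(xs), degree)
    max_err = float(np.max(np.abs(approx(xs) - f(xs))))
    return approx, max_err

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical setting: sigmoid on [-8, 8], degree-7 approximation.
approx, max_err = build_approx(sigmoid, -8.0, 8.0, degree=7)
print(f"degree-7 max observed error on [-8, 8]: {max_err:.2e}")
```

Raising the degree shrinks the error, so the design question is the cheapest degree (and coefficient precision) that still meets the layer's accuracy requirement established by the analysis step.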