Coherence-based Approximate Derivatives via Web of Affine Spaces Optimization

Yale University
*Equal Contribution

Abstract

Computing derivatives is a crucial subroutine in computer science and related fields as it provides a local characterization of a function's steepest directions of ascent or descent. In this work, we recognize that derivatives are often not computed in isolation; rather, it is quite common to compute a sequence of derivatives, each one somewhat related to the last. Thus, we propose accelerating derivative computation by reusing information from previous, related calculations, a general strategy known as coherence. We introduce the first instantiation of this strategy through a novel approach called the Web of Affine Spaces (WASP) Optimization. This approach provides an accurate approximation of a function's derivative object (i.e., gradient, Jacobian matrix, etc.) at the current input within a sequence. Each derivative within the sequence requires only a small number of forward passes through the function (typically two), regardless of the number of function inputs and outputs. We demonstrate the efficacy of our approach through several numerical experiments, comparing it with alternative derivative computation methods on benchmark functions. We show that our method significantly improves the performance of derivative computation on small to medium-sized functions, i.e., functions with approximately fewer than 500 combined inputs and outputs. Furthermore, we show that this method can be effectively applied in a robotics optimization context. We conclude with a discussion of the limitations and implications of our work.

Method

Step 1

We have a new input at which the derivative needs to be computed.


fn main() {
    // A scalar function whose derivative we want at the new input.
    let f = |x: f64| x * x;

    let x: f64 = 2.0; // the new input in the sequence
    let h: f64 = 1e-6; // finite-difference step size

    // One extra forward pass gives a finite-difference estimate:
    // f'(x) ≈ (f(x + h) - f(x)) / h.
    let derivative = (f(x + h) - f(x)) / h;
    println!("Derivative: {}", derivative); // ≈ 4.0
}

Step 2

Select an affine space in which to compute a single Jacobian-vector product (JVP).
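A JVP can be estimated with a single extra forward pass through the function. As a minimal sketch (the function `f`, the direction `v`, and the step size `h` are illustrative stand-ins, not the paper's API), one forward-difference approximation is J(x) v ≈ (f(x + h v) − f(x)) / h:

```rust
// f: R^2 -> R^2, an illustrative example function.
fn f(x: &[f64]) -> Vec<f64> {
    vec![x[0] * x[0] + x[1], x[0] * x[1]]
}

// One extra forward pass approximates the Jacobian-vector product:
// J(x) v ≈ (f(x + h v) - f(x)) / h.
fn jvp(f: impl Fn(&[f64]) -> Vec<f64>, x: &[f64], v: &[f64], h: f64) -> Vec<f64> {
    let xp: Vec<f64> = x.iter().zip(v).map(|(xi, vi)| xi + h * vi).collect();
    let fx = f(x);
    let fxp = f(&xp);
    fxp.iter().zip(&fx).map(|(a, b)| (a - b) / h).collect()
}

fn main() {
    let x = [2.0, 3.0];
    let v = [1.0, 0.0]; // tangent direction for this step
    let t = jvp(f, &x, &v, 1e-6);
    println!("JVP ≈ {:?}", t); // analytic J v is [4.0, 3.0]
}
```

Forward-mode automatic differentiation would give the same product without truncation error; the finite-difference form is used here only to keep the sketch self-contained.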

Step 3

Compute the JVP. The true derivative lies somewhere in this affine space.
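The geometric picture in this step can be written out: a single JVP pins the unknown derivative matrix down to an affine subspace of matrix space. A sketch of that constraint (the symbols v, t, m, and n are illustrative, not taken from the paper's notation):

```latex
% One JVP t = J(x) v constrains the unknown Jacobian to an affine space:
\[
  t = J(x)\,v
  \quad\Longrightarrow\quad
  J(x) \in \mathcal{A}(v, t) = \{\, A \in \mathbb{R}^{m \times n} : A v = t \,\}.
\]
% Each of the m rows of A satisfies one linear equation, so
% \mathcal{A}(v, t) is an affine subspace of dimension m(n - 1);
% the true Jacobian lies in it, up to numerical error in the JVP itself.
```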

Step 4

Perform constrained optimization to find the point on the current affine space that is closest to the other affine spaces in the web.
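The paper's optimization couples many affine spaces at once; as a hedged, single-space analogue, the closest point (in Frobenius norm) on { A : A v = t } to a previous estimate B has the closed form A = B + (t − B v) vᵀ / (vᵀ v), a Broyden-style rank-1 correction. The function below is an illustrative sketch, not the paper's algorithm:

```rust
// Project a previous Jacobian estimate `b` onto the affine space
// { A : A v = t }, i.e. the closest matrix to `b` consistent with one JVP:
//   A = B + (t - B v) v^T / (v^T v).
fn project(b: &[Vec<f64>], v: &[f64], t: &[f64]) -> Vec<Vec<f64>> {
    let vv: f64 = v.iter().map(|x| x * x).sum();
    b.iter()
        .zip(t)
        .map(|(row, ti)| {
            let bv: f64 = row.iter().zip(v).map(|(a, c)| a * c).sum();
            let coef = (ti - bv) / vv;
            row.iter().zip(v).map(|(a, vi)| a + coef * vi).collect()
        })
        .collect()
}

fn main() {
    let b = vec![vec![0.0, 0.0], vec![0.0, 0.0]]; // previous estimate
    let v = [1.0, 0.0];
    let t = [4.0, 3.0]; // measured JVP
    let a = project(&b, &v, &t);
    // `a` now satisfies a * v = t exactly.
    println!("{:?}", a); // [[4.0, 0.0], [3.0, 0.0]]
}
```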

Last step

Update the entire F matrix.

The next derivative

We can repeat the above steps to compute the next derivative.

And the next

And the next

Error correction

When our error detector signals a likely high error, we repeat the whole process with an additional JVP.
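One simple way to sketch such a detector (the function name, residual test, and threshold are illustrative assumptions, not the paper's detector): check the current Jacobian estimate against one freshly measured JVP and flag a large residual.

```rust
// Flag when the current Jacobian estimate `est` disagrees with a freshly
// measured JVP `fresh` in direction `v` by more than `tol`.
// Names and threshold are illustrative, not the paper's detector.
fn needs_correction(est: &[Vec<f64>], v: &[f64], fresh: &[f64], tol: f64) -> bool {
    // Predicted JVP from the current estimate: est * v.
    let pred: Vec<f64> = est
        .iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum())
        .collect();
    // Euclidean norm of the residual between predicted and measured JVPs.
    let err: f64 = pred
        .iter()
        .zip(fresh)
        .map(|(p, q)| (p - q) * (p - q))
        .sum::<f64>()
        .sqrt();
    err > tol
}

fn main() {
    let est = vec![vec![4.0, 1.0], vec![3.0, 2.0]]; // current estimate
    let v = [0.0, 1.0];
    let fresh = [1.0, 2.0]; // newly measured J v
    println!("{}", needs_correction(&est, &v, &fresh, 1e-6)); // prints false
}
```

When the flag is raised, the method spends one more JVP (another forward pass) and repeats the projection steps above.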

Results

Evaluation 1

Our method significantly improves the performance of derivative computation on small to medium-sized functions, i.e., functions with approximately fewer than 500 combined inputs and outputs.


Evaluation 2

Furthermore, we show that this method can be effectively applied in a robotics optimization context.

BibTeX