chainer.grad
- chainer.grad(outputs, inputs, grad_outputs=None, grad_inputs=None, set_grad=False, retain_grad=False, enable_double_backprop=False, loss_scale=None)
Computes the gradient of output variables w.r.t. the input variables.
This function implements the backpropagation algorithm. While Variable.backward() also implements backprop, this function selects the smallest paths in the computational graph needed to compute the gradients w.r.t. the inputs. The error is backpropagated only through these selected paths, which may reduce the overall computational cost.

This function also differs from Variable.backward() in how it returns the gradients: it directly returns the gradient variables as a list instead of setting them to the Variable.grad_var attribute of each original variable. This means users do not need to clear the gradient w.r.t. each variable before computing the gradient with this function. If the set_grad option is set to True, the computed gradient is also stored in the Variable.grad_var attribute of each variable, in which case any original value of Variable.grad_var is overwritten even if it had already been set.
- Parameters
- outputs (tuple or list of Variable) – A sequence of output variables from which backprop starts.
- inputs (tuple or list of Variable) – A sequence of input variables, each of which this function computes the gradient w.r.t.
- grad_outputs (tuple or list of Variable or None) – A sequence of variables that gives the initial value of each output gradient. If an element is None, an array filled with 1 is used. If this argument itself is None, it is treated as a sequence of Nones.
- grad_inputs (tuple or list of Variable or None) – A sequence of variables that gives the initial value of each input gradient. The gradients computed by the backprop algorithm are accumulated to them (not in-place). If an element is None, the gradient is not accumulated to this value. If this argument itself is None, it is treated as a sequence of Nones.
- set_grad (bool) – If True, the Variable.grad_var attribute of each input variable is set to the corresponding computed gradient variable.
- retain_grad (bool) – If True, the gradients w.r.t. all the intermediate variables are stored in the Variable.grad_var attribute. In this case, the set_grad option is ignored.
- enable_double_backprop (bool) – If True, the computed gradients can be further backpropagated. Enabling this may increase memory consumption (and possibly computation time), because the intermediate gradient values are retained for the second backpropagation.
- loss_scale (float) – Loss scaling factor. Loss scaling is a useful technique to mitigate the vanishing-gradient issue that tends to occur when a low-precision data type such as float16 is used during training. If you set a loss scaling factor, the gradients of loss values are multiplied by the factor before backprop starts, and the factor is propagated to all gradients in the computational graph along the backprop. The gradients of parameters are divided by the factor just before the parameters are updated.
- Returns
A list of gradient variables w.r.t. the inputs.