DL4J Feedforward Learner (legacy)

This node is deprecated. It is kept for backwards compatibility, but using it in new workflows is no longer recommended. The documentation below might contain more information.
This node trains the network configuration specified by the input Deep Learning Model. The model can be trained either supervised or unsupervised, using training methods such as Stochastic Gradient Descent. The node automatically adds the output layer of the network, which can be configured in the node dialog. It also provides further options for regularization, gradient normalization and learning refinements. For training, the inputs are automatically converted into a vector format the network can process. The input model can be handled in two ways. If the supplied model is untrained, the learner trains it from scratch. If the model was already trained by a previous learner, the node tries to initialise the parameters of the new network with the parameters of the trained model. It can only try, because the network configuration may have changed between learner nodes. This makes it possible to implement methods such as Transfer Learning. The output of the node is a trained Deep Learning Model containing the original configuration and the tuned network weights and biases.
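The exact rule the node uses to decide which trained parameters can be reused is not documented here. The following Python sketch assumes a simple, hypothetical rule: a parameter is copied from the trained model if a parameter with the same name and the same shape exists in the new configuration, and everything else is initialised from scratch. The names `init_from_trained`, `zeros`, `layer0_W` and `out_W` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, int]
# Hypothetical representation: parameter name -> (shape, flat list of values)
Params = Dict[str, Tuple[Shape, List[float]]]


def init_from_trained(new_shapes: Dict[str, Shape], trained: Params,
                      fresh: Callable[[Shape], List[float]]) -> Params:
    """Reuse trained parameters whose name and shape still match (assumed rule);
    initialise all other parameters with the `fresh` initialiser."""
    result: Params = {}
    for name, shape in new_shapes.items():
        if name in trained and trained[name][0] == shape:
            # Same layer, same shape: start from the trained values (copied).
            result[name] = (shape, list(trained[name][1]))
        else:
            # Configuration changed for this layer: start from new values.
            result[name] = (shape, fresh(shape))
    return result


def zeros(shape: Shape) -> List[float]:
    """Stand-in initialiser; a real network would use a random strategy."""
    return [0.0] * (shape[0] * shape[1])


trained = {"layer0_W": ((2, 3), [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
           "out_W": ((3, 2), [1.0] * 6)}
# The new configuration keeps layer0 but changes the output layer to 4 units.
new_shapes = {"layer0_W": (2, 3), "out_W": (3, 4)}
params = init_from_trained(new_shapes, trained, zeros)
print(params["layer0_W"][1])  # reused from the trained model
print(params["out_W"][1])     # re-initialised because the shape changed
```

This is the basic idea behind transfer learning with this node: layers whose configuration is unchanged keep what they have learned, while changed layers (typically the output layer) are trained anew.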

The KNIME Deeplearning4J Integration has been marked as legacy with KNIME Analytics Platform 5.0 and will be deprecated in a future version. If you are using this extension in a production workflow, consider switching to one of the other deep learning integrations available in KNIME Analytics Platform.


Learning Parameters

Training Mode
Whether to do supervised or unsupervised training.
  • SUPERVISED - label column needs to be specified
  • UNSUPERVISED - label column can be omitted
Use Seed
Whether to use a seed value for training, which makes different learning runs comparable: if the same seed is used and the configuration has not changed, the results will be the same between learning runs.
The seed value to use. Any integer may be used.
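Why a seed makes runs comparable can be shown with a minimal Python sketch, assuming that randomness in training (here only the initial weights) is drawn from a generator created from the seed. The function `init_weights` is hypothetical and does not reflect how the node draws its random numbers internally.

```python
import random
from typing import List


def init_weights(n: int, seed: int) -> List[float]:
    """Draw n initial weights from a dedicated generator created from `seed`."""
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return [rng.gauss(0.0, 0.1) for _ in range(n)]


run_a = init_weights(5, seed=42)
run_b = init_weights(5, seed=42)
run_c = init_weights(5, seed=7)
print(run_a == run_b)  # same seed and configuration: identical start values
print(run_a == run_c)  # different seed: different start values
```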
Number of Training Iterations
The number of parameter updates that will be done on one batch of input data.
Optimization Algorithm
The type of optimization method to use. The following algorithms are available:
  • LINE_GRADIENT_DESCENT - normal gradient descent
  • STOCHASTIC_GRADIENT_DESCENT - gradient descent using minibatches
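The difference between using the whole data set per update and using minibatches can be sketched in Python on a one-parameter model y = w * x with a mean squared error loss. This is a simplified illustration only: the full-batch function uses a fixed learning rate, whereas the library's LINE_GRADIENT_DESCENT may choose its step size differently. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]


def grad(w: float, batch: Data) -> float:
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def full_batch_gd(data: Data, lr: float, steps: int) -> float:
    """Every update uses all examples."""
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


def minibatch_sgd(data: Data, lr: float, epochs: int, batch_size: int,
                  seed: int = 0) -> float:
    """Every update uses one minibatch; the data is reshuffled each epoch."""
    rng = random.Random(seed)
    order = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            w -= lr * grad(w, order[i:i + batch_size])
    return w


data = [(k / 10, 3.0 * k / 10) for k in range(1, 21)]  # noise-free, true w = 3
print(full_batch_gd(data, lr=0.1, steps=200))
print(minibatch_sgd(data, lr=0.1, epochs=50, batch_size=4))
```

Both variants approach w = 3 here; the minibatch variant performs more, cheaper updates per pass over the data, which is why it scales better to large tables.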
Do Backpropagation
Whether to do backpropagation. If this option is chosen, the learner performs supervised training using the specified techniques and hyperparameters.
Do Pretraining
Whether to do pretraining. If this option is chosen, the learner performs unsupervised pretraining (Contrastive Divergence) of the network parameters. This option is only applicable to Restricted Boltzmann Machines and Autoencoders.
Do Finetuning
Whether to do finetuning. If this option is chosen, the learner performs supervised finetuning of the network parameters.
Use Pretrained Updater
Whether to use the pretrained updater of a trained model. Some updaters keep a history of previous gradients, so you can choose whether the supplied updater should be reused or a new one should be created. This option only takes effect if the Deep Learning Model supplied at the input was previously trained and contains a saved updater.
Updater Type
The type of updater to use. The updater specifies how the raw gradients are modified. If a pretrained updater is used, this option is ignored. The following methods are available:
  • SGD
  • ADAM
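The following Python sketch contrasts the two updaters using the standard textbook update rules. Plain SGD only scales the gradient, while ADAM keeps running averages of past gradients and squared gradients; this per-parameter history is what a pretrained updater carries over between learner runs. The hyperparameter defaults below are the values from the original Adam paper and are assumptions, not necessarily the defaults used by the node.

```python
import math
from dataclasses import dataclass
from typing import List


def sgd_step(w: List[float], g: List[float], lr: float) -> List[float]:
    """Plain SGD: stateless, the raw gradient is only scaled by the learning rate."""
    return [wi - lr * gi for wi, gi in zip(w, g)]


@dataclass
class AdamState:
    m: List[float]  # running mean of gradients (first moment)
    v: List[float]  # running mean of squared gradients (second moment)
    t: int = 0      # number of updates performed so far


def adam_step(w: List[float], g: List[float], state: AdamState, lr: float = 0.001,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> List[float]:
    """One Adam update; `state` is modified in place and holds the gradient history."""
    state.t += 1
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, g)):
        state.m[i] = beta1 * state.m[i] + (1 - beta1) * gi
        state.v[i] = beta2 * state.v[i] + (1 - beta2) * gi * gi
        m_hat = state.m[i] / (1 - beta1 ** state.t)  # bias correction
        v_hat = state.v[i] / (1 - beta2 ** state.t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_w


w, g = [1.0, -2.0], [0.5, -4.0]
print(sgd_step(w, g, lr=0.1))  # approximately [0.95, -1.6]
state = AdamState(m=[0.0, 0.0], v=[0.0, 0.0])
print(adam_step(w, g, state))  # first Adam step moves each weight by about lr
```

Note how the first Adam step moves both weights by roughly the same amount even though their gradients differ by a factor of 8; SGD moves them proportionally to the gradient.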
Use Regularization
Whether to use regularization techniques to prevent overfitting.
L1 Regularization Coefficient
Strength of L1 regularization.
L2 Regularization Coefficient
Strength of L2 regularization.
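A minimal Python sketch of how the two coefficients act, assuming the common convention of adding l1 * sum(|w|) + 0.5 * l2 * sum(w^2) to the loss; the node's exact scaling of the L2 term may differ. The penalty adds l1 * sign(w) + l2 * w to each weight's gradient, pulling weights towards zero. The function name is hypothetical.

```python
from typing import List, Tuple


def _sign(x: float) -> int:
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)


def regularized_loss_and_grad(data_loss: float, data_grad: List[float], w: List[float],
                              l1: float, l2: float) -> Tuple[float, List[float]]:
    """Add L1 and L2 penalties (assumed convention) to a loss and its gradient."""
    penalty = l1 * sum(abs(wi) for wi in w) + 0.5 * l2 * sum(wi * wi for wi in w)
    grad = [g + l1 * _sign(wi) + l2 * wi for g, wi in zip(data_grad, w)]
    return data_loss + penalty, grad


loss, grad = regularized_loss_and_grad(1.0, [0.0, 0.0, 0.0], [2.0, -1.0, 0.0],
                                       l1=0.1, l2=0.01)
print(loss, grad)  # approximately 1.325 and [0.12, -0.11, 0.0]
```

L1 pushes every nonzero weight by the same amount and therefore tends to produce exact zeros; L2 pushes proportionally to the weight and mainly shrinks large weights.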
Use Gradient Normalization
Whether to use gradient normalization.
Gradient Normalization Strategy
Gradient normalization strategy to use. The strategy is applied to the raw gradients before they are passed to the updater. An explanation can be found at:
  • RenormalizeL2PerLayer
  • RenormalizeL2PerParamType
  • ClipElementWiseAbsoluteValue
  • ClipL2PerLayer
  • ClipL2PerParamType
Gradient Normalization Threshold
Threshold value for gradient normalization.
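The strategies above can be sketched in Python, modelled on how Deeplearning4j describes them: the Renormalize strategies divide gradients by their L2 norm (and ignore the threshold), ClipElementWiseAbsoluteValue clips each value to [-threshold, threshold], and the ClipL2 strategies rescale only if the L2 norm exceeds the threshold. "PerLayer" uses one norm over all gradients of a layer, "PerParamType" one norm per parameter type. Representing a layer's gradients as a dict such as {"W": ..., "b": ...} is an assumption for illustration.

```python
import math
from typing import Dict, List

Grads = Dict[str, List[float]]  # parameter type (e.g. "W", "b") -> flat gradient values


def _l2(values: List[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def _scale(grads: Grads, factor: float) -> Grads:
    return {k: [v * factor for v in g] for k, g in grads.items()}


def renormalize_l2_per_layer(grads: Grads) -> Grads:
    norm = _l2([v for g in grads.values() for v in g])  # one norm for the whole layer
    return _scale(grads, 1.0 / norm) if norm > 0 else _scale(grads, 1.0)


def renormalize_l2_per_param_type(grads: Grads) -> Grads:
    out: Grads = {}
    for k, g in grads.items():  # separate norm for each parameter type
        out.update(renormalize_l2_per_layer({k: g}))
    return out


def clip_element_wise_absolute_value(grads: Grads, threshold: float) -> Grads:
    return {k: [max(-threshold, min(threshold, v)) for v in g] for k, g in grads.items()}


def clip_l2_per_layer(grads: Grads, threshold: float) -> Grads:
    norm = _l2([v for g in grads.values() for v in g])
    # Rescale only when the norm is too large; otherwise leave the gradients as they are.
    return _scale(grads, threshold / norm) if norm > threshold else _scale(grads, 1.0)


def clip_l2_per_param_type(grads: Grads, threshold: float) -> Grads:
    out: Grads = {}
    for k, g in grads.items():
        out.update(clip_l2_per_layer({k: g}, threshold))
    return out


grads = {"W": [3.0, 4.0], "b": [12.0]}  # L2 norm: W = 5, b = 12, whole layer = 13
print(renormalize_l2_per_param_type(grads))     # W approx [0.6, 0.8], b approx [1.0]
print(clip_element_wise_absolute_value(grads, 5.0))
print(clip_l2_per_layer(grads, 6.5))             # everything halved
```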
Use Momentum
Whether to use momentum.
Momentum Rate
Rate of influence of the momentum term.
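A minimal Python sketch of the classical momentum update, where the momentum rate mu controls how much of the previous update carries over into the next one. Whether the node uses exactly this formulation or a variant (for example Nesterov momentum) is not stated here, so treat it as an assumption.

```python
from typing import List, Tuple


def momentum_step(w: List[float], v: List[float], g: List[float],
                  lr: float, mu: float) -> Tuple[List[float], List[float]]:
    """Classical momentum: the velocity v accumulates past gradients."""
    new_v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    new_w = [wi + vi for wi, vi in zip(w, new_v)]
    return new_w, new_v


w, v = [0.0], [0.0]
for step in range(3):
    w, v = momentum_step(w, v, g=[1.0], lr=0.1, mu=0.9)
    print(step + 1, w, v)
```

With a constant gradient the step length grows from about 0.1 to 0.19 to 0.271, so momentum speeds up progress along directions in which the gradient is consistent.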
Momentum After
Schedule for momentum value change during training. This is specified in the following format:
'iteration':'momentum rate','iteration':'momentum rate' ...
This creates a map, which maps the iteration to the momentum rate that should be used. E.g. '2:0.8' means that the rate '0.8' should be used in iteration '2'. Leave empty if you do not want to use a schedule.
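The schedule format above can be parsed into the described map with a short Python sketch. The function name is hypothetical, and the sketch only builds the map; how the node applies a rate between the scheduled iterations is not covered here.

```python
from typing import Dict


def parse_momentum_schedule(text: str) -> Dict[int, float]:
    """Parse 'iteration:rate,iteration:rate,...' into {iteration: rate}.
    An empty field means that no schedule is used."""
    schedule: Dict[int, float] = {}
    if not text.strip():
        return schedule
    for entry in text.split(","):
        parts = entry.split(":")
        if len(parts) != 2:
            raise ValueError(f"expected 'iteration:rate', got {entry.strip()!r}")
        schedule[int(parts[0].strip())] = float(parts[1].strip())
    return schedule


print(parse_momentum_schedule("2:0.8"))           # {2: 0.8}
print(parse_momentum_schedule("2:0.8, 10:0.95"))  # {2: 0.8, 10: 0.95}
print(parse_momentum_schedule(""))                # {}
```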
Use Drop Connect
Whether to use Drop Connect.

Global Parameters

Use Global Learning Rate
Whether to override the learning rates specified in the individual layers of the network with one learning rate for all layers.
Global Learning Rate
The learning rate to use for all layers.
Use Global Drop Out Rate
Whether to override the drop out rates specified in the individual layers of the network with one drop out rate for all layers.
Global Drop Out Rate
The drop out rate to use for all layers.
Use Global Weight Initialization Strategy
Whether to override the weight initialization strategies specified in the individual layers of the network with one strategy for all layers.
Global Weight Initialization Strategy
The weight initialization strategy to use for all layers.
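The effect of the global options can be sketched in Python: an enabled global value replaces the corresponding value in every layer, while a disabled option leaves each layer's own setting untouched. The `LayerConfig` class and the strategy names "XAVIER" and "RELU" are placeholders for illustration, not the node's internal representation.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class LayerConfig:
    learning_rate: float
    drop_out: float
    weight_init: str


def apply_global_overrides(layers: List[LayerConfig],
                           learning_rate: Optional[float] = None,
                           drop_out: Optional[float] = None,
                           weight_init: Optional[str] = None) -> List[LayerConfig]:
    """None means the global option is not enabled: keep each layer's own value."""
    result = []
    for layer in layers:
        if learning_rate is not None:
            layer = replace(layer, learning_rate=learning_rate)
        if drop_out is not None:
            layer = replace(layer, drop_out=drop_out)
        if weight_init is not None:
            layer = replace(layer, weight_init=weight_init)
        result.append(layer)
    return result


layers = [LayerConfig(0.1, 0.0, "XAVIER"), LayerConfig(0.01, 0.5, "RELU")]
tuned = apply_global_overrides(layers, learning_rate=0.005)
print(tuned)  # both layers use 0.005; drop out and weight init are unchanged
```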

Data Parameters

Batch Size
The number of examples used for one minibatch.
The number of epochs to train the network, i.e. the number of complete passes over the whole data set.
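Batch size, number of epochs and the number of training iterations per batch together determine how many parameter updates a training run performs. A small Python sketch of this arithmetic, assuming that a final, smaller batch is kept when the number of rows is not a multiple of the batch size:

```python
import math


def total_parameter_updates(num_rows: int, batch_size: int, epochs: int,
                            iterations_per_batch: int) -> int:
    """Updates = epochs x batches per epoch x iterations per batch."""
    batches_per_epoch = math.ceil(num_rows / batch_size)  # last batch may be smaller
    return epochs * batches_per_epoch * iterations_per_batch


print(total_parameter_updates(1000, 32, 10, 1))   # 10 epochs x 32 batches x 1 = 320
print(total_parameter_updates(1000, 100, 5, 2))   # 5 epochs x 10 batches x 2 = 100
```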
Size of Input Image
If the input table contains images, the dimensions of the images need to be specified as three comma-separated numbers: size x, size y and number of channels. E.g. 64,64,3
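A Python sketch of reading this value and computing how many numbers one image contributes once it is flattened into a vector. The validation rules and the function name are assumptions for illustration.

```python
from typing import Tuple


def parse_image_size(text: str) -> Tuple[int, int, int]:
    """Parse 'size x,size y,number of channels', e.g. '64,64,3'."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 'size x,size y,number of channels', got {text!r}")
    x, y, channels = (int(p) for p in parts)
    if min(x, y, channels) <= 0:
        raise ValueError("all dimensions must be positive")
    return x, y, channels


dims = parse_image_size("64,64,3")
print(dims)                          # (64, 64, 3)
print(dims[0] * dims[1] * dims[2])   # 12288 values per image
```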

Column Selection

Label Column
The column of the input table containing labels for supervised learning.
Input Column Selection
The columns of the input table containing the training data for the network.

Output Layer Parameters

Number of Output Units
The number of outputs for this layer. For supervised training this value is determined automatically and cannot be set. For unsupervised training this value specifies the number of neurons in the output layer.
Learning Rate
The learning rate that should be used for this layer.
Weight Initialization Strategy
The strategy which will be used to set the initial weights for this layer.
Loss Function
The type of loss function that should be used for this layer.
Activation Function
The type of activation function that should be used for this layer.

Input Ports

Finished configuration of a deep learning network.
Data table containing training data.

Output Ports

Trained Deep Learning Model


Learning Status
Shows information about the current learning run and offers an option to stop training early. If training is stopped before the last epoch, the model is saved in its current state.
