Label Model

Estimates probabilistic labels by learning a generative label model from the provided noisy labels. This node is a key component for realizing weak supervision approaches as popularized by Snorkel. The idea behind weak supervision is that it is often possible to create a number of simple, inaccurate models (e.g. simple rules, or existing models for slightly different tasks) that can label unlabeled data. The agreements and disagreements among these models can then be analyzed to infer information about the true label. Our implementation is a TensorFlow-based adaptation of the matrix completion approach proposed in a paper by the Snorkel team. We refer to that publication for details on the strategy.
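The core idea of matrix completion can be illustrated in its simplest form. This is a simplified analogue, not the node's TensorFlow implementation, and it makes the following assumptions:

- Label sources are binary, with votes +1/-1 and 0 for an abstain.
- Sources are conditionally independent given the true label y.
- Each source behaves the same way for both classes.

Under these assumptions the overlap matrix O[i, j] = E[λ_i λ_j] equals a_i a_j off the diagonal, where a_i = E[λ_i y]. The vector a can therefore be recovered by rank-one completion of the off-diagonal entries with gradient descent. The function name `fit_source_accuracies` is hypothetical. Its `epochs` and `learning_rate` parameters only mirror the role of the Epochs and Learning rate options.

```python
import numpy as np

def fit_source_accuracies(L, epochs=500, learning_rate=0.05):
    """Estimate a_i = E[lambda_i * y] for each binary label source.

    L has shape (n_rows, n_sources) with entries +1, -1, or 0 (abstain).
    Assumes sources are conditionally independent given y and behave the same
    way for both classes, so that E[lambda_i * lambda_j] = a_i * a_j for i != j.
    """
    n_rows, n_sources = L.shape
    # Empirical overlap matrix: O[i, j] = mean of lambda_i * lambda_j over all rows.
    O = L.T @ L / n_rows
    # The diagonal does not follow the rank-one model, so it is treated as unobserved.
    observed = ~np.eye(n_sources, dtype=bool)
    # Positive start: assumes every source beats random guessing (fixes the a vs. -a ambiguity).
    a = np.full(n_sources, 0.5)
    for _ in range(epochs):
        residual = np.where(observed, np.outer(a, a) - O, 0.0)
        # Gradient of the sum over observed (i, j) of (a_i * a_j - O[i, j])**2.
        a -= learning_rate * 4.0 * (residual @ a)
    return a

# Synthetic check: three sources with accuracies 0.9, 0.8, 0.7 and no abstains.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=50_000)
correct = rng.random((y.size, 3)) < np.array([0.9, 0.8, 0.7])
L = np.where(correct, y[:, None], -y[:, None])
a_hat = fit_source_accuracies(L)
print(a_hat.round(2))  # expected to be close to 2 * accuracy - 1 = [0.8, 0.6, 0.4]
```

Masking the diagonal is the essential step. Each source trivially agrees with itself, so O[i, i] carries no information about accuracy.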

Options

Label sources
Select the columns that act as label sources, i.e. that contain noisy labels for some of the rows in the first input table. A missing value is assumed to mean that the respective label source did not label the corresponding row.
Epochs
The number of optimization steps to perform. More epochs can yield better results but also directly increase the runtime.
Learning rate
The learning rate controls how much a single training epoch changes the learned model. A smaller learning rate requires more epochs to reach convergence, while a large learning rate might cause the algorithm to diverge.
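The trade-off between epochs and learning rate can be seen on a toy problem. The snippet runs plain gradient descent on the one-dimensional function f(x) = x², not on the node's actual objective. The function name `gradient_descent_on_square` is hypothetical.

```python
def gradient_descent_on_square(learning_rate, epochs, x0=1.0):
    """Minimize f(x) = x**2 (minimum at x = 0) with plain gradient descent."""
    x = x0
    for _ in range(epochs):
        x -= learning_rate * 2.0 * x  # f'(x) = 2x
    return x

# Each step multiplies x by (1 - 2 * learning_rate).
for lr, epochs in [(0.01, 10), (0.01, 1000), (0.4, 10), (1.1, 10)]:
    print(f"lr={lr:<5} epochs={epochs:<5} x={gradient_descent_on_square(lr, epochs):.3g}")
```

The behavior falls into three cases:

- With a small learning rate, 10 epochs leave x far from 0, and many more epochs are needed.
- A moderate rate converges quickly.
- A rate of 1.1 makes |1 - 2 · lr| > 1, so x grows with every step and the procedure diverges.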
Label column name
The name of the label (or class) column that the labels provided by the label sources belong to. This value is used to create the column names for the probability columns in the output.
Remove source columns
Select this option to remove the label source columns from the output table.

Input Ports

Table containing the label sources. A label source is simply a nominal column. Missing values in a label source are interpreted as abstains, i.e. a missing value indicates that the label source decided not to label the corresponding row. Label sources without an assigned set of possible values are ignored during the computation, and a corresponding warning is displayed on the node.
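A minimal sketch of this input convention: nominal label sources become an integer vote matrix, and missing values become abstains. This is an illustration, not the node's internal code. All of the following are hypothetical:

- `encode_label_sources` and `ABSTAIN`.
- The `domains` dict, which stands in for a column's set of possible values.
- The choice of taking the class set as the union of all usable sources' possible values.

```python
import warnings

import numpy as np

ABSTAIN = -1  # hypothetical code for "source did not label this row"

def encode_label_sources(rows, domains):
    """Turn nominal label-source values into an integer vote matrix.

    rows: list of dicts mapping source column name -> label string, or None for a missing value.
    domains: dict mapping source column name -> list of its possible values (may be empty).
    Returns (L, classes, sources): L[r, j] is the class index voted by sources[j] on row r,
    or ABSTAIN if the value is missing.
    """
    sources = []
    for name, values in domains.items():
        if values:
            sources.append(name)
        else:
            # Sources without possible values are skipped, mirroring the warning described above.
            warnings.warn(f"Label source {name!r} has no possible values and is ignored.")
    # Assumption: the class set is the union of all usable sources' possible values.
    classes = sorted({v for name in sources for v in domains[name]})
    index = {c: k for k, c in enumerate(classes)}
    L = np.full((len(rows), len(sources)), ABSTAIN, dtype=int)
    for r, row in enumerate(rows):
        for j, name in enumerate(sources):
            value = row.get(name)
            if value is not None:  # a missing value keeps ABSTAIN
                L[r, j] = index[value]
    return L, classes, sources

rows = [
    {"rule_a": "spam", "rule_b": None, "rule_c": "x"},
    {"rule_a": None, "rule_b": "ham", "rule_c": None},
]
domains = {"rule_a": ["spam", "ham"], "rule_b": ["ham", "spam"], "rule_c": []}
L, classes, sources = encode_label_sources(rows, domains)
print(classes, sources)
print(L)
```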

Output Ports

Contains the input rows with additional probabilistic label columns: one column per possible class, giving the probability that the row is an instance of that class.
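Once source parameters are known, per-class probabilities can be computed for each row. The sketch covers only the binary case, and its parameter definitions and assumptions are as follows:

- `a[i]` = E[λ_i y] and `coverage[i]` = P(λ_i ≠ 0).
- Votes are +1/-1, with 0 for an abstain.
- Sources are conditionally independent given y.
- The class prior is uniform.
- Abstaining does not depend on y.
- coverage exceeds |a| for every source.

This is a naive-Bayes-style combination, not necessarily the node's exact computation. The function name `probabilistic_labels` is hypothetical.

```python
import numpy as np

def probabilistic_labels(L, a, coverage):
    """Return an (n_rows, 2) array with columns P(y = -1 | row) and P(y = +1 | row)."""
    # Probabilities that a source votes for the true label / the wrong label.
    p_right = (coverage + a) / 2  # P(lambda_i = y)
    p_wrong = (coverage - a) / 2  # P(lambda_i = -y)
    weights = np.log(p_right / p_wrong)  # log-likelihood ratio contributed by a +1 vote
    log_odds = L @ weights               # -1 votes flip the sign; abstains (0) add nothing
    p_pos = 1.0 / (1.0 + np.exp(-log_odds))
    return np.column_stack([1.0 - p_pos, p_pos])

a = np.array([0.8, 0.6, 0.4])
coverage = np.ones(3)
L = np.array([[1, 1, 1], [1, 1, -1], [0, 0, 0]])
print(probabilistic_labels(L, a, coverage).round(3))
```

A row on which every source abstains gets 0.5 for both classes, since it carries no evidence either way.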
Each row in this table gives the conditional probabilities that the label source displayed in the Label Source column takes on a specific value given the true label displayed in the Latent Label column.
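For a binary source, such a table can be built directly from two per-source quantities:

- `a[i]` = E[λ_i y], the source's correlation with the true label.
- `coverage[i]` = P(λ_i ≠ 0).

The sketch assumes votes +1/-1 with 0 for an abstain, and errors and abstains that do not depend on the class. The "Label Source" and "Latent Label" keys follow the column names above. The value columns ("P(+1)", "P(-1)", "P(abstain)") and the function name `conditional_probability_table` are hypothetical.

```python
import numpy as np

def conditional_probability_table(names, a, coverage):
    """One row per (label source, latent label) with P(source value | latent label)."""
    rows = []
    for name, a_i, c_i in zip(names, a, coverage):
        p_right = (c_i + a_i) / 2  # P(lambda_i = y)
        p_wrong = (c_i - a_i) / 2  # P(lambda_i = -y)
        for y in (+1, -1):
            rows.append({
                "Label Source": name,
                "Latent Label": y,
                "P(+1)": p_right if y == 1 else p_wrong,
                "P(-1)": p_wrong if y == 1 else p_right,
                "P(abstain)": 1.0 - c_i,
            })
    return rows

table = conditional_probability_table(["rule_a", "rule_b"], np.array([0.8, 0.3]), np.array([1.0, 0.5]))
for row in table:
    print(row)
```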

Views

This node has no views
