Fingerprint Bayesian Learner

A variant of Naive Bayes for fingerprint columns, i.e. bit vectors. The learner implements a Naive Bayes-like algorithm that accounts for sparsely occupied bits and unbalanced class distributions. Details of the algorithm are described in

Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases, Nidhi, Meir Glick, John W. Davies, and Jeremy L. Jenkins, J. Chem. Inf. Model., 2006, 46 (3), pp 1124–1133

Options

Class Column: The categorical class column.
Target Class Value: Choose the categorical value from the class column that defines the class to be modeled. All remaining values define the opposite class value (it's always one against all others). The trained model represents the chosen target class value.
Fingerprint Column: The column containing the fingerprint information.

Input Ports

The data to learn from. It must contain a fingerprint column and a categorical class column.

Output Ports

A table containing the scores of the training data, whereby each row is predicted using a model trained on the n-1 remaining rows (leave-one-out). The table is sorted by descending score; it contains the following columns:

The true class values (copied from the input data).
The leave-one-out score (the sum of logs over the on-bits).
The running error of the target class, i.e. the error on the training data if the current row and all preceding rows were predicted as the positive class (as they have a score greater than or equal to the current row's score).
The running error on the negative class(es), i.e. the error if all rows below the current row were predicted as negative.

The threshold that minimizes the sum of both error rates is used as the default cutoff in the predictor.
Note that these scores could also be determined using a Cross-Validation meta node. They are provided here because they can be computed in a single scan over the training data (as opposed to an expensive cross-validation run).
This table is well suited for visualization with a ROC Curve node.
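
The running errors and the default cutoff described above can be sketched in a few lines of Python. This is only an illustration, not the node's implementation. It assumes that the target-class error is the fraction of target rows that fall below the current row and that the negative-class error is the fraction of non-target rows at or above it; the node's exact definitions and its handling of tied scores may differ. The function names running_errors and best_cutoff are hypothetical.

```python
def running_errors(rows):
    """rows: list of (is_target, score) pairs.

    Returns the rows sorted by descending score, each extended with the
    running error of the target class and of the negative class(es).
    Assumption: errors are per-class fractions, as stated in the lead-in.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    n_pos = sum(1 for is_target, _ in ordered if is_target)
    n_neg = len(ordered) - n_pos
    table = []
    pos_seen = neg_seen = 0
    for is_target, score in ordered:
        if is_target:
            pos_seen += 1
        else:
            neg_seen += 1
        # Rows seen so far are predicted positive, all later rows negative.
        err_pos = (n_pos - pos_seen) / n_pos if n_pos else 0.0
        err_neg = neg_seen / n_neg if n_neg else 0.0
        table.append((is_target, score, err_pos, err_neg))
    return table


def best_cutoff(table):
    """Score of the row that minimizes the sum of both running errors."""
    return min(table, key=lambda r: r[2] + r[3])[1]


if __name__ == "__main__":
    # Made-up example data: (is_target, leave-one-out score).
    rows = [(True, 2.0), (True, 1.5), (True, 1.0), (False, 0.5),
            (True, -0.3), (False, -1.0), (False, -2.0)]
    table = running_errors(rows)
    for row in table:
        print(row)
    print("cutoff:", best_cutoff(table))
```

If several rows reach the same minimal error sum, this sketch picks the one with the highest score; that tie-breaking rule is an arbitrary choice made for the example.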

A table showing each bit's importance for the classes. The table has as many rows as there are bits in the fingerprint. For each bit position, the columns show how often the bit is set (i) in any of the rows and (ii) in rows of the target class. The value of the "logP" column is the logarithm of equation (6) in the article cited above. A value smaller than 0 indicates that the bit is uncharacteristic of the target class; a value larger than 0 indicates that the bit is characteristic of it. A value close to 0 indicates no or only a weak relationship between the bit and the class.
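
The per-bit statistics can be illustrated with a short Python sketch. It assumes that equation (6) has the Laplacian-corrected form (A + 1) / (N * p + 1), where N is the number of rows with the bit set, A the number of target-class rows with the bit set, and p the overall fraction of target-class rows. This form is an assumption made for the example and should be checked against the cited article. Fingerprints are represented as sets of on-bit positions, and all function names are hypothetical.

```python
import math


def bit_statistics(fingerprints, is_target, n_bits):
    """Count, per bit, how often it is set overall and in target rows,
    and derive a logP value from the assumed Laplacian-corrected ratio."""
    n_set = [0] * n_bits
    n_set_target = [0] * n_bits
    for bits, target in zip(fingerprints, is_target):
        for b in bits:
            n_set[b] += 1
            if target:
                n_set_target[b] += 1
    p_target = sum(is_target) / len(is_target)
    log_p = [math.log((a + 1) / (n * p_target + 1))
             for a, n in zip(n_set_target, n_set)]
    return n_set, n_set_target, log_p


def score(bits, log_p):
    """Score of a fingerprint: sum of the logP values of its on-bits."""
    return sum(log_p[b] for b in bits)


def leave_one_out_score(row, fingerprints, is_target, n_set, n_set_target):
    """Score of one training row with its own contribution removed from
    the counts, so no retraining is needed (requires at least two rows)."""
    target = is_target[row]
    p_target = (sum(is_target) - target) / (len(is_target) - 1)
    total = 0.0
    for b in fingerprints[row]:
        n = n_set[b] - 1                 # the row itself has this bit set
        a = n_set_target[b] - target     # and counts as target only if it is one
        total += math.log((a + 1) / (n * p_target + 1))
    return total


if __name__ == "__main__":
    # Made-up example: four fingerprints with four bits, two target rows.
    fps = [{0, 1}, {0, 2}, {1, 2}, {2, 3}]
    labels = [True, True, False, False]
    n_set, n_set_target, log_p = bit_statistics(fps, labels, 4)
    print(n_set, n_set_target, log_p)
    print(score({0, 2}, log_p))
    print(leave_one_out_score(0, fps, labels, n_set, n_set_target))
```

Under this assumed form, a bit that is set in target rows exactly as often as the class prior predicts gets a logP of 0, and rarely set bits are pulled toward 0, which matches the interpretation of the "logP" column given above.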

The model; it's the input to the predictor node.

Views

This node has no views

Installation

To use this node in KNIME, install the extension KNIME Base Chemistry Types & Nodes from the v5.5 update site.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191417

On NodePit since: 2025-07-02

Last update: 2025-08-01

KNIME versions: Since v3.6
