Fingerprint Bayesian Learner

A variant of Naive Bayes for fingerprint columns, i.e. bit vectors. The learner implements a Naive Bayes-like algorithm that accounts for sparsely occupied bits and unbalanced class distributions. Details of the algorithm are described in:

Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases, Nidhi, Meir Glick, John W. Davies, and Jeremy L. Jenkins, J. Chem. Inf. Model., 2006, 46 (3), pp 1124–1133

Options

Class Column
The categorical class column.
Target Class Value
Choose the categorical value from the class column that defines the class to be modeled. All remaining values are treated together as the opposite class (the model is always one against all others). The trained model represents the chosen target class value.
Fingerprint Column
The column containing the fingerprint information.
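The one-against-all-others setup described under Target Class Value can be sketched as follows. This is only an illustration, not the node's code; the function name binarize and the class labels used with it are hypothetical.

```python
def binarize(class_values, target_value):
    """Map a categorical class column to a two-class problem.

    Rows whose class equals target_value become the positive class (True);
    every other value is merged into the opposite class (False).
    """
    return [value == target_value for value in class_values]
```

For a class column with the (hypothetical) values "kinase", "gpcr" and "protease" and the target "kinase", every "gpcr" and "protease" row ends up in the same opposite class.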

Input Ports

The data to learn from. It needs to contain a fingerprint column and a categorical class column.

Output Ports

A table containing the scores of the training data, whereby each row is predicted using a model trained on the n-1 remaining rows (leave-one-out). The table is sorted by descending score; it contains the following columns:
  1. The true class values (copied from the input data).
  2. The leave-one-out score (the sum-of-logs of the on-bits).
  3. The running error of the target class, i.e. the error on the training data if the current row and all preceding rows were predicted as positive class (as they have a score greater than or equal to the row's score).
  4. The running error on the negative class(es), i.e. if all rows below the current row were predicted as negative.
The threshold that minimizes the sum of both error rates is used as default cutoff in the predictor.
Note, these scores could also be determined using a Cross-Validation meta node. However, they are provided here as they can be easily computed in a single scan on the training data (as opposed to an expensive cross validation run).
This table is well suited for visualization with a ROC Curve node.
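The running error columns and the default cutoff can be sketched in a few lines. This is a minimal sketch under one reading of the description above: the target-class error is taken as the fraction of target rows that fall below the current row, and the negative-class error as the fraction of negative rows at or above it. The function names are hypothetical, and tied scores are not treated specially.

```python
def running_errors(rows):
    """rows: list of (is_target, score) pairs, one per training row.

    Returns a list of (score, target_error, negative_error) tuples,
    sorted by descending score.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    n_pos = sum(1 for is_target, _ in ordered if is_target)
    n_neg = len(ordered) - n_pos
    seen_pos = seen_neg = 0
    table = []
    for is_target, score in ordered:
        if is_target:
            seen_pos += 1
        else:
            seen_neg += 1
        # Target rows still below the cutoff would be missed.
        target_error = (n_pos - seen_pos) / n_pos if n_pos else 0.0
        # Negative rows at or above the cutoff would be predicted positive.
        negative_error = seen_neg / n_neg if n_neg else 0.0
        table.append((score, target_error, negative_error))
    return table


def best_cutoff(table):
    """Score of the row that minimizes the sum of both error rates."""
    return min(table, key=lambda entry: entry[1] + entry[2])[0]
```

A single pass over the sorted scores is enough, which is why these values are cheap to report compared with a separate cross-validation run.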
A table representing each bit's importance for the different classes. The table has as many rows as there are bits in the fingerprint. For each bit position, the columns show how often the bit is set (i) in any of the rows and (ii) in rows of the respective target class. The value of the "logP" column is the logarithm of equation (6) in the article cited above. A value smaller than 0 indicates that the bit is uncharacteristic for the target class; a value larger than 0 indicates that the bit is characteristic for the class. A value close to 0 indicates that there is no or only a weak relationship between the bit and the class.
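How the per-bit counts could turn into a log value can be illustrated with a small sketch. The formula used here is an assumption, not taken from this page: it is the Laplacian-corrected estimate (target_on + 1) / (total_on * base_rate + 1) often used for this kind of model. Consult the cited article for the exact form of equation (6). The function names are hypothetical.

```python
import math


def bit_log_scores(fingerprints, is_target):
    """fingerprints: list of equal-length 0/1 lists; is_target: list of bools.

    Returns one log value per bit position, using an ASSUMED
    Laplacian-corrected estimate (see the note above).
    """
    n_rows = len(fingerprints)
    base_rate = sum(is_target) / n_rows  # fraction of target-class rows
    scores = []
    for bit in range(len(fingerprints[0])):
        # (i) how often the bit is set in any row
        total_on = sum(fp[bit] for fp in fingerprints)
        # (ii) how often the bit is set in target-class rows
        target_on = sum(fp[bit] for fp, t in zip(fingerprints, is_target) if t)
        scores.append(math.log((target_on + 1) / (total_on * base_rate + 1)))
    return scores


def row_score(fingerprint, scores):
    """Sum of the log values over the on-bits of one fingerprint."""
    return sum(s for bit, s in zip(fingerprint, scores) if bit)
```

With this form, a bit that is set mostly in target rows gets a positive value, a bit set mostly in other rows gets a negative value, and a bit set in proportion to the base rate gets a value near 0, which matches the interpretation given above.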
The model; it's the input to the predictor node.

Views

This node has no views
