Regression and Classification Models

Logistic Regression

This workflow is an example of how to build a basic prediction / classification model using logistic regression.

URL: Logistic Regression Node: Algorithm Settings https://youtu.be/AclQdjxpGA0

Regression Model

Purpose:

Estimate approximate lifetime box office revenue after release using early audience signals and known film attributes.

Model performance:

R² ≈ 0.76 means the model explains roughly 76% of the variance in revenue; the remaining quarter is unexplained, so individual predictions still carry meaningful uncertainty.
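As a rough illustration of where a figure like R² comes from, the sketch below computes the coefficient of determination from paired actual and predicted values. The function name `r_squared` and the revenue figures are hypothetical, chosen only for illustration; they are not taken from the workflow or its data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviations from the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical revenues in millions of dollars (illustrative only).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.3f}")
```

A value of 1.0 would mean perfect predictions, while 0 would mean the model does no better than always predicting the mean revenue.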

Actual vs. predicted revenue:

Points clustered closer to the diagonal indicate more accurate predictions; the spread around it reflects natural revenue volatility.

Residual distribution:

Residuals centered near zero suggest the model does not systematically over- or under-predict revenue.
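One way to check this numerically is to look at the mean and spread of the residuals. The sketch below is a minimal example under assumed conventions: residuals are taken as actual minus predicted, and the helper name `residual_summary` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def residual_summary(actual, predicted):
    """Return the mean and sample standard deviation of the residuals."""
    # Assumed convention: residual = actual - predicted.
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {"mean": mean(residuals), "stdev": stdev(residuals)}


# Hypothetical revenues in millions of dollars (illustrative only).
actual_revenue = [10.0, 20.0, 30.0, 40.0]
predicted_revenue = [12.0, 18.0, 33.0, 37.0]
summary = residual_summary(actual_revenue, predicted_revenue)
print(summary)
```

A mean close to zero points to little systematic bias, while the standard deviation gives a sense of how far individual predictions typically miss.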

Revenue by month:

Predictions capture seasonal trends, with higher revenues in peak release months.

How to use:

Supply budget, runtime, release timing, popularity, vote count, and vote average as inputs.

Classification Model

Flop:

Negative ROI (movie loses money)

Moderate:

Break even to low profit (ROI between 0 and 1)

Hit:

Strong profit (ROI between 1 and 3)

Blockbuster:

Exceptional profit (ROI 3 or higher)
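The class thresholds above can be sketched as a small lookup. Two assumptions here are not stated in the source: ROI is taken to be (revenue - budget) / budget, and each lower bound is treated as inclusive (so an ROI of exactly 1 counts as Hit). The function names are hypothetical.

```python
def roi(revenue, budget):
    """Assumed definition: profit divided by budget."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return (revenue - budget) / budget


def roi_class(value):
    """Map an ROI value to a class; lower bounds assumed inclusive."""
    if value < 0:
        return "Flop"
    if value < 1:
        return "Moderate"
    if value < 3:
        return "Hit"
    return "Blockbuster"


# Hypothetical example: a 100M budget with 300M lifetime revenue.
example = roi_class(roi(300.0, 100.0))
print(example)
```

How the workflow itself handles values that fall exactly on a boundary may differ from this sketch.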

Recall

Of all movies that truly belong to a class, the fraction the model correctly identified.
(High recall = fewer missed flops or missed blockbusters)

Precision

Of all movies the model assigned to a class, the fraction that truly belong there.
(High precision = fewer false alarms)
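To make the two definitions concrete, the sketch below computes per-class precision and recall from lists of actual and predicted labels. The helper name `precision_recall` and the sample labels are hypothetical and do not come from the workflow's results.

```python
def precision_recall(actual, predicted, label):
    """Return (precision, recall) for one class label."""
    # True positives: movies correctly assigned to this class.
    tp = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    predicted_count = sum(1 for p in predicted if p == label)
    actual_count = sum(1 for a in actual if a == label)
    # Fall back to 0.0 when the class was never predicted or never occurs.
    precision = tp / predicted_count if predicted_count else 0.0
    recall = tp / actual_count if actual_count else 0.0
    return precision, recall


# Hypothetical labels (illustrative only).
actual_labels = ["Flop", "Flop", "Hit", "Moderate", "Flop", "Blockbuster"]
predicted_labels = ["Flop", "Moderate", "Hit", "Flop", "Flop", "Blockbuster"]
flop_precision, flop_recall = precision_recall(actual_labels, predicted_labels, "Flop")
print(flop_precision, flop_recall)
```

In this made-up sample, one true flop is missed (lowering recall) and one non-flop is labeled Flop (lowering precision).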

Model timing:

This ROI classifier is designed to be used after a movie has been released, using early viewer engagement (popularity, vote count, vote average) plus known attributes like budget and runtime. It predicts the lifetime ROI class (Flop, Moderate, Hit, Blockbuster) as a decision support tool, not a final answer.

Regression Model

How to use:

Best for comparing scenarios and relative upside, not exact revenue guarantees.

[Workflow canvas node labels: Scorer, Numeric Scorer, Component, Regression Dashboard, Classification Dashboard, Expression, Bar Chart, Scatter Plot, Histogram, Heat map, Sorter, Color Manager, Table View, Model Nodes, Bar chart P and R, class recall]