Customer Distributions

The Customer Distributions node generates input data for a Market Simulation. It takes an optional Input Attribute List to create a set of Customer Distributions representing the Willingness To Pay (WTP) of Customers in the Market. Each row in the set of Output Customer Distributions corresponds to the part-worth value of a Feature, or the WTP of a Product, for a Virtual Customer.

The Input Attribute List can define the Distribution Type and Input Parameters of each Output Customer Distribution. If the Input Attribute List does not define them for an Output Customer Distribution, then the defaults from the Configuration Dialog are used. Unlike the similar Matrix Distributions node, the Output Customer Distributions from this node will not be correlated.

For example, if the user wishes to create a Normal (Gaussian) Customer Distribution, then the Mean and Standard Deviation (SD) are either set in the Configuration Dialog or overridden by the 'A' column (corresponding to the Mean) and the 'B' column (corresponding to the SD) in the Input Attribute List.

Similarly, if the user wishes to create a Uniform Customer Distribution, then the Minimum Value and the Maximum Value are either set in the Configuration Dialog or overridden by the 'A' column (now corresponding to the Minimum Value) and the 'B' column (now corresponding to the Maximum Value) in the Input Attribute List.

The Output Customer Distributions from this Customer Distributions node can become part of a Customer Willingness To Pay Matrix (WTP Matrix) for a set of Products. This WTP Matrix can feed a downstream Market Simulation node or a Market Tuning node.

The Input Attribute List is optional. Missing values will be replaced by the defaults in the Configuration Dialog. If no input table is provided, then the Customer Distributions node will generate a single Customer Distribution with a Distribution Type and Input Parameters set according to the Configuration Dialog.
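The fallback rule described above can be illustrated with a minimal Python sketch. It is not the node's implementation: the `DIALOG_DEFAULTS` dictionary, its values, and the `resolve_parameters` helper are hypothetical names used only to show how a missing value in an Input Attribute List row might fall back to the Configuration Dialog default.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the Configuration Dialog defaults (values are made up).
DIALOG_DEFAULTS: Dict[str, Any] = {"Type": "Normal", "A": 100.0, "B": 20.0}


def resolve_parameters(row: Optional[Dict[str, Any]],
                       defaults: Dict[str, Any] = DIALOG_DEFAULTS) -> Dict[str, Any]:
    """Take each value from the Input Attribute List row when present,
    otherwise fall back to the Configuration Dialog default."""
    resolved = dict(defaults)
    for key, value in (row or {}).items():
        if value is not None:  # None models a missing cell
            resolved[key] = value
    return resolved


# A row that overrides Type and A but leaves B missing.
print(resolve_parameters({"Type": "Uniform", "A": 10.0, "B": None}))
# No Input Attribute List at all: a single distribution from the defaults.
print(resolve_parameters(None))
```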

The available list of Distribution Types for the user to select from includes:

Normal (Gaussian): (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Normal (Gaussian) Distribution. The part-worth values can be drawn randomly or can have evenly changing gaps within a Normal Distribution of a given Mean and Standard Deviation (SD). The output values can be truncated by the Minimum and Maximum limits (if enabled). The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:

Mean (A): Any floating-point (double) value
Standard Deviation (B): Any value greater than 0.0

Linear: (Wikipedia) Generates a set of part-worth values for each Virtual Customer in the shape of a Uniform (Linear) Distribution. The part-worth values can be drawn randomly or can be evenly spaced between the Starting Value and the Ending Value, optionally truncated by Minimum and Maximum limits. The Distribution can be sorted in Ascending, Descending, or Random order. Configuration parameters include:

Starting Value (A): Any floating-point (double) value (inclusive)
Ending Value (B): Any floating-point (double) value (inclusive)
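As an illustration of the difference between randomly drawn and evenly spaced values, the following Python sketch generates a Linear distribution both ways. The function name `linear_distribution` and its arguments are assumptions made for this example; the node itself may compute the step size or draw random values differently.

```python
import random
from typing import List


def linear_distribution(start: float, end: float, customers: int,
                        smooth: bool, seed: int = 0) -> List[float]:
    """Return one value per Virtual Customer between start and end (inclusive)."""
    if customers < 1:
        raise ValueError("customers must be at least 1")
    if smooth:
        # Evenly spaced: a fixed step so the first row is start and the last is end.
        if customers == 1:
            return [start]
        step = (end - start) / (customers - 1)
        return [start + step * i for i in range(customers)]
    # Randomly drawn: uniform between the two values.
    rng = random.Random(seed)
    return [rng.uniform(start, end) for _ in range(customers)]


print(linear_distribution(0.0, 10.0, 5, smooth=True))
print(linear_distribution(0.0, 10.0, 5, smooth=False))
```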

Asymptote End: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0 such that the beginning of the curve steeply declines but then rounds off and hugs the end value 0.0. Configuration parameters include:

Start (A): Any value greater than 0.0
Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.

Asymptote Start: (Wikipedia) Generates a set of part-worth values from an Exponential Function of the form [a * EXP(-b * CustomerID) + c]. The values selected from this Exponential Function will be between the Start value and 0.0 such that the curve initially hugs the Start value and then steeply declines towards 0.0. Configuration parameters include:

Start (A): Any value greater than 0.0
Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.

Beta: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Beta Distribution with a user-specified Alpha and Beta:

Alpha (A): Any value greater than 0.0
Beta (B): Any value greater than 0.0

Binomial: (Wikipedia) Generates a set of random integer part-worth values for each Virtual Customer in the shape of a Binomial Distribution with a user-specified Number of Trials and Probability of Success. Note that the Bernoulli distribution is a special case of the binomial distribution where just a single trial is conducted (Trials = 1). Configuration parameters include:

Trials (A): Number of Trials, any integer value greater than 0
Probability (B): Probability of Success is any value between 0.0 and 1.0 (exclusive)

Cauchy: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Cauchy Distribution with a user-specified Median and Scale:

Median (A): Any floating-point (double) value
Scale (B): Any value greater than 0.0

Chi-Square: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Chi-Square Distribution with a user-specified 'Degrees of Freedom'. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:

Degrees of Freedom (A): Any value greater than 0.0
Then Add Fixed Value (B): Any floating-point value added after the random value is calculated

Exponential: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Exponential Distribution with a user-specified Mean. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:

Mean (A): Any value greater than 0.0
Then Add Fixed Value (B): Any floating-point value added after the random value is calculated

F: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an F Distribution with a user-specified 'Degrees of Freedom Numerator' and 'Degrees of Freedom Denominator':

Degrees of Freedom Numerator (A): Any value greater than 0.0
Degrees of Freedom Denominator (B): Any value greater than 0.0

Gamma: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Gamma Distribution with a user-specified Shape and Scale:

Shape (A): Any value greater than 0.0
Scale (B): Any value greater than 0.0

Inverse Gaussian: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of an Inverse Gaussian Distribution with a user-specified Mu and Lambda. As Lambda tends to infinity, the Inverse Gaussian distribution becomes more like a Normal (Gaussian) distribution:

Mu (A): The Mean having any value greater than 0.0
Lambda (B): The Shape Parameter having any value greater than 0.0

Poisson: (Wikipedia) The Poisson Distribution can be used for modeling the number of times an event occurs in an interval of time or space. Generates a set of random part-worth values for each Virtual Customer in the shape of a Poisson Distribution with a user-specified Lambda and Entropy:

Lambda (A): The Poisson Mean having any value greater than 0.0
Entropy (B): The Convergence criterion for cumulative probabilities (set to 0.0 by default)

Quadratic: (Wikipedia) The Quadratic Distribution starts at the y-intersect, decreases (or increases) to touch the x-intersect once, then increases (or decreases) again. The Distribution follows the equation [y = a ( x^2 - b )] with only one x-intersection occurring at the minimum (or maximum) of the y-value. The Quadratic Distribution can be used to model the 'Cost To Make' (CTM) a Product where the Marginal Cost initially falls with increased production, but then starts to increase again as resources become scarce and operational inefficiencies are magnified. As the minimum value is fixed at 0.0 it may be necessary to shift the values in this Distribution before using it in a Market Simulation model.

X-Intersection (A): The CustomerID row in the Output Distribution where the curve touches the X-Axis once (the X-Intersection cannot equal = 0.0)
Y-Intersection (B): The starting value of the Output Distribution where the curve intersects the Y-Axis (the Y-Intersection cannot equal = 0.0)

Sawtooth: (Wikipedia) The Sawtooth wave distribution looks like the teeth of a plain-toothed saw. The raw (unsorted) Distribution starts at zero and ramps upwards towards the Distribution's Amplitude. It reaches the Amplitude after the Distribution's Period, then drops to zero and starts again. Configuration parameters include:

Amplitude (A): The maximum height of the wave
Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
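The raw (unsorted) Sawtooth shape described above might be sketched in Python as follows. The `sawtooth` function is a hypothetical illustration: it assumes the ramp restarts at exactly zero on every Period-th row, which is one reasonable reading of the description and may differ from the node's actual output at the wave boundaries.

```python
from typing import List


def sawtooth(amplitude: float, period: float, customers: int) -> List[float]:
    """Ramp linearly from 0.0 towards amplitude, dropping back to 0.0 every period rows."""
    if period <= 0:
        raise ValueError("period must be greater than 0")
    return [amplitude * ((row % period) / period) for row in range(customers)]


# Amplitude 10.0 with a Period of 4 rows, generated for 6 Virtual Customers.
print(sawtooth(10.0, 4, 6))
```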

Sigmoid: (Wikipedia) Has the characteristic horizontal 'S-shaped' curve and is part of the family of Logistic Functions of the form [a / ( 1 + EXP(-b * (row - Customers/2)) )]. The values selected from this function will be between the Start value and 0.0 such that the beginning of the curve hugs the Start value, then steepens, then the end of the curve hugs the end value 0.0. Configuration parameters include:

Start (A): Any value greater than 0.0
Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.

Simple Bimodal: (Wikipedia) Generates a simple Bimodal Distribution (a 'two-humped' Customer Distribution) from two Normal (Gaussian) Distributions. The user specifies the 'First Mean' and the 'Second Mean' with the Standard Deviation (SD) automatically calculated to be a quarter of the distance between the two Means. The user specifies:

First Mean (A): Half of the Virtual Customers will be distributed around the 'First Mean'
Second Mean (B): Half of the Virtual Customers will be distributed around the 'Second Mean'. The 'First Mean' cannot equal the 'Second Mean'.
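The Simple Bimodal rule can be sketched in Python under stated assumptions. The `simple_bimodal` function below is hypothetical: it uses Python's `random.gauss` rather than the library used by the node, and it assigns the first half of the rows to the First Mean and the remainder to the Second Mean, which the description does not specify.

```python
import random
from typing import List


def simple_bimodal(first_mean: float, second_mean: float,
                   customers: int, seed: int = 0) -> List[float]:
    """Mix two Normal distributions whose SD is a quarter of the gap between the means."""
    if first_mean == second_mean:
        raise ValueError("First Mean cannot equal Second Mean")
    sd = abs(second_mean - first_mean) / 4.0
    rng = random.Random(seed)
    half = customers // 2
    values = [rng.gauss(first_mean, sd) for _ in range(half)]
    values += [rng.gauss(second_mean, sd) for _ in range(customers - half)]
    return values


sample = simple_bimodal(10.0, 30.0, 10_000)
print(len(sample), sum(sample) / len(sample))
```

With means of 10.0 and 30.0 the derived SD is 5.0, so the two humps overlap only slightly.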

Sinusoidal: (Wikipedia) The smooth periodic oscillation generated from the sine function rising and falling between 0.0 and the Amplitude. The raw (unsorted) Distribution starts rising at half-Amplitude and reaches the Amplitude after a quarter-Period. It then curves downward and reaches 0.0 after three-quarter-Periods. Configuration parameters include:

Amplitude (A): The maximum height of the wave
Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
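One formula that matches the Sinusoidal description above (half-Amplitude at row 0, the Amplitude at a quarter-Period, 0.0 at three-quarter-Periods) is a sine wave shifted and scaled into the range 0.0 to Amplitude. The Python sketch below uses that formula; the `sinusoidal` function name is hypothetical and the node may compute the wave differently.

```python
import math
from typing import List


def sinusoidal(amplitude: float, period: float, customers: int) -> List[float]:
    """Sine wave shifted and scaled to oscillate between 0.0 and amplitude."""
    if period <= 0:
        raise ValueError("period must be greater than 0")
    return [0.5 * amplitude * (1.0 + math.sin(2.0 * math.pi * row / period))
            for row in range(customers)]


# With a Period of 4 rows the wave visits half, full, half, and zero Amplitude.
print([round(v, 6) for v in sinusoidal(10.0, 4, 4)])
```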

Spike: (Wikipedia) Is a vertical 'S-shaped' curve that looks similar to a rotated Sigmoid function but is generated from a pair of Exponential Functions of the form [a * EXP(-b * CustomerID) + c]. The values selected from these Exponential Functions will be between the Start value and 0.0 such that the beginning of the curve steeply declines, then rounds off, but then steeply declines again towards the end value 0.0. Note that a sorted Normal Distribution will also generate a similar looking vertical S-shaped curve. Configuration parameters include:

Start (A): Any value greater than 0.0
Curviness (B): The 'Curviness' of the Output Customer Distribution. Decreasing the Curviness will flatten the output curve, while increasing the Curviness will cause the output to be more curvy. A Curviness = 1.0 has been pre-set to provide a reasonable curve for about 10,000 Customer rows.

Square: (Wikipedia) The Square wave distribution alternates at a steady frequency between the Amplitude and 0.0. The raw (unsorted) Distribution starts at the Amplitude and drops to zero after a half-Period. After the Distribution's Period, the wave is reset to its Amplitude and starts again. Configuration parameters include:

Amplitude (A): The maximum height of the wave
Period (B): The number of Customer rows in the Output Distribution (greater than 0.0) before the wave repeats itself
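The Square wave described above can be sketched in Python as a simple test on the position within each Period. The `square_wave` function is a hypothetical illustration; in particular, treating the row exactly at the half-Period as already zero is an assumption.

```python
from typing import List


def square_wave(amplitude: float, period: float, customers: int) -> List[float]:
    """Hold amplitude for the first half of each period, then 0.0 for the second half."""
    if period <= 0:
        raise ValueError("period must be greater than 0")
    return [amplitude if (row % period) < period / 2.0 else 0.0
            for row in range(customers)]


# Amplitude 10.0 with a Period of 4 rows, generated for 6 Virtual Customers.
print(square_wave(10.0, 4, 6))
```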

T: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a T Distribution with a user-specified Degrees of Freedom. After the part-worth value is calculated, the fixed value from 'Input Parameter B' is added to shift the result:

Degrees of Freedom (A): Any value greater than 0.0
Then Add Fixed Value (B): Any floating-point value added after the random value is calculated

Triangle: (Wikipedia) The Triangle wave distribution rises and falls linearly between 0.0 and the Amplitude. The raw (unsorted) Distribution climbs steadily from half-Amplitude and reaches the Amplitude after a quarter-Period. It then falls steadily and reaches 0.0 after three-quarter-Periods. Configuration parameters include:

Amplitude (A): The maximum height of the wave
Period (B): The number of Customer rows in the Output Distribution (greater than 1.0) before the wave repeats itself
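A Triangle wave with the phase described above (half-Amplitude at row 0, the Amplitude at a quarter-Period, 0.0 at three-quarter-Periods) can be sketched in Python by shifting the phase a quarter-Period forward. The `triangle_wave` function is hypothetical and only one way to reproduce the described shape.

```python
from typing import List


def triangle_wave(amplitude: float, period: float, customers: int) -> List[float]:
    """Linear rise and fall between 0.0 and amplitude, starting at half-amplitude."""
    if period <= 1:
        raise ValueError("period must be greater than 1")
    values = []
    for row in range(customers):
        # Shift by a quarter-period so that row 0 sits halfway up the rising edge.
        phase = (row / period + 0.25) % 1.0
        values.append(amplitude * (1.0 - abs(2.0 * phase - 1.0)))
    return values


# Amplitude 8.0 with a Period of 8 rows: one full cycle.
print(triangle_wave(8.0, 8, 8))
```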

Weibull: (Wikipedia) Generates a set of random part-worth values for each Virtual Customer in the shape of a Weibull Distribution with a user-specified Shape and Scale:

Shape (A): Any value greater than 0.0
Scale (B): Any value greater than 0.0

Note: technical details concerning how the data generation is performed can be found by referring to the Apache Commons Math Library.

More Help: Examples and sample workflows can be found at the Scientific Strategy website: www.scientificstrategy.com.

Options

Standard Options

Name of Customer Distribution: The column name for the newly created Customer Distribution. This name is only used if an upstream Input Attribute List (optional) has not been connected.
Number of Customers: The number of Virtual Customers to be generated by the Customer Distributions node. Each Virtual Customer will be represented by a separate row in the Output Customer Distributions Matrix. If the optional Input Customer Distributions table has been connected, then this 'Number of Customers' option will be ignored and the number of rows from the Input Customer Distributions table will be used instead.
Default Distribution [Type]: The default continuous probability Distribution used when generating individual Customer Data for all Distributions in the Output Customer Distributions table. This default Distribution Type set in the Configuration Dialog will be applied to all Customer Distributions found in the Input Attribute List. The default can be overridden by a 'Type' column in the Input Attribute List.
Default Input Parameter [A]: The default Input Parameter A for all Distributions in the Output Customer Distributions table. This default 'A' will be applied to all Customer Distributions found in the Input Attribute List. This Configuration Dialog default Input Parameter A value can be overridden by an 'A' column in the Input Attribute List.
Default Input Parameter [B]: The default Input Parameter B for all Distributions in the Output Customer Distributions table. This default 'B' will be applied to all Customer Distributions found in the Input Attribute List. This Configuration Dialog default Input Parameter B value can be overridden by a 'B' column in the Input Attribute List.
Attribute to Customer Distribution Column: Attributes listed in the Input Attribute List can be added to the Output Customer Distributions table. The Product name, Feature name, or other column can be selected as the Attribute to add. The Customer Distributions for these Attributes will all be uncorrelated.

Advanced Options

Default Distribution [Maximum]: If enabled, the data generated for the Customer Distribution will be capped at this ceiling Maximum. If a randomly generated data point is greater than this Maximum value then a second randomly generated data point will be used instead. The final data point will only be set to this Maximum value after multiple attempts to generate an acceptable random data point have failed. This Configuration Dialog default can be overridden by a 'Maximum' column in the Input Attribute List.
Default Distribution [Minimum]: If enabled, the data generated for the Customer Distribution will be capped at this floor Minimum. If a randomly generated data point is less than this Minimum value then a second randomly generated data point will be used instead. The final data point will only be set to this Minimum value after multiple attempts to generate an acceptable random data point have failed. This Configuration Dialog default can be overridden by a 'Minimum' column in the Input Attribute List.
Default [Sort] Order: The final sorting order for the generated Output Customer Distributions. The Sort Order 'None' is selected by default so the Distribution will maintain its raw ordering from the generating Customer Distribution Type. If 'None' is selected, and the new Distribution is set to override an existing column from the Input Customer Distributions Matrix with the user-setting 'Sort-Replace Upstream Columns' in the 'Duplicates' tab, then the new Distribution will be sorted in the same order as the old existing column. This ensures that the correlation between Customer Distributions is maintained. But the sorting of each Output Customer Distribution can also be set to Ascending, Descending, or Random order. For example, a Distribution sorted in Descending order might be used to simulate the decreasing 'Cost To Make' (CTM) Dynamic Cost of a Product. This Configuration Dialog default can be overridden by a 'Sort' column in the Input Attribute List (set to either 'Random', 'Ascending', 'Descending', or 'None').
Default [Smooth] Gap Between Data Points: Most data points generated for a Customer Distribution are randomly selected according to the 'Distribution Type' and the 'Input Parameters'. However, for some Distribution Types it is possible for each data point to be distributed smoothly with an even (or evenly changing) Step Size. For example, the data points for a 'Linear Distribution' Type can have a fixed Step Size so that they are evenly distributed. This Configuration Dialog default can be overridden by a 'Smooth' column in the Input Attribute List (set to either 'true' or 'false').
Save Randomizing Seed: A Randomizing Seed can be saved to ensure that the random Customer Distributions generated by this node are always generated in the same way. If a Customer Distributions node is copied then the user should ensure the Saved Randomizing Seed is changed or not saved - otherwise Customer Distributions may be inconsistently generated. The 'New' button will generate a new Randomizing Seed. Disable the CheckBox to generate a new Randomizing Seed each time the node is run.
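The Maximum and Minimum options above describe a redraw-then-clamp strategy: an out-of-range value is redrawn, and only after repeated failures is it set to the limit. A minimal Python sketch of that idea follows. The `draw_within_limits` helper and the `MAX_ATTEMPTS` value of 10 are assumptions for illustration; the documentation only says "multiple attempts" and does not state the actual count.

```python
import random
from typing import Callable, Optional

MAX_ATTEMPTS = 10  # Assumption: the node's real retry count is not documented.


def draw_within_limits(draw: Callable[[], float],
                       minimum: Optional[float] = None,
                       maximum: Optional[float] = None,
                       max_attempts: int = MAX_ATTEMPTS) -> float:
    """Redraw out-of-range values; clamp to the limit only if every attempt fails."""
    def acceptable(value: float) -> bool:
        return ((minimum is None or value >= minimum)
                and (maximum is None or value <= maximum))

    value = draw()
    attempts = 1
    while not acceptable(value) and attempts < max_attempts:
        value = draw()
        attempts += 1
    # Last resort: cap the final draw at the violated limit.
    if minimum is not None and value < minimum:
        return minimum
    if maximum is not None and value > maximum:
        return maximum
    return value


rng = random.Random(42)
values = [draw_within_limits(lambda: rng.gauss(100.0, 20.0), minimum=80.0, maximum=130.0)
          for _ in range(1000)]
print(min(values) >= 80.0, max(values) <= 130.0)
```

Redrawing rather than clamping immediately avoids piling up many identical values at the limit, which would distort the shape of the distribution near its edges.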

Duplicate Options

Duplicate Columns: When a newly generated column conflicts with an upstream Input Customer Distributions column then the new column can either replace or scale the old upstream column. The options include:
Skip Existing Columns - keep the existing upstream Customer Distributions and do not replace them with newly generated columns: output = old.
Replace Upstream Columns - the newly generated duplicate Customer Distribution columns will replace the upstream Customer Distributions: output = new.
Sort-Replace Upstream Columns (default) - each newly generated duplicate Customer Distribution column will replace the upstream Customer Distribution but with the new data sorted in the same order as the upstream Distributions being replaced: output = new (sorted by old). Note: the [Sort] order option in the 'Advanced' tab must be set to 'None' for the new distribution to be sorted by the old distribution.
Add to Upstream Columns - the new column will be added to the upstream column: output = old + new.
Multiply by Upstream Columns - the new column will be multiplied by the upstream column: output = old x new.
Scale Upstream Columns as Percentage - the new column will scale the upstream column as if it was a percentage change: output = old x (1.00 + new).
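The six Duplicate Columns options above can be summarized in a short Python sketch that applies each output rule to a pair of columns. The `combine` and `sort_like` functions and the mode strings are hypothetical names for this example. The sort-replace branch assumes "sorted in the same order" means the new values take on the rank order of the old column.

```python
from typing import List


def sort_like(new: List[float], old: List[float]) -> List[float]:
    """Reorder new so that its smallest value sits where old is smallest, and so on."""
    order = sorted(range(len(old)), key=lambda i: old[i])
    sorted_new = sorted(new)
    result = [0.0] * len(old)
    for rank, index in enumerate(order):
        result[index] = sorted_new[rank]
    return result


def combine(old: List[float], new: List[float], mode: str) -> List[float]:
    """Apply one Duplicate Columns rule to an upstream column and a new column."""
    if len(old) != len(new):
        raise ValueError("columns must have the same number of rows")
    if mode == "skip":
        return list(old)                                  # output = old
    if mode == "replace":
        return list(new)                                  # output = new
    if mode == "sort-replace":
        return sort_like(new, old)                        # output = new (sorted by old)
    if mode == "add":
        return [o + n for o, n in zip(old, new)]          # output = old + new
    if mode == "multiply":
        return [o * n for o, n in zip(old, new)]          # output = old x new
    if mode == "scale":
        return [o * (1.0 + n) for o, n in zip(old, new)]  # output = old x (1.00 + new)
    raise ValueError(f"unknown mode: {mode}")


old = [3.0, 1.0, 2.0]
new = [10.0, 30.0, 20.0]
for mode in ("skip", "replace", "sort-replace", "add", "multiply"):
    print(mode, combine(old, new, mode))
print("scale", combine([100.0, 200.0], [0.5, -0.25], "scale"))
```

In the sort-replace case, the largest new value (30.0) lands on the row where the old column was largest, which is how the correlation with other upstream columns is preserved.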

Input Ports

Input Attribute List: (optional) The set of additional Products, Features, or other Attributes to add to the Output Customer Distributions Matrix. The Input Attribute List should include the following columns:

Product (string): (optional) Unique Product Name or Product ID. The Products listed in this column can be added to the Output Customer Distributions table if the user selects this as the 'Attribute to Customer Distribution Column' in the Configuration Dialog. The 'No Sale' Product will pass through to the 'Output Attribute List' but no Customer Distribution will be generated.
Feature (string): (optional) Name of the Feature associated with the Product. The Features listed in this column can be added to the Output Customer Distributions table if the user selects this as the 'Attribute to Customer Distribution Column' in the Configuration Dialog. If the user wishes to add Customer Distributions named using a [Product].[Feature] format then this column will need to be manually added by the user upstream of the Input Attribute List.
Type (string): (optional) The Distribution Type and the shape of the generated part-worth values in the Customer Distribution for the Attribute. Any Distribution Type listed in the Configuration Dialog can be used, including "Normal", "Uniform", "Exponential", and "Simple Bimodal". If this 'Type' is missing then the 'Default Distribution Type' (initially 'Normal' Distribution) from the Configuration Dialog will be used instead.
A (double): (optional) The 'Input Parameter A' of the part-worth values to generate in the Customer Distribution for the Product, Feature, or Attribute. For a Normal Distribution, this 'A' value represents the Mean. If this 'A' value is missing then the default 'A' value from the Configuration Dialog will be used instead. 'Mean', 'Average', 'Value', 'WTP', and 'Start' can also be used as column names.
B (double): (optional) The 'Input Parameter B' of the part-worth values to generate in the Customer Distribution for the Product Feature. For a Normal Distribution, this 'B' value represents the Standard Deviation (SD). If this 'B' value is missing then the default 'B' value from the Configuration Dialog will be used instead. 'SD', 'Variance', 'Diversity', 'Range', and 'End' can also be used as column names.
Maximum (double): (optional) The ceiling Maximum of the part-worth values generated for the Product Feature Customer Distribution. If this 'Maximum' value is missing then the default 'Maximum' value from the Configuration Dialog, if enabled, will be used instead. Otherwise the part-worth values in the Customer Distribution will not be limited to a Maximum value.
Minimum (double): (optional) The floor Minimum of the part-worth values generated for the Product Feature Customer Distribution. If this 'Minimum' value is missing then the default 'Minimum' value from the Configuration Dialog, if enabled, will be used instead. Otherwise the part-worth values in the Customer Distribution will not be limited to a Minimum value.
Sort (string): (optional) The Sort order of each Output Customer Distribution can be set to either 'Random', 'Ascending', 'Descending', or 'None'. If this 'Sort' order is missing then the default 'Sort' order from the Configuration Dialog (initially set to 'None') will be used instead.
Smooth (boolean): (optional) It is sometimes possible for the data points within a generated Customer Distribution to be smoothly distributed with an evenly changing Step Size. For example, when a 'Linear Distribution' is set to 'Smooth' the Step Size between data points is fixed. If this 'Smooth' column is missing then the default 'Smooth' CheckBox selection from the Configuration Dialog (initially unchecked for 'Randomly Distributed Data Points') will be used instead.
Price (double): (optional) Price of the Product. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node.
Cost (double): (optional) Cost of the Product or Feature. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node. The Cost cannot be negative.
Quantity (integer): (optional) Quantity Sold of the Product. This value will have no impact on the generation of the Output Customer Distributions, but may be conveniently passed downstream to a Market Simulation node. The Input Quantity Sold would typically be compared against the Output Quantity Sold predicted by a Market Simulation node for testing and tuning.

Input Customer Distributions (double): (optional) A set of upstream Customer Distributions that will be appended before the newly generated Output Customer Distributions. If this optional table has been connected, then the user-defined 'Number of Customers' option in the Configuration Dialog will be ignored and the same number of rows as the Input Customer Distributions will be generated.

Distribution01, Distribution02, etc (double): The set of upstream Customer Distributions to be appended to the newly generated Output Customer Distributions. If this 'Customer Distributions' node is going to replace an upstream Customer Distribution then the newly generated data will first be sorted into the same order as the original Customer Distribution being replaced (unless forced to be sorted in either Ascending, Descending, or Random order).

Output Ports

Output Attribute List: The set of Products, Features, or other Attributes added to the Output Customer Distributions Matrix. These Attributes are directly passed-through from the Input Attribute List as a convenience to downstream nodes. For example, the Input Attribute List can include details about the 'Price' of Products or 'Cost' of Features. In addition, the Output Attribute List will contain these columns:

Attribute: The unique Product, Feature, or Attribute Name with a matching column in the Output Customer Distributions Matrix.
Type: The Distribution Type and the shape of the generated part-worth values in the Customer Distribution for the Attribute.
Mean: The Mean of the part-worth values in the Output Customer Distribution Matrix for the Product, Feature, or Attribute. The Mean is calculated after the Distribution Type is generated. In general, the relative difference of the Means between related Attributes reflects the primary degree of Vertical Differentiation between each - particularly between Normal Distributions.
SD: The Standard Deviation (SD) of the part-worth values in the Output Customer Distribution Matrix for the Product Attribute. The SD is calculated after the Distribution Type is generated. A Product lacking Vertical Differentiation (that is, having a low Mean) can still attract Customers if it has a relatively high SD, or if it has Horizontal Differentiation (that is, its Customer Distribution is uncorrelated) relative to other Products.
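The Mean and SD columns are summary statistics computed from each generated column. A minimal Python sketch of that calculation follows. The `summarize` function is hypothetical, and it uses the sample standard deviation; whether the node reports the sample or the population SD is not stated in this description.

```python
import statistics
from typing import Dict, List


def summarize(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute Mean and SD for each generated column (each needs at least two rows)."""
    return {
        name: {"Mean": statistics.fmean(values), "SD": statistics.stdev(values)}
        for name, values in columns.items()
    }


# Two made-up Customer Distributions with three Virtual Customers each.
print(summarize({"ProductA": [90.0, 100.0, 110.0], "ProductB": [40.0, 50.0, 90.0]}))
```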

Output Customer Distributions (double): The set of Customer Distributions for each unique Attribute found in the Input Attribute List, or just a single Customer Distribution if no upstream Input Attribute List has been connected. The total number of Virtual Available Customers is equal to the number of rows in the Output Customer Distributions Matrix.

Views

This node has no views

Installation

To use this node in KNIME, install the extension Market Simulation nodes by Scientific Strategy for KNIME - Community Edition from the below update site following our NodePit Product and Node Installation Guide:

v5.6

A zipped version of the software site can be downloaded here.

Plugin provider: Decision Ready, LLC

Plugin version: 5.2.0.v202311290506

On NodePit since: 2025-08-15

Last update: 2025-08-21

KNIME versions: v5.6, v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6
