Correlation Concatenation

The Correlation Concatenation node takes up to three Input Correlation Matrices and joins them into a single Output Correlation Matrix. The user can specify the degree of Cross Correlation each Matrix will have with the other Matrices when they are joined.

Concatenating Correlation Matrices is useful when the Horizontal Differentiation of Features has been independently generated but some Correlation is known to exist between them. For example, if 'Style', 'Color', and 'Ambience' Features were independently generated, then the Correlation Concatenation node could join these three Features together with some Cross Correlation.

Often there is also a relationship between the Elements within each Matrix that depends upon the position of the Element. For example, travelers who stay at a luxury hotel will typically appreciate every aspect of that luxury wherever it is found. Hence, travelers who value the best 'Room' are also more likely to value the best 'Entertainment' and the best 'Food'. The top Elements of each Matrix therefore have more Cross Correlation with one another than other Element combinations do. Similarly, economy travelers who do not place a high value on a good 'Room' are also not likely to place a high value on 'Entertainment' and 'Food'.

Typically the Matrix:Matrix Correlation will be modest (less than 0.5). Large Matrix:Matrix Correlations will require the Output Correlation Matrix to be repaired (see the 'Output Correlation Repaired Matrix' and the 'Output Correlation Error Matrix'). If large Feature Correlations are required then consider using the Differentiation Horizontal node instead.

All of the row and column names must be unique across all input tables; otherwise the Matrices cannot be joined. If a specific 'Order' is not provided in the Input Matrix then the row index is used for matching Elements.
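The uniqueness requirement can be checked before the tables reach the node. The following is a minimal sketch, not the node's own validation: the helper name `duplicated_names` and the Element names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def duplicated_names(*matrices):
    """Return the names that appear more than once across all inputs.

    Each argument is the list of row (and column) names of one matrix.
    """
    counts = Counter(name for names in matrices for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical Element names for three independently generated Features.
style = ["Modern", "Classic"]
color = ["Bright", "Muted"]
ambience = ["Lively", "Classic"]  # "Classic" clashes with the Style matrix

clashes = duplicated_names(style, color, ambience)
print(clashes)  # ['Classic']
```

An empty result means the names are unique and the Matrices can be joined; any name in the result would need to be renamed first.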

More Help: Examples and sample workflows can be found at the Scientific Strategy website: www.scientificstrategy.com.

Options

Standard Options

Maximum Cross Correlation Factor
The Maximum Correlation between matching Elements in each Input Matrix, used when concatenating the Matrices together. The first Element in Matrix A will have this Maximum Cross Correlation with respect to the first Element in Matrix B. Similarly, the second, third, and fourth Elements in Matrix A will also have this Maximum Cross Correlation with respect to the second, third, and fourth Elements in Matrix B. In this way, the best-best-best Elements and the worst-worst-worst Elements will all have Maximum Cross Correlations.
Reduce Cross Correlation Decay
The Cross Correlation between non-matching Elements decreases as the distance between the positions of the Elements in each Matrix increases. For example, the Cross Correlation between the first Element in Matrix A and the second Element in Matrix B will be less than the Maximum Cross Correlation by this user-specified rate of decay. The Cross Correlation between the first Element in Matrix A and the third Element in Matrix B will be less by twice the rate of decay. For example, if the Maximum Cross Correlation Factor is 0.8 and this Correlation Decay is 1.0 then Elements at distances 0, 1, 2, 3, 4, etc. will have Cross Correlations of 0.8, 0.64, 0.51, 0.41, 0.33, etc. If the Correlation Decay is 2.0 then the decay will be quicker: 0.8, 0.51, 0.33, etc. If the Correlation Decay is 0.5 then the decay will be slower: 0.8, 0.72, 0.64, 0.57, 0.51, etc. A Correlation Decay of 0.0 indicates that there is no Element-to-Element relationship, and the Cross Correlation of all Elements will be set to the Maximum Cross Correlation Factor (0.8 in this example). A maximum decay of 100 will effectively eliminate the Cross Correlation of all but the matching Elements (the first value in each series).
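The example series above are consistent with raising the Maximum Cross Correlation Factor to the power (1 + decay × distance). That rule is inferred from the quoted numbers and is an assumption, not a confirmed description of the node's implementation. A minimal sketch, with hypothetical function names, that reproduces the quoted values:

```python
def cross_correlation(max_factor, decay, distance):
    """Assumed rule: max_factor ** (1 + decay * distance).

    distance is the gap between the positions (Orders) of two Elements.
    """
    return max_factor ** (1.0 + decay * distance)


def decay_series(max_factor, decay, count):
    """Cross Correlations for distances 0 .. count-1, rounded to 2 places."""
    return [round(cross_correlation(max_factor, decay, d), 2)
            for d in range(count)]


print(decay_series(0.8, 1.0, 5))  # [0.8, 0.64, 0.51, 0.41, 0.33]
print(decay_series(0.8, 2.0, 3))  # [0.8, 0.51, 0.33]
print(decay_series(0.8, 0.5, 5))  # [0.8, 0.72, 0.64, 0.57, 0.51]
print(decay_series(0.8, 0.0, 3))  # [0.8, 0.8, 0.8]
```

Under the same assumption, a decay of 100 gives 0.8 at distance 0 and a value indistinguishable from zero at every other distance.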

Input Ports

Input Correlation Matrix A: The first input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical, with the number of data rows matching the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
  1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
  2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
  3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If more than one correlation is provided for a pair (A:B and B:A) then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
Input Correlation Matrix B (optional): The second input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical, with the number of data rows matching the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
  1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
  2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
  3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If more than one correlation is provided for a pair (A:B and B:A) then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
Input Correlation Matrix C (optional): The third input set of Correlations that define the relationship between Customer Distributions of the same name. The Correlation Matrix must be symmetrical, with the number of data rows matching the number of columns. Each row Distribution Name should be unique among all three Input Correlation Matrices and correspond to a column of the same name. The Input Correlation Matrix should include the following columns:
  1. Distribution (string): The unique name of the Customer Distribution. This name should correspond to a column of the same name in the same Input Correlation Matrix. The Distribution column can have any name. If multiple string columns are found then the first column is treated as the Distribution name column and the other string columns are ignored. If no string columns are found then the RowID column is treated as the Distribution name column.
  2. Order (integer - optional): The specified Order of the Distribution used for matching Elements in other Correlation Matrices. If this Order is not provided then the row index will be used instead.
  3. Correlation Values (double): The correlation value between each Customer Distribution row and each Customer Distribution column. As the Correlation Matrix is expected to be symmetrical, each row-column value should be the same as the corresponding column-row value. If more than one correlation is provided for a pair (A:B and B:A) then the highest non-zero correlation will be used. Lower-left or upper-right triangular matrices can also be used. The diagonal values should all be equal to 1.0.
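The handling of triangular inputs described for the Correlation Values columns can be sketched as follows. This is an illustration under a stated assumption, not the node's code: "highest non-zero" is read here as the larger of the non-zero candidates for a pair, and the function name `symmetrize` is hypothetical.

```python
def symmetrize(matrix):
    """Return a full symmetric copy of a square correlation matrix.

    Assumption: for each pair, the larger of the non-zero values found
    at [i][j] and [j][i] is used; if both are zero the pair stays 0.0.
    """
    size = len(matrix)
    result = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            candidates = [value for value in (matrix[i][j], matrix[j][i])
                          if value != 0.0]
            result[i][j] = max(candidates) if candidates else 0.0
    return result


# A lower-left triangular input with the upper triangle left at zero.
lower = [
    [1.0, 0.0, 0.0],
    [0.3, 1.0, 0.0],
    [-0.2, 0.5, 1.0],
]
full = symmetrize(lower)
for row in full:
    print(row)
```

Ignoring zeros matters for negative correlations: the -0.2 entry is kept rather than being replaced by the empty 0.0 in the opposite triangle.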

Output Ports

Output Correlation Matrix: The output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. The Output Correlation Matrix will be symmetrical, with the number of data rows matching the number of columns. The Output Correlation Matrix will contain these columns:
  1. Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
  2. Order: The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
  3. Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and limited to the range -1.0 to +1.0.
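To make the shape of the Output Correlation Matrix concrete, the sketch below joins two small matrices. It is a simplified illustration and not the node's implementation: it assumes the row index is used as the Order, and it assumes the decay rule max_factor ** (1 + decay × distance), which is inferred from the worked numbers under 'Reduce Cross Correlation Decay'. All names and values are hypothetical.

```python
def cross_correlation(max_factor, decay, distance):
    """Assumed decay rule, inferred from the documented example series."""
    return max_factor ** (1.0 + decay * distance)


def concatenate(names_a, corr_a, names_b, corr_b, max_factor, decay):
    """Join two correlation matrices into one block matrix.

    The diagonal blocks are the inputs; the off-diagonal blocks hold the
    Cross Correlations, based on the distance between row indices.
    """
    n_a, n_b = len(names_a), len(names_b)
    size = n_a + n_b
    out = [[0.0] * size for _ in range(size)]
    for i in range(n_a):
        for j in range(n_a):
            out[i][j] = corr_a[i][j]
    for i in range(n_b):
        for j in range(n_b):
            out[n_a + i][n_a + j] = corr_b[i][j]
    for i in range(n_a):
        for j in range(n_b):
            value = cross_correlation(max_factor, decay, abs(i - j))
            out[i][n_a + j] = value  # upper-right block
            out[n_a + j][i] = value  # mirrored lower-left block
    return names_a + names_b, out


room = [[1.0, -0.5], [-0.5, 1.0]]
food = [[1.0, -0.5], [-0.5, 1.0]]
names, joined = concatenate(["Best Room", "Basic Room"], room,
                            ["Best Food", "Basic Food"], food,
                            max_factor=0.4, decay=1.0)
for name, row in zip(names, joined):
    print(name, [round(value, 2) for value in row])
```

In this example the matching Elements ('Best Room' with 'Best Food') receive the Maximum Cross Correlation of 0.4, while the non-matching Elements receive the decayed value of 0.16.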
Output Correlation Repaired Matrix: The repaired output set of correlations that define the relationship between Customer Distributions described in all three Input Correlation Matrices. Repairing is required when the correlations are unrealistic. For example, if X is highly correlated with Y (for example, X:Y = +0.99) and X is highly correlated with Z (for example, X:Z = +0.99) then Y must be highly correlated with Z (that is, Y:Z >> 0.0). More precisely, the Correlation Matrix must be positive definite, meaning that all of its Eigenvalues are positive. Note that it is not necessary for downstream nodes that generate Customer Distributions (such as the Matrix Distributions node or the Feature Generation node) to use this Correlation Repaired Matrix, as these downstream nodes will always first self-repair the Input Correlation Matrix. The Output Correlation Repaired Matrix will contain the same columns as the Output Correlation Matrix:
  1. Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
  2. Order: The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
  3. Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the repaired degree of correlation to the row Customer Distribution. Output correlations will be symmetrical and limited to the range -1.0 to +1.0.
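The X, Y, Z example can be tested numerically. The sketch below only detects whether a repair is needed; it does not reproduce the node's repair method, which is not described here. It relies on the standard fact that a Cholesky factorisation succeeds only for a symmetric positive definite matrix. The function names are hypothetical.

```python
import math


def is_positive_definite(matrix):
    """Attempt a Cholesky factorisation of a symmetric matrix.

    Returns False as soon as a pivot is not strictly positive.
    """
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def xyz_matrix(y_z):
    """X:Y = X:Z = 0.99, with the Y:Z correlation supplied by the caller."""
    return [
        [1.0, 0.99, 0.99],
        [0.99, 1.0, y_z],
        [0.99, y_z, 1.0],
    ]


print(is_positive_definite(xyz_matrix(0.0)))   # False: needs repair
print(is_positive_definite(xyz_matrix(0.98)))  # True: realistic
```

With Y:Z = 0.0 the matrix is not positive definite, which is the situation in which the Repaired Matrix will differ from the Output Correlation Matrix.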
Output Correlation Error Matrix: The difference between the Output Correlation Matrix and the Output Correlation Repaired Matrix. This is a convenience output to show how the Correlation Matrix needs to be repaired before Customer Distributions can be generated. The Output Correlation Error Matrix will contain the same columns as the Output Correlation Matrix:
  1. Distribution: Each unique row name found in the Input Correlation Matrices corresponding to a row Customer Distribution.
  2. Order: The Order each unique row Distribution was provided or found in the Input Correlation Matrix.
  3. Correlated Distributions: Each unique column name found in the Input Correlation Matrices, along with the difference between the output correlation and the repaired correlation.
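The Error Matrix is an element-by-element difference. A minimal sketch follows; the sign convention (output minus repaired) is an assumption, and the matrix values are hypothetical rather than actual node output.

```python
def error_matrix(output, repaired):
    """Element-wise difference: output minus repaired (assumed sign)."""
    return [[o - r for o, r in zip(output_row, repaired_row)]
            for output_row, repaired_row in zip(output, repaired)]


# Hypothetical values chosen only to illustrate the arithmetic.
output = [[1.0, 0.75], [0.75, 1.0]]
repaired = [[1.0, 0.5], [0.5, 1.0]]

errors = error_matrix(output, repaired)
print(errors)  # [[0.0, 0.25], [0.25, 0.0]]
```

An Error Matrix of all zeros would indicate that no repair was needed.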

Views

This node has no views
