Tyranny of Coincidence

This is the workflow used in the October 15, 2020 webinar "Tyranny of Coincidence". It simulates data, then shows how conventional feature selection techniques can fail to reliably eliminate irrelevant variables. A link to the webinar is provided under "Links".
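The core point can be reproduced outside KNIME in a few lines. The sketch below illustrates the idea under stated assumptions and is not the webinar's workflow. It uses plain Python and a single genuine driver (`x_real`) among 200 uniform noise columns. As a stand-in for a conventional filter, it applies univariate correlation screening at an approximate two-sided 5% cutoff. The function names (`simulate`, `screen`) and the data sizes are hypothetical choices made for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_rows=200, n_noise=200, seed=42):
    """Hypothetical data generator: one real driver plus many pure-noise columns."""
    rng = random.Random(seed)
    columns = {"x_real": [rng.gauss(0, 1) for _ in range(n_rows)]}
    for j in range(n_noise):
        columns[f"noise_{j}"] = [rng.uniform(0, 1) for _ in range(n_rows)]
    # The target depends only on x_real plus Gaussian noise.
    target = [2.0 * x + rng.gauss(0, 1) for x in columns["x_real"]]
    return columns, target


def screen(columns, target, z=1.96):
    """Keep columns whose |r| exceeds the approximate two-sided 5% cutoff z / sqrt(n)."""
    cutoff = z / math.sqrt(len(target))
    return [name for name, xs in columns.items() if abs(pearson_r(xs, target)) > cutoff]


columns, target = simulate()
selected = screen(columns, target)
false_positives = [name for name in selected if name.startswith("noise_")]
print(f"selected {len(selected)} columns, {len(false_positives)} of them pure noise")
```

At a 5% cutoff, roughly one noise column in twenty is expected to pass by chance. With 200 noise columns, around ten irrelevant variables typically survive alongside the real one.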

Workflow overview (from the canvas annotations):

ABT Generation: Start by creating a dataset with relationships we know exist and relationships we know don't exist. We will see how well various techniques work on this data. Uniform, Gaussian, and categorical random columns are combined, a target is defined with a formula, and a categorical effect is added.

Feature Selection: Attempt several different feature selection techniques to see which ones do a good job with which data types:
- Regression significance: an unshuffled linear regression model is compared with models trained in a loop on shuffled targets (target shuffling). The comparison gives p-values for the best apparent discovery.
- Random Forest utility: variable importance is calculated for random forest models trained on shuffled and unshuffled targets.
- Forward Feature Selection
- Backward Feature Selection

Model Selection: Cross validation and parameter optimization over Gradient Boosted Trees regression models. The best model is then trained and evaluated on a holdout set.

Throughout the workflow, data is split 50/50 and models are evaluated on the holdout half.
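Target shuffling, as used in the Regression significance and Random Forest utility sections, works as follows. The target is shuffled many times, a model is refit on each shuffled target, and the method counts how often the resulting "best apparent discovery" is at least as strong as the one found on the real target.

The sketch below is a minimal standard-library illustration. It assumes the strongest absolute Pearson correlation across columns as the discovery statistic, whereas the workflow fits linear regression and random forest models instead. The function names, data sizes, and the add-one p-value estimate are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_apparent_discovery(columns, target):
    """Largest |r| between the target and any single column (the 'best' finding)."""
    return max(abs(pearson_r(xs, target)) for xs in columns.values())


def target_shuffle_p_value(columns, target, n_shuffles=200, seed=0):
    """Share of shuffled-target runs whose best discovery matches or beats the real one."""
    rng = random.Random(seed)
    observed = best_apparent_discovery(columns, target)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = target[:]
        rng.shuffle(shuffled)  # breaks any real link between the inputs and the target
        if best_apparent_discovery(columns, shuffled) >= observed:
            hits += 1
    # Add-one estimate: the p-value can never be exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


rng = random.Random(1)
n_rows, n_cols = 100, 30
cols = {f"noise_{j}": [rng.gauss(0, 1) for _ in range(n_rows)] for j in range(n_cols)}

# Case 1: the target is pure noise, unrelated to every column.
pure_noise_target = [rng.gauss(0, 1) for _ in range(n_rows)]
obs_noise, p_noise = target_shuffle_p_value(cols, pure_noise_target)

# Case 2: add one genuine driver and build the target from it.
cols_with_signal = dict(cols, x_real=[rng.gauss(0, 1) for _ in range(n_rows)])
real_target = [2.0 * x + rng.gauss(0, 1) for x in cols_with_signal["x_real"]]
obs_real, p_real = target_shuffle_p_value(cols_with_signal, real_target)

print(f"noise:  best |r| = {obs_noise:.3f}, p = {p_noise:.3f}")
print(f"signal: best |r| = {obs_real:.3f}, p = {p_real:.3f}")
```

On pure noise, the best |r| can look impressive in isolation. The shuffling p-value is nonetheless typically unremarkable, because shuffled targets produce comparable maxima. With a genuine driver, no shuffle comes close, and the p-value sits at its floor of 1/(n_shuffles + 1).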

Links