
EDA Merah (1)

Segment and Compare One Customer Subset

This section reads the data and then runs two related checks on the same dataset. The first branch creates a category/flag column, counts how many records fall into each group, and prepares that summary for a bar chart. The second branch filters the data down to a subset of 45 matching rows and calculates an additional numeric field so the distribution of that subset can be explored with a histogram. Together, the two branches let you compare group sizes and inspect the detailed values inside a selected group.
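The two branches can be sketched in plain Python. This is only an illustration of the idea, not the workflow's actual configuration: the column names (`orders`, `revenue`, `segment`), the flag rule, and the derived field are all assumptions, since the source does not show them.

```python
from collections import Counter

# Hypothetical records; the workflow's real columns are not shown in the source.
rows = [
    {"customer_id": 1, "orders": 0, "revenue": 0.0},
    {"customer_id": 2, "orders": 3, "revenue": 90.0},
    {"customer_id": 3, "orders": 1, "revenue": 25.0},
    {"customer_id": 4, "orders": 0, "revenue": 0.0},
    {"customer_id": 5, "orders": 4, "revenue": 200.0},
]


def add_flag(row):
    """Return a copy of the row with an assumed 'segment' flag column."""
    flagged = dict(row)
    flagged["segment"] = "active" if row["orders"] > 0 else "inactive"
    return flagged


flagged_rows = [add_flag(r) for r in rows]

# Branch 1: group sizes, i.e. the small table a bar chart would read.
group_counts = Counter(r["segment"] for r in flagged_rows)

# Branch 2: keep one subset and derive a numeric field for a histogram.
subset = [r for r in flagged_rows if r["segment"] == "active"]
avg_order_values = [r["revenue"] / r["orders"] for r in subset]

print(dict(group_counts))
print(avg_order_values)
```

Both branches start from the same flagged rows, which mirrors how the counting and filtering steps share one input table.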

Combine Two Tables and Summarize a Labeled Subset

This section brings together two related datasets, then creates a category/flag from the combined information and refines it into a clearer text label. After that, it keeps only the target subset and counts how many records fall into each label, producing a compact summary table for comparison. Conceptually, this is used to merge context, classify each row, isolate the cases of interest, and measure their distribution.
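A minimal sketch of the same sequence, assuming the two tables are combined with a key-based join (the workflow may instead stack them). The table names, the `customer_id` key, the 100.0 threshold, and the label text are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical tables; names and columns are placeholders.
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 2, "amount": 15.0},
    {"order_id": 12, "customer_id": 1, "amount": 40.0},
    {"order_id": 13, "customer_id": 3, "amount": 300.0},
]
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]

# Inner join on customer_id via a dictionary lookup.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {**o, "region": region_by_customer[o["customer_id"]]}
    for o in orders
    if o["customer_id"] in region_by_customer
]

# Rule step: a raw flag, then a string step that turns it into a readable label.
for row in joined:
    row["is_large"] = row["amount"] >= 100.0  # assumed threshold
    row["label"] = ("large" if row["is_large"] else "small") + " order"

# Keep the target subset, then count records per label.
north = [r for r in joined if r["region"] == "north"]
label_counts = Counter(r["label"] for r in north)

print(dict(label_counts))
```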

Reshape Counts into a Trend Table

This section turns the summarized counts into a format that is easier to compare across time or groups. It reorganizes the data so each category becomes its own column, standardizes the column names, calculates a difference/gap value between categories, and then sorts the result for clearer reading. Conceptually, this prepares a compact table for spotting patterns, changes, or contrasts before the final visualization.
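The reshape can be illustrated as follows. The grouping column (`month`), the two labels, and the direction of the gap (`new` minus `returning`) are assumptions made for the example, not values taken from the workflow.

```python
# Hypothetical long-format counts: one row per (month, label) pair.
counts = [
    {"month": "2024-02", "label": "new", "count": 7},
    {"month": "2024-01", "label": "new", "count": 5},
    {"month": "2024-01", "label": "returning", "count": 9},
    {"month": "2024-02", "label": "returning", "count": 4},
]

# Pivot: one row per month, one column per label (missing cells default to 0).
labels = sorted({r["label"] for r in counts})
pivot = {}
for r in counts:
    row = pivot.setdefault(r["month"], {name: 0 for name in labels})
    row[r["label"]] += r["count"]

# Rename to tidy column names, add the gap, and sort by month.
trend = sorted(
    (
        {
            "month": month,
            "new_count": cols["new"],
            "returning_count": cols["returning"],
            "gap": cols["new"] - cols["returning"],  # assumed gap definition
        }
        for month, cols in pivot.items()
    ),
    key=lambda row: row["month"],
)

for row in trend:
    print(row)
```

Sorting last keeps the rows in a stable order for a line plot, regardless of the order in which the counts arrived.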

Profile Two Datasets and Compare a Derived Metric

This block examines two separate datasets in parallel. It first generates summary statistics for each one, then keeps the key numeric fields needed for comparison. For each dataset, it calculates a derived value from those statistics, and finally combines the results into one small table. Conceptually, this helps you compare the same metric across both datasets side by side before plotting or reporting it.
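As a rough sketch of this pattern, the example below profiles two small lists and stacks the results. The source does not say which derived value the workflow computes; the coefficient of variation (standard deviation divided by mean) is used here purely as an assumed stand-in, and the data is invented.

```python
import statistics

# Hypothetical numeric columns from two datasets.
dataset_a = [10.0, 12.0, 14.0]
dataset_b = [5.0, 10.0, 15.0, 30.0]


def profile(name, values):
    """Summarize one dataset and derive a single comparison metric."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "dataset": name,
        "mean": mean,
        "std": std,
        # Assumed derived metric: coefficient of variation (std / mean).
        "cv": std / mean,
    }


# Stack both one-row summaries into a single comparison table.
comparison = [profile("A", dataset_a), profile("B", dataset_b)]

for row in comparison:
    print(row["dataset"], round(row["mean"], 2), round(row["cv"], 3))
```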

Prepare Two Focused Views of the Data

This section creates two simple summaries from the same raw dataset. One branch filters the data down to 45 relevant rows so you can inspect that subset more closely. The other branch classifies all 842 rows into categories using rules, then counts how many records fall into each category. In the end, you get a small summary table showing the distribution of those groups, ready for reporting or visualization.
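The classification branch can be approximated with an ordered rule list in which the first matching rule wins, which is how rule-based labeling of this kind typically behaves. The `amount` column, the thresholds, and the category names below are hypothetical.

```python
from collections import Counter

# Hypothetical ordered rules: (predicate, category). The first match wins.
RULES = [
    (lambda r: r["amount"] >= 100, "high"),
    (lambda r: r["amount"] >= 20, "medium"),
    (lambda r: True, "low"),  # catch-all default
]


def classify(row):
    """Return the category of the first rule whose predicate matches."""
    for predicate, category in RULES:
        if predicate(row):
            return category
    return None  # unreachable while the catch-all rule is present


# Invented sample rows standing in for the raw dataset.
rows = [{"amount": a} for a in (5, 20, 99, 100, 250, 0)]

# Classify every row, then count records per category.
category_counts = Counter(classify(r) for r in rows)

print(dict(category_counts))
```

Rule order matters: swapping the first two rules would label every amount of 20 or more as "medium", because the broader condition would match first.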

Visualize Distribution Summaries

These views turn small summary tables into quick charts so you can see the distribution of records across categories. One chart shows the shares as a pie chart, while the other shows the values as a histogram. Conceptually, this helps you move from raw counts to an easier visual comparison of how the data is spread.

Compare a Derived Metric Across Datasets

This node displays the final result as a bar chart, using the small combined summary table created earlier. It lets you compare the derived metric side by side between the two datasets.

Visualize the Trend Comparison

This node displays the prepared summary table as a line plot, making it easier to see how the category counts and their difference/gap change across the grouped values. It is the final step for spotting patterns, contrasts, and trends in the reshaped data.

Visualize Group Comparison and Subset Distribution

This part creates two complementary views of the data: a bar chart to compare the size of each group, and a histogram to show how values are distributed within a selected subset. Together, they help you quickly see both how many records are in each category and how the detailed numeric values are spread.

Summarize and Score Category Frequencies

Starting from the filtered dataset, this section creates two frequency summaries for different categorical fields. In each branch, the data is grouped to count how often each category appears, then a percentage/share is calculated so you can compare categories by relative size, not just raw counts. One summary is then sorted to rank categories from most to least common, while the other also adds a simple rule-based label to classify categories before visualization.
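The count, share, rank, and label steps can be sketched as below. For brevity the example uses a single invented categorical field for both summaries, whereas the workflow applies them to two different fields; the 25% cut-off and the `tier` names are assumptions.

```python
from collections import Counter

# Hypothetical categorical column from the filtered dataset.
categories = ["web", "store", "web", "phone", "web",
              "store", "web", "web", "store", "web"]

# Group and count, then express each count as a percentage of the total.
counts = Counter(categories)
total = sum(counts.values())
summary = [
    {"category": name, "count": n, "share_pct": 100.0 * n / total}
    for name, n in counts.items()
]

# Summary 1: rank categories from most to least common.
ranked = sorted(summary, key=lambda row: row["count"], reverse=True)

# Summary 2: add a rule-based label (assumed 25% threshold).
labeled = [
    {**row, "tier": "major" if row["share_pct"] >= 25.0 else "minor"}
    for row in summary
]

for row in ranked:
    print(row["category"], row["count"], row["share_pct"])
```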

Prepare and Enrich the Analysis Dataset

This section loads the source files, converts a text date field into a proper date/time format, then builds an extra lookup table from another dataset by finding each unique item and tagging it with a constant flag. That lookup is joined back to the main table to enrich each record, and missing values created by the join are filled in so the result is a cleaner, analysis-ready dataset. Finally, the data is filtered down to the subset of records used for the next EDA steps.
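A minimal sketch of the enrichment steps, assuming an `item_id` join key, an ISO-style date string, and a constant flag of 1. None of these details appear in the source, so treat every name and format here as a placeholder.

```python
from datetime import datetime

# Hypothetical main table and secondary dataset.
main = [
    {"item_id": "A", "order_date": "2024-01-15"},
    {"item_id": "B", "order_date": "2024-02-03"},
    {"item_id": "C", "order_date": "2024-02-20"},
]
other = [{"item_id": "A"}, {"item_id": "C"}, {"item_id": "A"}]

# Convert the text date into a proper datetime (assumed format).
for row in main:
    row["order_date"] = datetime.strptime(row["order_date"], "%Y-%m-%d")

# Lookup table: each unique item from the other dataset, tagged with a constant flag.
lookup = {r["item_id"]: 1 for r in other}

# Left join: unmatched rows get a missing value for the flag.
enriched = [{**row, "in_other": lookup.get(row["item_id"])} for row in main]

# Fill the missing values created by the join.
for row in enriched:
    if row["in_other"] is None:
        row["in_other"] = 0

# Filter down to the records used in later steps (assumed condition).
subset = [r for r in enriched if r["in_other"] == 1]

print([r["item_id"] for r in subset])
```

Keeping the fill as its own step makes it visible which rows had no match before they are assigned a default.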

Visualize Category Distributions

These two nodes display the results of the earlier summaries as bar charts. One chart shows the ranked frequency share of categories, while the other shows category frequencies after adding a simple rule-based classification. This helps you quickly compare which groups are most common and see how the categories are distributed visually.

Pie Chart
Bar Chart
Row Filter
CSV Reader
Concatenate
String Manipulation
CSV Reader
GroupBy
CSV Reader
Joiner
Bar Chart
Rule Engine
CSV Reader
Column Renamer
Column Renamer
GroupBy
Math Formula
String to Date&Time
Column Renamer
Joiner
Column Renamer
Bar Chart
Constant Value Column Appender
Pivot
Sorter
Line Plot
Column Renamer
CSV Reader
Math Formula
Missing Value
Sorter
GroupBy
Statistics
CSV Reader
Statistics
CSV Reader
Math Formula
Rule Engine
Math Formula
Column Filter
CSV Reader
Column Filter
Math Formula
Rule Engine
Row Filter
GroupBy
Column Renamer
Row Filter
Math Formula
Bar Chart
Histogram
CSV Reader
Row Filter
GroupBy
Column Renamer
Rule Engine
GroupBy
Histogram
CSV Reader
