
kn_example_python_graphic_scatterplot_diagonal_reference

KNIME & Python Graphics - Scatterplot with trend line and statistics

https://forum.knime.com/t/scatter-plot-diagonal-reference-line/73484/2?u=mlauber71
https://hub.knime.com/-/spaces/-/latest/~TQl4cycTtFLs4cnA/

Exploring the Power of Python Graphics with KNIME: A Collection of Examples
https://medium.com/p/841df87b5563


Python script:

    import knime.scripting.io as knio
    from io import BytesIO

    import seaborn as sns
    from scipy import stats

    # Set the plot style and the figure size (16 x 9 inches) in a single call;
    # a later sns.set() call would reset the style to its default
    sns.set_theme(style="whitegrid", rc={"figure.figsize": (16, 9)})

    input_table = knio.input_tables[0].to_pandas()

    # Plot settings passed in as KNIME flow variables
    var_title = knio.flow_variables['title_graphic']
    var_footnote = knio.flow_variables['footnote_graphic']
    var_x_variable = knio.flow_variables['variable_x']
    var_x_label = knio.flow_variables['label_x']
    var_y_a_variable = knio.flow_variables['variable_y_a']
    var_y_a_label = knio.flow_variables['label_y_a']
    var_colour = knio.flow_variables['v_colour']

    # https://seaborn.pydata.org/examples/large_distributions.html

    # Drop rows where either variable is missing, so both columns stay aligned
    input_table.dropna(subset=[var_x_variable, var_y_a_variable], inplace=True)
    x_values = input_table[var_x_variable]
    y_values = input_table[var_y_a_variable]

    # Calculate the Pearson correlation coefficient and its p-value
    pearson_coef, pearson_p = stats.pearsonr(x_values, y_values)

    # Calculate the Spearman rank correlation coefficient and its p-value
    spearman_coef, spearman_p = stats.spearmanr(x_values, y_values)

    # Calculate the linear regression statistics
    slope, intercept, r_value, p_value, std_err = stats.linregress(x_values, y_values)

    # Create the text for the graph
    line = (f"Regression: y = {slope:.2f}x + {intercept:.2f}"
            f"\nR² = {r_value**2:.2f}, p = {p_value:.3f}"
            f"\nPearson's r: {pearson_coef:.2f} (p = {pearson_p:.3f})"
            f"\nSpearman's ρ: {spearman_coef:.2f} (p = {spearman_p:.3f})")

    # y = mx + c is the equation of a straight line, where:
    # {slope:.2f} (m) is the slope of the line: the change in y (dependent
    #   variable) for a unit change in x (independent variable).
    # {intercept:.2f} (c) is the y-intercept: the value of y when x is 0.
    # R² ({r_value**2:.2f}) is the coefficient of determination: the proportion
    #   of the variance in the dependent variable that is predictable from the
    #   independent variable.
    # p ({p_value:.3f}) is the p-value. A value below 0.05 is conventionally
    #   read as a statistically significant relationship.
    # Pearson's r measures the linear correlation between two variables.
    #   A value close to 1 indicates a strong positive correlation, a value
    #   close to -1 a strong negative correlation, and a value close to 0
    #   no linear correlation.
    # Spearman's ρ (rho) assesses monotonic relationships, whether linear or
    #   not. It is based on the ranks of the data rather than the raw values.
    #   It has the same range and interpretation as Pearson's r, but it also
    #   captures non-linear monotonic relationships.

    # Create the scatterplot
    g = sns.scatterplot(x=var_x_variable, y=var_y_a_variable, data=input_table,
                        color=var_colour)

    # Add a trend line to the same plot
    sns.regplot(x=var_x_variable, y=var_y_a_variable, data=input_table,
                scatter=False, color="black", line_kws={"lw": 2}, ci=None, ax=g)

    # Add the regression equation and the statistics to the plot
    g.annotate(line, xy=(0.05, 0.95), xycoords='axes fraction', fontsize=10,
               verticalalignment='top',
               bbox=dict(boxstyle="round,pad=0.3", edgecolor="black",
                         facecolor="aliceblue"))

    # Set labels and title
    g.set(xlabel=var_x_label, ylabel=var_y_a_label, title=var_title)

    # Export the statistics text as a flow variable
    knio.flow_variables['line_stats'] = line

    fig_out = g.get_figure()
    fig_out.set_size_inches(16, 9)

    # Add the footnote below the plot
    fig_out.text(0.1, 0.025, var_footnote, fontsize=10)

    # Write the figure into an in-memory buffer as SVG
    buffer = BytesIO()
    fig_out.savefig(buffer, format='svg')

    # The image output is the content of the buffer
    output_image = buffer.getvalue()
    knio.output_images[0] = output_image

    # Assign the figure to the output view
    knio.output_view = knio.view(fig_out)

The diamonds dataset is a popular example dataset available through the Seaborn library. It describes roughly 54,000 diamonds. Here's a breakdown of the columns in the dataset:

1. **carat**: Weight of the diamond, measured in carats. A carat is equivalent to 0.2 grams.
2. **cut**: Quality of the diamond cut, an ordinal categorical variable. It has the values:
   - **Fair**: Worst quality
   - **Good**
   - **Very Good**
   - **Premium**
   - **Ideal**: Best quality
3. **color**: Color of the diamond, also an ordinal categorical variable. The values range from:
   - **J**: Worst color
   - ...
   - **D**: Best color
4. **clarity**: A measurement of how clear the diamond is, another ordinal categorical variable. The values are:
   - **I1**: Worst clarity (inclusions are obvious under 10× magnification)
   - **SI2**: Slightly included 2
   - **SI1**: Slightly included 1
   - **VS2**: Very slightly included 2
   - **VS1**: Very slightly included 1
   - **VVS2**: Very, very slightly included 2
   - **VVS1**: Very, very slightly included 1
   - **IF**: Internally flawless, best clarity
5. **depth**: Total depth percentage, calculated as `z / mean(x, y)` and expressed as a percentage, where z is the depth of the diamond and x and y are the length and width. This gives an idea of the shape and proportions of the diamond.
6. **table**: Width of the diamond's top relative to its widest point, represented as a percentage.
7. **price**: Price of the diamond in US dollars.
8. **x**: Length of the diamond in millimeters.
9. **y**: Width of the diamond in millimeters.
10. **z**: Depth of the diamond in millimeters.

The diamonds dataset is often used for regression and classification tasks, as well as for data visualization exercises, as it offers a mix of numeric and categorical attributes. If you plot attributes like carat against price in a scatter plot, you'll notice a positive correlation, with larger diamonds generally being more expensive.

Workflow nodes: Collect Local Metadata, Parquet Reader, Python graphics interactive - Scatterplot with trend line and statistics, Image To Table, Renderer to Image, Table To Image, Image Writer (Port)

Workflow annotations: locate and create /data/ folder with absolute paths; diamonds.parquet; Interactive VIEW with Python graphics (right click); 1.920 x 1.080 PNG file; from_knime_scatterplot.png
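The difference between Pearson's r and Spearman's ρ described in the script comments can be made concrete with a small standalone sketch. It is not part of the workflow, which relies on `scipy.stats`. The helper names `pearson_r`, `ranks`, `spearman_rho` and `linear_fit` are hypothetical, the data is made up for illustration, and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from 1 to n (illustrative only: assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


def linear_fit(xs, ys):
    """Least-squares slope (m) and intercept (c) of y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: y grows monotonically with x, but not linearly
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
slope, intercept = linear_fit(xs, ys)

print(f"Regression: y = {slope:.2f}x + {intercept:.2f}")
print(f"R² = {r ** 2:.2f}")
print(f"Pearson's r: {r:.2f}")
print(f"Spearman's ρ: {rho:.2f}")
```

Because the ranks of x and y are identical here, Spearman's ρ reaches 1, while Pearson's r stays below 1 since the points do not lie on a straight line.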
