kn_forum_62280_spark_excel_export

KNIME and PySpark - export an Excel file directly from within a Spark environment - also importing CSV from HDFS to Spark

Forum thread: https://forum.knime.com/t/spark-to-excel/62280/5?u=mlauber71

PySpark script (PySpark Script Source node):

    from openpyxl import Workbook

    def export_to_excel(df, file_path):
        # Create a new workbook and select the active worksheet
        wb = Workbook()
        ws = wb.active

        # Write the column headers to the first row of the worksheet
        for i, col_name in enumerate(df.columns):
            ws.cell(row=1, column=i+1, value=col_name)

        # Write the data rows to the worksheet
        # (collect() brings all rows to the Spark driver)
        rows = df.select('*').collect()
        for r, row in enumerate(rows):
            for c, val in enumerate(row):
                ws.cell(row=r+2, column=c+1, value=val)

        # Save the workbook to disk
        wb.save(file_path)

    # Create a DataFrame
    # you can later replace this DataFrame with your own data
    data = [("John", 25), ("Jane", 30), ("Bob", 35)]
    df = spark.createDataFrame(data, ["Name", "Age"])

    # Create the export path from the KNIME flow variable
    var_export_path = flow_variables['v_v_path_upload_big_data'] + "my_file.xlsx"

    # Export the DataFrame to an Excel file
    export_to_excel(df, var_export_path)

    resultDataFrame1 = df

Workflow annotations:

- /big_data//data/ => will clear big_data folder
- Cluster_Membership
- RowID
- search for the upload folder on the local big data system: ../big_data on MacOS and Linux, ..\big_data on Windows?
- export data directly from PySpark to Excel, if this does make any sense ... :-)
- Transfer CSV files from HDFS to Spark
- /upload//csv_folder
- /upload//csv_export_folder

Workflow nodes:

- Create Local Big Data Environment
- Metadata for Big Data
- Column Rename
- Partitioning
- RowID
- determine upload path
- PySpark Script Source
- Spark to Table
- CSV to Spark
- Data Generator
- Spark to Table
- put CSV files to HDFS
- Spark to CSV
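The PySpark script above builds the export path by concatenating the flow variable with the file name, which only gives a valid path if the variable's value already ends with a path separator. A minimal sketch of a more defensive variant follows, assuming the flow variable holds a plain directory path on the local file system. `build_export_path` is a hypothetical helper introduced here for illustration; it is not part of the workflow or of KNIME, and the folder string in the example is a placeholder.

```python
import os


def build_export_path(folder, file_name="my_file.xlsx"):
    """Join a folder and a file name, with or without a trailing separator.

    Hypothetical helper: `folder` stands in for the value of
    flow_variables['v_v_path_upload_big_data'] in the PySpark node.
    """
    # os.path.join inserts a separator only when one is missing;
    # normpath then cleans up doubled or mixed separators.
    return os.path.normpath(os.path.join(folder, file_name))


# Placeholder folder values, with and without a trailing slash
print(build_export_path("big_data/upload"))
print(build_export_path("big_data/upload/"))
```

Inside the node, the result could be passed to `export_to_excel` in place of the concatenated string.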
