
kn_example_python_read_orc_file

Read ORC file into KNIME's Python node

If you want to transfer several files from outside sources into the Python environment without losing column types, ORC is one (local) alternative from within KNIME.

KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539

Three script variants for the KNIME Python Script node are shown below.

Links:
- How to read an ORC file stored locally in Python pandas: https://stackoverflow.com/questions/52889647/how-to-read-an-orc-file-stored-locally-in-python-pandas
- pandas.read_orc API reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_orc.html
- Installing Apache Spark on Mac OS: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
- Apache Spark on Apple Silicon: https://medium.com/towards-data-engineering/apache-spark-on-apple-silicon-4ac61c5caf45

Variant 1 — pandas.read_orc (available since pandas 1.0.0; works with the bundled KNIME Python version on macOS):

```python
import knime.scripting.io as knio
import pandas as pd

# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_orc.html
v_read_orc_file = knio.flow_variables['v_path_orc_file']
df = pd.read_orc(path=v_read_orc_file)
knio.output_tables[0] = knio.Table.from_pandas(df)
```

Variant 2 — pyarrow.orc:

```python
import knime.scripting.io as knio
import pandas as pd
import pyarrow.orc as orc

v_read_orc_file = knio.flow_variables['v_path_orc_file']

# https://stackoverflow.com/questions/52889647/how-to-read-an-orc-file-stored-locally-in-python-pandas
# ORC is a binary format, so the file must be opened in binary mode ('rb')
with open(v_read_orc_file, 'rb') as file:
    data = orc.ORCFile(file)
    df = data.read().to_pandas()

knio.output_tables[0] = knio.Table.from_pandas(df)
```

Variant 3 — PySpark (you will have to install Apache Spark on your machine; on macOS: brew install apache-spark):

```python
import knime.scripting.io as knio

# you will have to install Apache Spark on your machine
# macOS: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
# Apple Silicon: https://medium.com/towards-data-engineering/apache-spark-on-apple-silicon-4ac61c5caf45
import findspark
from pyspark.sql import SparkSession

v_read_orc_file = knio.flow_variables['v_path_orc_file']

findspark.init()
spark = SparkSession.builder.getOrCreate()
df_spark = spark.read.orc(v_read_orc_file)
df_pandas = df_spark.toPandas()

knio.output_tables[0] = knio.Table.from_pandas(df_pandas)
```

Workflow notes (from the canvas annotations):
- Propagate the Python environment for KNIME on macOS (Apple Silicon) or Windows with Miniforge / Miniconda; configure how to handle the environment (default = just check the names). See: KNIME and Python — Setting up and managing Conda environments, https://medium.com/p/2ac217792539
- The workflow generates dummy data (test_data_all_types), writes it to test_file.orc (flow variable v_path_orc_file) and, via Spark, to the folder test_file_from_spark.orc (flow variable v_path_orc_file_from_spark), then reads the ORC folder again into Spark (df_spark = spark.read.orc(v_read_orc_file); df_pandas = df_spark.toPandas()).
- pd.read_orc(path=v_read_orc_file) is available since pandas 1.0.0; the bundled KNIME Python version does work with macOS. Another annotation notes: does not work on MacOS or Windows ...
- The Delete Files/Folders nodes delete the whole local big data folder /big_data; decide in the configuration if you want sub-folders or parent folders. If you encounter any problems, close KNIME and delete all data from the folder /big_data/conda_knime_spark.

Nodes used: Java Edit Variable (simple), Data Generator, Test Data Generator, String to Path (Variable), ORC Writer, ORC Reader, ORC to Spark, Spark to ORC, Spark SQL Query, Spark to Table, Python Script, Merge Variables, Try (Variable Ports), Catch Errors (Var Ports), Delete Files/Folders, Column Filter, local big data context create.
