
kn_example_python_read_orc_file

Read ORC file into KNIME's Python node

If you want to transfer several files from outside sources into the Python environment without losing column types, ORC is one (local) alternative from within KNIME.

KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539

https://stackoverflow.com/questions/52889647/how-to-read-an-orc-file-stored-locally-in-python-pandas
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_orc.html
https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
https://medium.com/towards-data-engineering/apache-spark-on-apple-silicon-4ac61c5caf45

Python Script: pd.read_orc(path=v_read_orc_file)
since Pandas 1.0.0; bundled KNIME Python version; does not work on MacOS or Windows...

    import knime.scripting.io as knio
    import pandas as pd

    # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_orc.html
    # Path of the ORC file, taken from the flow variable
    v_read_orc_file = knio.flow_variables['v_path_orc_file']

    # Read the ORC file directly with pandas and pass it on as the output table
    df = pd.read_orc(path=v_read_orc_file)
    knio.output_tables[0] = knio.Table.from_pandas(df)

Python Script: pyarrow.orc

    import knime.scripting.io as knio
    import pandas as pd
    import pyarrow.orc as orc

    v_read_orc_file = knio.flow_variables['v_path_orc_file']

    # https://stackoverflow.com/questions/52889647/how-to-read-an-orc-file-stored-locally-in-python-pandas
    # ORC is a binary format, so the file has to be opened in 'rb' mode
    with open(v_read_orc_file, 'rb') as file:
        data = orc.ORCFile(file)
        df = data.read().to_pandas()

    knio.output_tables[0] = knio.Table.from_pandas(df)

Python Script: Spark (py3_knime_spark)

    import knime.scripting.io as knio
    import pandas as pd

    # you will have to install Apache Spark on your machine
    # this is the version for MacOSX
    # https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
    # https://stackoverflow.com/questions/52889647/how-to-read-an-orc-file-stored-locally-in-python-pandas
    import findspark
    from pyspark.sql import SparkSession

    v_read_orc_file = knio.flow_variables['v_path_orc_file']

    # Locate the local Spark installation and start (or reuse) a session
    findspark.init()
    spark = SparkSession.builder.getOrCreate()

    # Read the ORC file with Spark, then convert it to a pandas data frame
    df_spark = spark.read.orc(v_read_orc_file)
    df_pandas = df_spark.toPandas()
    knio.output_tables[0] = knio.Table.from_pandas(df_pandas)

Apache Spark On Apple Silicon
https://medium.com/towards-data-engineering/apache-spark-on-apple-silicon-4ac61c5caf45

    brew install apache-spark

Workflow annotations
locate and create /data/ folder with absolute paths (/big_data/, /data/)
v_path_orc_file: test_file.orc (dummy data)
v_path_orc_file_from_spark: test_file_from_spark.orc (as folder)
read .orc files
ORC folder again to Spark
test data: test_data_all_types

Nodes in the workflow
Collect Local Metadata, Metadata for Big Data, Create Local Big Data Environment, Java Edit Variable (simple), Data Generator, Test Data Generator, String to Path (Variable), ORC Writer, ORC Reader, ORC to Spark, Spark to ORC, Spark SQL Query, Spark to Table, Delete Files/Folders, Try (Variable Ports), Catch Errors (Var Ports), Merge Variables, Python Script, Conda Environment Propagation
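The workflow wraps its Python Script nodes in Try / Catch Errors nodes, since not every route works on every platform. A rough Python equivalent of that idea is sketched below: try each reader in turn and keep the first result. The function names (`read_with_pandas`, `read_with_pyarrow`, `read_orc_with_fallback`) are hypothetical and not part of KNIME or of the original scripts; only `pd.read_orc` and `pyarrow.orc.ORCFile` are real library calls.

```python
from typing import Any, Callable, Sequence, Tuple

Reader = Callable[[str], Any]


def read_with_pandas(path: str) -> Any:
    # Imported lazily so that a missing package only fails this one route
    import pandas as pd

    return pd.read_orc(path)


def read_with_pyarrow(path: str) -> Any:
    import pyarrow.orc as orc

    return orc.ORCFile(path).read().to_pandas()


def read_orc_with_fallback(
    path: str, readers: Sequence[Tuple[str, Reader]]
) -> Tuple[str, Any]:
    """Try each (name, reader) pair in order; return the first success."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # broad on purpose: any failure means "try the next route"
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    raise RuntimeError("no ORC reader succeeded:\n" + "\n".join(errors))


# Order of preference; the Spark route could be appended in the same way
DEFAULT_READERS = (
    ("pandas", read_with_pandas),
    ("pyarrow", read_with_pyarrow),
)
```

Inside a Python Script node this could be called as `used, df = read_orc_with_fallback(knio.flow_variables['v_path_orc_file'], DEFAULT_READERS)`, after which `df` is handed to `knio.Table.from_pandas(df)` as in the scripts above.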
