
kn_example_python_date_time_pyarrow

KNIME and Python with Pandas and PyArrow to handle Date and Time variables

The sub-folder /data/ contains two Jupyter notebooks for trying out date and time handling with each library:

Pandas: knime_py_pandas_date_time_columns.ipynb
PyArrow: knime_py_pyarrow_date_time_columns.ipynb
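As a starting point for such experiments, here is a minimal sketch of the pandas conversion that runs outside KNIME. The column names and format strings follow the workflow's Python Script node; the sample values are made up for illustration, and the second row is deliberately invalid to show what errors='coerce' does.

```python
import pandas as pd

# Hypothetical sample data; in the workflow these strings come from
# KNIME's "Date&Time to String" nodes.
df = pd.DataFrame({
    "Local Date Time (String)": ["2023-01-15;13:45:30", "not a date"],
    "Local Date (String)": ["2023-01-15", "2023-02-30"],
})

# KNIME pattern yyyy-MM-dd;HH:mm:ss corresponds to %Y-%m-%d;%H:%M:%S
df["Local Date Time (DateTime)"] = pd.to_datetime(
    df["Local Date Time (String)"],
    format="%Y-%m-%d;%H:%M:%S",
    errors="coerce",  # unparseable values become NaT instead of raising
)

# KNIME pattern yyyy-MM-dd corresponds to %Y-%m-%d
df["Local Date (DateTime)"] = pd.to_datetime(
    df["Local Date (String)"],
    format="%Y-%m-%d",
    errors="coerce",
)

print(df.dtypes)
print(df)
```

Because errors='coerce' turns failures into NaT silently, it can be worth counting the missing values after conversion, for example with df["Local Date (DateTime)"].isna().sum().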



Python Script node (use pyarrow):

    import knime.scripting.io as knio
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyarrow.compute as pc
    import datetime
    import time

    # read the node's input table as a PyArrow table
    df = knio.input_tables[0].to_pyarrow()

    # the name of the parquet file from KNIME including the path
    v_path_parquet_file = knio.flow_variables['v_path_parquet_pyarrow_file']

    # from KNIME yyyy-MM-dd;HH:mm:ss
    df = df.append_column("Local Date Time (DateTime)", pc.strptime(df.column("Local Date Time (String)"), format='%Y-%m-%d;%H:%M:%S', unit='s'))

    # from KNIME yyyy-MM-dd
    df = df.append_column("Local Date (DateTime)", pc.strptime(df.column("Local Date (String)"), format='%Y-%m-%d', unit='s'))

    # export the table to a local Parquet file
    pq.write_table(df, v_path_parquet_file, compression='gzip')

    knio.output_tables[0] = knio.Table.from_pyarrow(df)

Python Script node (use pandas):

    import knime.scripting.io as knio
    import pandas as pd
    import datetime
    import time

    # read the node's input table as a pandas DataFrame
    df = knio.input_tables[0].to_pandas()

    # the names of the files from KNIME including the path
    v_path_parquet_file = knio.flow_variables['v_path_parquet_file']
    v_path_csv_file = knio.flow_variables['v_path_csv_file']
    v_path_xlsx_file = knio.flow_variables['v_path_xlsx_file']

    # from KNIME yyyy-MM-dd;HH:mm:ssVV
    df['Zoned Date Time (DateTime)'] = pd.to_datetime(df['Zoned Date Time (String)'], format='%Y-%m-%d;%H:%M:%S%Z', errors='coerce')

    # from KNIME yyyy-MM-dd;HH:mm:ss
    df['Local Date Time (DateTime)'] = pd.to_datetime(df['Local Date Time (String)'], format='%Y-%m-%d;%H:%M:%S', errors='coerce')

    # from KNIME yyyy-MM-dd
    df['Local Date (DateTime)'] = pd.to_datetime(df['Local Date (String)'], format='%Y-%m-%d', errors='coerce')

    # export the dataframe to a local Parquet file
    df.to_parquet(v_path_parquet_file, compression='gzip')

    knio.output_tables[0] = knio.Table.from_pandas(df)

Workflow annotations

Setup: locate and create the /data/ folder with absolute paths
Branches: use pandas, use pyarrow, use internal KNIME variable types
Flow variables (v_path*): v_path_parquet_file, v_path_parquet_pyarrow_file, v_path_csv_file, v_path_xlsx_file
Date formats for the *(String)* columns: yyyy-MM-dd, yyyy-MM-dd;HH:mm:ss, yyyy-MM-dd;HH:mm:ssVV

Files

test_data_all_types
data_types_list.parquet
data_types_list.csv
data_types_list.xlsx
test_date_time.parquet => to use in the Jupyter notebooks as an example
df_export_from_knime_pandas.parquet
df_export_from_knime_pyarrow.parquet
df_export_from_pandas_jupyter.parquet: exported from Jupyter notebook /data/knime_py_pandas_date_time_columns.ipynb
df_export_from_pyarrow_jupyter.parquet: exported from Jupyter notebook /data/knime_py_pyarrow_date_time_columns.ipynb

Nodes in the workflow

Java Edit Variable (simple), Test Data Generator, prepare_data, Column Rename, Python Script, String to Path (Variable), CSV Reader, Excel Reader, Date&Time to String, Parquet Writer, Column Filter, Parquet Reader, Try (Variable Ports), Catch Errors (Var Ports), Merge Variables, Collect Local Metadata
