
kn_example_python_read_arff_file

Use the new (KNIME 4.6) integrated Python extension to read and write an ARFF file with KNIME, export it again as Parquet, put it into an SQLite database and read it back.


How to Set Up Your Python Extensions
https://www.knime.com/blog/how-to-setup-the-python-extension
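The KNIME 4.6 Python Script node exposes its input and output ports and the flow variables through the knime.scripting.io module (imported as knio). A minimal sketch of that pattern, assuming a node with one input table, one output table and a flow variable named var_path_data like in this workflow:

import knime.scripting.io as knio

# first input port as a pandas DataFrame
df = knio.input_tables[0].to_pandas()

# flow variables behave like a dictionary
print(knio.flow_variables.get('var_path_data'))

# hand the (possibly modified) DataFrame to the first output port
knio.output_tables[0] = knio.Table.from_pandas(df)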

The Python Script node that reads the ARFF file contains the following code:

'''Example: import an ARFF file written from KNIME into Python and export it back to SQLite and also Parquet'''

# Import knime.scripting.io to access node inputs and outputs.
import knime.scripting.io as knio

import numpy as np             # linear algebra
import os                      # accessing directory structure
import pandas as pd            # data processing, CSV file I/O (e.g. pd.read_csv)
import pyarrow.parquet as pq
import sqlite3
import arff

# the ../data/ path from the KNIME flow variables - in the new KNIME knio style
var_path_data = knio.flow_variables['var_path_data']

# the name of the ARFF file including the path (originally exported from KNIME)
v_path_arff_file = knio.flow_variables['v_path_arff_file']

# import the local ARFF file into Python
dataset_import = arff.load(open(v_path_arff_file, 'rt'))

# get the data from the -well- data part of the imported dictionary
data_import = np.array(dataset_import['data'])

# get the attributes from the dictionary
attributes_import = np.array(dataset_import['attributes'])

# get the names of the columns from the attributes
names_import = attributes_import[:, 0]

# convert the imported data into a pandas DataFrame using the column names in 'names_import'
df = pd.DataFrame(data_import, columns=names_import)

# export the data again to Parquet - this time using GZIP compression
# you could also try and use a different path
v_second_parquet_file = knio.flow_variables['var_path_data'] + "from_knime_python_test_file.parquet"
df.to_parquet(v_second_parquet_file, compression='gzip')

# ----- SQLite ----------------------------------------
# the location of the SQLite database
var_loc_sqlite = var_path_data + "db.sqlite"

# define a function to connect to the SQLite DB
# by default the schema used is -well- default
def sqlite_connectme(sqlite_database="default"):
    try:
        # first define the connection to the physical SQLite file on the file system
        global sqlite_connection
        sqlite_connection = sqlite3.connect(sqlite_database)
        # then define a cursor
        global sqlite_cursor
        sqlite_cursor = sqlite_connection.cursor()
    except:
        print("failed to connect " + sqlite_database)
        pass

# establish a connection to the DB
sqlite_connectme(sqlite_database=var_loc_sqlite)

# define a function that cleans up the SQLite table you want to insert into
def sqlite_cleanup(sqlite_table="default"):
    try:
        sqlite_connection.execute("DROP TABLE IF EXISTS " + sqlite_table + ";")
        sqlite_connection.commit()
    except:
        print("failed to clean SQLite DB")
        pass

# what name should the new table have
v_sqlite_table_name = "test_file"

# make sure the table is not there
sqlite_cleanup(sqlite_table=v_sqlite_table_name)

# insert the df DataFrame into the SQLite table
df.to_sql(name=v_sqlite_table_name, con=sqlite_connection, if_exists="replace")
sqlite_connection.commit()

# the following will not have an immediate effect under KNIME,
# but you could work with these functions if you need them

# list all tables that are in the SQLite file we accessed earlier
sqlite_table_list = sqlite_cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
print(sqlite_table_list)

# retrieve column information of our table
# every column will be represented by a tuple with the following attributes:
# (id, name, type, notnull, default_value, primary_key)
# http://sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html#querying-the-database---selecting-rows
sqlite_cursor.execute('PRAGMA TABLE_INFO({})'.format(v_sqlite_table_name))
sqlite_table_variables = [tup[1] for tup in sqlite_cursor.fetchall()]
print(sqlite_table_variables)

# fetch data from the SQLite table we named earlier
# hint: the SQLite DB stores the variable types
df_sqlite = pd.read_sql_query("select * from " + v_sqlite_table_name + ";", sqlite_connection)

# see what types of variables are there
df_sqlite_types = df_sqlite.dtypes
print(df_sqlite_types)

# close the SQLite database at the end
sqlite_connection.close()

# pass the transformed table to the first output port of the node
knio.output_tables[0] = knio.Table.from_pandas(df_sqlite)
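
As a quick check of the gzip-compressed Parquet file and the SQLite table written above, a sketch like the following could be run, for example in an additional Python Script node or a plain Python session. The file names and the var_path_data flow variable are the ones used in the script; everything else is illustrative.

import sqlite3
import pandas as pd
import pyarrow.parquet as pq
import knime.scripting.io as knio

var_path_data = knio.flow_variables['var_path_data']

# Parquet: read the gzip-compressed file back and inspect its schema
pq_table = pq.read_table(var_path_data + "from_knime_python_test_file.parquet")
print(pq_table.schema)

# SQLite: read the table back into pandas and inspect the column types
sqlite_connection = sqlite3.connect(var_path_data + "db.sqlite")
df_check = pd.read_sql_query("SELECT * FROM test_file;", sqlite_connection)
sqlite_connection.close()
print(df_check.dtypes)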

The workflow annotations describe the two Python Script nodes (one reads the ARFF file and exports it to KNIME, SQLite and again to Parquet, as shown above; the other imports the Parquet file and exports it again as ARFF from within Python) and reference the files used along the way, among them ../data/test_file.parquet, ../data/from_knime_test_file.parquet (gzip compressed), from_knime_python_test_file.arff / .parquet, from_python_myfile.arff and the SQLite database at knime://knime.workflow/data/db.sqlite (table default.test_file).

KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539

In the subfolder /data/ beneath this workflow there is a Jupyter notebook, kn_example_python_read_arff_file.ipynb, that also demonstrates the import and export of ARFF files.
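
Outside of KNIME, for example in the Jupyter notebook mentioned above, the same ARFF round trip can be sketched without the knio API. The file names below are examples based on the workflow annotations, and declaring every attribute as STRING is a simplification for the sketch.

import arff        # liac-arff package, same as in the script above
import pandas as pd

# read the ARFF file that was written from KNIME
with open("../data/from_knime_test_file.arff", "rt") as f:
    dataset = arff.load(f)

columns = [attr[0] for attr in dataset['attributes']]
df = pd.DataFrame(dataset['data'], columns=columns)

# write the DataFrame back to a new ARFF file
# (all columns declared as STRING to keep the example simple)
arff.dump({
    'relation': 'from_python',
    'attributes': [(c, 'STRING') for c in columns],
    'data': df.astype(str).values.tolist(),
}, open("../data/from_python_myfile.arff", "wt"))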

Nodes

Python Script, collect meta data, prepare_data, Java Edit Variable (simple), String to Path (Variable), Merge Variables, Rule-based Row Filter, Test Data Generator, ARFF Reader, ARFF Writer, Parquet Reader, Parquet Writer, SQLite Connector, DB Table Selector, DB Reader
