
kn_forum_69854_dbc_dbf_import_datasus_br

Use KNIME, R (read.dbc) and Python (dbfread) to import DBC (compressed DBF) files from DATASUS (the Department of Informatics of the Brazilian Health System)

https://forum.knime.com/t/widget-to-upload-multiple-files/69854/5?u=mlauber71

----
KNIME and R — installation across operating systems — some remarks
https://medium.com/p/6494a2a498cc

KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539
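
A minimal sketch of what the py3_knime Conda environment used by the Python Script node might look like. The Python version is an assumption, and the packages the KNIME Python integration itself needs are omitted; only pandas, pyarrow and dbfread are required by the script further down:

name: py3_knime
channels:
  - conda-forge
dependencies:
  - python=3.9   # assumed version; use one supported by your KNIME release
  - pandas       # DataFrame handling
  - pyarrow      # Parquet engine used by pandas
  - pip
  - pip:
    - dbfread    # pure-Python DBF reader used in the Python Script node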

DATASUS (https://datasus.saude.gov.br/) is the Department of Informatics of the Brazilian Health System. It is the agency responsible for publishing Brazilian public healthcare data. Besides DATASUS, the Brazilian National Agency for Supplementary Health (ANS) also uses this file format for its public data.

The workflow first locates (and if necessary creates) the /data/ folder with absolute paths and lists the DBC files under data/source/*.dbc. It then loops over the files: an R Source (Table) node uses library("read.dbc") to decompress each DBC file to v_dbf_file, and a Python Script node (running in the py3_knime Conda environment, which provides dbfread) reads the DBF file. Edit: just the first 1000 lines are kept for testing => remove this in production. The results are written to v_parquet_file and to a KNIME table file (v_table_file / v_table_filepath). After the loop ends, PARS2112.parquet is read back => check if the Parquet file export has worked. The two scripts are shown below.

R Source (Table) — decompress the DBC file to DBF:

# read.dbc - An R package for reading data in the DBC (compressed DBF) format used by DATASUS
# https://github.com/danicat/read.dbc
# install.packages("read.dbc")
library("read.dbc")

v_dbc_file_path <- knime.flow.in[["File path"]]
v_dbf_file_path <- knime.flow.in[["v_dbf_file"]]

# The call returns TRUE on success
if( dbc2dbf(input.file = v_dbc_file_path, output.file = v_dbf_file_path) ) {
    print("File decompressed!")
}

knime.out <- data.frame("v_dbc_file_path" = v_dbc_file_path, "v_dbf_file_path" = v_dbf_file_path)

Python Script — read the DBF file with dbfread, convert it to a pandas DataFrame and export it to Parquet:

import knime.scripting.io as knio
import numpy as np
import pandas as pd
import pyarrow.parquet as pq
from dbfread import DBF

# Define the file path and name of the dBASE DBF file
file_path = knio.flow_variables['v_dbf_file']
parquet_file_path = knio.flow_variables['v_parquet_file']

# Use the DBF class from dbfread to read the file
table = DBF(file_path)

# Convert the table to a pandas DataFrame
df = pd.DataFrame(iter(table))

# Display the first 5 rows of the DataFrame
print(df.head())

# Export the DataFrame to a local Parquet file
df.to_parquet(parquet_file_path, compression='gzip')

knio.output_tables[0] = knio.Table.from_pandas(df)
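
For reference — and not part of the original workflow — a minimal standalone sketch of the same DBF-to-Parquet step outside KNIME. The file paths and the 1000-row test cap are assumptions mirroring the workflow annotations:

from itertools import islice

import pandas as pd
from dbfread import DBF

# hypothetical local paths mirroring the workflow's /data/ folder layout
dbf_file_path = "data/PARS2112.dbf"
parquet_file_path = "data/PARS2112.parquet"

# dbfread yields one dict-like record per row
table = DBF(dbf_file_path)

# keep just the first 1000 records while testing (drop the cap in production)
df = pd.DataFrame(islice(iter(table), 1000))

print(df.head())

# pandas writes Parquet via the pyarrow engine
df.to_parquet(parquet_file_path, compression="gzip")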
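The final "=> check if the Parquet file export has worked" step can likewise be reproduced outside KNIME with a short sketch like the following; the path is an assumption:

import pandas as pd

# hypothetical path to one of the files exported by the loop
parquet_file_path = "data/PARS2112.parquet"

# read the Parquet file back and run basic sanity checks
df_check = pd.read_parquet(parquet_file_path)
print(df_check.shape)
print(df_check.dtypes)
print(df_check.head())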

Nodes

Collect Local Metadata, List Files/Folders, URL to File Path, Path to URI, Table Row To Variable Loop Start, R Source (Table), Java Snippet (simple), Conda Environment Propagation, Merge Variables, Python Script, Row Filter, Java Snippet (simple), Java Snippet (simple), Table Writer, String to Path (Variable), Variable Loop End, Parquet Reader