v1.10 - 10 January 2022
PLEASE NOTE THIS COMPONENT IS STILL A PROTOTYPE AND SUBJECT TO CHANGE - FEEDBACK WELCOME!
Python Script updated with some improvements, and also recoded so that it can be tested (in part) outside of KNIME (e.g. using VSCode) for easier development.
Reads the supplied XML file. The specified path is first treated as a file on the local file system; if that fails, the component attempts to read it as a URL.
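The file-then-URL fallback described above can be sketched roughly as follows. This is only an illustration of the idea, not the component's actual script; the function name read_xml_text and the UTF-8 default are assumptions.

```python
import urllib.request


def read_xml_text(source: str, encoding: str = "utf-8") -> str:
    """Return the XML text at `source`, trying a local file first, then a URL.

    Hypothetical helper: the name and the encoding default are assumptions.
    """
    try:
        with open(source, "r", encoding=encoding) as handle:
            return handle.read()
    except OSError:
        # Not readable as a local file, so fall back to treating it as a URL.
        with urllib.request.urlopen(source) as response:
            return response.read().decode(encoding)
```

If neither attempt succeeds, the exception from the URL attempt is the one that surfaces, which is usually the more informative of the two.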
This component uses Python 3, so you must have Python 3 installed and available in your KNIME environment. It makes use of the following Python modules: cElementTree, pandas, urllib.
The XML data is output in grouped tabular format, which means that the rows should be ungrouped (use an Ungroup node). Data items that are expected to be repeated across all rows of a "group" should be excluded from the selection of columns to be ungrouped. That way, repeated data is "copied down" across the row items where appropriate.
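To illustrate what ungrouping does to one grouped row, here is a minimal plain-Python sketch. It assumes the grouped columns arrive as lists; the function name ungroup and the sample values are made up for illustration, and the real work in KNIME is done by the Ungroup node.

```python
from itertools import zip_longest


def ungroup(row: dict, list_columns: list) -> list:
    """Expand the list-valued columns of one grouped row into several rows."""
    expanded = []
    # Walk the selected list columns in parallel; shorter lists are padded with None.
    for values in zip_longest(*(row[name] for name in list_columns)):
        new_row = dict(row)  # columns not being ungrouped are copied down unchanged
        new_row.update(zip(list_columns, values))
        expanded.append(new_row)
    return expanded


# Hypothetical grouped row: one order with two items.
grouped = {
    "orderid": "889923",
    "title": ["Empire Burlesque", "Hide your heart"],
    "quantity": ["1", "1"],
}
rows = ungroup(grouped, ["title", "quantity"])
```

Here orderid is left out of the ungrouped columns, so its value is repeated on both resulting rows.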
The columns and their paths are output on the "Column Paths" port and on the "Path to Column Mapping" port. The Column Paths port is organised "by column name", so if there is a column-name clash (which can occur if more than one element in the XML has the same element name), the rows on this port will be incomplete, as will the resulting data output.
The "Path to Column Mapping" port shows the same information, but is "path centric" and so will contain any columns for which "name clash" has occurred.
The "Column Name Clash" port identifies clashing names. It returns no data if no name clash occurred, so it can be used to quickly verify that all expected columns have been handled correctly.
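The idea behind a name clash can be sketched as follows. The sketch assumes that the default column name is the last segment of the path, which is an assumption about the component rather than something stated here; the function name and sample paths are hypothetical.

```python
from collections import defaultdict


def find_name_clashes(paths: list) -> dict:
    """Map each default column name to its paths, keeping only names with more than one path."""
    by_name = defaultdict(list)
    for path in paths:
        # Assumption: the default column name is the last path segment, without any '@'.
        name = path.rsplit("/", 1)[-1].lstrip("@")
        by_name[name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}


# Hypothetical paths: two different elements both called "name".
clashes = find_name_clashes([
    "//shiporder/orderperson",
    "//shiporder/shipto/name",
    "//shiporder/item/name",
])
```

A "by column name" view can hold only one of the two name paths, whereas a "path centric" view keeps both, which is why the two ports differ when a clash occurs.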
The name of a CSV "Column Name to Path" mapping file can be supplied, which allows you to specify which elements/columns to return, based on their paths. If you specify a different column name here, the column is renamed on the output.
Paths follow a basic "pseudo xpath" format. No additional xpath syntax should be used as it will not be recognised, and will result in data in the file being ignored.
Element paths are defined by the format //element1/element2/element3
Attribute paths are defined by the format //element1/element2/element3/@attributename
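As a rough illustration of this path format, the following sketch walks a small XML document with the standard xml.etree.ElementTree module and lists element and attribute paths in the style shown above. The function name list_paths and the sample XML are made up, namespaces are ignored, and the component itself may report paths differently (for example depending on the chosen root or collection subtree), so the Column Paths port remains the authority.

```python
import xml.etree.ElementTree as ET


def list_paths(xml_text: str) -> list:
    """Return the distinct element and attribute paths in `xml_text`, in document order."""
    paths = []

    def visit(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        # Record the element path, then one '/@name' path per attribute.
        for candidate in [path] + [f"{path}/@{attr}" for attr in element.attrib]:
            if candidate not in paths:
                paths.append(candidate)
        for child in element:
            visit(child, path)

    # Starting from "/" makes the root element's path begin with "//".
    visit(ET.fromstring(xml_text), "/")
    return paths


# Hypothetical sample document.
sample = (
    '<shiporder orderid="889923">'
    "<orderperson>John Smith</orderperson>"
    "<item><title>Empire Burlesque</title></item>"
    "</shiporder>"
)
found = list_paths(sample)
```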
Rows in the Column Name - Path mapping table can be "commented out". To do this, all that is necessary is that the path be "invalidated", which can easily be achieved by, for example, adding a '#' to the end of the line.
For example, in the following mapping the paths for the * and orderperson lines have been "invalidated", so they are ignored:
Column Name,Path
*,*#
Order Id,//shiporder/@orderid
orderperson,//shiporder/orderperson#
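One way to picture how such a mapping might be filtered is the sketch below. It is not the component's code: the function name load_mapping and the validity pattern are assumptions, including the guess that a bare * is accepted as a wildcard path.

```python
import csv
import io
import re

# Assumption: a usable path is either the '*' wildcard or '//' followed by
# slash-separated names, optionally ending in '/@attribute'.
VALID_PATH = re.compile(r"^(\*|//[\w.\-]+(/[\w.\-]+)*(/@[\w.\-]+)?)$")


def load_mapping(csv_text: str) -> dict:
    """Return {path: column name} for the rows whose path is still valid."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = (row["Path"] or "").strip()
        if VALID_PATH.match(path):  # a trailing '#' makes the path fail this check
            mapping[path] = row["Column Name"].strip()
    return mapping


csv_text = (
    "Column Name,Path\n"
    "*,*#\n"
    "Order Id,//shiporder/@orderid\n"
    "orderperson,//shiporder/orderperson#\n"
)
mapping = load_mapping(csv_text)
```

With the example mapping above, only the Order Id row survives, because the other two paths end in '#'.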
The path will change if you specify a different collection subtree, and/or root. If you are having difficulty working out the correct path, execute the node and take a look in the Column Paths output port to see what the paths are with the current configuration.
v 1.0 (Prototype) @takbb Brian Bates
This is a prototype, but it is fully functioning and may well be suitable for your needs. If you wish to use it, please test it with your data to see that it works well for you before relying on it!
Please provide feedback on any issues found, or any suggestions for improvement, or usability.