Extract Data from Bank Statements (PDF) into JSON files with the help of Ollama / Llama3 LLM
- list the PDFs or other documents (CSV, TXT, LOG) on your drive that share a roughly similar layout and from which you expect an LLM to be able to extract data
- formulate a concise prompt (and instruction) and try to force the LLM to return JSON that always has the same structure (Mistral seems to be very good at that)
- convert each single document into a vector store, either Chroma or Meta's FAISS, with the help of Ollama and a suitable embedding model (mxbai-embed-large)
- use the Ollama wrapper (via Python and a KNIME node) to put the document and the query before the LLM
- collect the data from Python back into KNIME
- extract the data from the JSON files, either with the help of regex or by converting the JSON with KNIME nodes
- make sure all results have the same structure
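The last two steps (getting the JSON out of the model's answer and making sure every result has the same structure) can be sketched in plain Python. This is only an illustration, not the code used in the workflow: the field names in EXPECTED_FIELDS and the sample answer are made up, and the helper names extract_json and normalize are hypothetical.

```python
import json
import re

# Hypothetical field names, chosen for illustration only.
EXPECTED_FIELDS = ("account_holder", "iban", "statement_date", "closing_balance")


def extract_json(response_text):
    """Return the JSON object embedded in an LLM answer, or None if there is none."""
    # Models often wrap the JSON in extra prose, so take everything
    # from the first "{" to the last "}".
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def normalize(record, fields=EXPECTED_FIELDS):
    """Force a record into a fixed key set; missing values become None."""
    record = record or {}
    return {name: record.get(name) for name in fields}


# Made-up model answer with chatter around the JSON.
sample = (
    "Sure, here is the data:\n"
    '{"iban": "DE00 0000", "closing_balance": 12.5}\n'
    "Let me know if you need anything else."
)
row = normalize(extract_json(sample))
print(row)
```

Because every row then carries the same keys, the results of many documents can be stacked into one table in KNIME without further alignment. Extra keys the model invents are silently dropped by normalize, which may or may not be what you want.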
=> You need a Python environment and Ollama installed, the models pulled locally, and Ollama running!
If you experience problems with the model download: check your proxy settings, then kill all running Ollama jobs in your task manager and try again.
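Whether Ollama is running and the models are pulled can be checked from Python before the workflow starts. The sketch below assumes Ollama listens on its default local address (http://localhost:11434) and uses its /api/tags endpoint, which lists the locally available models; the helper names fetch_local_models and missing_models are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address
REQUIRED = ("llama3:instruct", "mxbai-embed-large")


def fetch_local_models(base_url=OLLAMA_URL, timeout=5):
    """Return the /api/tags payload, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def missing_models(tags_payload, required=REQUIRED):
    """List the required models that do not appear in the /api/tags payload."""
    names = {m.get("name", "") for m in tags_payload.get("models", [])}
    # Ollama reports untagged pulls with a ":latest" suffix.
    return [r for r in required if r not in names and f"{r}:latest" not in names]


if __name__ == "__main__":
    payload = fetch_local_models()
    if payload is None:
        print("Ollama does not seem to be running on", OLLAMA_URL)
    else:
        print("Models still to pull:", missing_models(payload) or "none")
```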
------
Run these commands in a terminal window to start Ollama. You can also try other models (https://ollama.com), or just pull the model without running it:
ollama pull llama3:instruct
ollama run llama3:instruct
To get the embedding model, run this command in the terminal window:
ollama pull mxbai-embed-large
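Once both models are pulled, they can also be reached from Python through Ollama's local REST API. The following is a minimal sketch and not the wrapper used in the workflow: it assumes the default address http://localhost:11434, uses the /api/generate endpoint with "format": "json" to push the model towards valid JSON, and uses /api/embeddings for the embedding model. The prompt wording, the sample text and the helper names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_generate_payload(document_text, model="llama3:instruct"):
    """Assemble a non-streaming /api/generate request that asks for JSON output."""
    # Illustrative prompt; the fields asked for are hypothetical.
    prompt = (
        "Extract the account holder, IBAN, statement date and closing balance "
        "from the bank statement below. Answer with a single JSON object only.\n\n"
        + document_text
    )
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def post_json(path, payload, base_url=OLLAMA_URL, timeout=300):
    """POST a JSON body to the local Ollama server and decode the JSON reply."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Needs a running Ollama server with both models pulled.
    text = "ACME Bank, closing balance 1,234.56 EUR"  # made-up sample text
    try:
        answer = post_json("/api/generate", build_generate_payload(text))
        print(answer["response"])  # the model's answer as a JSON string
        vector = post_json(
            "/api/embeddings", {"model": "mxbai-embed-large", "prompt": text}
        )
        print(len(vector["embedding"]))  # dimensionality of the embedding
    except OSError as err:
        print("Could not reach Ollama:", err)
```

Setting "stream" to False returns the whole answer in one reply, which is easier to hand back to KNIME than a stream of partial tokens.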
Ollama and Llama3 - A Streamlit App to convert your files into local Vector Stores and chat with them using the latest LLMs
https://medium.com/p/c5340fcd6ad0
Medium - Chat with local Llama3 Model via Ollama in KNIME Analytics Platform - Also extract Logs into structured JSON Files
https://medium.com/p/aca61e4a690a