Datalchemy
Project: Automatic classification of unstructured documents
Directed Acyclic Graph (DAG)
LLM
LLM Grammar
React
SharePoint
VINCI Energies
Energy
AI prototype development
Context
Within its multi-company organization, VINCI Energies uses SharePoint to centralize all documents relating to a given project (e-mails, invoices, reports, etc.).
While some files are imported automatically from the accounting software, others have to be added manually by the people responsible for the files, which slows down the process.
Need
Automatically suggest, via an AI solution, the appropriate SharePoint folder number and location for each document to speed up filing.
Start with a proof of concept (POC) before considering large-scale industrialization.
Completed work
Data architecture:
implementation of a minimal Directed Acyclic Graph (DAG) to generate various representations and statistical analyses of data.
Existing baseline:
integration and evaluation of the classification solution initially developed by VINCI Energies, studying how its behavior changes as the data varies.
Iterative business modeling:
progressive definition of the final schema of information to be extracted (customer IDs, dates, addresses, phone numbers, etc.).
LLM pipelines & “grammars”:
creation of transformation workflows based on Large Language Models, designed to give business meaning to unstructured documents.
Optimization via Optuna:
tuning of the weights assigned to each type of extracted information to improve classification results.
Interactive demonstrator:
development of a simple interface to visualize and test classification results in real time.
Results
First correct suggestion:
76% of files are matched to the right folder by the first suggestion.
Top 5:
the right folder is among the top 5 suggestions for 93% of files.
Industrialization:
VINCI Energies teams are very satisfied with the project, which is now being industrialized.
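For readers unfamiliar with the two metrics above, the following sketch shows how first-suggestion and top-k accuracy are typically computed from ranked suggestions. The function name, folder identifiers, and toy data are hypothetical and only illustrate the calculation; the 76% and 93% figures come from the project's own evaluation, not from this example.

```python
def top_k_accuracy(ranked_suggestions, true_folders, k):
    """Share of documents whose true folder appears in the first k suggestions."""
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_suggestions, true_folders)
    )
    return hits / len(true_folders)


# Hypothetical ranked suggestions for four documents, best guess first.
ranked_suggestions = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["C", "A", "B"],
    ["A", "C", "B"],
]
# Folder actually chosen for each document ("D" is never suggested).
true_folders = ["A", "A", "B", "D"]

print(top_k_accuracy(ranked_suggestions, true_folders, k=1))  # 0.25
print(top_k_accuracy(ranked_suggestions, true_folders, k=3))  # 0.75
```

Top-k accuracy can only stay equal or rise as k grows, which is why the top 5 figure is higher than the first-suggestion figure.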