Context

  • Within its multi-company organization, VINCI Energies uses SharePoint to centralize all documents relating to a given project (e-mails, invoices, reports, etc.).
  • While some files are imported automatically from the accounting software, others have to be added manually by the people responsible for the files, which slows down the process.

Need

  • Automatically suggest, via an AI solution, the appropriate SharePoint folder number and location for each document to speed up filing.
  • Start with a proof of concept (POC) before considering large-scale industrialization.

Completed work

  • Data architecture: implementation of a minimal directed acyclic graph (DAG) to generate various representations and statistical analyses of the data.
  • Existing baseline: integration and evaluation of the classification solution initially developed by VINCI Energies, studying how its behavior changes as the data varies.
  • Iterative business modeling: progressive definition of the final schema of information to be extracted (customer IDs, dates, addresses, telephone numbers, etc.).
  • LLM pipelines & “grammars”: creation of transformation workflows based on Large Language Models, designed to give business meaning to unstructured documents.
  • Optimization via Optuna: tuning of the weights assigned to each extracted field to improve classification results.
  • Interactive demonstrator: development of a simple interface to visualize and test classification results in real time.
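
The weighting step above can be illustrated with a minimal sketch. It assumes that each candidate folder is scored by summing the weights of the extracted fields that match the folder's metadata, and that the weights are tuned to maximize first-suggestion accuracy. The field names, folder numbers, scoring rule, and toy data are all hypothetical. The project used Optuna for the search; a plain random search stands in for it here so that the sketch needs only the standard library.

```python
import random

# Hypothetical extracted fields (the real schema is not given in this document).
FIELDS = ["customer_id", "date", "address", "phone"]

# Hypothetical folder metadata, keyed by an invented folder number.
folders = {
    "F-001": {"customer_id": "C1", "address": "Lyon", "phone": "0101"},
    "F-002": {"customer_id": "C2", "address": "Lyon", "phone": "0202"},
    "F-003": {"customer_id": "C3", "address": "Paris", "phone": "0101"},
}

# Toy labelled documents: (fields extracted from the document, true folder).
samples = [
    ({"customer_id": "C1", "address": "Paris", "phone": "0202"}, "F-001"),
    ({"customer_id": "C2", "address": "Lyon"}, "F-002"),
    ({"customer_id": "C3", "address": "Lyon", "phone": "0101"}, "F-003"),
]


def score(doc, folder, weights):
    """Sum the weights of the fields whose extracted value matches the folder's."""
    return sum(
        weights[f]
        for f in FIELDS
        if doc.get(f) is not None and doc.get(f) == folder.get(f)
    )


def rank_folders(doc, folders, weights):
    """Return folder numbers sorted from highest to lowest score."""
    return sorted(
        folders,
        key=lambda fid: score(doc, folders[fid], weights),
        reverse=True,
    )


def top1_accuracy(samples, folders, weights):
    """Fraction of documents whose first suggestion is the true folder."""
    hits = sum(rank_folders(doc, folders, weights)[0] == truth for doc, truth in samples)
    return hits / len(samples)


def random_search(samples, folders, n_trials=200, seed=0):
    """Stand-in for the Optuna study: keep the best of n_trials random weightings."""
    rng = random.Random(seed)
    best_weights, best_acc = None, -1.0
    for _ in range(n_trials):
        weights = {f: rng.random() for f in FIELDS}
        acc = top1_accuracy(samples, folders, weights)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc
```

Calling `random_search(samples, folders)` returns the best weighting found and its first-suggestion accuracy on the toy data. In a real setting the accuracy would be measured on held-out documents rather than on the ones used for tuning.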

Results

  • First suggestion correct: 76% of files are automatically matched to the right folder on the first suggestion.
  • Top 5: the right folder is among the top 5 suggestions for 93% of files.
  • Industrialization: VINCI Energies teams are very satisfied with the project, which is now being industrialized.
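
The two percentages above read as top-1 and top-5 accuracy. The sketch below shows one way such figures could be computed, assuming each document comes with a ranked list of suggested folders and one known correct folder. The function name, folder numbers, and toy data are hypothetical and are not taken from the project.

```python
def top_k_accuracy(ranked_suggestions, true_folders, k):
    """Fraction of documents whose true folder is among the first k suggestions."""
    if len(ranked_suggestions) != len(true_folders):
        raise ValueError("one ranked list is required per document")
    if not true_folders:
        raise ValueError("at least one document is required")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_suggestions, true_folders)
    )
    return hits / len(true_folders)


# Toy data: four documents with invented folder numbers.
ranked = [
    ["F-10", "F-11", "F-12"],  # correct folder ranked first
    ["F-20", "F-21"],          # correct folder ranked first
    ["F-31", "F-30", "F-32"],  # correct folder ranked second
    ["F-40", "F-41"],          # correct folder never suggested
]
truth = ["F-10", "F-20", "F-30", "F-99"]

print(top_k_accuracy(ranked, truth, k=1))  # 0.5
print(top_k_accuracy(ranked, truth, k=5))  # 0.75
```

Top-5 accuracy is never lower than top-1 accuracy, because every first suggestion that is correct also counts as a top-5 hit.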