{"id":4540,"date":"2025-05-16T17:17:19","date_gmt":"2025-05-16T15:17:19","guid":{"rendered":"https:\/\/datalchemy.net\/nos-prestations\/automatic-classification-of-unstructured-documents\/"},"modified":"2025-08-19T10:18:41","modified_gmt":"2025-08-19T08:18:41","slug":"automatic-classification-of-unstructured-documents","status":"publish","type":"projet","link":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/","title":{"rendered":"Automatic classification of unstructured documents"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>Context<\/strong><\/h2>\n\n<ul class=\"wp-block-list\">\n<li>Within its multi-company organization, VINCI Energies uses SharePoint to centralize all documents relating to a given project (e-mails, invoices, reports, etc.).<\/li>\n\n\n\n<li>While some files are imported automatically from the accounting software, others have to be added manually by the people responsible for the files, which slows down the process.<\/li>\n<\/ul>\n\n<h2 class=\"wp-block-heading\"><strong>Need<\/strong><\/h2>\n\n<ul class=\"wp-block-list\">\n<li>Automatically suggest, via an AI solution, the appropriate SharePoint folder number and location for each document to speed up filing.<\/li>\n\n\n\n<li>First launch a proof of concept (POC) before considering large-scale industrialization.<\/li>\n<\/ul>\n\n<h2 class=\"wp-block-heading\"><strong>Completed work<\/strong><\/h2>\n\n<ul class=\"wp-block-list\">\n<li><strong>Data architecture:<\/strong> implementation of a minimal Directed Acyclic Graph (DAG) to generate various representations and statistical analyses of data.<\/li>\n\n\n\n<li><strong>Existing baseline:<\/strong> integration and evaluation of the classification solution initially developed by VINCI Energies, by studying its behavior according to data variations.<\/li>\n\n\n\n<li><strong>Iterative business modeling:<\/strong> progressive definition of the final schema of information to be 
extracted (customer IDs, dates, addresses, phone numbers, etc.).<\/li>\n\n\n\n<li><strong>LLM pipelines &amp; &#8220;grammars&#8221;:<\/strong> creation of transformation workflows based on Large Language Models, designed to give business meaning to unstructured documents.<\/li>\n\n\n\n<li><strong>Optimization via Optuna: <\/strong>optimization of the weights assigned to the various extracted fields to improve classification results.<\/li>\n\n\n\n<li><strong>Interactive demonstrator:<\/strong> development of a simple interface to visualize and test classification results in real time.<\/li>\n<\/ul>\n\n<h2 class=\"wp-block-heading\"><strong>Results<\/strong><\/h2>\n\n<ul class=\"wp-block-list\">\n<li><strong>First correct suggestion:<\/strong> 76% of files are automatically attached to the right folder on the first suggestion.<\/li>\n\n\n\n<li><strong>Top 5:<\/strong> the right folder is among the top 5 suggestions for 93% of files.<\/li>\n\n\n\n<li><strong>Industrialization: <\/strong>VINCI Energies teams are very satisfied with the project, which is now being industrialized.<\/li>\n<\/ul>\n\n<p><\/p>\n","protected":false},"featured_media":3919,"template":"","meta":{"_acf_changed":false,"inline_featured_image":false},"tags":[180,194,215,181,220],"class_list":["post-4540","projet","type-projet","status-publish","has-post-thumbnail","hentry","tag-directed-acyclic-graph-dag-en","tag-llm-en","tag-llm-grammar-en","tag-react-en","tag-sharepoint-en"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Automatic classification of unstructured documents - Datalchemy<\/title>\n<meta name=\"description\" content=\"Development of a tool to automatically determine which folder an incoming document belongs to.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automatic classification of unstructured documents - Datalchemy\" \/>\n<meta property=\"og:description\" content=\"Development of a tool to automatically determine which folder an incoming document belongs to.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/\" \/>\n<meta property=\"og:site_name\" content=\"Datalchemy\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-19T08:18:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/\",\"url\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/\",\"name\":\"Automatic classification of unstructured documents - Datalchemy\",\"isPartOf\":{\"@id\":\"https:\/\/datalchemy.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png\",\"datePublished\":\"2025-05-16T15:17:19+00:00\",\"dateModified\":\"2025-08-19T08:18:41+00:00\",\"description\":\"Development of a tool to automatically determine which folder an incoming document belongs to.\",\"breadcrumb\":{\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage\",\"url\":\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png\",\"contentUrl\":\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png\",\"width\":1024,\"height\":1024,\"caption\":\"Classification automatique de 
documents\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/datalchemy.net\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Projects\",\"item\":\"https:\/\/datalchemy.net\/en\/our-services\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Automatic classification of unstructured documents\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/datalchemy.net\/en\/#website\",\"url\":\"https:\/\/datalchemy.net\/en\/\",\"name\":\"Datalchemy\",\"description\":\"Expertise, accompagnement  et R&amp;D en data et IA\",\"publisher\":{\"@id\":\"https:\/\/datalchemy.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/datalchemy.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/datalchemy.net\/en\/#organization\",\"name\":\"Datalchemy\",\"url\":\"https:\/\/datalchemy.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/datalchemy.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/08\/logo-datalchemy.gif\",\"contentUrl\":\"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/08\/logo-datalchemy.gif\",\"width\":696,\"height\":696,\"caption\":\"Datalchemy\"},\"image\":{\"@id\":\"https:\/\/datalchemy.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.linkedin.com\/company\/sas-datalchemy\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Automatic classification of unstructured documents - Datalchemy","description":"Development of a tool to automatically determine which folder an incoming document belongs to.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/","og_locale":"en_US","og_type":"article","og_title":"Automatic classification of unstructured documents - Datalchemy","og_description":"Development of a tool to automatically determine which folder an incoming document belongs to.","og_url":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/","og_site_name":"Datalchemy","article_modified_time":"2025-08-19T08:18:41+00:00","og_image":[{"width":1024,"height":1024,"url":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/","url":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/","name":"Automatic classification of unstructured documents - Datalchemy","isPartOf":{"@id":"https:\/\/datalchemy.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage"},"image":{"@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage"},"thumbnailUrl":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png","datePublished":"2025-05-16T15:17:19+00:00","dateModified":"2025-08-19T08:18:41+00:00","description":"Development of a tool to automatically determine which folder an incoming document belongs to.","breadcrumb":{"@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#primaryimage","url":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png","contentUrl":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/05\/projet-vinci-energies.png","width":1024,"height":1024,"caption":"Classification automatique de 
documents"},{"@type":"BreadcrumbList","@id":"https:\/\/datalchemy.net\/en\/our-services\/automatic-classification-of-unstructured-documents\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/datalchemy.net\/en\/"},{"@type":"ListItem","position":2,"name":"Projects","item":"https:\/\/datalchemy.net\/en\/our-services\/"},{"@type":"ListItem","position":3,"name":"Automatic classification of unstructured documents"}]},{"@type":"WebSite","@id":"https:\/\/datalchemy.net\/en\/#website","url":"https:\/\/datalchemy.net\/en\/","name":"Datalchemy","description":"Expertise, accompagnement  et R&amp;D en data et IA","publisher":{"@id":"https:\/\/datalchemy.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datalchemy.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datalchemy.net\/en\/#organization","name":"Datalchemy","url":"https:\/\/datalchemy.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datalchemy.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/08\/logo-datalchemy.gif","contentUrl":"https:\/\/datalchemy.net\/wp-content\/uploads\/2025\/08\/logo-datalchemy.gif","width":696,"height":696,"caption":"Datalchemy"},"image":{"@id":"https:\/\/datalchemy.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/sas-datalchemy\/"]}]}},"_links":{"self":[{"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/projet\/4540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/projet"}],"about":[{"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/types\/projet"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/me
dia\/3919"}],"wp:attachment":[{"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/media?parent=4540"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datalchemy.net\/en\/wp-json\/wp\/v2\/tags?post=4540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}