Context
The Direction de l’information légale et administrative (DILA), a directorate attached to the French Prime Minister’s office and publisher of Legifrance, is responsible for the daily publication and consolidation of all French legislative texts. To lighten the manual workload of its legal experts, the project set out to automate this legal consolidation through a proof of concept (POC) that draws on the initial, modifying and consolidated texts, which are spread across three major databases.
The schedule was very tight: four months for the entire project.
Work performed
The POC was divided into six parts:
- Data mapping: inventory at different levels (text, article, paragraph).
- Dataset constitution: generation of differentials, annotation and structuring of pairs (modifying item / original / consolidated).
- Data architecture: implementation of versioning, development of statistical monitoring and visualization tools.
- Flow automation: definition and coding of actions, development of a consolidation automaton.
- Parsing networks: design and training of models for micro- and macro-parsing, evaluation of modification localization and textual refinement, then final integration into the test tool.
- Testing: automatic tests, plus manual review of the results by an experienced legal expert.
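To make the "generation of differentials" step concrete, here is a minimal sketch of how a word-level differential between an original and a consolidated version could be computed. It is not the POC's actual code: the function name `word_diff` and the sample sentences are invented for illustration, and the standard-library `difflib` module stands in for whatever diffing tool the project used.

```python
import difflib


def word_diff(original: str, consolidated: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_text, new_text) for each changed span of words.

    Hypothetical helper: the operation is one of difflib's opcode tags
    ("replace", "delete" or "insert").
    """
    old_words = original.split()
    new_words = consolidated.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans carry no differential
        changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Invented example texts, not taken from any real legal database.
original = "The fee is set at ten euros per request."
consolidated = "The fee is set at fifteen euros per request."

changes = word_diff(original, consolidated)
print(changes)  # [('replace', 'ten', 'fifteen')]
```

Each differential obtained this way could then be paired with the modifying text that caused it, giving one (modifying item / original / consolidated) triple per change.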
Results
The POC consolidates the common cases efficiently (word replacements, paragraph additions and deletions, article creations, etc.), with parsing performing particularly well, and its modular architecture can be tested on the entire dataset. This structure supports controlled industrialization: each component can evolve independently, the technologies used (Python, PyTorch, Flask…) are open source and proven, and restricting deep learning to parsing limits the risk of regression. The tool is therefore ready to be turned into an operational solution for DILA.
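As an illustration of the common cases mentioned above, the sketch below shows one way a consolidation automaton could apply a sequence of actions to an article represented as a list of paragraphs. Everything in it is an assumption made for the example: the `Action` class, its field names, the three action kinds and the sample article are hypothetical and do not describe the POC's real data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical consolidation action produced by the parsing step."""
    kind: str        # "replace_words", "insert_paragraph" or "delete_paragraph"
    paragraph: int   # 0-based index of the targeted paragraph
    old: str = ""    # text to replace (replace_words only)
    new: str = ""    # replacement or inserted text


def apply_action(paragraphs: list[str], action: Action) -> list[str]:
    """Return a new paragraph list with one action applied."""
    result = list(paragraphs)  # never mutate the caller's version
    if action.kind == "replace_words":
        if action.old not in result[action.paragraph]:
            raise ValueError(f"text to replace not found: {action.old!r}")
        result[action.paragraph] = result[action.paragraph].replace(
            action.old, action.new, 1
        )
    elif action.kind == "insert_paragraph":
        result.insert(action.paragraph, action.new)
    elif action.kind == "delete_paragraph":
        del result[action.paragraph]
    else:
        raise ValueError(f"unknown action kind: {action.kind!r}")
    return result


def consolidate(paragraphs: list[str], actions: list[Action]) -> list[str]:
    """Apply the actions in order and return the consolidated paragraphs."""
    for action in actions:
        paragraphs = apply_action(paragraphs, action)
    return paragraphs


# Invented article and actions, for illustration only.
article = ["The fee is set at ten euros.", "It is payable in advance."]
actions = [
    Action("replace_words", 0, old="ten", new="fifteen"),
    Action("insert_paragraph", 2, new="This article applies from 1 January."),
    Action("delete_paragraph", 1),
]

consolidated = consolidate(article, actions)
print(consolidated)
# ['The fee is set at fifteen euros.', 'This article applies from 1 January.']
```

Keeping each action as a small, explicit record means the rule-based application step can be tested on its own, independently of the models that produce the actions.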