Datalchemy
Project: Data architecture, monitoring and dashboards
Argo Workflows
DVC
FastAPI
Kubernetes
PostgreSQL
Python
S3
School 42
Teaching
Architecture and data analysis
Context
Data audits carried out upstream by Datalchemy validated the feasibility of exploiting École 42 data for analysis, reporting and, ultimately, machine learning.
The first step is to set up a data architecture that extracts a clean, controlled subset of production data without impacting live systems, and that builds up a usable history.
The top-priority need identified is an overhaul of the campus dashboards, replacing the existing Google reports, with an ambitious deadline of February 2024.
Issues
Very large volumes (data collected on all campuses worldwide).
Limited history: backups overwrite previous data.
Lack of validation and control over archived data.
Project objectives
Design an automated architecture to extract data from multiple sources.
Propose a simple system for defining and launching new extractions.
Track the evolution of data over time (versioning, time stamping).
Implement control and validation mechanisms for extracted elements.
Make it easier to report and handle errors detected during pipeline runs.
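To illustrate the objectives above, here is a minimal sketch of what a "simple system for defining and launching new extractions" could look like: a declarative definition, a timestamped run, and validation that collects errors instead of stopping at the first one. Every name in it (ExtractionSpec, ExtractionResult, run_extraction, the fetch callable) is hypothetical and chosen for illustration; the actual project implementation is not described in this page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Hypothetical names throughout: this is an illustrative sketch,
# not the code used in the project.
Row = dict[str, object]


@dataclass(frozen=True)
class ExtractionSpec:
    """Declarative description of one extraction."""
    name: str                          # identifier of the extraction
    source: str                        # e.g. a table or endpoint name
    required_columns: tuple[str, ...]  # columns that must be non-null


@dataclass
class ExtractionResult:
    """Outcome of one run: accepted rows plus collected errors."""
    spec_name: str
    extracted_at: str                  # UTC timestamp, ISO 8601
    valid_rows: list[Row] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def run_extraction(
    spec: ExtractionSpec,
    fetch: Callable[[str], list[Row]],
) -> ExtractionResult:
    """Fetch rows for `spec`, timestamp the run, and validate each row."""
    result = ExtractionResult(
        spec_name=spec.name,
        extracted_at=datetime.now(timezone.utc).isoformat(),
    )
    for index, row in enumerate(fetch(spec.source)):
        missing = [c for c in spec.required_columns if row.get(c) is None]
        if missing:
            # Keep going: errors are reported together at the end of the run.
            result.errors.append(f"row {index}: missing {', '.join(missing)}")
        else:
            result.valid_rows.append(row)
    return result
```

In this sketch, adding a new extraction only means declaring a new ExtractionSpec; the fetch callable is injected so that the same validation logic can run against a production replica, a dump, or test data.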
Completed work
Setting up dedicated S3 buckets:
for dumps, with automatic file rotation,
for workflow data.
Creating DVC repositories in production (with support for alternative branches) for dataset versioning.
Deploying a PostgreSQL database dedicated to monitoring extraction workflows.
Provisioning a Kubernetes cluster with:
an isolated namespace,
pipeline orchestration via Argo Workflows and cron.
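As an illustration of the automatic file rotation mentioned for the dump bucket, the sketch below selects which dumps fall outside a "keep the N most recent" policy. This is an assumption about the policy: the page does not say how rotation is configured, and in practice it may well be delegated to the storage layer rather than written by hand. The function name dumps_to_delete and the keep_last parameter are hypothetical.

```python
from datetime import datetime


def dumps_to_delete(dumps: dict[str, datetime], keep_last: int) -> list[str]:
    """Return the keys of dumps that fall outside the retention window.

    `dumps` maps an object key to its creation time. The `keep_last`
    most recent dumps are retained; all older ones are returned,
    sorted by key, as candidates for deletion.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    newest_first = sorted(dumps, key=lambda key: dumps[key], reverse=True)
    return sorted(newest_first[keep_last:])
```

Keeping the selection logic as a pure function, separate from any storage call, makes the retention rule easy to test before it is allowed to delete anything.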
Results
Operational infrastructure ready to feed new campus dashboards.
Reproducible and extensible pipeline for other use cases (reporting, ML).
Guaranteed data traceability and control thanks to DVC versioning and workflow logs.
A solid foundation for later deployment of advanced analysis and predictive models.