Automated Model Training Pipeline v2.0

About the project:

Originally, our data scientists had to work through a laborious, multi-step process before they could begin actual model training. First, they needed to create features from relatively large data sets for specific time windows. Our engineering team helped implement a process for this task using EMR on EKS. Once the features were prepared, the data scientists then had to train various models and select the best one for deployment.

Through experimentation, I noticed that certain model architectures consistently performed well on our data and generalized well across different scenarios, so I defined a set of "champion" model architectures based on those results. Recognizing how tedious and inefficient the existing process was, I decided to automate it.

The automated service triggers the data/feature creation process. Once the Metaflow DAG completes and the data has been prepared, an experiment is automatically started in MLflow. The experiment launches multiple model training runs, each with a distinct model configuration based on the champion architectures identified earlier. Although there is no frontend at this time, the provided logs illustrate the operations of the service.

The result is a significantly faster process: candidate models are available for review and deployment within hours instead of days or weeks, which greatly improves our productivity in training and deploying machine learning models.
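
The flow described above (prepare features, then fan out one training run per champion configuration, then rank the candidates) can be sketched roughly as follows. This is a minimal illustration and not the actual service code: `ChampionConfig`, `RunResult`, `run_pipeline`, and the two stub functions are hypothetical names, and the stubs only stand in for the Metaflow feature DAG and the MLflow-tracked training runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ChampionConfig:
    """One 'champion' architecture and its settings (hypothetical structure)."""
    name: str
    params: Dict[str, Any]


@dataclass
class RunResult:
    """Outcome of a single training run."""
    config: ChampionConfig
    metric: float


def run_pipeline(
    build_features: Callable[[], Any],
    train_and_score: Callable[[Any, ChampionConfig], float],
    configs: List[ChampionConfig],
) -> List[RunResult]:
    """Prepare features once, train one model per config, rank by metric."""
    # Step 1: feature creation (in the real service, the Metaflow DAG).
    features = build_features()
    # Step 2: one training run per champion configuration
    # (in the real service, each run would be tracked in MLflow).
    results = [RunResult(cfg, train_and_score(features, cfg)) for cfg in configs]
    # Step 3: best candidates first, ready for review.
    return sorted(results, key=lambda r: r.metric, reverse=True)


# --- Stand-ins for illustration only; these are assumptions, not real jobs ---
def stub_build_features() -> List[float]:
    return [1.0, 2.0, 3.0, 4.0]


def stub_train_and_score(features: List[float], cfg: ChampionConfig) -> float:
    # Toy score so the example runs without any ML dependencies.
    depth = cfg.params["depth"]
    return depth / (depth + len(features))


if __name__ == "__main__":
    champions = [
        ChampionConfig("shallow", {"depth": 2}),
        ChampionConfig("medium", {"depth": 4}),
        ChampionConfig("deep", {"depth": 8}),
    ]
    for result in run_pipeline(stub_build_features, stub_train_and_score, champions):
        print(f"{result.config.name}: {result.metric:.3f}")
```

Passing the feature and training steps in as callables keeps the orchestration logic independent of the tools behind it, so the same loop can be exercised locally with stubs before being pointed at the real jobs.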


Technology used:

Python, Kubernetes API, MLflow, Metaflow, AWS (EMR on EKS, S3)