This post describes how to set up Continuous Delivery for Machine Learning (CD4ML) on a local laptop.
Relevant technologies:
MLflow
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, and deployment. The following docker-compose file starts three services:
- MinIO to simulate S3 artifact storage
- MySQL to store MLflow tracking data
- MLflow itself, serving both the tracking server and the UI
#docker-compose.yml to deploy MLflow
version: "3.7"
services:
  s3:
    image: minio/minio
    container_name: s3
    volumes:
      - ./buckets:/data:consistent
    expose:
      - "9000"
    ports:
      - "9000:9000"
    environment:
      - MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
      - MINIO_SECRET_KEY=${MINIO_SECRET_KEY}
    command: minio server /data
    networks:
      - mlflow
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
  db:
    image: mysql
    container_name: db
    command: --innodb_use_native_aio=0
    networks:
      - mlflow
    environment:
      - MYSQL_DATABASE=${MYSQL_DATABASE}
      - MYSQL_USER=${MYSQL_USER}
      - MYSQL_PASSWORD=${MYSQL_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD}
    # volumes:
    #   - ./dbdata:/var/lib/mysql
  mlflow:
    restart: always
    build: .
    image: mlflow_server
    container_name: mlflow_server
    ports:
      - "5000:5000"
    networks:
      - mlflow
    environment:
      - MLFLOW_S3_ENDPOINT_URL=http://s3:9000
      - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
      - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
    command: mlflow server --backend-store-uri mysql+pymysql://${MYSQL_USER}:${MYSQL_PASSWORD}@db:3306/${MYSQL_DATABASE} --default-artifact-root s3://mlflow/ --host 0.0.0.0
    depends_on:
      - db
networks:
  mlflow:
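The mlflow service is built from a local Dockerfile (build: .) that is not shown above; a minimal sketch, assuming the image only needs MLflow plus the MySQL and S3 client libraries:
#Dockerfile for mlflow_server (a sketch; the package list is an assumption)
FROM python:3.7
RUN pip install mlflow pymysql boto3

docker-compose reads the ${...} credentials from a .env file next to the compose file; the variable names come from the compose file above, and the values here are placeholders:
#.env (placeholder values)
MINIO_ACCESS_KEY=minio
MINIO_SECRET_KEY=minio123
MYSQL_DATABASE=mlflow
MYSQL_USER=mlflow
MYSQL_PASSWORD=mlflow
MYSQL_ROOT_PASSWORD=root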
You can log in to the mlflow_server container to run your training script, which logs parameters, metrics, and the trained model to the tracking server, as sketched below.
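A minimal training-script sketch, assuming a scikit-learn model on a toy dataset; the experiment name and hyperparameters are placeholders:
#train.py (a sketch; dataset, model, and experiment name are assumptions)
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# inside the mlflow_server container the S3 credentials and endpoint for
# artifact upload are already set via the compose environment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("demo")

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    model = LogisticRegression(max_iter=200)
    model.fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # the serialized model lands in the MinIO bucket behind --default-artifact-root
    mlflow.sklearn.log_model(model, "model")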
Seldon
Seldon provides out-of-the-box inference servers for libraries like scikit-learn and TensorFlow. Build and start the Seldon wrapper with:
docker build -t seldon-app .
docker run -p 5001:5000 -it seldon-app # map to host port 5001 since MLflow already uses 5000
#requirements.txt
scikit-learn
seldon-core
#model.py
import pickle
import sklearn  # ensures scikit-learn is importable when unpickling the model

class Model(object):
    def __init__(self):
        print("Initialising")
        # model.pkl is the artifact produced by the training script above
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def predict(self, X, features_names):
        print("Predict called")
        return self.model.predict(X)
#Dockerfile
FROM python:3.6
WORKDIR /app
COPY ./requirements.txt /app
RUN pip install -r requirements.txt
COPY . /app
EXPOSE 5000
ENV MODEL_NAME Model
ENV API_TYPE REST
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0
CMD exec seldon-core-microservice ${MODEL_NAME} ${API_TYPE} --service-type ${SERVICE_TYPE} --persistence ${PERSISTENCE}
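Once the container is up, you can smoke-test the REST endpoint. A sketch using the requests library; the input values are placeholders, the payload follows Seldon's ndarray convention, and the exact route can vary across seldon-core versions (older wrappers also expose /predict):
#test_predict.py (a sketch; input values are placeholders)
import requests

payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}
resp = requests.post(
    "http://localhost:5001/api/v1.0/predictions",  # host port mapped in docker run above
    json=payload,
)
print(resp.json())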
Argo CD
Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. Here it automates deploying model.pkl (wrapped in the Seldon image) to the cluster by syncing manifests from a Git repository; a sketch of an Application manifest follows the install commands.
#install on k8s
kubectl create ns argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
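A minimal Application manifest sketch; the repository URL, path, and destination namespace are placeholders for wherever your deployment manifests (e.g. a SeldonDeployment referencing the seldon-app image) live:
#application.yaml (a sketch; repoURL, path, and namespace are placeholders)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: seldon-model
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/ml-deployments.git
    targetRevision: HEAD
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: models
  syncPolicy:
    automated:
      prune: true
      selfHeal: true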
Roboflow
Roboflow is a tool that simplifies the data preparation and training process. It is free for small datasets; the official page is https://public.roboflow.ai.
Its features include:
- Convert VOC XML annotations to COCO JSON annotations (and vice versa)
- See if your labels are in-frame (and one-click correct them if they are not)
- Preprocess images: resizing, grayscale, auto-orientation, contrast adjustments
- Augment images to increase your training data: flip, rotate, brighten / darken, crop, shear, blur, and add random noise
- Generate annotation formats like TFRecords, CreateML, and Turi Create, as well as flat text files or Darknet formats for custom YOLOv3 implementations
- Version datasets and share them with your team
- Share datasets across your organization
- Easily use your data across models built in TensorFlow, PyTorch, fast.ai, Keras, and more
- Obtain and share public datasets