Quick Start Guide
Get your ML pipeline running on AWS in under 10 minutes
Prerequisites
Before you begin, ensure you have:
- Python 3.8+ installed on your system
- AWS CLI installed and configured with your credentials
- AWS Account with appropriate permissions for SageMaker, Glue, DynamoDB, Lambda
- Git access to the ModelKnife repository (contact admin for access)
⚠️ Important: Make sure your AWS user has permission to create IAM roles and S3 buckets and to deploy AWS services. See the Installation Guide for detailed permission requirements.
Step 1: Install ModelKnife
Install ModelKnife (see the Installation Guide for OS‑specific steps and options):
Install
pip install git+ssh://git@github.com/naoo-AI/modelknife.git
Verify your CLI is available:
mk --help
Step 2: Team Setup (First Time)
If you're the first person setting up ModelKnife for your team, run the initialization command:
Complete Team Setup
mk setup init
This command creates:
- IAM configuration files with standardized role naming using your account ID
- AWS IAM groups (mlknife-developers, mlknife-admin)
- IAM roles for SageMaker, Glue, Step Functions, and Lambda with consistent naming
- S3 buckets for artifacts and model storage
- EventBridge permissions for scheduling
It also adds you to the admin group.
✅ Team Members: If someone has already run mk setup init, you automatically use shared IAM roles by name - no configuration import needed! Just ask your admin to add you to the mlknife-developers group:
mk team add-user --user YOUR_USERNAME --group mlknife-developers
Details on admin vs developer setup are in Team Setup. Permission scope and least‑privilege roles are described in Installation ▸ AWS Permissions Setup.
Step 3: Verify Your Access
Check that everything is set up correctly:
Check Setup Status
mk setup status
mk team status
You should see your access level (Developer or Admin) and AWS configuration details.
Step 4: Create Your First Configuration
Create a complete ML configuration with both infrastructure services and ML pipelines. This demonstrates ModelKnife's dual approach: stable infrastructure (services) + iterative ML workflows (pipelines).
name: mk-iris-pipeline
author: your-name
description: Complete Iris example with services and pipelines

parameters:
  environment: dev
  base_path: ${env.BUCKET}/${env.PREFIX}
  data_path: ${parameters.base_path}

# Named shared configs referenced explicitly
executors:
  python_processor:
    type: sagemaker_processor
    class: sagemaker.sklearn.processing.SKLearnProcessor
    instance_type: ml.c5.xlarge
    framework_version: 0.23-1

# Services: Infrastructure for model serving (mk s deploy)
services:
  # Lambda function for Iris model inference
  iris-inference-lambda:
    type: lambda_function
    repository: ./lambda_functions
    configuration:
      function_name: iris-inference-${parameters.environment}
      runtime: python3.9
      entry_point: iris_inference.lambda_handler
      code_path: iris_inference.py
      build_layer: true
      timeout: 30
      memory_size: 256
      environment:
        MODEL_S3_PATH: ${parameters.data_path}/iris/model
      tags:
        service_type: model_inference
        model_type: iris_classifier

modules:
  prepare-iris:
    executor: ${executors.python_processor}
    repository: ./glue_jobs
    entry_point: iris_preprocess.py
    output_names: output_path
    description: "Glue job: generate Iris CSV in S3"
    job_parameters:
      output_path: ${parameters.data_path}/iris/features

  train-iris:
    executor: ${executors.python_processor}
    repository: ./sagemaker_scripts
    entry_point: iris_train.py
    input_names: features_path
    output_names: model_output_path
    job_parameters:
      features_path: ${parameters.data_path}/iris/features
      model_output_path: ${parameters.data_path}/iris/model
    description: "SageMaker: train a simple classifier on Iris"
    depends_on:
      - prepare-iris
No explicit role needed
Processors that run on SageMaker require an execution role, but you don't need to specify it in the YAML. ModelKnife automatically uses your account's default SageMaker role configured via mk setup init. If setup hasn't been run, ask your admin to run it or run it yourself with appropriate permissions.
What This Configuration Does
The configuration demonstrates ModelKnife's dual lifecycle approach:
Services (Infrastructure - Deploy Once)
- iris-inference-lambda: A Lambda function for real-time model inference. It loads the trained model from S3 and serves predictions via API calls.
Pipelines (ML Workflows - Iterate Frequently)
- prepare-iris (Glue): Runs as an AWS Glue job that generates an Iris CSV at ${parameters.data_path}/iris/features.
- train-iris (SageMaker): Reads the CSV from ${parameters.data_path}/iris/features, trains a multiclass logistic regression model, and saves iris_model.pkl to ${parameters.data_path}/iris/model.
This shows the complete ML lifecycle: the pipeline modules handle batch training, while the service provides real-time inference. ModelKnife uses your account's default roles configured via mk setup init, so no explicit role ARNs are required.
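For orientation, here is a minimal sketch of what iris_train.py could look like. The starter project ships the real script; the CLI argument names, the "species" label column, and the S3 handling below are assumptions for illustration only.

# iris_train.py - hypothetical sketch, not the shipped script.
# Assumes the job parameters arrive as CLI arguments holding S3 URIs, and that
# pandas (with s3fs), scikit-learn, and boto3 are available in the image.
import argparse
import pickle

import boto3
import pandas as pd
from sklearn.linear_model import LogisticRegression

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--features_path", required=True)      # e.g. s3://bucket/quickstart/iris/features
    parser.add_argument("--model_output_path", required=True)  # e.g. s3://bucket/quickstart/iris/model
    args = parser.parse_args()

    # Load the CSV written by the prepare-iris module (s3fs lets pandas read s3:// URIs).
    df = pd.read_csv(f"{args.features_path}/iris.csv")
    X = df.drop(columns=["species"])  # "species" as the label column is an assumption
    y = df["species"]

    # Multiclass logistic regression, as described above.
    model = LogisticRegression(max_iter=200).fit(X, y)

    # Upload the pickled model to the prefix the inference Lambda reads from.
    bucket, key_prefix = args.model_output_path[len("s3://"):].split("/", 1)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"{key_prefix}/iris_model.pkl",
        Body=pickle.dumps(model),
    )

if __name__ == "__main__":
    main()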
About the Iris Example
The Iris module is intentionally minimal: it downloads a small, well‑known dataset and writes it to S3. This lets you validate your environment (roles, S3 access, network) with the least friction before expanding your pipeline.
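As a sketch of how small that module can be, iris_preprocess.py might look roughly like this. The template contains the actual script; this version uses scikit-learn's bundled copy of the dataset rather than a network download, and the --output_path argument name is an assumption.

# iris_preprocess.py - hypothetical sketch, not the shipped script.
import argparse

import boto3
from sklearn.datasets import load_iris

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_path", required=True)  # e.g. s3://bucket/quickstart/iris/features
    args = parser.parse_args()

    # load_iris ships with scikit-learn, so no external download is needed.
    data = load_iris(as_frame=True)
    df = data.frame
    df["species"] = [data.target_names[i] for i in df["target"]]
    df = df.drop(columns=["target"])

    # Write the CSV to the S3 prefix the train-iris module reads from.
    bucket, key_prefix = args.output_path[len("s3://"):].split("/", 1)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"{key_prefix}/iris.csv",
        Body=df.to_csv(index=False).encode("utf-8"),
    )

if __name__ == "__main__":
    main()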
Download Starter Project
Prefer a ready‑to‑run template? Download the quickstart project (includes an Iris pipeline), unzip, set your S3 bucket variables, and deploy.
curl -L -o modelknife-quickstart.zip https://naoo-ai.github.io/modelknife/downloads/modelknife-quickstart.zip
unzip modelknife-quickstart.zip -d modelknife-quickstart
cd modelknife-quickstart
# Set your S3 bucket/prefix used by templates
export BUCKET=s3://your-bucket
export PREFIX=quickstart
# Iris example (no inputs required)
mk p deploy
mk p run
Step 5: Deploy Infrastructure Services
Deploy the stable infrastructure services (Lambda function in this example):
Deploy Services
mk s deploy
This creates your Lambda function for real-time model inference. You only need to do this once per environment - services provide stable infrastructure.
Step 6: Deploy ML Pipeline
Deploy your ML processing modules:
Deploy Pipeline
mk p deploy
This creates your SageMaker processing jobs and Step Functions workflow.
Step 7: Run Your Pipeline
Execute your ML pipeline:
Run Pipeline
mk p run
Monitor the execution:
mk p status
mk p runs
Step 8: Test Your Inference Service
Once the model is trained, you can test the Lambda function:
Test Lambda Function
# Test the Lambda function with sample Iris data
aws lambda invoke \
--function-name iris-inference-dev \
--cli-binary-format raw-in-base64-out \
--payload '{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2}' \
response.json
cat response.json
You should see a prediction like: {"prediction": "setosa", "confidence": 0.95}
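For context on what that call exercises, here is a rough sketch of an iris_inference.lambda_handler. The starter project contains the real implementation; the pickle format, model caching, and probability-based confidence below are assumptions for illustration.

# iris_inference.py - hypothetical sketch, not the shipped handler.
# Assumes MODEL_S3_PATH holds an s3:// prefix (set in the service config above)
# and that scikit-learn is packaged via the build_layer: true option.
import os
import pickle

import boto3

_model = None  # cached across warm invocations

def _load_model():
    global _model
    if _model is None:
        path = os.environ["MODEL_S3_PATH"]  # e.g. s3://bucket/quickstart/iris/model
        bucket, key_prefix = path[len("s3://"):].split("/", 1)
        body = boto3.client("s3").get_object(
            Bucket=bucket, Key=f"{key_prefix}/iris_model.pkl"
        )["Body"].read()
        _model = pickle.loads(body)
    return _model

def lambda_handler(event, context):
    model = _load_model()
    features = [[
        event["sepal_length"], event["sepal_width"],
        event["petal_length"], event["petal_width"],
    ]]
    proba = model.predict_proba(features)[0]
    best = proba.argmax()
    return {
        "prediction": str(model.classes_[best]),
        "confidence": round(float(proba[best]), 2),
    }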
Step 9: Visualize Your Workflow
Generate an interactive visualization of your pipeline:
Generate Visualization
mk p visualize
This opens an interactive DAG (Directed Acyclic Graph) in your browser showing your pipeline structure and dependencies.
Hands-on: Iris Quickstart
The quickest way to validate your setup is the Iris example. It requires no input data — the module downloads the dataset and writes outputs to S3.
- Download the template and set your S3 variables
curl -L -o modelknife-quickstart.zip https://naoo-ai.github.io/modelknife/downloads/modelknife-quickstart.zip
unzip modelknife-quickstart.zip
cd docs/website/examples/quickstart-template
export BUCKET=s3://your-bucket
export PREFIX=quickstart
- Deploy and run both Iris modules
mk p deploy
mk p run --modules prepare-iris,train-iris
- Monitor outputs
mk p status
mk p runs --limit 5
# Outputs:
# - Dataset: ${BUCKET}/${PREFIX}/iris/features/iris.csv
# - Model: ${BUCKET}/${PREFIX}/iris/model/iris_model.pkl
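If you'd rather check those outputs programmatically, a short boto3 script along these lines works. This is not a ModelKnife command, just a hypothetical helper that assumes BUCKET and PREFIX are exported as above.

# check_outputs.py - hypothetical helper, not part of ModelKnife.
import os

import boto3

bucket = os.environ["BUCKET"].replace("s3://", "", 1)
prefix = os.environ["PREFIX"]

s3 = boto3.client("s3")
for key in (f"{prefix}/iris/features/iris.csv",
            f"{prefix}/iris/model/iris_model.pkl"):
    s3.head_object(Bucket=bucket, Key=key)  # raises ClientError if the object is missing
    print(f"OK: s3://{bucket}/{key}")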
Common Commands
Here are the most frequently used ModelKnife commands:
Pipeline Management
# Deploy pipeline
mk p deploy
# Run entire pipeline
mk p run
# Run specific modules
mk p run --modules data-preprocessing,model-training
# Check status
mk p status
# View execution history
mk p runs
# Schedule pipeline (here: 09:00 UTC, Monday through Friday)
mk p schedule set --cron "0 9 * * 1-5" --timezone UTC
Service Management
# Deploy all services
mk s deploy
# Deploy specific service
mk s deploy --service feature_store
# Check service status
mk s status
# Validate configuration
mk s validate
Team & Configuration
# View configuration
mk conf show
# Check team status
mk team status
# List team members
mk team list-users
Next Steps
Now that you have ModelKnife running, explore these topics:
- Core Concepts - Understanding services, pipelines, modules, and executors
- Configuration Reference - Complete YAML configuration guide
- Examples - Real-world ML pipeline examples
- Team Setup - Advanced team management and access control
🎉 Congratulations! You've successfully deployed your first ML pipeline with ModelKnife. You can now iterate on your ML code by editing your processing scripts and running mk p deploy to update your pipeline.