Multi-Profile Configuration
Use profiles for both multi-environment deployments and multi-pipeline configurations to reduce duplication and enable flexible ML workflows
Profile Strategy
ModelKnife profiles serve two key purposes: multi-environment deployment (dev, staging, prod) and multi-pipeline configuration (different ML workflows sharing common infrastructure). This allows you to reduce configuration duplication and maintain flexible, scalable ML systems.
Two Profile Use Cases
- Multi-Environment: Same pipeline across dev/staging/prod environments
- Multi-Pipeline: Different ML workflows sharing base configuration (e.g., user-interest, media-annotation, post-enrichment)
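Both patterns rely on the same selection mechanism: the base file mlknife-compose.yaml is used by default, and passing -p <profile> selects the matching mlknife-compose_<profile>.yaml, as the examples throughout this page assume:
# Profile selection by file name (convention used throughout this page)
mk p deploy                    # uses mlknife-compose.yaml
mk p deploy -p prod            # uses mlknife-compose_prod.yaml
mk p deploy -p user-interest   # uses mlknife-compose_user-interest.yaml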
Multi-Environment
Same ML pipeline deployed across different environments with environment-specific configurations.
- Dev, staging, production environments
- Different resource sizes
- Environment-specific parameters
- Isolated infrastructure
Multi-Pipeline
Different ML workflows sharing common base configuration and infrastructure services.
- Shared executors and services
- Pipeline-specific modules
- Reduced configuration duplication
- Modular ML workflows
Hybrid Approach
Combine both strategies: multiple pipelines each deployed across multiple environments.
- Pipeline-specific profiles
- Environment-specific overrides
- Maximum flexibility
- Enterprise-scale ML systems
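On disk, the hybrid approach is just the naming convention described later on this page; a sketch of the layout:
mlknife-compose.yaml                     # Base: shared parameters, executors, services
mlknife-compose_prod.yaml                # Environment variant
mlknife-compose_user-interest.yaml       # Pipeline variant
mlknife-compose_user-interest-prod.yaml  # Pipeline + environment combined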
Profile Configuration Patterns
Multi-Environment Pattern
Same pipeline across different environments
Create a base mlknife-compose.yaml and environment-specific overrides:
# mlknife-compose.yaml - Base configuration
name: hermes-search-engine
author: david.liu
description: Search engine with user and post search capabilities

parameters:
  env: dev  # Will be overridden by profiles
  base_path: s3://naoo-ai/
  version: v1
  data_path: ${parameters.base_path}${parameters.env}/hermes-search/${parameters.version}

executors:
  glue_etl:
    type: glue_etl
    job_name: "ml-data-preprocessing"
    runtime: python3.9
    role: AWSGlueServiceRole
    number_of_workers: 2  # Will be overridden for prod
    worker_type: G.1X

services:
  search_backend_service:
    type: search_service
    configuration:
      service_name: "hermes-search-${parameters.env}"
      instance_type: "t3.medium"  # Will be overridden
  search_api_gateway:
    type: api_gateway
    configuration:
      api_name: "hermes-search-api-${parameters.env}"
      stage_name: ${parameters.env}

modules:
  build_search_index:
    executor: ${executors.glue_etl}
    entry_point: ./src/build_search_index.py
    job_parameters:
      output_path: ${parameters.data_path}/search_index/
Environment-Specific Overrides
Create a production profile with optimized settings
Create mlknife-compose_prod.yaml with production overrides:
# mlknife-compose_prod.yaml - Production overrides
name: hermes-search-engine-prod
author: david.liu

parameters:
  env: prod  # Override environment

# Disable development-only services
disabled_services:
  - search_api_gateway  # Use production API Gateway instead

# Production-specific services (if any)
services:
  search_backend_service:
    configuration:
      instance_type: "m5.large"  # Larger for production
      instance_count: 3  # High availability
      auto_scaling:
        min_capacity: 2
        max_capacity: 10

executors:
  glue_etl:
    number_of_workers: 10  # Scale up for production
    worker_type: G.2X  # More powerful workers

modules: {}  # Inherit all modules from base
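Assuming profile values deep-merge over the base file (keys set in the profile replace base values, everything else is inherited), the effective production executor would resolve to roughly the following; the merge semantics are an assumption, so verify with mk show -p prod:
# Effective glue_etl executor in prod (illustrative merge result)
executors:
  glue_etl:
    type: glue_etl                     # inherited from base
    job_name: "ml-data-preprocessing"  # inherited from base
    runtime: python3.9                 # inherited from base
    role: AWSGlueServiceRole           # inherited from base
    number_of_workers: 10              # overridden by prod profile
    worker_type: G.2X                  # overridden by prod profile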
Multi-Environment Benefits
- Same pipeline logic across environments
- Environment-specific resource sizing
- Selective service enablement/disablement
- Consistent deployment patterns
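Before relying on an override in production, the dry-run flag covered in the best practices below can confirm what the profile resolves to:
# Validate the merged prod configuration without deploying
mk p deploy -p prod --dry-run
mk show -p prod   # Inspect the resolved configuration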
Multi-Pipeline Pattern
Different ML workflows sharing base configuration
Create pipeline-specific profiles for different ML workflows:
# mlknife-compose_user-interest.yaml - User interest pipeline
name: user-interest
description: "Pipeline for user interest analysis and personalization"

parameters:
  base_data_path: s3://naoo-ai/prod/feed_pixel
  lookback_window: 3d
  model_version: "1.0"
  rec_result_base_path: ${parameters.base_data_path}/recommendation_results/${parameters.model_version}

# Inherit base executors and add pipeline-specific defaults
module_defaults:
  executor: ${executors.glue_etl}
  repository: ../modules/python_module

# Pipeline-specific modules
modules:
  build_user_post_interaction_behaviors:
    number_of_workers: 4
    worker_type: G.2X
    entry_point: ./src/glue_jobs/user_interest/build_user_post_interaction_behaviors.py
    job_parameters:
      lookback_window: ${parameters.lookback_window}
      raw_user_event_path: s3://gaia-naoo-data/prod/cleaned/user_event_raw/
      output_path: ${parameters.base_data_path}/user_post_interaction_behaviors/
    depends_on: []
  generate_user_interests_daily:
    entry_point: ./src/glue_jobs/user_interest/generate_user_interests_daily.py
    job_parameters:
      max_interests: 10
      long_term_lookback_window: 6m
      output_path: ${parameters.base_data_path}/user_interest_daily/
    depends_on:
      - build_user_post_interaction_behaviors
Additional Pipeline Profiles
More specialized ML workflows
Create additional pipeline profiles for different ML use cases:
# mlknife-compose_media-annotation.yaml - Media annotation pipeline
name: media-annotation
description: "Pipeline for media content annotation and metadata extraction"

parameters:
  base_data_path: s3://naoo-ai/prod/feed_pixel
  batch_size: 100
  model_endpoint: bedrock-claude-3

# Use different executor for media processing
module_defaults:
  executor: ${executors.bedrock_batch_infer}
  repository: ../modules/python_module

# Media-specific modules
modules:
  extract_media_metadata:
    entry_point: ./src/glue_jobs/media_annotation/extract_media_metadata.py
    job_parameters:
      input_path: ${parameters.base_data_path}/raw_media/
      output_path: ${parameters.base_data_path}/media_metadata/
      batch_size: ${parameters.batch_size}
    depends_on: []
  generate_media_annotations:
    executor: ${executors.bedrock_batch_infer}
    entry_point: ./src/bedrock_jobs/generate_media_annotations.py
    job_parameters:
      model_endpoint: ${parameters.model_endpoint}
      input_path: ${parameters.base_data_path}/media_metadata/
      output_path: ${parameters.base_data_path}/media_annotations/
    depends_on:
      - extract_media_metadata
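Both pipeline profiles reference shared executors by name (${executors.glue_etl}, ${executors.bedrock_batch_infer}), so the base file is assumed to define them once for all pipelines. A minimal sketch; the bedrock_batch_infer fields are illustrative, since only its name appears on this page:
# mlknife-compose.yaml - shared executors assumed by the pipeline profiles
executors:
  glue_etl:
    type: glue_etl
    runtime: python3.9
    role: AWSGlueServiceRole
  bedrock_batch_infer:
    type: bedrock_batch_infer  # Illustrative; actual fields depend on your base config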
Multi-Pipeline Benefits
- Share common executors and services across pipelines
- Each pipeline has focused, specific modules
- Reduce configuration duplication significantly
- Enable modular, composable ML architectures
- Independent deployment and scaling per use case
Profile Deployment Workflows
Multi-Environment Workflow
Deploy same pipeline across environments
# Deploy base configuration (dev by default)
mk s deploy
mk p deploy
# Deploy to production with profile override
mk s deploy -p prod
mk p deploy -p prod
# Run in production
mk p run -p prod
# Compare configurations
mk show -p prod # Show production config
mk show # Show base config
Multi-Pipeline Workflow
Deploy different pipeline variants
# Deploy base services (shared across pipelines)
mk s deploy
# Deploy user interest pipeline
mk p deploy -p user-interest
mk p run -p user-interest
# Deploy media annotation pipeline
mk p deploy -p media-annotation
mk p run -p media-annotation
# Deploy post enrichment pipeline
mk p deploy -p post-enrichment
mk p run -p post-enrichment
# Run pipelines on schedule
mk p schedule set -p user-interest --cron "0 2 * * *"
mk p schedule set -p media-annotation --cron "0 4 * * *"
Hybrid: Multiple Pipelines × Multiple Environments
Ultimate flexibility with both patterns combined
# Deploy base services to dev
mk s deploy
# Test user-interest pipeline in dev
mk p deploy -p user-interest
mk p run -p user-interest
# When ready, deploy to production with both profiles
# This would need: mlknife-compose_user-interest-prod.yaml
mk s deploy -p prod
mk p deploy -p user-interest-prod
mk p run -p user-interest-prod
# Monitor across environments and pipelines
mk p status -p user-interest # Dev environment
mk p status -p user-interest-prod # Prod environment
Profile Naming Convention
For hybrid scenarios, use descriptive profile names like:
- mlknife-compose_user-interest.yaml - Pipeline variant
- mlknife-compose_prod.yaml - Environment variant
- mlknife-compose_user-interest-prod.yaml - Both combined
Profile Management Commands
Profile Selection and Status
# Multi-environment profiles
mk s status -p prod # Production environment
mk p status -p prod # Check production pipeline status
# Multi-pipeline profiles
mk p status -p user-interest # User interest pipeline
mk p status -p media-annotation # Media annotation pipeline
mk show -p post-enrichment # Show post enrichment config
# Profile discovery
mk show # Shows base configuration
ls mlknife-compose*.yaml # List available profiles
# Set default profile for session
export MLKNIFE_PROFILE=user-interest
Profile Comparisons and Visualization
# Compare different pipelines
mk show -p user-interest --detailed
mk show -p media-annotation --detailed
# Visualize pipeline differences
mk p visualize -p user-interest
mk p visualize -p post-enrichment
# Export configurations for comparison
mk show -p prod --json > prod-config.json
mk show -p user-interest --json > user-interest-config.json
# Check which modules differ between profiles
mk show -p user-interest | grep modules
mk show | grep modules # Base configuration
Pipeline Orchestration
# Schedule multiple pipelines with dependencies
mk p schedule set -p user-interest --cron "0 1 * * *"
mk p schedule set -p post-enrichment --cron "0 3 * * *" # After user-interest
mk p schedule set -p media-annotation --cron "0 5 * * *"
# Monitor pipeline runs across profiles
mk p runs -p user-interest --limit 5
mk p runs -p media-annotation --limit 5
# Run pipelines with shared data dependencies
# User interest generates data used by post enrichment
mk p run -p user-interest
# Wait for completion, then:
mk p run -p post-enrichment
# Check pipeline dependencies
mk p visualize -p post-enrichment # Shows dependency graph
Profile Best Practices
Multi-Environment Best Practices
- Environment Isolation: Use separate AWS accounts or strict IAM policies
- Resource Naming: Include environment in resource names (e.g., hermes-search-prod)
- Configuration Validation: Always run mk p deploy --dry-run before production
- Graduated Deployment: Test in dev → staging → production progression
- Environment Parity: Keep staging as close to production as possible
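Put together, a graduated rollout might look like the sequence below; the staging profile (mlknife-compose_staging.yaml) is hypothetical and not shown elsewhere on this page:
# Dev -> staging -> prod progression
mk p deploy && mk p run                          # Dev (base configuration)
mk p deploy -p staging --dry-run                 # Validate staging overrides
mk p deploy -p staging && mk p run -p staging    # Staging
mk p deploy -p prod --dry-run                    # Validate before production
mk p deploy -p prod                              # Production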
Multi-Pipeline Best Practices
- Shared Infrastructure: Define common executors and services in base configuration
- Pipeline Modularity: Each profile should represent a focused ML use case
- Data Dependencies: Document and manage data flows between pipelines
- Scheduling Coordination: Schedule dependent pipelines with appropriate delays
- Resource Optimization: Use appropriate executor types for each pipeline's needs
Configuration Management
- Version Control: Store all profile configurations in Git with descriptive names
- Profile Documentation: Comment each profile's purpose and key differences
- Parameter Consistency: Use consistent parameter naming across profiles
- Secrets Management: Never hardcode credentials; use AWS Parameter Store
- Regular Cleanup: Remove unused profile configurations to avoid confusion
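For the secrets-management point, one approach is to keep credentials in AWS Systems Manager Parameter Store and reference them at deploy time instead of committing them to a profile. How ModelKnife resolves such references is tool-specific, but storing the secret itself uses the standard AWS CLI (the parameter name here is hypothetical):
# Store a credential outside of any mlknife-compose file
aws ssm put-parameter \
  --name /hermes-search/prod/search_api_key \
  --type SecureString \
  --value "<your-secret-value>"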
Real-World Profile Examples
Common Profile Patterns
- mlknife-compose.yaml - Base configuration with shared services
- mlknife-compose_prod.yaml - Production environment overrides
- mlknife-compose_user-interest.yaml - User behavior analysis pipeline
- mlknife-compose_media-annotation.yaml - Media content processing pipeline
- mlknife-compose_post-enrichment.yaml - Content enrichment pipeline
- mlknife-compose_data-clean.yaml - Data cleaning and validation pipeline