Multi-Profile Configuration

Use profiles for both multi-environment deployments and multi-pipeline configurations to reduce duplication and enable flexible ML workflows.

Profile Strategy

ModelKnife profiles serve two key purposes: multi-environment deployment (dev, staging, prod) and multi-pipeline configuration (different ML workflows sharing common infrastructure). This allows you to reduce configuration duplication and maintain flexible, scalable ML systems.

Two Profile Use Cases

  • Multi-Environment: Same pipeline across dev/staging/prod environments
  • Multi-Pipeline: Different ML workflows sharing base configuration (e.g., user-interest, media-annotation, post-enrichment)

Multi-Environment

Same ML pipeline deployed across different environments with environment-specific configurations.

  • Dev, staging, production environments
  • Different resource sizes
  • Environment-specific parameters
  • Isolated infrastructure

Multi-Pipeline

Different ML workflows sharing common base configuration and infrastructure services.

  • Shared executors and services
  • Pipeline-specific modules
  • Reduced configuration duplication
  • Modular ML workflows

Hybrid Approach

Combine both strategies: multiple pipelines each deployed across multiple environments.

  • Pipeline-specific profiles
  • Environment-specific overrides
  • Maximum flexibility
  • Enterprise-scale ML systems

Profile Configuration Patterns

Step 1: Multi-Environment Pattern

Same pipeline across different environments

Create base mlknife-compose.yaml and environment-specific overrides:

mlknife-compose.yaml - Base Environment Configuration
# mlknife-compose.yaml - Base configuration
name: hermes-search-engine
author: david.liu
description: Search engine with user and post search capabilities

parameters:
  env: dev  # Will be overridden by profiles
  base_path: s3://naoo-ai/
  version: v1
  data_path: ${parameters.base_path}${parameters.env}/hermes-search/${parameters.version}

executors:
  glue_etl:
    type: glue_etl
    job_name: "ml-data-preprocessing"
    runtime: python3.9
    role: AWSGlueServiceRole
    number_of_workers: 2  # Will be overridden for prod
    worker_type: G.1X

services:
  search_backend_service:
    type: search_service
    configuration:
      service_name: "hermes-search-${parameters.env}"
      instance_type: "t3.medium"  # Will be overridden
      
  search_api_gateway:
    type: api_gateway
    configuration:
      api_name: "hermes-search-api-${parameters.env}"
      stage_name: ${parameters.env}

modules:
  build_search_index:
    executor: ${executors.glue_etl}
    entry_point: ./src/build_search_index.py
    job_parameters:
      output_path: ${parameters.data_path}/search_index/
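The `${parameters.*}` references above are composed from other parameters, so `data_path` expands to `s3://naoo-ai/dev/hermes-search/v1`. A toy sketch of how such interpolation could resolve (an illustration of the syntax, not ModelKnife's actual resolver):

```python
import re

def resolve(parameters):
    """Repeatedly expand ${parameters.key} references until none remain.

    Illustrative only: ModelKnife's real resolution order, error handling,
    and non-parameter scopes (executors, services) are not modeled here.
    """
    pattern = re.compile(r"\$\{parameters\.(\w+)\}")
    resolved = dict(parameters)
    for _ in range(len(resolved)):  # bounded: each pass expands one nesting level
        changed = False
        for key, value in resolved.items():
            if isinstance(value, str) and pattern.search(value):
                resolved[key] = pattern.sub(lambda m: str(resolved[m.group(1)]), value)
                changed = True
        if not changed:
            break
    return resolved

params = {
    "env": "dev",
    "base_path": "s3://naoo-ai/",
    "version": "v1",
    "data_path": "${parameters.base_path}${parameters.env}/hermes-search/${parameters.version}",
}
print(resolve(params)["data_path"])  # s3://naoo-ai/dev/hermes-search/v1
```

Because `env` is itself a parameter, a profile that overrides only `env` automatically shifts every derived path.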

Step 2: Environment-Specific Overrides

Create production profile with optimized settings

Create mlknife-compose_prod.yaml with production overrides:

mlknife-compose_prod.yaml - Production Overrides
# mlknife-compose_prod.yaml - Production overrides
name: hermes-search-engine-prod
author: david.liu

parameters:
  env: prod  # Override environment

# Disable development-only services
disabled_services:
  - search_api_gateway  # Use production API Gateway instead

# Production-specific services (if any)
services:
  search_backend_service:
    configuration:
      instance_type: "m5.large"  # Larger for production
      instance_count: 3  # High availability
      auto_scaling:
        min_capacity: 2
        max_capacity: 10

executors:
  glue_etl:
    number_of_workers: 10  # Scale up for production
    worker_type: G.2X  # More powerful workers

modules: {}  # Inherit all modules from base

Multi-Environment Benefits

  • Same pipeline logic across environments
  • Environment-specific resource sizing
  • Selective service enablement/disablement
  • Consistent deployment patterns
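Conceptually, the production profile is layered over the base file: nested mappings merge, profile scalars win, and anything listed in `disabled_services` is dropped. A simplified sketch of that layering (hypothetical merge semantics for illustration; the real merge rules are ModelKnife's):

```python
def deep_merge(base, override):
    """Recursively merge override into base; override scalars win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def apply_profile(base, profile):
    """Layer a profile over the base config, then drop disabled services."""
    merged = deep_merge(base, profile)
    merged.pop("disabled_services", None)
    services = dict(merged.get("services", {}))  # copy so the base dict is untouched
    for name in profile.get("disabled_services", []):
        services.pop(name, None)
    merged["services"] = services
    return merged

base = {
    "parameters": {"env": "dev"},
    "executors": {"glue_etl": {"number_of_workers": 2, "worker_type": "G.1X"}},
    "services": {"search_backend_service": {}, "search_api_gateway": {}},
}
prod = {
    "parameters": {"env": "prod"},
    "executors": {"glue_etl": {"number_of_workers": 10, "worker_type": "G.2X"}},
    "disabled_services": ["search_api_gateway"],
}
effective = apply_profile(base, prod)
print(effective["executors"]["glue_etl"])  # {'number_of_workers': 10, 'worker_type': 'G.2X'}
print("search_api_gateway" in effective["services"])  # False
```

Note how `modules: {}` in the prod file merges to "inherit everything": an empty mapping overrides nothing.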

Step 3: Multi-Pipeline Pattern

Different ML workflows sharing base configuration

Create pipeline-specific profiles for different ML workflows:

mlknife-compose_user-interest.yaml - User Interest Pipeline
# mlknife-compose_user-interest.yaml - User interest pipeline
name: user-interest
description: "Pipeline for user interest analysis and personalization"

parameters:
  base_data_path: s3://naoo-ai/prod/feed_pixel
  lookback_window: 3d
  model_version: "1.0"
  rec_result_base_path: ${parameters.base_data_path}/recommendation_results/${parameters.model_version}

# Inherit base executors and add pipeline-specific defaults
module_defaults:
  executor: ${executors.glue_etl}
  repository: ../modules/python_module

# Pipeline-specific modules
modules:
  build_user_post_interaction_behaviors:
    number_of_workers: 4
    worker_type: G.2X
    entry_point: ./src/glue_jobs/user_interest/build_user_post_interaction_behaviors.py
    job_parameters:
      lookback_window: ${parameters.lookback_window}
      raw_user_event_path: s3://gaia-naoo-data/prod/cleaned/user_event_raw/
      output_path: ${parameters.base_data_path}/user_post_interaction_behaviors/
    depends_on: []

  generate_user_interests_daily:
    entry_point: ./src/glue_jobs/user_interest/generate_user_interests_daily.py
    job_parameters:
      max_interests: 10
      long_term_lookback_window: 6m
      output_path: ${parameters.base_data_path}/user_interest_daily/
    depends_on:
      - build_user_post_interaction_behaviors
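The `module_defaults` block above is applied to every module unless the module sets its own value, which is how `build_user_post_interaction_behaviors` gets custom worker settings while `generate_user_interests_daily` inherits the defaults. A toy sketch of that resolution (assumed semantics, for illustration only):

```python
def effective_module(defaults, module):
    """Module-level settings win over module_defaults (assumed precedence)."""
    return {**defaults, **module}

defaults = {
    "executor": "${executors.glue_etl}",
    "repository": "../modules/python_module",
}
overriding = {
    "number_of_workers": 4,
    "worker_type": "G.2X",
    "entry_point": "./src/glue_jobs/user_interest/build_user_post_interaction_behaviors.py",
}
eff = effective_module(defaults, overriding)
print(eff["executor"])           # ${executors.glue_etl}  (inherited)
print(eff["number_of_workers"])  # 4  (module-specific)
```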

Step 4: Additional Pipeline Profiles

More specialized ML workflows

Create additional pipeline profiles for different ML use cases:

mlknife-compose_media-annotation.yaml - Media Processing Pipeline
# mlknife-compose_media-annotation.yaml - Media annotation pipeline
name: media-annotation
description: "Pipeline for media content annotation and metadata extraction"

parameters:
  base_data_path: s3://naoo-ai/prod/feed_pixel
  batch_size: 100
  model_endpoint: bedrock-claude-3

# Use different executor for media processing
module_defaults:
  executor: ${executors.bedrock_batch_infer}
  repository: ../modules/python_module

# Media-specific modules
modules:
  extract_media_metadata:
    entry_point: ./src/glue_jobs/media_annotation/extract_media_metadata.py
    job_parameters:
      input_path: ${parameters.base_data_path}/raw_media/
      output_path: ${parameters.base_data_path}/media_metadata/
      batch_size: ${parameters.batch_size}
    depends_on: []

  generate_media_annotations:
    executor: ${executors.bedrock_batch_infer}
    entry_point: ./src/bedrock_jobs/generate_media_annotations.py
    job_parameters:
      model_endpoint: ${parameters.model_endpoint}
      input_path: ${parameters.base_data_path}/media_metadata/
      output_path: ${parameters.base_data_path}/media_annotations/
    depends_on:
      - extract_media_metadata

Multi-Pipeline Benefits

  • Share common executors and services across pipelines
  • Each pipeline has focused, specific modules
  • Reduce configuration duplication significantly
  • Enable modular, composable ML architectures
  • Independent deployment and scaling per use case

Profile Deployment Workflows

Step 5: Multi-Environment Workflow

Deploy same pipeline across environments

Environment-Based Deployment
# Deploy base configuration (dev by default)
mk s deploy
mk p deploy

# Deploy to production with profile override
mk s deploy -p prod
mk p deploy -p prod

# Run in production
mk p run -p prod

# Compare configurations
mk show -p prod  # Show production config
mk show          # Show base config

Step 6: Multi-Pipeline Workflow

Deploy different pipeline variants

Pipeline-Specific Deployment
# Deploy base services (shared across pipelines)
mk s deploy

# Deploy user interest pipeline
mk p deploy -p user-interest
mk p run -p user-interest

# Deploy media annotation pipeline
mk p deploy -p media-annotation 
mk p run -p media-annotation

# Deploy post enrichment pipeline
mk p deploy -p post-enrichment
mk p run -p post-enrichment

# Run pipelines on schedule
mk p schedule set -p user-interest --cron "0 2 * * *"
mk p schedule set -p media-annotation --cron "0 4 * * *"

Step 7: Hybrid (Multiple Pipelines × Multiple Environments)

Ultimate flexibility with both patterns combined

Complex Multi-Profile Deployment
# Deploy base services to dev
mk s deploy

# Test user-interest pipeline in dev
mk p deploy -p user-interest
mk p run -p user-interest

# When ready, deploy to production with both profiles
# This would need: mlknife-compose_user-interest-prod.yaml
mk s deploy -p prod
mk p deploy -p user-interest-prod
mk p run -p user-interest-prod

# Monitor across environments and pipelines
mk p status -p user-interest      # Dev environment
mk p status -p user-interest-prod # Prod environment

Profile Naming Convention

For hybrid scenarios, use descriptive profile names like:

  • mlknife-compose_user-interest.yaml - Pipeline variant
  • mlknife-compose_prod.yaml - Environment variant
  • mlknife-compose_user-interest-prod.yaml - Both combined
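The convention above means the `-p` flag maps directly to a filename. A hypothetical helper making that mapping explicit (the function name and fallback behavior are illustrative, not part of the CLI):

```python
def profile_filename(profile=None):
    """Map a -p profile name to its compose file per the naming convention."""
    if profile is None:
        return "mlknife-compose.yaml"         # base configuration
    return f"mlknife-compose_{profile}.yaml"  # profile variant

print(profile_filename())                      # mlknife-compose.yaml
print(profile_filename("prod"))                # mlknife-compose_prod.yaml
print(profile_filename("user-interest-prod"))  # mlknife-compose_user-interest-prod.yaml
```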

Profile Management Commands

Profile Selection and Status

Profile Selection Commands
# Multi-environment profiles
mk s status -p prod            # Production environment
mk p status -p prod            # Check production pipeline status

# Multi-pipeline profiles  
mk p status -p user-interest   # User interest pipeline
mk p status -p media-annotation # Media annotation pipeline
mk show -p post-enrichment     # Show post enrichment config

# Profile discovery
mk show                        # Shows base configuration
ls mlknife-compose*.yaml       # List available profiles

# Set default profile for session
export MLKNIFE_PROFILE=user-interest

Profile Comparisons and Visualization

Profile Comparison Commands
# Compare different pipelines
mk show -p user-interest --detailed
mk show -p media-annotation --detailed

# Visualize pipeline differences
mk p visualize -p user-interest
mk p visualize -p post-enrichment

# Export configurations for comparison
mk show -p prod --json > prod-config.json
mk show -p user-interest --json > user-interest-config.json

# Check which modules differ between profiles
mk show -p user-interest | grep modules
mk show | grep modules  # Base configuration
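Once configurations are exported as JSON, differences can be computed programmatically instead of eyeballed with grep. A small sketch that walks two parsed configs and reports the dotted paths where they disagree (inline dicts stand in for the exported files; the diff logic is illustrative):

```python
def diff_keys(a, b, prefix=""):
    """Yield dotted paths where two parsed config mappings disagree."""
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_keys(va, vb, path + ".")
        elif va != vb:
            yield path

# In practice, load the exported files instead:
#   with open("prod-config.json") as f: prod_cfg = json.load(f)
base_cfg = {"parameters": {"env": "dev"},
            "executors": {"glue_etl": {"number_of_workers": 2}}}
prod_cfg = {"parameters": {"env": "prod"},
            "executors": {"glue_etl": {"number_of_workers": 10}}}

for path in diff_keys(base_cfg, prod_cfg):
    print(path)
# executors.glue_etl.number_of_workers
# parameters.env
```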

Pipeline Orchestration

Multi-Pipeline Orchestration
# Schedule multiple pipelines with dependencies
mk p schedule set -p user-interest --cron "0 1 * * *"
mk p schedule set -p post-enrichment --cron "0 3 * * *"  # After user-interest
mk p schedule set -p media-annotation --cron "0 5 * * *"

# Monitor pipeline runs across profiles
mk p runs -p user-interest --limit 5
mk p runs -p media-annotation --limit 5

# Run pipelines with shared data dependencies
# User interest generates data used by post enrichment
mk p run -p user-interest
# Wait for completion, then:
mk p run -p post-enrichment

# Check pipeline dependencies
mk p visualize -p post-enrichment  # Shows dependency graph

Profile Best Practices

Multi-Environment Best Practices

  • Environment Isolation: Use separate AWS accounts or strict IAM policies
  • Resource Naming: Include environment in resource names (e.g., hermes-search-prod)
  • Configuration Validation: Always run mk p deploy --dry-run before production
  • Graduated Deployment: Test in dev → staging → production progression
  • Environment Parity: Keep staging as close to production as possible

Multi-Pipeline Best Practices

  • Shared Infrastructure: Define common executors and services in base configuration
  • Pipeline Modularity: Each profile should represent a focused ML use case
  • Data Dependencies: Document and manage data flows between pipelines
  • Scheduling Coordination: Schedule dependent pipelines with appropriate delays
  • Resource Optimization: Use appropriate executor types for each pipeline's needs

Configuration Management

  • Version Control: Store all profile configurations in Git with descriptive names
  • Profile Documentation: Comment each profile's purpose and key differences
  • Parameter Consistency: Use consistent parameter naming across profiles
  • Secrets Management: Never hardcode credentials; use AWS Parameter Store
  • Regular Cleanup: Remove unused profile configurations to avoid confusion

Real-World Profile Examples

Common Profile Patterns

  • mlknife-compose.yaml - Base configuration with shared services
  • mlknife-compose_prod.yaml - Production environment overrides
  • mlknife-compose_user-interest.yaml - User behavior analysis pipeline
  • mlknife-compose_media-annotation.yaml - Media content processing pipeline
  • mlknife-compose_post-enrichment.yaml - Content enrichment pipeline
  • mlknife-compose_data-clean.yaml - Data cleaning and validation pipeline