Real-World Examples

Complete, working examples for common ML use cases and infrastructure patterns

E-commerce Recommendation Pipeline

A complete end-to-end ML pipeline for product recommendations that combines collaborative filtering and content-based approaches. It demonstrates multi-service orchestration across AWS Glue (ETL), Amazon SageMaker (feature engineering and training), and Spark running as a SageMaker processing job (similarity computation).

AWS Glue ETL for data cleaning
SageMaker for model training
Spark for similarity computation
Multi-language support (Python/Scala)
Automatic dependency resolution
Step Functions orchestration
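Automatic dependency resolution presumably derives the execution order from each module's `depends_on` list in the configuration. The sketch below shows one way to do that, assuming a plain topological sort (Kahn's algorithm). It is an illustration, not ModelKnife's actual implementation; the function name `resolve_order` is hypothetical.

```python
from collections import deque


# Hypothetical sketch: orders modules from their depends_on lists.
# Not ModelKnife's code; it only illustrates the idea of dependency resolution.
def resolve_order(modules: dict[str, list[str]]) -> list[str]:
    """Return module names so that every module comes after its dependencies.

    `modules` maps a module name to its `depends_on` list. Raises ValueError
    on unknown dependencies or dependency cycles.
    """
    # Count unmet dependencies and record reverse edges (dependency -> dependents).
    indegree = {name: 0 for name in modules}
    dependents: dict[str, list[str]] = {name: [] for name in modules}
    for name, deps in modules.items():
        for dep in deps:
            if dep not in modules:
                raise ValueError(f"{name} depends on unknown module {dep!r}")
            indegree[name] += 1
            dependents[dep].append(name)

    # Kahn's algorithm: repeatedly schedule modules whose dependencies are done.
    ready = deque(name for name, count in indegree.items() if count == 0)
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in dependents[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any module left unscheduled is part of a cycle.
    if len(order) != len(modules):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    pipeline = {
        "data_cleaning": [],
        "feature_engineering": ["data_cleaning"],
        "similarity_computation": ["feature_engineering"],
        "model_training": ["similarity_computation"],
    }
    print(resolve_order(pipeline))
```

For this pipeline the dependencies form a simple chain, so the order matches the architecture diagram; the same routine also handles fan-in and fan-out graphs and rejects cycles.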

Pipeline Architecture

Data Cleaning (Glue) → Feature Engineering (SageMaker) → Similarity Search (Spark) → Model Training (SageMaker)
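The source does not say which similarity measure the Spark stage computes. A common choice for collaborative filtering is item-item cosine similarity over a user-item interaction matrix, and the in-memory sketch below assumes that. The real stage runs as a JVM Spark job (`com.company.SimilarityJob`); `item_similarities` is a hypothetical name used only for illustration.

```python
import math


# Hypothetical sketch of item-item cosine similarity (an assumed technique,
# not necessarily what the pipeline's Spark job computes).
def item_similarities(interactions: dict[str, dict[str, float]]) -> dict[tuple[str, str], float]:
    """Cosine similarity for every unordered pair of items.

    `interactions` maps user -> {item: score}; each item becomes a sparse
    vector indexed by user.
    """
    # Pivot user -> item scores into item -> {user: score} vectors.
    vectors: dict[str, dict[str, float]] = {}
    for user, items in interactions.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score

    norms = {item: math.sqrt(sum(s * s for s in vec.values())) for item, vec in vectors.items()}
    names = sorted(vectors)
    result: dict[tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # The dot product only involves users who interacted with both items.
            shared = vectors[a].keys() & vectors[b].keys()
            dot = sum(vectors[a][u] * vectors[b][u] for u in shared)
            denom = norms[a] * norms[b]
            result[(a, b)] = dot / denom if denom else 0.0
    return result


if __name__ == "__main__":
    toy = {"u1": {"p1": 1.0, "p2": 1.0}, "u2": {"p1": 1.0, "p3": 1.0}}
    for pair, score in item_similarities(toy).items():
        print(pair, round(score, 4))
```

The pairwise loop is quadratic in the number of items, which is why a production pipeline distributes this step with Spark rather than computing it on a single machine.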

Configuration (mlknife-compose.yaml)

ecommerce-recommendation/config/mlknife-compose.yaml
name: ecommerce-recommendation-pipeline
author: ml-team
description: End-to-end product recommendation system

parameters:
  environment: dev
  data_path: s3://ml-pipeline-data/ecommerce/
  model_path: s3://ml-models/recommendation/

executors:
  glue_etl:
    type: glue_etl
    job_name: "ecommerce-data-preprocessing"
    runtime: python3.11
    role: AWSGlueServiceRole
    glue_version: "5.0"
    worker_type: G.2X
    number_of_workers: 5
    
  python_processor:
    type: sagemaker_processor
    class: sagemaker.sklearn.processing.SKLearnProcessor
    role: ${pipeline.role}
    instance_type: ml.c5.2xlarge
    framework_version: 1.2-1
    max_runtime_in_seconds: 7200
    
  spark_processor:
    type: sagemaker_processor
    class: sagemaker.spark.processing.SparkJarProcessor
    role: ${pipeline.role}
    instance_type: ml.c5.xlarge
    instance_count: 2
    max_runtime_in_seconds: 7200

services:
  recommendation_endpoint:
    type: sagemaker_endpoint
    repository: ../services/
    configuration:
      endpoint_name: "ecommerce-recommendations-${parameters.environment}"
      model_name: "recommendation-model-v1"
      instance_type: ml.m5.large
      initial_instance_count: 2
    depends_on: []
    tags:
      service_type: sagemaker_endpoint

modules:
  data_cleaning:
    repository: ../modules
    executor: ${executors.glue_etl}
    entry_point: ./glue_jobs/clean_ecommerce_data.py
    description: "Clean and validate raw ecommerce data"
    job_parameters:
      input_path: ${parameters.data_path}raw/
      output_path: ${parameters.data_path}cleaned/
    depends_on: []
      
  feature_engineering:
    repository: ../modules
    executor: ${executors.python_processor}
    entry_point: ./jobs/build_features.py
    description: "Engineer features for model training"
    depends_on: [data_cleaning]
    job_parameters:
      input_path: ${parameters.data_path}cleaned/
      output_path: ${parameters.data_path}features/
      
  similarity_computation:
    executor: ${executors.spark_processor}
    entry_point: com.company.SimilarityJob
    build_command: "mvn clean package"
    depends_on: [feature_engineering]
    job_parameters:
      input_path: ${parameters.data_path}features/
      output_path: ${parameters.data_path}similarities/
      
  model_training:
    executor: ${executors.python_processor}
    entry_point: train_recommendation_model.py
    depends_on: [similarity_computation]
    job_parameters:
      features_path: ${parameters.data_path}features/
      similarity_path: ${parameters.data_path}similarities/
      model_path: ${parameters.model_path}
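The configuration wires values together with `${section.key}` references, such as `${parameters.data_path}raw/` or `${executors.glue_etl}`. The sketch below shows how such references could be resolved, assuming simple dotted-path lookup in the parsed YAML. It does not reproduce ModelKnife's resolver: `lookup` and `resolve` are hypothetical names, and values that only exist at deploy time (for example a service's outputs) are outside its scope. A bare reference keeps the referenced object's type, so an executor reference yields the whole executor mapping, and a reference to something that is not defined in the file, such as a service that was never declared, raises an error instead of passing through silently.

```python
import re
from typing import Any

# Matches ${parameters.data_path}, ${executors.glue_etl}, and similar references.
REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(config: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'parameters.data_path' through nested dicts."""
    node: Any = config
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unresolved reference ${{{path}}}")
        node = node[part]
    return node


def resolve(value: Any, config: dict[str, Any], depth: int = 0) -> Any:
    """Recursively substitute ${...} references in strings, lists, and dicts."""
    if depth > 20:
        raise ValueError("reference chain too deep (possible cycle)")
    if isinstance(value, dict):
        return {k: resolve(v, config, depth) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, config, depth) for v in value]
    if not isinstance(value, str):
        return value
    whole = REF.fullmatch(value)
    if whole:
        # A bare reference keeps the referenced object's type (e.g. an executor dict).
        return resolve(lookup(config, whole.group(1)), config, depth + 1)
    # Otherwise splice the resolved text into the surrounding string.
    return REF.sub(lambda m: str(resolve(lookup(config, m.group(1)), config, depth + 1)), value)


if __name__ == "__main__":
    cfg = {
        "parameters": {"environment": "dev", "data_path": "s3://ml-pipeline-data/ecommerce/"},
        "executors": {"glue_etl": {"type": "glue_etl", "number_of_workers": 5}},
        "modules": {
            "data_cleaning": {
                "executor": "${executors.glue_etl}",
                "job_parameters": {"input_path": "${parameters.data_path}raw/"},
            }
        },
    }
    print(resolve(cfg["modules"], cfg))
```

Resolution is recursive, so a parameter that itself contains a reference, such as `user-profiles-${parameters.environment}` in the DynamoDB example, also resolves fully.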

Try This Example

cd examples/pipeline-examples/ecommerce-recommendation/config/
mk p deploy
mk p visualize

Located in: examples/pipeline-examples/ecommerce-recommendation/

Basic DynamoDB Setup

A simple service deployment example showing DynamoDB table creation with automatic security configuration (IAM role creation). It is a good starting point for learning the basics of ModelKnife service deployment.

DynamoDB table with a global secondary index (GSI)
Automatic IAM role creation
Environment-based naming
Simple configuration

Complete Configuration

basic-dynamodb/mlknife-compose.yaml
name: basic-dynamodb-example
author: team
description: Simple DynamoDB table deployment

parameters:
  environment: dev
  table_name: "user-profiles-${parameters.environment}"

services:
  user_profiles_table:
    type: dynamodb_table
    configuration:
      table_name: ${parameters.table_name}
      partition_key: "user_id"
      partition_key_type: "S"
      sort_key: "created_at"
      sort_key_type: "S"
      billing_mode: "PAY_PER_REQUEST"
      
      # Global Secondary Index
      global_secondary_indexes:
        - index_name: "email-index"
          partition_key: "email"
          partition_key_type: "S"
          projection_type: "ALL"
          
      # Tags for resource management
      tags:
        Environment: ${parameters.environment}
        Project: basic-example
        Owner: ml-team

Try This Example

cd examples/service-examples/basic-dynamodb/
mk s validate
mk s deploy
mk s status

Located in: examples/service-examples/basic-dynamodb/

More Examples

Additional examples and templates are available in the repository.

Bedrock Batch Inference

Large-scale batch inference using Amazon Bedrock with managed processing and automatic scaling.

Services: Amazon Bedrock, Amazon S3, AWS Step Functions

Social Media Platform

Full social media application with user management, content storage, and real-time APIs.

Services: DynamoDB, Lambda, API Gateway, S3

Need a Custom Example?

If you have a specific use case that isn't covered by these examples, let us know! We're continuously adding new examples based on community needs.