Real-World Examples

Complete, working examples for common ML use cases and infrastructure patterns

E-commerce Recommendation Pipeline

Complete Pipeline, Multi-Service, Production-Ready

A complete end-to-end ML pipeline for product recommendations using collaborative filtering and content-based approaches. This example demonstrates multi-service orchestration across AWS Glue (ETL), SageMaker (training), and Spark (similarity processing).

AWS Glue ETL for data cleaning

SageMaker for model training

Spark for similarity computation

Multi-language support (Python/Scala)

Automatic dependency resolution

Step Functions orchestration
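To make the collaborative-filtering idea concrete, here is a minimal sketch of item-to-item cosine similarity on a tiny in-memory dataset. It is not the example's Spark job: the function name, the tuple layout, and the toy data are all assumptions for illustration, and a production job would distribute this computation.

```python
import math
from collections import defaultdict

def item_similarities(interactions):
    """Compute cosine similarity between items from (user, item, rating) tuples."""
    # Build one sparse vector per item: item -> {user: rating}
    vectors = defaultdict(dict)
    for user, item, rating in interactions:
        vectors[item][user] = rating

    # Pre-compute the Euclidean norm of every item vector
    norms = {item: math.sqrt(sum(r * r for r in vec.values()))
             for item, vec in vectors.items()}

    sims = {}
    items = sorted(vectors)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Only users who rated both items contribute to the dot product
            shared = vectors[a].keys() & vectors[b].keys()
            dot = sum(vectors[a][u] * vectors[b][u] for u in shared)
            if dot and norms[a] and norms[b]:
                sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims

# Hypothetical toy data: (user_id, item_id, rating)
interactions = [
    ("u1", "book", 5.0), ("u1", "lamp", 3.0),
    ("u2", "book", 4.0), ("u2", "lamp", 4.0),
    ("u3", "desk", 5.0),
]
sims = item_similarities(interactions)
```

Items with no users in common (here, "desk" and anything else) get no entry, which keeps the result sparse.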

Pipeline Architecture

Data Cleaning (Glue) → Feature Engineering (SageMaker) → Similarity Search (Spark) → Model Training (SageMaker)
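The linear flow follows from the depends_on fields declared on each module in the configuration. As a rough sketch of how such declarations can be turned into an execution order (this is not ModelKnife's own resolver, and the `execution_order` helper is hypothetical), Python's standard `graphlib` module performs the topological sort:

```python
from graphlib import TopologicalSorter

# depends_on relationships copied from the modules section of the example config
modules = {
    "data_cleaning": [],
    "feature_engineering": ["data_cleaning"],
    "similarity_computation": ["feature_engineering"],
    "model_training": ["similarity_computation"],
}

def execution_order(depends_on):
    """Return module names in an order that respects every depends_on edge."""
    # TopologicalSorter expects a mapping of node -> predecessors
    return list(TopologicalSorter(depends_on).static_order())

order = execution_order(modules)
print(" -> ".join(order))
```

If the declarations contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is a convenient place to report a configuration mistake.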

Configuration (mlknife-compose.yaml)

ecommerce-recommendation/mlknife-compose.yaml

name: ecommerce-recommendation-pipeline
author: ml-team
description: End-to-end product recommendation system

parameters:
  environment: dev
  data_path: s3://ml-pipeline-data/ecommerce/
  model_path: s3://ml-models/recommendation/

executors:
  glue_etl:
    type: glue_etl
    job_name: "ecommerce-data-preprocessing"
    runtime: python3.9
    role: AWSGlueServiceRole
    glue_version: "5.0"
    worker_type: G.2X
    number_of_workers: 5
    
  python_processor:
    type: sagemaker_processor
    class: sagemaker.sklearn.processing.SKLearnProcessor
    role: ${pipeline.role}
    instance_type: ml.c5.2xlarge
    framework_version: 1.2-1
    max_runtime_in_seconds: 7200
    
  spark_processor:
    type: sagemaker_processor
    class: sagemaker.spark.processing.PySparkProcessor
    role: ${pipeline.role}
    instance_type: ml.c5.xlarge
    instance_count: 2
    max_runtime_in_seconds: 7200

services:
  recommendation_endpoint:
    type: sagemaker_endpoint
    repository: ../services/
    configuration:
      endpoint_name: "ecommerce-recommendations-${parameters.environment}"
      model_name: "recommendation-model-v1"
      instance_type: ml.m5.large
      initial_instance_count: 2
    depends_on: []
    tags:
      service_type: sagemaker_endpoint

modules:
  data_cleaning:
    repository: ../modules
    executor: ${executors.glue_etl}
    entry_point: ./glue_jobs/clean_ecommerce_data.py
    description: "Clean and validate raw ecommerce data"
    job_parameters:
      input_path: ${parameters.data_path}raw/
      output_path: ${parameters.data_path}cleaned/
    depends_on: []
      
  feature_engineering:
    repository: ../modules
    executor: ${executors.python_processor}
    entry_point: ./jobs/build_features.py
    description: "Engineer features for model training"
    depends_on: [data_cleaning]
    job_parameters:
      input_path: ${parameters.data_path}cleaned/
      output_path: ${parameters.data_path}features/
      
  similarity_computation:
    executor: ${executors.spark_processor}
    entry_point: com.company.SimilarityJob
    build_command: "mvn clean package"
    depends_on: [feature_engineering]
    job_parameters:
      input_path: ${parameters.data_path}features/
      output_path: ${parameters.data_path}similarities/
      
  model_training:
    executor: ${executors.python_processor}
    entry_point: train_recommendation_model.py
    depends_on: [similarity_computation]
    job_parameters:
      features_path: ${parameters.data_path}features/
      similarity_path: ${parameters.data_path}similarities/
      model_path: ${parameters.model_path}
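      # Assumes a recommendation_table service defined elsewhere; it is not declared in the services block above
      recommendation_table: "${services.recommendation_table.outputs.table_name}"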
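The configuration relies on `${...}` placeholders such as `${parameters.data_path}`. The sketch below shows one plausible way string placeholders like these could be expanded. It is an assumption about the mechanism, not ModelKnife's implementation, and the `lookup` and `resolve` helpers are hypothetical.

```python
import re

# Matches ${some.dotted.path}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def lookup(context, dotted_path):
    """Walk a nested dict using a dotted path such as 'parameters.data_path'."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value

def resolve(text, context):
    """Replace every ${a.b.c} in text with the matching value from context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), text)

# Values taken from the parameters block of the example config
context = {
    "parameters": {
        "environment": "dev",
        "data_path": "s3://ml-pipeline-data/ecommerce/",
    }
}
resolved = resolve("${parameters.data_path}raw/", context)
endpoint = resolve("ecommerce-recommendations-${parameters.environment}", context)
```

This sketch handles only string substitution. References such as `${executors.glue_etl}`, which point at a whole mapping rather than a string, would need separate handling.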

Try This Example

cd examples/pipeline-examples/ecommerce-recommendation/config/
mk p deploy
mk p visualize

Located in: examples/pipeline-examples/ecommerce-recommendation/


Semantic Search Service

Infrastructure, OpenSearch, Bedrock

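A complete search service with OpenSearch Serverless, Lambda APIs, and Bedrock embeddings (Amazon Titan; note that the configuration shown in this example sets model_id to cohere.embed-multilingual-v3). It supports hybrid search (vector + keyword) and vectorizes documents automatically during indexing.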

OpenSearch Serverless collection with vector search

Lambda functions for indexing and search

API Gateway proxy integration with CORS

Bedrock embeddings (Amazon Titan embed-text-v2)

Language support via selected embedding model
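Hybrid search has to merge a keyword ranking and a vector ranking into one result list. The example does not state how this service combines them, so the sketch below uses reciprocal rank fusion (RRF) purely as an illustration of one common technique. The function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the two retrieval modes
keyword_hits = ["post-3", "post-1", "post-7"]
vector_hits = ["post-1", "post-9", "post-3"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists accumulate two contributions, so they tend to rise above documents found by only one retrieval mode.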

Infrastructure Architecture

API Gateway → Lambda APIs → OpenSearch Serverless ↕ Bedrock (Titan Embeddings)

Configuration Highlights

Key Service Configurations

services:
  search_backend_service:
    type: search_service
    configuration:
      service_name: "hermes-user-search-${parameters.version}-${parameters.env}"
      search_type: "hybrid_search"
      performance_tier: "balanced"
      indices:
        - name: "posts"
          fields:
            - name: "summary"
              type: "text"
              searchable: true
            - name: "embedding_en"
              type: "vector"
              dimensions: 1024
              similarity_function: "cosine"
      embedding_config:
        model_id: "cohere.embed-multilingual-v3"
        service: "bedrock"
        auto_vectorize: true
      languages: ["english", "german", "french", "italian", "spanish", "chinese", "turkish"]

  search_api:
    type: lambda_function
    configuration:
      function_name: "hermes-search-api-${parameters.version}-${parameters.env}"
      runtime: "python3.9"
      timeout: 60
      environment:
        OPENSEARCH_ENDPOINT: "${services.search_backend_service.outputs.search_endpoint}"
        BEDROCK_EMBED_MODEL_ID: "cohere.embed-multilingual-v3"

  search_api_gateway:
    type: api_gateway
    configuration:
      api_name: "hermes-search-api-gw-${parameters.version}-${parameters.env}"
      resources:
        - path: "posts/search"
          methods: ["GET", "POST", "OPTIONS"]
          integration:
            type: "AWS_PROXY"
            lambda_function: "hermes-search-api-${parameters.version}-${parameters.env}"
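With an AWS_PROXY integration, the Lambda function receives the raw HTTP request as an event and must return a response containing `statusCode`, `headers`, and `body`. The handler below is a minimal sketch of that contract, including a CORS preflight response for the OPTIONS method. It is not the example's actual Lambda code: the `q` parameter name and the empty result list are placeholders.

```python
import json

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}

def handler(event, context):
    """Minimal API Gateway proxy handler for the posts/search resource."""
    method = event.get("httpMethod", "GET")

    # CORS preflight: answer with headers only
    if method == "OPTIONS":
        return {"statusCode": 204, "headers": CORS_HEADERS, "body": ""}

    if method == "GET":
        # queryStringParameters is None when the request has no query string
        params = event.get("queryStringParameters") or {}
        query = params.get("q", "")
    else:
        body = json.loads(event.get("body") or "{}")
        query = body.get("q", "")

    if not query:
        return {"statusCode": 400, "headers": CORS_HEADERS,
                "body": json.dumps({"error": "missing query parameter 'q'"})}

    # Placeholder: a real handler would embed the query and call OpenSearch here
    results = []
    return {"statusCode": 200, "headers": CORS_HEADERS,
            "body": json.dumps({"query": query, "results": results})}
```

Returning the CORS headers on every response, not just the preflight, is what lets browsers read the search results from another origin.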

Try This Example

cd examples/service-examples/semantic-search-service/
mk s deploy
mk s status

Located in: examples/service-examples/semantic-search-service/


Basic DynamoDB Setup

Infrastructure, Beginner

A simple service deployment example showing DynamoDB table creation with automatic security configuration. It is a good starting point for understanding the basics of ModelKnife service deployment.

DynamoDB table with GSI

Automatic IAM role creation

Environment-based naming

Simple configuration

Complete Configuration

basic-dynamodb/mlknife-compose.yaml

name: basic-dynamodb-example
author: team
description: Simple DynamoDB table deployment

parameters:
  environment: dev
  table_name: "user-profiles-${parameters.environment}"

services:
  user_profiles_table:
    type: dynamodb_table
    configuration:
      table_name: ${parameters.table_name}
      partition_key: "user_id"
      partition_key_type: "S"
      sort_key: "created_at"
      sort_key_type: "S"
      billing_mode: "PAY_PER_REQUEST"
      
      # Global Secondary Index
      global_secondary_indexes:
        - index_name: "email-index"
          partition_key: "email"
          partition_key_type: "S"
          projection_type: "ALL"
          
      # Tags for resource management
      tags:
        Environment: ${parameters.environment}
        Project: basic-example
        Owner: ml-team
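For readers who know the DynamoDB API, it can help to see how a configuration like this lines up with the arguments of the CreateTable operation. The mapping below is a sketch under the assumption that the fields translate one-to-one. How ModelKnife performs the translation is not shown in this example, and `create_table_params` is a hypothetical helper.

```python
def create_table_params(cfg):
    """Translate the example's service configuration into CreateTable arguments."""
    # Every key attribute (table or index) must appear in AttributeDefinitions
    attributes = {cfg["partition_key"]: cfg["partition_key_type"]}
    key_schema = [{"AttributeName": cfg["partition_key"], "KeyType": "HASH"}]
    if "sort_key" in cfg:
        attributes[cfg["sort_key"]] = cfg["sort_key_type"]
        key_schema.append({"AttributeName": cfg["sort_key"], "KeyType": "RANGE"})

    indexes = []
    for gsi in cfg.get("global_secondary_indexes", []):
        attributes[gsi["partition_key"]] = gsi["partition_key_type"]
        indexes.append({
            "IndexName": gsi["index_name"],
            "KeySchema": [{"AttributeName": gsi["partition_key"], "KeyType": "HASH"}],
            "Projection": {"ProjectionType": gsi["projection_type"]},
        })

    params = {
        "TableName": cfg["table_name"],
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": kind}
            for name, kind in attributes.items()
        ],
        "KeySchema": key_schema,
        "BillingMode": cfg["billing_mode"],
    }
    if indexes:
        params["GlobalSecondaryIndexes"] = indexes
    return params

# Values from the example configuration, with the environment already resolved
config = {
    "table_name": "user-profiles-dev",
    "partition_key": "user_id", "partition_key_type": "S",
    "sort_key": "created_at", "sort_key_type": "S",
    "billing_mode": "PAY_PER_REQUEST",
    "global_secondary_indexes": [
        {"index_name": "email-index", "partition_key": "email",
         "partition_key_type": "S", "projection_type": "ALL"},
    ],
}
params = create_table_params(config)
```

The resulting dictionary has the shape accepted by `boto3.client("dynamodb").create_table(**params)`. With PAY_PER_REQUEST billing, neither the table nor the index needs provisioned throughput settings.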

Try This Example

cd examples/service-examples/basic-dynamodb/
mk s validate
mk s deploy
mk s status

Located in: examples/service-examples/basic-dynamodb/


More Examples

Additional examples and templates are available in the repository.

Bedrock Batch Inference

Batch AI

Large-scale batch inference using AWS Bedrock with managed processing and automatic scaling.

Bedrock, S3, Step Functions


Social Media Platform

Complete App

Full social media application with user management, content storage, and real-time APIs.

DynamoDB, Lambda, API Gateway, S3


Need a Custom Example?

If you have a specific use case that isn't covered by these examples, let us know! We're continuously adding new examples based on community needs.