Common Issues & Solutions

Quick solutions to the issues most frequently encountered when working with ModelKnife.

🔐 Permission Issues

Access Denied Errors

Problem: Getting "Access Denied" when running commands

Solution:

# Check your current access level
mk team status

# Verify you're in the correct IAM group
mk setup status

# If you are not in any group, ask an admin to add you.
# The admin runs:
mk team add --user YOUR_USERNAME --group mlknife-developers

⚙️ Configuration Issues

Configuration File Not Found

Problem: "mlknife-compose.yaml not found"

Solution:

# Check current directory
ls -la mlknife-compose.yaml

# Or specify config file explicitly
mk p deploy --config /path/to/mlknife-compose.yaml

# Check if using legacy filename
ls -la pipeline-compose.yaml
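
The two filename checks above can be combined into one small helper. This is a minimal sketch: `find_compose_file` is an illustrative name, not part of ModelKnife, and it assumes the config file sits directly in the directory you pass.

```shell
# Print the first ModelKnife config file found in a directory.
# Checks the current filename first, then the legacy one.
# (find_compose_file is a hypothetical helper, not a ModelKnife command.)
find_compose_file() {
    dir="${1:-.}"
    for name in mlknife-compose.yaml pipeline-compose.yaml; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    echo "no compose file found in $dir" >&2
    return 1
}
```

One possible use is `mk p deploy --config "$(find_compose_file .)"`, which passes whichever file exists to the `--config` option shown above.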

Invalid YAML Syntax

Problem: YAML parsing errors

Solution:

# Validate YAML syntax
mk fmt --validate

# Check for common issues:
# - Incorrect indentation (use spaces, not tabs)
# - Missing quotes around special characters
# - Unmatched brackets or parentheses
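
Tab indentation is hard to spot by eye, so a quick scan with standard tools can help before running the validator. The helper name `find_tab_indents` is only illustrative; the sketch relies on nothing but `grep`.

```shell
# List lines whose indentation contains a tab character.
# Prints "line-number:content" for each match and returns 1 if none are found.
# (find_tab_indents is a hypothetical helper, not a ModelKnife command.)
find_tab_indents() {
    tab=$(printf '\t')
    grep -n "^ *$tab" "$1"
}
```

For example, `find_tab_indents mlknife-compose.yaml` prints the offending lines so they can be re-indented with spaces.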

🚀 Deployment Issues

Pipeline Already Exists

Problem: "Pipeline with this name already exists"

Solution:

# Check existing deployments
mk p status

# Update existing pipeline instead of creating new
mk p deploy --update

# Or delete old pipeline first
mk p delete --confirm

Resource Limits Exceeded

Problem: An AWS service quota (resource limit) has been reached

Solution:

# Check current resource usage
aws service-quotas list-service-quotas --service-code sagemaker

# Request limit increase through AWS console
# Or use smaller instance types temporarily

💾 Data Issues

S3 Access Denied

Problem: Cannot read/write S3 data

Solution:

# Check S3 bucket permissions
aws s3 ls s3://your-bucket-name/

# Verify IAM role has S3 access
mk conf show --section iam

# Test with specific bucket
mk p validate --check-s3-access

Data Format Issues

Problem: Pipeline fails due to unexpected data format

Solution:

# Add data validation step
mk p deploy --enable-data-validation

# Check data sample
aws s3 cp s3://bucket/data/sample.json - | head -n 5

# Use schema validation in your code

🔧 Performance Issues

Out of Memory Errors

Problem: Processing jobs fail with out-of-memory (OOM) errors

Solution:

# Increase memory allocation
# In mlknife-compose.yaml:
executors:
  python_processor:
    instance_type: "ml.r5.large"  # Memory optimized
    volume_size_in_gb: 100        # More storage

🖥️ Cross-Platform Compatibility Issues

Platform-Specific Problems

ModelKnife works across Windows, macOS, and Linux, but each platform has unique challenges. Here are solutions for the most common cross-platform issues.

🪟 Windows-Specific Issues

Problem: "python is not recognized as an internal or external command"

Cause: Python not in system PATH or incorrect installation

Solutions:

# Check if Python is installed
py -3 --version
where python

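# If Python is installed but not in PATH, add both directories in ONE setx call.
# A second setx would overwrite the first, because %PATH% is not refreshed
# in the current session. Run this in Command Prompt (cmd.exe) so that
# %PATH% and %USERNAME% expand; replace the paths with your actual Python location.
# Caution: setx truncates values longer than 1024 characters.
setx PATH "%PATH%;C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311;C:\Users\%USERNAME%\AppData\Local\Programs\Python\Python311\Scripts"

# Open a new terminal window and test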
python --version
pip --version

Prevention: When installing Python, check the "Add Python to PATH" option in the installer

Problem: Docker volume mounting fails with Windows paths

Cause: Windows drive letters and backslashes are incompatible with Docker's mount path syntax

Solutions:

# Enable WSL 2 integration in Docker Desktop
# Settings → General → Use WSL 2 based engine

# Check WSL 2 status
wsl --list --verbose

# Update WSL 2 if needed
wsl --update

# Test Docker volume mounting
docker run --rm -v "${PWD}:/workspace" ubuntu:latest ls /workspace

Note: ModelKnife automatically converts Windows paths for Docker compatibility

Problem: "Access denied" or permission errors

Cause: Windows UAC or antivirus blocking operations

Solutions:

# Run PowerShell as Administrator
# Right-click PowerShell → "Run as administrator"

# Use virtual environment to avoid system-wide installs
python -m venv mlknife-env
mlknife-env\Scripts\activate
pip install git+https://github.com/naoo-AI/modelknife.git

# Check antivirus exclusions for Python/Git directories

🍎 macOS-Specific Issues

Problem: "command not found" after Homebrew installation

Cause: Homebrew not in PATH, especially on Apple Silicon Macs

Solutions:

# Check current PATH
echo $PATH

# Apple Silicon Macs - add Homebrew to PATH
echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

# Intel Macs - add /usr/local/bin to PATH if it is missing
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

# Verify Homebrew installation
brew --version
which python3
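
Before editing `~/.zshrc`, it can be worth checking whether the directory is already on PATH. The sketch below uses plain shell pattern matching; `path_contains` is an illustrative name, and the optional second argument exists only so the check can be tried against an arbitrary list.

```shell
# Succeed if DIR is one of the colon-separated entries in a PATH-style list.
# Usage: path_contains DIR [LIST]   (LIST defaults to the current $PATH)
# (path_contains is a hypothetical helper.)
path_contains() {
    list="${2:-$PATH}"
    case ":$list:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains /opt/homebrew/bin || echo "Homebrew is not on PATH"` reports a missing entry without changing any files. Wrapping the list in colons ensures that only whole entries match, so `/opt/home` does not match `/opt/homebrew/bin`.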

Problem: "xcode-select: error: invalid developer directory"

Cause: Missing or corrupted Xcode Command Line Tools

Solutions:

# Install Xcode Command Line Tools
xcode-select --install

# If already installed but corrupted, reset the developer directory and reinstall
sudo xcode-select --reset
xcode-select --install

# Accept the Xcode license (only applies when the full Xcode app is installed)
sudo xcodebuild -license accept

# Verify Git works
git --version

Problem: Docker Desktop permission or resource issues

Cause: Insufficient permissions or system resources

Solutions:

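# Grant Docker full disk access
# macOS 13 and later: System Settings → Privacy & Security → Full Disk Access → Docker
# macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Full Disk Access → Docker

# Increase Docker resources
# Docker Desktop → Settings (Preferences in older versions) → Resources → Advanced
# Increase Memory to at least 4 GB and CPUs to 2 or more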

# Restart Docker Desktop
# Docker menu → Restart

🐧 Linux-Specific Issues

Problem: "Permission denied" when running docker commands

Cause: User not in docker group

Solutions:

# Check if user is in docker group
groups $USER

# Add user to docker group
sudo usermod -aG docker $USER

# Log out and back in for changes to take effect
# Or use newgrp to activate group in current session
newgrp docker

# Start Docker daemon if not running
sudo systemctl start docker
sudo systemctl enable docker

# Test Docker without sudo
docker run hello-world
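
A common source of confusion is that `groups $USER` reads the group database, so it lists `docker` right after `usermod`, while the current login session still lacks the group. Running `id -nG` with no user argument reports the groups of the current session instead. The sketch below checks a list of group names for one entry; `has_group` is an illustrative name.

```shell
# Succeed if GROUP appears in a space-separated list of group names.
# Usage: has_group GROUP "LIST"
# (has_group is a hypothetical helper.)
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_group docker "$(id -nG)" || echo "docker group not active in this session"` indicates whether a new login or `newgrp docker` is still needed.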

Problem: "python: command not found" but python3 works

Cause: Many Linux distributions ship only the python3 executable and provide no unversioned python command

Solutions:

# Always use python3 and pip3 on Linux
python3 --version
pip3 --version

# Create permanent aliases
echo 'alias python=python3' >> ~/.bashrc
echo 'alias pip=pip3' >> ~/.bashrc
source ~/.bashrc

# Or install python-is-python3 (Ubuntu/Debian)
sudo apt install python-is-python3

Problem: "externally-managed-environment" error during pip install

Cause: Modern Linux distributions protect system Python

Solutions:

# Use virtual environment (recommended)
python3 -m venv mlknife-env
source mlknife-env/bin/activate
pip install git+https://github.com/naoo-AI/modelknife.git

# Or install to user directory
pip3 install --user git+https://github.com/naoo-AI/modelknife.git

# Add user bin to PATH if using --user
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
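
Appending with `echo ... >> ~/.bashrc` adds a duplicate line every time the command is repeated. A minimal sketch of an idempotent alternative follows; `append_once` is an illustrative name and the function only compares whole lines.

```shell
# Append LINE to FILE unless an identical line is already present.
# Usage: append_once 'LINE' FILE
# (append_once is a hypothetical helper.)
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -x matches whole lines only, -F treats LINE as a literal string
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `append_once 'export PATH="$HOME/.local/bin:$PATH"' ~/.bashrc` can be run repeatedly without growing the file.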

🔧 Build Tool Issues Across Platforms

Problem: Go, Node.js, Maven, or other build tools not found

Cause: Platform-specific installation differences

Solutions by Platform:

Windows Build Tools

# Install via winget
winget install GoLang.Go
winget install OpenJS.NodeJS
winget install Microsoft.VisualStudio.2022.BuildTools

# Or via Chocolatey
choco install golang nodejs maven gradle

# Verify installations
go version
node --version
npm --version

macOS Build Tools

# Install via Homebrew
brew install go node maven gradle

# Verify installations
go version
node --version
mvn --version

Linux Build Tools

# Ubuntu/Debian
sudo apt install golang-go nodejs npm maven gradle

# RHEL/CentOS/Fedora
sudo dnf install golang nodejs npm maven gradle

# Verify installations
go version
node --version
mvn --version
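
On macOS and Linux, the individual version checks can be replaced by one pass that reports only what is missing. This sketch uses the standard `command -v` lookup; `missing_tools` is an illustrative name, and which tools you actually need depends on your project.

```shell
# Print each named command that cannot be found on PATH, one per line.
# Returns 0 only when every command is available.
# (missing_tools is a hypothetical helper.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_tools go node npm mvn gradle` prints nothing when all five are installed.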

🌐 Network and Proxy Issues

Problem: Downloads fail behind corporate firewall/proxy

Solutions:

# Configure pip proxy
pip install --proxy http://proxy.company.com:8080 git+https://github.com/naoo-AI/modelknife.git

# Configure git proxy (http.proxy applies to both http:// and https:// remotes;
# git has no https.proxy setting)
git config --global http.proxy http://proxy.company.com:8080

# Configure Docker proxy (add to ~/.docker/config.json)
{
  "proxies": {
    "default": {
      "httpProxy": "http://proxy.company.com:8080",
      "httpsProxy": "https://proxy.company.com:8080"
    }
  }
}
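
As an alternative to per-tool settings, many command-line tools, including pip, git, and the AWS CLI, read the conventional proxy environment variables. The sketch below sets them for the current shell session only. `set_proxy` and `unset_proxy` are illustrative names, the default no-proxy list is an assumption, and whether `mk` itself honors these variables is not confirmed here.

```shell
# Export the conventional proxy variables for the current shell session.
# Usage: set_proxy PROXY_URL [NO_PROXY_HOSTS]
# Both upper- and lower-case names are set because tools differ in which they read.
# (set_proxy and unset_proxy are hypothetical helpers.)
set_proxy() {
    proxy_url="$1"
    no_proxy_hosts="${2:-localhost,127.0.0.1}"
    export HTTP_PROXY="$proxy_url" http_proxy="$proxy_url"
    export HTTPS_PROXY="$proxy_url" https_proxy="$proxy_url"
    export NO_PROXY="$no_proxy_hosts" no_proxy="$no_proxy_hosts"
}

# Remove the proxy variables again.
unset_proxy() {
    unset HTTP_PROXY http_proxy HTTPS_PROXY https_proxy NO_PROXY no_proxy
}
```

For example, run `set_proxy http://proxy.company.com:8080` before the install command, then `unset_proxy` when you leave the corporate network.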

🛠️ General Troubleshooting Steps

  1. Check Status: mk p status
  2. View Pipeline Details: mk p show --detailed
  3. Check Configuration: mk conf show
  4. View Execution History: mk p runs
  5. Check Format: mk fmt --validate
  6. Enable Verbose Mode: export MLKNIFE_VERBOSE=1
  7. Check Platform Compatibility: mk setup status --verbose
  8. Redeploy Pipeline: mk p delete --confirm && mk p deploy
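
When asking for help, it is often useful to capture the output of the inspection steps (1 to 5 and 7) in a single file. The sketch below runs them with verbose mode enabled and keeps going when a step fails. `run_step`, `collect_diagnostics`, and the `MK_BIN` override are illustrative names rather than ModelKnife features, and the sketch assumes these commands only read state. The destructive redeploy in step 8 is deliberately left out.

```shell
# Run one mk subcommand, appending its output and any failure status to $log.
# MK_BIN is a hypothetical override for the executable name (default: mk).
run_step() {
    printf '===== mk %s =====\n' "$*" >> "$log"
    MLKNIFE_VERBOSE=1 "${MK_BIN:-mk}" "$@" >> "$log" 2>&1 \
        || printf '(exit status %s)\n' "$?" >> "$log"
}

# Collect the output of the checklist's inspection commands into one log file.
# Usage: collect_diagnostics [LOGFILE]
collect_diagnostics() {
    log="${1:-mk-diagnostics.log}"
    : > "$log"   # start with an empty log
    run_step p status
    run_step p show --detailed
    run_step conf show
    run_step p runs
    run_step fmt --validate
    run_step setup status --verbose
}
```

Running `collect_diagnostics` writes `mk-diagnostics.log` in the current directory. Review the file for credentials or account identifiers before sharing it.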

Platform-Specific Help

For detailed platform-specific installation and setup instructions, see: