Application Servers

Your application's backend code and Docker containers run on sherpa.sh's application servers. These servers sit behind our CDN and load balancer within our Kubernetes cluster in your selected region, providing a robust execution environment for your applications.

How It Works

When you deploy your application, sherpa.sh runs your backend code or Docker containers on our application servers. These servers handle all the compute-intensive work while our CDN and load balancer manage traffic distribution and static content delivery.

Architecture Flow:

  1. User request → CDN (static files) or Load Balancer (dynamic content)

  2. Load Balancer → Application Server instances

  3. Application Servers → Your backend code/containers

  4. Response flows back through the same path

For detailed architecture information, see our Architecture Overview page.

Default Resource Configuration

Every application is deployed as a swarm of Docker containers behind a load balancer. Each individual container has a maximum resource allocation.

Resource Allocation

# Per instance limits
CPU: 1 core maximum
Memory: 1GB maximum
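
Because each container is capped at 1 CPU core and 1GB of memory, it can help to watch your process's footprint at runtime. A minimal sketch using Node's built-in `process.memoryUsage()`; the 1GB constant mirrors the limit above, while the 80% warning threshold is an arbitrary example, not a platform value:

```javascript
// Warn when the process approaches the per-container memory cap.
const MEMORY_LIMIT_BYTES = 1 * 1024 * 1024 * 1024; // 1GB per-instance cap
const WARN_RATIO = 0.8; // example threshold, tune for your workload

function memoryStatus(limit = MEMORY_LIMIT_BYTES) {
  const { rss } = process.memoryUsage(); // resident set size in bytes
  return {
    usedBytes: rss,
    usedRatio: rss / limit,
    nearLimit: rss / limit >= WARN_RATIO,
  };
}

// Example: check periodically and log when close to the cap
const status = memoryStatus();
if (status.nearLimit) {
  console.warn(`Memory at ${(status.usedRatio * 100).toFixed(1)}% of limit`);
}
```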

Auto-scaling Behavior

The number of container replicas created depends on your selected plan.

# Hobby and Starter Plan Settings
Minimum instances: 1
Maximum instances: 5
CPU scaling threshold: 80%

How scaling works: When your backend code uses more than 80% CPU across instances, new application servers automatically spin up to handle the load.

If you need more instance replicas or different CPU thresholds, reach out to support on our Discord at discord.sherpa.sh.

Infrastructure Types

Shared Application Servers (Default)

What you get: Your containers/code run on shared infrastructure alongside other applications.

Benefits:

  • Zero configuration: Deploy immediately

  • Automatic load distribution: Traffic spreads across instances

  • Cost-effective: Shared infrastructure costs

  • Built-in monitoring: Performance metrics included

Limitations:

  • Ephemeral storage: No persistent file writes

  • Shared resources: CPU/memory shared with other applications

  • Standard resource limits: Fixed allocation per instance

Best for: Stateless APIs, web applications, microservices

Dedicated Application Servers

What you get: Exclusive physical servers running only your application containers.

Available Configurations:

  • Compute: 2-96 CPU cores, 4-256GB memory

  • Storage: 80GB-300TB persistent disk

  • Network: 1-10Gbps dedicated bandwidth

  • Transfer: 20TB monthly included

Benefits:

  • Guaranteed performance: No resource contention

  • Persistent storage: File system writes supported

  • Custom sizing: Tailored to your workload

  • Isolation: Enhanced security and performance

Best for: Databases, file processing, high-traffic applications

Regional Deployment

Your application servers run in the region you select during deployment. View our available regions.

Benefits of regional deployment:

  • Reduced latency: Servers closer to your users

  • Data compliance: Meet regional data requirements

  • Improved performance: Faster database connections

Managing Application Servers

Viewing Server Status

  1. Navigate to your application dashboard

  2. Go to Resources > Application Server

  3. Monitor instance count, CPU usage, and memory consumption

Requesting Dedicated Servers

  1. Create support ticket with requirements:

    • Expected traffic volume

    • Resource requirements (CPU/memory)

    • Storage needs

    • Performance requirements

Best Practices

Stateless Design

Design your application to work seamlessly across multiple server instances by avoiding in-memory state storage.

Good: External State Management

// pages/api/users/[id].js
import { getUser } from '../../../lib/database';

export default async function handler(req, res) {
  const { id } = req.query;
  
  // Fetch from external database, not server memory
  const user = await getUser(id);
  
  if (!user) {
    return res.status(404).json({ error: 'User not found' });
  }
  
  res.status(200).json(user);
}

Avoid: In-Memory State

// Don't do this - data lost during scaling
let userCache = {}; // Lost when new instances start

export default async function handler(req, res) {
  const { id } = req.query;
  
  if (!userCache[id]) {
    userCache[id] = await getUser(id); // Won't persist across instances
  }
  
  res.status(200).json(userCache[id]);
}
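
To keep a cache without losing it on scale events, move it out of process memory into a shared store such as Redis (see Optimization Strategies). A minimal cache-aside sketch; here `store` is a plain `Map` standing in for a Redis client during local testing, and `loadUser` is a hypothetical loader in place of the `getUser` database call above:

```javascript
// Cache-aside: check the shared store first, fall back to the loader,
// then populate the store so every instance sees the same entry.
async function cacheAside(store, key, loader) {
  const cached = await store.get(key);
  if (cached !== undefined && cached !== null) return cached;
  const value = await loader(key);
  await store.set(key, value);
  return value;
}

// In production `store` would be a Redis client (get/set); a Map has the
// same get/set shape and works identically for local testing.
const store = new Map();
const loadUser = async (id) => ({ id, name: `user-${id}` }); // stand-in loader

// First call hits the loader; repeat calls are served from the store,
// so every instance behind the load balancer sees the same entry.
```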

Health Check Implementation

Sherpa.sh automatically checks your application health by requesting the root URL (/). Ensure this endpoint returns a valid response.

Required Health Check Setup

// pages/index.js or app/page.js (App Router)
export default function Home() {
  return (
    <div>
      <h1>Application Status: Healthy</h1>
      <p>Server is running normally</p>
    </div>
  );
}

// Or, for API-only applications: Next.js API routes must live under
// pages/api, so serve the handler there and rewrite the root URL to it
// in next.config.js (e.g. source: '/', destination: '/api/health')
// pages/api/health.js
export default function handler(req, res) {
  res.status(200).json({ 
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: process.env.npm_package_version || '1.0.0'
  });
}

Advanced Health Check with Dependencies

// pages/api/health.js (serve at the root URL via a rewrite in next.config.js)
import { checkDatabase } from '../lib/database';
import { checkExternalAPI } from '../lib/external-services';

export default async function handler(req, res) {
  try {
    // Check critical dependencies
    await checkDatabase();
    await checkExternalAPI();
    
    res.status(200).json({
      status: 'healthy',
      checks: {
        database: 'connected',
        external_api: 'responding'
      },
      timestamp: new Date().toISOString()
    });
  } catch (error) {
    res.status(503).json({
      status: 'unhealthy',
      error: error.message,
      timestamp: new Date().toISOString()
    });
  }
}

Efficient Resource Usage

Optimize your application for auto-scaling by implementing efficient async patterns and resource management.

Database Connection Management

// lib/database.js
import { Pool } from 'pg';

// Use connection pooling for database efficiency
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

export async function queryDatabase(query, params) {
  const client = await pool.connect();
  try {
    const result = await client.query(query, params);
    return result.rows;
  } finally {
    client.release(); // Always release connection
  }
}

Async API Route Optimization

// pages/api/products/search.js
export default async function handler(req, res) {
  const { query, category } = req.query;
  
  try {
    // Run parallel requests for better performance
    const [products, categories, recommendations] = await Promise.all([
      searchProducts(query),
      getCategories(category),
      getRecommendations(query)
    ]);
    
    res.status(200).json({
      products,
      categories,
      recommendations
    });
  } catch (error) {
    console.error('Search error:', error);
    res.status(500).json({ error: 'Search failed' });
  }
}

Monitoring and Optimization

Performance Metrics

  • CPU utilization: Monitor for scaling triggers

  • Memory usage: Track memory leaks and optimization opportunities

  • Response time: Measure application performance

  • Instance count: Understand scaling patterns

Optimization Strategies

  • Database optimization: Use connection pooling and query optimization

  • Caching: Implement Redis or in-memory caching

  • Async processing: Use background tasks for heavy operations

  • Resource monitoring: Set up alerts for resource usage
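
The "async processing" strategy above can be sketched as a tiny in-process job queue: respond to the request immediately and do the heavy work afterwards. A real deployment would typically use a broker (for example a Redis-backed queue) so jobs survive instance churn during auto-scaling; this in-memory version is illustrative only, and `sendWelcomeEmail` is a hypothetical helper:

```javascript
// Minimal in-process background queue: jobs run sequentially after the
// response has already been sent, keeping request handlers fast.
const queue = [];
let draining = false;

function enqueue(job) {
  queue.push(job);
  if (!draining) drain();
}

async function drain() {
  draining = true;
  while (queue.length > 0) {
    const job = queue.shift();
    try {
      await job();
    } catch (err) {
      console.error('Background job failed:', err);
    }
  }
  draining = false;
}

// Usage inside a handler: respond first, then queue the slow work.
// enqueue(() => sendWelcomeEmail(user)); // hypothetical helper
```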

Next Steps

  • Review architecture: Read our Architecture Overview for complete system understanding

  • Set up monitoring: Configure application performance monitoring

  • Implement caching: Add Redis or similar caching layer

  • Plan scaling: Consider dedicated servers for growing applications

  • Optimize performance: Profile your application for bottlenecks

Need help optimizing your application server setup? Our support team provides detailed performance analysis and recommendations for your specific use case.
