Rate Limiting and Auth Hardening in Production

Dan Castrillo

the attacks were already happening

at Upwork we started noticing patterns in our logs. credential stuffing attacks hitting the login endpoint thousands of times per minute. bots cycling through leaked email/password combos from other breaches. session hijacking attempts where someone would grab a token and replay it from a different continent 30 seconds later.

the volume stood out. these were distributed attacks coming from rotating IP pools, each individual IP making just enough requests to look plausible. our existing protections (basic per-IP rate limits) were useless against this.

token bucket rate limiting

we needed smarter rate limiting. the naive approach is "N requests per minute per IP" but that falls apart when attackers rotate IPs. rate limiting per user identity, with burst tolerance, works better. a real user might click a button three times fast because the UI lagged. locking them out for that is wrong.

we went with the token bucket algorithm. each user gets a bucket that holds a fixed number of tokens. every request consumes one token. tokens refill at a constant rate, up to the bucket's capacity. short bursts go through (the bucket has capacity) but sustained abuse drains the bucket dry.

class TokenBucket {
  private tokens: number
  private lastRefill: number
 
  constructor(
    private capacity: number,
    private refillRate: number // tokens per second
  ) {
    this.tokens = capacity
    this.lastRefill = Date.now()
  }
 
  private refill(): void {
    const now = Date.now()
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    )
    this.lastRefill = now
  }
 
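  // returns true and spends one token if a whole token is available
  consume(): boolean {
    this.refill()
    // tokens is fractional after a refill, so require a full token;
    // checking "> 0" would let 0.2 tokens through and push the balance negative
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }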
}

we set the capacity to 20 and the refill rate to 2 tokens per second for general API endpoints. so a user can burst up to 20 requests instantly, but sustained throughput is capped at 2/sec. for the login endpoint specifically we went way more aggressive: capacity of 5, refill rate of 0.1 (one token every 10 seconds).
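to make "per user identity" concrete, here is a minimal sketch of how those two configurations could be wired up with one bucket per user. the names `PerUserLimiter`, `GENERAL_API`, `LOGIN` and the injectable `now` clock are made up for illustration; only the capacity and refill numbers come from the setup described above.

```typescript
// Hypothetical config shape; the numbers are the ones quoted in the post.
interface BucketConfig {
  capacity: number
  refillRate: number // tokens per second
}

const GENERAL_API: BucketConfig = { capacity: 20, refillRate: 2 }
const LOGIN: BucketConfig = { capacity: 5, refillRate: 0.1 }

interface BucketState {
  tokens: number
  lastRefill: number // ms timestamp
}

// Keeps one token bucket per user identity instead of per IP.
class PerUserLimiter {
  private buckets = new Map<string, BucketState>()
  private config: BucketConfig
  private now: () => number

  // `now` is injectable so the behaviour can be checked without real waiting.
  constructor(config: BucketConfig, now: () => number = Date.now) {
    this.config = config
    this.now = now
  }

  allow(userId: string): boolean {
    const t = this.now()
    let bucket = this.buckets.get(userId)
    if (bucket === undefined) {
      // First request from this user: start with a full bucket.
      bucket = { tokens: this.config.capacity, lastRefill: t }
      this.buckets.set(userId, bucket)
    }
    // Refill based on elapsed time, never above capacity.
    const elapsedSec = (t - bucket.lastRefill) / 1000
    bucket.tokens = Math.min(
      this.config.capacity,
      bucket.tokens + elapsedSec * this.config.refillRate
    )
    bucket.lastRefill = t
    if (bucket.tokens >= 1) {
      bucket.tokens -= 1
      return true
    }
    return false
  }
}
```

this sketch keeps everything in process memory and never evicts idle buckets. with more than one server instance the bucket state would presumably have to live in a shared store, which is outside what this example shows.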

sliding window for login attempts

token buckets work for general rate limiting but for login attempts we wanted something with more memory. a sliding window counter tracks failed attempts over the last 15 minutes. after 5 failures in that window, the account locks out for increasing durations.

the Go implementation on the backend:

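import (
    "sync"
    "time"
)

// SlidingWindow records attempt timestamps and allows at most
// limit attempts within any window-long span.
type SlidingWindow struct {
    mu       sync.Mutex
    attempts []time.Time   // timestamps still inside the window
    window   time.Duration // e.g. 15 * time.Minute
    limit    int           // e.g. 5
}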
 
func (sw *SlidingWindow) Allow() bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()
 
    now := time.Now()
    cutoff := now.Add(-sw.window)
 
    // prune expired attempts
    valid := sw.attempts[:0]
    for _, t := range sw.attempts {
        if t.After(cutoff) {
            valid = append(valid, t)
        }
    }
    sw.attempts = valid
 
    if len(sw.attempts) >= sw.limit {
        return false
    }
    sw.attempts = append(sw.attempts, now)
    return true
}
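the Go snippet only answers allow or deny. the "increasing durations" part is not shown, so here is one way it could look, sketched in TypeScript. the doubling schedule (1 minute, 2 minutes, 4 minutes, capped at 1 hour), the reset on successful login, and the names `LoginGuard` and `lockoutDurationMs` are all assumptions for illustration, not what we actually shipped.

```typescript
// Assumed schedule (not from the post): 1 min, 2 min, 4 min, ... capped at 1 hour.
const BASE_LOCKOUT_MS = 60 * 1000
const MAX_LOCKOUT_MS = 60 * 60 * 1000

// lockoutCount is 1 for the first lockout, 2 for the second, and so on.
function lockoutDurationMs(lockoutCount: number): number {
  const exponent = Math.max(0, lockoutCount - 1)
  return Math.min(MAX_LOCKOUT_MS, BASE_LOCKOUT_MS * Math.pow(2, exponent))
}

// Tracks failed logins for a single account.
class LoginGuard {
  private failures: number[] = [] // ms timestamps of recent failed logins
  private lockouts = 0
  private lockedUntil = 0
  private windowMs: number
  private limit: number

  constructor(windowMs: number = 15 * 60 * 1000, limit: number = 5) {
    this.windowMs = windowMs
    this.limit = limit
  }

  isLocked(now: number): boolean {
    return now < this.lockedUntil
  }

  // Records a failed login; returns true if this failure triggered a lockout.
  recordFailure(now: number): boolean {
    const cutoff = now - this.windowMs
    this.failures = this.failures.filter((t) => t > cutoff)
    this.failures.push(now)
    if (this.failures.length >= this.limit) {
      this.lockouts += 1
      this.lockedUntil = now + lockoutDurationMs(this.lockouts)
      this.failures = []
      return true
    }
    return false
  }

  // Assumption: a successful login clears both the failures and the escalation.
  recordSuccess(): void {
    this.failures = []
    this.lockouts = 0
  }
}
```

the caller would check `isLocked` before even verifying the password, so a locked account rejects attempts without doing any credential work.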

the nice thing about sliding windows is they don't have the boundary problem that fixed windows have. with a fixed 15-minute window, the counter resets at the 15:00 mark, so an attacker can make 5 attempts at 14:59 and 5 more at 15:01, effectively 10 attempts in 2 seconds. a sliding window always looks back a full 15 minutes from the current attempt, so there is no reset to exploit.
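a small comparison makes the boundary problem easy to see. this is an illustrative sketch, not production code: `FixedWindow`, `SlidingWindowLog` and `countAllowed` are hypothetical names, and timestamps are plain millisecond numbers so the scenario is deterministic.

```typescript
const WINDOW_MS = 15 * 60 * 1000
const LIMIT = 5

// Fixed window: the counter resets whenever the clock crosses a window boundary.
class FixedWindow {
  private windowStart = -1
  private count = 0

  allow(now: number): boolean {
    const start = Math.floor(now / WINDOW_MS) * WINDOW_MS
    if (start !== this.windowStart) {
      this.windowStart = start
      this.count = 0
    }
    if (this.count >= LIMIT) return false
    this.count += 1
    return true
  }
}

// Sliding window log: same idea as the Go version above, looking back
// WINDOW_MS from the current attempt.
class SlidingWindowLog {
  private attempts: number[] = []

  allow(now: number): boolean {
    const cutoff = now - WINDOW_MS
    this.attempts = this.attempts.filter((t) => t > cutoff)
    if (this.attempts.length >= LIMIT) return false
    this.attempts.push(now)
    return true
  }
}

// Counts how many of the given attempt times a limiter lets through.
function countAllowed(
  limiter: { allow(now: number): boolean },
  times: number[]
): number {
  let allowed = 0
  for (let i = 0; i < times.length; i++) {
    if (limiter.allow(times[i])) allowed += 1
  }
  return allowed
}

// 5 attempts one second before the 15:00 boundary, 5 more one second after.
const burst: number[] = []
for (let i = 0; i < 5; i++) burst.push(WINDOW_MS - 1000)
for (let i = 0; i < 5; i++) burst.push(WINDOW_MS + 1000)

const fixedAllowed = countAllowed(new FixedWindow(), burst) // 10
const slidingAllowed = countAllowed(new SlidingWindowLog(), burst) // 5
```

the fixed window lets all 10 attempts through because the second batch lands in a fresh window. the sliding log still sees the first 5 and rejects the rest.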

auth hardening: tokens and sessions

rate limiting slows attackers down. we also needed to make stolen credentials less useful. three changes made the biggest difference.

short-lived access tokens with refresh rotation. access tokens expire after 15 minutes. when you use a refresh token to get a new access token, the old refresh token gets invalidated and you get a new one. a stolen refresh token can only be used once. the moment the real user or the attacker uses it, the other one's token is dead. if we detect both trying to use the same refresh token, we invalidate the entire session.
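the rotation and reuse-detection logic can be sketched with an in-memory store. everything named here (`RefreshTokenStore`, `rotate`, the `reason` strings) is hypothetical, and `newToken` is injected so the sketch has no platform-specific imports. in a real system it would have to come from a cryptographically secure random source.

```typescript
type RefreshResult =
  | { ok: true; refreshToken: string }
  | { ok: false; reason: "unknown_token" | "session_revoked" | "reuse_detected" }

interface TokenRecord {
  sessionId: string
  used: boolean
}

class RefreshTokenStore {
  private tokens = new Map<string, TokenRecord>()
  private revokedSessions = new Set<string>()
  private newToken: () => string

  // `newToken` must return an unguessable value; it is a parameter here
  // only to keep the example self-contained.
  constructor(newToken: () => string) {
    this.newToken = newToken
  }

  // Issues the first refresh token for a session (e.g. at login).
  issue(sessionId: string): string {
    const token = this.newToken()
    this.tokens.set(token, { sessionId, used: false })
    return token
  }

  // Exchanges a refresh token for a new one; each token works exactly once.
  rotate(presented: string): RefreshResult {
    const record = this.tokens.get(presented)
    if (record === undefined) {
      return { ok: false, reason: "unknown_token" }
    }
    if (this.revokedSessions.has(record.sessionId)) {
      return { ok: false, reason: "session_revoked" }
    }
    if (record.used) {
      // An already-rotated token showed up again: either the user or an
      // attacker holds a stale copy, so the whole session is invalidated.
      this.revokedSessions.add(record.sessionId)
      return { ok: false, reason: "reuse_detected" }
    }
    record.used = true
    return { ok: true, refreshToken: this.issue(record.sessionId) }
  }
}
```

keeping used tokens around (marked `used`) instead of deleting them is what makes the reuse detectable. a deleted token would just look unknown.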

CSRF protection on state-changing endpoints. every form submission and POST request requires a CSRF token tied to the user's session. this killed an entire class of attacks where a malicious page tricks a logged-in user into submitting a form on our domain.
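one common way to tie a CSRF token to a session is the synchronizer token pattern: the server stores a random token per session and rejects any state-changing request that doesn't echo it back. the sketch below assumes that pattern (the post doesn't say which variant we used). `CsrfTokens` and `constantTimeEqual` are illustrative names, and `newToken` again stands in for a secure random generator.

```typescript
// Best-effort comparison that does not return early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false
  let diff = 0
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  }
  return diff === 0
}

class CsrfTokens {
  private bySession = new Map<string, string>()
  private newToken: () => string

  // `newToken` must return an unguessable value in real use.
  constructor(newToken: () => string) {
    this.newToken = newToken
  }

  // Called when rendering a form for this session; reuses the existing token.
  issue(sessionId: string): string {
    let token = this.bySession.get(sessionId)
    if (token === undefined) {
      token = this.newToken()
      this.bySession.set(sessionId, token)
    }
    return token
  }

  // Called on every state-changing request before the handler runs.
  verify(sessionId: string, submitted: string | undefined): boolean {
    const expected = this.bySession.get(sessionId)
    if (expected === undefined || submitted === undefined) return false
    return constantTimeEqual(expected, submitted)
  }
}
```

a malicious page can make the browser send the session cookie, but it can't read the token out of our pages, so its forged request fails `verify`.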

session invalidation on password change. this sounds obvious but it wasn't happening before. when a user changes their password, every active session except the current one gets terminated immediately. so if someone got into your account and you change your password, they're out. no lingering sessions.
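the "everything except the current session" rule is small enough to sketch directly. `SessionStore` and `revokeOthers` are hypothetical names, and the in-memory map stands in for whatever session storage is actually in use.

```typescript
interface Session {
  id: string
  userId: string
}

class SessionStore {
  private sessions = new Map<string, Session>()

  create(id: string, userId: string): void {
    this.sessions.set(id, { id, userId })
  }

  isActive(id: string): boolean {
    return this.sessions.has(id)
  }

  // Terminates every session belonging to `userId` except `keepSessionId`.
  // Returns how many sessions were removed.
  revokeOthers(userId: string, keepSessionId: string): number {
    const toRemove: string[] = []
    this.sessions.forEach((session, id) => {
      if (session.userId === userId && id !== keepSessionId) {
        toRemove.push(id)
      }
    })
    for (let i = 0; i < toRemove.length; i++) {
      this.sessions.delete(toRemove[i])
    }
    return toRemove.length
  }
}
```

the password-change handler would call `revokeOthers(userId, currentSessionId)` right after the new password is saved, so the user who made the change stays logged in and everyone else is out.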

what the numbers showed

after rolling all of this out over about three weeks, unauthorized access attempts dropped by 80%. the credential stuffing attacks didn't stop (they never stop) but the success rate went from "occasionally works" to "never works." the sliding window on login attempts locked bots out after 5 tries, which made the economics of the attack terrible for them.

token rotation was the single most impactful change. before that, a stolen token was good for days. after, a stolen access token was good for 15 minutes at most, and a stolen refresh token could be used only once, with any second use of it alerting us to the compromise.

security isn't one big thing. it's a bunch of small things that compound. any one of these protections can be bypassed in isolation. stacking them creates enough friction that attackers move on to easier targets.
