the attacks were already happening

so here's something nobody tells you when you're building auth systems — you don't get to decide when to care about security. the attackers decide for you. at Upwork we started noticing patterns in our logs that were... not great. credential stuffing attacks hitting the login endpoint thousands of times per minute. bots cycling through leaked email/password combos from other breaches. session hijacking attempts where someone would grab a token and replay it from a different continent 30 seconds later.

the thing that really got my attention was the volume. we weren't talking about some kid with a script. these were distributed attacks coming from rotating IP pools, each individual IP making just enough requests to look plausible. our existing protections — basic per-IP rate limits — were basically useless against this.

token bucket rate limiting

the first thing we needed was smarter rate limiting. the naive approach is "N requests per minute per IP" but that falls apart when attackers rotate IPs. what you actually want is rate limiting per user identity, and you want it to be burst-tolerant. a real user might click a button three times fast because the UI lagged. you don't want to lock them out for that.

enter the token bucket algorithm. each user gets a bucket that holds a fixed number of tokens. every request consumes one token. tokens refill at a constant rate. so you can handle short bursts (the bucket has capacity) but sustained abuse drains the bucket dry.

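class TokenBucket {
  private tokens: number
  private lastRefill: number // ms timestamp of the last refill

  constructor(
    private capacity: number,
    private refillRate: number // tokens per second
  ) {
    this.tokens = capacity // start full so a new user can burst right away
    this.lastRefill = Date.now()
  }

  // add the tokens earned since the last call, capped at capacity
  private refill(): void {
    const now = Date.now()
    const elapsed = (now - this.lastRefill) / 1000
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    )
    this.lastRefill = now
  }

  // returns true and spends one token if the request is allowed
  consume(): boolean {
    this.refill()
    // require a whole token: refill() produces fractional values, and
    // checking `> 0` would let a request through on a partial token
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}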

we set the capacity to 20 and the refill rate to 2 tokens per second for general API endpoints. so a user can burst up to 20 requests instantly, but sustained throughput is capped at 2/sec. for the login endpoint specifically we went way more aggressive — capacity of 5, refill rate of 0.1 (one token every 10 seconds).
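to make the "per user identity" part concrete, here's a minimal sketch of how buckets could be keyed by identity using the two configs above. the names (`BucketRegistry`, `allow`, the `login:` and `api:` key prefixes) are made up for illustration, and the clock is passed in so the behavior is easy to check. treat it as a sketch, not what ran in production; real state would probably live in a shared store rather than process memory.

```typescript
// Illustrative sketch: one token bucket per user identity, per endpoint class.
// All names are hypothetical; only the capacity/refill numbers come from the post.

type BucketConfig = { capacity: number; refillRate: number }; // refillRate = tokens per second

const GENERAL: BucketConfig = { capacity: 20, refillRate: 2 };
const LOGIN: BucketConfig = { capacity: 5, refillRate: 0.1 };

type BucketState = { tokens: number; lastRefill: number };

class BucketRegistry {
  private buckets = new Map<string, BucketState>();
  private now: () => number;

  // `now` returns milliseconds; it is injectable so time can be controlled in tests.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  allow(key: string, config: BucketConfig): boolean {
    const t = this.now();
    let state = this.buckets.get(key);
    if (state === undefined) {
      // New identities start with a full bucket.
      state = { tokens: config.capacity, lastRefill: t };
      this.buckets.set(key, state);
    }

    // Refill based on elapsed time, capped at capacity.
    const elapsedSeconds = (t - state.lastRefill) / 1000;
    state.tokens = Math.min(
      config.capacity,
      state.tokens + elapsedSeconds * config.refillRate
    );
    state.lastRefill = t;

    // Require a whole token so a fractional refill can't be spent early.
    if (state.tokens >= 1) {
      state.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

a login handler would call something like `registry.allow("login:" + userId, LOGIN)` before checking the password, and everything else would go through `GENERAL`. keying on the account rather than the IP is what makes IP rotation stop helping.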

sliding window for login attempts

token buckets are great for general rate limiting, but for login attempts we wanted something with more memory: a sliding window counter that tracks failed attempts over the last 15 minutes. after 5 failures in that window, you're locked out for increasing durations.

the Go implementation on the backend looked something like this:

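import (
    "sync"
    "time"
)

// SlidingWindow allows at most limit attempts within any span of length window.
type SlidingWindow struct {
    mu       sync.Mutex
    attempts []time.Time // timestamps of attempts still inside the window
    window   time.Duration
    limit    int
}

// Allow reports whether another attempt fits in the window and, if so, records it.
func (sw *SlidingWindow) Allow() bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()

    now := time.Now()
    cutoff := now.Add(-sw.window)

    // prune expired attempts, reusing the backing array
    valid := sw.attempts[:0]
    for _, t := range sw.attempts {
        if t.After(cutoff) {
            valid = append(valid, t)
        }
    }
    sw.attempts = valid

    // at the limit: reject without recording
    if len(sw.attempts) >= sw.limit {
        return false
    }
    sw.attempts = append(sw.attempts, now)
    return true
}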
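the Go snippet covers the window itself but not the "increasing durations" part. here's one way that could look, sketched in TypeScript to match the earlier example. the 15-minute window and the limit of 5 are from above; the 1-minute base lockout, the doubling schedule, and the names (`LoginGuard`, `recordFailure`, `isLocked`) are assumptions for illustration.

```typescript
// Illustrative sketch: failed-login sliding window with escalating lockouts.
// Window and limit come from the post; base lockout and doubling are assumptions.

const WINDOW_MS = 15 * 60 * 1000;
const FAILURE_LIMIT = 5;
const BASE_LOCKOUT_MS = 60 * 1000; // assumed: the first lockout lasts 1 minute

type GuardState = { failures: number[]; lockouts: number; lockedUntil: number };

class LoginGuard {
  private states = new Map<string, GuardState>();

  private stateFor(user: string): GuardState {
    let s = this.states.get(user);
    if (s === undefined) {
      s = { failures: [], lockouts: 0, lockedUntil: 0 };
      this.states.set(user, s);
    }
    return s;
  }

  // True while the user is inside a lockout period.
  isLocked(user: string, nowMs: number): boolean {
    return nowMs < this.stateFor(user).lockedUntil;
  }

  // Record a failed attempt; returns the lockout length in ms (0 if not locked out).
  recordFailure(user: string, nowMs: number): number {
    const s = this.stateFor(user);
    // Keep only failures inside the sliding window, then add this one.
    s.failures = s.failures.filter((t) => t > nowMs - WINDOW_MS);
    s.failures.push(nowMs);
    if (s.failures.length < FAILURE_LIMIT) return 0;

    // Limit reached: lock out, doubling the duration on each repeat offense.
    const duration = BASE_LOCKOUT_MS * 2 ** s.lockouts;
    s.lockouts += 1;
    s.lockedUntil = nowMs + duration;
    s.failures = []; // start a fresh window once the lockout is set
    return duration;
  }

  // A successful login clears the history for that user.
  recordSuccess(user: string): void {
    this.states.delete(user);
  }
}
```

the login handler would check `isLocked` first, then call `recordFailure` or `recordSuccess` depending on how the password check went. doubling is just one choice; the point is that each round of guessing costs the bot more waiting than the last.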

the nice thing about sliding windows is they don't have the boundary problem that fixed windows have. with a fixed 15-minute window, an attacker can make 5 attempts at 14:59 and 5 more at 15:01, right after the counter resets. that's effectively 10 attempts in 2 seconds. a sliding window always looks back a full 15 minutes from the current attempt, so there's no reset to exploit.
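here's that boundary case worked through in code: the same burst of 10 attempts run against a fixed-window counter and a sliding-window counter. both functions are toy versions written for this comparison (timestamps are plain seconds, and the function names are made up), not the production limiter.

```typescript
// Illustrative comparison: fixed windows vs a sliding window on the same burst.
const WINDOW_SECONDS = 15 * 60;
const LIMIT = 5;

// Fixed windows: the counter resets whenever the clock crosses a multiple of the window size.
function fixedWindowAllowed(timestamps: number[]): number {
  const counts = new Map<number, number>();
  let allowed = 0;
  for (const t of timestamps) {
    const bucket = Math.floor(t / WINDOW_SECONDS);
    const used = counts.get(bucket) ?? 0;
    if (used < LIMIT) {
      counts.set(bucket, used + 1);
      allowed++;
    }
  }
  return allowed;
}

// Sliding window: each attempt is checked against the accepted attempts
// from the 15 minutes immediately before it.
function slidingWindowAllowed(timestamps: number[]): number {
  const accepted: number[] = [];
  for (const t of timestamps) {
    const recent = accepted.filter((a) => a > t - WINDOW_SECONDS);
    if (recent.length < LIMIT) accepted.push(t);
  }
  return accepted.length;
}

// 5 attempts at 14:59 and 5 more at 15:01, in seconds since the first window opened.
const burst: number[] = [
  ...new Array<number>(5).fill(14 * 60 + 59),
  ...new Array<number>(5).fill(15 * 60 + 1),
];
```

`fixedWindowAllowed(burst)` lets all 10 through because the second batch lands in a fresh window, while `slidingWindowAllowed(burst)` stops at 5.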

auth hardening: tokens and sessions

rate limiting slows attackers down. but we also needed to make stolen credentials less useful. three changes made the biggest difference.

short-lived access tokens with refresh rotation. access tokens expire after 15 minutes. when you use a refresh token to get a new access token, the old refresh token is invalidated and you get a new one. this means a stolen refresh token can only be used once — the moment the real user or the attacker uses it, the other one's token is dead. and if we detect both trying to use the same refresh token, we invalidate the entire session.
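a rough sketch of the rotation and reuse-detection logic, to show how the "both sides try the same token" case gets caught. everything here is hypothetical naming (`RefreshTokenStore`, `rotate`, `isRevoked`), the maps stand in for a real session store, and token generation is passed in as a function so the example has no dependencies; in practice that function would need to be a cryptographically secure random generator.

```typescript
// Illustrative sketch: refresh token rotation with reuse detection.
// In-memory maps stand in for a real session store; all names are hypothetical.

type RefreshRecord = { sessionId: string; used: boolean };

class RefreshTokenStore {
  private tokens = new Map<string, RefreshRecord>();
  private revokedSessions = new Set<string>();
  private generateToken: () => string;

  // `generateToken` must be a CSPRNG-backed generator in real use.
  constructor(generateToken: () => string) {
    this.generateToken = generateToken;
  }

  // Issue a fresh refresh token for a session.
  issue(sessionId: string): string {
    const token = this.generateToken();
    this.tokens.set(token, { sessionId, used: false });
    return token;
  }

  // Exchange a refresh token for a new one; returns null if the token is rejected.
  rotate(token: string): string | null {
    const record = this.tokens.get(token);
    if (record === undefined || this.revokedSessions.has(record.sessionId)) {
      return null;
    }
    if (record.used) {
      // Second use of the same token: someone replayed it, so kill the whole session.
      this.revokedSessions.add(record.sessionId);
      return null;
    }
    record.used = true; // the old token is now spent
    return this.issue(record.sessionId);
  }

  isRevoked(sessionId: string): boolean {
    return this.revokedSessions.has(sessionId);
  }
}
```

spent tokens are kept around (marked `used`) instead of deleted, because that's what makes replay detectable: a token we've never seen is just invalid, but a token we've already rotated means two parties are holding the same credential.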

CSRF protection on state-changing endpoints. every form submission and POST request requires a CSRF token tied to the user's session. this killed an entire class of attacks where someone would trick a logged-in user into submitting a form to our domain from a malicious page.
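one common way to tie a CSRF token to a session is the synchronizer token pattern: generate a random token per session, embed it in every form, and reject any state-changing request that doesn't send it back. the sketch below assumes that pattern (the post doesn't say which variant we used), and the names `CsrfTokens`, `tokenFor`, and `verify` are made up. token generation is injected again and should be a secure random source in real use.

```typescript
// Illustrative sketch: synchronizer-token CSRF check tied to a session.
// Names are hypothetical; `generateToken` should be CSPRNG-backed in real use.

class CsrfTokens {
  private bySession = new Map<string, string>();
  private generateToken: () => string;

  constructor(generateToken: () => string) {
    this.generateToken = generateToken;
  }

  // Called when rendering a form: returns the token to embed for this session.
  tokenFor(sessionId: string): string {
    let token = this.bySession.get(sessionId);
    if (token === undefined) {
      token = this.generateToken();
      this.bySession.set(sessionId, token);
    }
    return token;
  }

  // Called on every state-changing request before doing any work.
  verify(sessionId: string, submitted: string | undefined): boolean {
    const expected = this.bySession.get(sessionId);
    if (expected === undefined || submitted === undefined) return false;
    return constantTimeEqual(expected, submitted);
  }
}

// Compare without returning at the first mismatch, so timing doesn't
// reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```

a malicious page can make the victim's browser send the session cookie, but it can't read the token out of our forms, so its forged request fails `verify`.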

session invalidation on password change. this sounds obvious but it wasn't happening before. when a user changes their password, every active session except the current one gets terminated immediately. so if someone got into your account and you change your password, they're out. no lingering sessions.
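the logic is small enough to sketch in a few lines. `SessionStore`, `revokeOthers`, and `changePassword` are hypothetical names, and `savePasswordHash` is a placeholder for whatever actually updates the credential; the only part taken from the description above is "terminate everything except the current session."

```typescript
// Illustrative sketch: terminate every session except the current one on password change.
// All names are hypothetical; an in-memory map stands in for the real session store.

type Session = { id: string; userId: string };

class SessionStore {
  private sessions = new Map<string, Session>();

  create(id: string, userId: string): void {
    this.sessions.set(id, { id, userId });
  }

  isActive(id: string): boolean {
    return this.sessions.has(id);
  }

  // Remove all of the user's sessions except `currentSessionId`.
  // Returns the number of sessions terminated.
  revokeOthers(userId: string, currentSessionId: string): number {
    let revoked = 0;
    // Copy the values first so deleting while looping is safe.
    for (const session of Array.from(this.sessions.values())) {
      if (session.userId === userId && session.id !== currentSessionId) {
        this.sessions.delete(session.id);
        revoked++;
      }
    }
    return revoked;
  }
}

// Hypothetical handler: `savePasswordHash` stands in for the real credential update.
function changePassword(
  store: SessionStore,
  userId: string,
  currentSessionId: string,
  savePasswordHash: () => void
): number {
  savePasswordHash();
  return store.revokeOthers(userId, currentSessionId);
}
```

the order matters a little: update the password first, then revoke, so a session that gets kicked can't immediately log back in with the old password.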

what the numbers showed

after rolling all of this out over about three weeks, the results were pretty clear. unauthorized access attempts dropped by 80%. the credential stuffing attacks didn't stop — they never stop — but the success rate went from "occasionally works" to "basically never works." the sliding window on login attempts meant bots would get locked out after 5 tries and have to wait, which made the economics of the attack terrible for them.

the token rotation was probably the single most impactful change. before that, a stolen token was good for days. after, a stolen access token was good for 15 minutes at best, and any replay of an already-used refresh token would immediately alert us to the compromise.

the lesson is that security isn't one big thing — it's a bunch of small things that compound. any one of these protections can be bypassed in isolation. but stacking them together creates enough friction that attackers move on to easier targets. you don't need to be unbreakable, you just need to be more annoying than the next guy.
