1️⃣ Requirements

Functional Requirements

Core features:

👉 “Users should be able to…”

  • Search for events
  • View event details
  • Book tickets

Non-Functional Requirements (WHERE MOST PEOPLE FAIL ⚠️)

❌ Wrong: generic buzzwords (scalability, availability, reliability)

✅ Correct (context-specific):

1. Consistency vs Availability

  • Booking → Strong Consistency
    • No double booking ❗
  • Search/View → High Availability
    • Slight delay acceptable

2. Read vs Write Ratio

  • Reads >> Writes (≈ 100:1)
    • Many users browse
    • Few actually book

3. Traffic Pattern

  • Normal traffic → low
  • Event drops (Taylor Swift, World Cup) → HUGE spike

👉 System must handle bursty traffic


2️⃣ Core Entities

Keep it simple initially:

  • Event
  • Venue
  • Performer
  • Ticket

3️⃣ API Design

Map APIs → Functional Requirements

1. View Event

GET /events/{eventId} → Event & Venue & Performer & Ticket[]

2. Search Events

GET /events/search?term=&location=&date=&type= → Partial<Event>[]

3. Booking (2-Phase Process 🔥)

POST /booking/reserve
Body: { ticketId }

POST /booking/confirm
Body: { ticketId, paymentDetails }


4️⃣ HLD

ticket-master.excalidraw

Seat blocking

Issue:

  • User reserves → leaves → seat blocked forever ❌

Solution 1: Timestamp + Query

  • Add a reserved_at column and treat a reservation as expired once reserved_at is more than 10 min old ❌ Messy logic (every read has to apply the expiry check)

Solution 2: Cron Job

  • Runs every 10 mins → releases tickets from reserved back to available

❌ A seat can stay blocked for up to 10 extra minutes, until the next run

✅ Best Solution: Distributed Lock (Redis 🔥)

👉 Goal: Prevent two users from reserving the same seat at the same time.

ticketId → locked (TTL = 10 min)
✔ Auto-expiry
✔ Clean design
✔ Scalable

SET ticket:101 userA NX EX 600
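
To make the NX + EX semantics concrete, here is a minimal sketch that imitates them in memory. It is not a Redis client: TtlLockStore, set_nx_ex and owner_of are hypothetical names made up for this illustration, and the injectable clock exists only so expiry can be shown without waiting 10 minutes.

```python
import time


class TtlLockStore:
    """In-memory stand-in for Redis `SET key value NX EX seconds` (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (owner, expires_at)

    def set_nx_ex(self, key, owner, ttl_seconds):
        """Take the lock only if it is absent or expired; return True on success."""
        now = self._clock()
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # NX: someone else still holds a live lock
        self._locks[key] = (owner, now + ttl_seconds)  # EX: remember the expiry
        return True

    def owner_of(self, key):
        """Return the current lock owner, or None if there is no live lock."""
        current = self._locks.get(key)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]


# Fake clock so the 10-minute TTL can be fast-forwarded.
fake_now = [0.0]
store = TtlLockStore(clock=lambda: fake_now[0])

first = store.set_nx_ex("ticket:101", "userA", 600)   # userA reserves the seat
second = store.set_nx_ex("ticket:101", "userB", 600)  # rejected, lock is live
fake_now[0] = 601.0                                   # userA walked away
third = store.set_nx_ex("ticket:101", "userB", 600)   # lock expired, userB wins
```

The point of the sketch is that no cleanup job is needed: an abandoned reservation simply stops counting once its TTL has passed.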

  1. User opens event page: GET /events/101/tickets
  2. Backend checks DB (SQL query):

Ticket   Status
1        available
2        booked
3        available

  3. Backend checks Redis locks: ticket:3 = userA (TTL left 7 min)
  4. Final response to user:

Ticket   Final UI State
1        available ✅
2        booked ❌
3        reserved ❌
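
The merge of DB status and Redis locks can be sketched as a small function. This assumes the DB rows and the live locks have already been fetched into plain dictionaries; seat_map and both variable names are hypothetical.

```python
def seat_map(db_status, active_locks):
    """Overlay live locks on DB rows: an 'available' ticket with a lock shows as 'reserved'."""
    view = {}
    for ticket_id, status in db_status.items():
        if status == "available" and ticket_id in active_locks:
            view[ticket_id] = "reserved"
        else:
            view[ticket_id] = status
    return view


# Same data as the walkthrough: ticket 3 is free in the DB but locked by userA.
db_status = {1: "available", 2: "booked", 3: "available"}
active_locks = {3: "userA"}
ui_state = seat_map(db_status, active_locks)
```

The DB never stores the "reserved" state in this design; it exists only as long as the lock does.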

What Actually Happens If Redis Fails?

1. ❌ Immediate impact:
  • All active locks are lost
  • System forgets reservations

👉 Result:

  • Multiple users may try to book the same ticket
  • Temporary overbooking attempts

2. Why the System Still Doesn’t Break Completely

Because: 👉 Final consistency is enforced by the database (PostgreSQL)

DB Protection:
  • ACID transactions
  • Unique constraint on ticket

Example flow:
  1. User A → payment success → tries to book
  2. User B → also tries

👉 DB ensures: only one transaction succeeds
✔ One wins
❌ Others fail
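
A minimal sketch of that last line of defence, using Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server. The bookings table, its columns and try_book are hypothetical names; the only idea taken from the notes is the unique constraint on the ticket.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " ticket_id INTEGER NOT NULL UNIQUE,"  # at most one booking row per ticket
    " user_id TEXT NOT NULL)"
)


def try_book(conn, ticket_id, user_id):
    """Return True if this user got the ticket, False if it was already booked."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO bookings (ticket_id, user_id) VALUES (?, ?)",
                (ticket_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # unique constraint rejected the second booking


result_a = try_book(conn, 101, "userA")  # first writer wins
result_b = try_book(conn, 101, "userB")  # loses even though no lock existed
winner = conn.execute(
    "SELECT user_id FROM bookings WHERE ticket_id = ?", (101,)
).fetchone()[0]
```

Even with every Redis lock gone, the second insert fails, which is why a Redis outage degrades UX rather than correctness.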

3. User Impact (IMPORTANT ⚠️)
  • Users who thought they had reserved a seat → will lose it
  • Some users:
    • Complete payment → get error ❌
    • Bad experience 😬

👉 This is called:

Consistency preserved, UX degraded

4. How to Handle This (Real Systems)

✅ Option 1: Redis High Availability (Basic Fix)

  • Use Redis Cluster / Sentinel
  • Replication + failover

✔ Reduces failure chance
❌ Edge cases remain (replication is asynchronous, so a failover can drop recent locks)

✅ Option 2: Graceful Degradation

When Redis is down:

  • Skip reservation step
  • Allow direct booking

👉 DB becomes single source of truth

✔ System still works
❌ Higher contention

✅ Option 3: Retry + Compensation

If payment succeeds but booking fails:

  • Refund automatically
  • Show message:

“Seat already taken, amount will be refunded”

✔ It’s common in real systems

5️⃣ Deep Dive 1: Search

Problem:

SQL LIKE '%term%' → full table scan ❌ (a leading wildcard cannot use a B-tree index)

Solution:

  • Move search to a search-optimized database → Elasticsearch / AWS OpenSearch
  • It builds an inverted index, so documents can be looked up by term very quickly
  • Fast text search
  • Supports:
    • text
    • location
    • filters
Example: Event text:

Taylor Swift Live Concert Mumbai

Elasticsearch tokenizes into:

taylor
swift
live
concert
mumbai

Then maps:

swift -> [event1, event5]
concert -> [event1, event8]
...

So lookup becomes extremely fast.
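
The idea can be sketched in a few lines. This is a toy inverted index, not how Elasticsearch is implemented (real analyzers also handle stemming, scoring and much more); the event5 and event8 texts are invented so that the mapping matches the example above.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


docs = {
    "event1": "Taylor Swift Live Concert Mumbai",
    "event5": "Swift Developers Meetup",  # hypothetical document
    "event8": "Rock Concert Delhi",       # hypothetical document
}
index = build_inverted_index(docs)
hits = search(index, "swift concert")
```

A query only touches the posting lists of its own terms, so the cost depends on how many documents match, not on how many exist.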

Data sync with the primary data store:

  1. Dual write (DB + ES) ❌ (the two writes are not atomic, so the index can drift from the DB)
  2. CDC (Change Data Capture) ✅ better

CDC itself only captures database changes. We still need a consumer component, either a CDC connector or an indexing service, to transform the change event and update Elasticsearch through its indexing APIs.
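
A rough sketch of what that consumer does, under loud assumptions: the change-event shape ("op", "before", "after") is a simplified, hypothetical format rather than the exact output of any specific CDC tool, the row fields are invented, and a plain dictionary stands in for the search index instead of real Elasticsearch indexing calls.

```python
def to_index_action(change):
    """Translate one change event into an action for the search index."""
    op = change["op"]
    if op in ("create", "update"):
        row = change["after"]
        # Keep only the fields that search needs (hypothetical column names).
        doc = {"name": row["name"], "city": row["city"], "date": row["date"]}
        return ("index", row["id"], doc)
    if op == "delete":
        return ("delete", change["before"]["id"], None)
    raise ValueError(f"unknown operation: {op}")


def apply_action(search_index, action):
    """Apply an action to the stand-in index (a dict keyed by document id)."""
    kind, doc_id, doc = action
    if kind == "index":
        search_index[doc_id] = doc
    else:
        search_index.pop(doc_id, None)


search_index = {}
created = {
    "op": "create",
    "before": None,
    "after": {"id": 101, "name": "Taylor Swift Live", "city": "Mumbai",
              "date": "2025-01-01", "internal_notes": "not indexed"},
}
deleted = {"op": "delete", "before": {"id": 101}, "after": None}

apply_action(search_index, to_index_action(created))
after_create = dict(search_index)
apply_action(search_index, to_index_action(deleted))
```

Keeping the transform separate from the write makes it easy to put a buffer between them later if indexing traffic grows.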

Are writes frequent?

If:

100M searches/day
100 event updates/day

Then: Kafka = probably unnecessary

If:

Millions of Ticketmaster event updates/day

Then: Kafka = good idea

“If indexing traffic becomes high, I can introduce Kafka as a buffer (temporary holding area between two systems).”

Caching:

  • CDN (fastest for common API calls), e.g. GET /events/search?term=taylor+swift
  • Elasticsearch / AWS OpenSearch has a built-in node query cache
  • Redis cache between the search service and Elasticsearch

Strong Interview Answer

Initially I’d use SQL search for MVP, but for highly popular searches like Taylor Swift, full-table scans won’t scale. I’d move search to Elasticsearch/OpenSearch for inverted-index-based retrieval. Since search traffic is highly read-heavy and repetitive, I’d add CDN and Redis caching to absorb spikes. Search services would scale horizontally behind a load balancer, and API Gateway would enforce rate limiting to protect the system.


Deep Dive 2: Stale Seat Map Problem

Problem

User opens event page:

10:00:00 AM
GET /events/101/tickets

Response:

Seat A1 → Available
Seat A2 → Available
Seat A3 → Available

⚠️ What Goes Wrong?

After 2 seconds:

User B books Seat A1

Database now:

Seat A1 → Booked

But User A’s screen still shows:

Seat A1 → Available

because the page was loaded earlier. 👉 Client data becomes stale.

Strong Interview Answer

The seat map can become stale because ticket availability changes frequently. To keep clients synchronized, I would establish a real-time channel using Server-Sent Events (or WebSockets). Whenever a ticket is booked or reserved, the server pushes updates to connected clients so unavailable seats are immediately disabled in the UI.

Solution 1: Polling

Every few seconds, the client calls GET /events/101/tickets to refresh seat availability.

Problem

  • Too many requests
  • Expensive for popular events

Solution 2: Long Polling

Client sends request:

GET /events/101/updates

Server keeps connection open.

When seat changes:

Seat A1 booked

Server responds immediately.

Solution 3: Server-Sent Events (SSE)

Client ←──────── Server

Persistent connection. Whenever ticket state changes:

Seat A1 → booked
Seat A2 → reserved

Server pushes update instantly. No need for repeated requests.

Flow:

User books seat
       ↓
Booking Service
       ↓
DB updated
       ↓
Event Service
       ↓
SSE push
       ↓
All connected clients

UI updates in real-time.
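
The last two steps of the flow can be sketched as follows. The SSE wire format (an "event:" line, a "data:" line, then a blank line) is standard; everything else is an assumption for illustration: SeatUpdateBroadcaster, the "seat-update" event name and the per-client queues stand in for real open HTTP connections.

```python
import json
import queue


def format_sse(event_name, payload):
    """Encode one Server-Sent Events frame, terminated by a blank line."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


class SeatUpdateBroadcaster:
    """Fans one seat change out to every connected client (one queue per client)."""

    def __init__(self):
        self._clients = []

    def subscribe(self):
        """Register a client and return the queue its frames will arrive on."""
        client_queue = queue.Queue()
        self._clients.append(client_queue)
        return client_queue

    def publish(self, seat, status):
        """Push the same frame to all subscribed clients."""
        frame = format_sse("seat-update", {"seat": seat, "status": status})
        for client_queue in self._clients:
            client_queue.put(frame)


broadcaster = SeatUpdateBroadcaster()
client_a = broadcaster.subscribe()
client_b = broadcaster.subscribe()

broadcaster.publish("A1", "booked")  # e.g. triggered after the DB update
frame_a = client_a.get_nowait()
frame_b = client_b.get_nowait()
```

In a real service each queue would be drained by the handler that holds the client's open response stream.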


🎯 Deep Dive 3: Handling Taylor Swift / World Cup Ticket Rush

Problem

Normally, users open the event page and see available seats.

Seat A1 β†’ Available
Seat A2 β†’ Available
Seat A3 β†’ Available

But for a huge event like:

  • Taylor Swift concert
  • World Cup Final
  • Super Bowl

Millions of users arrive at the same time.

⚠️ What Happens?

Suppose:

100,000 seats
10,000,000 users

All users load the seat map simultaneously. Initially everyone sees:

Seat A1 β†’ Available
Seat A2 β†’ Available
...

Then within seconds:

User1 books A1
User2 books A2
User3 books A3
...

Real-time updates start arriving. The seat map rapidly turns into:

A1 ❌
A2 ❌
A3 ❌
A4 ❌
A5 ❌
...

To many users it feels like:

“I entered the page and everything instantly became unavailable.”

This creates a terrible user experience.

Interview Answer

For highly popular events such as Taylor Swift concerts or World Cup finals, allowing everyone to enter the seat-selection page creates a poor experience because seats disappear instantly. Instead, I would introduce a virtual waiting queue. Users enter the queue first, and only a controlled number of users are allowed into the booking flow at a time. This protects the backend and provides a fairer and more predictable user experience.

🚨 Why Scaling More Servers Doesn’t Solve It

Many candidates say:

Let's add more servers.

But the issue isn’t server capacity. The issue is:

Too many users competing for too few seats.

Even with infinite servers:

100,000 seats
10,000,000 users

Most users will still lose.

🚀 Solution: Virtual Waiting Queue

Note: this is enabled only for high-traffic events (e.g. Taylor Swift), not for every event.

Instead of letting everyone enter immediately:

Users
   ↓
Virtual Queue
   ↓
Ticket Page

Flow

Step 1

Users arrive:

10M users

Step 2

Put them into queue.

Position #1
Position #2
Position #3
...

Store in Redis Sorted Set.

Step 3

Only allow a small batch in. Example:

Allow first 1000 users

Step 4

When seats are booked:

1000 users leave

Next batch enters:

Next 1000 users

Benefits

Protect Backend

Instead of:

10M users
     ↓
Booking Service

we get:

1000 users
     ↓
Booking Service

Better User Experience

Instead of:

Everything became unavailable instantly

User sees:

You are #12,541 in queue.
Estimated wait: 8 minutes.

Much more predictable.

How Queue Is Implemented

Simple answer:

Redis Sorted Set

Store:

userId -> timestamp

or

userId -> random priority
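
A minimal in-memory sketch of the queue, using arrival order as the score. WaitingQueue and its methods are hypothetical names; in Redis the rough counterparts would be ZADD with NX for join, ZRANK for position and ZPOPMIN for admit.

```python
import bisect


class WaitingQueue:
    """In-memory stand-in for a Redis Sorted Set: members ordered by score."""

    def __init__(self):
        self._entries = []  # sorted list of (score, user_id)
        self._scores = {}   # user_id -> score

    def join(self, user_id, score):
        """Add a user once; joining again keeps the original position."""
        if user_id in self._scores:
            return
        self._scores[user_id] = score
        bisect.insort(self._entries, (score, user_id))

    def position(self, user_id):
        """1-based position in the queue, or None if the user is not queued."""
        score = self._scores.get(user_id)
        if score is None:
            return None
        return bisect.bisect_left(self._entries, (score, user_id)) + 1

    def admit(self, batch_size):
        """Remove and return the next batch of users allowed into the booking flow."""
        batch = self._entries[:batch_size]
        del self._entries[:batch_size]
        for _, user_id in batch:
            del self._scores[user_id]
        return [user_id for _, user_id in batch]


waiting = WaitingQueue()
for arrival_time, user in enumerate(["u1", "u2", "u3", "u4", "u5"]):
    waiting.join(user, arrival_time)  # score = arrival order

position_of_u4 = waiting.position("u4")  # shown to the user as "You are #4"
first_batch = waiting.admit(2)           # small batch enters the ticket page
position_after = waiting.position("u4")  # u4 moved up by two places
```

Swapping the arrival time for a random score gives the lottery-style variant mentioned above without changing the rest of the logic.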

❌ Bad Math vs ✅ Good Math

Most candidates do:

DAU = 100M
QPS = 10K
Storage = 5TB

Then say:

“Okay, it’s a large-scale system.”

And move on. The interviewer learns nothing.

❌ Bad Math

Doing calculations just because system design books told you to. Example:

100M users
1KB per event
10TB storage

Then…

...

No design decision changed. Waste of time.


✅ Good Math

Do math only when it affects your design.

Example:

Should I shard PostgreSQL?

Now math matters.

10M events
100K tickets/event

Calculate:

Total storage
Total QPS

Then conclude:

Single DB won't work
Need sharding

Now math influenced architecture.


🧠 For Ticketmaster

Good places to do math:

1. Search Traffic

10M users
1 search/sec

Can Elasticsearch handle it? Need cache? Need CDN?
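
A quick way to answer those questions is to see how much traffic would still reach the search cluster after caching. The 99% cache hit rate below is an assumption picked purely for illustration; only the user count and search rate come from the notes.

```python
users = 10_000_000
searches_per_user_per_sec = 1
assumed_cache_hit_percent = 99  # assumption for illustration, not a measured value

peak_search_qps = users * searches_per_user_per_sec
# Integer math keeps the result exact.
qps_reaching_search_cluster = peak_search_qps * (100 - assumed_cache_hit_percent) // 100
```

Under that assumption, 10M searches/sec at the edge shrink to 100K searches/sec at the cluster, which is the number to size the cluster against and the reason the CDN and Redis layers matter.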

2. Booking Traffic

100K seats
10M users

Need waiting queue? Answer = yes.

3. Database Size

1M events
50K tickets/event

How many ticket rows? Can one PostgreSQL instance handle it? Need partitioning/sharding?
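
Worked through as a sketch: the row count follows directly from the two numbers above, while the 100 bytes per row is an assumed figure for illustration only.

```python
events = 1_000_000
tickets_per_event = 50_000
assumed_bytes_per_row = 100  # assumption for illustration, not from the notes

ticket_rows = events * tickets_per_event
raw_bytes = ticket_rows * assumed_bytes_per_row
raw_terabytes = raw_bytes / 10**12

print(f"{ticket_rows:,} rows, about {raw_terabytes:.0f} TB before indexes")
```

50 billion rows and several terabytes before indexes is the kind of result that justifies discussing partitioning or sharding (for example by event) instead of assuming a single instance.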

Paraphrased:

Don’t do back-of-the-envelope calculations at the beginning just to check a box. Do calculations when they help you make a design decision.

🔥 Interview Trick

If the interviewer asks:

“Any estimations?”

Do:

Let me estimate whether a single database can handle ticket storage before deciding if I need sharding.

That’s much stronger than:

DAU = 100M
Storage = 10TB
Moving on...