0. Ticketmaster

1️⃣ Requirements

Functional Requirements

Core features:

👉 “Users should be able to…”
- Search for events
- View event details
- Book tickets

Non-Functional Requirements (WHERE MOST PEOPLE FAIL ⚠️)

❌ Wrong: Scalability, availability, reliability
✅ Correct (context-specific):

1. Consistency vs Availability

Booking → Strong Consistency
- No double booking ❗
Search/View → High Availability
- Slight delay acceptable

2. Read vs Write Ratio

Reads >> Writes (≈ 100:1)
- Many users browse
- Few actually book

3. Traffic Pattern

Normal traffic → low
Event drops (Taylor Swift, World Cup) → HUGE spike

👉 System must handle bursty traffic

4. Low Latency search

2️⃣ Core Entities

Keep it simple initially:

Event
Venue
Performer
Ticket

3️⃣ APIs Design

Map APIs → Functional Requirements

1. View Event

GET /events/{eventId} → Event & Venue & Performer & Ticket[]

2. Search Events

GET /events/search?term=&location=&date=&type= → Partial<Event> [ ]

3. Booking (2-Phase Process 🔥)

POST /booking/reserve
Body: { ticketId }

POST /booking/confirm
Body: { ticketId, paymentDetails }

4️⃣ HLD

ticket-master.excalidraw

Seat blocking

Issue:

User reserves → leaves → seat blocked forever ❌

Solution 1: Timestamp + Query

Add column reserved_at + check > 10 mins ❌ Messy logic

Solution 2: Cron Job

Runs every 10 mins → releases tickets from reserved back to available

❌ Delay issue: an expired reservation can stay blocked for up to 10 more minutes until the next run

✅ Best Solution: Distributed Lock (Redis 🔥)

👉 Goal: Prevent 2 users from reserving same seat at same time.

ticketId → locked (TTL = 10 min)
✔ Auto-expiry
✔ Clean design
✔ Scalable

SET ticket:101 userA NX EX 600
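
A minimal sketch of what SET ... NX EX does, assuming an in-memory stand-in instead of a real Redis server. The names InMemoryLockStore and reserve_ticket are hypothetical and exist only for illustration; the injectable clock is there so expiry can be shown without waiting 10 minutes.

```python
import time


class InMemoryLockStore:
    """Stand-in for Redis that mimics SET key value NX EX ttl."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (owner, expires_at)

    def set_nx_ex(self, key, owner, ttl_seconds):
        """Set the key only if it is absent or expired; return True on success."""
        now = self._clock()
        current = self._locks.get(key)
        if current is not None and current[1] > now:
            return False  # another user still holds an unexpired lock
        self._locks[key] = (owner, now + ttl_seconds)
        return True

    def owner_of(self, key):
        """Return the current lock owner, or None if unlocked or expired."""
        current = self._locks.get(key)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]


def reserve_ticket(store, ticket_id, user_id, ttl_seconds=600):
    """Try to reserve a ticket; the default TTL is 10 minutes (600 s)."""
    return store.set_nx_ex(f"ticket:{ticket_id}", user_id, ttl_seconds)
```

Against a real Redis, the same step would likely be a single call such as redis-py's `r.set("ticket:101", "userA", nx=True, ex=600)`, which returns a falsy value when the key already exists.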

User opens event page → GET /events/101/tickets
Backend checks DB (SQL query):

Ticket	Status
1	available
2	booked
3	available

Backend checks Redis locks → ticket:3 = userA (TTL left 7 min)

Final response to user:

Ticket	Final UI State
1	available ✅
2	booked ❌
3	reserved ❌
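
The merge of DB status and Redis locks shown in the two tables can be sketched as below. The function name seat_states and the dict shapes are assumptions for illustration, not part of any real API.

```python
def seat_states(db_status, redis_locks):
    """Combine durable DB status with short-lived Redis locks.

    db_status:   {ticket_id: "available" | "booked"}
    redis_locks: {ticket_id: user_id} for locks that have not expired
    """
    states = {}
    for ticket_id, status in db_status.items():
        if status == "available" and ticket_id in redis_locks:
            states[ticket_id] = "reserved"  # held by someone's active lock
        else:
            states[ticket_id] = status
    return states
```

With the example data above, `seat_states({1: "available", 2: "booked", 3: "available"}, {3: "userA"})` marks ticket 3 as reserved and leaves the other two unchanged.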

What Actually Happens If Redis Fails?

1. ❌ Immediate impact:

All active locks are lost
System forgets reservations

👉 Result:

Multiple users may try to book same ticket
Temporary overbooking attempts

2. Why the System Still Doesn’t Break Completely

Because: 👉 Final consistency is enforced by database (PostgreSQL)

DB Protection:

ACID transactions
Unique constraint on ticket

Example flow:

User A → payment success → tries to book
User B → also tries

👉 DB ensures: only one transaction succeeds
✔ One wins
❌ Others fail
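
One way to get the "one wins, others fail" behaviour is a conditional UPDATE that only matches a ticket that is still available. The sketch below uses Python's built-in sqlite3 purely as a stand-in for PostgreSQL so it runs anywhere; the table layout and the names make_demo_db and book_ticket are assumptions.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with one available ticket (id = 3)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, user_id TEXT)"
    )
    conn.execute("INSERT INTO tickets VALUES (3, 'available', NULL)")
    conn.commit()
    return conn


def book_ticket(conn, ticket_id, user_id):
    """Book the ticket only if it is still available; True means we won."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE tickets SET status = 'booked', user_id = ? "
            "WHERE id = ? AND status = 'available'",
            (user_id, ticket_id),
        )
    # Zero affected rows means another user booked the ticket first.
    return cur.rowcount == 1
```

The second caller sees zero affected rows and can be told the seat is gone, which is the signal the refund path relies on.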

3. User Impact (IMPORTANT ⚠️)

Users who thought they reserved → will lose it
Some users:
- Complete payment → get error ❌
- Bad experience 😬

Consistency preserved, UX degraded

4. How to Handle This (Real Systems)

✅ Option 1: Redis High Availability (Basic Fix)
- Use Redis Cluster / Sentinel
- Replication + failover
- ✔ Reduces failure chance
- ❌ Still possible edge cases
✅ Option 2: Graceful Degradation
- When Redis is down:
  - Skip reservation step
  - Allow direct booking
- 👉 DB becomes single source of truth
- ✔ System still works
- ❌ Higher contention
✅ Option 3: Retry + Compensation
- If payment succeeds but booking fails:
  - Refund automatically
  - Show message:
    
    “Seat already taken, amount will be refunded”
- ✔ It's common in real systems
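
Option 3 can be sketched as a small function that refunds when the booking step fails after a successful charge. The three callables (charge, book, refund) are hypothetical hooks supplied by the caller, not a real payment API.

```python
def confirm_booking(charge, book, refund):
    """Charge first, then book; refund if the seat was lost in between.

    charge() -> payment_id, book() -> bool, refund(payment_id) -> None.
    All three are hypothetical callables supplied by the caller.
    """
    payment_id = charge()
    if book():
        return "Booking confirmed"
    refund(payment_id)  # compensating action for the failed booking
    return "Seat already taken, amount will be refunded"
```

In a real system the refund would usually be retried until it succeeds, since a lost refund is worse than a delayed one.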

5️⃣ Deep Dive

🔥 Deep Dive 1: Search Optimization - low-latency search

Problem:

SQL LIKE → full DB scan ❌

Solution:

Move search to a search-optimized database → Elasticsearch / AWS OpenSearch
It builds an inverted index so documents can be looked up by term very quickly
Fast text search
Supports:
- text
- location
- filters

Example: Event text:

Taylor Swift Live Concert Mumbai

Elasticsearch tokenizes into:

taylor
swift
live
concert
mumbai

Then maps:

swift -> [event1, event5]
concert -> [event1, event8]
...

So lookup becomes extremely fast.
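
A toy version of the inverted index, to make the idea concrete. Real Elasticsearch analyzers do much more (stemming, stop words, scoring); this sketch only lowercases and splits on whitespace, and the sample documents other than event1 are made up.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    "event1": "Taylor Swift Live Concert Mumbai",
    "event5": "Swift Programming Meetup",
    "event8": "Rock Concert Delhi",
}
index = build_inverted_index(documents)
```

Searching for "swift concert" intersects the two posting sets and never touches documents that contain neither term, which is why it avoids the full scan.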

Data Sync with primary data store:

Dual write (DB + ES) ❌
CDC (Change Data Capture) ✅ better

CDC itself only captures database changes. We still need a consumer component, either a CDC connector or an indexing service, to transform the change event and update Elasticsearch through its indexing APIs.
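
A rough sketch of that consumer step, assuming a made-up change-event shape of {"op", "id", "row"} and a plain dict standing in for the Elasticsearch index. Real CDC tools emit their own event formats, so treat the field names as placeholders.

```python
def apply_change(search_index, change):
    """Apply one change event to a dict standing in for the search index.

    The event shape {"op", "id", "row"} is hypothetical.
    """
    op = change["op"]
    if op in ("insert", "update"):
        row = change["row"]
        # Keep only the fields that search needs.
        search_index[change["id"]] = {
            "name": row["name"],
            "location": row["location"],
            "date": row["date"],
        }
    elif op == "delete":
        search_index.pop(change["id"], None)
    else:
        raise ValueError(f"unknown op: {op}")
```

Because each event overwrites the whole document for its id, replaying the same event twice leaves the index in the same state.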

Are writes frequent?

If: 100M searches/day & 100 event updates/day
Then: Kafka = probably unnecessary

If: millions of event updates/day
Then: Kafka = good idea

“If indexing traffic becomes high, I can introduce Kafka as a buffer (a temporary holding area between two systems).”

Popular Query / Hot Query search:

Caching - CDN (fastest for common API calls), e.g. GET /events/search?term=taylor+swift
Elasticsearch / AWS OpenSearch has a built-in option - node query cache
Redis cache - between the search service and Elasticsearch

Strong Interview Answer

Initially I’d use SQL search for MVP, but for highly popular searches like Taylor Swift, full-table scans won’t scale. I’d move search to Elasticsearch/OpenSearch for inverted-index-based retrieval. Since search traffic is highly read-heavy and repetitive, I’d add CDN and Redis caching to absorb spikes. Search services would scale horizontally behind a load balancer, and API Gateway would enforce rate limiting to protect the system.

🔥 Deep Dive 2: Stale Seat Map Problem

Problem

User opens event page:

10:00:00 AM
GET /events/101/tickets

Response:

Seat A1 → Available
Seat A2 → Available
Seat A3 → Available

⚠️ What Goes Wrong?

After 2 seconds:

User B books Seat A1

Database now:

Seat A1 → Booked

But User A’s screen still shows:

Seat A1 → Available

because the page was loaded earlier. 👉 Client data becomes stale.

Strong Interview Answer

The seat map can become stale because ticket availability changes frequently. To keep clients synchronized, I would establish a real-time channel using Server-Sent Events (or WebSockets). Whenever a ticket is booked or reserved, the server pushes updates to connected clients so unavailable seats are immediately disabled in the UI.

Solution 1: Polling

Every few seconds: GET /events/101/tickets Refresh seat availability.

Problem

Too many requests
Expensive for popular events

Solution 2: Long Polling

Client sends request:

GET /events/101/updates

Server keeps connection open.

When seat changes:

Seat A1 booked

Server responds immediately.

Solution 3: SSE (Recommended 🔥)

Client ←──────── Server

Persistent connection. Whenever ticket state changes:

Seat A1 → booked
Seat A2 → reserved

Server pushes update instantly. No need for repeated requests.

Flow:

User books seat
       ↓
Booking Service
       ↓
DB updated
       ↓
Event Service
       ↓
SSE push
       ↓
All connected clients

UI updates in real-time.
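
A small sketch of the "SSE push" step: formatting one seat update in the Server-Sent Events wire format and fanning it out. The event name seat_update, the payload fields, and the list-based subscriber outboxes are assumptions; a real server would write to open HTTP responses instead.

```python
import json


def format_sse(event_name, payload):
    """Serialize one update as a Server-Sent Events message."""
    data = json.dumps(payload, separators=(",", ":"))
    # Each field is "name: value"; a blank line terminates the message.
    return f"event: {event_name}\ndata: {data}\n\n"


def broadcast(subscribers, event_name, payload):
    """Append the message to every subscriber's outgoing buffer."""
    message = format_sse(event_name, payload)
    for outbox in subscribers:
        outbox.append(message)
```

On the browser side, an EventSource listening for the same event name would receive the JSON payload and disable the seat in the UI.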

🔥 Deep Dive 3: Handling Taylor Swift / World Cup Ticket Rush

Problem

Normally, users open the event page and see available seats.

Seat A1 → Available
Seat A2 → Available
Seat A3 → Available

But for a huge event like:

Taylor Swift concert
World Cup Final
Super Bowl

Millions of users arrive at the same time.

⚠️ What Happens?

Suppose:

100,000 seats
10,000,000 users

All users load the seat map simultaneously. Initially everyone sees:

Seat A1 → Available
Seat A2 → Available
...

Then within seconds:

User1 books A1
User2 books A2
User3 books A3
...

Real-time updates start arriving. The seat map rapidly turns into:

A1 ❌
A2 ❌
A3 ❌
A4 ❌
A5 ❌
...

To many users it feels like:

“I entered the page and everything instantly became unavailable.”

This creates a terrible user experience.

Strong Interview Answer

For highly popular events such as Taylor Swift concerts or World Cup finals, allowing everyone to enter the seat-selection page creates a poor experience because seats disappear instantly. Instead, I would introduce a virtual waiting queue. Users enter the queue first, and only a controlled number of users are allowed into the booking flow at a time. This protects the backend and provides a fairer and more predictable user experience.

🚨 Why Scaling More Servers Doesn’t Solve It

Many candidates say:

Let's add more servers.

But the issue isn’t server capacity. The issue is:

Too many users competing for too few seats.

Even with infinite servers:

100,000 seats
10,000,000 users

Most users will still lose.

🚀 Solution: Virtual Waiting Queue

Note: this is enabled only for Taylor Swift-scale / high-traffic events, not for all events.

Instead of letting everyone enter immediately:

Users
   ↓
Virtual Queue
   ↓
Ticket Page

Flow

Step 1

Users arrive:

10M users

Step 2

Put them into queue.

Position #1
Position #2
Position #3
...

Store in Redis Sorted Set.

Step 3

Only allow a small batch in. Example:

Allow first 1000 users

Step 4

When seats are booked:

1000 users leave

Next batch enters:

Next 1000 users

Benefits

Protect Backend

Instead of:

10M users
     ↓
Booking Service

we get:

1000 users
     ↓
Booking Service

Better User Experience

Instead of:

Everything became unavailable instantly

User sees:

You are #12,541 in queue.
Estimated wait: 8 minutes.

Much more predictable.

How Queue Is Implemented

Simple answer:

Redis Sorted Set

Store:

userId -> arrival timestamp (first come, first served)

or

userId -> random priority (lottery-style ordering)
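
The queue steps can be sketched with an in-memory stand-in for the Redis Sorted Set. The class name WaitingQueue and its methods are hypothetical; the comments note the Redis commands (ZADD NX, ZRANK, ZPOPMIN) each method is meant to mirror.

```python
import bisect


class WaitingQueue:
    """In-memory stand-in for a Redis Sorted Set keyed by userId."""

    def __init__(self):
        self._scores = {}  # user_id -> score
        self._order = []   # sorted list of (score, user_id)

    def join(self, user_id, score):
        """Add the user once (like ZADD NX); re-joining keeps the old place."""
        if user_id in self._scores:
            return False
        self._scores[user_id] = score
        bisect.insort(self._order, (score, user_id))
        return True

    def position(self, user_id):
        """1-based place in the queue (like ZRANK + 1), or None if absent."""
        score = self._scores.get(user_id)
        if score is None:
            return None
        return bisect.bisect_left(self._order, (score, user_id)) + 1

    def admit(self, batch_size):
        """Remove and return the next batch of users (like ZPOPMIN)."""
        batch = self._order[:batch_size]
        del self._order[:batch_size]
        for _, user_id in batch:
            del self._scores[user_id]
        return [user_id for _, user_id in batch]
```

Passing the arrival timestamp as the score gives first come, first served; passing a random number instead gives the lottery-style variant.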

6️⃣ ❌ Bad Math vs ✅ Good Math

Most candidates do:

DAU = 100M
QPS = 10K
Storage = 5TB

Then say:

“Okay, it’s a large-scale system.”

And move on. The interviewer learns nothing.

❌ Bad Math

Doing calculations just because system design books told you to. Example:

100M users
1KB per event
10TB storage

Then…

...

No design decision changed. Waste of time.

✅ Good Math

Do math only when it affects your design. Example:

Should I shard PostgreSQL?

Now math matters.

10M events
100K tickets/event

Calculate:

Total storage
Total QPS

Then conclude:

Single DB won't work
Need sharding

Now math influenced architecture.

🧠 For Ticketmaster

Good places to do math:

1. Search Traffic

10M users
1 search/sec

Can Elasticsearch handle it? Need cache? Need CDN?

2. Booking Traffic

100K seats
10M users

Need waiting queue? Answer = yes.

3. Database Size

1M events
50K tickets/event

How many ticket rows? Can one PostgreSQL instance handle it? Need partitioning/sharding?
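
The booking and database-size questions above come down to quick arithmetic. The row size of 100 bytes is an assumption added here for illustration; the other numbers are the ones from these notes.

```python
# Database size: how many ticket rows?
events = 1_000_000
tickets_per_event = 50_000
ticket_rows = events * tickets_per_event  # 50 billion rows

ASSUMED_ROW_BYTES = 100  # assumption: not stated in these notes
ticket_storage_tb = ticket_rows * ASSUMED_ROW_BYTES / 1e12  # 5.0 TB before indexes

# Booking traffic: how many users compete for each seat?
seats = 100_000
users = 10_000_000
users_per_seat = users // seats  # 100 users per seat
```

50 billion rows in one table is the kind of number that makes partitioning or sharding worth discussing, and 100 users per seat is why the waiting queue answer is "yes".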

Paraphrased:

Don’t do back-of-the-envelope calculations at the beginning just to check a box. Do calculations when they help you make a design decision.

🔥 Interview Trick
If interviewer asks:

“Any estimations?”

Do:
Let me estimate whether a single database can handle ticket storage before deciding if I need sharding.
That’s much stronger than:
DAU = 100M
Storage = 10TB
Moving on...

🔥 Deep Dive 4: Reduce PostgreSQL Read Load

Observation

Reads >> Writes (100:1 or more)
Event, Venue, Performer data changes rarely

Strong Interview Answer

Since event metadata changes infrequently, I would cache Event, Venue, and Performer data in Redis and only hit PostgreSQL on cache misses. Ticket availability remains in the database because it changes frequently.

What to Cache and Under Which Key?

✅ Event
✅ Venue
✅ Performer
❌ Ticket Availability (changes frequently)

event:{eventId}
    ↓
Event + Venue + Performer

Cache Invalidation

DB Update
    ↓
Update/Invalidate Redis
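
A minimal cache-aside sketch for the event:{eventId} key, with a dict standing in for Redis. The class name EventCache and the load_from_db callable are hypothetical.

```python
class EventCache:
    """Cache-aside over a dict that stands in for Redis."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical DB loader
        self._cache = {}

    def get_event(self, event_id):
        key = f"event:{event_id}"
        if key not in self._cache:
            # Cache miss: read from the database and populate the cache.
            self._cache[key] = self._load_from_db(event_id)
        return self._cache[key]

    def invalidate(self, event_id):
        """Call after a DB update so the next read reloads fresh data."""
        self._cache.pop(f"event:{event_id}", None)
```

Invalidating rather than updating the cached value keeps the write path simple: the next reader pays one extra DB read and repopulates the key.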

Benefits

Reduces DB load
Faster response times
Handles millions of reads

7️⃣ 🚀 Scaling Optimizations Summary

Search Optimization

SQL LIKE → Elasticsearch/OpenSearch
Use inverted indexes
Cache popular searches using CDN/Redis

Stale Seat Map

Use SSE/WebSockets
Push seat updates in real time

Popular Event Surge

Virtual Waiting Queue
Redis Sorted Set
Controlled user entry

Read Optimization

Cache Event/Venue/Performer in Redis
Reduce PostgreSQL reads

Booking Consistency

Redis Distributed Lock (10 min TTL)
PostgreSQL transaction as final source of truth

8️⃣ 🎤 Interview Conclusion

At the end of the interview, you should quickly verify that your design satisfies both Functional Requirements and Non-Functional Requirements.

✅ Functional Requirements Covered

Search Events
- Elasticsearch/OpenSearch
- CDN/Redis caching
View Event Details
- Event Service
- Redis cache for Event/Venue/Performer
Book Tickets
- Reserve seat using Redis Distributed Lock
- Confirm booking using PostgreSQL transaction

✅ Non-Functional Requirements Covered

Strong Consistency
- PostgreSQL transactions
- No double booking
High Availability
- Search and View APIs are cache-backed
Scalability
- Horizontally scalable services
- Virtual waiting queue for traffic spikes
Low Latency Search
- Elasticsearch + CDN/Redis caching
Real-Time Updates
- SSE for seat availability

Summary

We designed a Ticketmaster-like ticket booking system using a microservices architecture. Search is powered by Elasticsearch, event metadata is cached in Redis, and ticket booking uses Redis distributed locks with PostgreSQL transactions to prevent double booking. To handle large traffic spikes such as Taylor Swift concerts or World Cup finals, we introduced a virtual waiting queue and real-time seat updates via SSE. This design satisfies both the functional requirements and the scalability, consistency, and availability requirements of the system.
