1️⃣ Requirements
Functional Requirements
Core features:
👉 “Users should be able to…”
- Search for events
- View event details
- Book tickets
Non-Functional Requirements (WHERE MOST PEOPLE FAIL ⚠️)
❌ Wrong: "Scalability, availability, reliability" (generic, true of every system)
✅ Correct (context-specific):
1. Consistency vs Availability
- Booking → Strong Consistency
  - No double booking
- Search/View → High Availability
  - Slightly stale data is acceptable
2. Read vs Write Ratio
- Reads >> Writes (≈ 100:1)
  - Many users browse
  - Few actually book
3. Traffic Pattern
- Normal traffic → low
- Event drops (Taylor Swift, World Cup) → HUGE spike
👉 System must handle bursty traffic
4. Low-Latency Search
2️⃣ Core Entities
Keep it simple initially:
- Event
- Venue
- Performer
- Ticket
3️⃣ API Design
Map APIs → Functional Requirements
1. View Event
GET /events/{eventId} → Event & Venue & Performer & Ticket[]
2. Search Events
GET /events/search?term=&location=&date=&type= → Partial<Event>[]
3. Booking (2-Phase Process 🔥)
POST /booking/reserve
Body: { ticketId }
POST /booking/confirm
Body: { ticketId, paymentDetails }
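The two-phase flow above can be sketched as a small state machine: a ticket moves from available to reserved to booked. This is a minimal in-memory sketch, not the real service. The names `BookingService`, `reserve`, `confirm`, and the `payment_ok` flag are assumptions for illustration, and the caller's `user_id` is assumed to come from authentication rather than the request body.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    AVAILABLE = "available"
    RESERVED = "reserved"
    BOOKED = "booked"


@dataclass
class Ticket:
    ticket_id: int
    status: Status = Status.AVAILABLE
    held_by: Optional[str] = None


class BookingService:
    """Hypothetical in-memory stand-in for POST /booking/reserve and /booking/confirm."""

    def __init__(self, ticket_ids):
        self.tickets = {tid: Ticket(tid) for tid in ticket_ids}

    def reserve(self, ticket_id, user_id):
        # Phase 1: hold the ticket if nobody else holds or owns it.
        ticket = self.tickets[ticket_id]
        if ticket.status is not Status.AVAILABLE:
            return False
        ticket.status, ticket.held_by = Status.RESERVED, user_id
        return True

    def confirm(self, ticket_id, user_id, payment_ok):
        # Phase 2: only the user holding the reservation may confirm it.
        ticket = self.tickets[ticket_id]
        if ticket.status is not Status.RESERVED or ticket.held_by != user_id:
            return False
        if not payment_ok:
            # Failed payment releases the hold so others can try.
            ticket.status, ticket.held_by = Status.AVAILABLE, None
            return False
        ticket.status = Status.BOOKED
        return True
```

Splitting the flow this way keeps the slow payment step outside the moment of seat selection: the seat is held first, and only the holder can turn the hold into a booking.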
4️⃣ HLD
ticket-master.excalidraw
Seat blocking
Issue:
- User reserves → leaves → seat blocked forever ❌
Solution 1: Timestamp + Query
- Add a `reserved_at` column and treat any reservation older than 10 mins as expired → every read needs the extra check; messy logic
Solution 2: Cron Job
- Runs every 10 mins → moves expired tickets from `reserved` back to `available`
❌ A seat can stay blocked for up to 10 min after its hold expires (the cron interval)
✅ Best Solution: Distributed Lock (Redis 🔥)
👉 Goal: Prevent 2 users from reserving the same seat at the same time.
ticketId → locked (TTL = 10 min)
✔ Auto-expiry
✔ Clean design
✔ Scalable
SET ticket:101 userA NX EX 600
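To make the semantics of that command concrete, here is a minimal sketch using an in-memory stand-in for Redis, so it runs without a server. `FakeRedis`, `set_nx_ex`, and `reserve_seat` are hypothetical names, and the injectable clock exists only so expiry can be demonstrated without waiting 10 minutes.

```python
import time


class FakeRedis:
    """In-memory stand-in that mimics only SET key value NX EX ttl, plus GET."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # key -> (value, expires_at)
        self._clock = clock

    def _evict_if_expired(self, key):
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]

    def set_nx_ex(self, key, value, ttl_seconds):
        # NX: write only if the key does not exist. EX: expire after ttl_seconds.
        self._evict_if_expired(key)
        if key in self._data:
            return False
        self._data[key] = (value, self._clock() + ttl_seconds)
        return True

    def get(self, key):
        self._evict_if_expired(key)
        entry = self._data.get(key)
        return entry[0] if entry is not None else None


def reserve_seat(store, ticket_id, user_id, ttl_seconds=600):
    """Try to take the seat lock; True means this user now holds the seat."""
    return store.set_nx_ex(f"ticket:{ticket_id}", user_id, ttl_seconds)
```

Against a real Redis server with the redis-py client, the equivalent call would be `client.set(f"ticket:{ticket_id}", user_id, nx=True, ex=600)`, which returns a truthy value only when the lock was acquired.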
- User opens event page
  GET /events/101/tickets
- Backend checks DB
  SQL QUERY
| Ticket | Status |
|---|---|
| 1 | available |
| 2 | booked |
| 3 | available |
- Backend checks Redis locks
  ticket:3 = userA (TTL left 7 min)
- Final response to user
| Ticket | Final UI State |
|---|---|
| 1 | available ✅ |
| 2 | booked ❌ |
| 3 | reserved ⏳ |
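The merge of DB status and Redis locks shown in the tables can be sketched as one small function. The name `seat_map` and the input shapes (a dict of DB statuses and a set of ticket IDs that currently have a live lock) are assumptions for illustration.

```python
def seat_map(db_status, locked_ticket_ids):
    """Combine durable DB state with short-lived Redis locks into the UI state.

    db_status: {ticket_id: "available" | "booked"} as read from the database.
    locked_ticket_ids: set of ticket ids that currently have a live lock.
    """
    ui_state = {}
    for ticket_id, status in db_status.items():
        if status == "available" and ticket_id in locked_ticket_ids:
            # Not sold yet, but someone is holding it in checkout.
            ui_state[ticket_id] = "reserved"
        else:
            # Booked tickets stay booked regardless of any leftover lock.
            ui_state[ticket_id] = status
    return ui_state
```

With the data from the tables, `seat_map({1: "available", 2: "booked", 3: "available"}, {3})` marks ticket 3 as reserved and leaves the other two unchanged.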
What Actually Happens If Redis Fails?
1. Immediate impact:
- All active locks are lost
- System forgets reservations
👉 Result:
- Multiple users may try to book the same ticket
- Temporary overbooking attempts
2. Why doesn't the system break completely?
Because: 👉 Final consistency is enforced by the database (PostgreSQL)
DB Protection:
- ACID transactions
- Unique constraint on ticket
Example flow:
- User A → payment success → tries to book
- User B → also tries
👉 DB ensures: only one transaction succeeds
✔ One wins
❌ Others fail
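A minimal sketch of the "one wins, others fail" behaviour, using Python's built-in SQLite as a stand-in for PostgreSQL so it runs anywhere. The `bookings` table, its columns, and the `book` helper are hypothetical; the point is only that a unique constraint rejects the second insert for the same ticket.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # UNIQUE on ticket_id is the last line of defence against double booking.
    conn.execute(
        "CREATE TABLE bookings ("
        " ticket_id INTEGER NOT NULL UNIQUE,"
        " user_id TEXT NOT NULL)"
    )
    return conn


def book(conn, ticket_id, user_id):
    """Return True if this user got the ticket, False if it was already booked."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO bookings (ticket_id, user_id) VALUES (?, ?)",
                (ticket_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Someone else already holds a row for this ticket_id.
        return False
```

Even with every Redis lock gone, the second `book` call for the same ticket fails at the constraint, which is why a lock outage degrades the experience without corrupting the data.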
3. User Impact (IMPORTANT ⚠️)
- Users who thought they had reserved a seat → will lose it
- Some users:
  - Complete payment → get error ❌
  - Bad experience 😬
👉 This is called:
Consistency preserved, UX degraded
4. How to Handle This (Real Systems)
✅ Option 1: Redis High Availability (Basic Fix)
- Use Redis Cluster / Sentinel
- Replication + failover
✔ Reduces failure chance
❌ Edge cases are still possible
✅ Option 2: Graceful Degradation
When Redis is down:
- Skip reservation step
- Allow direct booking
👉 DB becomes the single source of truth
✔ System still works
❌ Higher contention
✅ Option 3: Retry + Compensation
If payment succeeds but booking fails:
- Refund automatically
- Show message:
“Seat already taken; the amount will be refunded.”
- ✔ This is common in real systems
5️⃣ Deep Dive
Deep Dive 1: Search Optimization (low-latency search)
Problem:
SQL `LIKE '%term%'` → full table scan ❌ (a leading wildcard cannot use a regular B-tree index)
Solution:
- Move search to a search-optimized database ✅ Elasticsearch / AWS OpenSearch
- It builds an inverted index, so documents can be looked up by term very quickly
- Fast text search
- Supports:
- text
- location
- filters
Example: Event text:
Taylor Swift Live Concert Mumbai
Elasticsearch tokenizes into:
taylor
swift
live
concert
mumbai
Then maps:
swift -> [event1, event5]
concert -> [event1, event8]
...
So lookup becomes extremely fast.
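The idea can be sketched in a few lines of Python. This is a toy whitespace tokenizer, not Elasticsearch's actual analysis pipeline, and the text for `event5` and `event8` is made up to match the mapping above.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query (AND search)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {
    "event1": "Taylor Swift Live Concert Mumbai",
    "event5": "Swift Programming Meetup",  # hypothetical second document
    "event8": "Rock Concert Delhi",        # hypothetical third document
}
index = build_index(docs)
```

A query touches only the posting sets for its own tokens, so the cost depends on how many documents match those terms rather than on the total number of events.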
Data sync with primary data store:
- Dual write (DB + ES) ❌
- CDC (Change Data Capture) ✅ better
CDC itself only captures database changes. We still need a consumer component, either a CDC connector or an indexing service, to transform the change event and update Elasticsearch through its indexing APIs.
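A minimal sketch of such a consumer, assuming a made-up change-event shape (`op`, `id`, `row`) and a plain dict standing in for the search index. A real connector's event format and the Elasticsearch client calls would differ; only the transform-then-apply structure is the point.

```python
def to_index_action(change):
    """Translate one database change event into an index operation.

    Assumed event shape: {"op": "insert" | "update" | "delete", "id": ..., "row": {...}}
    """
    if change["op"] == "delete":
        return {"action": "delete", "id": change["id"]}
    row = change["row"]
    # Keep only the fields search needs; the DB row may carry many more.
    doc = {"name": row["name"], "city": row["city"]}
    return {"action": "index", "id": change["id"], "doc": doc}


def apply_action(search_index, action):
    """Apply the operation to a dict standing in for the search index."""
    if action["action"] == "delete":
        search_index.pop(action["id"], None)
    else:
        # Indexing by id makes inserts and updates the same idempotent operation.
        search_index[action["id"]] = action["doc"]
```

Because each action is keyed by the event id, replaying the same change twice leaves the index in the same state, which matters when the change stream delivers events at least once.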
Are writes frequent?
If: 100M searches/day, 100 event updates/day
Then: Kafka = probably unnecessary
If: millions of Ticketmaster event updates/day
Then: Kafka = good idea
“If indexing traffic becomes high, I can introduce Kafka as a buffer (temporary holding area between two systems).”
Popular / hot query search:
- Caching
  - CDN (fastest for common API calls), e.g. GET /events/search?term=taylor+swift
  - AWS OpenSearch / Elasticsearch built-in option: node query caching
  - Redis cache between the search service and Elasticsearch
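The cache layer between the search service and the search cluster can be sketched as a cache-aside lookup with a TTL. `TTLCache`, `normalize`, and `cached_search` are hypothetical names, and an in-process dict stands in for Redis so the sketch runs on its own.

```python
import time


class TTLCache:
    """Tiny cache-aside helper; a dict stands in for Redis here."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the backend is not called
        value = compute()    # cache miss or expired: ask the backend once
        self._entries[key] = (value, now + self._ttl)
        return value


def normalize(term):
    # "Taylor  Swift" and "taylor swift" should share one cache entry.
    return " ".join(term.lower().split())


def cached_search(cache, term, backend_search):
    key = "search:" + normalize(term)
    return cache.get_or_compute(key, lambda: backend_search(normalize(term)))
```

A short TTL fits here because search results may be slightly stale by the requirements above, while a hot query during an event drop is repeated far more often than its results change.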
Strong Interview Answer
Initially I’d use SQL search for MVP, but for highly popular searches like Taylor Swift, full-table scans won’t scale. I’d move search to Elasticsearch/OpenSearch for inverted-index-based retrieval. Since search traffic is highly read-heavy and repetitive, I’d add CDN and Redis caching to absorb spikes. Search services would scale horizontally behind a load balancer, and API Gateway would enforce rate limiting to protect the system.
Deep Dive 2: Stale Seat Map Problem
Problem
User opens event page:
10:00:00 AM
GET /events/101/tickets
Response:
Seat A1 → Available
Seat A2 → Available
Seat A3 → Available
⚠️ What Goes Wrong?
After 2 seconds:
User B books Seat A1
Database now:
Seat A1 → Booked
But User A’s screen still shows:
Seat A1 → Available
because the page was loaded earlier. 👉 Client data becomes stale.
Strong Interview Answer
The seat map can become stale because ticket availability changes frequently. To keep clients synchronized, I would establish a real-time channel using Server-Sent Events (or WebSockets). Whenever a ticket is booked or reserved, the server pushes updates to connected clients so unavailable seats are immediately disabled in the UI.
Solution 1: Polling
Every few seconds: GET /events/101/tickets
Refresh seat availability.
Problem
- Too many requests
- Expensive for popular events
Solution 2: Long Polling
Client sends request:
GET /events/101/updates
Server keeps the connection open.
When a seat changes:
Seat A1 booked
Server responds immediately.
Solution 3: SSE (Recommended 🔥)
Client ←──────── Server
Persistent connection. Whenever ticket state changes:
Seat A1 → booked
Seat A2 → reserved
Server pushes the update instantly. No need for repeated requests.
Flow:
User books seat
↓
Booking Service
↓
DB updated
↓
Event Service
↓
SSE push
↓
All connected clients
UI updates in real time.
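The push step of this flow can be sketched as a small fan-out broker that formats each update in the SSE wire format (an `event:` line, a `data:` line, then a blank line). `SeatUpdateBroker` and the `seat-update` event name are assumptions, and per-client deques stand in for open HTTP connections.

```python
import json
from collections import deque


def format_sse(event, data):
    """Serialize one message in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class SeatUpdateBroker:
    """Hypothetical fan-out: each subscriber gets its own outbound buffer."""

    def __init__(self):
        self._subscribers = {}  # event_id -> list of per-client deques

    def subscribe(self, event_id):
        outbox = deque()
        self._subscribers.setdefault(event_id, []).append(outbox)
        return outbox

    def publish(self, event_id, seat, status):
        # Called after the DB update succeeds, never before.
        message = format_sse("seat-update", {"seat": seat, "status": status})
        clients = self._subscribers.get(event_id, [])
        for outbox in clients:
            outbox.append(message)
        return len(clients)
```

In a real server each outbox would be drained into a long-lived response with the `text/event-stream` content type, and the browser would read it through `EventSource`.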
🎯 Deep Dive 3: Handling Taylor Swift / World Cup Ticket Rush
Problem
Normally, users open the event page and see available seats.
Seat A1 → Available
Seat A2 → Available
Seat A3 → Available
But for a huge event like:
- Taylor Swift concert
- World Cup Final
- Super Bowl
Millions of users arrive at the same time.
⚠️ What Happens?
Suppose:
100,000 seats
10,000,000 users
All users load the seat map simultaneously. Initially everyone sees:
Seat A1 → Available
Seat A2 → Available
...
Then within seconds:
User1 books A1
User2 books A2
User3 books A3
...
Real-time updates start arriving. The seat map rapidly turns into:
A1 ❌
A2 ❌
A3 ❌
A4 ❌
A5 ❌
...
To many users it feels like:
“I entered the page and everything instantly became unavailable.”
This creates a terrible user experience.
Interview Answer
For highly popular events such as Taylor Swift concerts or World Cup finals, allowing everyone to enter the seat-selection page creates a poor experience because seats disappear instantly. Instead, I would introduce a virtual waiting queue. Users enter the queue first, and only a controlled number of users are allowed into the booking flow at a time. This protects the backend and provides a fairer and more predictable user experience.
🚨 Why Scaling More Servers Doesn’t Solve It
Many candidates say:
Let's add more servers.
But the issue isn’t server capacity. The issue is:
Too many users competing for too few seats.
Even with infinite servers:
100,000 seats
10,000,000 users
Most users will still lose.
👉 Solution: Virtual Waiting Queue
Note: this is enabled only for Taylor Swift-scale / high-traffic events, not for every event.
Instead of letting everyone enter immediately:
Users
↓
Virtual Queue
↓
Ticket Page
Flow
Step 1
Users arrive:
10M users
Step 2
Put them into a queue.
Position #1
Position #2
Position #3
...
Store in a Redis Sorted Set.
Step 3
Only allow a small batch in. Example:
Allow first 1000 users
Step 4
When seats are booked:
1000 users leave
Next batch enters:
Next 1000 users
Benefits
Protect Backend
Instead of:
10M users
↓
Booking Service
we get:
1000 users
↓
Booking Service
Better User Experience
Instead of:
Everything became unavailable instantly
User sees:
You are #12,541 in queue.
Estimated wait: 8 minutes.
Much more predictable.
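The queue steps above (join, show position, admit a batch) can be sketched with a sorted list standing in for a Redis Sorted Set. `WaitingQueue` and its method names are hypothetical; the comments name the Redis commands each method roughly corresponds to.

```python
import bisect


class WaitingQueue:
    """In-memory stand-in for a sorted set: members ordered by score, then by id."""

    def __init__(self):
        self._entries = []  # sorted list of (score, user_id)
        self._scores = {}   # user_id -> score

    def join(self, user_id, score):
        # Roughly ZADD with NX: re-joining must not move a user to the back.
        if user_id in self._scores:
            return False
        self._scores[user_id] = score
        bisect.insort(self._entries, (score, user_id))
        return True

    def position(self, user_id):
        # Roughly ZRANK + 1: the number shown as "You are #N in queue".
        score = self._scores.get(user_id)
        if score is None:
            return None
        return bisect.bisect_left(self._entries, (score, user_id)) + 1

    def admit(self, batch_size):
        # Roughly ZPOPMIN with a count: let the front of the queue into booking.
        batch = self._entries[:batch_size]
        del self._entries[:batch_size]
        for _, user_id in batch:
            del self._scores[user_id]
        return [user_id for _, user_id in batch]
```

Using the arrival timestamp as the score gives first-come-first-served order, while a random score turns the same structure into a lottery; the rest of the flow does not change.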
How Queue Is Implemented
Simple answer:
Redis Sorted Set
Store (member → score):
userId → timestamp (first come, first served)
or
userId → random priority (lottery)
❌ Bad Math vs ✅ Good Math
Most candidates do:
DAU = 100M
QPS = 10K
Storage = 5TB
Then say:
“Okay, it’s a large-scale system.”
And move on. The interviewer learns nothing.
❌ Bad Math
Doing calculations just because system design books told you to. Example:
100M users
1KB per event
10TB storage
Then…
No design decision changed. Waste of time.
✅ Good Math
Do math only when it affects your design.
Example:
Should I shard PostgreSQL?
Now math matters.
10M events
100K tickets/event
Calculate:
Total storage
Total QPS
Then conclude:
Single DB won't work
Need sharding
Now math influenced architecture.
🧠 For Ticketmaster
Good places to do math:
1. Search Traffic
10M users
1 search/sec
Can Elasticsearch handle it? Need cache? Need CDN?
2. Booking Traffic
100K seats
10M users
Need waiting queue? Answer = yes.
3. Database Size
1M events
50K tickets/event
How many ticket rows? Can one PostgreSQL instance handle it? Need partitioning/sharding?
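The three estimates can be worked through in a few lines. The inputs are the numbers listed above; the 100 bytes per ticket row is an assumed figure for illustration only, not something the notes specify.

```python
def ticket_rows(events, tickets_per_event):
    return events * tickets_per_event


def storage_terabytes(rows, bytes_per_row):
    return rows * bytes_per_row / 1e12


def peak_search_qps(users, searches_per_user_per_sec):
    return users * searches_per_user_per_sec


def seats_per_user(seats, users):
    return seats / users


# 3. Database size: 1M events x 50K tickets/event
rows = ticket_rows(1_000_000, 50_000)    # 50 billion ticket rows
size_tb = storage_terabytes(rows, 100)   # ASSUMPTION: about 100 bytes per row

# 1. Search traffic: 10M users each searching once per second
qps = peak_search_qps(10_000_000, 1)     # 10 million searches/sec at peak

# 2. Booking traffic: 100K seats for 10M users
ratio = seats_per_user(100_000, 10_000_000)  # 1 seat per 100 users
```

Each number points at a decision: tens of billions of rows suggests partitioning or sharding the tickets table, a peak of millions of searches per second argues for CDN and cache in front of the search cluster, and one seat per hundred users is the case for a waiting queue.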
Paraphrased:
Don’t do back-of-the-envelope calculations at the beginning just to check a box. Do calculations when they help you make a design decision.
🔥 Interview Trick
If interviewer asks:
“Any estimations?”
Do:
Let me estimate whether a single database can handle ticket storage before deciding if I need sharding.
That’s much stronger than:
DAU = 100M
Storage = 10TB
Moving on...