One-Day Calendar

07:00 — 19:00

Pick an endpoint and follow your data down the road, station by station. Inside each station: plain-English explanations on the left, the actual Java source with real line numbers on the right — hover any explanation to light up its line. Every number in the notes (loop counts, gates fired, events clipped) is computed from the file you uploaded. The rail on the far right is the full call chain; it tracks your scroll.

Every rule the system applies to odd input — what the code does, where it does it, how the test suite proves it, and (when a file is loaded) what that rule actually did to your data.

Same journey as the Data journey tab, but at memory level: for every stage, what the code does step by step — and on the dark panels, what is actually held on the heap at that moment, computed from your real file: the raw bytes as a hexdump, every Event object, each person's 720-bit mask down to the twelve long words inside the BitSet, and the scan table the scheduler walks.

"Suppose we want to turn this take-home into a real service that people can call." Worked through interview-style, in order: pin the requirements, meet every component (and why it earns its place), make the three big technology calls, draw the picture, then follow each request start to finish.

1 · Pin the requirements

Everything below is justified by these eight lines — if a component doesn't serve one of them, it doesn't get to exist.

Functional — what it must do

  • Users can upload / manage calendars (people + events) instead of re-sending a CSV per call.
  • Callers can query availability: all slots, first slot, free windows — for any set of people and duration.
  • Calendars are persistent and shared: upload once, query many times, from any client.
  • Invalid input gets a clear, line-numbered error, never a silent wrong answer.

Non-functional — how well it must do it

  • Read-heavy: ~100 : 1 queries to uploads. Optimize for query latency, p95 < 100 ms.
  • Small hot data: one day = 720 bits per person → whole org fits in cache.
  • Consistency: a finished upload must be visible to the next query (read-your-writes per calendar).
  • Availability over freshness for reads: serving a mask a few seconds stale beats a 500.

2 · Meet the components

Every box in the diagram, explained like you'd explain it to a new teammate: what it is, why it's there, and what it actually holds.

API Gateway front door

The bouncer with the guest list.

What it is
One managed entry point (nginx / AWS API Gateway / Kong) that every request passes through before touching our code.
Why it's here
So each service behind it doesn't reimplement auth, rate limits and routing. Check the ticket once, at the door — not at every room inside.
What it does
Verifies the JWT, stamps the request with a tenant_id, throttles noisy clients, routes /calendars/… to the right service.

Calendar Service writes

The intake clerk — takes your paperwork, gives you a receipt.

What it is
A small stateless service (this take-home's parser, grown up) that owns uploads and edits.
Why it's here
Writes and reads scale and fail differently. Uploads are rare, chunky and validation-heavy; queries are constant and tiny. Splitting them means a flood of uploads can never slow down "is Alice free at 3?".
What it does
Accepts the CSV, stores the raw file, queues a parse job, returns 202 + uploadId immediately.

Availability Service reads

The calculator — the only box doing actual math.

What it is
MeetingScheduler as a fleet: stateless replicas behind the gateway, autoscaled on traffic.
Why it's here
This is the hot path — 100 queries per upload. It must answer in milliseconds and keep answering even if uploads are on fire.
What it does
Fetches busy masks from Redis, ORs them, scans for slots — the same three lines of BitSet logic this demo runs, just fed from a cache instead of a fresh parse.

Person Registry identity

The company phonebook. Yes, it's a service.

What it is
A tiny service (or a module inside Calendar Service on day one) that owns the person table and hands out canonical personIds.
Why it's here
In the take-home a person is a string — query Alise (typo) and she's silently "free all day". That's a wrong meeting time nobody notices. IDs make typos impossible instead of undetected.
What it does
Resolves names → IDs at upload time, serves GET /people?q=ali autocomplete so the UI can only pick people who exist.

Postgres source of truth

The filing cabinet. If it's not here, it didn't happen.

What it is
The system of record: calendar, person, event tables (DDL below).
Why it's here
Our data is naturally relational (events belong to people belong to calendars), an upload must be all-or-nothing (transactions), and "no duplicate names per calendar" is one UNIQUE constraint. See "why not Mongo/Dynamo" below.
What it holds
Every event ever uploaded, forever. Everything else in this diagram can be lost and rebuilt from here.

Redis cache

The sticky note on the monitor — fast, tiny, disposable.

What it is
An in-memory key-value store sitting between the Availability Service and Postgres.
Why it's here
Answering "is Alice free?" from Postgres means a query per request. Answering from RAM means microseconds — and our hot data is comically small.
What it holds
Exactly one thing: busy masks. Key avail:{personId}:{date} → value = the same 720-bit BitSet this demo builds, ~90 bytes. 100,000 people ≈ 9 MB — the whole company fits in the smallest Redis tier. Lost? Rebuild any mask from Postgres in one query.

Queue (SQS-style) work

The conveyor belt — every job gets done exactly once-ish.

What it is
A simple job queue between "file received" and "file parsed".
Why it's here
Parsing in the request thread means a big file blocks the caller and a crash mid-parse loses everything. Queued jobs retry with backoff; poisoned files land in a dead-letter queue with the line-numbered parse error attached.
What flows through
One message per upload: {uploadId, s3Key, calendarId}. Workers consume, parse, write.

Kafka events · later

The company newspaper — everyone reads their own copy.

What it is
An append-only event log. Not in the day-one design — drawn dashed until it earns its keep.
Why (eventually)
A queue message is consumed once, by one worker. When several systems care that a calendar changed — cache invalidation, webhooks, search indexing, analytics — each needs its own copy. That's a log with consumer groups, not a queue.
The trigger
The moment a second consumer wants calendar.updated events, the outbox relay starts publishing to Kafka instead of calling Redis directly. Bonus: replaying the topic rebuilds any corrupted cache.

Object store (S3) raw files

The shoebox of receipts.

What it is
Blob storage for every uploaded file, exactly as it arrived.
Why it's here
When a customer says "your parser ate my calendar", you replay the original bytes. Audit, debugging, reprocessing after a parser fix — all free once the file is kept.

Parser workers async

The kitchen staff — nobody sees them, everything depends on them.

What it is
A pool of processes running this take-home's CsvEventParser, fed by the queue.
What it does
Parse → resolve names to personIds → one Postgres transaction (all rows or none — same fail-fast rule as the demo) → rebuild affected masks in Redis → emit "updated".

3 · The three big technology calls

The questions an interviewer actually asks: why this database, why this cache, why this pipe — and what would make us change the answer.

Why Postgres — and not Mongo or DynamoDB?

  • The data is a textbook relational shape. Events belong to people, people belong to calendars. Foreign keys and a JOIN express that in one line; in Mongo you'd pick a document nesting and regret it the first time a query cuts across it.
  • Uploads need transactions. "All 5,000 rows or none" is BEGIN … COMMIT. Doable in Mongo, awkward; in Dynamo, painful beyond 100 items.
  • Constraints are features. UNIQUE (calendar_id, name) and CHECK (end_min >= start_min) move validation into the database — the same "invalid data cannot exist" philosophy as the Event constructor.
  • Dynamo's superpower is wasted here. It shines at millions of ops/sec on known key patterns. Our hot reads don't even hit the DB (Redis does), and our write volume is human-scale. We'd pay in flexibility and get nothing back.
  • Choose boring. Postgres + read replicas carries this to thousands of tenants. The day one table gets too big: partition by calendar_id — clean, because no query ever crosses tenants.

Why Redis — and what exactly is inside it?

  • One key type, one value type. avail:{personId}:{2026-07-06} → a 720-bit bitmap (Redis SETBIT/GETBIT speak bitmap natively). That's it. No sessions, no objects, no cleverness.
  • The math: 720 bits = 90 bytes per person-day. 100k people × 365 hot days ≈ 3.3 GB; cache only ±30 days (61 days) and it drops to about 550 MB.
  • A query never touches Postgres. MGET N masks → OR → scan. The same microsecond BitSet logic as this demo — Station 3 of the Data journey, verbatim.
  • Cache, not database. Miss → one SQL query rebuilds the mask → SETEX. Redis dies → everything still works, just slower. Never the other way around: Redis is allowed to lie briefly (seconds-stale mask), never to be the only copy.
  • Invalidation is easy here because writes are chunky: an upload rewrites one calendar → delete that calendar's keys. No distributed-cache horror stories at this shape.
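
As a rough illustration of the "one key type, one value type" idea, the sketch below builds the avail:{personId}:{date} key and turns a 720-bit BitSet into a fixed 90-byte value and back. MaskCodec and its methods are hypothetical names, not part of the take-home, and no Redis client is involved. One assumption worth flagging: the bytes are stored as an opaque value, because BitSet.toByteArray() puts bit 0 in the least significant bit of the first byte while Redis SETBIT/GETBIT number bits from the most significant bit.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper: fixed-size cache value for one person-day busy mask. */
final class MaskCodec {
    static final int MINUTES = 720;        // 07:00-19:00, one bit per minute
    static final int BYTES = MINUTES / 8;  // 90 bytes per person-day

    /** Key layout taken from the text: avail:{personId}:{date}. */
    static String key(String personId, String isoDate) {
        return "avail:" + personId + ":" + isoDate;
    }

    /**
     * BitSet.toByteArray() drops trailing zero bytes, so the result is
     * padded (or truncated) to exactly 90 bytes for a predictable value size.
     */
    static byte[] encode(BitSet mask) {
        return Arrays.copyOf(mask.toByteArray(), BYTES);
    }

    /** Inverse of encode; trailing zero bytes are harmless to BitSet.valueOf. */
    static BitSet decode(byte[] value) {
        return BitSet.valueOf(value);
    }
}
```

If the service ever needs Redis-side bit operations on these values, the bit order inside each byte would have to be reversed at encode time; storing and fetching whole values does not require that.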

Queue vs Kafka — conveyor belt vs newspaper

  • A queue is a to-do list: each parse job is picked up by one worker, done, deleted. Retries, backoff, dead-letter queue for poisoned files. This we need on day one.
  • Kafka is a newspaper: the same "calendar updated" story is read independently by the cache invalidator, the webhook sender, the search indexer — each at its own pace, with replay. This we need the day a second reader shows up, and not before.
  • Using Kafka as a job queue (or SQS as an event bus) is the classic résumé-driven mistake — the tools aren't interchangeable, they answer different questions.

What stays exactly the same as the take-home

  • The BitSet mask is untouched — it just moves from a per-request local variable into Redis, keyed per (person, date).
  • The parser and its five gates run verbatim inside workers — fail-fast with line numbers, now retryable and asynchronous.
  • The write/read seam was already in the code: CsvEventParser vs MeetingScheduler becomes Calendar Service vs Availability Service. The architecture is the class diagram, blown up.

4 · Data model & API

Core entities

  • Calendar — id, ownerId, day bounds (07:00–19:00 today; a field, not a constant).
  • Person — id, calendarId, name. Identity lives here, not in free-text strings.
  • Event — id, personId, subject, startMin, endMin. Minutes-from-day-start, exactly like the BitSet.

The API: calendar is the resource, availability is a view on it

POST /calendars                ← CSV body, returns calendarId
GET  /calendars/{id}/people     ← who exists (drives the UI)
GET  /calendars/{id}/availability
     ?people=p1,p2&durationMinutes=60
GET  /calendars/{id}/availability/first?…
GET  /calendars/{id}/free-windows?people=…

Data model: Postgres DDL

-- one row per uploaded calendar
CREATE TABLE calendar (
  id         uuid PRIMARY KEY,
  owner_id   uuid NOT NULL,
  day_start  smallint NOT NULL DEFAULT 420,  -- 07:00 in minutes
  day_end    smallint NOT NULL DEFAULT 1140, -- 19:00
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE person (
  id          uuid PRIMARY KEY,
  calendar_id uuid NOT NULL REFERENCES calendar(id),
  name        text NOT NULL,
  UNIQUE (calendar_id, name)
);

CREATE TABLE event (
  id         uuid PRIMARY KEY,
  person_id  uuid NOT NULL REFERENCES person(id),
  subject    text NOT NULL,
  start_min  smallint NOT NULL,  -- minutes from day start
  end_min    smallint NOT NULL,
  CHECK (end_min >= start_min)
);
CREATE INDEX event_by_person ON event(person_id);

The queries that matter

-- upload = one transaction (all-or-nothing,
-- same fail-fast semantics as the parser)
BEGIN;
INSERT INTO calendar …;
INSERT INTO person …;      -- batch
INSERT INTO event …;       -- batch
COMMIT;

-- availability read (cache miss): one round trip
-- fetches every event needed to build the masks
SELECT p.name, e.start_min, e.end_min
FROM person p
LEFT JOIN event e ON e.person_id = p.id
WHERE p.calendar_id = :calId
  AND p.name IN (:people);

-- who exists? (drives the UI + typo detection)
SELECT name FROM person
WHERE calendar_id = :calId ORDER BY name;
  • The mask math stays in the service, not SQL — ORing bits in Java is microseconds; a generate_series SQL version would be slower and unindexable.
  • LEFT JOIN keeps people-without-events visible — the "free all day" rule becomes explicit instead of accidental.

5 · The picture

All the components from section 2, wired together. Solid lines are the synchronous request path; the dashed line is the async invalidation that follows every upload.

High-level design

Clients: web client (this site); API clients (integrations, CLI)
API Gateway & Load Balancer: routing; authentication; rate limiting; request validation
Calendar Service: accepts CSV uploads; parses + validates rows; writes people & events; invalidates cached masks
Availability Service: loads events per person; builds 720-bit busy masks; answers slots / first / windows; caches masks per (person, day)
DB: Person (id, calendarId, name); Event (id, personId, subject, startMin, endMin); Calendar (id, ownerId, dayStart / dayEnd)
Cache (Redis): busy masks, short TTL

Edges:
Gateway → Calendar Service: POST /calendars (CSV)
Gateway → Availability Service: GET /availability
Calendar Service → DB: upsert people & events
Availability Service → DB: read events on cache miss
Availability Service → Cache: get / put masks
Calendar Service → Cache (dashed): invalidate on upload

  • Two services, one seam. Writes (parse, validate, store) and reads (mask math) scale differently and fail differently — the same split the code already has: CsvEventParser vs MeetingScheduler.
  • The calendar becomes a stored resource. Upload once → calendarId → query forever. Today's stateless re-upload-per-query becomes GET with an id.
  • Cache is an optimization, not a dependency. Miss → rebuild from DB in O(events); Redis down → slower, not down.

6 · Follow a request — the happy paths

Every API, walked start → finish. Green is where the caller gets their answer.

POST /calendars: upload a calendar

The write path. Rare, chunky, validation-heavy, so it's asynchronous.

Client: sends the CSV
Gateway: checks JWT, stamps tenant, rate-limits
Calendar Service: stores raw file in S3, enqueues {uploadId, s3Key}
202 Accepted: caller gets uploadId in ~50 ms, before any parsing
Worker: picks the job off the queue
Parse + validate: the take-home's five gates, verbatim
Person Registry: names → canonical IDs
Postgres: one transaction (people + events + outbox row)
Redis: rebuild that calendar's masks
Done: GET /uploads/{id} flips to ready; webhook fires

If a row is bad: the job lands in the dead-letter queue with the line-numbered parse error; the upload status becomes failed with that exact message. Nothing partial is ever committed.

GET /calendars/{id}/availability?people=…&durationMinutes=60: the hot path

100× more frequent than uploads. Target: p95 < 100 ms, typically single-digit milliseconds.

Client: asks "when can these people meet?"
Gateway: auth + tenant check
Availability Service: MGET one mask per person from Redis
OR + scan: the demo's Station 3, microseconds
200 JSON: { availableSlots: […] }, Postgres never touched

Cache miss? One extra hop: SELECT that person's events from a read replica → rebuild the 90-byte mask → SETEX into Redis → continue as above. First query after an upload pays ~10 ms; every one after rides the cache.

GET /availability/first & /free-windows: same road, different last step

Client → Gateway → Availability Service
Same masks: same MGET, same OR
Different scan: first short-circuits on the first hit; free-windows walks the gaps
200 JSON: one slot, or a list of ranges

GET /people?q=ali: autocomplete

Client: types "ali" in the people picker
Gateway: auth
Person Registry: prefix search on the person table (tiny, indexed)
200 JSON: [{id, name}], so the UI can only submit real IDs and typos die here

7 · Two deep dives worth having ready

Consistency & the outbox

  • Upload is transactional: people + events commit together or not at all — matches today's fail-fast parser.
  • The naïve version — "commit, then delete cache keys" — has a failure window: crash between commit and invalidation leaves stale masks forever.
  • Fix: transactional outbox. The same DB transaction that writes events also inserts an outbox(calendar_id, 'updated') row. A relay reads the outbox and performs invalidation (and webhook fan-out). Crash-safe: the message exists iff the data does.
  • Availability answers stay idempotent reads — safe to retry, safe to serve from any replica.
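
The outbox idea above can be shown with a toy in-memory model. This is only a sketch under loud assumptions: OutboxSketch is a hypothetical class, a synchronized method stands in for a Postgres transaction, and a HashSet stands in for the Redis keys. What it tries to show is the ordering: data and outbox row are written together, and the relay invalidates first and acknowledges second, so a crash in between causes a harmless repeat rather than a lost invalidation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Toy stand-in for the database plus cache; not a real persistence layer. */
final class OutboxSketch {
    private final List<String> events = new ArrayList<>();
    private final Queue<String> outbox = new ArrayDeque<>(); // pending calendar ids
    final Set<String> cachedCalendars = new HashSet<>();     // stand-in for Redis keys

    /** One "transaction": validate everything first, then write events and outbox row together. */
    synchronized void commitUpload(String calendarId, List<String> newEvents) {
        if (newEvents.stream().anyMatch(String::isBlank)) {
            throw new IllegalArgumentException("invalid row, nothing committed");
        }
        events.addAll(newEvents);
        outbox.add(calendarId);
    }

    /** Relay pass: invalidate, then acknowledge. Re-running is safe because removal is idempotent. */
    synchronized int relayOnce() {
        int processed = 0;
        String calendarId;
        while ((calendarId = outbox.peek()) != null) {
            cachedCalendars.remove(calendarId); // invalidate the cache entry first
            outbox.remove();                    // only then drop the outbox row
            processed++;
        }
        return processed;
    }

    synchronized int eventCount() { return events.size(); }
    synchronized int pendingOutbox() { return outbox.size(); }
}
```

In a real deployment the two writes would share one SQL transaction and the relay would be a separate process; the sketch only mirrors the invariant that an outbox row exists exactly when the data does.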

Growth pressure points

  • Too many events per person (imports with years of history): masks are per-day so query cost is flat, but rebuild cost grows — materialize masks at write time (workers) instead of on cache miss.
  • Too many people per query ("is anyone in engineering free?", N=2000): ORing 2000 masks is still < 1 ms — the real cost is 2000 cache GETs. Batch with MGET, or maintain group masks updated incrementally on member change.
  • Too many tenants: partition person/event by calendar_id hash; Redis keys already shard naturally. Nothing crosses tenants, so sharding is embarrassingly clean.
  • DB write ceiling: events are append-mostly; if a single primary chokes, move ingestion to bulk COPY per upload before reaching for distributed SQL.

What it takes to make this production-grade. Click any box in the diagram for what that part is and why it's there; below it, each numbered upgrade explains one change in depth — click a title to expand it.

Target high-level design

Web / Mobile users
3rd-party apps: API keys
API Gateway: OIDC auth (JWT); rate limits & quotas; tenant resolution; API versioning /v1; idempotency keys
Person Registry: canonical person IDs; search / autocomplete; no free-text names
Ingestion Service: receives uploads; stores raw file → S3; enqueues parse job
Availability Service ×N: stateless, autoscaled; masks from Redis; miss → rebuild from replica; slots / first / windows
Object store: raw uploads (S3)
Queue: parse jobs + DLQ
Parser workers: parse + validate; resolve person IDs; write events (tx); rebuild masks → cache
Redis: bitmap / (person, date)
Postgres: primary; read replicas
Webhooks / Notify: "calendar updated" events
Observability: metrics · traces · structured logs · alerts (SLO: availability reads p95 < 100 ms, 99.9%)

Edges:
Gateway → Person Registry: GET /people?q=ali
Gateway → Ingestion Service: POST /calendars
Gateway → Availability Service: GET /availability
Ingestion Service → Object store: raw file
Ingestion Service → Queue: job
Queue → Parser workers: consume
Parser workers → Postgres: insert events (tx)
Parser workers → Redis: write masks
Parser workers → Webhooks / Notify: emit "updated"
Availability Service → Redis: read masks
Availability Service → Postgres: cold rebuild from replica
Person Registry → Postgres: person table

01 · Person identity: stop trusting free-text names

Problem: today a person is a string. Query people=Alise (typo) and the scheduler logs a warning and treats her as free all day — a silently wrong meeting time. The edge cases below demonstrate it live.
Upgrade: a Person Registry owns canonical personIds. Uploads resolve names → IDs at ingestion (unknown name = hard error or explicit "create person" flow); queries accept only IDs. The UI gets GET /people?q= autocomplete, so a typo becomes impossible instead of undetected.

Strictness becomes a policy flag per tenant: unknown_person = reject | warn | treat-as-free.

02 · Real storage: a database instead of a re-parsed file

Problem: every query re-uploads and re-parses the whole CSV. No sharing, no history, no concurrent edits, size limited by request body.
Upgrade: Postgres. calendar(id, tenant_id, day_start, day_end), person(id, calendar_id, name), event(id, person_id, subject, start_min, end_min). Uploads are transactions; queries hit indexes (event(person_id)). Raw files go to object storage for audit/replay.

Partition by tenant when big; events are tiny rows, so a single primary + read replicas carries this design very far.

03 · Uploads that propagate: ingestion pipeline

Problem: synchronous parse-in-request means a big file or a slow moment blocks the caller, and a crash mid-parse loses everything.
Upgrade: upload → store raw file in S3 → enqueue job → return 202 Accepted + uploadId. Workers parse, validate, resolve IDs, write the DB transaction, rebuild the affected masks in Redis, then emit a calendar.updated event. Callers poll GET /uploads/{id} or receive a webhook.

Failed jobs land in a dead-letter queue with the line-numbered parse error attached — same fail-fast philosophy, now asynchronous and retryable.

04 · Scaling reads: masks as a cache-native structure

Problem: one JVM, masks rebuilt per request. Fine for a demo; wasteful at 10M queries/day.
Upgrade: the 720-bit mask becomes the cached unit: SETBIT avail:{personId}:{date} in Redis. A query is N mask reads + OR + scan — zero DB on the hot path. Availability nodes are stateless, so horizontal scale is a replica count; Redis loss degrades to replica-DB rebuilds, not downtime.

At extreme read volume, precompute per-team combined masks for the common "whole team free?" query.

05 · Security & multi-tenancy

Problem: the API is anonymous — anyone can post anything; everyone sees everything; a calendar leaks who meets whom.
Upgrade: OIDC at the gateway (JWT with tenant_id), every table keyed by tenant, row-level checks in services. API keys + scopes for machine callers (availability:read, calendar:write). Rate limits per key. Subjects are the sensitive field — encrypt at rest, omit from availability responses entirely (they never mattered to the math).

06 · API hardening

Problem: breaking changes would break integrators; duplicate uploads double events; errors are ad-hoc JSON.
Upgrade: version the API (/v1), idempotency keys on uploads, RFC 7807 application/problem+json errors (the line-numbered parse error slots in perfectly), pagination on list endpoints, OpenAPI spec generated from the controllers.

07 · Time correctness: beyond one hardcoded day

Problem: one dateless day, fixed 07:00–19:00, minute granularity, duration-grid slot stepping that can miss valid starts.
Upgrade: real dates + IANA timezones (store UTC, convert at the edge), per-calendar working hours, recurring events expanded at ingestion, and a sliding scan (step 5–15 min, or nextClearBit jumps) so a 45-minute meeting can start at 08:30. The BitSet model survives all of it — it just gets one mask per (person, date).
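
To make the difference between grid stepping and a sliding scan concrete, here is a small sketch over one combined busy mask. SlidingScan, starts and firstStart are hypothetical names, and minutes are counted from the start of the working day; the free-check itself is the nextSetBit test the demo already uses. The step values are illustrative, not taken from the code base.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical comparison of grid stepping and a sliding scan. */
final class SlidingScan {
    /** Every free start of `duration` minutes, trying candidates every `step` minutes. */
    static List<Integer> starts(BitSet busy, int dayLength, int duration, int step) {
        List<Integer> result = new ArrayList<>();
        for (int start = 0; start + duration <= dayLength; start += step) {
            int nextBusy = busy.nextSetBit(start);
            if (nextBusy == -1 || nextBusy >= start + duration) {
                result.add(start);
            }
        }
        return result;
    }

    /** Earliest free start at minute granularity, skipping whole busy runs with nextClearBit. */
    static int firstStart(BitSet busy, int dayLength, int duration) {
        int start = busy.nextClearBit(0);
        while (start + duration <= dayLength) {
            int nextBusy = busy.nextSetBit(start);
            if (nextBusy == -1 || nextBusy >= start + duration) {
                return start;
            }
            start = busy.nextClearBit(nextBusy); // jump past the busy run that blocked us
        }
        return -1; // no window of that length fits in the day
    }
}
```

With only 07:00 to 07:30 busy and a 60-minute meeting, a step equal to the duration first offers minute 60 (08:00), while a 15-minute step or the nextClearBit jump offers minute 30 (07:30).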

08 · Observability & operations

Problem: java.util.logging warnings nobody reads; no way to know the service is slow or wrong before users do.
Upgrade: structured logs with request IDs, RED metrics per endpoint, traces across gateway → service → cache → DB, alerts on SLO burn (p95 < 100 ms, 99.9% availability), dashboards for queue lag and cache hit rate. The "unknown person" warning becomes a counted metric — a spike means a client is sending garbage.

09 · Failure modes, rehearsed

Problem: single process — any failure is total; retries and partial failures are unhandled by design.
Upgrade: queue retries with backoff + DLQ for poisoned files; circuit breaker around Redis (fall back to replica rebuild); graceful degradation: availability reads keep working during an ingestion outage because they never touch the write path. Load-test the OR+scan hot path; chaos-test the cache-miss storm after a mass upload.

Below: the asks that change the model, not just the plumbing — "what if the requirements themselves grew?"

10 · What if the day isn't 07:00–19:00, or is the whole day?

Problem: the working day is a constant. A hospital (24/7), a trading desk (06:00–22:00), or a team spanning offices all break it.
Upgrade: the schema above already stores day_start/day_end per calendar — the code change is passing a real WorkingDay instead of new WorkingDay(). Whole day = 1440 bits = 180 bytes; the BitSet approach doesn't blink. Clipping, scanning and caching are already written against workingDay.getLength(), not the number 720.

The demo's own domain object made this a config change, not a redesign — that's the payoff of not scattering 420 and 1140 through the code.

11 · What if it's a real calendar: dates, weeks, months?

Problem: events have no date. Real questions are "first slot this week", "are they free next Tuesday 14:00", recurring standups, timezones, DST.
Upgrade: the unit of computation becomes (person, date) → mask. Events store real timestamps (start_at timestamptz, end_at timestamptz, index on (person_id, start_at)); ingestion slices them into per-date masks in the calendar's timezone (a 23:00–02:00 event marks two dates — clipping logic reused). A week query = 7 masks per person, OR per date, scan per date; answers merge. Recurrences stored as RRULE and materialized into a rolling window (e.g. 90 days ahead) by a scheduled job — queries never expand rules at read time.
-- events for one person-week, cache-miss path
SELECT start_at, end_at FROM event
WHERE person_id = :p
  AND start_at < :week_end AND end_at > :week_start;
-- Redis: avail:{personId}:{2026-07-06} → 1440-bit mask

Storage math: 1440 bits/day × 365 days × 100k people ≈ 6.6 GB of masks, so cache only the hot window (±30 days) and rebuild cold dates from the DB on demand. DST: store UTC, build masks in the calendar's IANA zone, and accept that in zones observing DST two dates a year are not 1440 minutes long: 1380 on the spring-forward date and 1500 on the fall-back date.
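
The per-date slicing described above might look roughly like the following. DateSlicer and slice are hypothetical names; the sketch assumes both timestamps are already expressed in the calendar's zone and that bit n of a date's mask means "minute n after local midnight". It uses only java.time and BitSet.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ingestion helper: one timestamped event becomes one mask per touched date. */
final class DateSlicer {
    static Map<LocalDate, BitSet> slice(ZonedDateTime start, ZonedDateTime end) {
        ZoneId zone = start.getZone();
        Map<LocalDate, BitSet> masks = new TreeMap<>();
        for (LocalDate date = start.toLocalDate();
             date.atStartOfDay(zone).isBefore(end);
             date = date.plusDays(1)) {
            ZonedDateTime dayStart = date.atStartOfDay(zone);
            ZonedDateTime dayEnd = date.plusDays(1).atStartOfDay(zone);
            // Clip the event to this date, same idea as clipping to the working day.
            ZonedDateTime from = start.isAfter(dayStart) ? start : dayStart;
            ZonedDateTime to = end.isBefore(dayEnd) ? end : dayEnd;
            int fromBit = (int) Duration.between(dayStart, from).toMinutes();
            int toBit = (int) Duration.between(dayStart, to).toMinutes();
            if (toBit > fromBit) {
                masks.computeIfAbsent(date, d -> new BitSet()).set(fromBit, toBit);
            }
        }
        return masks;
    }
}
```

A 23:00 to 02:00 event yields two entries: bits 1380 to 1439 on the first date and bits 0 to 119 on the second. Because bit positions are measured as elapsed minutes since local midnight, a DST date simply produces a shorter or longer mask.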

12 · What if colliding events matter, instead of silently merging?

Problem: today overlap is invisible by design — BitSet.set() twice is a no-op. Fine for "is anyone busy?", useless for "Alice is double-booked, warn her" or "the room fits 2 meetings".
Upgrade: two tiers. Detection: at ingestion, a sort-and-sweep per person flags overlapping pairs → stored in a conflict table + surfaced in the upload report and a GET /calendars/{id}/conflicts endpoint. Capacity semantics: where "busy" isn't binary (rooms, on-call rotations with N responders), the mask generalizes from 1 bit to a small counter per minute — byte[720] instead of BitSet; "available" becomes count[m] < capacity. Same shape, same OR→scan pipeline, one type swap behind the EventCalendar interface.
-- find overlaps for the upload report (self-join alternative to the in-memory sweep)
SELECT a.id, b.id FROM event a JOIN event b
  ON a.person_id = b.person_id AND a.id < b.id
WHERE a.start_min < b.end_min AND b.start_min < a.end_min;
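
The "bit becomes a counter" generalization can be sketched in a few lines. CapacityMask is a hypothetical class, not the EventCalendar interface itself, and it assumes fewer than 128 overlapping bookings per minute so that a byte per minute is enough.

```java
/** Hypothetical capacity-aware mask: one small counter per minute instead of one bit. */
final class CapacityMask {
    private final byte[] count;
    private final int capacity;

    CapacityMask(int dayLength, int capacity) {
        this.count = new byte[dayLength];
        this.capacity = capacity;
    }

    /** Books the half-open range [startMin, endMin); overlapping bookings add up. */
    void book(int startMin, int endMin) {
        for (int m = startMin; m < endMin; m++) {
            count[m]++;
        }
    }

    /** Releasing decrements, which a plain bit cannot do without erasing another event. */
    void release(int startMin, int endMin) {
        for (int m = startMin; m < endMin; m++) {
            count[m]--;
        }
    }

    /** "Available" means count[m] < capacity for every minute of the window. */
    boolean isAvailable(int startMin, int duration) {
        for (int m = startMin; m < startMin + duration; m++) {
            if (count[m] >= capacity) return false;
        }
        return true;
    }

    /** Any minute booked more than once is a double-booking when capacity is 1. */
    boolean hasOverlap() {
        for (byte c : count) {
            if (c > 1) return true;
        }
        return false;
    }
}
```

With capacity 1 this behaves like the BitSet version for availability checks, while still being able to report and undo overlaps.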

13 · What if events can be added, moved, deleted?

Problem: the model is upload-and-replace. Real calendars mutate constantly; re-uploading a whole CSV to move one meeting is absurd, and two people editing at once would clobber each other.
Upgrade: event CRUD becomes the primary write path (POST/PATCH/DELETE /calendars/{id}/events/{eventId}), CSV upload demotes to a bulk import. Concurrency via optimistic locking — version column, PATCH carries If-Match, conflict → 409 and the client rebases. Masks update incrementally: a moved event touches at most two (person, date) masks — clear old range, set new range, or just rebuild that person-day from the DB (720 bits, microseconds) which is simpler and equally fast. Every mutation writes the outbox row → cache refresh + webhook delta (event.moved), so connected UIs update live.
-- move an event with optimistic concurrency
UPDATE event
SET start_min = :new_start, end_min = :new_end,
    version = version + 1
WHERE id = :eventId AND version = :expected
RETURNING version;  -- 0 rows → 409 Conflict

History for free: an event_audit table fed by the same outbox relay gives undo, "who moved my meeting", and replay-based debugging.

Why each piece exists, what it decided, and what calls what. Read top to bottom — it follows a request through the system.

The big picture: who calls whom

Browser (index.html)
   │ multipart POST: file (+ people, durationMinutes)
   ▼
AvailabilityController ── thin HTTP edge: decode multipart, map DTOs
   ▼
AvailabilityService ── wires the pipeline, owns no logic itself
   ├─▶ CsvEventParser.parse(reader) → List<Event>
   ├─▶ new EventCalendar(events, workingDay) → busy BitSet per person
   └─▶ new MeetingScheduler(calendar, workingDay)
         ├─ findAvailableSlots(people, duration) → List<LocalTime>
         ├─ findFirstAvailableSlot(...) → Optional<LocalTime>
         └─ getFreeWindows(people) → List<TimeRange>

CalendarParseException / IllegalArgumentException
   └─▶ ApiExceptionHandler → HTTP 400 { "error": "..." }

Parsing: CsvEventParser, EventParser, CalendarParseException

The CSV is parsed with Apache Commons CSV in RFC 4180 mode rather than String.split(","), because the sample data itself contains quoted subjects with commas ("Lunch, then a walk"). The parser also strips a UTF-8 BOM and skips an optional header row, since files exported from spreadsheets commonly have both.

Decision: fail fast on malformed rows. A bad row throws CalendarParseException carrying the line number and raw line, instead of silently skipping it — a silently dropped event would produce wrong availability, which is worse than an error.
Decision: EventParser is an interface even though there is one implementation. Parsing is the natural seam of the system: tests inject events directly, and a JSON or ICS parser could drop in without touching scheduling code.
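
A minimal sketch of the fail-fast rule, under several assumptions: rows arrive already tokenized (quoted fields and the BOM are left to the CSV library), the column order is person, subject, start, end, and times are written as HH:mm. None of that is confirmed by the text above. ParseError is a hypothetical stand-in for CalendarParseException, and FailFastRows is not the real CsvEventParser.

```java
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for CalendarParseException: carries line number and raw line. */
final class ParseError extends RuntimeException {
    final int lineNumber;
    final String rawLine;

    ParseError(int lineNumber, String rawLine, String reason) {
        super("Malformed calendar entry at line " + lineNumber + ": " + reason);
        this.lineNumber = lineNumber;
        this.rawLine = rawLine;
    }
}

final class FailFastRows {
    record Row(String person, String subject, LocalTime start, LocalTime end) {}

    /** Validates tokenized rows; the first bad row aborts the whole parse. */
    static List<Row> parse(List<List<String>> rows) {
        List<Row> out = new ArrayList<>();
        int lineNumber = 0;
        for (List<String> f : rows) {
            lineNumber++;
            String raw = String.join(",", f);
            if (f.size() != 4) {
                throw new ParseError(lineNumber, raw, "expected 4 fields, got " + f.size());
            }
            if (f.get(0).isBlank()) {
                throw new ParseError(lineNumber, raw, "blank person");
            }
            try {
                LocalTime start = LocalTime.parse(f.get(2).trim());
                LocalTime end = LocalTime.parse(f.get(3).trim());
                if (end.isBefore(start)) {
                    throw new ParseError(lineNumber, raw, "end before start");
                }
                out.add(new Row(f.get(0).trim(), f.get(1), start, end));
            } catch (DateTimeParseException e) {
                throw new ParseError(lineNumber, raw, "unparseable time");
            }
        }
        return out;
    }
}
```

Because nothing is returned when any row fails, a caller can never build availability from a partially read file.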

Domain: Event, WorkingDay, TimeRange

All domain objects are immutable and validate in the constructor: an Event can never exist with a blank person or end < start. That means every layer downstream can trust its inputs and skip re-validation.

Decision: time ranges are half-open — [start, end). An event ending at 09:00 does not block a meeting starting at 09:00. This is the convention real calendars use and it makes adjacent events compose without off-by-one gaps.
Decision: the 07:00–19:00 day lives in one place, WorkingDay, not as scattered constants. Events outside the day are clipped to it; a 06:00–08:00 event blocks only 07:00–08:00.
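
One way to picture the three rules above (validate in the constructor, half-open ranges, clipping to the day) is a Java record. EventSketch is a hypothetical shape, not the take-home's Event class, and the clipTo signature is an assumption made for illustration.

```java
import java.time.LocalTime;
import java.util.Optional;

/** Hypothetical immutable event over the half-open range [start, end). */
record EventSketch(String person, String subject, LocalTime start, LocalTime end) {
    // Compact constructor: an invalid event can never be created.
    EventSketch {
        if (person == null || person.isBlank()) {
            throw new IllegalArgumentException("person must not be blank");
        }
        if (start == null || end == null || end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
    }

    /** Half-open overlap: an event ending at 09:00 does not block a 09:00 start. */
    boolean overlaps(LocalTime from, LocalTime to) {
        return start.isBefore(to) && from.isBefore(end);
    }

    /** Clips to the working day; empty when no part of the event falls inside it. */
    Optional<EventSketch> clipTo(LocalTime dayStart, LocalTime dayEnd) {
        LocalTime s = start.isBefore(dayStart) ? dayStart : start;
        LocalTime e = end.isAfter(dayEnd) ? dayEnd : end;
        return s.isBefore(e)
                ? Optional.of(new EventSketch(person, subject, s, e))
                : Optional.empty();
    }
}
```

A 06:00 to 08:00 event clipped to 07:00 to 19:00 comes back as 07:00 to 08:00, matching the example in the text.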

The core data structure: EventCalendar

Instead of interval arithmetic (sort, merge, subtract), each person gets a BitSet of 720 bits — one bit per minute of the working day. Loading an event just sets its minutes to 1.

  • Bit 0 = 07:00–07:01, bit 719 = 18:59–19:00.
  • Overlapping events merge for free — setting a bit twice is a no-op.
  • Combining people is one OR per person, no merge logic at all.
Decision: trade a little memory (90 bytes per person) for the simplest correct algorithm. Interval-merge code is where off-by-one bugs live; a bitmask has no edge cases once clipping is done. Load is O(events), each query is O(people + 720) — effectively constant.
Known limit — a bit can't count, and can't be safely un-set. The merge that makes overlaps free also erases them: the mask can't report a double-booking, and removing one of two overlapping meetings must never clear() bits the other still owns. That's why masks here are derived and immutable — built once from the full event list, rebuilt rather than mutated. Conflicts-as-a-feature and event deletion are answered in the Interview Q&A tab (Q3) and Upgrades tab (cards 12–13): rebuild the person-day mask on change, or swap the bit for a per-minute counter when counts matter.
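
A minimal sketch of the mask-building step, assuming events arrive as minute-of-day integers. BusyMasks, add and combined are hypothetical names rather than the real EventCalendar API; the BitSet calls are standard java.util.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mask builder: one 720-bit BitSet per person, bit 0 = 07:00-07:01. */
final class BusyMasks {
    static final int DAY_START = 7 * 60;    // 07:00 as minute of day
    static final int DAY_LENGTH = 12 * 60;  // 720 minutes, up to 19:00

    private final Map<String, BitSet> byPerson = new HashMap<>();

    /** Marks [startMinuteOfDay, endMinuteOfDay) busy, clipped to the working day. */
    void add(String person, int startMinuteOfDay, int endMinuteOfDay) {
        int from = Math.max(startMinuteOfDay - DAY_START, 0);
        int to = Math.min(endMinuteOfDay - DAY_START, DAY_LENGTH);
        BitSet mask = byPerson.computeIfAbsent(person, p -> new BitSet(DAY_LENGTH));
        if (from < to) {
            mask.set(from, to); // setting a bit twice is a no-op, so overlaps merge
        }
    }

    /** ORs the selected people's masks; someone with no events contributes nothing. */
    BitSet combined(Iterable<String> people) {
        BitSet busy = new BitSet(DAY_LENGTH);
        for (String person : people) {
            BitSet mask = byPerson.get(person);
            if (mask != null) {
                busy.or(mask);
            }
        }
        return busy;
    }
}
```

Two overlapping events for the same person (06:00 to 08:00 and 07:30 to 08:30) leave 90 busy bits, not 120 plus 60, which is the "merge for free" property and also the reason the mask cannot report the double-booking.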

Answering queries · MeetingScheduler

A query ORs the selected people's masks into one combined busy mask, then scans candidate start times. The free-check is a single call:

busy.nextSetBit(start) == -1 || busy.nextSetBit(start) >= start + duration

— i.e. "the next busy minute is outside my window".
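As a sketch of how that check could drive a scan over duration-sized steps: the names isFree and findSlots and the use of bit indices as start times are assumptions for illustration, not a copy of MeetingScheduler.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class GridScanSketch {
    static final int DAY_MINUTES = 720;

    /** Free if the next busy minute is absent or lies at or beyond the window's end. */
    static boolean isFree(BitSet busy, int start, int duration) {
        int nextBusy = busy.nextSetBit(start);
        return nextBusy == -1 || nextBusy >= start + duration;
    }

    /** Candidate starts step by the duration; <= lets a slot end exactly at 19:00. */
    static List<Integer> findSlots(BitSet busy, int duration) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + duration <= DAY_MINUTES; start += duration) {
            if (isFree(busy, start, duration)) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        BitSet busy = new BitSet(DAY_MINUTES);
        busy.set(60, 120); // busy 08:00-09:00
        System.out.println(findSlots(busy, 60));
    }
}
```

Storing the result of nextSetBit in a local avoids calling it twice, but the condition is the same one quoted above.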

Decision: candidate slots step by the meeting duration (60-min meeting → 07:00, 08:00, …), matching the README's expected output exactly. Known trade-off: a sliding window (step 1–15 min) would find more starts; the grid was chosen to reproduce the specified example.
Decision: a person with no events is treated as free all day, with a warning logged — the requirement is "find slots where all persons are available", and someone absent from the file has nothing blocking them. Validation still rejects null, empty and blank names.

All input validation happens here at the public API boundary: null/empty people, blank names, zero/negative/sub-minute durations, durations longer than the day. Whitespace-padded duplicate names ("Alice ") collapse to one person.
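A rough sketch of those boundary checks, assuming the duration arrives as a java.time.Duration (the page does not say so) and using hypothetical method names normalizePeople and validateMinutes:

```java
import java.time.Duration;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

final class ValidationSketch {
    static final Duration DAY_LENGTH = Duration.ofHours(12); // 07:00-19:00

    /** Trims names and collapses duplicates; case stays significant. */
    static Set<String> normalizePeople(Collection<String> people) {
        if (people == null || people.isEmpty()) {
            throw new IllegalArgumentException("people must not be null or empty");
        }
        Set<String> names = new LinkedHashSet<>();
        for (String person : people) {
            if (person == null || person.isBlank()) {
                throw new IllegalArgumentException("person name must not be blank");
            }
            names.add(person.trim());
        }
        return names;
    }

    /** Accepts only positive whole-minute durations that fit inside the day. */
    static int validateMinutes(Duration duration) {
        if (duration == null || duration.isZero() || duration.isNegative()) {
            throw new IllegalArgumentException("duration must be positive");
        }
        if (duration.toSecondsPart() != 0 || duration.toNanosPart() != 0) {
            throw new IllegalArgumentException("duration must be a whole number of minutes");
        }
        if (duration.compareTo(DAY_LENGTH) > 0) {
            throw new IllegalArgumentException("duration is longer than the working day");
        }
        return (int) duration.toMinutes();
    }

    public static void main(String[] args) {
        System.out.println(normalizePeople(java.util.List.of("Alice", "Alice ", "Jack")));
        System.out.println(validateMinutes(Duration.ofMinutes(60)));
    }
}
```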

Web layer · AvailabilityController, AvailabilityService, ApiExceptionHandler

The HTTP layer was added on top of the original CLI without touching core code — the same MeetingScheduler serves both. The controller only decodes the multipart request and maps domain objects to small response records; the service wires parser → calendar → scheduler per request.

Decision: the server is stateless. The browser keeps the uploaded file and re-sends it with every query. No session, no upload storage, no cleanup — and 720-bit masks are cheap enough that rebuilding per request costs microseconds.
Decision: domain exceptions map to HTTP 400 with a JSON { error } body via ApiExceptionHandler, so the UI can show the exact parse error ("Malformed calendar entry at line 3 …") instead of a generic 500.

Frontend · static/index.html

One static file, no framework, no build step — served by Spring Boot's static handler. The day board is painted on a graph-paper background: 1 pixel = 1 minute, so an event's top and height are just its start minute and length.

Decision: the browser never computes availability. Every result shown — slots, first slot, free windows — comes from the Java API, so the page is a demo of the backend, not a re-implementation of it. The only client-side calendar logic is drawing rectangles.

Testing · MeetingSchedulerTest, CsvEventParserTest, SchedulingEndToEndTest

Three layers of tests, ~40 cases:

  • Unit — scheduler logic against in-memory events: README example, boundary events straddling 07:00/19:00, zero-length events, duplicate names, every validation rule.
  • Parser — quoted commas, header row, BOM, malformed rows.
  • End-to-end — CSV fixture files through parser → calendar → scheduler, one fixture per scenario (empty, fully booked, overlapping, invalid row).
Decision: the README's expected output is a pinned regression test. Any change that breaks the specified example fails the build.

A working model of the production write path from the Architecture and Upgrades tabs — running right here in the browser. Add, move and delete events on the schedule and watch the whole machine react in order: API → queue → a worker thread picks the job → row lock → transaction → outbox → commit → mask rebuild in Redis → webhook. Try "two clients edit at once" to see a real optimistic-lock conflict: one commit wins, the other gets a 409, refetches and retries.
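The "one commit wins, the other gets a 409" behaviour can be illustrated with an in-memory stand-in. This is only a sketch of the optimistic-lock idea: a real service would compare a version column inside a database transaction, and the Row and EventRow types here are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class OptimisticLockSketch {

    /** A row snapshot; the version increases by one on every committed write. */
    record Row(long version, String title) {}

    /** In-memory stand-in for one database row guarded by a version check. */
    static final class EventRow {
        private final AtomicReference<Row> current;

        EventRow(String title) {
            this.current = new AtomicReference<>(new Row(0, title));
        }

        Row read() {
            return current.get();
        }

        /**
         * Commits only if nobody has written since 'seen' was read.
         * Returns false for a stale snapshot, which an HTTP layer could map to 409.
         */
        boolean update(Row seen, String newTitle) {
            return current.compareAndSet(seen, new Row(seen.version() + 1, newTitle));
        }
    }

    public static void main(String[] args) {
        EventRow row = new EventRow("standup");
        Row seenByA = row.read();
        Row seenByB = row.read();
        System.out.println("A commits: " + row.update(seenByA, "standup (moved)"));
        System.out.println("B commits: " + row.update(seenByB, "standup (renamed)"));
        // B refetches and retries against the fresh snapshot.
        System.out.println("B retries: " + row.update(row.read(), "standup (renamed)"));
    }
}
```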

[Interactive simulation panels: Schedule (07:00–19:00, click an event to select it) · Redis (one busy mask per person) · Event log (every step, in order) · Queue (jobs waiting for a worker) · Worker pool (two threads) · Postgres (event table) · outbox (written inside the same transaction)]

    The questions this project should be able to answer out loud, asked the way an interviewer would ask them and answered the way I'd answer across the table.

    Q1

    If you went over the project right now, what would be your opinion — what would you change?

    Honest answer: the core is right and I'd keep it — BitSet masks, fail-fast parsing, the layer seams. But a fresh read finds real things to poke at:

    • The slot grid is the biggest known trade-off. Candidates step by the meeting duration (MeetingScheduler.java:36), so a 60-minute meeting can only start at 07:00, 08:00, …, exactly matching the README's expected output, which is why I chose it. But a real product wants a sliding scan: if the only busy block is 07:00–07:30, a 60-minute meeting should be able to start at 07:30 rather than wait for 08:00. The fix is small (step by 5–15 minutes, or jump with busy.nextClearBit()), and the README example survives as a special case.
    • AvailabilityService calls new WorkingDay() on every request (line 42). It works, but the 07:00–19:00 day should be injected configuration: the WorkingDay(start, end) constructor already exists and is tested; the service just doesn't use it.
    • People are strings. Case-sensitive, no identity — alice ≠ Alice, and a typo is silently "free all day" (a logged warning the caller never sees). Fine for the exercise, the first thing to fix on the way to production (see Q6).
    • Every HTTP query re-parses the CSV. A deliberate choice — stateless server, no storage, no cleanup — and honest for a demo. But it means the API's cost is dominated by parsing, not by the microsecond scheduling math it exists to show off.
    • Smaller nits: Event could be a Java 17 record (same guarantees, half the code); java.util.logging → SLF4J; scheduler invariants ("first slot == head of all slots" is already tested) would suit property-based testing.

    The slot-grid problem, drawn: one 30-minute meeting at the start of the day, and a 60-minute meeting to place.

    Verdict: I'd change the slot stepping first — it's the one place where a documented assumption could surprise a user rather than protect them.
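The two fixes named in the first bullet, a finer step and a nextClearBit jump, could be sketched as follows. The method names and the bit-index return values are assumptions; this is not code from the project.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.OptionalInt;

final class SlidingScanSketch {
    static final int DAY_MINUTES = 720;

    static boolean isFree(BitSet busy, int start, int duration) {
        int nextBusy = busy.nextSetBit(start);
        return nextBusy == -1 || nextBusy >= start + duration;
    }

    /** Same scan as the grid, but the step is independent of the duration. */
    static List<Integer> slidingStarts(BitSet busy, int duration, int step) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + duration <= DAY_MINUTES; start += step) {
            if (isFree(busy, start, duration)) {
                starts.add(start);
            }
        }
        return starts;
    }

    /** Earliest start at minute precision, skipping whole busy runs at a time. */
    static OptionalInt earliestStart(BitSet busy, int duration) {
        int start = busy.nextClearBit(0);
        while (start + duration <= DAY_MINUTES) {
            int nextBusy = busy.nextSetBit(start);
            if (nextBusy == -1 || nextBusy >= start + duration) {
                return OptionalInt.of(start);
            }
            start = busy.nextClearBit(nextBusy); // jump past the blocking run
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        BitSet busy = new BitSet(DAY_MINUTES);
        busy.set(0, 30); // one 30-minute meeting at 07:00
        System.out.println(slidingStarts(busy, 60, 60).get(0)); // grid: first start
        System.out.println(earliestStart(busy, 60));            // jump scan: first start
    }
}
```

Calling slidingStarts with step equal to duration reproduces the duration grid, which is how the README example would remain a special case.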
    Q2

    Explain the design choices — commons-csv, BitSet, the structure.

    • Apache Commons CSV, not split(","). The sample data itself contains "Lunch, then a walk" — a quoted comma. split corrupts that row silently; an RFC 4180 parser handles quotes, and the wrapper adds BOM-stripping and header detection because that's what real spreadsheet exports look like. Rule: never hand-roll a parser for a format that has a spec and a library.
    • BitSet, not interval arithmetic. The day is 720 minutes → 720 bits per person (~90 bytes). Loading an event = set its bits. Overlaps merge for free (setting a bit twice is a no-op). Combining people = one OR each. The free-check is two lines (isFree, lines 87–90). The alternative — sort, merge, subtract intervals — is where off-by-one bugs live. I traded a few bytes for an algorithm with no edge cases left.
    • The package structure is the SOLID answer. parsing / domain / repository / scheduling / service / controller — each layer has one reason to change. The proof it works: the web API was added later without touching a single core class.
    • EventParser is an interface with one implementation — normally a smell, justified here because parsing is the system's natural seam: tests inject events directly, and a JSON or ICS parser drops in without the scheduler knowing.
    • Immutability + constructor validation. An Event with end < start cannot exist. Every layer downstream trusts its inputs instead of re-checking them.
    • Fail-fast parsing with line numbers. A silently dropped row would produce a silently wrong meeting time — the worst possible failure for a scheduler. So one bad row aborts with "line 3, here's the content, here's what's wrong".
    • Half-open ranges [start, end). An event ending 09:00 doesn't block a meeting starting 09:00 — how real calendars behave, and it makes back-to-back events compose without fake conflicts.
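The quoted-comma point from the first bullet is easy to reproduce. In this small demonstration the row's column order (person, subject, start, end) is an assumption; only the subject text comes from the page.

```java
import java.util.Arrays;

final class NaiveSplitDemo {
    // Hypothetical row: the column layout is assumed for illustration.
    static final String ROW = "Alice,\"Lunch, then a walk\",12:00,13:00";

    /** The naive approach: splits on every comma, including the quoted one. */
    static String[] naiveSplit(String row) {
        return row.split(",");
    }

    public static void main(String[] args) {
        String[] fields = naiveSplit(ROW);
        // Four logical columns come back as five fragments.
        System.out.println(fields.length + " fields: " + Arrays.toString(fields));
    }
}
```

The subject is cut in two and every later column shifts by one, without any error being raised; that silent shift is what an RFC 4180 parser prevents.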

    The engine, drawn: the README's own query — Alice + Jack, 60 minutes — as bits.

    Q3

    Doesn't the BitSet break with conflicting meetings — or when a meeting is added or removed?

    Sharp question — this is exactly where the trade-off lives. Two separate problems hide in it: a bit can't count, and a bit can't be safely un-set.

    • Adding is safe. set() is idempotent — setting a busy minute twice is a no-op. That's why overlapping meetings cost nothing for availability: busy is busy, whether Alice has one meeting at 09:00 or three.
    • But the merge destroys information. Once two meetings burn into the same bits, the mask can't tell you Alice is double-booked, can't say which meeting owns 09:15, and can't model "the room fits two meetings". Bit = 1 means busy; it doesn't remember why.
    • Removing is the real trap. Say Alice has a standup 09:00–10:00 and a review 09:30–10:30. Delete the standup and naively clear(120, 180) — you just zeroed 09:30–10:00, minutes the review still occupies. The scheduler now happily books over the review. Silent wrong answer, the worst kind.

    Why this project doesn't have the bug: the mask is never treated as the data. Look at EventCalendar — the constructor builds all masks from the full event list, and no method mutates them afterwards. Events are the source of truth; the BitSet is a throw-away index derived from them, rebuilt per request. There is no delete-a-bit code path to get wrong.
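The standup/review example can be replayed directly on a BitSet. This sketch uses a hypothetical Meeting record and rebuild method; it contrasts the naive clear with rebuilding from the remaining events.

```java
import java.util.BitSet;
import java.util.List;

final class DeleteTrapSketch {
    static final int DAY_MINUTES = 720;

    /** Hypothetical meeting shape: bit indices relative to 07:00, half-open. */
    record Meeting(String name, int from, int to) {}

    /** Derives the mask from the full event list; it is never edited in place. */
    static BitSet rebuild(List<Meeting> meetings) {
        BitSet mask = new BitSet(DAY_MINUTES);
        for (Meeting m : meetings) {
            mask.set(m.from(), m.to());
        }
        return mask;
    }

    public static void main(String[] args) {
        Meeting standup = new Meeting("standup", 120, 180); // 09:00-10:00
        Meeting review = new Meeting("review", 150, 210);   // 09:30-10:30

        // Naive delete: clear the standup's own range on the shared mask.
        BitSet naive = rebuild(List.of(standup, review));
        naive.clear(standup.from(), standup.to());

        // Safe delete: drop the event, then recompute from what remains.
        BitSet rebuilt = rebuild(List.of(review));

        System.out.println("naive says 09:30 busy:   " + naive.get(150));
        System.out.println("rebuilt says 09:30 busy: " + rebuilt.get(150));
    }
}
```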

    And in production, three tiers depending on what you actually need:

    • Just add/move/delete events? Keep the same rule: mutate the event table, then rebuild that person-day's mask from the remaining events. It's 720 bits and one indexed query — microseconds. Never surgically clear bits; recompute them. (Upgrades tab, card 13.)
    • Need to detect conflicts, or count capacity? Swap the bit for a small counter: int[720] per person-day. Add = increment, delete = decrement, busy = count > 0, double-booked = count > 1, "room fits 2" = count < capacity. Same build-then-scan pipeline, one type swap hidden behind EventCalendar. (Upgrades tab, card 12.)
    • Need "which meeting is at 09:15?" That's an interval question, not a bitmap question — keep the events sorted by start time (or an interval tree at scale) next to the mask, and let each structure answer what it's good at.
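The counter variant from the second bullet might look like the following. It is a sketch only: the class name and method names are hypothetical, and indices are minutes relative to 07:00.

```java
final class MinuteCounterSketch {
    private final int[] counts = new int[720]; // one counter per minute of the day

    /** Adds a meeting over [from, to). */
    void add(int from, int to) {
        for (int minute = from; minute < to; minute++) {
            counts[minute]++;
        }
    }

    /** Removes a meeting over [from, to); refuses to go below zero. */
    void remove(int from, int to) {
        for (int minute = from; minute < to; minute++) {
            if (counts[minute] == 0) {
                throw new IllegalStateException("no meeting covers minute " + minute);
            }
        }
        for (int minute = from; minute < to; minute++) {
            counts[minute]--;
        }
    }

    boolean isBusy(int minute) {
        return counts[minute] > 0;
    }

    boolean isDoubleBooked(int minute) {
        return counts[minute] > 1;
    }

    /** True while another meeting still fits, for example a room that holds two. */
    boolean hasCapacity(int minute, int capacity) {
        return counts[minute] < capacity;
    }

    public static void main(String[] args) {
        MinuteCounterSketch alice = new MinuteCounterSketch();
        alice.add(120, 180); // standup 09:00-10:00
        alice.add(150, 210); // review 09:30-10:30
        System.out.println("double-booked at 09:30: " + alice.isDoubleBooked(150));
        alice.remove(120, 180);
        System.out.println("still busy at 09:30:    " + alice.isBusy(150));
    }
}
```

Unlike the bit, the counter survives the delete from the previous example: removing the standup leaves 09:30 at a count of one.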

    The delete trap, drawn: Alice's overlapping standup and review — and what each strategy leaves behind when the standup is deleted.

    Verdict: the BitSet is the right index for "when is everyone free?" and the wrong system of record. Keep it derived — rebuilt, never mutated — and conflicts and deletions can't hurt it. This codebase already follows that rule.
    Q4

    Explain the bonus features.

    The brief said "feel free to go above and beyond". What was added, in order of usefulness:

    • Two extra scheduling queries on the same engine: findFirstAvailableSlot (short-circuits, returns Optional — tested to always equal the head of the full list) and getFreeWindows (no duration; walks the gaps with nextClearBit/nextSetBit).
    • A REST API (Spring Boot, 4 endpoints) wrapped around the untouched core — multipart CSV upload, JSON out, domain errors mapped to 400 { error } with the exact parse message.
    • This site — a live client for the API that doubles as the project's documentation: the day board, the line-by-line data journey, memory-level internals, edge cases demonstrated against your own file. The browser computes nothing; every answer comes from the Java API.
    • Deployability: a Dockerfile and fly.toml — the demo runs as a container, not just on a laptop.
    The point: every bonus reuses MeetingScheduler untouched — the "above and beyond" is proof the core abstraction was right, not a second system bolted on.
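The gap walk behind getFreeWindows can be sketched with the two BitSet calls named above. The Window record and the method name freeWindows are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FreeWindowsSketch {
    static final int DAY_MINUTES = 720;

    /** A free gap as bit indices relative to 07:00, half-open. */
    record Window(int start, int end) {}

    /** Alternates nextClearBit (gap begins) and nextSetBit (gap ends) across the day. */
    static List<Window> freeWindows(BitSet busy) {
        List<Window> windows = new ArrayList<>();
        int start = busy.nextClearBit(0);
        while (start < DAY_MINUTES) {
            int end = busy.nextSetBit(start);
            if (end == -1 || end > DAY_MINUTES) {
                end = DAY_MINUTES; // the gap runs to the end of the day
            }
            windows.add(new Window(start, end));
            start = busy.nextClearBit(end);
        }
        return windows;
    }

    public static void main(String[] args) {
        BitSet busy = new BitSet(DAY_MINUTES);
        busy.set(60, 120);   // 08:00-09:00
        busy.set(660, 720);  // 18:00-19:00
        System.out.println(freeWindows(busy));
    }
}
```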
    Q5

    Which edge cases did you check, and how did you answer them?

    Each answer follows from one of the key assumptions (documented in SOLUTION.md, pinned by tests, demonstrable live on the Edge cases tab):

    Edge case | Ruling & the assumption behind it | Where
    Quoted comma in subject | Parsed as one field; CSV has a spec (RFC 4180), follow it. | CsvEventParserTest
    Header row / BOM / blank lines | Recognised and skipped; real exports have them, and they're noise, not errors. | with-header.csv
    Malformed row (columns, bad time, end<start) | Fail fast with line number + raw line; a dropped row means a wrong answer, and wrong never beats loud. | invalid-row.csv
    Event outside 07:00–19:00 | Clipped to the day (06:00–08:00 blocks only 07:00–08:00); fully outside → ignored. The day bounds are law. | MeetingSchedulerTest
    Overlapping events | Merge silently; busy minutes are a set, not a sum. (BitSet gives this for free.) | overlapping.csv
    Back-to-back events (end == next start) | No conflict: half-open [start, end) ranges. | meetingEndingExactlyAt…
    Zero-length event (start == end) | Legal input, blocks nothing. | zeroLengthEvent…
    Person not in the file | Free all day + logged warning; "all persons available" is satisfiable by someone with no events. (The weakest ruling; Q1 and Q6 both flag it.) | unknownPersonIs…
    Duplicate / padded names ("Alice ") | Trimmed and deduped to one person; case stays significant. | duplicateAndWhitespace…
    Bad duration (0, negative, 90 s, > 12 h) | Rejected with IllegalArgumentException → HTTP 400; garbage questions get errors, not guesses. | validateDuration tests
    Fully booked day | Empty list / Optional.empty(); "no slots" is an answer, not an error. | fully-booked.csv
    Meeting exactly the whole day | One slot at 07:00; the scan condition is <=, so ending exactly at 19:00 counts. | durationExactlyEqual…
    The README example itself | Asserted verbatim, unit + end-to-end; the spec is a pinned regression test. | readme.csv
    Q6

    It works, people are happy — now expose it as a service where people connect or upload whole calendars. How do you design that?

    The one-paragraph answer: make the calendar a stored resource instead of a request parameter. Upload once → get a calendarId → query it forever. Postgres holds the truth, Redis holds the 720-bit masks this demo already computes, and uploads become asynchronous jobs so a big file can never block a query.

    The shape of it:

    UPLOAD (rare, async)
      Client → Gateway (auth) → Calendar Service → S3 (raw file) + queue
               └→ 202 + uploadId
      Worker → parse (the take-home's 5 gates) → Person Registry (names → IDs)
             → Postgres (one transaction) → Redis (rebuild masks) → webhook "ready"

    QUERY (constant, fast)
      Client → Gateway → Availability Service → Redis MGET masks → OR + scan → 200 JSON
               (miss: 1 SQL read → rebuild → cache)
    • "Connect their whole calendar" (Google/Outlook sync) is the same pipeline with a different front end: an OAuth connector pulls events and feeds the identical ingestion path — the parser is just one more event source behind the EventParser seam.
    • Real dates: the unit of computation becomes (person, date) → mask. Events store real timestamps; ingestion slices them into per-date masks in the calendar's timezone. A week query = 7 masks per person. The BitSet doesn't blink.
    • What deliberately doesn't change: the mask math, the fail-fast validation, the write/read seam — the architecture is the take-home's class structure, scaled out.
    The full HLD — every component, why Postgres and not Mongo/Dynamo, what exactly lives in Redis, queue vs Kafka, and the happy path of every API — lives on the Architecture tab, one click left of here.
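The "(person, date) → mask" idea from the real-dates bullet could be sketched like this. Everything here is an assumption for illustration: the Key record, the slice method, whole-minute timestamps, and no daylight-saving shift between 07:00 and 19:00 in the chosen zone.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class PersonDateMaskSketch {
    static final LocalTime DAY_START = LocalTime.of(7, 0);
    static final int DAY_MINUTES = 720;

    /** Hypothetical cache key: one mask per person per calendar date. */
    record Key(String person, LocalDate date) {}

    /** Slices one timestamped event into per-date masks in the calendar's zone. */
    static Map<Key, BitSet> slice(String person, Instant start, Instant end, ZoneId zone) {
        Map<Key, BitSet> masks = new HashMap<>();
        ZonedDateTime s = start.atZone(zone);
        ZonedDateTime e = end.atZone(zone);
        LocalDate last = e.toLocalDate();
        for (LocalDate d = s.toLocalDate(); !d.isAfter(last); d = d.plusDays(1)) {
            ZonedDateTime dayStart = d.atTime(DAY_START).atZone(zone);
            // Minutes relative to 07:00 on this date, clipped to the working day.
            long from = Math.max(0, Duration.between(dayStart, s).toMinutes());
            long to = Math.min(DAY_MINUTES, Duration.between(dayStart, e).toMinutes());
            if (from < to) {
                masks.computeIfAbsent(new Key(person, d), k -> new BitSet(DAY_MINUTES))
                     .set((int) from, (int) to);
            }
        }
        return masks;
    }

    public static void main(String[] args) {
        Map<Key, BitSet> masks = slice("Alice",
                Instant.parse("2024-03-04T18:00:00Z"),
                Instant.parse("2024-03-05T08:00:00Z"),
                ZoneId.of("UTC"));
        masks.forEach((key, mask) -> System.out.println(key + " -> " + mask.cardinality() + " busy minutes"));
    }
}
```

A week query would then fetch seven such masks per person and run the same OR and scan on each date.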