Chapter 3: How Amazon, Shopify, and Marketplaces Work
Why This Exists
To build modern e-commerce systems, you must understand the two dominant architectural paradigms: the Aggregator (Marketplace) and the Platform (Infrastructure). Before you can design a system effectively, you need to know whether you are building a unified marketplace (like Amazon) or a distributed platform (like Shopify). Both models solve the same problem, selling things online, but their software architectures are polar opposites.
Real World Problem
Imagine a small business that makes handmade leather wallets. They want to sell online. They face a choice:
- Build their own website, drive their own traffic, and manage their own brand identity.
- Put the wallets on a massive website where millions of people are already shopping, but compete side-by-side with 50 other wallet makers.
From an engineering perspective, building software for the first option (the Shopify model) requires isolating every seller into their own little world. Building software for the second option (the Amazon model) requires merging every seller into one gigantic, highly competitive database.
Everyday Analogy
- Shopify (The Platform) is like a company that builds shopping malls. They lease you an empty store. You paint the walls, put your name on the door, and you are responsible for getting people to walk into your specific store. If your store burns down, the mall is fine.
- Amazon (The Marketplace) is a gigantic supermarket. Amazon owns the building, the shelves, and the checkout lanes. You just supply the wallets, which they put on Aisle 4 next to your competitor's wallets. The customer doesn't care who you are; they just want the wallet.
Beginner Explanation
- Shopify provides the software for someone to run their own business. When you visit a Shopify site, it looks like an independent brand (e.g., Gymshark, Allbirds). The URL is gymshark.com. The seller owns the customer data.
- Amazon provides the destination. When you visit Amazon, it looks like Amazon. The URL is amazon.com. Amazon owns the customer data; the seller is merely a supplier.
Intermediate Explanation
The architectural differences become clear in how data is structured.
In Shopify, the architecture is Multi-Tenant. A single codebase serves millions of merchants. However, the data is logically isolated. Merchant A cannot see Merchant B's orders. If Merchant A wants a blue website and Merchant B wants a red website, the system must render customized frontends dynamically based on the merchant_id.
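A minimal sketch of rendering driven by merchant_id might look like the following. The lookup tables, hostnames, and the `render_storefront` function are hypothetical illustrations, not Shopify's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    primary_color: str
    brand_name: str


# Hypothetical lookup tables; a real platform would back these with a database.
HOST_TO_MERCHANT = {"merchant-a.example": 1, "merchant-b.example": 2}
THEMES = {1: Theme("blue", "Merchant A"), 2: Theme("red", "Merchant B")}


def render_storefront(host: str) -> str:
    """Resolve the tenant from the request host, then render with its theme."""
    merchant_id = HOST_TO_MERCHANT[host]  # KeyError means an unknown storefront
    theme = THEMES[merchant_id]
    return (
        f'<body style="background:{theme.primary_color}">'
        f"<h1>{theme.brand_name}</h1></body>"
    )
```

The same code path serves both merchants; only the data selected by merchant_id differs.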
In Amazon, the architecture is a Unified Catalog. There is only one product page for "Apple iPhone 15." If 50 different sellers are selling that exact iPhone, they don't get 50 different pages. They all compete for the "Buy Box" on that single page. The algorithm decides which seller actually gets the sale when a customer clicks "Add to Cart."
Advanced Explanation
At scale, the routing and database sharding strategies diverge entirely.
Shopify's Architecture (Pods): Shopify cannot put millions of merchants in one database; it would collapse. Instead, it uses a "Pod" architecture. A Pod is a self-contained unit of infrastructure (web servers, background workers, and a dedicated database shard) that hosts a subset of merchants. When Merchant A signs up, they are assigned to a Pod, say Pod 1; Merchant B might be assigned to Pod 2. If Pod 1 then experiences a massive traffic spike (a "flash sale"), Pod 2 remains unaffected.
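The assignment and routing step can be sketched as a small directory that maps each merchant to a pod. The `PodDirectory` class and its least-loaded placement policy are assumptions made for illustration; the text does not say how Shopify chooses a pod.

```python
from dataclasses import dataclass, field


@dataclass
class PodDirectory:
    """Maps each merchant to a pod; many merchants can share one pod."""

    pod_count: int
    assignments: dict[int, int] = field(default_factory=dict)

    def assign(self, merchant_id: int) -> int:
        """Place a new merchant on the pod hosting the fewest merchants (assumed policy)."""
        if merchant_id in self.assignments:
            return self.assignments[merchant_id]
        load = {pod: 0 for pod in range(1, self.pod_count + 1)}
        for pod in self.assignments.values():
            load[pod] += 1
        # Ties are broken by the lowest pod number.
        chosen = min(load, key=lambda pod: (load[pod], pod))
        self.assignments[merchant_id] = chosen
        return chosen

    def route(self, merchant_id: int) -> int:
        """Return the pod that should serve this merchant's requests."""
        return self.assignments[merchant_id]
```

Because routing is a lookup rather than a computation on the ID, a merchant could later be moved to another pod by updating a single entry.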
Amazon's Architecture (The Buy Box & Global Catalog): Amazon cannot shard by merchant because the catalog is unified. Instead, the natural shard key is the product (ASIN) or the category. When a user views an item, the system queries the Global Catalog to render the page, then queries the Pricing Service, Inventory Service, and Fulfillment Service in parallel and ranks all sellers offering that item in real time to determine who wins the Buy Box.
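The parallel fan-out to the three services can be sketched with asyncio. The in-memory dictionaries stand in for the real services, and the ranking rule (faster shipping first, then lower price) is an assumption; Amazon's actual Buy Box algorithm is not public.

```python
import asyncio

# Hypothetical in-memory stand-ins for the Pricing, Inventory, and Fulfillment services.
PRICES = {"seller_a": 999.00, "seller_b": 990.00}
STOCK = {"seller_a": 12, "seller_b": 3}
SHIP_DAYS = {"seller_a": 2, "seller_b": 2}


async def get_prices(asin: str) -> dict[str, float]:
    return PRICES


async def get_stock(asin: str) -> dict[str, int]:
    return STOCK


async def get_ship_days(asin: str) -> dict[str, int]:
    return SHIP_DAYS


async def pick_buy_box(asin: str) -> str:
    """Query the three services concurrently, then rank the in-stock offers."""
    prices, stock, ship_days = await asyncio.gather(
        get_prices(asin), get_stock(asin), get_ship_days(asin)
    )
    in_stock = [seller for seller in prices if stock.get(seller, 0) > 0]
    # Assumed ranking: faster shipping first, then lower price.
    return min(in_stock, key=lambda seller: (ship_days[seller], prices[seller]))


winner = asyncio.run(pick_buy_box("B08F7PTF53"))
```

With this sample data both sellers ship in two days, so the lower price decides and `winner` is `"seller_b"`.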
Real World Example
Kylie Cosmetics on Shopify:
When Kylie Jenner launched her cosmetics line on Shopify, a single tweet would send hundreds of thousands of buyers to her specific store within seconds. Shopify's routing layer detected this massive surge directed at her specific merchant_id and allocated dedicated compute resources just to her Pod to prevent her flash sale from taking down the rest of the Shopify network.
Architecture Design
Here is a conceptual view of how the two models differ in handling a product request:
graph TD
    subgraph Shopify["Shopify (Platform Model)"]
        Req1["User requests Gymshark.com"] --> Router1["Edge Router"]
        Router1 --> Pod1["Pod 1: Gymshark DB & App"]
        Pod1 --> Res1["Gymshark Branded Page"]
    end
    subgraph Amazon["Amazon (Marketplace Model)"]
        Req2["User requests iPhone on Amazon.com"] --> Router2["Load Balancer"]
        Router2 --> Cat_Service["Unified Catalog Service"]
        Cat_Service --> Pricing["Buy Box Engine"]
        Pricing --> SellerA("Seller A: $999")
        Pricing --> SellerB("Seller B: $990 - WINS")
        Pricing --> Res2["Amazon Page - Buy from Seller B"]
    end
Database Design
Shopify (Isolated):
Every query relies heavily on merchant_id.
SELECT * FROM orders WHERE merchant_id = 123 AND order_id = 456;
To scale, the database is physically sharded by merchant, so that merchant_ids 1-10000 live on Server A and 10001-20000 live on Server B.
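A range-based shard lookup for the two ranges above could be sketched as follows. The server names and the `shard_for` function are hypothetical.

```python
import bisect

# Inclusive upper bound of each merchant_id range, paired with its server.
SHARD_UPPER_BOUNDS = [10_000, 20_000]
SHARD_SERVERS = ["server_a", "server_b"]


def shard_for(merchant_id: int) -> str:
    """Return the server that holds this merchant's rows."""
    if merchant_id < 1:
        raise ValueError("merchant_id must be positive")
    # bisect_left finds the first range whose upper bound is >= merchant_id.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, merchant_id)
    if index == len(SHARD_SERVERS):
        raise LookupError(f"no shard configured for merchant_id {merchant_id}")
    return SHARD_SERVERS[index]
```

Range sharding is easy to reason about, but new merchants all land on the newest range, so the most recent shard tends to grow fastest.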
Amazon (Unified): The catalog is keyed by a unique product identifier (ASIN). Sellers are joined to the product.
-- Conceptual lookup by ASIN (written as SQL; a NoSQL store would use an equivalent key lookup)
SELECT * FROM product_listings WHERE asin = 'B08F7PTF53';
-- Returns a list of all sellers and their current prices/inventory for this ASIN.
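To see the product-keyed lookup run end to end, here is a small sketch using Python's built-in sqlite3 module. The column names and sample rows are assumptions; only the table name and the ASIN filter come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_listings ("
    " asin TEXT NOT NULL, seller_id TEXT NOT NULL,"
    " price REAL NOT NULL, inventory INTEGER NOT NULL,"
    " PRIMARY KEY (asin, seller_id))"
)
conn.executemany(
    "INSERT INTO product_listings VALUES (?, ?, ?, ?)",
    [
        ("B08F7PTF53", "seller_a", 999.00, 12),
        ("B08F7PTF53", "seller_b", 990.00, 50),
        ("B000000000", "seller_a", 25.00, 7),  # unrelated product, made-up ASIN
    ],
)


def offers_for(asin: str) -> list[tuple[str, float, int]]:
    """Return every seller's offer for one product, cheapest first."""
    rows = conn.execute(
        "SELECT seller_id, price, inventory FROM product_listings"
        " WHERE asin = ? ORDER BY price",
        (asin,),
    )
    return rows.fetchall()


offers = offers_for("B08F7PTF53")
```

The composite primary key (asin, seller_id) expresses the rule that each seller has at most one offer per product.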
API Design
Shopify API:
APIs are contextually bound to the store. You authenticate as a specific store, so the path doesn't need a store ID.
GET /admin/api/2024-01/products.json (The request is sent to the store's own domain, e.g. {store}.myshopify.com, with an access token scoped to that store.)
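As a sketch, the request can be constructed like this. The store is identified by the hostname and by a store-scoped token sent in the X-Shopify-Access-Token header. The shop domain and token are placeholders, and the code only builds the request; it does not send it.

```python
from urllib.request import Request


def build_products_request(shop_domain: str, access_token: str) -> Request:
    """Build (but do not send) a product-list request for one store."""
    url = f"https://{shop_domain}/admin/api/2024-01/products.json"
    return Request(
        url,
        headers={"X-Shopify-Access-Token": access_token},
        method="GET",
    )


# Placeholder values; a real call needs a real store and a valid token.
req = build_products_request("example-store.myshopify.com", "placeholder-token")
```

Passing `req` to `urllib.request.urlopen` would perform the call. Note that nothing in the path names the store.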
Marketplace API:
APIs are platform-wide. A seller authenticates to update their specific offer on a global product.
POST /api/v1/offers
Payload: { "asin": "B08F7PTF53", "price": 990.00, "inventory": 50 }
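A handler for this endpoint might be sketched as below. The token table, the in-memory offer store, and the status codes are all assumptions. The point is that the seller comes from the credentials while the product comes from the payload.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    price: float
    inventory: int


# Hypothetical token table and offer store keyed by (asin, seller_id).
TOKEN_TO_SELLER = {"token-seller-b": "seller_b"}
OFFERS: dict[tuple[str, str], Offer] = {}


def post_offer(token: str, payload: dict) -> int:
    """Handle POST /api/v1/offers and return an HTTP-style status code."""
    seller_id = TOKEN_TO_SELLER.get(token)
    if seller_id is None:
        return 401  # unknown credentials
    try:
        asin = str(payload["asin"])
        price = float(payload["price"])
        inventory = int(payload["inventory"])
    except (KeyError, TypeError, ValueError):
        return 400  # missing or malformed field
    if price <= 0 or inventory < 0:
        return 400
    # Upsert: one offer per seller per product.
    OFFERS[(asin, seller_id)] = Offer(price, inventory)
    return 200
```

Because the seller ID never appears in the payload, a seller cannot overwrite a competitor's offer by editing the request body.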
Production Considerations
- The Flash Sale Problem (Shopify): Multi-tenant systems must protect against "Noisy Neighbors," where one merchant's viral success drains resources from other merchants sharing the same infrastructure.
- The Buy Box Latency (Amazon): Calculating the winning seller requires real-time data on price, shipping speed, and seller rating. This algorithm must execute in milliseconds, which in practice means serving its inputs from an in-memory cache (e.g., Redis) rather than querying each source system on every page view.
Security Considerations
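- Tenant Isolation: In a platform like Shopify, a bug that allows SQL injection or cross-tenant data access is catastrophic. Every query must be scoped to the current tenant, enforced by a mechanism such as database Row-Level Security (RLS) or a mandatory tenant filter in the data-access layer, so that one tenant can never read another's orders.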
- Fraud and Counterfeits: In a marketplace, malicious sellers will try to list counterfeit goods under legitimate ASINs. Machine learning systems must constantly scan new seller listings for anomalies.
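The tenant-isolation point can be illustrated with a data-access wrapper that always applies the merchant_id filter. This is a sketch of application-level scoping using sqlite3, not database-enforced RLS. The `TenantOrders` class and the `total` column are hypothetical.

```python
import sqlite3


class TenantOrders:
    """Data-access wrapper that adds the merchant_id filter to every query."""

    def __init__(self, conn: sqlite3.Connection, merchant_id: int) -> None:
        self._conn = conn
        self._merchant_id = merchant_id

    def get(self, order_id: int):
        # Bound parameters keep caller input out of the SQL text,
        # which guards against SQL injection.
        return self._conn.execute(
            "SELECT order_id, total FROM orders"
            " WHERE merchant_id = ? AND order_id = ?",
            (self._merchant_id, order_id),
        ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (merchant_id INTEGER, order_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(123, 456, 59.0), (999, 457, 12.0)],
)
```

Here `TenantOrders(conn, 123).get(457)` returns `None` even though order 457 exists, because that order belongs to merchant 999.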
Common Mistakes
- Building a Marketplace when you need a Platform: Trying to force a unified catalog onto a business model where sellers want their own branded identity and isolated storefronts.
- Ignoring the "Noisy Neighbor" effect: Designing a multi-tenant database without rate-limiting per tenant, leading to system-wide outages during localized traffic spikes.
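Per-tenant rate limiting is commonly built on a token bucket. The sketch below keeps one bucket per merchant_id; the class name, capacity, and refill rate are illustrative assumptions. The clock is passed in explicitly to keep the example deterministic.

```python
class TenantRateLimiter:
    """Token-bucket limiter with a separate bucket for each merchant."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        # merchant_id -> (tokens remaining, time of last update)
        self._buckets: dict[int, tuple[float, float]] = {}

    def allow(self, merchant_id: int, now: float) -> bool:
        """Return True if this merchant may make one more request at time `now`."""
        tokens, updated_at = self._buckets.get(merchant_id, (self.capacity, now))
        elapsed = max(0.0, now - updated_at)
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[merchant_id] = (tokens, now)
        return allowed
```

In a running service `now` would typically come from `time.monotonic()`. A merchant who exhausts their own bucket is throttled while other merchants' buckets stay full.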
Tradeoffs and Alternatives
- Control vs. Reach: As a business, you choose Shopify to own your brand and customer data (Control). You choose Amazon to get immediate access to millions of buyers (Reach).
- SaaS vs. Custom: You can build a custom Shopify-like system using modern tools, but handling the routing, multi-tenant data isolation, and global CDN caching is a massive operational burden compared to paying a SaaS fee.
Interview Questions
- You are designing a SaaS e-commerce platform. How do you design the database so that one massive customer doesn't slow down the database for all the small customers?
- Explain how you would architect the "Buy Box" feature for a marketplace where 100 sellers offer the exact same product.
- What is the difference between a Tenant and a User in a multi-tenant architecture?
Hands-On Exercise
- Go to Amazon.com, search for a popular book or electronic device. Look below the "Add to Cart" button for the link that says "New & Used from $X.XX". Click it to see how the marketplace actually structures multiple sellers under one product.
- Go to a Shopify store (e.g., Allbirds.com). Notice how nothing on the frontend indicates it is Shopify. Inspect the page source (Ctrl+U) and search for the word shopify to see the underlying infrastructure leaking through the HTML.
Key Takeaways
- Platforms (Shopify) provide isolated infrastructure for independent brands. Data is sharded by merchant.
- Marketplaces (Amazon) provide a unified shopping destination. Data is unified by product, and sellers compete.
- Multi-tenant architectures require strict data isolation and protection against noisy neighbors.
Further Reading
- Shopify Engineering Blog: "A Pod's Life"
- "The Everything Store: Jeff Bezos and the Age of Amazon" by Brad Stone