Chapter 6: Product Variants and SKUs
Why This Exists
If a company sells a t-shirt in 3 colors and 4 sizes, they technically have 12 distinct physical items in their warehouse. If they display these as 12 separate products on their website, the user experience will be terrible. Customers want to click one "T-Shirt" and select their size and color from dropdown menus. The architecture of Variants and SKUs exists to bridge the gap between a unified visual product (what the customer sees) and the specific, trackable physical item (what the warehouse ships).
Real World Problem
A startup builds an e-commerce site where each row in the products table represents a physical item. They launch a shoe in 10 sizes. The homepage now shows 10 identical pictures of the same shoe. To fix this, a developer tries to hack the UI by grouping them together on the frontend. Then, the marketing team changes the description of the shoe, but the developer has to update 10 different database rows. Eventually, data drifts, prices mismatch, and the system collapses under its own redundancy.
Everyday Analogy
Think of buying a new car.
- The Product is a "Ford Mustang."
- The Options are the Color (Red, Blue) and the Engine (V6, V8).
- The Variant is the specific combination: "Red Ford Mustang with a V8."
- The SKU (Stock Keeping Unit) is the specific VIN (Vehicle Identification Number) barcode slapped on the actual car sitting in the lot.
Beginner Explanation
- Parent Product: The overarching item. It holds the shared information: the main Title, the core Description, and the main image.
- Variant: A specific version of the Parent Product. It holds information specific to that version: the Price (a V8 costs more than a V6) and the specific Variant Image (the picture of the Red car).
- SKU (Stock Keeping Unit): A unique alphanumeric code (like
TSHIRT-RED-LRG) used by warehouse workers to scan and track the physical box on a shelf.
Intermediate Explanation
In database architecture, this is modeled as a Parent-Child relationship.
The Parent (products table) acts as an umbrella. The Children (variants table) belong to the Parent.
Every Variant must have its own SKU, its own Price, and its own Inventory count.
When a user visits /products/ford-mustang, the server queries the Parent Product, and then queries all associated Variants. The frontend uses this data to build the dropdown menus (Color, Engine). When the user changes a dropdown, the frontend dynamically swaps the price and image to match the selected Variant.
Advanced Explanation
Handling complex Variant Matrices is one of the hardest problems in e-commerce. If a product has 3 option types (Size, Color, Material), and each has 5 values, that is $5 \times 5 \times 5 = 125$ variants. What if the "Leather" material is out of stock in "Red," but only for "Size Small"?
The data model must separate the Options from the Variants.
- Options:
Color,Size. - Option Values:
Red,Large. - Variants: The intersection of Option Values (e.g., Variant 1 = Red + Large).
For high-performance search, these deeply relational tables must be denormalized. Elasticsearch indexes the Parent Product, but includes a nested array of all its Variants, so a user filtering for "Red" finds the Parent Product, but the UI knows to pre-select the Red variant.
Real World Example
Shopify vs. Amazon:
- Shopify strictly limits merchants to 3 Option Types (e.g., Size, Color, Style) and a maximum of 100 Variants per product. This hard constraint keeps their database queries insanely fast and prevents merchants from creating million-variant matrices that crash browsers.
- Amazon uses a system called "Twisters." They don't enforce a strict 3-option limit, allowing incredibly complex parent-child relationships (which is why you sometimes see bizarre, unrelated items clustered together on an Amazon product page if a seller manipulated the Twister logic).
Architecture Design
Here is the relational data model for Variants:
erDiagram
PRODUCT ||--o{ PRODUCT_VARIANT : has
PRODUCT {
int id
string title
string description
}
PRODUCT_VARIANT ||--o{ INVENTORY : tracked_in
PRODUCT_VARIANT {
int id
int product_id
string sku
decimal price
string image_url
}
PRODUCT_VARIANT ||--o{ VARIANT_OPTION_VALUE : defined_by
VARIANT_OPTION_VALUE {
int variant_id
string option_name
string option_value
}
%% Example: option_name="Size", option_value="Large"
Database Design
1. Products Table (Parent):
CREATE TABLE products (
id INT PRIMARY KEY,
title VARCHAR(255),
description TEXT
);
2. Variants Table (Child):
CREATE TABLE variants (
id INT PRIMARY KEY,
product_id INT,
sku VARCHAR(100) UNIQUE,
price DECIMAL(10,2),
stock_count INT,
FOREIGN KEY (product_id) REFERENCES products(id)
);
3. Variant Options (The Matrix):
CREATE TABLE variant_options (
variant_id INT,
option_name VARCHAR(50), -- e.g., 'Color'
option_value VARCHAR(50), -- e.g., 'Red'
PRIMARY KEY (variant_id, option_name)
);
API Design
Fetching a Product and its Variants:
GET /api/v1/products/123
{
"id": 123,
"title": "Classic T-Shirt",
"variants": [
{
"id": 991,
"sku": "TSH-RED-S",
"price": 20.00,
"options": {"Color": "Red", "Size": "Small"}
},
{
"id": 992,
"sku": "TSH-BLU-S",
"price": 22.00,
"options": {"Color": "Blue", "Size": "Small"}
}
]
}
Production Considerations
- Variant Explosion (Cartesian Products): An API that blindly generates variants for every combination of options can cause memory exhaustion. If a merchant inputs 10 sizes, 10 colors, and 10 patterns, generating 1,000 variants synchronously will time out the request. This generation must be done via asynchronous background jobs.
- Search Indexing: If a Parent product has 100 variants, and 1 variant goes out of stock, updating the entire Elasticsearch document for the Parent just for that 1 variant is inefficient but necessary.
Security Considerations
- Price Tampering: A malicious user might manipulate the frontend DOM to swap the ID of an expensive Variant (a $2000 Gold Watch) with the ID of a cheap Variant (a $20 Silver Watch) before submitting the checkout form. The backend must never trust the price sent by the client; it must always look up the price using the
variant_id.
Common Mistakes
- Putting Price on the Parent: Assuming all sizes of a shirt cost the same. When the business decides XXL costs $2 more, the database schema breaks.
- No SKU uniqueness: Allowing two variants to have the same SKU. The warehouse will ship the wrong item, guaranteed. SKU must have a
UNIQUEdatabase constraint.
Tradeoffs and Alternatives
- Relational vs. JSON Arrays: Storing variants as a JSON array inside the
productstable makes read queries extremely fast but makes updating the stock count of a single SKU very complex. Relational tables are preferred for variants because pricing and inventory are highly transactional.
Interview Questions
- Explain the difference between a Product, a Variant, and a SKU.
- How would you design a database to handle a product that has options for Size, Color, and Material?
- Why does Shopify limit merchants to 100 variants per product?
Hands-On Exercise
- Pick an item of clothing you are wearing right now.
- Create a mock SKU code for it.
- Write out the JSON payload for the Parent Product and the specific Variant you are wearing.
Key Takeaways
- Parents handle display (Title, Description). Children (Variants) handle commerce (Price, SKU, Inventory).
- The Variant Matrix is a Cartesian product of Option Values.
- Enforcing limits on variant generation is critical to protect database performance.
- Never trust frontend pricing; always validate against the Variant ID in the database.
Further Reading
- Shopify API Documentation: Products and Variants
- E-commerce Data Modeling Patterns