Captain Codeman

Schema Versioning with Google Firestore

Auto upgrade your entities

What Is Schema Versioning & Why Use It?

NoSQL is great: freedom from those rigid, fixed database schemas and data migrations. What could possibly go wrong?

It’s not really schema-free, of course. Each entity can have its own schema, which means that as your app evolves your data structures may change with it. Running a database migration every time that happens is one approach, but if you’re using NoSQL for its scale and flexibility then needing to coordinate code deployment and database updates makes things difficult, if not impossible, without downtime.

One approach to solving this is Schema Versioning. This is where your application always writes data based on the current version of the schema but existing data is left “as-is” and the app knows how to “upgrade” these existing entities as they are read, so the upgrade happens gradually instead of needing to be done as part of a deployment.

This article shows how to handle Schema Versioning with Firestore and TypeScript.

Simple Example

It’s easier to work with a simple example so picture the scene - you create a new startup, the 10 Dollar Store! (it’s like the Dollar Store but with today’s inflation).

Everything in your fictional store will cost $10 (or $9.99 because that fools people into thinking things are cheaper, right?)

You code your app and the product collection is super-simple - each product just needs a name, the prices are fixed after all:

export interface Product {
  name: string
}

Business is good and soon you start to expand the products you offer. It turns out that only selling things that cost $10 is limiting and inflation starts eating into your profits so you want to be able to change some prices. So you add a price field to your product entities. You’re based in the US so prices will always be in USD … right?

export interface Product {
  name: string
  usd: number
}

This gives things a boost and soon you’re looking to expand internationally which is when you realize that having the price field be fixed as USD was a mistake, so you add a proper Money type to your schema:

export type Currency = 'USD' | 'CAD' | 'GBP' | 'EUR' | 'AUD'

export interface Money {
  currency: Currency
  amount: number
}

export interface Product {
  name: string
  price: Money
}

Along the way you don’t update any existing data - you just write new product entities using whatever the schema is at the time. How does your app handle reading products?

Schema Versioning The Hard Way

The difficult way to do it would be to check the fields when data is read. Does the entity have a price property? If it does, then it’s the latest version already. If not then does it have the usd property? Again, if it does then it was the 2nd version, if not then it was the original schema version.

At each step the entity can be ‘upgraded’ to match the latest format so application code only ever has to handle the one, latest, schema.

This is doable but the logic can become complex, and it doesn’t work at all when the change is to how a field’s value should be interpreted rather than to which fields exist. Is an image name the ID of an image assumed to have a .jpg extension, or is it now a full image path that could end in .webp or .avif? Again, it’s doable but it is less exact and easy to get wrong.
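As a rough sketch of what that field-sniffing looks like for the three product shapes above (the StoredProduct type and the upgradeByShape name are hypothetical, made up for this illustration, and the 9.99 default is an assumption about what the original fixed price was):

```typescript
type Currency = 'USD' | 'CAD' | 'GBP' | 'EUR' | 'AUD'

interface Money {
  currency: Currency
  amount: number
}

interface Product {
  name: string
  price: Money
}

// hypothetical loose type: whatever shape might be in the database
interface StoredProduct {
  name: string
  usd?: number
  price?: Money
}

// guesses the schema version from which fields are present
function upgradeByShape(stored: StoredProduct): Product {
  if (stored.price !== undefined) {
    // has a price, so it is already the latest shape
    return { name: stored.name, price: stored.price }
  }
  if (stored.usd !== undefined) {
    // second shape: a plain USD amount
    return { name: stored.name, price: { currency: 'USD', amount: stored.usd } }
  }
  // original shape: no price at all, so assume the fixed price
  return { name: stored.name, price: { currency: 'USD', amount: 9.99 } }
}
```

Every new schema change adds another branch to reason about, and nothing in the data itself says which branch is the right one.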

Adding a Schema Version Field

A better approach is to add a version field to our schema. Even if we don’t start with schema versioning planned, we can introduce the version value when we make the first schema change.

Let’s update our TypeScript definitions to include a version number. We’ll use v for the property name:

export type Currency = 'USD' | 'CAD' | 'GBP' | 'EUR' | 'AUD'

export interface Money {
  currency: Currency
  amount: number
}

export interface ProductBase {
  v: number
}

export interface ProductV1 extends ProductBase {
  v: 1
  name: string
}

export interface ProductV2 extends ProductBase {
  v: 2
  name: string
  usd: number
}

export interface ProductV3 extends ProductBase {
  v: 3
  name: string
  price: Money
}

// latest product schema
export type Product = ProductV3

// versioned product schemas
export type ProductVersioned = ProductV1 | ProductV2 | ProductV3

As our system evolved, we added the new product versions. The Product type always points to the latest, and we added a union type called ProductVersioned which is the combination of all of them. This will allow us to use discriminated unions when reading a product to update its schema.

Version Upgrades

To show how the entity upgrading works, we’ll add a method that will accept any product entity version and upgrade it to match the latest schema version.

export function upgradeProduct(product: ProductVersioned): Product {
  // initial, unversioned products will be missing
  // the version number so we set it here
  // it's just easier than trying to deal with an
  // undefined value in discriminated unions
  if (product.v === undefined) product.v = 1

  switch (product.v) {
    case 1:
      // upgrade from v1 to v2 by adding usd with default price
      product = {
        v: 2,
        name: product.name,
        usd: 9.99,
      }
    // falls through (intentionally, to apply the next upgrade)
    case 2:
      // upgrade from v2 to v3 by converting usd to Money object
      product = {
        v: 3,
        name: product.name,
        price: { currency: 'USD', amount: product.usd },
      }
    // falls through (intentionally, to apply the next upgrade)
    case 3:
      // the latest schema version ... for now!
      return product
    default:
      throw assertNever(product)
  }
}

export function assertNever(x: never): never {
  // stringify so the error shows the offending data, not "[object Object]"
  throw new Error("Unexpected object: " + JSON.stringify(x));
}

The trick here is the TypeScript Discriminated Union - inside each case of the switch block, TypeScript knows exactly what the shape of that version of the product schema was, which makes it easier to correctly migrate it to the next version.

Each version falls through to the next until it reaches the latest version which is returned. The beauty of this approach is that we don’t change any existing schema version upgrade code when adding a new version, we just add code to make the next incremental change.

The default that throws an assertNever makes the version check exhaustive. It will cause a type error if we add a new product version and don’t handle it in the upgrade code or add upgrade code and forget to add the version to the ProductVersioned union type.
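One caveat: the fall-through style is reported as an error if your tsconfig enables the noFallthroughCasesInSwitch compiler option. A possible alternative, sketched here under the assumption that v has already been set on the incoming entity, is one small function per upgrade step with every case returning. The names v1ToV2, v2ToV3 and upgradeProductStepwise are made up for this illustration:

```typescript
type Currency = 'USD' | 'CAD' | 'GBP' | 'EUR' | 'AUD'

interface Money {
  currency: Currency
  amount: number
}

interface ProductV1 { v: 1; name: string }
interface ProductV2 { v: 2; name: string; usd: number }
interface ProductV3 { v: 3; name: string; price: Money }

type Product = ProductV3
type ProductVersioned = ProductV1 | ProductV2 | ProductV3

function assertNever(x: never): never {
  throw new Error('Unexpected object: ' + JSON.stringify(x))
}

// one function per incremental upgrade; each returns a new object
const v1ToV2 = (p: ProductV1): ProductV2 => ({ v: 2, name: p.name, usd: 9.99 })

const v2ToV3 = (p: ProductV2): ProductV3 => ({
  v: 3,
  name: p.name,
  price: { currency: 'USD', amount: p.usd },
})

// every case returns, so nothing falls through; older versions
// recurse until they reach the latest schema
function upgradeProductStepwise(product: ProductVersioned): Product {
  switch (product.v) {
    case 1:
      return upgradeProductStepwise(v1ToV2(product))
    case 2:
      return upgradeProductStepwise(v2ToV3(product))
    case 3:
      return product
    default:
      return assertNever(product)
  }
}
```

The exhaustiveness check still applies, and because each step builds a new object the entity that was passed in is left untouched.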

So we have a function that we can pass in any product version and it will upgrade it to the latest version.

Here’s a unit test to show we always get back the same latest version, whatever version is passed in:

import { ProductVersioned, upgradeProduct } from "./schema"

const expected = {
  v: 3,
  name: 'Test',
  price: { currency: 'USD', amount: 9.99 },
}

describe('schema', () => {
  it('handles v0', async () => {
    const product = upgradeProduct(getProductVersion(0))
    expect(product).toEqual(expected)
  })

  it('handles v1', async () => {
    const product = upgradeProduct(getProductVersion(1))
    expect(product).toEqual(expected)
  })

  it('handles v2', async () => {
    const product = upgradeProduct(getProductVersion(2))
    expect(product).toEqual(expected)
  })

  it('handles v3', async () => {
    const product = upgradeProduct(getProductVersion(3))
    expect(product).toEqual(expected)
  })
})

export function getProductVersion(id: number): ProductVersioned {
  switch (id) {
    case 0:
      return { name: 'Test' } as ProductVersioned
    case 1:
      return { v: 1, name: 'Test' }
    case 2:
      return { v: 2, name: 'Test', usd: 9.99 }
    case 3:
      return { v: 3, name: 'Test', price: { currency: 'USD', amount: 9.99 } }
    default:
      throw new Error('unexpected version')
  }
}

In case you’re wondering about the v0 test and the check for v being undefined in the upgrade function - that is to handle original Product entities that didn’t have any version property set. But undefined doesn’t work with discriminated unions to narrow the type so this is a simpler, more pragmatic approach to dealing with it.
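If you would rather not mutate the incoming object or cast it in tests, another option is to describe the unversioned shape explicitly and normalize it before the upgrade runs. This is only a sketch: StoredV0, StoredProduct and withVersion are hypothetical names, and it assumes a reasonably recent TypeScript version that narrows on an `=== undefined` check of the v property:

```typescript
type Currency = 'USD' | 'CAD' | 'GBP' | 'EUR' | 'AUD'

interface Money {
  currency: Currency
  amount: number
}

interface ProductV1 { v: 1; name: string }
interface ProductV2 { v: 2; name: string; usd: number }
interface ProductV3 { v: 3; name: string; price: Money }

type ProductVersioned = ProductV1 | ProductV2 | ProductV3

// hypothetical type for the oldest documents, written before v existed
interface StoredV0 {
  name: string
  v?: undefined
}

// anything that might come back from the datastore
type StoredProduct = StoredV0 | ProductVersioned

// returns a versioned product, treating a missing v as version 1
function withVersion(stored: StoredProduct): ProductVersioned {
  if (stored.v === undefined) {
    const v1: ProductV1 = { v: 1, name: stored.name }
    return v1
  }
  return stored
}
```

The result can then be passed to upgradeProduct, which would no longer need its own undefined check.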

Schema Versioning With Firestore

Nothing so far has really been Firestore related and the technique to read an entity and automatically upgrade it should be usable with any NoSQL datastore. But Firestore recently added a withConverter method which makes it easier to save and load strongly-typed data and it’s also a convenient place to hook in to our versioning code.

First, the code to initialize Firestore. This is using the firebase-admin SDK but the same approach works for using the client-side firebase SDK or the @google-cloud/firestore Node Firestore client directly. I also use a helper function to convert any Firestore Timestamp properties to regular JavaScript Date objects - I rarely need nanosecond precision, so this should really be a built-in option on the client.

import { getFirestore, QueryDocumentSnapshot, WithFieldValue, FirestoreDataConverter, Timestamp } from 'firebase-admin/firestore'
import { app } from './app'
import { Product, ProductVersioned, upgradeProduct } from './schema'

export const firestore = getFirestore(app)

// recursively converts firestore Timestamp values to regular
// JS Date objects (the object passed in is modified in place)
const convert = (obj: any) => {
  for (const prop in obj) {
    const value = obj[prop]
    if (value instanceof Timestamp) {
      obj[prop] = value.toDate()
    } else if (value !== null && typeof value === 'object') {
      // arrays are objects too, so nested arrays are walked here as well
      convert(value)
    }
  }
  return obj
}

const toJS = <T>(doc: QueryDocumentSnapshot, id?: string) => {
  return <T>{ ...convert(doc.data()), id: id ? id : decodeURIComponent(doc.id) }
}

The final part is defining our custom converter, which simply needs to cast the loaded data as the “any product version” type (ProductVersioned) which is then upgraded to the latest version:

const productConverter = {
  toFirestore: (product: WithFieldValue<Product>) => {
    return product
  },
  fromFirestore: (snapshot: QueryDocumentSnapshot) => {
    const product = toJS<ProductVersioned>(snapshot)
    return upgradeProduct(product)
  }
} satisfies FirestoreDataConverter<Product>

This is used by including a .withConverter(productConverter) in any get or query that loads data from Firestore:

export async function fetchProduct(id: number): Promise<Product> {
  const snap = await firestore.doc(`products/${id}`).withConverter(productConverter).get()
  return snap.data() as Product
}

Now our codebase only needs to handle the latest Product schema and we don’t have to worry too much about entities that were written with previous versions.

Whether you need to export all the types depends on how you organize your code. Personally, I think it’s nice to have a single Product type that the rest of the codebase sees and to keep the ProductVersioned and version specific types internal to the loading module if possible.

If we wanted to unit test our Firestore converter (maybe with the Firestore Emulator as part of testing security rules etc…), we could have something similar to the previous unit test but would need some predefined test data to be loaded:

(Image: Firestore schema loading test data)

import { fetchProduct } from "./firestore"

const expected = {
  v: 3,
  name: 'Test',
  price: { currency: 'USD', amount: 9.99 },
}

describe('schema', () => {
  it('load v1', async () => {
    const product = await fetchProduct(1)
    expect(product).toEqual(expected)
  })

  it('load v2', async () => {
    const product = await fetchProduct(2)
    expect(product).toEqual(expected)
  })

  it('load v3', async () => {
    const product = await fetchProduct(3)
    expect(product).toEqual(expected)
  })
})
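The tests above assume that documents products/1, products/2 and products/3 already exist, each stored in the matching schema version. One way to load them could look something like the sketch below. The seedProducts and seed names are made up for this example, and the write callback is a hypothetical stand-in for whatever actually stores a document:

```typescript
// one test document per schema version, keyed by Firestore path
const seedProducts: Record<string, object> = {
  'products/1': { v: 1, name: 'Test' },
  'products/2': { v: 2, name: 'Test', usd: 9.99 },
  'products/3': { v: 3, name: 'Test', price: { currency: 'USD', amount: 9.99 } },
}

// hypothetical callback that stores one document at the given path
type Write = (path: string, data: object) => Promise<unknown>

// writes every seed document and resolves with how many were written
async function seed(write: Write): Promise<number> {
  const entries = Object.entries(seedProducts)
  await Promise.all(entries.map(([path, data]) => write(path, data)))
  return entries.length
}
```

With the firestore instance from earlier, the callback could be (path, data) => firestore.doc(path).set(data), run once in a test setup hook before the tests load anything.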