The Nuxt Content v1 Content.DB database and file size

Background

I've been using NuxtJS successfully for a number of client sites, and this one, for quite a while. One of my clients, MKG Marketing, has produced a lot more content than the others, with over 200 podcast episodes, most containing a full transcript. It's been going very well, with them maintaining all the markdown content in CloudCannon and the site itself being built and published with Cloudflare Pages.

MKG, like DamienG.com, is a Nuxt 2 site I haven't upgraded yet. The Nuxt upgrade guide has failed to materialize even though Nuxt 3 has been out for a year and a half, so it's probably safe to assume it's not coming. There is some Nuxt Content-specific migration guidance available, however.

Getting back to the present, we started seeing a build error on Cloudflare Pages that looks like this:

16:09:05.122    ✘ [ERROR] Error: Pages only supports files up to 25 MiB in size
16:09:05.122    
16:09:05.122      _nuxt/content/db-931b471c.json is 25.1 MiB in size

Why would NuxtJS be generating a file that large? The first thing to do is to dig in and see what's in there...
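To catch this locally before a deploy fails, a small Node script can walk the generated output and list anything over the limit. This is only a sketch: the function name findOversizedFiles and the dist folder in the usage comment are my own assumptions, and the 25 MiB figure is taken from the error message above.

```javascript
const fs = require('fs');
const path = require('path');

// Per-file limit quoted in the Pages build error (25 MiB).
const LIMIT_BYTES = 25 * 1024 * 1024;

// Recursively collect every file under `dir` whose size exceeds `limit`.
// (Hypothetical helper, not part of Nuxt or Cloudflare tooling.)
function findOversizedFiles(dir, limit = LIMIT_BYTES) {
  const oversized = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      oversized.push(...findOversizedFiles(fullPath, limit));
    } else if (entry.isFile()) {
      const { size } = fs.statSync(fullPath);
      if (size > limit) oversized.push({ file: fullPath, size });
    }
  }
  return oversized;
}

// Usage after `yarn generate`, assuming the default `dist` output folder:
//   findOversizedFiles('dist').forEach(({ file, size }) =>
//     console.log(`${file} is ${(size / 1024 / 1024).toFixed(1)} MiB`));
```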

What is the content.db file?

The content.db is a JSON file created by the @nuxt/content module when building sites in static mode (nuxt generate with target: 'static' in the nuxt.config.js) and placed into the /dist/_nuxt/content folder.

The JSON is minified, so if you want to take a peek, find a good JSON viewer. I used Firefox, which was a bit slow to initially load the file, but once it did it was easy to navigate. The contents look like this:

Firefox showing the contents of content.db

This file is actually a serialization of the LokiDB database that Nuxt Content v1 uses for two things:

  1. Client-side querying of content
  2. Serverless full-text search

We can see that in the JSON there is a single collection named items. It contains all the content under the _data branch for client-side querying, and tokenized words for the full-text search under the _fullTextSearch branch, further broken down by search field, e.g. title, description, text etc.
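If you'd rather measure than eyeball it, something like the following reports roughly how many bytes each branch contributes. It's a sketch under assumptions: that the top level has a _collections array, that each collection carries its name in a name property, and that _data and _fullTextSearch sit directly on the collection. The helper names are mine.

```javascript
// Approximate size in bytes of a value once serialized back to JSON.
function jsonSize(value) {
  return value === undefined ? 0 : Buffer.byteLength(JSON.stringify(value), 'utf8');
}

// Summarize how much each collection's _data and _fullTextSearch branches weigh.
// (Hypothetical helper; the property layout is assumed from inspecting the file.)
function summarizeContentDb(db) {
  return db._collections.map((collection) => ({
    name: collection.name,
    items: Array.isArray(collection._data) ? collection._data.length : 0,
    dataBytes: jsonSize(collection._data),
    fullTextSearchBytes: jsonSize(collection._fullTextSearch),
  }));
}

// Usage, with the real hashed file name substituted in:
//   const db = JSON.parse(require('fs').readFileSync('dist/_nuxt/content/db-<hash>.json', 'utf8'));
//   console.table(summarizeContentDb(db));
```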

Given that, it's quite easy to see how the file could get large with a lot of content. Nuxt v3 doesn't offer serverless full-text search and even offers a way to chunk up the remaining content, so this is just a Nuxt v2 and Content v1 problem.

Eliminate the content.db file?

I was advised by the helpful folks at Nuxt that I could just delete the content.db file at the end of the build process if I didn't need full-text search and my site was truly static.

I tried this quickly by deleting content.db after a yarn generate and then running yarn start. I found some issues, so this wasn't going to be as simple for me.

Do not use yarn dev for testing this, as there are situations where things work in dev mode but do not work when statically served. You can also search your code base for content access in a mounted hook.
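For that code base search, a crude check along these lines can flag components worth a closer look. It's a heuristic of my own rather than anything Nuxt provides: it doesn't parse the code, so a component that has a mounted hook but only calls $content inside fetch will also be flagged.

```javascript
// Heuristic: does this component source define a `mounted` hook and also
// call `$content(` somewhere? Regex-based, so expect some false positives.
// (Hypothetical helper name.)
function mayLoadContentOnClient(source) {
  const hasMountedHook = /\bmounted\s*\(|\bmounted\s*:/.test(source);
  const usesContent = /\$content\s*\(/.test(source);
  return hasMountedHook && usesContent;
}

// Usage: read each .vue file as text and print the names of those where
// mayLoadContentOnClient(text) returns true, then review them by hand.
```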

If your site works fine without the file, you can add the following to your nuxt.config.js file. If, like me, you need it in some places, skip to the next section about how we can shrink it down.

  hooks: {
    "generate:done": (builder) => {
      const options = builder.nuxt.options
      const hash = options.publicRuntimeConfig.content.dbHash
      const fileName = path.resolve(options.generate.dir, '_nuxt', 'content', `db-${hash}.json`)
      fs.unlinkSync(fileName)
    }
  }

You'll also need to add the necessary imports at the top of the file if they're not already there:

import fs from 'fs'
import path from 'path'

If that works for you great, you're done and off you go! For the rest of us, keep reading...

Shrinking content.db by reducing full-text search data

My first thought was to eliminate the full-text search data. We don't use it, and if your site is big enough to hit a 25MB limit it might be time to consider a third-party search service like Algolia, which I've used before with success. So let's change that with a tweak to nuxt.config.js:

export default {
  // ...
  content: {
    fullTextSearchFields: () => []
  }
}

Note: this is a function not an array! If you use an array it will still generate the full-text search data as the values are merged with the default values.

This will prevent the full-text search data from being included and cut the size of content.db significantly. Ours went from just over 25MB to 7MB. Not bad!

Alternatively, if you do want a basic search, you could just limit this to the fields you want to search on, like ['title', 'description'], and still see a significant reduction in size. The text field is the full body of the articles and carries most of the weight.
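Bearing in mind the note above about arrays being merged with the defaults, the limited-fields variant would presumably also need to be a function. A minimal sketch, where contentOptions and searchFields are just names I've picked for illustration:

```javascript
// Fields to keep searchable; everything else (notably `text`) is left out.
const searchFields = ['title', 'description'];

const contentOptions = {
  // A function rather than a plain array, so the defaults are replaced
  // instead of being merged back in.
  fullTextSearchFields: () => searchFields,
};

// In nuxt.config.js this would be used as:
//   export default { /* ... */ content: contentOptions }
```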

Shrinking content.db by reducing content

If you still want to reduce the size of that content.db - and mine was still 7MB after removing the full-text search - then the next target is the _data branch and the content it holds.

Fixing pages that are loading content client side

In theory NuxtJS shouldn't need this data at build time as it's only used for client-side querying. But when I deleted the content.db file I found some of my components did not load data. They were components using mounted to load data, and all but one had both fetch and mounted doing the same thing, e.g.

export default {
  data() {
    return {
      positions: [],
    };
  },

  async fetch() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },

  async mounted() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },
};

A bell was ringing in my head. When I removed mounted from the component it worked fine in dev mode but then failed to load the data in generate/start mode unless the page was manually refreshed with F5. Client-side page navigation did not load the data.

This is, I believe, a bug in NuxtJS given it works fine in dev mode. I was able to find a workaround when I noticed not ALL of my components had this problem.

The ones that did not have the problem were using a fetchKey function to ensure that their data was page or instance specific. For example, the "latest from our blog" section is keyed by category so it shows the correct 3 blog posts for the category of the page it is used on. You're not supposed to need a fetchKey if the component uses the same data everywhere, but this seems to be broken, so let's just give it a constant key.

export default {
  data() {
    return {
      positions: [],
    };
  },

  async fetch() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },

  fetchKey: () => "positionsSection",
};

Hey presto! No more loading data client-side when client-side navigation is used!

If your site is now fully working you can go add that generate:done hook to your nuxt.config.js file to purge content.db entirely and you're done!

If however it turns out there are a few places you really do want to use some client-side content access...

The component that does need to load content client-side...

For us it's a component that shows some team members at random. If we use the server-rendered fetch function to select the individuals then it will be random at build-time but then static until the next site build. That's not what we want.

Here's the code we're using to select 4 random team members on the client:

export default {
  props: {
    team: { type: String, required: false },
  },

  data() {
    return {
      people: [{}, {}, {}, {}],
    };
  },

  async mounted() {
    const people = await this.$content("team")
      .sortBy("sequence")
      .sortBy("name")
      .fetch();
    this.people = getRandom(people, 4);
  },
};

function getRandom(arr, n) {
  let result = new Array(n),
    len = arr.length,
    taken = new Array(len);
  while (n--) {
    const x = Math.floor(Math.random() * len);
    result[n] = arr[x in taken ? taken[x] : x];
    taken[x] = --len in taken ? taken[len] : len;
  }
  return result;
}
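One thing to watch: getRandom assumes the array holds at least n entries. If your list might be shorter, a partial Fisher-Yates shuffle with a guard is one alternative. This is a swapped-in technique, not the code we run, and pickRandom is a name I've made up for it.

```javascript
// Pick up to `n` distinct items from `arr` without modifying it,
// using a partial Fisher-Yates shuffle on a copy.
function pickRandom(arr, n) {
  const pool = arr.slice();               // copy so the caller's array is untouched
  const count = Math.min(n, pool.length); // never ask for more than exist
  for (let i = 0; i < count; i++) {
    // Swap a random element from the not-yet-chosen tail into position i.
    const j = i + Math.floor(Math.random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}
```

Note that with fewer than n people this returns a shorter array, so a template expecting exactly four entries would need to cope with that.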

Because the mounted hook runs client-side, it's going to need some content in the _data branch of the content.db file, but not all of it.

Nuxt Content doesn't include a way to keep just the content we need, so we're going to have to prune it ourselves. No problem, it's only JSON. So adding to our nuxt.config.js:

export default {
  // ...
  hooks: {
    "generate:done": (builder) => {
      const options = builder.nuxt.options
      const hash = options.publicRuntimeConfig.content.dbHash
      const fileName = path.resolve(options.generate.dir, '_nuxt', 'content', `db-${hash}.json`)

      console.log('Pruning content database file:', fileName)
      const data = JSON.parse(fs.readFileSync(fileName, 'utf8'))
      const collection = data._collections[0]
      collection._data = collection._data.filter(item => item.path.startsWith('/team/'))
      fs.writeFileSync(fileName, JSON.stringify(data))
    },
  },
}

This brings our content.db down to just 53KB!

Technically there's more we could do in there - pruning the body branch of each team profile in our case as we only use the meta - and sorting out the idIndex branch but it doesn't seem to cause any problems here and 53KB is small enough for me.
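If you did want to go that extra step, the body pruning could be folded into the same filter. The sketch below is untested against a real site and makes assumptions: pruneContentDb is a hypothetical helper, it expects the same _collections[0]._data layout used in the hook above, and it leaves the idIndex branch alone. Anything that renders a document body client-side would break with the body removed.

```javascript
// Keep only items whose path starts with `pathPrefix`, and drop the listed
// fields (by default `body`) from each kept item. Mutates and returns `db`.
function pruneContentDb(db, pathPrefix, dropFields = ['body']) {
  const collection = db._collections[0];
  collection._data = collection._data
    .filter((item) => typeof item.path === 'string' && item.path.startsWith(pathPrefix))
    .map((item) => {
      const slim = { ...item };                      // shallow copy of the item
      for (const field of dropFields) delete slim[field];
      return slim;
    });
  return db;
}

// Inside the generate:done hook this would replace the filter line, e.g.
//   fs.writeFileSync(fileName, JSON.stringify(pruneContentDb(data, '/team/')))
```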

Conclusion

There we go! That 25 MiB hard limit on Cloudflare Pages is no longer a problem. You've either eliminated content.db entirely (no full-text search, no client-side content access), cut it by more than two-thirds (no full-text search), or shrunk it to a tiny stub (minimal client-side content access).

Enjoy!

Damien
