The Nuxt Content v1 Content.DB database and file size
Background
I've been using NuxtJS successfully for a number of client sites, and this one, for quite a while. One of my clients, MKG Marketing, has produced a lot more content than the others, with over 200 podcast episodes, most containing a full transcript. It's been going very well, with them maintaining all the markdown content in CloudCannon and the site itself being built and published with Cloudflare Pages.
MKG, like DamienG.com, is a Nuxt 2 site I haven't upgraded yet. The Nuxt upgrade guide has failed to materialize even though Nuxt 3 has been out for a year and a half, so it's probably safe to assume it's not coming. There is some Nuxt Content specific migration guidance available, however.
Getting back to the present, we started seeing a build error on Cloudflare Pages that looks like this:
16:09:05.122 ✘ [ERROR] Error: Pages only supports files up to 25 MiB in size
16:09:05.122
16:09:05.122 _nuxt/content/db-931b471c.json is 25.1 MiB in size
Why would NuxtJS be generating a file that large? The first thing to do is to dig in and see what's in there...
What is the content.db file?
The content.db is a JSON file created by the @nuxt/content module when building sites in static mode (nuxt generate with target: 'static' in the nuxt.config.js) and placed into the /dist/_nuxt/content folder.
The JSON is minified so if you want to take a peek find a good JSON viewer. I used Firefox which was a bit slow to initially load the file but once it did it was easy to navigate. The contents look like this:
This file is actually a serialization of the LokiDB database that Nuxt Content v1 uses for two things:
- Client-side querying of content
- Serverless full-text search
We can see that in the JSON there is a single collection named items which contains all the content under the _data branch for client-side querying, and tokenized words for the full-text search under the _fullTextSearch branch, further broken down by the search field, e.g. title, description, text etc.
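If you'd rather measure than scroll, a few lines of Node can report how much each branch weighs. This is only a sketch: the function name summarizeContentDb is mine, and it assumes the _collections array, the _data and _fullTextSearch branches described above, and a name property on each collection.

```javascript
// Sketch: report how many bytes each branch of each collection contributes
// to a parsed content.db object. Assumes the shape described in the post.
function summarizeContentDb(db) {
  return db._collections.map((collection) => {
    const sizes = {};
    for (const [key, value] of Object.entries(collection)) {
      // JSON.stringify returns undefined for undefined values, so fall back to ''
      sizes[key] = Buffer.byteLength(JSON.stringify(value) ?? '', 'utf8');
    }
    return { name: collection.name, sizes };
  });
}

// Tiny hand-made stand-in for the real file, just to show the idea.
const sample = {
  _collections: [
    {
      name: 'items',
      _data: [{ path: '/team/jane', title: 'Jane' }],
      _fullTextSearch: { title: {}, description: {}, text: {} },
    },
  ],
};

console.log(summarizeContentDb(sample));
```

To run it against a real build, replace sample with the result of JSON.parse on your own db-<hash>.json file read from the dist/_nuxt/content folder.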
Given that, it's quite easy to see how the file could get large with a lot of content. Nuxt v3 doesn't offer serverless full-text search and even offers a way to chunk up the remaining content, so this is just a Nuxt v2 and Content v1 problem.
Eliminate the content.db file?
I was advised by the helpful folks at Nuxt that I could just delete the content.db file at the end of the build process if I didn't need full-text search and my site was truly static.
I tried this quickly by deleting content.db after a yarn generate and then using yarn start. I found some issues, so this wasn't going to be as simple for me.
Do not use yarn dev for testing this, as there are situations where things work in dev mode but do not work when served statically. You can also search your code base for content access in a mounted hook.
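One rough way to do that search is a small Node script. This is a heuristic sketch rather than anything Nuxt provides: both function names are made up, it only looks at .vue files, and it will also flag components whose $content call lives in fetch while mounted does something unrelated, so treat the output as a list of files to review.

```javascript
const fs = require('fs');
const path = require('path');

// Heuristic: true if the source defines a mounted hook and also mentions $content.
function looksLikeClientSideContentAccess(source) {
  const hasMounted = /\bmounted\s*\(/.test(source) || /\bmounted\s*:/.test(source);
  return hasMounted && source.includes('$content');
}

// Recursively collect .vue files under dir that match the heuristic.
function findSuspectComponents(dir) {
  const hits = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Skip dependencies and hidden folders such as .nuxt
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
      hits.push(...findSuspectComponents(fullPath));
    } else if (entry.name.endsWith('.vue')) {
      const source = fs.readFileSync(fullPath, 'utf8');
      if (looksLikeClientSideContentAccess(source)) hits.push(fullPath);
    }
  }
  return hits;
}

// Example usage from the project root:
// console.log(findSuspectComponents('components'));
// console.log(findSuspectComponents('pages'));
```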
If your site works great then you can add the following to your nuxt.config.js file. If, like me, you need it in some places then skip to the next section about how we can shrink it down.
hooks: {
  "generate:done": (builder) => {
    const options = builder.nuxt.options
    const hash = options.publicRuntimeConfig.content.dbHash
    const fileName = path.resolve(options.generate.dir, '_nuxt', 'content', `db-${hash}.json`)
    fs.unlinkSync(fileName)
  }
}
You'll also need to add the necessary imports at the top of the file if they're not already there:
import fs from 'fs'
import path from 'path'
If that works for you great, you're done and off you go! For the rest of us, keep reading...
Shrinking content.db by reducing full-text search data
My first thought was to eliminate the full-text search data. We don't use it, and if your site is big enough to hit a 25MB limit it might be time to consider a third-party search service like Algolia, which I've used before with success. So let's change that with a tweak to nuxt.config.js:
export default {
  // ...
  content: {
    fullTextSearchFields: () => []
  }
}
Note: this is a function, not an array! If you use an array it will still generate the full-text search data as the values are merged with the default values.
This will prevent the full-text search data from being included and cut the content.db significantly. Ours went from just over 25MB to 7MB. Not bad!
Alternatively, if you do want a basic search you could just limit this to the fields you want to search on, like ['title', 'description'], and still see a significant reduction in size. The text field is the full body of the articles and carries most of the weight.
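For that limited-search variant the config would look something like the fragment below. I haven't measured the resulting file size for this combination, so take it as a sketch; it uses the same function form as above so the list replaces the defaults rather than being merged with them.

```javascript
// nuxt.config.js (fragment)
export default {
  // ...
  content: {
    // Function form: the returned array replaces the default field list
    fullTextSearchFields: () => ['title', 'description']
  }
}
```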
Shrinking content.db by reducing content
If you still want to reduce the size of that content.db - and mine was still 7MB after removing the full-text search - then the next target is the _data branch and the content it holds.
Fixing pages that are loading content client side
In theory NuxtJS shouldn't need this data at build time as it's only used for client-side querying. But when I deleted the content.db I found some of my components did not load data. They were components using mounted to load data, and all but one had both fetch and mounted doing the same thing, e.g.
export default {
  data() {
    return {
      positions: [],
    };
  },
  async fetch() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },
  async mounted() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },
};
A bell was ringing in my head. When I removed mounted from the component it worked fine in dev mode but then failed to load the data in generate/start mode unless the page was manually refreshed with F5. Client-side page navigation did not load the data.
This is, I believe, a bug in NuxtJS given it works fine in dev mode. I was able to find a workaround when I noticed not ALL of my components had this problem.
The ones that did not have the problem were using a fetchKey function to ensure that their data was page or instance specific. For example, the "latest from our blog" section is keyed by category so it shows the correct 3 blog posts for the current category the page is used on. You're not supposed to need a fetchKey if the component uses the same data everywhere, but this seems to be broken, so let's just give it a constant key.
export default {
  data() {
    return {
      positions: [],
    };
  },
  async fetch() {
    this.positions = await this.$content("about/jobs").sortBy("title").fetch();
  },
  fetchKey: () => "positionsSection",
};
Hey presto! No more loading data client-side when client-side navigation is used!
If your site is now fully working you can go add that generate:done hook to your nuxt.config.js file to purge content.db entirely and you're done!
If however it turns out there are a few places you really do want to use some client-side content access...
The component that does need to load content client-side...
For us it's a component that shows some team members at random. If we use the server-rendered fetch function to select the individuals then it will be random at build-time but then static until the next site build. That's not what we want.
Here's the code we're using to select 4 random team members on the client:
export default {
  props: {
    team: { type: String, required: false },
  },
  data() {
    return {
      people: [{}, {}, {}, {}],
    };
  },
  async mounted() {
    const people = await this.$content("team")
      .sortBy("sequence")
      .sortBy("name")
      .fetch();
    this.people = getRandom(people, 4);
  },
};
function getRandom(arr, n) {
  let result = new Array(n),
    len = arr.length,
    taken = new Array(len);
  while (n--) {
    const x = Math.floor(Math.random() * len);
    result[n] = arr[x in taken ? taken[x] : x];
    taken[x] = --len in taken ? taken[len] : len;
  }
  return result;
}
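One thing to watch with getRandom is that it assumes the array holds at least n items. If a team could ever have fewer than 4 people, a variant that clamps the count is safer. The following is a sketch of an alternative (a partial Fisher-Yates shuffle, with a made-up name pickRandom), not the code running on the site:

```javascript
// Sketch: pick up to `count` distinct items at random without modifying `items`.
function pickRandom(items, count) {
  const pool = items.slice(); // work on a copy
  const n = Math.min(count, pool.length); // never ask for more than we have
  for (let i = 0; i < n; i++) {
    // Swap position i with a random position from i to the end of the pool
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

console.log(pickRandom(['Ann', 'Bob', 'Cat', 'Dan', 'Eve'], 4).length); // 4
console.log(pickRandom(['Ann', 'Bob'], 4).length); // 2
```

In the component this would be used as this.people = pickRandom(people, 4), bearing in mind that fewer than 4 results means fewer cards than the 4 placeholders in data.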
Because the mounted hook runs client-side, it's going to need some content in the _data branch of the content.db file, but not all of it.
Nuxt Content doesn't include a way to keep just the content we need, so we're going to have to prune it ourselves. No problem, it's only JSON. So, adding to our nuxt.config.js:
export default {
  // ...
  hooks: {
    "generate:done": (builder) => {
      const options = builder.nuxt.options
      const hash = options.publicRuntimeConfig.content.dbHash
      const fileName = path.resolve(options.generate.dir, '_nuxt', 'content', `db-${hash}.json`)
      console.log('Pruning content database file:', fileName)
      const data = JSON.parse(fs.readFileSync(fileName, 'utf8'))
      const collection = data._collections[0]
      // Keep only the team profiles, the one thing we still query client-side
      collection._data = collection._data.filter(item => item.path.startsWith('/team/'))
      fs.writeFileSync(fileName, JSON.stringify(data))
    },
  },
}
This brings our content.db down to just 53KB!
Technically there's more we could do in there - pruning the body branch of each team profile in our case as we only use the meta - and sorting out the idIndex branch, but it doesn't seem to cause any problems here and 53KB is small enough for me.
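If you did want to drop the body too, the filtering step could be pulled out into a helper along these lines. This is an untested-on-production sketch: pruneContentDb is a name I've made up, it assumes each item carries its rendered content under a body property as described above, and it leaves the idIndex branch alone just like the hook does. Don't use it if any client-side component renders the body.

```javascript
// Sketch: keep only items under pathPrefix and strip their body, keeping the metadata.
// Mutates and returns the parsed content.db object.
function pruneContentDb(data, pathPrefix) {
  const collection = data._collections[0];
  collection._data = collection._data
    .filter((item) => typeof item.path === 'string' && item.path.startsWith(pathPrefix))
    .map(({ body, ...meta }) => meta); // drop body, keep everything else
  return data;
}

// Hand-made sample data to show the effect.
const sample = {
  _collections: [
    {
      name: 'items',
      _data: [
        { path: '/team/jane', name: 'Jane', body: { type: 'root', children: [] } },
        { path: '/blog/hello', title: 'Hello', body: { type: 'root', children: [] } },
      ],
    },
  ],
};

console.log(JSON.stringify(pruneContentDb(sample, '/team/')._collections[0]._data));
// [{"path":"/team/jane","name":"Jane"}]
```

Inside the generate:done hook you would call pruneContentDb(data, '/team/') in place of the filter line and then write the file as before.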
Conclusion
There we go! That 25MB hard limit on Cloudflare Pages is no longer a problem, and you've either eliminated the content.db file entirely (no full-text search, no client-side content access), cut it to under a third of its size as in our case (no full-text search), or shrunk it to a tiny stub (minimal client-side content access).
Enjoy!
Damien