- Start Date: 2022-10-14
- Reference Issues: withastro/astro#4307 (comment)
- Implementation PR: withastro/astro#5291
💡 This RFC is complimented by the Render Content proposal. Our goal is to propose and accept both of these RFCs as a pair before implementing any features discussed. We recommend reading that document after reading this to understand how all use cases can be covered.
Content Collections are a way to fetch Markdown and MDX frontmatter in your Astro projects in a consistent, performant, and type-safe way.
This introduces four new concepts:
- A new, reserved directory that Astro will manage:
src/content/
- A set of helper functions to load entries and collections of entries from this directory
- The introduction of "collections" and "schemas" (see glossary) to type-check frontmatter data
- A new, ignored directory for metadata generated from your project:
.astro/
We'll be using the words "schema," "collection," and "entry" throughout. Let's define those terms in the context of this RFC:
- Schema: a way to codify the "structure" of your data. In this case, frontmatter data
- Collection: a set of data that share a common schema. In this case, Markdown and MDX files
- Entry: A piece of data (Markdown or MDX file) belonging to a given collection
- Make landing and index pages easy to create and debug
- Standardize frontmatter type checking at the framework level
- Provide valuable error messages to debug and correct frontmatter that is malformed
First, this RFC is focused on Markdown and MDX content only. We see how this generic pattern of “collections” and “schemas” may extend to other resources like YAML, JSON, image assets, and more. We would love to explore these avenues if the concept of schemas is accepted.
We also expect users to attempt relative paths (i.e. ![image](./image.png)
) in their Markdown and MDX files. Since these files will be query-able by .astro
files, we don't expect these paths to resolve correctly without added preprocessing.
For simplicity, we will consider this use case out-of-scope. This is in-keeping with how Astro handles relative paths in Markdown and MDX today. We will raise an error whenever relative paths are used, and encourage users to use absolute assets paths instead.
- Contentlayer - This is a framework-agnostic tool for managing both local and remote content. Their schema-based approach to Markdown frontmatter types is especially notable, and informs the "schema" approach of content collections.
- Nuxt Content - This plugin brings Markdown management to NuxtJS. Their use of a
content/
directory and their ability to generate pages from content have each informed this RFC.
Say we want a landing page for our collection of blog posts:
src/content/
blog/
enterprise.md
columbia.md
endeavour.md
We can use getCollection
to retrieve type-safe frontmatter:
---
// src/pages/index.astro
// We'll talk about that `astro:content` in the Detailed design :)
import { getCollection, getEntry } from 'astro:content';
// Get all `blog` entries
const allBlogPosts = await getCollection('blog');
// Filter blog posts by frontmatter properties
const draftBlogPosts = await getCollection('blog', ({ data }) => {
return data.status === 'draft';
});
// Get a specific blog post by file name
const enterprise = await getEntry('blog', 'enterprise.md');
---
<ul>
{allBlogPosts.map(post => (
<li>
{/* access frontmatter properties with `.data` */}
<a href={post.data.slug}>{post.data.title}</a>
{/* each property is type-safe, */}
{/* so expect nice autocomplete and red squiggles here! */}
<time datetime={post.data.publishedDate.toISOString()}>
{post.data.publishedDate.toDateString()}
</time>
</li>
))}
</ul>
And optionally define a schema to enforce frontmatter fields:
// src/content/config.ts
import { z, defineCollection } from "astro:content";
const blog = defineCollection({
schema: {
title: z.string(),
slug: z.string(),
// mark optional properties with `.optional()`
image: z.string().optional(),
tags: z.array(z.string()),
// transform to another data type with `transform`
// ex. convert date strings to Date objects
publishedDate: z.string().transform((str) => new Date(str)),
},
});
export const collections = { blog };
Let's break down the problem space to better understand the value of this proposal.
First problem: enforcing consistent frontmatter across your content is a lot to manage. You can define your own types with a type-cast using Astro.glob
today:
---
import type { MarkdownInstance } from 'astro';
const posts: Array<MarkdownInstance<{ title: string; ... }>> = await Astro.glob('./blog/**/*.md');
---
However, there's no guarantee your frontmatter actually matches this MarkdownInstance
type.
Say blog/columbia.md
is missing the required title
property. When writing a landing page like this:
...
<ul>
{allBlogPosts.map(post => (
<li>
{post.frontmatter.title.toUpperCase()}
</li>
))}
</ul>
...You'll get the ominous error "cannot read property toUpperCase
of undefined." Stop me if you've had this monologue before:
Aw where did I call
toUpperCase
again?Right, on the landing page. Probably the
title
property.But which post is missing a title? Agh, better add a
console.log
and scroll through here...Ah finally, it was post #1149. I'll go fix that.
Authors shouldn't have to think like this. What if instead, they were given a readable error pointing to where the problem is?
This is why schemas are a huge win for a developer's day-to-day. Astro will autocomplete properties that match your schema, and give helpful errors to fix properties that don't.
Second problem: importing globs of content via Astro.glob
can be slow at scale. This is due to a fundamental flaw with importing: even if you just need the frontmatter of a post (i.e. for landing pages), you still wait on the content of that render and parse to a JS module as well. Though less of a problem with Markdown, globbing hundreds-to-thousands of MDX entries can slow down dev server HMR updates significantly.
To avoid this, Content Collections will focus on processing and returning a post's frontmatter, not the post's contents, and avoid transforming documents to JS modules. This should make Markdown and MDX equally quick to process, and should make landing pages faster to build and debug.
💡 Don’t worry, it will still be easy to retrieve a post’s content when you need it! See the Render Content proposal for more.
As you might imagine, Content Collections have a lot of moving parts. Let's detail each one:
This RFC introduces a new, reserved directory for Astro to manage: {srcDir}/content/
. This directory is where all collections and schema definitions live, relative to your configured source directory.
The content/
directory will only support Markdown and MDX documents to start. This means colocating other files like styles (.css
) and images .png
) will not be allowed.
However, we will all colocation for files and directories with the underscore -
prefix. This is in-keeping with Astro's existing src/pages/
convention:
src/content/
├── blog/
│ ├── _assets/
│ │ └── image.png
│ ├── _styles/
│ │ └── blog.css
│ ├── columbia.md
│ └── ...
Since we will preprocess your post frontmatter separate from Vite (see background), we need a new home for generated metadata. This will be a special .astro
directory generated by Astro at build time or on dev server startup.
We expect .astro
to live at the base of your project directory. This falls inline with generated directories like .vscode
for editor tooling and .vercel
for deployments.
To clarify, the user will not view or edit files in the .astro
directory. However, since it will live at the base of their project, they are free to check this generated directory into their repository.
Content Collections introduces a new virtual module convention for Astro using the astro:
prefix. Since all types and utilities are generated based on your content, we are free to use whatever name we choose. We chose astro:
for this proposal since:
- It falls in-line with NodeJS'
node:
convention. - It leaves the door open for future
astro:
utility modules based on your project's configuration.
The user will import helpers like getCollection
and getEntry
from astro:content
like so:
import { getCollection, getEntry } from "astro:content";
Users can also expect full auto-importing and intellisense from their editor.
All entries in src/content/
must be nested in a "collection" directory. This allows you to get a collection of entries based on the directory name, and optionally enforce frontmatter types with a schema. This is similar to creating a new table in a database, or a new content model in a CMS like Contentful.
What this looks like in practice:
src/content/
# All newsletters have the same frontmatter properties
newsletters/
week-1.md
week-2.md
week-3.md
# All blog posts have the same frontmatter properties
blog/
columbia.md
enterprise.md
endeavour.md
Collections are considered one level deep, so you cannot nest collections (or collection schemas) within other collections. However, we will allow nested directories to better organize your content. This is vital for certain use cases like internationalization:
src/content/
docs/
# docs schema applies to all nested directories 👇
en/
es/
...
All nested directories will share the same (optional) schema defined at the top level. Which brings us to...
Schemas are an optional way to enforce frontmatter types in a collection. To configure schemas, you can create a src/content/config.{js|mjs|ts}
file. This file should:
export
acollections
object, with each object key corresponding to the collection's folder name. We will offer adefineCollection
utility similar todefineConfig
in yourastro.config.*
today (see example below).- Use a Zod object to define schema properties. The
z
utility will be built-in and exposed byastro:content
.
For instance, say every blog/
entry should have a title
, slug
, a list of tags
, and an optional image
url. We can specify each object property like so:
// src/content/config.ts
import { z, defineCollection } from "astro:content";
const blog = defineCollection({
schema: {
title: z.string(),
slug: z.string(),
// mark optional properties with `.optional()`
image: z.string().optional(),
tags: z.array(z.string()),
},
});
export const collections = { blog };
You can also include dashes -
in your collection name using a string as the key:
const myNewsletter = defineCollection({...});
export const collections = { 'my-newsletter': myNewsletter };
We chose Zod since it offers key benefits over plain TypeScript types. Namely:
- specifying default values for optional fields using
.default()
- checking the shape of string values with built-in regexes, like
.url()
for URLs and.email()
for emails
...
// "language" is optional, with a default value of english
language: z.enum(['es', 'de', 'en']).default('en'),
// allow email strings only. i.e. "jeff" would fail to parse, but "[email protected]" would pass
authorContact: z.string().email(),
// allow url strings only. i.e. "/post" would fail, but `https://blog.biz/post` would pass
canonicalURL: z.string().url(),
You can browse Zod's documentation for a complete rundown of features. However, given frontmatter is limited to primitive types like strings and booleans, we expect new users to stick to simple string()
, number()
, and boolean()
utilities.
Astro provides 2 functions to query collections:
getCollection
- get all entries in a collection, or based on a frontmatter filtergetEntry
- get a specific entry in a collection by file name
These functions will have typed based on collections that exist. In other words, getCollection('banana')
will raise a type error if there is no src/content/banana/
.
---
import { getCollection, getEntry } from 'astro:content';
// Get all `blog` entries
const allBlogPosts = await getCollection('blog');
// Filter blog posts by entry properties
const draftBlogPosts = await getCollection('blog', ({ id, slug, data }) => {
return data.status === 'draft';
});
// Get a specific blog post by file name
const enterprise = await getEntry('blog', 'enterprise.md');
---
Assume the blog
collection schema looks like this:
// src/content/config.ts
import { defineCollection, z } from "astro:content";
const blog = defineCollection({
schema: {
title: z.string(),
slug: z.string(),
image: z.string().optional(),
tags: z.array(z.string()),
},
});
export const collections = { blog };
await getCollection('blog')
will return entries of the following type:
{
// parsed frontmatter
data: {
title: string;
slug: string;
image?: string;
tags: string[];
};
// unique identifier. Today, the file path relative to src/content/[collection]
id: '[filePath]'; // union from all entries in src/content/[collection]
// URL-ready slug computed from ID, relative to collection
// ex. "docs/home.md" -> "home"
slug: '[fileBase]'; // union from all entries in src/content/[collection]
// raw body of the Markdown or MDX document
body: string;
}
We have purposefully generalized Markdown-specific terms like frontmatter
and file
to agnostic names like data
and id
. This also follows naming conventions from headless CMSes like Contentful.
Also note that body
is the raw content of the file. This ensures builds remain performant by avoiding expensive rendering pipelines. See “Moving to src/pages/
" to understand how a <Content />
component could be used to render this file, and pull in that pipeline only where necessary.
As noted earlier, you may organize entries into directories as well. The result will still be a flat array when fetching a collection via getCollection
, with the nested directory reflected in an entry’s id
:
const docsEntries = await getCollection("docs");
console.log(docsEntries);
/*
-> [
{ id: 'en/getting-started.md', slug: 'en/getting-started', data: {...} },
{ id: 'en/structure.md', slug: 'en/structure', data: {...} },
{ id: 'es/getting-started.md', slug: 'es/getting-started', data: {...} },
{ id: 'es/structure.md', slug: 'es/structure', data: {...} },
...
]
*/
This is in-keeping with our database table and CMS collection analogies. Directories are a way to organize your content, but do not effect the underlying, flat collection → entry relationship.
We imagine users will want to map their collections onto live URLs on their site. This should be similar to globbing directories outside of src/pages/
today, using getStaticPaths
to generate routes dynamically.
Say you have a docs
collection subdivided by locale like so:
src/content/
docs/
en/
getting-started.md
...
es/
getting-started.md
...
We want all docs/
entries to be mapped onto pages, with those nested directories respected as nested URLs. We can do the following with getStaticPaths
:
// src/pages/docs/[...slug].astro
import { getCollection } from 'astro:content';
export async function getStaticPaths() {
const blog = await getCollection('docs');
return blog.map(entry => ({
params: { slug: entry.slug },
});
}
This will generate routes for every entry in our collection, mapping each entry slug (a path relative to src/content/docs
) to a URL.
You may prefer to compute custom slugs for each entry instead of relying on the auto-generated slug
. This is useful for generating slugs based on frontmatter, or mapping your preferred directory structure (ex. /content/blog/2022-05-10/post.md
) to URLs on your site (ex. /content/blog/post
). You can add the slug
argument to your collection config (src/content/config.*
) like so:
import { defineCollection } from 'astro:content';
const blog = defineCollection({
slug({ id, defaultSlug, data, body }) {
return data.slug ?? myCustomSlugify(id);
},
schema: {...}
});
export const collections = { blog };
Note the slug
function can access the default generated slug, entry ID, parsed frontmatter, and raw body. This allows you to generate a custom slug based on any of those values.
The above example generates routes, but what about rendering our .md
files on the page? We suggest reading the Render Content proposal for full details on how getCollection
will compliment that story.
To wire up type inferencing in those getCollection
helpers, we'll need to generate some code under-the-hood. Let's explore the engineering work required.
As discussed in Detailed Usage, the .astro
directory will be home to generated type definitions. Today, this includes a TypeScript file to declare the astro:content
module:
my-project-directory/
# added to base project directory
.astro/
content-types.d.ts
See Appendix for a sample of how this content-types
file might look.
The .astro
directory will be generated when starting the dev server and during production builds. This means type inferencing will not work until either of these commands are run.
We will add this generation step to the astro check
command as well, in case users want a lightweight command to generate the .astro
directory during development. This is inspired by Svelte's svelte-kit sync
CLI command.
Content Collections will expose Zod as an Astro built-in. This will be available from the astro:content
module:
import { z } from "astro:content";
This avoids exposing Zod from the base astro
package, preventing unecessary dependencies in core (SSR builds being the main consideration). However, to populate this virtual module, we will need a separate astro/zod
module to expose all utilities. Here's an example of how a zod.mjs
package export may look:
// zod.mjs
import * as mod from "zod";
export { mod as z };
export default mod;
We will then expose the z
utility from astro/zod
in the astro:content
virtual module. Assuming z
is only used in schema configuration files, this will avoid pulling astro/zod
into your production builds.
Alongside this manifest, we will expose getCollection
and getEntry
helpers. Users will import these helpers from the astro:content
module like so:
---
// src/pages/index.astro
import { getCollection, getEntry } from 'astro:content';
---
astro:content
will be defined as a Vite virtual module.
By avoiding the Astro
global, these fetchers are framework-agnostic. This unlocks usage in UI component frameworks and endpoint files.
Since this feature relies on a magic directory (src/content/
), we will need tests up to the end-to-end level. To summarize:
- dev server e2e tests: validate that
getCollection
andgetEntry
are usable as schema and entry files change. astro check
integration test: ensure generated types pass our type validator.astro build
integration test: ensure entries and collections build successfully, with expected data present in the build.- type file generator unit tests: Type gen should be independently testable, so it's worth a snapshot test validating the output.
By adding structure, we are also adding complexity to your code. This has a few consequences:
- Zod means a new learning curve for users already familiar with TypeScript. We will need to document common uses cases like string parsing, regexing, and transforming to
Date
objects so users can onboard easily. We will also consider CLI tools to spin upschema
entries to give new users a starting point. - Magic is always scary, especially given Astro's bias towards being explicit. Introducing a reserved directory with a sanctioned way to import from that directory is a hurdle to adoption.
There are alternative solutions to consider across several categories:
We considered a few alternatives to using Zod for schemas:
- Generate schemas from a TypeScript type. This would let users reuse frontmatter types they already have and avoid the learning curve of a new tool. However, TypeScript is missing a few surface-level features that Zod covers:
- Constraining the shape of a given value. For instance, setting a
min
ormax
character length, or testing strings againstemail
orURL
regexes. - Transforming a frontmatter value into a new data type. For example, parsing a date string to a
Date
object, and raising a helpful error for invalid dates.
- Constraining the shape of a given value. For instance, setting a
- Invent our own JSON or YAML-based schema format. This would fall in-line with a similar open source project, ContentLayer, that specifies types with plain JS. Main drawbacks: replacing one learning curve with another, and increasing the maintenance cost of schemas overtime.
In the end, we've chosen Zod since it can scale to complex use cases and takes the maintenance burden off of Astro's shoulders.
We expect most users to compare getCollection
with Astro.glob
. There is a notable difference in how each will grab content:
Astro.glob
accepts wild cards (i.e. `/posts/*/.md) to grab entries multiple directories deep, filter by file extension, etc.getCollection
accepts a collection name only, with an optional filter function to filter by entry values.
The latter limits users to fetching a single collection at a time, and removes nested directories as a filtering option (unless you regex the ID by hand). One alternative could be to mirror Contentlayer's approach, wiring schemas to wildcards of any shape:
// Snippet from Contentlayer documentation
// <https://www.contentlayer.dev/docs/sources/files/mapping-document-types#resolving-document-type-with-filepathpattern>
const Post = defineDocumentType(() => ({
name: "Post",
filePathPattern: `posts/**/*.md`,
// ...
}));
Still, we've chosen a flat collection
+ schema file approach to a) mirror Astro's file-based routing, and b) establish a familiar database table or headless CMS analogy.
Introducing a new reserved directory (src/content/
) will be a breaking change for users that have their own src/content/
directory. So, we intend to:
- Release behind an experimental flag before 2.0 to get initial feedback
- Baseline this flag for Astro 2.0
This should give us time to address all aspects and corner cases of content Collections.
We intend src/content/
to be the recommended way to store Markdown and MDX content in your Astro project. Documentation will be very important to guide adoption! So, we will speak with the docs team on the best information hierarchy. Not only should we surface the concept of a src/content/
early for new users, but also guide existing users (who may visit the "Markdown & MDX" and Astro glob documentation) to src/content/
naturally. A few initial ideas:
- Expand Project Structure to explain
src/content/
- Update Markdown & MDX to reference the Project Structure docs, and expand Importing Markdown to a more holistic "Using Markdown" section
- Add a "local content" section to the Data Fetching page
We can also ease migration for users that already have a src/content/
directory used for other purposes. For instance, we can warn users with a src/content/
that a) contains other file types or b) does not contain any schema
files.
This is a pretty major addition, so we invite readers to raise questions below! Still, these are a few our team has today:
- Should the generated
.astro
directory path be configurable? - How should we “pitch” Content Collections to
Astro.glob
users today?
This is subject to change, but should clarify what code we need to generate.
We will generate a manifest for types only, and rely on import.meta.glob
to retrieve frontmatter in code. The generated type manifest will look something like this:
// src/.astro/content-types.d.ts
declare module 'astro:content' {
export { z } from 'astro/zod';
// Get type of a given collection schema
export type CollectionEntry<C extends keyof typeof entryMap> =
typeof entryMap[C][keyof typeof entryMap[C]];
// Generate `getStaticPaths` result
export function collectionToPaths<C extends keyof typeof entryMap>(
collection: C
): Promise<import('astro').GetStaticPathsResult>;
type BaseCollectionConfig = { schema: import('astro/zod').ZodRawShape };
export function defineCollection<C extends BaseCollectionConfig>(input: C): C;
export function getEntry<C extends keyof typeof entryMap, E extends keyof typeof entryMap[C]>(
collection: C,
entryKey: E
): Promise<typeof entryMap[C][E]>;
export function getCollection<
C extends keyof typeof entryMap,
E extends keyof typeof entryMap[C]
>(
collection: C,
filter?: (data: typeof entryMap[C][E]) => boolean
): Promise<typeof entryMap[C][keyof typeof entryMap[C]][]>;
export function renderEntry<
C extends keyof typeof entryMap,
E extends keyof typeof entryMap[C]
>(entry: {
collection: C;
id: E;
}): Promise<{
Content: import('astro').MarkdownInstance<{}>['Content'];
headings: import('astro').MarkdownHeading[];
}>;
type InferEntrySchema<C extends keyof typeof entryMap> = import('astro/zod').infer<
import('astro/zod').ZodObject<CollectionsConfig['collections'][C]['schema']>
>;
const entryMap: {
"blog": {
"first-post.md": {
id: "first-post.md",
slug: "first-post",
body: string,
collection: "blog",
data: InferEntrySchema<"blog">
},
"markdown-style-guide.md": {
id: "markdown-style-guide.md",
slug: "markdown-style-guide",
body: string,
collection: "blog",
data: InferEntrySchema<"blog">
},
...
},
};
type CollectionsConfig = typeof import('/path/to/project/src/content/config');
}