Would a custom categories collection cause any issues down the line?

Hi there, I’m a maintainer of JKAN, open data catalog software powered by Jekyll. JKAN has a datasets collection, and each dataset can have one or more categories. Until now, the list of categories has lived in a single _data/categories.yml file. But now we want to have a page generated for each category, so we’re planning to move them to a categories collection, e.g. _categories/planning-zoning.md

This seems to work fine, but I’m aware Jekyll has a built-in concept of categories that takes up the site.categories namespace already, and this is overwriting it. We don’t use posts in JKAN, so this doesn’t seem to have an impact, but I wanted to check whether anyone could foresee any issues down the line caused by this?

Normally I’d try to call it something else instead of category to avoid the conflict (or even jkan_category), but I’m trying to align to an established data standard, which uses the property category.

According to this documentation, the built-in categories are exposed as site.categories. While I feel you are not explicitly overriding the built-in categories, Jekyll does not let you differentiate between the two by calling site.categories (built-in) or site.collections.categories (custom). Instead, you reach both the built-in and the custom categories through site.categories.

I created a custom categories collection with two items: cat1 and cat2. Next, I iterated through them with the following code:

<h2>Categories</h2>
{%- assign categories = site.categories -%}
{% for category in categories %}
    <h3>category title: {{ category.title }}</h3>
{% endfor %}

The output was as expected:
[Screenshot of the rendered output]

Next, I listed all the posts on my site along with their categories. There are three posts, one of which sets categories in its YAML front matter. Here is the code I ran:

<h2>Posts</h2>
{%- assign posts = site.posts -%}
{% for post in posts %}
<h3>{{post.title}} - {{post.categories}}</h3>
{% endfor %}

Sure enough, Jekyll displayed all the posts and, where applicable, displayed the post.categories.
[Screenshot of the rendered output]

At first glance, it would seem you can create a custom categories collection without trouble. However, the Tags and Categories documentation includes sample code for tags that you can also apply to categories.

For example, when I run this code with the custom categories collection:

<h2>Jekyll Categories</h2>
{% for category in site.categories %}
  <h3>{{ category[0] }}</h3>
  <ul>
    {% for post in category[1] %}
      <li><a href="{{ post.url }}">{{ post.title }} - {{ post.categories }}</a></li>
    {% endfor %}
  </ul>
{% endfor %}

I receive the following error:

If I comment out the custom categories collection in _config.yml, I get the following expected output:

[Screenshot of the rendered output]

So a custom categories collection can break otherwise valid code: that last snippet should have run successfully.
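My best guess at the cause is a shape mismatch: the documented loop expects each item to be a name/posts pair, while a custom collection hands back a flat list of documents. Here is a rough sketch of the two shapes in plain Ruby rather than Liquid. The data is made up for illustration, and this is not how Jekyll itself stores anything.

```ruby
# Plain-Ruby stand-ins for the two shapes; all data here is hypothetical.

# Built-in shape: category name => posts in that category.
builtin_categories = { "news" => [{ title: "Post A" }] }

# Custom collection shape: a flat list of documents.
custom_categories = [{ title: "cat1" }, { title: "cat2" }]

# Iterating the hash yields [name, posts] pairs, so [0] and [1] are meaningful.
builtin_names = builtin_categories.map { |category| category[0] }
builtin_titles = builtin_categories.flat_map do |category|
  category[1].map { |post| post[:title] }
end

# Iterating the list yields documents, which have no element 0.
custom_names = custom_categories.map { |category| category[0] }

puts builtin_names.inspect
puts builtin_titles.inspect
puts custom_names.inspect
```

In this stand-in the lookup on a document just comes back empty. Jekyll, going by the error I hit, is less forgiving.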

In my opinion, even if the code runs now, it might break later, for example in a future version of Jekyll. And if you ever want to use the built-in categories, along with all the features Jekyll provides for them, then this seems like a bad idea.

My recommendation is to use a different word as you suggested. Maybe categorization or classification? I like categorization because at least it still seems like categories :-).

Perhaps it is possible to override the built-in categories collection? @ashmaroli, would you have a moment to provide your thoughts on this?

Hi,
As of v4.3.2, Jekyll only supports categories and tags for its ‘posts’ collection. That is, site.categories only consists of categories defined for individual posts, disregarding categories defined for documents in user-defined collections.

When a user configures a custom collection named categories, it doesn’t override the built-in site.categories per se. It just bypasses it in what is exposed to Liquid templates. The built-in object is still accessible, untouched, via Ruby.
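To make the distinction concrete, here is a toy model in plain Ruby. It is not Jekyll's source: ToySite and ToySiteDrop are hypothetical names, the sample data is invented, and the lookup rule (a collection whose label matches the key wins, except for posts) is an assumption that mirrors the behavior described above.

```ruby
# Toy model only. Class names, data, and the lookup rule are illustrative
# assumptions, not Jekyll's actual implementation.
class ToySite
  attr_reader :posts, :collections

  def initialize(posts:, collections: {})
    @posts = posts
    @collections = collections
  end

  # Built-in view: category name => posts. Only posts are consulted,
  # so categories on documents in other collections are ignored.
  def categories
    index = Hash.new { |hash, key| hash[key] = [] }
    @posts.each do |post|
      post.fetch(:categories, []).each { |name| index[name] << post }
    end
    index
  end
end

class ToySiteDrop
  def initialize(site)
    @site = site
  end

  # What a template would receive for `site.<key>`.
  def [](key)
    # A user-defined collection with a matching label shadows the built-in.
    return @site.collections[key] if key != "posts" && @site.collections.key?(key)

    @site.public_send(key)
  end
end

posts = [
  { title: "Post A", categories: ["news"] },
  { title: "Post B" }
]
collections = {
  "categories" => [{ title: "cat1" }, { title: "cat2" }],
  "datasets" => [{ title: "Zoning", categories: ["planning"] }]
}

site = ToySite.new(posts: posts, collections: collections)
drop = ToySiteDrop.new(site)

template_view = drop["categories"] # documents from the custom collection
ruby_view = site.categories        # built-in index, derived from posts only

puts template_view.map { |doc| doc[:title] }.inspect
puts ruby_view.keys.inspect
```

The template-facing lookup returns the custom collection's documents, while the Ruby-side method still returns the index built from posts. The "planning" category on the dataset never appears in that index.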

As for a breaking change in the future, it is possible, but it would only ship as part of a major release.


Just wanted to say thanks to both of you for the detailed replies. We’ve gone with the name dataset_categories. I hate it :face_vomiting: but at least it’s clear and won’t cause conflict (but it does raise the question of whether we should rename our other collection, organizations, to dataset_organizations for consistency…). Anyway, thanks again!

At least now you know :slight_smile:

Glad you have a solution, even if it is not your ideal solution!