Working towards a better 404 "page not found" experience

I see many requests in my analytics for pages that don’t exist, examples:

  • URL has null character at the end (assuming a bug in a client/browser somewhere)
  • URL is missing final character(s) (copy and paste error)
  • URL has added text such as time (copy and paste error)

I’m looking to provide a better experience for my viewers and reduce the number of these error pages.

Is there a way partial URLs could be resolved?

For example, a valid URL path:

  • /2023/08/22/this-is-an-example

And invalid examples would be:

  • /2023/08/22/this-is-an-exampl
  • /2023/08/22/this-is-an-example[null]
  • /2023/08/22/this-is-an-example22:14
  • /2023/08/22/this-is-an-exgoogleample

Can we resolve to the closest match? Or use the date to match the correct post?

Normally I would do this with server config so I am unsure how to proceed.

I host on Netlify.

The impression I get from my analytics (mostly Matomo) is that human-made URL-typos are very rare. The type of errors you’re seeing seem more typical of buggy crawlers that are incorrectly triggering analytics. Of course, no matter what the source, it is wise to provide people with a good 404 experience (Robots are people too! :-).

Most static hosting services have only limited (Netlify) or no (Github-Pages) support for server-side redirects. Nothing like Apache’s mod_speling is available. One option is to use client-side Javascript to help out. A good example of this is Balter’s “Helpful 404s”:

3 Likes

That’s exactly the type of thing I was looking for.

Last night I implemented something similar using a google custom search but it’s not as good as a totally “local” solution. You can try it on my site now.

But I’ll implement this solution and replace what I have this weekend. Thank you.

ps: all the examples in my OP were pulled from my analytics. The most interesting ones for me are people typing something else in the middle of the URL. Oops!

yikes, sadly my TypeScript knowledge is non-existent so I have no idea how to implement this in jekyll.

Yea, the TypeScript makes it complicated. Here a scrapping of the compiled JS from Balter’s site (I tested locally and it seemed to be working):

Suggestion: <span id="four-oh-four-suggestion"><span>

<script>
const En = new Uint32Array(65536),
Sn = (t, e) => {
	if (t.length < e.length) {
		const n = e;
		e = t,
		t = n
	}
	return 0 === e.length ? t.length : t.length <= 32 ? ((t, e) => {
		const n = t.length,
			i = e.length,
			r = 1 << n - 1;
		let s = -1,
			o = 0,
			a = n,
			c = n;
		for (; c--;)
			En[t.charCodeAt(c)] |= 1 << c;
		for (c = 0; c < i; c++) {
			let t = En[e.charCodeAt(c)];
			const n = t | o;
			t |= (t & s) + s ^ s,
			o |= ~(t | s),
			s &= t,
			o & r && a++,
			s & r && a--,
			o = o << 1 | 1,
			s = s << 1 | ~(n | o),
			o &= n
		}
		for (c = n; c--;)
			En[t.charCodeAt(c)] = 0;
		return a
	})(t, e) : ((t, e) => {
		const n = e.length,
			i = t.length,
			r = [],
			s = [],
			o = Math.ceil(n / 32),
			a = Math.ceil(i / 32);
		for (let t = 0; t < o; t++)
			s[t] = -1,
			r[t] = 0;
		let c = 0;
		for (; c < a - 1; c++) {
			let o = 0,
				a = -1;
			const l = 32 * c,
				u = Math.min(32, i) + l;
			for (let e = l; e < u; e++)
				En[t.charCodeAt(e)] |= 1 << e;
			for (let t = 0; t < n; t++) {
				const n = En[e.charCodeAt(t)],
					i = s[t / 32 | 0] >>> t & 1,
					c = r[t / 32 | 0] >>> t & 1,
					l = n | o,
					u = ((n | c) & a) + a ^ a | n | c;
				let h = o | ~(u | a),
					d = a & u;
				h >>> 31 ^ i && (s[t / 32 | 0] ^= 1 << t),
				d >>> 31 ^ c && (r[t / 32 | 0] ^= 1 << t),
				h = h << 1 | i,
				d = d << 1 | c,
				a = d | ~(l | h),
				o = h & l
			}
			for (let e = l; e < u; e++)
				En[t.charCodeAt(e)] = 0
		}
		let l = 0,
			u = -1;
		const h = 32 * c,
			d = Math.min(32, i - h) + h;
		for (let e = h; e < d; e++)
			En[t.charCodeAt(e)] |= 1 << e;
		let f = i;
		for (let t = 0; t < n; t++) {
			const n = En[e.charCodeAt(t)],
				o = s[t / 32 | 0] >>> t & 1,
				a = r[t / 32 | 0] >>> t & 1,
				c = n | l,
				h = ((n | a) & u) + u ^ u | n | a;
			let d = l | ~(h | u),
				m = u & h;
			f += d >>> i - 1 & 1,
			f -= m >>> i - 1 & 1,
			d >>> 31 ^ o && (s[t / 32 | 0] ^= 1 << t),
			m >>> 31 ^ a && (r[t / 32 | 0] ^= 1 << t),
			d = d << 1 | o,
			m = m << 1 | a,
			u = m | ~(c | d),
			l = d & c
		}
		for (let e = h; e < d; e++)
			En[t.charCodeAt(e)] = 0;
		return f
	})(t, e)
};

const e = document.getElementById("four-oh-four-suggestion");
if (null != e) {
	const t = new XMLHttpRequest;
	t.onload = () => {
		if (200 === t.status) {
			const n = t.responseXML,
				i = Array.from(n.querySelectorAll("urlset > url > loc")).map((t => t.textContent)),
				r = new URL(((t, e) => {
					let n = 1 / 0,
						i = 0;
					for (let r = 0; r < e.length; r++) {
						const s = Sn(t, e[r]);
						s < n && (n = s, i = r)
					}
					return e[i]
				})(window.location.href, i));
			e.innerHTML = `<a href="${r.href}">${r.pathname}</a>`
		} else
			e.innerHTML = '<a href="/">/</a>'
	},
	t.open("GET", `${window.location.protocol}//${window.location.host}/sitemap.xml`),
	t.send()
}
</script>
1 Like

Implemented! Many thanks.

My blog is on my profile if you want to check it out.

1 Like

I expanded on the same idea and implemented a rudimentary search page.

Currently it matches against the URL path/slug only.

Better than nothing!

1 Like