Getting old URL from a redirected page

Hello!

I use Jekyll via GitLab Pages. I’ve set up a 404 page, and I see via telemetry that it gets hit quite often since the move from WordPress to Jekyll.

Is there a way for me to find which URL on my site redirected to this 404? Something I could add to the post URL or post title that I can see on my telemetry dashboard - like https://site/404/link-to-not-working-url/ – which still shows up as the 404 page to visitors, but lets me collect this info and fix things up.

Edit: Noting that I currently have my 404 page set up as a 404.html page in my root dir with Permalink: 404.html.

Thanks!

Typical telemetry/web-analytics will record the Title and URL of the page being viewed. You could then filter by the 404 page Title to find the broken page URLs. Another alternative is to record a custom event for 404 pages. What telemetry system are you using?

If you can only see URLs in your telemetry, then one trick would be to force the 404 page to always navigate to a /404/... URL before telemetry is recorded. Something like this could be added to 404.html (untested code):

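<script>
// Redirect once to a /404/-prefixed URL so telemetry records the broken path.
if (!window.location.pathname.startsWith('/404/')) {
    // pathname already begins with "/", so prepend "/404" rather than "/404/".
    window.location.pathname = '/404' + window.location.pathname;
}
</script>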

A bit kludgy, but it should give /404/-marked URLs in the telemetry.

Thank you! I was hoping for a Jekyll variable I could use.

This works and does the job, though, so all good for now.

If someone’s looking for inspiration to implement something in jekyll, this could be a nifty thing to do.

Spoke too soon - I use goatcounter as the telemetry site, and it just displays /404.html as the page URL… So this solution still doesn’t work.

Are you using GoatCounter’s JavaScript or image-based tracker code? Since you’re building a static Jekyll site, the image-based tracker wouldn’t be able to provide path/title info for 404 pages.

Browsing around the GoatCounter docs, it looks like it will use a page’s canonical URL if one is present, rather than the actual page URL. I’m guessing your 404.html has something like the following in its head (due to GitHub Pages’ default use of the Jekyll SEO plugin, jekyll-seo-tag):

<link rel="canonical" href="https://example.com/404.html" />

This causes GoatCounter to always report the URL as /404.html instead of the real page URL. This is really a bug/feature of GoatCounter depending on one’s point of view. Doc at: https://www.goatcounter.com/help/path
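To check whether a canonical link really is masking the path on a broken URL, a rough sketch like the one below could be pasted into the browser console. The helper name `canonicalMasksPath` is hypothetical, and it only compares paths. It does not call GoatCounter itself.

```javascript
// Hypothetical helper: true when a canonical href points at a different
// path than the page actually being viewed.
function canonicalMasksPath(canonicalHref, pageUrl) {
  if (!canonicalHref) return false; // no canonical link, nothing is masked
  const page = new URL(pageUrl);
  // Resolve relative hrefs against the page URL before comparing paths.
  return new URL(canonicalHref, page).pathname !== page.pathname;
}

// Browser-only part: read the canonical link from the current document.
if (typeof document !== 'undefined') {
  const link = document.querySelector('link[rel="canonical"]');
  console.log(canonicalMasksPath(link && link.href, window.location.href));
}
```

If this logs true on a missing page, the canonical link is the likely reason every broken URL is reported as /404.html.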

Possible Work-Arounds (untested):

Hey this is brilliant! Thank you for digging into this.

I added

+<script>
+    window.goatcounter = {
+        path: '/404'+location.pathname,
+    }
+</script>

And it shows up as /404/missing-content in goatcounter logs.
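For anyone reusing this, a slightly more defensive sketch of the same idea follows. `notFoundPath` is a hypothetical name. The only GoatCounter setting involved is the `path` option used above, and the sketch assumes the settings object is defined before GoatCounter's count script loads.

```javascript
// Hypothetical helper: build the path reported for a missing page.
function notFoundPath(pathname) {
  // Collapse any leading slashes so the result never contains "//".
  const clean = '/' + String(pathname).replace(/^\/+/, '');
  // Leave paths that already carry the /404/ marker untouched.
  return clean.startsWith('/404/') ? clean : '/404' + clean;
}

// Browser-only part: hand the marked path to GoatCounter via its settings object.
if (typeof window !== 'undefined') {
  window.goatcounter = { path: notFoundPath(window.location.pathname) };
}
```

Keeping the path logic in a small function makes it easy to try a few sample paths before it goes into 404.html.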

I hadn’t realized it was the SEO plugin causing things to be munged up. Turned out to not be a Jekyll problem after all.

Thank you!

Yes, there are a few ways you can track the URLs that are leading to your 404 page.

  1. Use server logs: Most web servers keep logs of all requests that they receive, including those that result in a 404 error. You can use these logs to identify which URLs are leading to your 404 page. The exact method for accessing and analyzing server logs will depend on your hosting provider and server setup.
  2. Use Google Analytics: If you have Google Analytics set up on your website, you can create a custom 404 error pageview tracking code that will send an event to Google Analytics whenever someone visits your 404 page. You can then view this data in your Google Analytics dashboard to see which URLs are leading to your 404 page.

Here’s an example of how you can implement this tracking code:

In your 404.html file, add the following JavaScript code:

<script>
  // Send an event to Google Analytics when the 404 page loads
  ga('send', 'event', '404 Error', 'Page Not Found', window.location.pathname);
</script>

This code assumes that you have Google Analytics set up on your site, and that you’re using the Universal Analytics tracking code (analytics.js, which provides the ga function).
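A slightly more careful variant of that snippet is sketched below. It assumes the same `ga` function is available, skips the call when it is not (for example when a blocker stops the script), and adds `document.referrer` to the event label so the linking page is visible too. `report404` is a hypothetical name, not part of Google Analytics.

```javascript
// Hypothetical helper: send a 404 event, with the referring page in the label when known.
function report404(gaFn, pathname, referrer) {
  if (typeof gaFn !== 'function') return false; // analytics not loaded or blocked
  const label = referrer ? pathname + ' (from ' + referrer + ')' : pathname;
  gaFn('send', 'event', '404 Error', 'Page Not Found', label);
  return true;
}

// Browser-only part: window.ga exists only if the analytics script is on the page.
if (typeof window !== 'undefined') {
  report404(window.ga, window.location.pathname, document.referrer);
}
```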

  3. Use a third-party error tracking service: There are several third-party services that can help you track and analyze 404 errors on your website, such as Sentry, Bugsnag, or Rollbar. These services typically require you to add a JavaScript snippet to your website, and they can provide more detailed information about the errors, including the referring URL.

By adding a URL parameter or modifying the post URL or post title, as you suggested, you can include additional information in the error tracking data that you collect. For example, you could append a query parameter to the URL like https://site.com/not-found?referrer=bad-link, or modify the URL like https://site.com/not-found/bad-link. This will allow you to more easily identify the referring URL that led to the 404 error in your tracking data.