Matomo integration in HTMLy

This integration originated from a very practical need: obtaining meaningful statistics from Matomo installations on small or minimally managed hosting environments.

In many setups the problem is not Matomo itself, but the surrounding infrastructure. If you run a site on a fully controlled server you can usually mitigate bad traffic and tracking issues using tools such as firewall rules, bot filtering, reverse proxies, or automated banning systems. However, many users do not have that level of control over their hosting environment. Typical situations include:

shared hosting environments
simple VPS setups without advanced security tooling
systems without automated protection mechanisms such as Fail2ban
sites that cannot easily deploy reverse proxies, WAF rules, or custom firewall logic

In these contexts, analytics data can quickly become polluted by automated traffic, scanners, crawlers, and other unwanted requests. As a result, the statistics collected by Matomo can become unreliable or misleading.

Another practical issue is that many sites rely on external protection layers that introduce additional friction for visitors. A common example is the well-known “I'm not a robot” challenge presented by services like Cloudflare. While such systems may protect the site, they can also degrade user experience and distort analytics by blocking or altering legitimate traffic patterns.

The goal of this work is therefore simple:

obtain cleaner Matomo statistics
reduce the impact of automated or malicious traffic
avoid intrusive third-party protection layers
keep the setup compatible with minimal hosting environments

In short, the idea is to improve the quality of analytics without requiring advanced infrastructure or heavy external services.

A secondary motivation is philosophical: Matomo is designed as a self-hosted analytics platform that gives full control over the data and the tracking environment. Running it in a way that minimizes external dependencies and unnecessary intermediaries aligns well with that philosophy.

JavaScript tracking

Matomo JavaScript tracking for HTMLy uses the standard Matomo tracking snippet, added at the end of each page. Templates must include it, in the same way as for Google Analytics, at the end of the template file layout.html.php:

    <?php if (analytics()): ?><?php echo analytics(); ?><?php endif; ?>
    <?php if (matomo()): ?><?php echo matomo($locals); ?><?php endif; ?>
</body>
</html>

JavaScript tracking excludes by design every client that does not run JavaScript, mainly bots and crawlers. That is the desired behaviour if you only want to see "real" visitors in your Matomo stats, but it prevents you from knowing whether someone is constantly pinging your site or grabbing content, wasting bandwidth and resources.

PHP backend tracking

PHP backend tracking uses the official Matomo PHP tracker.

However, simple PHP tracking hides some pitfalls. It tracks everything: bots, crawlers, and anything else you can find in the Apache/Nginx logs. That gives you the whole picture of the accesses to the site, but it does not differentiate bots and crawlers from real visitors. Matomo has an integrated bot recognition system based on the user agent, but the user agent is easily faked, and many bots and crawlers send the user agent of a real browser.

The HTMLy PHP backend doesn't simply call the Matomo PHP tracker: it also performs some checks to identify bots and to block selected ASNs.
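To make the idea of server-side tracking concrete, here is a minimal sketch of how one page view could be expressed as a Matomo Tracking HTTP API request. It is not HTMLy's code and it does not use the official PHP tracker; the function name `buildTrackingUrl`, the server URL, the token and the sample hit are all hypothetical. Only the query parameter names (`idsite`, `rec`, `apiv`, `url`, `action_name`, `ua`, `cip`, `token_auth`) come from the Matomo Tracking HTTP API.

```php
<?php
/**
 * Build the tracking URL for one server-side page view.
 * Hypothetical helper, for illustration only.
 */
function buildTrackingUrl(string $matomoUrl, int $siteId, string $tokenAuth, array $hit): string
{
    $params = [
        'idsite'      => $siteId,
        'rec'         => 1,                  // required: record this request
        'apiv'        => 1,                  // tracking API version
        'url'         => $hit['url'],        // page that was requested
        'action_name' => $hit['title'],      // page title shown in reports
        'ua'          => $hit['userAgent'],  // the visitor's user agent, not the server's
        'cip'         => $hit['ip'],         // the visitor's IP; Matomo requires token_auth for this
        'token_auth'  => $tokenAuth,
    ];

    return rtrim($matomoUrl, '/') . '/matomo.php?' . http_build_query($params);
}

// Example values are placeholders (documentation domain and IP range).
$trackingUrl = buildTrackingUrl('https://matomo.example.org/', 1, 'SECRET_TOKEN', [
    'url'       => 'https://blog.example.org/post/hello',
    'title'     => 'Hello',
    'userAgent' => 'Mozilla/5.0',
    'ip'        => '203.0.113.7',
]);
echo $trackingUrl, PHP_EOL;
```

Because the request is made by the server, the visitor's IP address and user agent have to be passed explicitly, which is why the authentication key is needed for the PHP backend only.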

Features

The following features and settings are available in the Matomo section of the HTML widgets configuration:

cookies: enable or disable cookies in Matomo tracking; cookies let you identify returning visitors, but require visitors to accept tracking cookies under the European GDPR;
ASN bot list: requires a properly configured GeoIP; lets HTMLy identify bots by ASN (traffic from datacenters usually comes from bots) and send this information to Matomo;
ASN block list: requires a properly configured GeoIP; lets HTMLy answer with a 403 (Forbidden) error, denying that ASN access to the blog;
page view counter: choose whether counters are incremented on every visit or bots/crawlers are excluded from the counters;
visitor type custom dimension: populates a Matomo custom dimension with the value "real" or "bot/crawler", so that statistics can be segmented;
bots Site ID: a separate site ID for bot statistics;
Matomo sending method: whether first-visit information is sent by a cronjob or on page load - a cronjob is recommended if you can set one up.

Matomo tracking

Basic configuration: the Matomo server URL, the site ID, and the authentication key (needed for the PHP backend only).

Cookies settings

This setting mainly affects JavaScript tracking, since PHP tracking does not use the cookie anyway. For strict compliance with the European GDPR, turning cookies off spares you most of the consent forms and privacy agreements otherwise required. With cookies disabled, only the standard "technical" PHP session cookie remains.

BOT identification

All requests coming from an ASN in the bot list are marked as bots, while requests from an ASN in the block list receive a 403 (Forbidden) status. The real/bot value can be stored in a Matomo custom dimension, and so can the ASN/provider. This lets you create segments in Matomo showing only real or only bot traffic. Bot visits can also be sent to a dedicated site ID (one site ID for real visits, one site ID for bot traffic).

The bot recognition system needs to save the first visit (or each successive visit, for clients that do not support the PHP session cookie) in a temporary server-side JSON file and send the information to Matomo asynchronously. These asynchronous statistics can be sent on page load or by a cronjob (recommended).
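The routing rules described above could be sketched roughly as follows. This is an illustration, not HTMLy's implementation: the function name `routeVisit`, the configuration keys and the AS numbers (taken from the range reserved for documentation) are hypothetical. Two choices are assumptions of this sketch: the block list takes precedence over the bot list, and a request with an unknown ASN is treated as a real visit.

```php
<?php
/**
 * Decide what to do with a request based on its ASN.
 * Hypothetical helper; config keys and precedence are assumptions.
 */
function routeVisit(?int $asn, array $config): array
{
    // Blocked ASN: answer 403 and track nothing.
    if ($asn !== null && in_array($asn, $config['blockAsns'], true)) {
        return ['status' => 403, 'visitorType' => null, 'siteId' => null];
    }

    // ASN in the bot list: mark as bot and use the bots site ID.
    $isBot = $asn !== null && in_array($asn, $config['botAsns'], true);

    return [
        'status'      => 200,
        'visitorType' => $isBot ? 'bot' : 'real',   // value for the custom dimension
        'siteId'      => $isBot ? $config['botSiteId'] : $config['siteId'],
    ];
}

$config = [
    'siteId'    => 1,          // real visits
    'botSiteId' => 2,          // bot traffic
    'botAsns'   => [64496],    // e.g. a datacenter
    'blockAsns' => [64500],    // e.g. an abusive network
];

print_r(routeVisit(64496, $config));
```

In a real request handler the 403 case would end with `http_response_code(403)` and an early exit, before any content is rendered.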

Cronjob

If you choose the cronjob to send asynchronous data, a cronjob must be set up to run:

php /var/www/htmly/system/resources/php/matomo-cronjob.php

where:

/var/www/htmly

has to be adjusted to the folder of your HTMLy installation. Do not run it through wget: for security reasons, that folder is only accessible from the CLI.

How it works

The first detection is based on JavaScript. Bots and crawlers usually just gather content, without supporting JavaScript. On the first visit, the Matomo implementation in HTMLy checks for JavaScript on the client side: the page uses JavaScript to call a specific file, sending screen information (width, height, color space and pixel ratio). No information is sent to Matomo at this point.

A JSON file with the information to be sent is temporarily saved for later. If the JavaScript call is performed, the called script (a .js URL that is really PHP, via URL rewrite) adds the screen data, sends everything to Matomo, and deletes the file. If the JavaScript call is not performed (so there is no JavaScript support), the JSON file stays in place and is sent to Matomo asynchronously by a cronjob or on the next page load.
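The pending-file mechanism just described could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the file naming scheme, the `javascript` flag and the visit fields are hypothetical, and the actual sending to Matomo is left out (the functions simply return the visits that would be sent).

```php
<?php
// Hypothetical helpers: file layout, names and fields are assumptions.

function pendingPath(string $dir, string $id): string
{
    // Accept only ids generated below, so an id can never escape $dir.
    if (!preg_match('/^[a-f0-9]{16}$/', $id)) {
        throw new InvalidArgumentException('unexpected visit id');
    }
    return $dir . '/' . $id . '.json';
}

// First page view: store the visit instead of sending it.
function savePendingVisit(string $dir, array $visit): string
{
    $id = bin2hex(random_bytes(8));
    file_put_contents(pendingPath($dir, $id), json_encode($visit), LOCK_EX);
    return $id;
}

// The JavaScript call arrived: add screen data, delete the file and
// return the visit to send. Returns null if nothing is pending.
function completeVisit(string $dir, string $id, array $screen): ?array
{
    $path = pendingPath($dir, $id);
    if (!is_file($path)) {
        return null;
    }
    $visit = json_decode((string) file_get_contents($path), true);
    unlink($path);
    return is_array($visit) ? $visit + ['screen' => $screen, 'javascript' => true] : null;
}

// No JavaScript call arrived: a cronjob or a later page load collects
// every file older than $minAge seconds.
function flushPending(string $dir, int $minAge): array
{
    $visits = [];
    foreach (glob($dir . '/*.json') ?: [] as $path) {
        if (time() - filemtime($path) < $minAge) {
            continue; // the JavaScript call may still arrive
        }
        $visit = json_decode((string) file_get_contents($path), true);
        unlink($path);
        if (is_array($visit)) {
            $visits[] = $visit + ['javascript' => false];
        }
    }
    return $visits;
}
```

The minimum age in `flushPending` is a design choice of this sketch: it leaves a grace period for the JavaScript call to arrive before a visit is treated as coming from a client without JavaScript.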

In both situations the user agent is checked for bots using browscap (https://browscap.org and https://www.php.net/manual/en/function.get-browser.php), and the IP address is checked against a list of ASNs. An ASN (Autonomous System Number) is used by Internet Service Providers (ISPs) and large organizations (e.g. Google, Facebook...) to facilitate BGP (Border Gateway Protocol) routing.

ASN information comes from the MaxMind GeoIP plugin (Apache integration, $_SERVER['MM_ASN'] and $_SERVER['MM_ASORG'] variables) and from a list of datacenter IP ranges saved in config/data/datacenters.csv. The list of datacenter IP ranges can be updated from https://github.com/growlfm/ipcat

Any ipcat CSV file works (the project started as the client9 repo, https://github.com/client9/ipcat), in the format ip_start,ip_end,ASN_name,ASN_link. It is used as a fallback for the GeoIP plugin.
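As a rough sketch of that lookup order, the code below first uses the GeoIP variables and falls back to the CSV ranges. Only the `MM_ASN` / `MM_ASORG` names and the CSV column order come from the description above; the function names, the returned array shape and the sample data (documentation IP ranges) are hypothetical. The sketch handles IPv4 only and assumes 64-bit PHP, where `ip2long()` returns non-negative integers.

```php
<?php
// Hypothetical helpers, for illustration only.

/** Parse ipcat-style CSV text (ip_start,ip_end,name,link) into numeric ranges. */
function loadDatacenterRanges(string $csv): array
{
    $ranges = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        $cols = str_getcsv($line, ',', '"', '');
        if (count($cols) < 2) {
            continue;
        }
        $start = ip2long($cols[0]);
        $end   = ip2long($cols[1]);
        if ($start === false || $end === false) {
            continue; // header line or non-IPv4 row
        }
        $ranges[] = ['start' => $start, 'end' => $end,
                     'name' => $cols[2] ?? '', 'link' => $cols[3] ?? ''];
    }
    return $ranges;
}

/** Resolve the provider of an IP: GeoIP variables first, CSV ranges as fallback. */
function lookupProvider(string $ip, array $server, array $ranges): ?array
{
    if (!empty($server['MM_ASN'])) {
        // Value passed through as-is; its exact format is not assumed here.
        return ['asn' => $server['MM_ASN'], 'name' => $server['MM_ASORG'] ?? '', 'source' => 'geoip'];
    }
    $n = ip2long($ip);
    if ($n === false) {
        return null;
    }
    foreach ($ranges as $range) {
        if ($n >= $range['start'] && $n <= $range['end']) {
            return ['asn' => null, 'name' => $range['name'], 'source' => 'csv'];
        }
    }
    return null; // not a known datacenter
}

$csv = "ip_start,ip_end,ASN_name,ASN_link\n"
     . "198.51.100.0,198.51.100.255,Example Hosting,https://hosting.example/\n";
$ranges = loadDatacenterRanges($csv);

print_r(lookupProvider('198.51.100.42', [], $ranges));
```

In a real request the second argument would be `$_SERVER`. A linear scan is enough for a sketch; a large range list would be better served by sorting the ranges and using a binary search.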

