Are you new to Home Assistant? Have you been using it for a while? Are you noticing it might be getting a little sluggish, or maybe you’re running out of disk space?
Congratulations, you’re not alone – and it’s something that frustrates me about Home Assistant to no end.
When I finally had the time to sit down and learn about this topic, I worked out that one of the quick wins would be getting the database size down.
However – this was nowhere near an easy task to achieve, and it highlighted significant opportunities in the Home Assistant application that, if addressed, could really improve performance and reduce the impact on systems and devices.
The frustration of figuring this out led me to fire off a thread on Mastodon so I could vent my understandable frustration – which got some good attention and engagement from the Home Assistant users who frequent the platform. It also got me off my behind to open a Feature Request discussion on HA’s GitHub regarding this very problem, in the hope some good might come from it.
But that’s not what you’re here for.
You’re here because you’re having the same problem, and want to either avoid the issue as you get started with Home Assistant, or recover from it after using the platform for a while. So let’s help you with that – by learning from what I did to address this.

A small request – add your upvote to a Home Assistant Feature Request for fixing database management and optimisation
If you too are frustrated by how Home Assistant handles database setup, management and hygiene, or the lack of tools in the UI to keep on top of this across the whole data and sensor lifecycle…
Please take a moment to upvote this feature request on GitHub:
https://github.com/orgs/home-assistant/discussions/2899
If this opportunity is picked up by Home Assistant, it would significantly contribute to helping others avoid these issues in future.
My inspiration for tackling this issue
The starting point for this article was Ed Voncken’s January 2025 blog post Reduce the size of your Home Assistant database, which came up during my research on this issue. It was the closest thing to a substantial article I’d seen on HA database management and reduction… which isn’t saying much.
Reading through, along with other content explaining the HA data lifecycle, it prompted me to see how I could expand this to achieve the following goals for myself, and those with the same needs – specifically:
Out of scope: There are some things that I won’t be touching – which include:
If Home Assistant Backups are something you don’t have set up and are not familiar with – then this is a very good time to stop and focus on backups before proceeding any further.
And the good news for you – I’ve personally dogfooded and implemented everything I’m covering below, so this isn’t just some theoretical article. It’s real-world steps you can use to get this right – even if you’re new to databases, SQL queries, and the like.
One of the key pieces of knowledge for this issue is understanding the three distinct stages – and thus database tables – that make up Home Assistant’s statistics data lifecycle. These are: the states table (raw, short-lived state history), the statistics_short_term table (five-minute summaries), and the statistics table (long-term, hourly summaries).
If you’d like a deeper dive into all of this, I’d highly recommend reading the January 2025 article Understanding Home Assistant’s Database and Statistics Model from the Smart Home Scene website.
As you would understand by now – the longer you use Home Assistant, and the more data it captures – the bigger your database becomes, and the slower it can become to access and retrieve statistics.
And for those running Home Assistant on single-board computers, and/or using flash storage (memory card, USB, SSD) for disk drive purposes – this can start to have an appreciable impact.
This is why keeping ‘chatty’ or ephemeral entities out of your database matters. If this data never enters the Recorder, it can’t bloat your short- and long-term statistics (or your backups), and you avoid excessive writes to your storage medium of choice.
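To make the idea concrete, here’s a minimal sketch of a recorder: exclusion block – the globs shown are illustrative only, and a full worked example appears later in this article:

```yaml
# Minimal sketch only - pick globs that match YOUR noisy entities.
recorder:
  purge_keep_days: 30          # how long raw states stay in the database
  exclude:
    entity_globs:
      - sensor.*_linkquality   # chatty mesh-quality readings
      - update.*               # update-checker chatter
```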
A solid approach for keeping your Home Assistant at peak performance from a database perspective covers three core activities:
Configuring the recorder: block – by telling it:
Starting with the right apps and integrations will help make this painless: DBStats (for “what’s big”) and SQLite Web (for quick, safe queries).
Add DBStats to Home Assistant
https://github.com/jehy/hass-addons

When configuring the add-on, set the database path to: sqlite:////homeassistant/home-assistant_v2.db

Add SQLite Web to Home Assistant
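Once SQLite Web is up, a quick read-only sanity check confirms you’re connected to the right database – counting the rows in the three tables this article works with (table names as per the default Recorder schema):

```sql
-- Read-only sanity check: row counts for the three key tables
SELECT 'states' AS table_name, COUNT(*) AS row_count FROM states
UNION ALL
SELECT 'statistics_short_term', COUNT(*) FROM statistics_short_term
UNION ALL
SELECT 'statistics', COUNT(*) FROM statistics;
```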
Spreadsheet application and a text/code editor
If using SQLite to measure and decide what needs to go, you’ll want a spreadsheet application handy to edit your CSV and track your decisions – this will help you work out the filters you’ll implement later.
A text/code editor will also be handy for the wildcard rules you’ll be writing and validating, and any SQL queries that you need to change.
Recent backups of your Home Assistant instance
As noted earlier in the piece – if Home Assistant Backups are something you don’t have set up and are not familiar with, then this is a very good time to stop and focus on backups before proceeding any further.
Make sure you:
You can take a manual backup of Home Assistant at any time, and the more cautious and knowledgeable amongst you have likely already done this.
This will ensure that, in the unlikely event anything goes wrong, you can restore your instance and try again.
Find the biggest entities
There are two ways we can go about this – either getting a list of just your biggest entities, or getting a list of all of them and their impact across the various tables.
Option A — DBStats (no SQL): Just the biggest entries
Option B — SQLite Web (precise and exportable)
WITH
-- Distinct text IDs present in each table
state_ids AS (
  SELECT DISTINCT sm.entity_id
  FROM states s
  JOIN states_meta sm ON sm.metadata_id = s.metadata_id
),
sts_ids AS (
  SELECT DISTINCT stm.statistic_id AS entity_id
  FROM statistics_short_term sst
  JOIN statistics_meta stm ON stm.id = sst.metadata_id
),
lts_ids AS (
  SELECT DISTINCT stm.statistic_id AS entity_id
  FROM statistics s
  JOIN statistics_meta stm ON stm.id = s.metadata_id
),
-- Master set of all text IDs across the three tables
all_ids AS (
  SELECT entity_id FROM state_ids
  UNION
  SELECT entity_id FROM sts_ids
  UNION
  SELECT entity_id FROM lts_ids
),
-- Per-entity counts for each table
states_counts AS (
  SELECT sm.entity_id, COUNT(*) AS states_count
  FROM states s
  JOIN states_meta sm ON sm.metadata_id = s.metadata_id
  GROUP BY sm.entity_id
),
sts_counts AS (
  SELECT stm.statistic_id AS entity_id, COUNT(*) AS sts_count
  FROM statistics_short_term sst
  JOIN statistics_meta stm ON stm.id = sst.metadata_id
  GROUP BY stm.statistic_id
),
lts_counts AS (
  SELECT stm.statistic_id AS entity_id, COUNT(*) AS lts_count
  FROM statistics s
  JOIN statistics_meta stm ON stm.id = s.metadata_id
  GROUP BY stm.statistic_id
)
SELECT a.entity_id,
       COALESCE(sc.states_count, 0) AS states_count,
       COALESCE(stc.sts_count, 0) AS statistics_short_term_count,
       COALESCE(lc.lts_count, 0) AS statistics_count,
       (COALESCE(sc.states_count, 0) + COALESCE(stc.sts_count, 0) + COALESCE(lc.lts_count, 0)) AS total_rows
FROM all_ids a
LEFT JOIN states_counts sc ON sc.entity_id = a.entity_id
LEFT JOIN sts_counts stc ON stc.entity_id = a.entity_id
LEFT JOIN lts_counts lc ON lc.entity_id = a.entity_id
ORDER BY total_rows DESC, a.entity_id;
This query will give you a really nice result, which you can export to CSV, and see all of the entities in your instance, and how many records they have across the three statistics tables:
Screenshot of CSV results of SQLite query, listing all entities in a Home Assistant instance, with a count of the number of rows in each of the statistics tables, and a total count of rows across all tables.
Now that we know what entities exist in our Home Assistant instance, and how many records each is already creating – let’s move to the next steps.
Review your lists
If you chose to use Option A (DBStats): Decide which of these biggest entity_ids you want to get rid of, then move on to the next step.
If you chose to use Option B (SQLite export): Let’s open up the CSV file in the spreadsheet tool of your choice, and sort the data by entity_id. This should group entities of the same type and name together.
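If you’d prefer to script the triage instead of (or alongside) the spreadsheet, here’s a small sketch that ranks the export by total rows. It assumes the CSV headings match the column aliases from the query earlier (entity_id, total_rows); the sample data below is made up for illustration:

```python
import csv
import io

def top_entities(csv_text, n=10):
    """Rank entities from the exported CSV by total row count, biggest first.

    Assumes the CSV uses the column names from the SQL query's aliases:
    'entity_id' and 'total_rows'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: int(r["total_rows"]), reverse=True)
    return [(r["entity_id"], int(r["total_rows"])) for r in rows[:n]]

# Hypothetical miniature export, for illustration only:
sample = """entity_id,states_count,statistics_short_term_count,statistics_count,total_rows
binary_sensor.aurora_alert,43,0,0,43
sensor.power_meter,3260,1200,400,4860
binary_sensor.bedtime_period,71,0,0,71
"""
print(top_entities(sample, n=2))
```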
I’d also recommend adding a new column at this point (call it Remove?), and putting a Y in it for anything you no longer want to keep.
Deciding what to keep, and what not to keep, long term
Think about what is shown in the list – and ask yourself these two questions for each entity:
If the answer to both questions is no, then filter its data out of the recorder entirely. You don’t need it, as you either won’t use it, or can get it another way without much effort.
Some of the things I’ve excluded from my recorder include (this is a big list, but then again my HA instance has 3000+ entities of all types and stripes):
Deciding how long you want to keep data in the states table
When performing your dry-run (and eventual delete) queries, we need to know how many days’ worth of data you want to keep (the purge_keep_days: value in the recorder: configuration).
I chose to offload data into short-term and long-term storage after 30 days, but you may want to keep it for longer or shorter – the choice is up to you.
When making this decision, consider these factors:
This information will allow you to find the right balance between system performance, dashboard performance, database size, and read/write operations to your storage.
Building your rules and wildcards
As noted earlier, if using Option B and working with the CSV – there’s a good chance you’ve already decided to mark all the items that are not needed and destined for removal. Let’s start building our rules.
Sensor name | No. of States Entries | Remove?
binary_sensor.aurora_alert | 43 | y
binary_sensor.aurora_outlook | 41 | y
binary_sensor.aurora_watch | 41 | y
binary_sensor.bedtime_period | 71 | y
binary_sensor.combined_door_window_and_motion_sensors | 3260 |
binary_sensor.courtyard_shed_door_sensor_zha_contact | 292 |
binary_sensor.dining_room_ikea_window_sensor_zha_contact | 294 |
binary_sensor.dining_room_motion_sensor_matter_occupancy | 133 |
binary_sensor.door_and_window_sensors | 574 |
binary_sensor.hallway_motion_sensor_2_zha_occupancy | 1461 |
Here are some examples of safe and unsafe rules for this list of sensors:

Safe wildcard rules:
- binary_sensor.aurora_* (as this gets us all the aurora sensors)
- binary_sensor.bedtime_* (as this gets us any bedtime sensors)

Unsafe wildcard rules:
- binary_sensor.* (as this would stop recording for all the binary sensors in our system, including the ones we’ve decided to keep)
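Because the * wildcard means “any sequence of characters” both in recorder’s entity_globs and in SQLite’s GLOB operator, you can sanity-check candidate rules before they touch anything real. Here’s a sketch (using the sample entities from the table above) that asks SQLite itself which entities each pattern matches:

```python
import sqlite3

def match_globs(entity_ids, patterns):
    """Ask SQLite which entity_ids each GLOB pattern matches.

    Using SQLite's own GLOB operator means the results agree exactly
    with what the dry-run and delete queries will match.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entities(entity_id TEXT)")
    con.executemany("INSERT INTO entities VALUES (?)",
                    [(e,) for e in entity_ids])
    matches = {}
    for pattern in patterns:
        rows = con.execute(
            "SELECT entity_id FROM entities WHERE entity_id GLOB ? "
            "ORDER BY entity_id", (pattern,)).fetchall()
        matches[pattern] = [row[0] for row in rows]
    con.close()
    return matches

entities = [
    "binary_sensor.aurora_alert",
    "binary_sensor.aurora_outlook",
    "binary_sensor.bedtime_period",
    "binary_sensor.door_and_window_sensors",
]
# The safe rule matches only the aurora sensors;
# the unsafe rule swallows every binary_sensor.
print(match_globs(entities, ["binary_sensor.aurora_*", "binary_sensor.*"]))
```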
Build and test your query
Next, we want to build a query from the rules we’ve created, so we can check what will be removed by our rules.
Perform a dry run and check the results
And now, let’s do a dry run with our filters, to check exactly what will be removed when we proceed to the next step:
-- === DRY RUN: show what WOULD be deleted (no changes) ===
PRAGMA foreign_keys=ON;
-- 60-day cutoff in Unix seconds (UTC)
WITH cutoff(ts) AS (
SELECT CAST(strftime('%s','now','-60 days') AS REAL)
)
SELECT datetime(ts, 'unixepoch', 'localtime') AS cutoff_local
FROM cutoff;
-- 1) Your wildcard list as GLOB patterns (exactly as provided; '*' means "any string")
CREATE TEMP TABLE kill_globs(glob TEXT PRIMARY KEY);
INSERT INTO kill_globs(glob) VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
-- Things to do with updates
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
-- Things to do with an integration or device
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
-- Lights/switches noise
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent');
-- 2) Resolve target metadata ids from your globs
CREATE TEMP TABLE target_states_meta AS
SELECT sm.metadata_id
FROM states_meta sm
JOIN kill_globs kg ON sm.entity_id GLOB kg.glob;
CREATE TEMP TABLE target_statistics_meta AS
SELECT stm.id AS metadata_id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob;
-- 3) Show which ids we matched (optional samples)
SELECT 'states_meta matches' AS what, COUNT(*) AS cnt FROM target_states_meta
UNION ALL
SELECT 'statistics_meta matches', COUNT(*) FROM target_statistics_meta;
-- 4) Show would-be delete counts per table
WITH cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
SELECT 'states' AS table_name, COUNT(*) AS rows_would_delete
FROM states, cutoff
WHERE states.metadata_id IN (SELECT metadata_id FROM target_states_meta)
AND states.last_updated_ts < cutoff.ts
UNION ALL
SELECT 'statistics_short_term', COUNT(*)
FROM statistics_short_term, cutoff
WHERE statistics_short_term.metadata_id IN (SELECT metadata_id FROM target_statistics_meta)
AND statistics_short_term.start_ts < cutoff.ts
UNION ALL
SELECT 'statistics', COUNT(*)
FROM statistics, cutoff
WHERE statistics.metadata_id IN (SELECT metadata_id FROM target_statistics_meta)
AND statistics.start_ts < cutoff.ts;
To adapt this query to your own needs:

Find the line SELECT CAST(strftime('%s','now','-60 days') AS REAL), replacing 60 with the number of days you wish to keep data in the states table. For example, if you choose 30 days, the line should read: SELECT CAST(strftime('%s','now','-30 days') AS REAL)

Replace the entries after INSERT INTO kill_globs(glob) VALUES with the list of wildcard entries you’ve decided to exclude from your statistics. Always make sure the correct format is used for each entry: ('automation.*'),

Note: the last entry in your wildcard list must end with a semicolon rather than a comma, like so: ('sensor.*_zha_requested_brightness_percent');
Repeating these steps until you have only the entities you want to remove will ensure you get this right the first time, and don’t end up either accidentally deleting data you want, or keeping data you don’t.
This is where the danger begins, as we’re going to start changing and removing what information is stored within the Home Assistant database.
If you’re unsure about anything, then stop, take a breath, and re-do all of the Test steps one last time before proceeding – so you know what is going to be affected.
Update configuration.yaml for recorder exclusions
Edit your configuration.yaml file to add the recorder: block, based on your retention day decisions and the final wildcards you tested and validated in the earlier dry-run testing stages. Adding comments to your exclusions is also a good idea, so you know what was added, and why you added it, for future reference.

Below is an example of my recorder: block configuration, with comments, so you can see the types of things I don’t have a need for, and the comments I added for them (and yes, this is a long list).
recorder:
auto_purge: true
purge_keep_days: 60
commit_interval: 30
exclude:
entity_globs:
# Automations
- automation.*
# Generic noise - stuff you're never going to analyse, or want to look at again
- sensor.*_linkquality
- sensor.*_battery
- select.*_identify
- button.*_identify
- button.*_identify*
- sensor.*_voltage
- sensor.system_monitor_*
# Update chatter
- sensor.*_newest_version
- switch.*_pre_release
- sensor.*_update_state
- update.*
# Integration/device chatter
- sensor.*_power_outage_count # Your choice to keep this one or not
- button.*_thread_sync_time
- switch.esphome_device_builder
- media_player.*
- remote.*
- tts.*
# Lights/switches - things you'll likely never want to analyse
- sensor.*_color_options
- sensor.*_matter_battery_type
- sensor.*_matter_battery_voltage
- sensor.*_zha_occupancy_timeout
- select.*_zha_effect
- select.*_thread_effect
- select.*_zha_power_on_behavior
- select.*_thread_power_on_behavior
- switch.*_zha_child_lock
- switch.*_zha_led_enable
- sensor.*_zha_requested_brightness_level
- sensor.*_zha_requested_brightness_percent
# Calibration
- number.*_humidity_calibration
- number.*_temperature_calibration
# Mobile telemetry
- sensor.your_mobile_phone_charger_type
- sensor.your_mobile_phone_current_version
- sensor.your_mobile_phone_total_*
- sensor.your_mobile_phone_wi_fi_*
- sensor.your_mobile_phone_audio_output
- sensor.your_mobile_phone_audio_output
# Audio system and TV's - because nobody wants to know what volume you had the audio at on average for the last three years
- number.audio_system_*
- sensor.audio_system_*
- switch.audio_system_*
- binary_sensor.audio_system_*
- device_tracker.audio_system_*
- button.tv_system_*
- device_tracker.tv_systems_*
# Trackers - Because if you have a network integration, these are just extra sensors you won't need outside a specific point in time for troubleshooting purposes, and you won't be coming to Home Assistant for that
- device_tracker.*
# Core cloud
- switch.cloud_*
# Weather forecasts
- sensor.placename_*
- sensor.*_icon_descriptor_*
- sensor.*_extended_text_*
- sensor.*_mdi_icon_*
- sensor.home_air_quality_day_*
- sensor.home_apparent_temperature
- sensor.home_cloud_ceiling
- sensor.home_condition_day_*
- sensor.home_condition_night_*
- sensor.home_dew_point
- sensor.home_grass_pollen_day_*
- sensor.home_hours_of_sun_day_*
- sensor.home_humidity
- sensor.home_lightning_*
- sensor.home_precipitation
- sensor.home_pressure
- sensor.home_realfeel_temperature*
- sensor.home_thunderstorm_probability_day_*
- sensor.home_thunderstorm_probability_night_*
- sensor.home_tree_pollen_day_*
- sensor.home_uv_index*
- sensor.home_wind_gust_speed
- sensor.home_wind_speed
- sensor.home_wind_speed_day_*
- sensor.home_wind_speed_night_*
# OpenWeatherMap sensors
- sensor.openweathermap_*
- weather.openweathermap
- weather.placename
# Zigbee2MQTT bridge
- select.zigbee2mqtt_bridge_log_level
- switch.zigbee2mqtt_bridge_permit_join
- button.zigbee2mqtt_bridge_restart
- sensor.zigbee2mqtt_bridge_version
# Network integration data
- sensor.network_router_*
# Uptime Kuma - if you use it, list all your sensors here, otherwise you're doubling up on data it already holds.
- sensor.site_name_*
# Servers, hardware and office equipment
- sensor.rpi_* # broader than sensor.rpi_rpi_*
- sensor.hp_* # OfficeJet + printer families
# Your Bluetooth connected toothbrush (Yes, Home Assistant can get data from these)
- sensor.io_series_*_signal_strength

Save your changes to configuration.yaml, and close the file.

Use SQLite to delete old data
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM statistics_short_term
WHERE metadata_id IN (
SELECT stm.id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob
)
AND start_ts < (SELECT ts FROM cutoff);
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent'),
('number.*_humidity_calibration'),
('number.*_temperature_calibration')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM statistics
WHERE metadata_id IN (
SELECT stm.id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob
)
AND start_ts < (SELECT ts FROM cutoff);
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent'),
('number.*_humidity_calibration'),
('number.*_temperature_calibration')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM states
WHERE metadata_id IN (
SELECT sm.metadata_id
FROM states_meta sm
JOIN kill_globs kg ON sm.entity_id GLOB kg.glob
)
AND last_updated_ts < (SELECT ts FROM cutoff);
In Developer Tools > Actions, select recorder.disable, then press Perform action.
When you’re done, select recorder.enable, then press Perform action.
Then reclaim the freed space by running the following in SQLite Web:
PRAGMA wal_checkpoint(TRUNCATE);
VACUUM;
For more information on why this helps, see the Recorder docs; you can also run this step using recorder.purge from Developer Tools > Actions if you wish.
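If you’re curious why the VACUUM step matters: deleting rows doesn’t shrink a SQLite file on its own – the freed pages are just marked reusable – and VACUUM is what rebuilds the file at its new, smaller size. Here’s a small self-contained demonstration, run against a throwaway database (not your Home Assistant one):

```python
import os
import sqlite3
import tempfile

def vacuum_demo():
    """Show that DELETE alone leaves the file large; VACUUM shrinks it."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE states(state TEXT)")
    # Fill the table with a couple of megabytes of throwaway rows.
    con.executemany("INSERT INTO states VALUES (?)",
                    [("x" * 500,) for _ in range(5000)])
    con.commit()
    size_before = os.path.getsize(path)
    con.execute("DELETE FROM states")
    con.commit()
    size_after_delete = os.path.getsize(path)  # pages freed, file not shrunk
    con.execute("VACUUM")                      # rebuild the file compactly
    con.close()
    size_after_vacuum = os.path.getsize(path)
    return size_before, size_after_delete, size_after_vacuum

before, after_delete, after_vacuum = vacuum_demo()
print(before, after_delete, after_vacuum)
```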
And that’s it, we’re done. And well done to you too.
You’ve successfully cleaned up the database of your Home Assistant instance, and have it back in a position where it’s only recording the things it needs to record for future analysis.
It might also be a good time to get a hot beverage and a biscuit, followed by upvoting my GitHub feature request, so this can all be native and baked into Home Assistant from day one.