Are you new to Home Assistant? Have you been using it for a while? Are you noticing it might be getting a little sluggish, or maybe you’re running out of disk space?
Congratulations, you’re not alone – and it’s something that frustrates me about Home Assistant to no end.
When I finally had the time to sit down and learn about this topic, I worked out that one of the quick wins would be getting the database size down.
However – this was nowhere near an easy task to achieve, and it highlighted significant opportunities in the Home Assistant application that, if addressed, could really improve performance and reduce the impact on systems and devices.
The frustration of figuring this out led me to fire off a thread on Mastodon so I could vent my understandable frustration – which got some good attention and engagement from the Home Assistant users who frequent the platform. It also got me off my behind to open a Feature Request discussion on HA’s GitHub regarding this very problem, in the hope some good might come from it.
But that’s not what you’re here for.
You’re here because you’re having the same problem, and want to either avoid the issue as you get started with Home Assistant, or recover from it after using the platform for a while. So let’s help you with that – by learning from what I did to address this.

A small request – add your upvote to a Home Assistant Feature Request for fixing database management and optimisation
If you too are frustrated by how Home Assistant handles database setup, management and hygiene, or the lack of tools in the UI to keep on top of this across the whole data and sensor lifecycle…
Please take a moment to upvote this feature request on GitHub:
https://github.com/orgs/home-assistant/discussions/2899
If this opportunity is picked up by Home Assistant, it would significantly contribute to helping others avoid these issues in future.
My inspiration for tackling this issue
The starting point for this article was Ed Voncken’s January 2025 blog post Reduce the size of your Home Assistant database, which came up during my research on this issue. It was the closest thing to a substantial article I’d seen on HA database management and reduction… which isn’t saying much.
Reading through, along with other content explaining the HA data lifecycle, it prompted me to see how I could expand this to achieve the following goals for myself, and those with the same needs – specifically:
Out of scope: There are some things that I won’t be touching – which include:
If Home Assistant Backups are something you don’t have set up and are not familiar with – then this is a very good time to stop and focus on backups before proceeding any further.
And the good news for you – I’ve personally dogfooded and implemented everything I’m covering below, so this isn’t just some theoretical article. It’s real-world steps you can use to get this right – even if you’re new to databases, SQL queries, and the like.
One of the key pieces of knowledge for this issue is understanding the three distinct stages – and thus database tables – that make up Home Assistant’s statistics data lifecycle. These are: the states table (raw, short-lived state history), the statistics_short_term table (five-minute summaries), and the statistics table (long-term, hourly summaries).
If you’d like a deeper dive into all of this, I’d highly recommend reading the January 2025 article Understanding Home Assistant’s Database and Statistics Model from the Smart Home Scene website.
As you would understand by now – the longer you use Home Assistant, and the more data it captures – the bigger your database becomes, and the slower it can become to access and retrieve statistics.
And for those running Home Assistant on single-board computers, and/or using flash storage (memory card, USB, SSD) for disk drive purposes – this can start to have an appreciable impact.
This is why keeping ‘chatty’ or ephemeral entities out of your database matters. If this data never enters the Recorder, it can’t bloat your short- and long-term statistics (or your backups), and you avoid excessive writes to your storage medium of choice.
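To make the idea concrete, here’s a minimal sketch of a recorder: exclusion block – the globs shown are illustrative only, and a full worked example appears later in this article:

```yaml
# Minimal sketch only - pick globs that match YOUR noisy entities.
recorder:
  purge_keep_days: 30          # how long raw states stay in the database
  exclude:
    entity_globs:
      - sensor.*_linkquality   # chatty mesh-quality readings
      - update.*               # update-checker chatter
```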
A solid approach for keeping your Home Assistant at peak performance from a database perspective covers three core activities:
Configuring the recorder: block – by telling it:
Starting with the right apps and integrations will help make this painless: DBStats (for “what’s big”) and SQLite Web (for quick, safe queries).
Add DBStats to Home Assistant
https://github.com/jehy/hass-addons

When configuring the add-on, set the database path to: sqlite:////homeassistant/home-assistant_v2.db

Add SQLite Web to Home Assistant
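Once SQLite Web is up, a quick read-only sanity check confirms you’re connected to the right database – counting the rows in the three tables this article works with (table names as per the default Recorder schema):

```sql
-- Read-only sanity check: row counts for the three key tables
SELECT 'states' AS table_name, COUNT(*) AS row_count FROM states
UNION ALL
SELECT 'statistics_short_term', COUNT(*) FROM statistics_short_term
UNION ALL
SELECT 'statistics', COUNT(*) FROM statistics;
```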
Spreadsheet application and a text/code editor
If using SQLite to measure and decide what needs to go, you’ll want a spreadsheet application handy to edit your CSV and track your decisions – this will help you work out the filters you’ll implement later.
A text/code editor will also be handy for the wildcard rules you’ll be writing and validating, and any SQL queries that you need to change.
Recent backups of your Home Assistant instance
As noted earlier in the piece – if Home Assistant Backups are something you don’t have set up and are not familiar with, then this is a very good time to stop and focus on backups before proceeding any further.
Make sure you:
You can take a manual backup of Home Assistant at any time, and the more cautious and knowledgeable amongst you have likely already done this.
This will ensure that, in the unlikely event anything goes wrong, you can restore your instance and try again.
Find the biggest entities
There are two ways we can go about this – either getting a list of just your biggest entities, or getting a list of all of them and their impact across the various tables.
Option A — DBStats (no SQL): Just the biggest entries
Option B — SQLite Web (precise and exportable)
WITH
-- Distinct text IDs present in each table
state_ids AS (
  SELECT DISTINCT sm.entity_id
  FROM states s
  JOIN states_meta sm ON sm.metadata_id = s.metadata_id
),
sts_ids AS (
  SELECT DISTINCT stm.statistic_id AS entity_id
  FROM statistics_short_term sst
  JOIN statistics_meta stm ON stm.id = sst.metadata_id
),
lts_ids AS (
  SELECT DISTINCT stm.statistic_id AS entity_id
  FROM statistics s
  JOIN statistics_meta stm ON stm.id = s.metadata_id
),
-- Master set of all text IDs across the three tables
all_ids AS (
  SELECT entity_id FROM state_ids
  UNION
  SELECT entity_id FROM sts_ids
  UNION
  SELECT entity_id FROM lts_ids
),
-- Per-entity counts for each table
states_counts AS (
  SELECT sm.entity_id, COUNT(*) AS states_count
  FROM states s
  JOIN states_meta sm ON sm.metadata_id = s.metadata_id
  GROUP BY sm.entity_id
),
sts_counts AS (
  SELECT stm.statistic_id AS entity_id, COUNT(*) AS sts_count
  FROM statistics_short_term sst
  JOIN statistics_meta stm ON stm.id = sst.metadata_id
  GROUP BY stm.statistic_id
),
lts_counts AS (
  SELECT stm.statistic_id AS entity_id, COUNT(*) AS lts_count
  FROM statistics s
  JOIN statistics_meta stm ON stm.id = s.metadata_id
  GROUP BY stm.statistic_id
)
SELECT a.entity_id,
       COALESCE(sc.states_count, 0) AS states_count,
       COALESCE(stc.sts_count, 0) AS statistics_short_term_count,
       COALESCE(lc.lts_count, 0) AS statistics_count,
       (COALESCE(sc.states_count, 0) + COALESCE(stc.sts_count, 0) + COALESCE(lc.lts_count, 0)) AS total_rows
FROM all_ids a
LEFT JOIN states_counts sc ON sc.entity_id = a.entity_id
LEFT JOIN sts_counts stc ON stc.entity_id = a.entity_id
LEFT JOIN lts_counts lc ON lc.entity_id = a.entity_id
ORDER BY total_rows DESC, a.entity_id;
This query will give you a really nice result, which you can export to CSV, and see all of the entities in your instance, and how many records they have across the three statistics tables:
Screenshot of CSV results of SQLite query, listing all entities in a Home Assistant instance, with a count of the number of rows in each of the statistics tables, and a total count of rows across all tables.
Now that we know what entities exist in our Home Assistant instance, and how many records each is already creating – let’s move to the next steps.
Review your lists
If you chose to use Option A (DBStats): Decide which of these biggest entity_ids you want to get rid of, then move on to the next step.
If you chose to use Option B (SQLite export): Let’s open up the CSV file in the spreadsheet tool of your choice, and sort the data by entity_id. This should group entities of the same type and name together.
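If you’d prefer to script the triage instead of (or alongside) the spreadsheet, here’s a small sketch that ranks the export by total rows. It assumes the CSV headings match the column aliases from the query earlier (entity_id, total_rows); the sample data below is made up for illustration:

```python
import csv
import io

def top_entities(csv_text, n=10):
    """Rank entities from the exported CSV by total row count, biggest first.

    Assumes the CSV uses the column names from the SQL query's aliases:
    'entity_id' and 'total_rows'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: int(r["total_rows"]), reverse=True)
    return [(r["entity_id"], int(r["total_rows"])) for r in rows[:n]]

# Hypothetical miniature export, for illustration only:
sample = """entity_id,states_count,statistics_short_term_count,statistics_count,total_rows
binary_sensor.aurora_alert,43,0,0,43
sensor.power_meter,3260,1200,400,4860
binary_sensor.bedtime_period,71,0,0,71
"""
print(top_entities(sample, n=2))
```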
I’d also recommend adding a new column at this point (call it Remove?), and putting a Y in it for anything you no longer want to keep.
Deciding what to keep, and what not to keep, long term
Think about what is shown in the list – and ask yourself these two questions for each entity:
If the answer to both questions is no, then filter its data out of the recorder entirely. You don’t need it, as you either won’t use it, or can get it another way without much effort.
Some of the things I’ve excluded from my recorder include (this is a big list, but then again my HA instance has 3000+ entities of all types and stripes):
Deciding how long you want to keep data in the states table
When performing your dry-run (and eventual delete) queries, we need to know how many days’ worth of data you want to keep (the purge_keep_days: value in the recorder: configuration).
I chose to offload data into short-term and long-term storage after 30 days, but you may want to keep it for longer or shorter – the choice is up to you.
When making this decision, consider these factors:
This information will allow you to find the right balance between system performance, dashboard performance, database size, and read/write operations to your storage.
Building your rules and wildcards
As noted earlier, if using Option B and working with the CSV – there’s a good chance you’ve already decided to mark all the items that are not needed and destined for removal. Let’s start building our rules.
Sensor name | No. of States Entries | Remove?
binary_sensor.aurora_alert | 43 | y
binary_sensor.aurora_outlook | 41 | y
binary_sensor.aurora_watch | 41 | y
binary_sensor.bedtime_period | 71 | y
binary_sensor.combined_door_window_and_motion_sensors | 3260 |
binary_sensor.courtyard_shed_door_sensor_zha_contact | 292 |
binary_sensor.dining_room_ikea_window_sensor_zha_contact | 294 |
binary_sensor.dining_room_motion_sensor_matter_occupancy | 133 |
binary_sensor.door_and_window_sensors | 574 |
binary_sensor.hallway_motion_sensor_2_zha_occupancy | 1461 |
Here are some examples of safe and unsafe rules for this list of sensors:

Safe wildcard rules:
- binary_sensor.aurora_* (as this gets us all the aurora sensors)
- binary_sensor.bedtime_* (as this gets us any bedtime sensors)

Unsafe wildcard rules:
- binary_sensor.* (as this would stop recording for all the binary sensors in our system, including the ones we’ve decided to keep)
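Because the * wildcard means “any sequence of characters” both in recorder’s entity_globs and in SQLite’s GLOB operator, you can sanity-check candidate rules before they touch anything real. Here’s a sketch (using the sample entities from the table above) that asks SQLite itself which entities each pattern matches:

```python
import sqlite3

def match_globs(entity_ids, patterns):
    """Ask SQLite which entity_ids each GLOB pattern matches.

    Using SQLite's own GLOB operator means the results agree exactly
    with what the dry-run and delete queries will match.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entities(entity_id TEXT)")
    con.executemany("INSERT INTO entities VALUES (?)",
                    [(e,) for e in entity_ids])
    matches = {}
    for pattern in patterns:
        rows = con.execute(
            "SELECT entity_id FROM entities WHERE entity_id GLOB ? "
            "ORDER BY entity_id", (pattern,)).fetchall()
        matches[pattern] = [row[0] for row in rows]
    con.close()
    return matches

entities = [
    "binary_sensor.aurora_alert",
    "binary_sensor.aurora_outlook",
    "binary_sensor.bedtime_period",
    "binary_sensor.door_and_window_sensors",
]
# The safe rule matches only the aurora sensors;
# the unsafe rule swallows every binary_sensor.
print(match_globs(entities, ["binary_sensor.aurora_*", "binary_sensor.*"]))
```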
Build and test your query
Next, we want to build a query from the rules we’ve created, so we can check what will be removed by our rules.
Perform a dry run and check the results
And now, let’s do a dry run with our filters, to check exactly what will be removed when we proceed to the next step:
-- === DRY RUN: show what WOULD be deleted (no changes) ===
PRAGMA foreign_keys=ON;
-- 60-day cutoff in Unix seconds (UTC)
WITH cutoff(ts) AS (
SELECT CAST(strftime('%s','now','-60 days') AS REAL)
)
SELECT datetime(ts, 'unixepoch', 'localtime') AS cutoff_local
FROM cutoff;
-- 1) Your wildcard list as GLOB patterns (exactly as provided; '*' means "any string")
CREATE TEMP TABLE kill_globs(glob TEXT PRIMARY KEY);
INSERT INTO kill_globs(glob) VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
-- Things to do with updates
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
-- Things to do with an integration or device
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
-- Lights/switches noise
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent');
-- 2) Resolve target metadata ids from your globs
CREATE TEMP TABLE target_states_meta AS
SELECT sm.metadata_id
FROM states_meta sm
JOIN kill_globs kg ON sm.entity_id GLOB kg.glob;
CREATE TEMP TABLE target_statistics_meta AS
SELECT stm.id AS metadata_id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob;
-- 3) Show which ids we matched (optional samples)
SELECT 'states_meta matches' AS what, COUNT(*) AS cnt FROM target_states_meta
UNION ALL
SELECT 'statistics_meta matches', COUNT(*) FROM target_statistics_meta;
-- 4) Show would-be delete counts per table
WITH cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
SELECT 'states' AS table_name, COUNT(*) AS rows_would_delete
FROM states, cutoff
WHERE states.metadata_id IN (SELECT metadata_id FROM target_states_meta)
AND states.last_updated_ts < cutoff.ts
UNION ALL
SELECT 'statistics_short_term', COUNT(*)
FROM statistics_short_term, cutoff
WHERE statistics_short_term.metadata_id IN (SELECT metadata_id FROM target_statistics_meta)
AND statistics_short_term.start_ts < cutoff.ts
UNION ALL
SELECT 'statistics', COUNT(*)
FROM statistics, cutoff
WHERE statistics.metadata_id IN (SELECT metadata_id FROM target_statistics_meta)
AND statistics.start_ts < cutoff.ts;
To adapt this query to your own needs:

Find the line SELECT CAST(strftime('%s','now','-60 days') AS REAL), replacing 60 with the number of days you wish to keep data in the states table. For example, if you choose 30 days, the line should read: SELECT CAST(strftime('%s','now','-30 days') AS REAL)

Replace the entries after INSERT INTO kill_globs(glob) VALUES with the list of wildcard entries you’ve decided to exclude from your statistics. Always make sure the correct format is used for each entry: ('automation.*'),

Note: the last entry in your wildcard list must end with a semicolon rather than a comma, like so: ('sensor.*_zha_requested_brightness_percent');
Repeating these steps until you have only the entities you want to remove will ensure you get this right the first time, and don’t end up either accidentally deleting data you want, or keeping data you don’t.
This is where the danger begins, as we’re going to start changing and removing what information is stored within the Home Assistant database.
If you’re unsure about anything, then stop, take a breath, and re-do all of the Test steps one last time before proceeding – so you know what is going to be affected.
Update configuration.yaml for recorder exclusions
Edit your configuration.yaml file to add the recorder: block, based on your retention day decisions and the final wildcards you tested and validated in the earlier dry-run testing stages. Adding comments to your exclusions is also a good idea, so you know what was added, and why you added it, for future reference.

Below is an example of my recorder: block configuration, with comments, so you can see the types of things I don’t have a need for, and the comments I added for them (and yes, this is a long list).
recorder:
auto_purge: true
purge_keep_days: 60
commit_interval: 30
exclude:
entity_globs:
# Automations
- automation.*
# Generic noise - stuff you're never going to analyse, or want to look at again
- sensor.*_linkquality
- sensor.*_battery
- select.*_identify
- button.*_identify
- button.*_identify*
- sensor.*_voltage
- sensor.system_monitor_*
# Update chatter
- sensor.*_newest_version
- switch.*_pre_release
- sensor.*_update_state
- update.*
# Integration/device chatter
- sensor.*_power_outage_count # Your choice to keep this one or not
- button.*_thread_sync_time
- switch.esphome_device_builder
- media_player.*
- remote.*
- tts.*
# Lights/switches - things you'll likely never want to analyse
- sensor.*_color_options
- sensor.*_matter_battery_type
- sensor.*_matter_battery_voltage
- sensor.*_zha_occupancy_timeout
- select.*_zha_effect
- select.*_thread_effect
- select.*_zha_power_on_behavior
- select.*_thread_power_on_behavior
- switch.*_zha_child_lock
- switch.*_zha_led_enable
- sensor.*_zha_requested_brightness_level
- sensor.*_zha_requested_brightness_percent
# Calibration
- number.*_humidity_calibration
- number.*_temperature_calibration
# Mobile telemetry
- sensor.your_mobile_phone_charger_type
- sensor.your_mobile_phone_current_version
- sensor.your_mobile_phone_total_*
- sensor.your_mobile_phone_wi_fi_*
- sensor.your_mobile_phone_audio_output
- sensor.your_mobile_phone_audio_output
# Audio system and TV's - because nobody wants to know what volume you had the audio at on average for the last three years
- number.audio_system_*
- sensor.audio_system_*
- switch.audio_system_*
- binary_sensor.audio_system_*
- device_tracker.audio_system_*
- button.tv_system_*
- device_tracker.tv_systems_*
# Trackers - Because if you have a network integration, these are just extra sensors you won't need outside a specific point in time for troubleshooting purposes, and you won't be coming to Home Assistant for that
- device_tracker.*
# Core cloud
- switch.cloud_*
# Weather forecasts
- sensor.placename_*
- sensor.*_icon_descriptor_*
- sensor.*_extended_text_*
- sensor.*_mdi_icon_*
- sensor.home_air_quality_day_*
- sensor.home_apparent_temperature
- sensor.home_cloud_ceiling
- sensor.home_condition_day_*
- sensor.home_condition_night_*
- sensor.home_dew_point
- sensor.home_grass_pollen_day_*
- sensor.home_hours_of_sun_day_*
- sensor.home_humidity
- sensor.home_lightning_*
- sensor.home_precipitation
- sensor.home_pressure
- sensor.home_realfeel_temperature*
- sensor.home_thunderstorm_probability_day_*
- sensor.home_thunderstorm_probability_night_*
- sensor.home_tree_pollen_day_*
- sensor.home_uv_index*
- sensor.home_wind_gust_speed
- sensor.home_wind_speed
- sensor.home_wind_speed_day_*
- sensor.home_wind_speed_night_*
# OpenWeatherMap sensors
- sensor.openweathermap_*
- weather.openweathermap
- weather.placename
# Zigbee2MQTT bridge
- select.zigbee2mqtt_bridge_log_level
- switch.zigbee2mqtt_bridge_permit_join
- button.zigbee2mqtt_bridge_restart
- sensor.zigbee2mqtt_bridge_version
# Network integration data
- sensor.network_router_*
# Uptime Kuma - if you use it, list all your sensors here, otherwise you're doubling up on data it already holds.
- sensor.site_name_*
# Servers, hardware and office equipment
- sensor.rpi_* # broader than sensor.rpi_rpi_*
- sensor.hp_* # OfficeJet + printer families
# Your Bluetooth connected toothbrush (Yes, Home Assistant can get data from these)
- sensor.io_series_*_signal_strength

Save your changes to configuration.yaml, and close the file.

Use SQLite to delete old data
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM statistics_short_term
WHERE metadata_id IN (
SELECT stm.id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob
)
AND start_ts < (SELECT ts FROM cutoff);
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent'),
('number.*_humidity_calibration'),
('number.*_temperature_calibration')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM statistics
WHERE metadata_id IN (
SELECT stm.id
FROM statistics_meta stm
JOIN kill_globs kg ON stm.statistic_id GLOB kg.glob
)
AND start_ts < (SELECT ts FROM cutoff);
WITH
kill_globs(glob) AS (
VALUES
('automation.*'),
('sensor.flightradar24_*'),
('sensor.*_linkquality'),
('sensor.*_battery'),
('select.*_identify'),
('button.*_identify'),
('button.*_identify*'),
('sensor.*_voltage'),
('sensor.system_monitor_*'),
('sensor.*_newest_version'),
('switch.*_pre_release'),
('sensor.*_update_state'),
('update.*'),
('sensor.*_power_outage_count'),
('button.*_thread_sync_time'),
('switch.esphome_device_builder'),
('media_player.*'),
('remote.*'),
('tts.*'),
('sensor.*_color_options'),
('sensor.*_matter_battery_type'),
('sensor.*_matter_battery_voltage'),
('sensor.*_zha_occupancy_timeout'),
('select.*_zha_effect'),
('select.*_thread_effect'),
('select.*_zha_power_on_behavior'),
('select.*_thread_power_on_behavior'),
('switch.*_zha_child_lock'),
('switch.*_zha_led_enable'),
('sensor.*_zha_requested_brightness_level'),
('sensor.*_zha_requested_brightness_percent'),
('number.*_humidity_calibration'),
('number.*_temperature_calibration')
),
cutoff(ts) AS (SELECT CAST(strftime('%s','now','-60 days') AS REAL))
DELETE FROM states
WHERE metadata_id IN (
SELECT sm.metadata_id
FROM states_meta sm
JOIN kill_globs kg ON sm.entity_id GLOB kg.glob
)
AND last_updated_ts < (SELECT ts FROM cutoff);
In Developer Tools > Actions, select recorder.disable, then press Perform action.
When you’re done, select recorder.enable, then press Perform action.
Then reclaim the freed space by running the following in SQLite Web:
PRAGMA wal_checkpoint(TRUNCATE);
VACUUM;
For more information on why this helps, see the Recorder docs; you can also run this step using recorder.purge from Developer Tools > Actions if you wish.
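If you’re curious why the VACUUM step matters: deleting rows doesn’t shrink a SQLite file on its own – the freed pages are just marked reusable – and VACUUM is what rebuilds the file at its new, smaller size. Here’s a small self-contained demonstration, run against a throwaway database (not your Home Assistant one):

```python
import os
import sqlite3
import tempfile

def vacuum_demo():
    """Show that DELETE alone leaves the file large; VACUUM shrinks it."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE states(state TEXT)")
    # Fill the table with a couple of megabytes of throwaway rows.
    con.executemany("INSERT INTO states VALUES (?)",
                    [("x" * 500,) for _ in range(5000)])
    con.commit()
    size_before = os.path.getsize(path)
    con.execute("DELETE FROM states")
    con.commit()
    size_after_delete = os.path.getsize(path)  # pages freed, file not shrunk
    con.execute("VACUUM")                      # rebuild the file compactly
    con.close()
    size_after_vacuum = os.path.getsize(path)
    return size_before, size_after_delete, size_after_vacuum

before, after_delete, after_vacuum = vacuum_demo()
print(before, after_delete, after_vacuum)
```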
And that’s it, we’re done. And well done to you too.
You’ve successfully cleaned up the database of your Home Assistant instance, and have it back in a position where it’s only recording the things it needs to record for future analysis.
It might also be a good time to get a hot beverage and a biscuit, followed by upvoting my GitHub feature request, so this can all be native and baked into Home Assistant from day one.