Preset uses a wide variety of data, some of which is readily accessible and easily accessed — this type of data can be fetched from its source whenever needed. Unfortunately, this doesn’t hold true for all sources of information. In these scenarios, data needs to be cached in order to facilitate more efficient access.
In this article, we will introduce use cases for Preset's caching capabilities, how they lead to faster access, and share the relevant backend architecture. We'll also look at how caching can be managed by the user to help improve performance.
In all scenarios, Preset uses a 24-hour Redis cache by default, with identical queries pulled from the last cache results.
Caching Chart Data on a Dashboard
Cache JSON results via Redis
This use case serves as a speed layer between the web app and analytics databases that Preset connects to. Query results that are used to build charts are cached in order to speed up subsequent requests for the same chart. Also note that any defined filter options are also cached.
Architecture: Caching visualization data
In order to further maximize cache performance, chart data can be calculated in advance so that dashboard viewers don't need to wait for on-the-fly queries to complete. A dashboard owner can “warm up the cache” daily by scheduling or loading a dashboard before other users are active to leverage the cache for the rest of the day.
One important note is that cached data is specific to the filter combination that is applied to the dashboard or chart. A change to a filter modifies the underlying SQL, so filter changes will still trigger a request to the database.
Additionally, while caching can really accelerate dashboard load times and protect your database from significant common workloads, it doesn’t really improve interactive workflows like exploring data on the Explore page or applying filters to a dashboard.
Caching with SQL Lab Queries
Using asynchronous mode for SQL Lab queries
When a database has Asynchronous query execution enabled, SQL Lab queries will be dispatched to Celery workers, which then writes results to a shared cache. This results in improved performance, as it enables web app instances to quickly retrieve data for display to a Preset user.
If the Asynchronous query execution is not enabled, then the SQL query will run as a web request instead of being dispatched to a Celery worker.
How to enable asynchronous query execution
By default, asynchronous query is disabled. In order to take advantage of the shared cache method described above, you'll need to enable asynchronous query in a database.
- Navigate to the Databases screen.
- Select the pencil Edit icon for a database.
- Select the Advanced tab and then expand the Performance section.
- Select Asynchronous query execution to enable.
- When done, select Finish.
Architecture: Caching with SQL Lab queries
Caching Other Elements
Caching various "memoized" functions
There are several other caching opportunities used by the Preset codebase that are designed to speed up repetitive tasks performed against the platform's metadata store. Examples include the fetching of databases for a given user, parsing column types for a given analytics database, and fetching table names for SQL Lab.
Cache dashboard thumbnails
As part of the dashboard thumbnail system, content is cached in order to reduce the amount of time spent fetching data from the S3 bucket in which the source image resides.
Setting Cache Timeouts
A cache timeout is the duration, in seconds, of how long data from a table, schema, chart, or dataset will remain accessible in the Preset cache. All of these values can be defined, with the default being the globally set timeout. If set to 0, then the cache will never expire.
If you wish to define cache timeouts, then a major consideration is how frequently the user's data changes. Infrequently-changed elements, like schemas, may be defined with a 4-week long cache timeout; conversely, a chart that uses real-time data may need to be set to as low as 5 minutes.
Let's have a closer look at some potential cache timeout settings, in order from longest to shortest duration.
In general, schemas do not change very often, so a longer cache timeout is appropriate.
Recommended cache timeout: 1 to 4 weeks (604,800 to 2,419,200 seconds)
Like schemas, changes for datasets are less frequent. Columns are rarely altered (i.e., added or removed) and types rarely change.
Recommended cache timeout: 1 day to 1 week (86,400 to 604,800 seconds)
Generally speaking, table caches should be shorter. If your users frequently create their own tables, then a smaller value is recommended to ensure that they appear in drop-downs. It's also important to consider the database being used; Apache Druid, for example, creating a table takes time and planning, so a longer cache is more appropriate.
Recommended cache timeout: 1 hour to 1 week (3,600 to 604,800 seconds)
The shortest cache duration should be reserved for charts, which may need to access more recent data, particularly if real time critical data is involved.
Recommended cache timeout: 5 minutes to 1 day (300 to 86,400 seconds)
How to Define Cache Timeouts
Define cache for schemas, tables, and charts
Follow the navigation directions above for enabling asynchronous query execution and, in the Performance section, enter cache timeout values for the Chart Cache Timeout, Schema Cache Timeout, and Table Cache Timeout fields, as needed.
Define cache for datasets
- Navigate to the Datasets screen.
- Select the pencil Edit icon for a dataset.
- Select the Settings tab.
- In the Cache Timeout field, enter a value.
- When done, select Save.