
Data cleanup (Housekeeping)

Target audience: Developers & Modelers & Administrators

Available from: 3.11

Overview

This how-to introduces the Housekeeping capabilities of Flowable Work and how they are integrated with Flowable Design and Flowable Control. It is split into multiple parts, each targeting a different type of user.

Administrators who want to configure the default housekeeping in the system should read the Automatic periodic deletion part.

Modelers who want to create their own custom housekeeping processes should read the Modeling Housekeeping Runs part.

Developers who want to use the Java API to trigger more complex Housekeeping jobs should read the part on deleting instances using the Java API.

Why?

Regulatory and compliance requirements often mandate that data older than a certain period be deleted. In addition, regulations such as GDPR require particular data to be deleted on demand. It is also recommended to delete historic data older than a specific time period to keep the database from growing to a large volume. Starting with 3.11, Flowable provides an easy way to delete the data you want.

What is deleted?

When a completed process / case instance is deleted, all data associated with it is deleted as well, e.g. all documents, audit entries, comments, sub processes / cases, Elasticsearch data, etc. The data is entirely removed from Flowable and is no longer available.

How?

The deletion of the completed instances is done asynchronously in batches in background threads.

When the batch is created, a first batch part is created containing the process instance or case instance ids of the first page, whose length equals the configured batch size, together with a job that deletes the data of those instances. The async executor then executes this job, which deletes the instances' data. Afterwards, the next batch part is created for the next page (again with the length of the configured batch size), along with the job that executes it. This continues until there are no pages left of process or case instances matching the configured housekeeping time period. For housekeeping, only one job is active at any given time.
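The sequential paging described above can be illustrated with a small, self-contained model. This is a minimal sketch under stated assumptions, not Flowable's implementation: the class `SequentialHousekeepingSketch` and its `run` method are hypothetical, an in-memory list stands in for the history tables, and the "job" of each batch part is simply the removal of its ids before the next page is queried.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of sequential housekeeping.
 * The names used here are illustrative and are not part of the Flowable API.
 */
class SequentialHousekeepingSketch {

    /**
     * Deletes the given completed instances page by page and returns the batch parts
     * in the order they were processed. Only one part is in flight at any time.
     */
    static List<List<String>> run(List<String> completedInstanceIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Mutable copy that stands in for the history tables.
        List<String> store = new ArrayList<>(completedInstanceIds);
        List<List<String>> parts = new ArrayList<>();
        while (true) {
            // Query the first page of instances that still match the housekeeping criteria.
            List<String> page = new ArrayList<>(store.subList(0, Math.min(batchSize, store.size())));
            if (page.isEmpty()) {
                return parts; // no pages left: the batch is complete
            }
            parts.add(page);       // create the batch part for this page
            store.removeAll(page); // run its delete job before the next part is created
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("p1", "p2", "p3", "p4", "p5"), 2));
        // prints [[p1, p2], [p3, p4], [p5]]
    }
}
```

Because deleted instances no longer match the query, each new page is simply the first page of what remains, which is why only one batch part and one job need to exist at a time.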

Automatic periodic deletion

Target audience: Developers & Administrators

Flowable can automatically delete completed root process / case instances. The following properties configure this:

| Property | Description | Default value |
|---|---|---|
| flowable.enable-history-cleaning | Whether automatic history cleaning should be enabled. | false |
| flowable.history-cleaning-cycle | The time cycle for the automatic history cleaning, as a cron expression. By default it runs every day at 1 am. | 0 0 1 * * ? |
| flowable.history-cleaning-after | The duration after which completed root historic instances are deleted, e.g. P730D to delete data older than 2 years. | P365D |
| flowable.history-cleaning-batch-size | How many historic instances are deleted in one batch. | 100 |
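To make the `flowable.history-cleaning-after` value concrete, the sketch below converts an ISO-8601 duration such as `P730D` into the cutoff before which completed instances would be eligible for deletion. It assumes the value is interpreted as a `java.time.Duration` measured back from the moment the run starts; the class `HistoryCleaningCutoffSketch` is hypothetical and is not Flowable code.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical illustration of how a history-cleaning-after value maps to a cutoff.
 * Not Flowable code; assumes the value is an ISO-8601 duration such as "P365D".
 */
class HistoryCleaningCutoffSketch {

    // Instances that finished before the returned instant would be eligible for deletion.
    static Instant cutoff(String historyCleaningAfter, Instant now) {
        Duration keep = Duration.parse(historyCleaningAfter); // "P730D" = 730 days
        return now.minus(keep);
    }

    static boolean eligible(Instant finishedAt, String historyCleaningAfter, Instant now) {
        return finishedAt.isBefore(cutoff(historyCleaningAfter, now));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(cutoff("P730D", now)); // prints 2022-01-01T00:00:00Z
    }
}
```

Note that a day-based duration such as P730D only approximates "2 years", since leap days are not taken into account.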

When history cleaning is enabled, Flowable creates so-called Housekeeping Runs according to the configured flowable.history-cleaning-cycle. These runs can be monitored in Flowable Control. By default, when using Flowable Work, only completed root process / case instances that finished longer ago than the configured flowable.history-cleaning-after duration are deleted.

It is possible to change what kind of instances are deleted by providing your own implementation of the HistoryCleaningManager and CmmnHistoryCleaningManager interfaces.

Note: This functionality is also available when using Flowable Open Source. The only difference there is that the deletion is done for any completed process / case instance. In addition, data from other engines, such as form instances, is not deleted automatically.

Note: Apart from completed root process / case instances, the automatic periodic deletion also deletes all batches that are older than the configured period. These batches are what Flowable uses to perform the deletion.

Housekeeping Monitoring

Flowable Control provides a way to monitor and inspect the Housekeeping Runs. This is available under the Housekeeping section of Flowable Control.

[Screenshot: Control Housekeeping Runs]

Here the runs can be searched using different parameters, including their Status and Type. You can then drill down into each run to see its configuration and check its status.

[Screenshot: Control Housekeeping Run Details]

The configuration details above show that this run was configured with a batch size of 100, to run sequentially, and with a query that returns all instances finished before a given date that are not subprocesses and have no callback id (i.e. root instances). The housekeeping run details screenshot also shows 2 delete batch parts, which contain more information about the instance ids that were deleted.

[Screenshot: Control Housekeeping Run Delete Parts]

Here we can see all the "delete" batch parts and filter them by status. Each batch part performs the actual deletion of the instances, using the provided batch size.

Clicking on a delete part shows the following:

[Screenshot: Control Housekeeping Run Delete Part Details]

Here we can see the ids of the instances that were deleted, as well as the ids of the instances that could not be deleted because of an exception.

On-demand deletion

The Housekeeping section of Flowable Control contains sections labeled "Cases" and "Processes". These sections display the completed root process and case instances and allow you to start a batch deletion of the instances matching the query criteria. Because the deletion removes every trace of the data from the Flowable tables, a deletion cannot be started without specifying query parameters. Once you have defined your criteria and clicked the "Start delete batch" button, you can set a name for the Housekeeping run and start it.

[Screenshot: Control Housekeeping Start Delete Popup]

Once you enter the name and press "Start deletion", the Housekeeping run is initiated and the deletion of the data starts. You are then navigated to the started Housekeeping run.

Modeling Housekeeping Runs

Target audience: Developers & Modelers

Flowable Design has a Housekeeping Task that can be used to model and start different Housekeeping Runs. It can be used in custom processes that need to clean up data in a custom way, e.g. for GDPR or to apply different deletion periods to different definitions. The Housekeeping Task only deletes completed root instances in the same tenant as the process being executed; a single Housekeeping Task cannot delete data across tenants.

The Housekeeping Task is located in the Flowable Work section of the palette.

[Screenshot: Design Housekeeping Task]

The Task has the following options:

[Screenshot: Design Housekeeping Task Details]

  • Instances to Delete - Mandatory choice between "Processes" and "Cases" that determines what kind of completed instances are deleted.
  • Housekeeping run name - The name of the housekeeping run. If it is empty, the name of the activity is used.
  • Housekeeping batch size - How many processes / cases should be deleted in one go (batch). If nothing is defined, the engine level configuration is used.
  • Housekeeping Run ID variable name - The name of the variable in which the housekeeping run ID is stored.
  • Store Housekeeping ID transiently - Flag indicating that the housekeeping run ID should be stored as a transient variable. Only used if Housekeeping Run ID variable name is set.
  • Query Parameters - The query parameters used to find the instances that should be deleted. At least one query parameter has to be set.

[Screenshot: Design Housekeeping Task Query Parameters]

The Housekeeping Task Query Parameters are used to select the instances that should be deleted. At least one parameter has to be defined to perform the deletion.

The currently supported parameters are:

[Screenshot: Design Housekeeping Task Query Parameters]

  • Definition Key - Deletes all instances with the given definition key.
  • Definition Keys - Deletes all instances with the given definition keys. The value can be a String, a comma separated String, a Collection or an ArrayNode.
  • Started After - Deletes all instances started after the given date. The value can be a Date or an Instant.
  • Started Before - Deletes all instances started before the given date. The value can be a Date or an Instant.
  • Finished After - Deletes all instances finished after the given date. The value can be a Date or an Instant.
  • Finished Before - Deletes all instances finished before the given date. The value can be a Date or an Instant.
  • Business Key - Deletes all instances matching the given business key.
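The accepted value types listed above can be pictured as a normalization step. The following is a minimal sketch, assuming hypothetical helper names (`QueryParameterValuesSketch`, `toDefinitionKeys`, `toInstant`); it is not how Flowable implements the task, and Jackson's ArrayNode is deliberately left out so the example needs only the JDK.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

/**
 * Hypothetical helper showing how the accepted parameter value types could be normalized.
 * Not Flowable code; ArrayNode support is omitted to stay dependency-free.
 */
class QueryParameterValuesSketch {

    // Definition Keys: a String, a comma separated String, or a Collection.
    static List<String> toDefinitionKeys(Object value) {
        List<String> keys = new ArrayList<>();
        if (value instanceof String) {
            for (String key : ((String) value).split(",")) {
                if (!key.isBlank()) {
                    keys.add(key.trim());
                }
            }
        } else if (value instanceof Collection) {
            for (Object key : (Collection<?>) value) {
                keys.add(String.valueOf(key).trim());
            }
        } else {
            throw new IllegalArgumentException("Unsupported definition keys value: " + value);
        }
        return keys;
    }

    // Started/Finished After/Before: a Date or an Instant.
    static Instant toInstant(Object value) {
        if (value instanceof Instant) {
            return (Instant) value;
        }
        if (value instanceof Date) {
            return ((Date) value).toInstant();
        }
        throw new IllegalArgumentException("Unsupported date value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toDefinitionKeys("orderProcess, invoiceProcess")); // [orderProcess, invoiceProcess]
        System.out.println(toInstant(new Date(0L)));                          // 1970-01-01T00:00:00Z
    }
}
```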

When multiple parameters are set, only instances matching all of them are deleted. For example, to delete only the instances of a given definition that finished within a given time period, define the Definition Key, Finished After and Finished Before parameters.
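The "all parameters must match" rule can be expressed as a conjunction of optional filters. This is a hypothetical, in-memory sketch (the records `CompletedInstance` and `Criteria` are illustrative, not Flowable types): a `null` field means the parameter is not set, and whether the date bounds are inclusive or exclusive in Flowable is an assumption here (the sketch uses exclusive bounds).

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of AND-combined housekeeping query parameters; not Flowable code. */
class HousekeepingCriteriaSketch {

    record CompletedInstance(String id, String definitionKey, Instant finishedAt) {}

    // A null field means "parameter not set"; every parameter that is set must match.
    record Criteria(String definitionKey, Instant finishedAfter, Instant finishedBefore) {

        boolean hasAnyParameter() {
            return definitionKey != null || finishedAfter != null || finishedBefore != null;
        }

        boolean matches(CompletedInstance instance) {
            // Assumption: exclusive date bounds.
            return (definitionKey == null || definitionKey.equals(instance.definitionKey()))
                    && (finishedAfter == null || instance.finishedAt().isAfter(finishedAfter))
                    && (finishedBefore == null || instance.finishedAt().isBefore(finishedBefore));
        }
    }

    static List<String> idsToDelete(List<CompletedInstance> instances, Criteria criteria) {
        if (!criteria.hasAnyParameter()) {
            // Mirrors the rule that at least one query parameter has to be set.
            throw new IllegalArgumentException("At least one query parameter has to be set");
        }
        return instances.stream()
                .filter(criteria::matches)
                .map(CompletedInstance::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant january = Instant.parse("2023-01-15T00:00:00Z");
        Instant june = Instant.parse("2023-06-15T00:00:00Z");
        List<CompletedInstance> instances = List.of(
                new CompletedInstance("p1", "orderProcess", january),
                new CompletedInstance("p2", "orderProcess", june),
                new CompletedInstance("p3", "invoiceProcess", january));
        // Definition Key + Finished After + Finished Before, as in the example above.
        Criteria criteria = new Criteria("orderProcess",
                Instant.parse("2023-01-01T00:00:00Z"), Instant.parse("2023-03-01T00:00:00Z"));
        System.out.println(idsToDelete(instances, criteria)); // prints [p1]
    }
}
```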

Monitoring modeled Housekeeping Runs

Modeled Housekeeping runs can be monitored as explained in the Housekeeping Monitoring section.

Deleting Instances using the Java API

Target audience: Developers

Flowable Housekeeping uses the HistoricProcessInstanceQuery and HistoricCaseInstanceQuery to perform the deletion. Both queries implement the BatchDeleteQuery interface, which has the following method:

  • String deleteSequentiallyUsingBatch(int batchSize, String batchName) - Performs the deletion sequentially (one batch at a time) using the given batch size and batch name

This means that all query options, including the "OR" capabilities of the HistoricProcessInstanceQuery and HistoricCaseInstanceQuery can be used to perform the deletion.
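To show why arbitrary query criteria (including OR) carry over to the deletion, the sketch below models a query object that exposes a method with the documented `deleteSequentiallyUsingBatch(int batchSize, String batchName)` signature. Everything else is hypothetical: `InMemoryBatchDeleteQuery`, `HistoricInstance`, the predicate-based `or(...)` and `batchParts()` are illustrative stand-ins, not Flowable's fluent `or()` / `endOr()` API, and treating the returned String as a batch identifier is an assumption. With the real engine, you would build the criteria on HistoricProcessInstanceQuery or HistoricCaseInstanceQuery and then call `deleteSequentiallyUsingBatch` on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Predicate;

/**
 * Hypothetical in-memory stand-in for a BatchDeleteQuery. Only the
 * deleteSequentiallyUsingBatch(int, String) signature mirrors the documented method.
 */
class InMemoryBatchDeleteQuery {

    record HistoricInstance(String id, String definitionKey, String businessKey) {}

    private final List<HistoricInstance> store;                 // stands in for the history tables
    private Predicate<HistoricInstance> criteria = i -> true;   // no criteria yet
    private final List<List<String>> batchParts = new ArrayList<>();

    InMemoryBatchDeleteQuery(List<HistoricInstance> instances) {
        this.store = new ArrayList<>(instances);
    }

    // AND: each call narrows the query further.
    InMemoryBatchDeleteQuery definitionKey(String key) {
        criteria = criteria.and(i -> key.equals(i.definitionKey()));
        return this;
    }

    // OR (hypothetical form): an instance matches if it satisfies either alternative.
    InMemoryBatchDeleteQuery or(Predicate<HistoricInstance> first, Predicate<HistoricInstance> second) {
        criteria = criteria.and(first.or(second));
        return this;
    }

    String deleteSequentiallyUsingBatch(int batchSize, String batchName) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Assumption: the returned String identifies the batch; batchName is unused in this sketch.
        String batchId = UUID.randomUUID().toString();
        while (true) {
            List<HistoricInstance> page = store.stream().filter(criteria).limit(batchSize).toList();
            if (page.isEmpty()) {
                return batchId;
            }
            batchParts.add(page.stream().map(HistoricInstance::id).toList());
            store.removeAll(page); // one part at a time: delete before querying the next page
        }
    }

    List<List<String>> batchParts() {
        return batchParts;
    }

    public static void main(String[] args) {
        InMemoryBatchDeleteQuery query = new InMemoryBatchDeleteQuery(List.of(
                new HistoricInstance("p1", "orderProcess", "A-1"),
                new HistoricInstance("p2", "invoiceProcess", "A-2"),
                new HistoricInstance("p3", "orderProcess", "B-1"),
                new HistoricInstance("p4", "otherProcess", "C-1")));
        // Delete instances of orderProcess OR with a business key starting with "A-".
        query.or(i -> i.definitionKey().equals("orderProcess"), i -> i.businessKey().startsWith("A-"));
        query.deleteSequentiallyUsingBatch(2, "cleanup");
        System.out.println(query.batchParts()); // prints [[p1, p2], [p3]]
    }
}
```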