Flowable Work Indexing

In Flowable Work, critical business data is continuously indexed as users work with processes, cases, tasks, etc. Indexing in this context means that the data is transformed, enriched, and then ingested by Elasticsearch. Having the data indexed in Elasticsearch enables fast searches (regular, full-text, or even fuzzy) and analytical queries. Flowable Platform/Work exposes APIs (in REST and Java) to query and work with the data efficiently.

This document describes in detail the indexing architecture and APIs, as well as the extensions and enhancements that are available out-of-the-box within Flowable Platform/Work.

Furthermore, this document explains:

  • How variables are indexed automatically and how the values of those variables get indexed for easy searching.

  • How the default index mappings can be extended to store additional data or change the default way that data is stored.

  • How custom search aliases or dynamic queries can be defined to build powerful dashboards.

  • Why reindexing is sometimes needed and how it is triggered.

The general principles apply to Flowable Engage indexing also (for conversations and messages), but the specifics are described in another document.

Architecture

Elasticsearch indices need a mapping definition that defines how the data in the index is structured, what type of analysis needs to happen on which fields, etc. More information about mapping can be found in the Elasticsearch documentation.

On the bootup of the Flowable server, the Elasticsearch instance (which can be a cluster of multiple nodes) is checked to have the correct mapping definitions. The default mappings in Flowable are defined in mapping configuration files that ship with the product. Their default location on the classpath is com/flowable/indexing/mapping/default. These mapping configurations are versioned, and Flowable automatically upgrades the mappings on startup (if possible).

It is possible to override the default mapping configuration files completely or add custom mapping configuration by placing mapping files in either com/flowable/indexing/mapping/default or com/flowable/indexing/mapping/custom and setting flowable.indexing.enable-default-mappings to false in the application.properties file.

An example mapping file looks as follows:

{
  "name": "tasks",
  "version": 1,
  "filter": {
    "types": [ "TSK" ]
  },

  "mappings": {
    "dynamic": "strict",
    "properties": {
      "assignee": {
        "type": "keyword"
      },
      "category": {
        "type": "text"
      },
      "claimTime": {
        "type": "date"
      },
      "createTime": {
        "type": "date"
      },
      "description": {
        "type": "text",
        "copy_to": [
          "full_text_typeAhead"
        ]
      },
      ...

Flowable creates a unique name for each of the default indices. The name starts with the value of the flowable.indexing.index-name-prefix property if set. If not set, the name starts with flowable.project-name or the empty string if that one is missing. Next comes the actual index name (e.g., my-project-tasks), followed by a timestamp suffix.

For each such index, an alias is created with the 'regular' name. For example, the alias tasks might point to the index my-project-tasks-<timestamp>. This is important for reindexing, as during a reindex the alias points to the original index until indexing is completed and then the alias is mapped to the new index.
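The naming and alias rules above can be sketched in a few lines of Java. This is only an illustration, not Flowable code: the class and method names are hypothetical, and the "-" separator and the timestamp format are assumptions, since the text does not specify them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexNamingSketch {

    /** Uses index-name-prefix if set, otherwise project-name, otherwise the empty string. */
    static String resolvePrefix(String indexNamePrefix, String projectName) {
        if (indexNamePrefix != null && !indexNamePrefix.isEmpty()) {
            return indexNamePrefix;
        }
        if (projectName != null && !projectName.isEmpty()) {
            return projectName;
        }
        return "";
    }

    /** Joins prefix, index name and timestamp; the "-" separator is an assumption of this sketch. */
    static String buildIndexName(String prefix, String indexName, String timestamp) {
        String base = prefix.isEmpty() ? indexName : prefix + "-" + indexName;
        return base + "-" + timestamp;
    }

    public static void main(String[] args) {
        String prefix = resolvePrefix(null, "my-project");
        Map<String, String> aliases = new LinkedHashMap<>();

        // Initially the alias points to the first concrete index.
        aliases.put("tasks", buildIndexName(prefix, "tasks", "20190528101500"));

        // During a reindex, a new index is filled while the alias stays untouched...
        String reindexed = buildIndexName(prefix, "tasks", "20190601090000");

        // ...and only after completion is the alias switched to the new index.
        aliases.put("tasks", reindexed);
        System.out.println(aliases);
    }
}
```

Because clients only ever address the alias, the switch at the end is the only moment at which they notice the reindex.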

When users (or an automated system) interact with the APIs of Flowable, processes, cases, rules, forms, etc. are deployed, started, completed, and so on. This generates a lot of data that gets indexed in Elasticsearch. The mechanism used behind the scenes is the Flowable Async History Executor, a component of the Flowable Core engines that stores data in dedicated database tables:

[Figure: Async History Executor]

Without going into too many technical details, the important point is that whatever operation the user (or automated system) performs, the data for the indices is stored in the same database transaction as the runtime data. This means that everything is all-or-nothing and data is handled atomically when going from one stable state to another (e.g., completing a user task with a form moves the case instance atomically from one stable state to another). Because the data meant for indexing uses this mechanism, it is always correct and in sync with the data of the engines (taking into account the eventual consistency of Elasticsearch).

[Figure: Indexing Diagram]

The Flowable products (Platform/Work/Engage) enhance the Core engines with this indexing behavior out-of-the-box. All the data produced by the engines is handled in a transactionally correct way and is exposed through REST and Java APIs that allow querying, searching, and managing this data in a way consistent with the APIs of the other Flowable products.

Elasticsearch Functionality

Flowable Platform/Work is designed so that there is only a thin layer around the Elasticsearch functionality. As described in the following sections, as much functionality as possible (mapping, querying, etc.) is exposed in the 'native' Elasticsearch format. The rationale behind this design decision is that whenever a new feature is added to Elasticsearch, it is automatically available for use with Flowable indexing.

For people familiar with the Edoras One product, this is a change from the default indexing and the index addon in that product. In Flowable, queries and mapping definitions use the native Elasticsearch format rather than a custom one.

Indexed Data

Flowable creates the following indices by default:

  • work: Contains the 'work instances'. A work instance is a root process or case instance.

  • case-instances: Contains all case instances (runtime and historical).

  • process-instances: Contains all process instances (runtime and historical).

  • tasks: All tasks (process, case, or standalone) get indexed here.

  • users: Users and user information get indexed here.

  • plan-items: Stores information about the plan item instances of a case instance. This is useful for heat maps or for analyzing case instance optimizations.

  • activities: Stores information about the activities executed as part of process instance executions. This is useful for heat maps or for analyzing process instance optimizations.

The data stored in a particular index is defined by the default index mapping files and differs from index to index.

The work, case-instances, process-instances and tasks indices share some common data though, and they contain:

  • variables and identityLinks.

  • fields for indicating the instance information (e.g., the case instance id and name for a task in a case).

  • fields to indicate the parent scope: parentScopeId, parentScopeType and parentScopeName (if the instance has a parent). For example, a child process instance of a case instance would have the id and name of the parent case and 'cmmn' as type.

  • fields to indicate the parent scope definition: parentScopeDefinitionId, parentScopeKey, parentScopeDefinitionName. For example, a child process instance of a case instance would have the definition id, key, and name of the deployed parent case model.

  • fields to indicate the root scope: rootScopeId, rootScopeType and rootScopeName. For example, a nested process (does not matter how deeply nested) of a root case instance would have the id and name of the root case instance. The type would be 'cmmn'.

  • fields to indicate the root scope definition: rootScopeDefinitionId, rootScopeKey, rootScopeDefinitionName. For example, a nested process (does not matter how deeply nested) of a root case instance would have the id, key, and name of the deployed root case model.

Indexing Variables

Variables are of the utmost importance when it comes to indexing, as processes and cases typically gather a lot of data through forms, automated tasks, or other means. This data is often used to build dashboards, analytical pages, or reports that help gain new insights or improve the way users work with processes or cases. Of course, fast querying of instances based on variable data is possible too.

This particular functionality has typically been a problem performance-wise for relational databases when the amount of data gets to a certain size. For this reason, Flowable comes with indexing of variables out-of-the-box in a query-friendly but extensible way.

Variables are automatically indexed for the following indices:

  • Work instances (/work)

  • Case instances (/case-instances)

  • Process instances (/process-instances)

  • Tasks (/tasks)

Variables are stored as a nested collection on each of the JSON documents representing a work, process, or case instance, or a task. When querying for a task directly in Elasticsearch, for example, the response contains a variables array:

{
    ...
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
        {
            ...
            "_source": {
               "variables" : [ ]
            }
        }
        ...

Variable Values

Each variable is stored as a nested JSON document. Every variable document has, at a minimum, the following fields:

  • name: The name of the variable, for example, the name of the form field, the variable set in a service task, a mapped in/out parameter, etc.

  • type: The type of the variable (text, number, date, etc.). This corresponds directly to the types that are available in the Flowable engines.

  • id: the internal (database) id of the row in the variable table. This is used internally (e.g., for tracking updates).

The actual value of a variable is the most important aspect. The value is stored in a dedicated typed field, which depends on the type of the variable:

  • dateValue: for date (regular Java dates or Joda-Time dates) variables, stored as an Elasticsearch date field.

  • booleanValue: for boolean variables, stored as an Elasticsearch boolean field.

  • numberValue: for number (integer/long/short/…​) variables, stored as an Elasticsearch long field.

  • decimalValue: for decimal (double/float) variables, stored as an Elasticsearch double field.

  • textValue: for textual and UUID variables, stored as an Elasticsearch text (full-text searchable) field. Furthermore, the same textual value is stored in the textValueKeyword field, which allows for exact queries (non-full-text search matching).

  • jsonValue: for JSON variables, which are stored as-is for retrieval. If there is a need for querying or sorting on a field of a JSON variable value, that value needs to be extracted.

There is also a rawValue field (keyword type), which gets a copy of the variable value with the type serialized to text. For example, for a number field, it would contain the number as a string, for a boolean it would contain 'true' or 'false', etc.
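As an illustration of the rules above, the following Java sketch picks the typed field for a given value. It is not the Flowable implementation: the class and method names are hypothetical, the keyword field is spelled textValueKeyword as in the JSON examples of this document, JSON variables are left out, and rawValue is skipped for dates because the text does not describe how dates are serialized to text.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class VariableValueFieldsSketch {

    /** Returns the fields a variable document would carry for the given value. */
    static Map<String, Object> toIndexedFields(String name, Object value) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name);
        if (value instanceof Boolean) {
            doc.put("booleanValue", value);
        } else if (value instanceof Double || value instanceof Float) {
            doc.put("decimalValue", ((Number) value).doubleValue());
        } else if (value instanceof Long || value instanceof Integer || value instanceof Short) {
            doc.put("numberValue", ((Number) value).longValue());
        } else if (value instanceof Date) {
            doc.put("dateValue", value);
            return doc; // text form of dates is not specified in the text, so no rawValue here
        } else if (value instanceof String || value instanceof UUID) {
            doc.put("textValue", value.toString());
            doc.put("textValueKeyword", value.toString());
        } else {
            throw new IllegalArgumentException("Type not covered by this sketch: " + value);
        }
        // rawValue always holds the value serialized to text.
        doc.put("rawValue", String.valueOf(value));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toIndexedFields("amount", 42));
        System.out.println(toIndexedFields("approved", true));
        System.out.println(toIndexedFields("customerName", "John Doe"));
    }
}
```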

Best practice: avoid storing Java serializable variables (like storing a java.util.List, java.util.Map, or other variables). They make querying impossible and can lead to subtle bugs in application code (e.g., serialization/deserialization between JVM versions causes issues). For lists or maps, use a Jackson JsonNode instead. The ObjectMapper has methods to convert between JSON and objects easily.

There is also a field 'variableTypeAhead' defined in the mapping files. This field is used internally when a variable is marked as needing a full-text search as part of its owning entity (case/process/task). There is more about this field in a later section.

Variable Scoping

Often, a case or process model can have 'child' cases or processes which again can have their own 'child' cases or processes. Technically this forms a 'tree' structure: the root case or process is the root of the tree, and the lower level processes or cases are the leaves.

The notion of a tree is essential, as it defines scoping rules for variables:

  • Each element of the tree stores variables by default on its own element. A form on a child process/case instance stores those form variables on that process/case instance by default.

  • Explicit in/out mappings can be used to copy variables up and down the tree.

  • In the Flowable products (not in the open source engines) it is possible to reference the parent or the root in forms, expressions, etc. irrespective of how deeply nested in the tree.

Given this context, every variable includes scope information that can be used to pinpoint the origin of the variable precisely:

  • scopeId: The instance (e.g., process or case instance) id where this variable was created.

  • scopeType: The type (e.g., BPMN or CMMN) of the scope on which this variable was created.

  • scopeHierarchyType: The 'relationship' in the tree from where this variable value was added and where it originally came from. It can be PARENT, ROOT, or TASK (for task-local variables).

  • scopeDefinitionId: The definition (e.g., process or case definition) id corresponding to the scope where this variable originated from.

  • scopeDefinitionKey: The definition (e.g., process or case definition) key corresponding to the scope where this variable originated from.

With this information, queries like 'give me all variable values with name ABC for root case with definition XYZ' are easily expressed and execute quickly as all the information is natively indexed.
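To make this concrete, the sketch below assembles an Elasticsearch nested query for 'variable ABC on root case definition XYZ' as a JSON string. The helper class is hypothetical, and it assumes that the scope fields can be matched with term queries and that the hierarchy type is stored in lower case, as in the JSON examples later in this section; neither is confirmed by the text.

```java
final class RootVariableQuerySketch {

    /** Wraps a string in JSON quotes, escaping backslashes and double quotes only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a single term clause such as {"term":{"field":"value"}}. */
    static String term(String field, String value) {
        return "{\"term\":{" + quote(field) + ":" + quote(value) + "}}";
    }

    /** Builds a nested query on the variables collection using the scope fields. */
    static String rootVariableQuery(String variableName, String rootDefinitionKey) {
        String must = String.join(",",
                term("variables.name", variableName),
                term("variables.scopeDefinitionKey", rootDefinitionKey),
                term("variables.scopeHierarchyType", "root")); // assumed lower-case value
        return "{\"query\":{\"nested\":{\"path\":\"variables\",\"query\":{\"bool\":{\"must\":["
                + must + "]}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(rootVariableQuery("ABC", "XYZ"));
    }
}
```

In a real application a JSON library would normally build the request body; plain string concatenation is used here only to keep the sketch dependency-free.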

How variables are propagated in the tree needs additional explanation. When a new instance (work/process/case/task) is created, the tree is traversed upwards to gather the parent and root variables. When a new variable gets set (e.g., in a service task), the tree is traversed downwards to propagate the variable to the child instances (work/process/case/task).

Let us use an example to clarify the propagation. First, assume that we have the following tree structure (and assume for simplicity all instances are created on the start of the root case):

root case
      - case CA
          - process PA
          - process PB
                - task TA
                - task TB
      - case CB
          - process PC
          - task TC

Assume that the root case has a start form. Any form variables are set on the root case and are available to all children in the tree. This means that each instance in this tree (CA/CB/PA/PB/PC/TA/TB/TC) has this variable in its indexed JSON document with:

  • scopeDefinitionId and scopeDefinitionKey set to the definition id and key of the root case.

  • scopeId is the case instance id of the root case.

  • scopeType is CMMN.

Technically, the lookup progresses upwards: on the creation of the child case/process/task the tree is inspected and the root variable is found and indexed.

Now suppose the root case contains a service task that gets executed sometime after starting the case instance. When that service task sets a variable (or for that matter, anything that sets a variable), the same propagation mechanism sets the variable on all its children in a downwards manner.

In the index, the data looks as follows:

{
    "id": "VAR-d06e1552-8151-11e9-8e41-38c986587585",
    "name": "startFormField",
    "type": "string",
    "textValue": "startValue",
    "textValueKeyword": "startValue",
    "rawValue": "startValue",
    "scopeId": "CAS-cdb780ce-8151-11e9-8e41-38c986587585",
    "scopeType": "cmmn",
    "scopeDefinitionId": "CAS-0f8996a7-8151-11e9-ae26-38c986587585",
    "scopeDefinitionKey": "myRootCase",
    "scopeHierarchyType": "root"
}

Variables that are set on a lower-level task (for example, on task TB) only exist on that task:

{
    "id": "VAR-d06e1552-8151-11e9-8e41-23c986587585",
    "name": "taskFormField",
    "type": "string",
    "textValue": "test",
    "textValueKeyword": "test",
    "rawValue": "test",
    "scopeId": "TSK-cdb780ce-8151-11e9-8e41-38c598727585",
    "scopeType": "bpmn",
    "scopeDefinitionId": "PRC-0f8996a7-8151-11e9-ae26-38c986587585",
    "scopeDefinitionKey": "processPB"
}

Parent variables are indexed using the same mechanism, but instead of being applied to the whole tree they are only applicable to one level. For example, for task TC, a variable from its parent case could look like:

{
    "id": "VAR-cf4dd840-8151-11e9-8e41-38c986587585",
    "name": "initiator",
    "type": "string",
    "textValue": "admin",
    "textValueKeyword": "admin",
    "rawValue": "admin",
    "scopeId": "CAS-cdb780ce-8151-11e9-8e41-38c986587585",
    "scopeType": "cmmn",
    "scopeDefinitionId": "CAS-0f8996a7-8151-11e9-ae26-38c986587585",
    "scopeDefinitionKey": "myCase",
    "scopeHierarchyType": "parent"
}

If the parent and the root are the same (e.g., in the example above for case CA and CB), the variables get indexed twice, but with different scope information. This is to be consistent with how the form engine and the REST endpoints handle form variables.
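The upward lookup described in this section can be sketched with a small tree model. This is a simplified illustration under stated assumptions, not the engine's implementation: Node and indexedVariables are hypothetical names, variables are reduced to their names, and the 'own' label is invented here to mark variables set on the instance itself.

```java
import java.util.ArrayList;
import java.util.List;

final class VariableScopeSketch {

    /** A work/process/case/task instance in the tree; parent is null for the root. */
    static final class Node {
        final String name;
        final Node parent;
        final List<String> ownVariables = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        Node root() {
            Node current = this;
            while (current.parent != null) {
                current = current.parent;
            }
            return current;
        }
    }

    /** Lists "variable@hierarchy" entries as they would appear on the node's document. */
    static List<String> indexedVariables(Node node) {
        List<String> result = new ArrayList<>();
        for (String variable : node.ownVariables) {
            result.add(variable + "@own");
        }
        if (node.parent != null) {
            // Parent variables are visible one level down only.
            for (String variable : node.parent.ownVariables) {
                result.add(variable + "@parent");
            }
            // Root variables are visible on every level below the root.
            for (String variable : node.root().ownVariables) {
                result.add(variable + "@root");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("root case", null);
        Node ca = new Node("CA", root);
        root.ownVariables.add("startFormField");
        // Parent and root coincide for CA, so the variable is listed twice.
        System.out.println(indexedVariables(ca));
    }
}
```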

Variable Updates

The previous section explained how each work/process/case/task instance gets variables indexed from itself, its parent, and the root. The obvious next question is how updates are handled.

Updates to variables are propagated through the whole tree for all instances (work/process/case/task) that have visibility on these variables. This means that an update on a root variable is propagated to all nodes. An update in the middle of a tree is propagated to the direct child (as the variable is a parent variable for that child).

Thus no special processing is required for data changes: the Flowable indexing logic ensures that updates are propagated correctly, and the index matches the real variable data.
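The fan-out of an update can be sketched as follows, using the example tree from the previous section. The names are hypothetical and the model is deliberately simplified: it only computes which indexed documents would be touched when a variable changes on a given instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class VariableUpdateSketch {

    /** Tree node that registers itself with its parent on creation. */
    static final class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    /** Names of the documents to update when a variable on 'changed' is updated. */
    static Set<String> documentsToUpdate(Node changed) {
        Set<String> names = new LinkedHashSet<>();
        names.add(changed.name);
        if (changed.parent == null) {
            // Root variable: visible to every instance in the tree.
            collectDescendants(changed, names);
        } else {
            // Otherwise: visible to direct children only (as a parent variable).
            for (Node child : changed.children) {
                names.add(child.name);
            }
        }
        return names;
    }

    private static void collectDescendants(Node node, Set<String> names) {
        for (Node child : node.children) {
            names.add(child.name);
            collectDescendants(child, names);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root case", null);
        Node ca = new Node("CA", root);
        new Node("PA", ca);
        new Node("TA", new Node("PB", ca));
        System.out.println(documentsToUpdate(root));
        System.out.println(documentsToUpdate(ca));
    }
}
```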

Mapping Extensions

In some cases, the data indexed by the default mapping files is insufficient. This could be because a calculated field needs to be added, data needs to be fetched from some other data store, etc.

A Mapping Extension needs to be defined to extend the default index mappings. These mapping extensions are JSON files that are put on the classpath in the location com/flowable/indexing/mapping-extension/custom/. These mapping extension files are read on server bootup. Thus when changes are needed, a server reboot is required. Multiple extensions for the same index are possible.

Let us look at the structure of such an extension. Assume a file named my-task-mapping-extension.json is placed in the correct classpath location:

{
  "key": "my-task-mapping-extension",
  "extends": "tasks",
  "version": 1,

  "properties": {
    "customField1": {
      "type": "keyword"
    },
    "customField2": {
      "type": "geo_point"
    }
  }
}

At the top of the configuration, there are the following fields:

  • key: A mandatory field that uniquely identifies this extension. This key is important as it is used (in combination with the version) to determine if an upgrade is needed to the Elasticsearch mapping file.

  • extends: A mandatory field that defines which index mapping this configuration extends and can be any of the default indices (work/case-instances/process-instances/tasks/users/plan-items/activities).

  • version: An optional field (if missing then version 0 is assumed). Sequentially increment this field when changes are made to the extension. On server bootup, the mapping is automatically upgraded, if Elasticsearch allows the mapping change upgrade.

The properties field is where the actual extension is defined. The content of this property is taken as-is and merged with an existing mapping file before it is sent to Elasticsearch. As such, the format to use is the native Elasticsearch mapping definition format.

The properties are added after the original mapping is processed. This means that a property defined in an extension replaces a default property mapping with the same name.
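The override behavior amounts to a 'last writer wins' merge. The following sketch shows that idea with plain maps from property name to field type; it is a simplification with hypothetical names, as real mappings are nested JSON structures and the actual merge is performed by Flowable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MappingMergeSketch {

    /** Applies extension properties after the defaults, so same-named entries are replaced. */
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> extension) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(extension);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("assignee", "keyword");
        defaults.put("category", "text");

        Map<String, String> extension = new LinkedHashMap<>();
        extension.put("category", "keyword");     // overrides the default definition
        extension.put("customField1", "keyword"); // adds a new property

        System.out.println(merge(defaults, extension));
    }
}
```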

Providing Data For Mapping Extensions

The previous section described the mechanism for extending the index mapping and explained the structure of the data. This section shows how to provide data for that structure and how to define the way it is returned by the APIs.

Providing the data is done by implementing the com.flowable.indexing.api.PlatformIndexedDataEnhancer interface (or extending the com.flowable.indexing.impl.IndexedDataEnhancerAdapter) and putting that bean in a Spring @Configuration class (or other Spring-compatible definitions). Multiple beans can be defined by implementing the same interface.

The PlatformIndexedDataEnhancer interface has callback methods that are called when data gets created. The entity in question gets passed along with the JSON data (as an ObjectNode instance). Note that at the point where the data is created, it is not known to which index this data is added (because Flowable supports mapping the same data to multiple indices).

The methods on the interface follow the same pattern for each instance type:

  • one method for capturing the data when an instance is started.

  • one method for capturing the data when an instance is ended.

  • methods for updating specific data of the instance.

  • methods for reindexing specific data of the historic instance.

For example, for case instances, the following methods exist:

  • enhanceCaseInstanceStartData

  • enhanceCaseInstanceNameChangeData

  • enhanceCaseInstanceEndData

  • enhanceHistoricCaseInstanceReindexData

Similarly named methods exist for process instances, case instances, tasks, activities and plan item instances (covering all the default indices).

Additionally, there are callbacks for the variables (as often custom logic needs to be applied to them):

  • enhanceVariableCreateData

  • enhanceVariableUpdateData

  • enhanceVariableRemoveData

Let us look at an example where the task JSON is enhanced:

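import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.flowable.indexing.api.IndexingJsonConstants;
import com.flowable.indexing.impl.IndexedDataEnhancerAdapter;
// Imports for TaskEntity and IndexingManagerHelper are omitted for brevity.

public class CustomIndexedDataEnhancer extends IndexedDataEnhancerAdapter {

   @Override
   public void enhanceTaskCreateData(TaskEntity taskEntity, ObjectNode data,
            IndexingManagerHelper indexingManagerHelper) {

       // Only act when variables were created together with the task
       if (data.has(IndexingJsonConstants.CREATED_VARIABLES)) {
           JsonNode createdVariables = data.get(IndexingJsonConstants.CREATED_VARIABLES);
           if (!createdVariables.isNull() && createdVariables.size() > 0) {
               for (JsonNode variableNode : createdVariables) {
                   String variableName = variableNode.get(IndexingJsonConstants.FIELD_VARIABLE_NAME).asText();
                   if ("customerName".equals(variableName)) {
                       // Copy the text value to a root-level property of the task document
                       data.put("customProperty1",
                               variableNode.get(IndexingJsonConstants.FIELD_VARIABLE_TEXT_VALUE).asText());
                   }
               }
           }
       }

   }
}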

Here, the array of created variables is retrieved. If one of the variables is the customerName variable, its text value is copied to customProperty1 on the root level of the JSON for the task. This makes it possible to run keyword queries against the value later.

The associated mapping extension would be:

{
  "key": "my-task-mapping-extension",
  "extends": "tasks",
  "version": 1,

  "properties": {
    "customProperty1": {
      "type": "keyword"
    }
  }
}

The last piece of the puzzle is to define how this data is to be returned. There is a dedicated result enhancer for each type (e.g., com.flowable.platform.service.task.TaskResultMapper.Enhancer). Continuing the example from above, assume we want to put customProperty1 next to the regular variables:

public class MyTaskEnhancer implements TaskResultMapper.Enhancer {

    @Override
    public void enhance(TaskSearchRepresentation response, JsonNode indexedData) {
        // hasNonNull(fieldName) is true only if the field exists and is not JSON null
        if (indexedData.hasNonNull("customProperty1")) {
            response.getVariables().put("customProperty1", indexedData.get("customProperty1").asText());
        }
    }
}

In case this is insufficient, it is also possible to replace the entire ResultMapper class. There is a dedicated result mapper for each type (e.g., TaskResultMapper, CaseInstanceResultMapper, etc.). When replacing the entire class, your implementation is responsible for creating the entire response (although it is possible to extend from the default implementation):

public class CustomTaskJsonMapper extends TaskJsonMapper {

   @Override
   public TaskSearchRepresentation convert(JsonNode jsonNode) {
       TaskSearchRepresentation taskResponse = super.convert(jsonNode);

       if (jsonNode.hasNonNull("customProperty1")) {
           taskResponse.getVariables().put("customProperty1",
                            jsonNode.get("customProperty1").asText());
       }

       return taskResponse;
   }
}

Note how this example extends the default implementation of TaskJsonMapper to avoid having to duplicate the default fields.

Extracting Variable Values Into Custom Variable Fields

This feature is available starting with version 3.2.1.

For some use cases, the default way variable values are indexed is insufficient. For example, for certain fields, a different tokenizer or analyzer is required to support different languages or different ways of indexing the data.

For regular indexed fields, changing this is possible by adding a mapping extension that overrides the default mapping for that field or adds a copy_to to another field with a different configuration.

For variables, as they are nested within the variables property and have an extensive way of tracking updates and deletions, this is not simple.

For this purpose, one uses a variable extractor in the mapping extension. For example:

{
  "version": 1,
  "extends" : "tasks",
  "key": "example-variable-properties-extractor",

  "variableProperties": {
    "customAnalyzedField": {
      "type": "text",
      "analyzer": "whitespace"
    },
    "customAnalyzedFieldKeyWord": {
      "type": "keyword"
    }
  },

  "variableExtractors": [
    {
      "filter": {
        "name": "customVariable",
        "scopeDefinitionKey": "oneTaskProcess"
      },
      "to": "customAnalyzedField",
      "type": "string"
    },
    {
      "filter": {
        "name": "customVariable",
        "scopeDefinitionKey": "oneTaskProcess"
      },
      "to": "customAnalyzedFieldKeyWord",
      "type": "string"
    }
  ]
}

What is new in this mapping extension is the variableProperties property. Similar to the properties of a mapping extension, the content of this property is taken as-is and added under the variables properties in the index mapping.

In this example, two fields are added:

  • a customAnalyzedField with a custom analyzer

  • a customAnalyzedFieldKeyWord with the keyword type

This part defines the structure of the new properties in the index. The second part is how the values get mapped into these new fields, which is done by the variableExtractors section.

This is an array of one or more extractors that require at a minimum:

  • A filter: A variable extractor has a filter that defines when the extractor is applied. Any of the fields of the mapping can be used here. The filter is checked on variable create, update, or delete. An explicit null value means that no value should be set for the field (e.g., "scopeDefinitionName": null means that only those instances that have no name for the definition match this filter).

  • A to: The to defines the name of the field to map into.

  • A type: The type in the extractor is required as it configures how the JSON field is mapped into the actual field. The supported types are:

    • boolean

    • date

    • double

    • integer

    • long

    • short

    • string

    • uuid

    • null

These types are the same variable types supported in the Flowable engines.

In the example above, there are two extractors that both work on the same customVariable in the oneTaskProcess, as defined by their filter. The first extractor maps it to the customAnalyzedField field, the second to the customAnalyzedFieldKeyWord field. Both target fields are defined in the variableProperties section of the same extension.
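The filter semantics (all listed fields must match, and an explicit null requires the field to be absent) can be sketched as follows. The class and helper names are hypothetical and the variable document is reduced to a flat map of strings; this is an illustration of the rule, not the Flowable implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class ExtractorFilterSketch {

    /** Builds a map from alternating keys and values; values may be null. */
    static Map<String, String> fields(String... keysAndValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            map.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    /** True if every filter entry equals the variable's field; null matches only a missing value. */
    static boolean matches(Map<String, String> filter, Map<String, String> variable) {
        for (Map.Entry<String, String> entry : filter.entrySet()) {
            if (!Objects.equals(entry.getValue(), variable.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> filter = fields("name", "customVariable", "scopeDefinitionKey", "oneTaskProcess");
        Map<String, String> variable = fields("name", "customVariable",
                "scopeDefinitionKey", "oneTaskProcess", "textValue", "test");
        System.out.println(matches(filter, variable));
    }
}
```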

When looking at the index, these new fields are part of the variables property of tasks, case, process, or work instances:

"variables": [
  {
    "name": "customVariable",
    "type": "string",
    "textValue": "test",
    "textValueKeyword": "test",
    "customAnalyzedField": "test",
    "customAnalyzedFieldKeyWord": "test"
  },
  ...
]

The textValue and textValueKeyword fields are the default fields for a String variable. The customAnalyzedField and customAnalyzedFieldKeyWord fields were added by the configuration above. These fields can now be used in queries (see the section on custom queries):

{
  "version": 1,
  "name": "example-custom-query",
  "type": "query",
  "sourceIndex": "tasks",

  "customFilter": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "variables",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "variables.name": "customVariable"
                    }
                  },
                  {
                    "match": {
                      "variables.customAnalyzedFieldKeyWord": "test"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Extracting Values from JSON Variables

This feature is available starting with version 3.2.1.

The Flowable engines support storing variables as JSON (as ObjectNode or ArrayNode). When these variables are indexed, the value gets stored as-is in the jsonValue field.

Elasticsearch does not support different types for the same property. This is why the jsonValue is stored without any additional analysis or indexing out of the box.

When properties of a JSON variable need to be used in queries or sorting, they need to be extracted using a variableExtractor.

Such an extractor is part of a mapping extension configuration and looks as follows:

"variableExtractors": [
    {
      "filter": {
        "name": "customer",
        "scopeDefinitionKey": "myDefinition"
      },
      "path": "/nestedField/customerName",
      "to": "extractedCustomerName",
      "type": "string"
    }
  ]

A variable extractor has a filter that defines when the extractor is applied. Any of the fields of the mapping can be used here. The filter is checked on variable create, update or delete. An explicit null value means that no value should be set for the field (e.g., "scopeDefinitionName": null would mean that only those instances that have no name for the definition match this filter).

In this example, the extractor is applied when a variable with the name customer is created, updated, or deleted, and the key of the case or process definition is myDefinition.

The path is a JSON Pointer expression that defines the path to the property that needs to be extracted. The to parameter is the name of the variable which contains the extracted value.

Here, the customerName of the customer JSON variable can be found under the nestedField in the customerName property, and it is mapped to extractedCustomerName. This means that in the index, there is another variable (arguably this is a virtual variable as there is no counterpart in the process or case instance) indexed with the name extractedCustomerName. However, contrary to the JSON value, this is a first-class variable, and it can now be used in querying or sorting.
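To illustrate how a JSON Pointer such as /nestedField/customerName is resolved, here is a dependency-free sketch that walks nested maps and lists. The class name is hypothetical and this is not how Flowable implements extraction; when working with Jackson, JsonNode.at(String) evaluates JSON Pointer expressions directly.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JsonPointerSketch {

    /** Resolves a JSON Pointer (RFC 6901) against nested Map/List values; null if not found. */
    static Object resolve(Object document, String pointer) {
        if (pointer.isEmpty()) {
            return document; // the empty pointer references the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("A JSON Pointer must be empty or start with '/'");
        }
        Object current = document;
        for (String rawToken : pointer.substring(1).split("/", -1)) {
            // Unescape in the order required by the RFC: "~1" first, then "~0".
            String token = rawToken.replace("~1", "/").replace("~0", "~");
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(token);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                try {
                    int index = Integer.parseInt(token);
                    current = (index >= 0 && index < list.size()) ? list.get(index) : null;
                } catch (NumberFormatException e) {
                    return null;
                }
            } else {
                return null; // scalar or missing value: nothing left to descend into
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("customerName", "John Doe");
        nested.put("customerAge", 42);
        Map<String, Object> customer = new LinkedHashMap<>();
        customer.put("nestedField", nested);

        System.out.println(resolve(customer, "/nestedField/customerName"));
    }
}
```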

The type in the extractor is essential, as it configures how the JSON field is mapped to the actual field. Supported types are:

  • boolean

  • date

  • double

  • integer

  • long

  • short

  • string

  • uuid

  • null

These types are the same variable types supported in the Flowable engines.

This type is interpreted as a variable type, meaning that the rules described in variable values apply.

In this example, as the type is string, the extractedCustomerName variable has a textValue and a textValueKeyword like any other string variable.

Using this variable extractor, there are now two entries in the index: one for the original (JSON) variable and one for the extracted value. Both are stored as a variable under the variables property and thus can be used in queries, sorting, etc.

"variables": [
    {
      "name": "customer",
      "type": "json",
      "jsonValue": {
        "nestedField": {
            "customerName": "John Doe",
            "customerAge": 42
        }
      },
      ...
    },
    {
      "name": "extractedCustomerName",
      "type": "string",
      "textValue": "John Doe",
      "textValueKeyword": "John Doe"
    },
    ...
]

Using Variable Values For Full-Text Search

In some situations, variable values need to be included in a full-text query for finding case, process, work, or task instances. Including every variable in the index in such a search could produce many mismatches, so it is possible to define precisely which variables are taken into account during the full-text search.

To define the variables, a mapping extension with a fullTextVariables section is used:

{
  "key": "my-task-mapping-extension",
  "extends": "tasks",
  "version": 1,

  "fullTextVariables": [
    {
      "name": "customerName",
      "scopeDefinitionKey": "myRootCase"
    }
  ],

  "properties": {
    ...
  }
}

The fullTextVariables property is an array of variable matching definitions. Any field (see above) that is indexed for a variable can be used in this definition. In the example here, customerName matches this definition when it is defined in a case or process with the definition key myRootCase. Note that when the scopeDefinitionKey is omitted, the definition matches the customerName variable in any deployed case or process.

The values of all variables that match the definition(s) are copied into a fullTextVariables field that is defined on the JSON document of the instance (work/process/case/task) itself and can, therefore, be queried in a full-text manner. When multiple variable definitions are configured, all matching variables are copied into the full-text field (e.g., to query on customer name, location, or description at the same time).

Looking again at the example above, this means that the value of customerName is indexed both in the variables array of a task JSON document and copied to the fullTextVariables field for full-text search. A query such as "give me all tasks where fullTextVariables contains 'abc'" thus works right out of the box:

{
    "query": {
        "match" : {
            "fullTextVariables" : {
                "query" : "abc"
            }
        }
    }
}

Implementation-wise, fullTextVariables is nothing more than a shorthand for a variable extractor that copies that particular variable to the variableTypeAhead field that is part of the index mappings (more precisely, it is a field of every element of the variables array of a task, case, process, or work element). That variableTypeAhead field is configured out-of-the-box to copy its value to the fullTextVariables field on the root of the JSON document.

Because configuring fullTextVariables comes down to a variable extractor behind the scenes, all of the usual callbacks for enhancing or changing data can be used. However, take into account that the variable extractor has already been applied when the callback is executed. This means that the value that will be copied to the fullTextVariables field is found in the special variableTypeAhead field. For example, to always change the text value to uppercase, the following enhancer can be written:

public class ExampleEnhancer extends IndexedDataEnhancerAdapter {

    @Override
    public void enhanceVariableCreateData(VariableInstanceEntity variable, String id, String type,
            String hierarchyType, ObjectNode data, IndexingManagerHelper indexingManagerHelper) {
        enhanceVariableValueIfNeeded(variable.getName(), variable.getValue(), data, indexingManagerHelper);
    }

    @Override
    public void enhanceVariableUpdateData(VariableInstanceEntity variable, String id, String type,
            String hierarchyType, ObjectNode data, IndexingManagerHelper indexingManagerHelper) {
        enhanceVariableValueIfNeeded(variable.getName(), variable.getValue(), data, indexingManagerHelper);
    }

    @Override
    public void enhanceHistoricVariableReindexData(HistoricVariableInstance historicVariableInstance, String id,
            String type, String hierarchyType, ObjectNode data, IndexingManagerHelper indexingManagerHelper) {
        enhanceVariableValueIfNeeded(historicVariableInstance.getVariableName(), historicVariableInstance.getValue(),
                data, indexingManagerHelper);
    }

    private void enhanceVariableValueIfNeeded(String variableName, Object variableValue, ObjectNode data,
            IndexingManagerHelper indexingManagerHelper) {
        // Only the employeeName variable is changed; null values are skipped to avoid a NullPointerException.
        if ("employeeName".equals(variableName) && variableValue != null) {
            // One node is returned per duplicated copy of the variable (self/parent/root).
            List<ObjectNode> variableObjectNodes = indexingManagerHelper.getVariableDataObjectNodes(data, variableName);
            for (ObjectNode variableObjectNode : variableObjectNodes) {
                // Overwrite the value that is copied to the fullTextVariables field.
                variableObjectNode.put(IndexingJsonConstants.FIELD_VARIABLE_TYPE_AHEAD,
                        variableValue.toString().toUpperCase());
            }
        }
    }

}

Note that to apply the change consistently, the create, update, and reindex callbacks all need to be implemented. Also note the getVariableDataObjectNodes method: it looks through all the gathered data and returns every JSON ObjectNode that matches the variable name. At the point of the callback, the variable values have already been duplicated for the self/parent/root use cases (see variable scoping for more details), so (maybe counter-intuitively) more than one node is typically returned, which is why the example above loops over the result.

Custom Aliases

Defining a custom query on the indexed data is often needed, for example for dashboards, reports, or other use cases. To define such queries, Flowable leverages the alias functionality of Elasticsearch. Such an alias is conceptually a 'view' on the data in an index that returns only matching data. Such an alias is always made in the context of an existing index. The reason for this is that all security and permissions checks for that particular index are added to make sure no data is returned that the user is not allowed to access.

If you do need to expose full and unrestricted querying, it is always possible to add in a custom REST controller class that exposes the com.flowable.indexing.SearchService functionality.

Such an alias is defined in a JSON configuration file found on the classpath at: com/flowable/indexing/mapping-extension/custom.

Let us look at an example of such a custom alias:

{
  "key": "custom-tasks",
  "sourceIndex": "tasks",
  "type": "alias",
  "version": 1,

  "customFilter": {
    "bool": {
      "must": [
        {
          "term": {
            "scopeDefinitionKey": "myProcess"
          }
        },
        {
          "nested": {
            "path": "variables",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "variables.name": "accountNumber"
                    }
                  },
                  {
                    "match": {
                      "variables.numberValue": 123
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The key, together with the version, is important, as both are used to determine whether the definition has changed. If so, the old alias is deleted and replaced by the new one. The type needs to be alias when defining an alias.

The customFilter is where the actual query is defined. The content of this field is a native Elasticsearch query, so anything that Elasticsearch supports in a query can be used. Note that the query is enhanced by Flowable code to include permission checks.

Because permission checks are added automatically to the query, the root of the query must be a bool query.

When the server is booted up with this alias configuration, the data can now be fetched by doing a REST call to the platform-api/search/query-tasks/alias/custom-tasks endpoint. In the example above, all tasks that are part of instances with a definition that has a key 'myProcess' and where the 'accountNumber' variable is equal to '123' are returned.

The following APIs are available, one for each type:

  • Tasks: platform-api/search/query-tasks/alias/{aliasKey}

  • Case instances: platform-api/search/query-case-instances/alias/{aliasKey}

  • Process instances: platform-api/search/query-process-instances/alias/{aliasKey}

  • Work instances: platform-api/search/query-work-instances/alias/{aliasKey}

The response format of these REST APIs is consistent with that of the Form engine:

  • root variables are found under the root property

  • parent variables are found under the parent property

  • variables defined on the scope itself are properties directly on the response

Dynamic Queries

In the previous section, an alias was created that gives a view on the index. Aliases are a powerful mechanism as they are exposed to the highest level in Elasticsearch (meaning you can query or do anything with an alias that you can do with a regular index). However, an alias cannot utilize dynamic parameters (this is a limitation of Elasticsearch aliases). For this, Flowable allows the definition of dynamic queries.

A dynamic query definition is a JSON configuration file found in the classpath location com/flowable/indexing/mapping-extension/custom. A dynamic query definition looks similar to an alias definition:

{
  "key": "custom-query-tasks",
  "type": "query",
  "sourceIndex": "tasks",
  "version": 1,

  "parameters": {
    "accountNumber": "number"
  },

  "customFilter": {
    "bool": {
      "must": [
        {
          "term": {
            "scopeDefinitionKey": "myProcess"
          }
        },
        {
          "nested": {
            "path": "variables",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "variables.name": "accountNumber"
                    }
                  },
                  {
                    "match": {
                      "variables.numberValue": "{accountNumber}"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The key, type, version, and sourceIndex properties have the same functionality as they do for an alias. The customFilter also looks similar to the customFilter of the alias definition, but instead of a hardcoded value for the accountNumber variable, the query uses a dynamic value that is referenced using curly braces: {accountNumber}. The parameters section above defines which parameters this query accepts and what their types are (which is needed to properly parse the parameters when they come in through the REST API, for example).

The valid parameter types are:

  • string: The value is passed to the query as-is.

  • number: The value is passed as a quoted string (i.e., 55 → "55").

  • boolean: The value is passed as a quoted string (i.e., true → "true").

  • stringList: A comma separated list of values (i.e., ?myTerms=val1,val2).

    • Each value in the list is quoted before it is passed to the query.

    • Example query usage with above: { terms: [ "{myTerms}" ] } is expanded into {terms: ["val1", "val2"]} (Note: The list ([]) for the query is declared in the query mapping).

  • simpleList: Is the same as the stringList above except the values are passed as is (i.e., val1,val2 → [val1, val2]).
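As an illustration of the two list types, the sketch below formats a comma-separated parameter value the way the examples above describe: stringList quotes each value, simpleList passes the values as-is. The class and method names are hypothetical, trimming whitespace around each value is an assumption, and no JSON escaping is applied, so this is not a substitute for Flowable's own parameter handling.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Illustrative formatting of list parameters; hypothetical, not Flowable code. */
final class ListParameterSketch {

    /** stringList: every comma-separated value is wrapped in double quotes. */
    static String expandStringList(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim) // assumption: surrounding whitespace is ignored
                .map(value -> "\"" + value + "\"")
                .collect(Collectors.joining(", "));
    }

    /** simpleList: the comma-separated values are passed through unquoted. */
    static String expandSimpleList(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // The surrounding brackets come from the query definition, not from the parameter.
        System.out.println("[" + expandStringList("val1,val2") + "]"); // prints ["val1", "val2"]
        System.out.println("[" + expandSimpleList("val1,val2") + "]"); // prints [val1, val2]
    }
}
```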

When the server is booted up with the query configuration shown above, the data is fetched by doing a REST call to the /search/query-tasks/query/custom-query-tasks endpoint. Continuing with the example, all tasks that are part of instances with a definition that has the key 'myProcess' and where the 'accountNumber' variable is equal to the provided value are returned. This enables use cases that take their parameters from a form field, for example:

  • /search/query-tasks/query/custom-query-tasks?accountNumber=123

  • /search/query-tasks/query/custom-query-tasks?accountNumber=9876554

  • …​

The following APIs are available, one for each type:

  • Tasks: platform-api/search/query-tasks/query/{queryKey}

  • Case instances: platform-api/search/query-case-instances/query/{queryKey}

  • Process instances: platform-api/search/query-process-instances/query/{queryKey}

  • Work instances: platform-api/search/query-work-instances/query/{queryKey}

The response format is similar to the alias REST response.

Default Parameters

The following parameters are reserved keywords that are available when defining queries:

  • currentUserId injects the id of the currently logged-in user.

  • currentGroups injects an array of the group keys of the currently logged-in user.

  • currentTenantId injects the current tenant id of the logged-in user.

The following parameters are reserved keywords that are set when they are passed in the URL:

  • sort: The fields to sort on; this can be a comma-separated list.

  • order: The sort order (asc or desc).

  • start: The start of the requested page of data. The default is zero (0).

  • size: The size of the page of data. The default is 20.

Custom Sorting

This feature is available starting with version 3.2.1.

By default, every field in the index mapping that is sortable in Elasticsearch (numbers, keywords, etc.) can be used for sorting by passing the sort and order parameters when calling the query REST endpoint.

For example (where example is the name of the query):

  • /platform-api/search/query-tasks/query/example?sort=scopeDefinitionKey&order=asc

  • /platform-api/search/query-tasks/query/example?sort=priority&order=asc

Multiple levels of sorting are also possible:

  • /platform-api/search/query-tasks/query/example?sort=scopeDefinitionKey,priority&order=asc,desc

Queries of a similar form are available for process, case, and work instances.

Often, there is also a need to sort on a variable value. To do so, a sortParameters configuration needs to be added to the query definition.

For example:

{
  "version": 1,
  "name": "example1",
  "type": "query",
  "sourceIndex": "tasks",

  "sortParameters": {

    "customerName": {
      "type": "text",
      "variable": true
    }

  },

  "customFilter" : {
    "bool": {
      "must": [
        {"term": {"scopeDefinitionKey": "myDefinition"} }
      ]
    }
  }
}

Here, the customerName is a variable. The type is important to define correctly in this definition. Supported types are text, number, date, boolean, decimal. The variable flag needs to be set to true.

The type needs to be provided as it cannot be determined from the REST endpoint URL when getting the data.

The query can now be used through the following REST URL:

  • /platform-api/search/query-tasks/query/example1?sort=customerName&order=asc

Behind the scenes, this definition is expanded into Elasticsearch sort syntax that looks like:

 "sort": [
       {
          "variables.textValueKeyword" : {
             "order" : "asc",
             "nested": {
                "path": "variables",
                "filter": {
                   "term" : { "variables.name" : "customerName" }
                }
             }
          }
       }
    ]

For a numeric variable, the configuration looks like:

 "sortParameters": {

    "customerAge": {
      "type": "number",
      "variable": true
    }

  }

And for a date variable:

"sortParameters": {
    "customerBirthDate": {
      "type": "date",
      "variable": true
    }
}

Multiple levels of sorting based on variable values are also possible:

"sortParameters": {
    "companyName": {
      "type": "text",
      "variable": true
    },
    "companyAge": {
      "type": "number",
      "variable": true
    }
  }

Sorting can also be done using an extracted JSON value.

For example, given the following variable extractor in a custom mapping:

  "variableExtractors": [
    {
      "filter": {
        "name": "customer",
        "scopeDefinitionKey": "myProcess"
      },
      "path": "/nestedField/companyName",
      "to": "extractedCustomerName",
      "type": "string"
    }
  ]

The extractedCustomerName can now be used for sorting, in the query definition:

"sortParameters": {
    "extractedCustomerName": {
      "type": "text",
      "variable": true
    }
  }

Which in turn can be used in the REST request:

  • /platform-api/search/query-tasks/query/testSortParams6?sort=extractedCustomerName&order=asc

Paging Queries

To fetch a page of data from the index, pass the start and size parameters:

For example: /platform-api/search/query-tasks/query/example?sort=scopeDefinitionKey,priority&order=asc,desc&start=10&size=50
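A client typically assembles such a URL from its parts. The following minimal sketch builds the relative URL for a task query with the reserved sort, order, start, and size parameters. The class and method names are hypothetical, and the values are URL-encoded, so commas appear as %2C, which is equivalent to the literal commas in the example above once decoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative builder for a paged and sorted task query URL; hypothetical helper. */
final class QueryUrlSketch {

    static String taskQueryUrl(String queryKey, String sort, String order, int start, int size) {
        if (start < 0 || size <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and size must be > 0");
        }
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sort", sort);
        params.put("order", order);
        params.put("start", String.valueOf(start));
        params.put("size", String.valueOf(size));

        String queryString = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return "/platform-api/search/query-tasks/query/" + queryKey + "?" + queryString;
    }

    public static void main(String[] args) {
        System.out.println(taskQueryUrl("example", "scopeDefinitionKey,priority", "asc,desc", 10, 50));
    }
}
```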

Dynamic Queries with Templates

When the out-of-the-box query mechanisms are insufficient, it is possible to create a fully custom query using a Freemarker template.

Such a query looks as follows:

{
  "version": 1,
  "name": "queryWithTemplateExample",
  "type": "query",
  "sourceIndex": "process-instances",

  "templateResource": "classpath:/com/flowable/test/index/mapping/template/query-process-instances.ftl"

}

This looks like a typical query, except there is only one field to reference the template that needs to be used.

The templateResource is a Spring resource reference. In this case, a template on the classpath is referenced.

Such a template can be anything that Elasticsearch supports. Since this is a Freemarker template, conditional blocks and replacements are possible. For example:

<#ftl output_format="JSON">
{
    "from": 0,
    "size": 20,
    "query": {
        "bool": {
            "must": [
                {"term": {"processDefinitionKey": "${key}"}}
            ]
        }
    },
    <#if customSort == "sortVersionA">
    "sort" : [
        {
            "variables.textValueKeyword" : {
                "order" : "asc",
                "nested": {
                    "path": "variables",
                    "filter": {
                        "term" : { "variables.name" : "customerName" }
                    }
                }
            }
        }
    ]
    </#if>
    <#if customSort == "sortVersionB">
    "sort" : [
        {
            "variables.numberValue" : {
                "order" : "desc",
                "nested": {
                    "path": "variables",
                    "filter": {
                        "term" : { "variables.name" : "customerAge" }
                    }
                }
            }
        }
    ]
    </#if>
}

In this example, the key parameter is used to filter the process definition and two types of sort are possible using the customSort parameter.

The REST endpoint to call these queries is the same as for normal queries. For example, this template could be executed using:

  • platform-api/search/query-process-instances/query/queryWithTemplateExample?key=myProcess&customSort=sortVersionA

  • or platform-api/search/query-process-instances/query/queryWithTemplateExample?key=anotherProcess&customSort=sortVersionB

  • …​

When using a template, no automatic permission checks are applied (unlike normal queries, where they are added automatically). This means that the creator of the template is responsible for adding the permission checks.

Low-Level Bulk Request Interceptor

The Low-Level Bulk Request Interceptor is an experimental interface and the APIs may change.

All indexing in Flowable is handled by the async executor which executes history jobs that contain the data that needs to be indexed. Data from the same database transaction are grouped in one bulk request to Elasticsearch for performance reasons.

It is possible to add a custom interceptor that is called both before and after the execution of a bulk index request. This interceptor exists on the lowest possible level of the indexing logic and can be used to make changes to the request just before it is sent to Elasticsearch or to react right after the response comes back from Elasticsearch. This interceptor should only be used when no other higher-level mechanism exists.

To define a new interceptor, add a Spring bean implementing the com.flowable.indexing.job.history.async.BulkIndexRequestInterceptor interface to your Spring @Configuration class. Multiple beans implementing this interface are possible.

Example:

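public static class MyInterceptor implements BulkIndexRequestInterceptor {

    @Override
    public void beforeRequestAddToBulkIndexingSession(String index, IndexedDataObject indexedDataObject, ObjectNode originalData) {
        // Inspect or adjust the data before it is added to the bulk indexing session.
    }

    @Override
    public void beforeRequestExecution(BulkRequest bulkRequest) {
        // Inspect or adjust the bulk request just before it is sent to Elasticsearch.
    }

    @Override
    public void afterRequestExecution(BulkRequest bulkRequest, BulkResponse bulkResponse) {
        // React to the response right after it comes back from Elasticsearch.
    }

}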

Reindexing

Reindexing may be required in the following situations:

  • An index is corrupted. Elasticsearch does not have the strong transactional guarantees of a relational database. Although the chance is small, a hardware or network failure could corrupt the indexes, which would mandate a reindex. In that case, the 'master data' from the relational database is used to rebuild the indexes.

  • A Flowable product upgrade adds a new type of data or data that was not previously indexed. In that case, a reindex might be needed (this should not happen frequently, as Elasticsearch can cope with many types of changes). Changes that require reindexing are documented in the upgrade notes for a release.

Reindexing rebuilds the index for a particular type of data from scratch, based on the data in the relational database. It is triggered by an administrator through Flowable Control or the REST API and is executed asynchronously in the background, because keeping an HTTP thread blocked for the duration of a reindex is not suitable: the server could deem the thread starved and kill it, and load balancers and proxies often terminate connections that stay open for too long.

Reindexing uses the following algorithm:

  1. A new index is created with a unique name.

  2. Data from the database tables are fetched in pages (multiple rows at once) and in parallel.

  3. Each page is processed, and the data from the tables is transformed into a job for the Flowable async executor. This step is also parallelized.

  4. The async history executor now picks up the job and transforms the job into a bulk index request for Elasticsearch.

  5. Once Elasticsearch acknowledges the indexing, the job is deleted. If some parts of the bulk index request failed, a new job with the failing parts is created to be picked up and retried later.

  6. When all jobs have been processed, the alias is swapped to the index created in step 1.

  7. A new reindexing is now planned to catch any data that has been added, updated, or removed since the reindexing was triggered. This has no impact on users, as it runs asynchronously in the background, and the index was already swapped in step 6.

A dedicated REST endpoint for each type exists for reindexing, making it possible to select which index needs to be rebuilt. This reindexing is done by executing an HTTP POST request using administrator credentials.

  • Work instances: platform-api/work/reindex

  • Case instances: platform-api/case-instances/reindex

  • Process instances: platform-api/process-instances/reindex

  • Tasks: platform-api/tasks/reindex

  • Activities: /activities/reindex

  • Plan items: /plan-items/reindex

  • Users: idm-api/users/reindex
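Triggering a reindex from code could look like the minimal sketch below, which builds the POST request with the JDK HTTP client API. The base URL, the credentials, and the use of HTTP Basic authentication are assumptions made for illustration only; adapt them to the way your installation authenticates administrators.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative construction of a reindex request; base URL and credentials are hypothetical. */
final class ReindexRequestSketch {

    static HttpRequest reindexRequest(String baseUrl, String endpointPath, String user, String password) {
        // Assumption: the server accepts HTTP Basic authentication.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + endpointPath))
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.noBody()) // the trigger needs no request body here
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = reindexRequest(
                "http://localhost:8080", "platform-api/tasks/reindex", "admin", "test");
        // The request is only built and printed, not sent.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). As the reindex runs asynchronously, the response only confirms that it was triggered, not that it has finished.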

Reindexing is an expensive operation, especially with lots of data. For a production system, the recommendation is always to have a backup using the snapshot mechanism of Elasticsearch.

In the case of a failure, a recent snapshot can be used to restore the system quickly. However, data that was created or changed since the last snapshot is not in that snapshot; it still resides in the relational database. When a reindex is then triggered, the index is updated asynchronously in the background, and the system eventually recovers automatically. This way, users can continue working with the system, albeit in a state where some data is temporarily missing or incorrect.