
Diffusion 6.1 extends conflation to manage data delivery to slow or disconnected clients

27 Apr 18

In Push Technology’s ongoing effort to simplify data management, optimization, and integration for companies developing business-critical applications, Diffusion 6.1 adds new capabilities to manage data delivery to slow or disconnected clients.

Introduction

Diffusion uses topics to deliver data to clients via a publish and subscribe interaction model. A Diffusion topic is an identified stream of data values to which a client can subscribe. Whenever a topic is updated with a new value, all subscribers of the topic are automatically sent the new value. Diffusion’s delta-streaming technology minimises the bytes-on-the-wire required to transfer values. Once a client session has received an initial value for a topic, each subsequent value is sent as a delta – a specialised encoding of the bytes that have changed. Deltas are automatically decoded by the client library and are transparent to the application. Delta-streaming dramatically reduces the network costs of keeping clients up to date.
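To make the idea of a delta concrete, here is a minimal sketch in Python. It is not Diffusion's encoding: the `Delta` class and the `diff` and `apply_delta` functions are hypothetical names, and the toy delta only exploits a common prefix, whereas real deltas allow arbitrary changes to the binary representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Toy delta: keep the first `keep` bytes of the old value, then add `suffix`."""
    keep: int
    suffix: bytes


def diff(old: bytes, new: bytes) -> Delta:
    """Build a delta that turns `old` into `new`, based on their common prefix."""
    keep = 0
    while keep < min(len(old), len(new)) and old[keep] == new[keep]:
        keep += 1
    return Delta(keep, new[keep:])


def apply_delta(old: bytes, delta: Delta) -> bytes:
    """Rebuild the new value from the previous value, as a client library would."""
    return old[:delta.keep] + delta.suffix


old = b'{"symbol":"ABC","price":101.25}'
new = b'{"symbol":"ABC","price":101.30}'
delta = diff(old, new)
```

In this sketch only the three changed trailing bytes travel in the delta, rather than the whole value, and the receiver reconstructs the new value from the one it already holds.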

The Diffusion server has an in-memory queue for each client session which holds messages to be sent to the session. Most messages contain application data in the form of topic updates (values and deltas) for topics to which the session is subscribed. Other messages are either internal service requests and responses, which are used to implement the client API, or relate to connection maintenance, such as a request to close a connection.

A message queue per client on the server has a number of advantages. First, the size of each message is typically only a few tens of bytes. Queuing allows the server to batch many messages to fill network packets. This must be done with care to avoid introducing unnecessary delays, but is one of the ways Diffusion achieves extreme network efficiency. Second, the queue acts as a buffer to allow bursts of messages to be delivered over slow connections. Third, the queue allows a client to disconnect and then reconnect without loss of data. While the client is disconnected, messages will accumulate in the queue and be delivered on reconnection.

The server memory allocated for these message queues is configurable, but finite. If a client session is slow, unresponsive, or disconnected for a long time, its queue can reach its configured limit. If it exceeds the limit, the client has failed to keep up with the message publication rate so the server will close the session.

Conflation is the process the server uses to intelligently combine topic update messages to remove stale or redundant updates, and so manage the size of the outbound message queues. For example, if there are several updates for the same topic, they can be combined into a single update. The client receives a single update with the latest known value. Commonly, dropping the interim updates doesn’t affect the correctness of an application, but it allows a disconnected client to reconnect faster, helps a slow client to keep up, and reduces the per-session memory required on the server, allowing more sessions to be supported. Conflation can be disabled for topics where interim updates are important. The sessions with the largest queues are typically slow consumers or have been disconnected; both will benefit from conflation.
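The basic idea can be sketched in a few lines of Python. This is an illustrative model only: the queue is a plain list of (topic, value) pairs, the topic paths are made up, and placing the surviving update at the position of the newest one is an assumption rather than a statement about the server.

```python
from typing import List, Tuple

Update = Tuple[str, str]  # (topic path, value)


def conflate(queue: List[Update]) -> List[Update]:
    """Keep only the newest queued update for each topic."""
    # Index of the last update seen for each topic.
    latest = {topic: i for i, (topic, _) in enumerate(queue)}
    return [u for i, u in enumerate(queue) if latest[u[0]] == i]


queue = [
    ("fx/EURUSD", "1.2101"),
    ("fx/GBPUSD", "1.3790"),
    ("fx/EURUSD", "1.2103"),
    ("fx/EURUSD", "1.2102"),
    ("fx/GBPUSD", "1.3792"),
]
conflated = conflate(queue)
```

Five queued updates shrink to two, one per topic, each carrying the latest known value.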

Conflation support in previous Diffusion releases

Conflation was first introduced in Diffusion 2.1.8 (March 2009). Over subsequent releases, conflation capabilities have been extended to expose more control through the server-side publisher API. However, as of Diffusion 6.0, conflation has significant restrictions that make it unwieldy for general Diffusion applications.

The restrictions of Diffusion 6.0 conflation include:

  • Diffusion 6.0 conflation can only be applied to a subset of topics. It can be used with stateless, single-value, and record topics, all of which were deprecated in Diffusion 6.0, but does not support the new suite of json, binary, int64, double, string, recordv2, and time series topic types.
  • Diffusion 6.0 conflation is hard to control through the client API. It is off by default, and must be enabled on a session-by-session basis by an application component. Conflation can be enabled or disabled per topic by changing the server configuration, but per-topic control is not available using the API. Applications can customise conflation calculations by configuring server-side MessageMatcher and MessageMerger components. This mixture of server configuration and the dependency on server-side APIs prevents applications hosted on Diffusion Cloud™ from using conflation.
  • Diffusion 6.0 conflation performance is limited. When enabled, conflation is calculated eagerly. Each topic update queued for a session is immediately conflated with any existing update for the topic. This minimises the number of updates in the message queue, but the per-update operations have a high run-time cost because the server cannot exploit batch processing.

Conflation in Diffusion 6.1

Diffusion 6.1 delivers a major refresh of conflation, adding new capabilities, making it easier to use, and improving run-time performance.

From Diffusion 6.1, conflation is on by default. This default can be changed in the server configuration, or it can be disabled per-session through the client control setConflated API call. Conflation can be tuned on a per-topic basis through a conflation policy set using the new CONFLATION topic property when a topic is created. We expect many users will benefit from conflation without having to change the default configuration, or even be aware of it.

Conflation now fully supports json, binary, int64, double, string, recordv2, and time series topics. These topic types share common capabilities based on a formal data type model. In particular, they provide standard ways to apply a delta to a value to create a new value; and to combine multiple deltas into a single delta. Exploiting these capabilities allowed conflation to be implemented without the need for a server-side API.

The CONFLATION topic property supports four different conflation policies: off, conflate, always, and unsubscribe.

The off policy simply disables conflation for the topic, so a session will receive every update for the topic. Whether this is necessary depends on the application design, but it is typically only appropriate for a minority of topics. Since the conflation policy is per-topic, conflation will still be applied to updates for topics that have other policies.

The default conflation policy is conflate. Topic updates for topics that have this policy are only conflated when an entire queue is conflated. In Diffusion 6.1, this happens if there is a new message to send to a session with a full queue. If the message cannot be queued without breaching the configured size limit (expressed as number of bytes or number of messages), the server will take the following steps:

  1. If the session is connected, and the network connection can accept data, attempt to send the pending messages.
  2. If no messages could be sent, or there is still insufficient room, conflate the queue. The server will fully conflate all topic updates in the queue, on a topic-by-topic basis. This batching of operations improves the efficiency of the implementation.
  3. If there is still insufficient room for the new message, the client has failed to keep up with the publication rate. The server will close the session.
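The three steps above can be sketched as follows. This is a simplified model under stated assumptions, not the server implementation: `SessionQueue`, `SessionClosed`, and the `send` callback are hypothetical names, the limit is counted in messages only, and conflation simply keeps the newest update per topic.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Update = Tuple[str, str]  # (topic path, value)


class SessionClosed(Exception):
    """Raised when the session has failed to keep up and is closed."""


class SessionQueue:
    def __init__(self, max_messages: int, send: Callable[[Update], bool]):
        self.max_messages = max_messages
        self.send = send  # returns False when the connection cannot accept data
        self.pending: Deque[Update] = deque()

    def _is_full(self) -> bool:
        return len(self.pending) >= self.max_messages

    def _flush(self) -> None:
        # Step 1: send pending messages for as long as the connection accepts them.
        while self.pending and self.send(self.pending[0]):
            self.pending.popleft()

    def _conflate(self) -> None:
        # Step 2: conflate the whole queue in one pass, topic by topic.
        latest = {topic: i for i, (topic, _) in enumerate(self.pending)}
        self.pending = deque(
            u for i, u in enumerate(self.pending) if latest[u[0]] == i
        )

    def enqueue(self, update: Update) -> None:
        if self._is_full():
            self._flush()
        if self._is_full():
            self._conflate()
        if self._is_full():
            # Step 3: still no room, so the session is closed.
            raise SessionClosed("message queue limit exceeded")
        self.pending.append(update)


# A disconnected client: nothing can be sent, and the limit is three messages.
q = SessionQueue(max_messages=3, send=lambda update: False)
for update in [("a", "1"), ("a", "2"), ("b", "1"), ("a", "3"), ("c", "1")]:
    q.enqueue(update)
```

In this run the queue fills twice and conflation makes room both times. A further update for a fourth topic would find nothing left to conflate and would raise `SessionClosed`.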

The implementation considers the value and delta updates in the queue, and the current topic value if known (by default the server stores topic values to supply to new subscribers, but this can be disabled by setting the DONT_RETAIN_LAST_VALUE topic property to true). Conflation will reduce the queued updates to a single value or a composite delta, based on which requires the fewest bytes to send.

Thus, the conflate policy reduces the likelihood of a client session being terminated due to an overflowed message queue.

The always policy provides the Diffusion 6.0 behaviour of eagerly conflating topic updates. A queue will contain at most one update for each topic with the always policy. Consequently, this minimises the number of topic updates in the queue, but comes with a per-operation cost so should be applied sparingly.
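For contrast with the batch approach, eager conflation can be modelled as a queue keyed by topic. The sketch below is a toy with a hypothetical `EagerQueue` class; whether a replaced update keeps its original queue position, as it does here, is an assumption.

```python
from typing import Dict, List, Tuple


class EagerQueue:
    """Toy queue that holds at most one pending update per topic."""

    def __init__(self) -> None:
        # Python dicts preserve insertion order, so this doubles as the queue.
        self.pending: Dict[str, str] = {}

    def enqueue(self, topic: str, value: str) -> None:
        # Every enqueue does a lookup and possibly a replacement:
        # this is the per-update cost of conflating eagerly.
        self.pending[topic] = value

    def drain(self) -> List[Tuple[str, str]]:
        """Return the pending updates in queue order and clear the queue."""
        updates = list(self.pending.items())
        self.pending.clear()
        return updates


eager = EagerQueue()
eager.enqueue("a", "1")
eager.enqueue("b", "1")
eager.enqueue("a", "2")
updates = eager.drain()
```

The queue never grows beyond the number of distinct topics, but the work is paid on every update instead of once per full queue.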

The final conflation policy added in Diffusion 6.1 is unsubscribe. This is useful for topics that provide secondary information to clients, which can be safely discarded without affecting the primary functionality of an application. For example, a topic providing news ticker information. The unsubscribe policy is only applied when conflation is triggered due to a full message queue. If this happens, topic updates for topics using the unsubscribe policy will be discarded and the session will be unsubscribed from the topic. An unsubscription notification is sent to the client with the unsubscribe reason BACK_PRESSURE. The client can later choose to re-subscribe to the topic.
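The sketch below models how the per-topic policies might interact when a full queue is conflated. It is illustrative only: `conflate_full_queue` and the topic paths are hypothetical, the policies are passed as a plain dictionary rather than read from topic properties, and the always policy is treated like conflate because such topics already have at most one queued update.

```python
from typing import Dict, List, Set, Tuple

Update = Tuple[str, str]  # (topic path, value)


def conflate_full_queue(
    queue: List[Update], policies: Dict[str, str]
) -> Tuple[List[Update], Set[str]]:
    """Return the updates that remain queued and the topics to unsubscribe."""
    latest = {topic: i for i, (topic, _) in enumerate(queue)}
    kept: List[Update] = []
    unsubscribed: Set[str] = set()
    for i, (topic, value) in enumerate(queue):
        policy = policies.get(topic, "conflate")  # conflate is the default
        if policy == "unsubscribe":
            # Discard the update; the session would be unsubscribed from the
            # topic and notified with the reason BACK_PRESSURE.
            unsubscribed.add(topic)
        elif policy == "off" or latest[topic] == i:
            # off: keep every update. Otherwise keep only the newest one.
            kept.append((topic, value))
    return kept, unsubscribed


queue = [
    ("prices/x", "1"),
    ("news/ticker", "headline 1"),
    ("audit/log", "event 1"),
    ("prices/x", "2"),
    ("news/ticker", "headline 2"),
    ("audit/log", "event 2"),
]
policies = {"news/ticker": "unsubscribe", "audit/log": "off"}
kept, unsubscribed = conflate_full_queue(queue, policies)
```

Here the news ticker updates are dropped and the topic is marked for unsubscription, every audit update survives, and the price topic is reduced to its newest update.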

The off and unsubscribe policies are respected for the deprecated stateless, single-value, and record topic types. If such a topic is configured with either the always or the conflate policy, the Diffusion 6.0 conflation configuration will be applied.

Conflation example

Let’s consider an example of conflation for a single topic with string values. The following diagram depicts a message queue that has been selected for conflation.

Values are depicted by circles and deltas are depicted by triangles. To simplify the diagram, delta operations are represented using plus and minus symbols and operate on the suffix of the values. Real deltas are composed of primitive operations on the binary representation of a value, such as remove bytes 7 to 10 or insert literal bytes “xyz” at position 13, and allow arbitrary transformations of the value.

In the diagram, the server has a queue of three deltas to send to the session, changing the value from a, through the sequence ab, ac, to acd, the current value held by the server.

The next diagram shows one possible conflation of the queue. The deltas have been composed into a single composite delta that changes the value from a to acd.

If the composite delta is larger than the current value, the server will send the current value instead, as depicted in the following diagram.

This example covers updates for a single topic. In practice, a session will likely have pending updates for many different topics.
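The same example can be worked through in code. The sketch below uses the simplified suffix deltas from the description, not real binary deltas: the `Delta` class, the `compose` and `conflate` functions, and the one-byte header in the size estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Delta:
    drop: int    # number of characters removed from the end of the value
    append: str  # characters then added to the end

    def apply(self, value: str) -> str:
        return value[:len(value) - self.drop] + self.append

    def size(self) -> int:
        # Assumed cost model: one byte of header plus the appended characters.
        return 1 + len(self.append)


def compose(first: Delta, second: Delta) -> Delta:
    """Return one delta equivalent to applying `first` and then `second`."""
    if second.drop <= len(first.append):
        # `second` only removes characters that `first` had just appended.
        kept = first.append[:len(first.append) - second.drop]
        return Delta(first.drop, kept + second.append)
    # `second` also removes characters from the original value.
    extra = second.drop - len(first.append)
    return Delta(first.drop + extra, second.append)


def conflate(base: str, deltas: List[Delta]) -> Tuple[str, Union[Delta, str]]:
    """Reduce a non-empty list of queued deltas to one delta or one value."""
    composite = reduce(compose, deltas)
    current = composite.apply(base)
    if composite.size() > len(current):
        return ("value", current)
    return ("delta", composite)


# The client last received "a"; the queue holds a -> ab -> ac -> acd.
queued = [Delta(0, "b"), Delta(1, "c"), Delta(0, "d")]
result = conflate("a", queued)

# A case where the composite delta costs more than the value itself.
fallback = conflate("abc", [Delta(1, "x"), Delta(3, "z")])
```

In the first case the three deltas compose into a single delta that appends cd. In the second case the composite delta is larger than the one-character current value under this cost model, so the value is sent instead.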

Future enhancements

We are planning to add background conflation to a future release of the product. The idea is to use otherwise idle cycles to conflate large message queues, before they reach their size limits. It may help to think of this as analogous to the way a typical Java garbage collector operates – incremental work is done in the background to reduce the need for a forced collection.

We expect background conflation to provide performance improvements, allowing more conflation operations to be performed. The net result will be further reductions in the memory-per-session cost, reduced data rates to slow clients, and faster reconnection.

 


