Galileo: Efficiently Encoding Segments for Client-Side Dynamic Configuration

Summary As Careem set out building a dynamic configuration system (for rollouts and A/B tests), we needed to support any kind of segmentation and make sure it is reliable and scalable. This article presents a way to tackle both dynamic segmentation by pushing predicates down as code to our backend SDKs, and static segmentation by running queries on our data warehouse and shipping compressed binary integer sets to an S3 bucket that can be read by our SDKs without decoding.
Team Data & AI
Author(s) Roman Atachiants, Ph.D
About the Author(s) Roman is a seasoned software engineer in Careem, working in the Data & AI team and helping to build a state of the art experimentation, feature toggles platform as well as a machine learning platform.

Introduction

Whether you are deploying a configuration on a canary instance, performing a pilot study of a new application feature to your own colleagues, progressively rolling out a feature to a specific geographical area, or restricting it to only a certain set of VIP users, it all requires some kind of user or traffic segmentation.

Segmentation is a necessary component of any dynamic configuration system, and ultimately it allows you to dynamically restrict the audience of some configuration or a piece of code. As in Careem, we set out to build a dynamic configuration system (for rollouts and A/B tests), we needed to support any kind of segmentation and make sure it is reliable and scalable.

In Careem, we needed the ability to support segmentation of traffic dynamically as well as a more challenging task of allowing people to create static segments that contain potentially millions of users and the membership lookup (i.e. checking if a user is part of a segment) of such segments must not take more than a few microseconds. This brings the challenge of not being able to simply call a database since the mere fact of performing a network call would take around 1000+ microseconds (1ms or more).

Dynamic and Static Segmentation 

Generally speaking, segmentation is simply a way of defining a subset of a population-based on some criterion. It usually applies to users but it could be bookings, points of interest, or any other entity. As mentioned above, you can even think of canarying a change on a specific machine as segmentation.

First, let’s conceptualize the problem at hand. Consider an example where a customer has a few attributes. This is normal modelling we go through in software development. We have an entity “User” which has a few attributes (also known as features) and each of them has been valued at a particular point in time. In this example, I have 10$ in my wallet right now but might go for a coffee later and will have less money in 10 minutes. By the time you read this, Roman might be a 60 years old bankrupt user. Every value here is time-dependent and could change.

Now, what we call a segment is simply a query around attributes of multiple entities within a collection. For example, a segment of “adults in Dubai” would include a couple of attributes, operators, and values.

We broadly categorize segmentation in two categories here:

  1. Dynamic (real-time) segmentation. Where such a query is applied on every request but requires a live input for each value. The simplest and most reliable way of doing this is by providing the attributes and their values to the function that is capable of performing the logical evaluation.
  2. Static (offline) segmentation. Where such a query is applied on a periodic basis on an existing dataset. For example, if you have a “users” table you can execute this query (e.g. SQL query) every 1 hour and store the resulting user IDs in a separate location.

In order to provide support for dynamic segmentation, our SDK requires engineers to provide a context (i.e. a map of key/value pairs) for each request and our UI allows end-users to define filtering conditions. Since the SDK is actually executing code, the UI conditions are translated into LUA executable code.

On the other hand, for the static segmentation, our SDK is able to efficiently load sorted, binary-encoded segments (i.e. lists of user IDs) and does a membership lookup using binary search. Such segments are picked up from S3 by the SDK and transparently integrated with Careem’s data warehouse (e.g. Presto, Spark SQL).

In the image above, you can see how some of these conditions are configured by our users in the web portal, in this specific example, we have a rollout with multiple branches, first of which is a condition that checks if a user belongs to a static segment of “tigers” (our colleagues in Careem). The second branch is a segment of 50% of users in UAE, etc.

Segmentation & Progressive Rollout

In order to support both static and dynamic segmentation in the SDK, we need to have the ability to (a) evaluate the boolean query during the API call efficiently and (b) synchronize large segments of user IDs by shipping them to the SDK efficiently.

The first point is relatively straightforward since our SDK interprets LUA scripts and expression logic is already built. We simply need to generate the code which accepts a dictionary of key/value pairs from the user and generates appropriate if statements.

function main(args)

  city = args[“city”]

  age  = args[“age”]  

  // Rollout #1 on “adults in Dubai” segment

  if age >= 18 and city == “Dubai” then

    …

    return “green”

  end

  // Rollout #2

  return “yellow”

end

Now the second point is quite interesting since it requires the SDK to potentially load large files of user IDs. Our target was to be able to configure static segments with several million IDs and we needed to minimize network transfer and memory usage, as these offline segments need to be updated regularly.

Small static segments (e.g. under 1000 IDs) could be encoded directly in the text and be part of the LUA script itself. On the other hand, large segments will be represented by a file in S3 which the SDK will load/watch similarly to the LUA code itself. In fact, our segments and LUA scripts sit in the same S3 bucket for simplicity. 

Encoding Large Segments

Now, let’s examine how we encode and perform lookups on large segments. Before we dive into the implementation, we first need to examine the requirements we set ourselves when designing the segment encoding algorithm.

  • Encoding should have a good compression ratio, where each 64-bit integer identifier would need to be encoded to around 1-2 bytes, which would keep large segments reasonably sized. Ideally, we would want to keep a segment of 1 million user IDs within the 1-2 MB range.
  • When the SDK receives a segment, decoding should not be required as it should be able to perform a membership check on compressed data.
  • Searching  (i.e. membership check) an element within a segment should be done without copying any memory around. For Golang implementation specifically, we needed to keep heap allocation per search at zero.
  • Search complexity should be kept around O(log n) as every call would potentially need to evaluate a segment (which could be large).
  • Search implementation must be relatively simple as we need to provide multiple implementations for different SDKs in different programming languages.

One of the properties of the problem that we can leverage to our advantage is the nature of segments. If we consider all segments by being a simple set of integers, we can employ integer compression techniques such as run-length encoding or a delta encoding.

For our algorithm, we decided to use delta encoding combined with variable-size integer encoding and binary search for membership checks. If we consider a simple example of a segment containing 8 numbers, we can reduce variance by delta encoding, which in the following example. This leaves us with relatively small numbers and using variable-size integer encoding (i.e. ProtoBuf VarInt) we can reduce most of the deltas to a single byte.

If we simply encode a large number of IDs and apply delta encoding, we can efficiently compress the set of integers. However, this would not be efficient to search as to find the Nth element we would require O(N) operations. In order to make sure we can reduce the complexity and be able to check membership using a binary search, we partition our segment into blocks of deltas and create a header that contains the initial pointer as well as the offset allowing us to jump directly.

For example, in the image above we have 2 blocks of deltas and a header. The header itself is a list of values and offsets:

  • Value represents the actual full-size starting value of the segment, in our case, the first block of deltas starts with 1 and contains [1, 2, 4, 6] whereas the second block starts at 15 and contains [15, 16, 17, 18].
  • Offset is the index in the byte array where the block begins. Note that we don’t need to keep the size, as the size itself can be calculated by comparing the 2 subsequent offsets in the header.

For the bucket size, we currently have a constant 20 deltas size, but it can be made dynamic to accommodate for CPU cache lines.

As you can see, membership check can now be done using a combination of binary search on the header and then a linear search on the block of deltas. Linear search on a small number of elements is relatively fast since the delta calculations happen sequentially and all of the data is very likely to be in the L1-cache. 

Benchmark Results

In order to validate the performance of this simple algorithm, we have implemented this in Go and Java. Below are the results of a benchmark in Go that performs a random membership search performed on segments of random integers. This was executed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz laptop.

Segment Size Operations Duration Bytes Allocs
100 7,879,459/sec 151.9 ns/op 0 B/op 0 allocs/op
1000 6,987,966/sec 171.6 ns/op 0 B/op 0 allocs/op
10000 5,835,382/sec 204.6 ns/op 0 B/op 0 allocs/op
100000 4,777,353/sec 249.4 ns/op 0 B/op 0 allocs/op
1000000 4,696,431/sec 253.5 ns/op 0 B/op 0 allocs/op

Similar numbers were observed during the benchmarks of our Java implementation.

Conclusions

In this article, we have presented a way to tackle both static and dynamic segmentation for client-side dynamic configuration at Careem.

  • Dynamic segmentation was tackled by pushing predicates down as code to our backend SDKs running in various microservices.
  • Static segmentation was tackled by running queries on our data warehouse and shipping compressed binary integer sets to an S3 bucket that can be read by our SDKs without decoding.

Overall, we have deployed the solution to production and have seen widespread adoption in Careem of both ways of segmenting users. Being able to check segment membership without decoding while not consuming an excessive amount of memory enabled our internal end-users to roll out features and experiment seamlessly without worrying about the segment size.