Skip to main content

Sampling and Caching

As a consequence of the lazy evaluation paradigm used by all timelines, each timeline is recomputed from scratch every time .collect() is called on it. If you are dealing with small amounts of data and inexpensive operations, this probably isn't a practical concern. If not, you might need to cache the timeline to improve performance.

This is very simple, just call .cache(...) on your timeline. It has exactly the same semantics as .collect(...). Timelines are not cached by default because if you're manipulating enough data for performance to be a concern, storing a cache for each intermediate timeline might use too much memory.

If you explicitly call .collect() many times on the same timeline, it will be easy to spot the performance penalty; but sometimes a collect call is hidden inside another operation. Let's take the code below and see how that could happen:

val n1 = plan.resource("int_resource_1", Numbers.deserializer())
val n2 = plan.resource("int_resource_2", Numbers.deserializer())

// Construct a very expensive profile that we should only evaluate once.
val superExpensiveProfile = n1.map2Values(n2) { l, r, _ -> ackermann(l, r) }

Sampling

Let's say I wanted to sample the value of expensiveProfile at a bunch of different points, rather than using the whole profile. You can just call expensiveProfile.sample(myTime), but be careful because sample just delegates to .collect(Interval.at(myTime)) and unwraps the segment. If you only wanted to sample once, this is great, because it avoids evaluating expensiveProfile for most of the plan, only calculating the one segment you need.

But if you call .sample many times, especially if you call it on the same segment multiple times, it will recalculate many more times than you need. So before you do any sampling, call cache:

// Say we only care about the first day of the plan.
val interval = Duration.days(0) .. duration.days(1)

// Cache the first day
expensiveProfile.cache(interval)

// Sample every 10 minutes for the first day
for (time in interval step Duration.minutes(10)) {
val sample = expensiveProfile.sample(time)
// do something with the sample
}

Timeline reuse

What if expensiveProfile isn't the final product, and needs to be used in the construction of multiple other timelines? In that case, any time those downstream timelines are collected, expensiveProfile will be recomputed too. We can avoid this by caching it on whatever interval we are going to need it for.

// Two intermediate profiles that both depend on `expensiveProfile`
val intermediate1 = expensiveProfile + 1
val intermediate2 = expensiveProfile * 2

// Final result profile that depends on both intermediate profiles,
// and thus transitively depends on `expensiveProfile` twice.
val result = derived1 / derived2

// If we called `result.collect()` here, expensiveProfile would be
// computed twice.

// Cache the whole profile
expensiveProfile.cache()

// Now we can collect without wasted computation
/* do something with */ result.collect()

When writing procedural constraints, keep in mind that you return a Violations timeline which is then collected by the driver outside your constraint code. So if your result depends on a timeline multiple times, that timeline will be computed multiple times unless it is cached, even though you didn't call .collect() yourself.