May 08, 2020

Contemplative practices, optimal stopping, explore/exploit

These days I find myself pulled in a bunch of directions re: contemplative practice – nondual shaiva yoga, trauma-releasing exercises, Internal Family Systems, the Unlocking the Emotional Brain framework, Consciousness Medicine stuff, continuing/deepening my zazen practice, reading more about all this stuff, etc.

I'm not really sure where to point, what to prioritize, how to approach this. (A while ago this not-knowing-where-to-point would have been uncomfortable & anxiety-provoking but currently I find myself just sorta noticing it without getting very activated.)

This morning it occurred to me that this is a "decision-making under uncertainty" problem – I want to allocate between 30 minutes and 3 hours a day to this stuff (I don't feel called to go full monastic), and I don't know how to budget that time.

Flipping through Algorithms to Live By, it seems like this is most analogous to either an optimal-stopping problem or a multi-armed bandit problem.

If it's like optimal stopping, the question is how much time to spend sampling different practices before committing to one:

Before he became a professor of operations research at Carnegie Mellon, Michael Trick was a graduate student, looking for love. “It hit me that the problem has been studied: it is [an optimal-stopping] problem! I had a position to fill [and] a series of applicants, and my goal was to pick the best applicant for the position.”

So he ran the numbers. He didn’t know how many women he could expect to meet in his lifetime, but there’s a certain flexibility in the 37% Rule: it can be applied to either the number of applicants or the time over which one is searching. Assuming that his search would run from ages eighteen to forty, the 37% Rule gave age 26.1 years as the point at which to switch from looking to leaping. A number that, as it happened, was exactly Trick’s age at the time.

So when he found a woman who was a better match than all those he had dated so far, he knew exactly what to do. He leapt. “I didn’t know if she was Perfect (the assumptions of the model don’t allow me to determine that), but there was no doubt that she met the qualifications for this step of the algorithm. So I proposed,” he writes.
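To sanity-check the arithmetic in that passage, here's a rough sketch of the 37% Rule. The function names (`switch_point`, `simulate`) and the simulation setup (100 candidates with random scores, forced to take the last one if nobody beats the benchmark) are my own assumptions for illustration, not anything from the book.

```python
import math
import random

def switch_point(start, end):
    """Point in a search window at which to stop looking and start leaping (1/e of the way in)."""
    return start + (end - start) / math.e

def simulate(n=100, trials=2000, seed=0):
    """Fraction of trials in which the look-then-leap rule lands on the single best candidate."""
    rng = random.Random(seed)
    cutoff = round(n / math.e)  # look-only phase: roughly the first 37%
    wins = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(n)]
        best_seen = max(scores[:cutoff])
        # Leap at the first candidate better than everyone seen while looking;
        # if none shows up, we are stuck with the last one.
        chosen = next((s for s in scores[cutoff:] if s > best_seen), scores[-1])
        wins += chosen == max(scores)
    return wins / trials

print(round(switch_point(18, 40), 1))  # 26.1, matching the age in the quote
print(simulate())                      # success rate of the rule under these assumptions
```

Swapping in my own window (say, the number of months I'm willing to sample practices) is just a matter of changing the arguments to `switch_point`.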

This isn't a perfect fit – I don't need to have settled the issue by age 40, and I don't need to commit to just one practice (though if I'm at most dedicating 3 hours/day, and more realistically 30-60 minutes/day, then there is a real tradeoff among the practices I choose to do).

So maybe it's more like a multi-armed bandit problem. Again from Algorithms to Live By:

In general, it seems that people tend to over-explore – to favor the new disproportionately over the best. In a simple demonstration of this phenomenon, published in 1966, Amos Tversky and Ward Edwards conducted experiments where people were shown a box with two lights on it and told that each light would turn on a fixed (but unknown) percentage of the time. They were then given 1,000 opportunities either to observe which light came on, or to place a bet on the outcome without getting to observe it. (Unlike a more traditional bandit problem setup, here one could not make a “pull” that was both wager and observation at once; participants would not learn whether their bets had paid off until the end.)

This is pure exploration vs. exploitation, pitting the gaining of information squarely against the use of it. For the most part, people adopted a sensible strategy of observing for a while, then placing bets on what seemed like the best outcome – but they consistently spent a lot more time observing than they should have. How much more time? In one experiment, one light came on 60% of the time and the other 40% of the time, a difference neither particularly blatant nor particularly subtle. In that case, people chose to observe 505 times, on average, placing bets the other 495 times. But the math says they should have started to bet after just 38 observations – leaving 962 chances to cash in.

...

The standard multi-armed bandit problem assumes that the probabilities with which the arms pay off are fixed over time. But that’s not necessarily true of airlines, restaurants, or other contexts in which people have to make repeated choices. If the probabilities of a payoff on the different arms change over time – what has been termed a “restless bandit” – the problem becomes much harder. (So much harder, in fact, that there’s no tractable algorithm for completely solving it, and it’s believed there never will be.) Part of this difficulty is that it is no longer simply a matter of exploring for a while and then exploiting: when the world can change, continuing to explore can be the right choice. It might be worth going back to that disappointing restaurant you haven’t visited for a few years, just in case it’s under new management.
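Going back to the two-lights experiment in the quote: here's a rough sketch of a simplified version of it. I'm assuming exactly one light comes on each round (60/40), and that the player watches for a fixed `n` rounds and then bets on whichever light came on more often for all remaining rounds. That fixed observe-then-commit policy is my simplification, not the calculation behind the book's figure of 38, so the best `n` it finds needn't match.

```python
import math

def prob_pick_better(n, p=0.6):
    """Chance that the majority light after n observations is the truly better one (ties split 50/50)."""
    total = 0.0
    for k in range(n + 1):  # k = times the better light came on
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        pmf = math.exp(log_pmf)
        if 2 * k > n:
            total += pmf
        elif 2 * k == n:
            total += 0.5 * pmf
    return total

def expected_wins(n, rounds=1000, p=0.6):
    """Expected number of winning bets if we observe n rounds, then bet the rest."""
    right = prob_pick_better(n, p)
    return (rounds - n) * (right * p + (1 - right) * (1 - p))

best_n = max(range(1001), key=expected_wins)
print(best_n, expected_wins(best_n))
```

The shape of the tradeoff is the interesting part: every extra round of observing makes the eventual bet a bit more likely to be right, but costs one round of betting.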

Hmm... I'm not sure whether or not to model contemplative practices as "restless bandits." From one perspective they're obviously timeless & unchanging (the capital-D Dharma, the Absolute). But I'm an agent embedded in these systems who is expressing these things, and I'm definitely changing, so maybe the practices do "come under new management" over time after all, at least from my perspective.

If they are restless bandits, I think that implies more exploring, more hopping around, more dropping things when they don't appear to yield fruit.

And if they aren't (i.e. if they're bandits with a fixed payout), I think that implies more commitment, more doubling-down (from my present margin).
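Here's a toy version of that contrast, as a rough sketch. Everything in it is an assumption of mine for illustration: two arms paying off 60% and 40% of the time, a "restless" world where the two simply swap halfway through, and an epsilon-greedy player (explore at random with probability `epsilon`, otherwise pull the arm with the better running average). Real restless bandits are much messier than a single swap.

```python
import random

def run_bandit(epsilon, restless, steps=2000, seed=0):
    """Average payoff per pull for an epsilon-greedy player on two arms."""
    rng = random.Random(seed)
    probs = [0.6, 0.4]   # payoff probability of each arm
    counts = [0, 0]      # pulls per arm
    values = [0.0, 0.0]  # running average payoff per arm
    total = 0
    for t in range(steps):
        if restless and t == steps // 2:
            probs.reverse()  # "new management": the arms swap quality
        if rng.random() < epsilon:
            arm = rng.randrange(2)                      # explore
        else:
            arm = 0 if values[0] >= values[1] else 1    # exploit
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps

for eps in (0.0, 0.1):
    print(eps, run_bandit(eps, restless=False), run_bandit(eps, restless=True))
```

With `epsilon=0.0` this player never leaves the first arm, so it has no way to notice the swap in the restless world. Whether a given amount of exploring pays for itself depends on how often and how much the arms actually change, which is exactly the thing I don't know about the practices.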

So where does this leave me?

It's interesting to think about things in this frame, but it feels pretty academic to write about. Which is sorta a shame, because this really seems to touch on a profound question – in some sense "what to do next?" encapsulates the entire thing.

I guess in the end this comes down to taste & intuition.

Some of this stuff feels pretty tasty + wholesome at the moment (e.g. TRE, e.g. reading Kaj's posts about IFS & UtEB), so I'll keep pursuing those. Other parts feel intriguing but also sorta ugh (e.g. nondual shaiva, e.g. sometimes zazen, e.g. actually doing IFS) – probably good practice to keep exploring those too. And some of it doesn't really click at all (e.g. how pjeby talks about doing reconsolidation near the end of this comment), so I won't bother with those until they click.