Millisecond Forums

On constrained randomizations in psychological experiments

By Dave - 8/8/2011

Questions involving complex, constrained (pseudo-)randomizations come up regularly, so I thought I'd post some general remarks on the topic.

0: Introduction

Few people seem to realize that such randomizations pose a number of non-trivial and interrelated mathematical, computational and practical problems, none of which are specific to Inquisit (i.e., they apply to any application). Constrained randomizations may be thought of as specific instances of constraint satisfaction problems (CSPs), a term common in Artificial Intelligence and certain areas of organizational research (see e.g Tsang, 1993). A CSP involves a set of variables V, a finite domain D of values associated with each variable,  and lastly a set of constraints C restricting the values the variables may take on simultaneously. The ultimate goal is to assign a value to each variable such that all given constraints are satisfied. Clearly, randomized sequences or combinations of stimuli with imposed selection constraints fit nicely into this general scheme. I am primarily mentioning the CSP framework as a starting point for further literature research for anyone interested in the technicalities of constraint satisfaction; the CSP framework and its formalization methods will however not play any significant role in the context of the following discussion.

In this discussion, I will use a scaled-down example based on an actual request to illustrate some of the issues involved. However, the remarks apply to any type of CSP, e.g. generating a sequence of items where items of the same class c are not supposed to follow each other more than k times. Or anything else you can think of along those lines. But let's move on to the example. Consider the following  scenario:

Suppose you have a task in which subjects first rate n items on a 3-point rating scale. In the second part you want to randomly pair the rated items into n/2 pairs such that

  • every item is a member of a single pair (constraint C1) and

  • both items making up the pair have received different ratings (constraint C2).

I: Constraint satisfiability

Let's assume n=10 items (A to J) received the following ratings:

    Item:        A    B    C    D    E    F    G    H    I    J
    Rating:      2    3    2    1    1    2    2    2    3    2

Given the above data set, is it at all possible to generate a sequence fulfilling C1 and C2? Recall, we want to assemble five 2-item pairs, every item is only to be used once and the 2 items forming a pair must have been rated differently. The answer is: No.  No matter what you do, you'll always end up with (a) at least one pair where both members share the same rating (violating C2) or (b) at least one item being repeated across pairs (violating C1). In this particular case, it's fairly easy to see that whenever more than n/2 items have received the same rating, either C1 or C2 cannot be satisfied.

While this may seem obvious, the question of constraint satisfiability quickly becomes less straightforward to answer for larger problems (i.e., involving more items, categories and a greater number of  possibly more complex constraints). Mathematically proving the satisifiability of a given CSP is often non-trivial, and sometimes practically impossible due to excessive computational demands (see sections III and IV below).

II: “Naive” accept/reject sampling

So, let's take a look at a different hypothetical data set:

    Item:        A    B    C    D    E    F    G    H    I    J
    Rating:      1    3    2    1    1    2    2    2    3    2

Here the problem is satisfiable in principle, as no more than n/2 items share the same rating. So how do we go about randomly assembling a valid sequence then? An intuitive, but ultimately naïve approach would proceed as follows:

(1) pick an item at random (without replacement),
(2) pick a second item at random,
(3) compare their ratings,
(4) put the second item back in case of identical ratings and start over with (2). Otherwise go to (1) for the next pair.

The process is repeated until all items have been sampled. We might indeed succeed with this approach – if we're lucky. We might as well be not. Let's assume the following sequence which might be generated by the above algorithm:

    Pair:        F/B    I/A    E/C    J/D    H/G
    Ratings:     2/3    3/1    1/2    2/1    2/2
    Valid:       yes    yes    yes    yes    no

After four perfectly valid pairs have been generated on the fly, there are only two items left in our selection pool. Unfortunately, they both share the same rating, thus again violating C2. Whenever you work with a simplistic accept/reject successive sampling approach, the risk of constraint violations increases towards the end of a list, i.e., when less and less items are left in the selection pool. However, for at least some scenarios, on-the-fly accept/reject sampling may just be good enough.

III: Combinatorial explosion of the search space

Now, let's move on to a less intuitive, but ultimately no less naïve way to look at the problem. The pairing problem can viewed as a temporal sequence of n items, with items #1 and #2, #3 and #4, and so on forming the pairs. E.g. {D,J,A,F, … ,I,C}.

Why not (1) generate all possible sequences of those n items (i.e., all permutations of a given list of items) and (2) check every single permutation for constraint violations and either accept or reject it based on the result? Clearly, all constraint-satisfying sequences must be a subset of all possible sequences. Let's scale our example down a little further and only consider n=6 items (A to F) with the following ratings:

    Item:        A    B    C    D    E    F
    Rating:      1    3    2    1    1    2

Even with only n=6 items, there already are 6!=720 permutations of the list A to F:

Only a fraction of those 720 sequences will satisfy the given constraints. Checking them all would take quite some time. It might even take minutes to only identify a single valid sequence. And the problem only gets worse under more realistic assumptions. After all, the typical experiment in psychology involves dozens or even hundreds of items, not just six. With a modest number of n=10 items, you already have 10!=3.628.800 candidate sequences to check, n=20 yields 20!=2.432.902.008.176.640.000 permutations. I don't even know how to pronounce that number. This is sometimes – rather aptly – called a “combinatorial explosion” of the search space.

Even with the combined computing power of the entire world, performing exhaustive searches across large combinatorial spaces such as the above is practically impossible. It would take billions of years to complete. And even for smaller problems, you certainly don't want your subjects to wait for minutes, hours, days, weeks, months or years while your program merrily tries to find a constraint-satisfying sequence of items, do you? So you might rightfully ask (1) how can such problems be tackled computationally (section IV) and, more importantly, (2) what are the implications for  designing computerized experiments involving constrained randomizations (section V)?

IV: Computational and algorithmic strategies and limitations

Algorithms used in solving CSPs employ a number of strategies to reduce the search space to reasonable size and thus make a problem computationally tractable. Some of those techniques are:

  • Scan and repair: First generate a sequence via simple accept/reject sampling (section II). If the sequence violates given constraints, try to make it compliant by flipping item's positions. If necessary, repeat for a maximum number of iterations.

  • Backtracking: When constraints are violated in a given complete sequence of length n, go back and undo an earlier decision and follow a different selection process from there. This can be considered an extension of the simple accept/reject algorithm outlined in section II. If necessary, repeat for a maximum number of iterations.

  • Look ahead: Given a partial sequence of length n-k with k<n, try to determine as early as possible if it can possibly yield a valid solution by also considering the k items which haven't been selected yet (the ones left on the “heap”). If possible, proceed, otherwise backtrack.

  • Heuristics: For some specialized problems, algorithms might also rely on domain-specific heuristics. E.g., in our rating example, the rating category containing the most items is most problematic. Generating a valid solution may be facilitated by making sure this category is exhausted early in the list (sometimes referred to as “hardest first” heuristic).

The above list is by no means complete. Some algorithms may implement only one or two of the outlined strategies, more sophisticated algorithms may use all of them in parallel. However, it should be clear that all of the above techniques are non-deterministic by definition, i.e., no such algorithm is guaranteed to identify a valid solution (but they will at least fail in reasonable time, instead of going on forever). In practice, such algorithms perform well on moderately large problems involving relatively few constraints, i.e., they will usually find one or several valid solutions within minutes provided the CSP is satisfiable (see section I). That however means, you usually should not try to generate constrained random sequences “online” (i.e., while your experiment is running; it will potentially take too long) or even “on-the-fly” via naive accept/reject sampling (which may lead to invalid solutions; see section II).

V: Summary – Then what?

The only pragmatic solution is performing such complex randomizations “offline”: Generate a number of sequences with the desired properties, either by hand or with the aid of a suitable computer program (e.g. MIX; van Casteren & Davis, 2006). Then use these pre-computed, fixed sequences in your experimental programs and procedures. Everybody does it like that, believe me. This is the only approach guaranteeing a valid solution (provided the constraints are satisfiable; see section I). If the above however is not viable due to other considerations, you should think about

  • Relaxing constraints (i.e., make them “soft”, not “hard” constraints) towards the end of a list (as “naive” accept/reject sampling will automatically do). Given the rating example, either allow items to repeat or allow pairs with equal ratings towards the end of a list.

  • Sample non-exhaustively. I.e., provide more items than you will end up sampling. This means there'll always be a certain number of redundant items left on the “heap”, and in most cases there'll be at least one satisfying any given constraint when sampled. Again, applied to the rating example, suppose you have people rate n=20 items, but only use half of them to form pairs later on.

I hope the information given clarifies some of the questions and challenges you might encounter when dealing with constrained randomizations.


Tsang, E. (1993). Foundations of constraint satisfaction. London, Academic Press.

van Casteren, M., & Davis, M.H. (2006). Mix, a program for pseudorandomization. Behavior Research Methods, 38(4), 584-589.