What you need to do is define appropriate /size, /position, /txbgcolor, etc. for the <text> elements in the foreground. A single, simple rectangular <shape> in the background is then enough. Example:
<text a>
/ items = ("Foo Bar Baz")
/ txcolor = (white)
/ txbgcolor = (black)
/ size = (25%, 0.25px*display.width)
/ position = (50%, 50%)
/ vjustify = top
</text>
<text b>
/ items = ("Foo")
/ txcolor = (white)
/ txbgcolor = (transparent)
/ size = (25%, 0.25px*display.width)
/ position = (50%, 50%)
/ vjustify = center
</text>
<text c>
/ items = ("Foo Bar Baz")
/ txcolor = (white)
/ txbgcolor = (transparent)
/ size = (25%, 0.25px*display.width)
/ position = (50%, 50%)
/ vjustify = bottom
</text>
<shape bg>
/ color = (blue)
/ size = (26%, 0.26px*display.width)
/ position = (50%, 50%)
</shape>
<trial mytrial>
/ stimulusframes = [1=bg,a,b,c]
/ validresponse = (57)
</trial>