In order to do this, I tried to create separate text stimuli with the written versions of the audio, assuming I could link them and create trials which presented them both at the same time.
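That's exactly what you need to do. Present the sound and the text in the same stimulus frame, and have the text element select the same item number that the sound element has currently selected: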
<trial mytrial>
/ stimulusframes = [1=mysound, mytext]
[...]
</trial>
<sound mysound>
/ items = mysounditems
/ select = noreplace
</sound>
<text mytext>
/ items = mytextitems
/ select = current(mysound)
</text>
<item mysounditems>
/ 1 = "a.wav"
/ 2 = "b.wav"
[...]
</item>
<item mytextitems>
/ 1 = "written contents of a.wav"
/ 2 = "written contents of b.wav"
[...]
</item>
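To actually run the trial, something along these lines should work. This is only a sketch: the block name "myblock" is made up for illustration, and the trial count of 2 assumes just the two items shown above, so adjust it to however many sound/text pairs you really have.
<block myblock>
/ trials = [1-2 = mytrial]
</block>
Since mysound selects without replacement and mytext follows it, each sound should come up once per pass through the items, together with its matching text.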
See the "How to present stimulus pairs" topic in the Inquisit documentation for further details and examples.