Missing Data in .DAT file


lukeholmes
Expert (1.1K reputation)
Group: Forum Members
Posts: 7, Visits: 19
Hi everyone,
I hope you are all safe and sound.

Unfortunately I have a problem with my data output from Inquisit 5, and I am hoping one of you might shed some light on this. I am not a particularly experienced Inquisit user and am working with inherited code, so I will do my best to explain, but please ask if you have any questions.

Our experiment displays 6 videos to participants, and after each video, it asks them 3 questions.

The output for this was shown in a screenshot (not preserved here): the stimulusitem1 column contains first the name of the video, then the full text of each of the three questions in the order they were asked. This allows us to marry each response to the specific question, and all three questions to the relevant video.

We recently wanted to add two more questions to the experiment, and did some relatively simple modifications to the code to achieve this. Although we tested it, we did not catch a key difference between the two DAT files - the full question is no longer listed in stimulusitem1, nor does each question have a unique stimulusnumber1. Instead, the output now looks like this:
[Screenshot of the new output not preserved]

Unfortunately, the new questions are about empathy, and so we cannot simply average all 5 questions together.

I have looked at the code, but I cannot determine what could cause a change like this. I have also researched this on these forums, and can confirm (for example) that absolutely no changes were made to the <data> section between files. It appears to me that the only changes were the addition of new questions to a pre-existing "questions" section, and a re-arranging of the trial order to suit this. I am absolutely baffled as to what has caused this change in the output.

Thus, I have two questions:

1. What has caused the change in the data output? I have attached both code files to this post, and I am hoping someone might shed some light on this mystery and how I might fix it.

2. Is there any way I can use the data I have to determine which question is which in the second output? I was thinking, for example, that we know the time and date of the study - is this what is used to seed the RNG? If so, is there any way to determine what the order was after the fact? I realise that this is a long shot, but with current circumstances, it may be a while before we can gather any more data, so I would be quite upset to lose this lot.

Thanks very much for any answers you can give - please let me know if there is any more information I can provide.



Attachments
new code.exp (283 views, 6.00 KB)
old code.exp (273 views, 6.00 KB)
lukeholmes
Sorry, I should probably clarify - the order of the questions is randomised, which is why the order being visible in the DAT file is very important.

Dave
Supreme Being (1M reputation)
Group: Administrators
Posts: 13K, Visits: 104K
Replying to lukeholmes - 3/18/2020:

The stimulusitem and stimulusnumber columns are positional: the 1st stimulusitem column captures information for the 1st stimulus displayed by a given <trial> element, the 2nd one for the 2nd stimulus displayed, and so forth. This is covered in the documentation for the /columns attribute:

"The first stimulusnumber specified in the columns attribute indicates that the item number of the first stimulus appearing on the current trial should be recorded. The second stimulusnumber indicates that the item number of the second stimulus presented on the trial should be recorded, and so on. If no 1st, 2nd, 3rd, etc. stimulus is presented on a given trial, a '0' is recorded.

Similarly, the first stimulusitem specified in the columns attribute indicates that the actual item presented by the first stimulus appearing on the current trial should be recorded. If the stimulus is <text> or <port>, the actual item is recorded. If the stimulus is a <sound>, <picture>, or <video>, the item's filename is recorded. If the stimulus is a <shape>, the name of the shape is recorded. Like stimulusnumber, each successive stimulusitem records the item from each successive stimulus presented on the trial. If no 1st, 2nd, 3rd, etc. stimulus is presented on a given trial, a '0' is recorded."
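The positional rule quoted above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not Inquisit's own code; the function name `record_stimulus_columns`, the (item, item number) representation of a trial, and the placeholder values `PORT_ITEM` and `QUESTION_3_TEXT` are hypothetical.

```python
def record_stimulus_columns(presented, n_pairs):
    """Model of the documented rule for stimulusitem/stimulusnumber columns.

    presented: (item, item_number) tuples in the order the trial displays them.
    n_pairs:   how many stimulusitem/stimulusnumber pairs /columns lists.
    Returns [item1, number1, item2, number2, ...]: the k-th pair describes the
    k-th stimulus presented, and '0' is recorded when there is no k-th stimulus.
    """
    row = []
    for k in range(n_pairs):
        if k < len(presented):
            item, number = presented[k]
            row.extend([item, str(number)])
        else:
            # No k-th stimulus on this trial, so '0' is recorded for both columns.
            row.extend(["0", "0"])
    return row


# /stimulusframes=[1= videoOff, question]: the port comes first, the question second.
# Placeholder values; the real port item and question text come from the script.
question_trial = [("PORT_ITEM", 1), ("QUESTION_3_TEXT", 3)]

print(record_stimulus_columns(question_trial, 1))  # ['PORT_ITEM', '1']
print(record_stimulus_columns(question_trial, 2))  # ['PORT_ITEM', '1', 'QUESTION_3_TEXT', '3']
```

With a single pair, only the port is logged. If the question were the first (or only) stimulus, the same single pair would log the question text instead.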

Your question <likert> trial presents two stimuli: (1) videoOff and (2) question.

<likert question>
/stimulusframes=[1= videoOff, question]
/ fontstyle = ("MS Shell Dlg 2", 1.8%, false, false, false, false, 5, 1)
/numpoints=7
/ anchorwidth = 5%
/anchors=[1="not at all"; 4="average"; 7="very much"]
/position=(50,75)
</likert>

Thus, the single stimulusitem column (and single stimulusnumber column) specified in your <data> element captures data only for the first stimulus, the videoOff <port> element. No data for the 2nd stimulus, the question <text> element, is logged.

<data>
/format=tab
/columns=[date time build subject trialcode blockcode blocknum
trialnum latency response stimulusitem stimulusnumber]
</data>

You can resolve this either by adding another stimulusitem / stimulusnumber column pair:

<data>
/format=tab
/columns=[date time build subject trialcode blockcode blocknum
trialnum latency response stimulusitem stimulusnumber stimulusitem stimulusnumber]
</data>

or by simply swapping the order of the two stimuli in the question <likert>'s /stimulusframes:

<likert question>
/stimulusframes=[1= question, videoOff]
/ fontstyle = ("MS Shell Dlg 2", 1.8%, false, false, false, false, 5, 1)
/numpoints=7
/ anchorwidth = 5%
/anchors=[1="not at all"; 4="average"; 7="very much"]
/position=(50,75)
</likert>

in which case data for the question <text> element would be captured by the single stimulusitem column (but no data for the videoOff <port> stimulus).
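Once the question text is in the data file, each response can be matched to its question after the fact. Below is a minimal Python sketch of that step. It assumes a tab-delimited data file with a header row in which the repeated columns are headed stimulusitem1, stimulusnumber1, stimulusitem2, stimulusnumber2 (consistent with the stimulusitem1 name used earlier in this thread, but check the real header), and that the question trial is logged with trialcode "question". The function name and sample rows are hypothetical. If you swap the stimulus order instead of adding a column, pass text_column="stimulusitem1".

```python
import csv
import io
from collections import defaultdict


def responses_by_question(dat_text, trialcode="question", text_column="stimulusitem2"):
    """Group Likert responses by the question text logged on each trial.

    dat_text:    contents of a tab-delimited data file, header row first.
    trialcode:   trialcode of the Likert trial (assumed to be 'question').
    text_column: column holding the question <text> item; 'stimulusitem2'
                 when videoOff is presented first and two pairs are logged.
    """
    reader = csv.DictReader(io.StringIO(dat_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        if row["trialcode"] == trialcode:
            grouped[row[text_column]].append(row["response"])
    return dict(grouped)


# Hypothetical sample containing only the columns this sketch uses.
sample = "\n".join([
    "subject\ttrialcode\tresponse\tstimulusitem1\tstimulusnumber1\tstimulusitem2\tstimulusnumber2",
    "1\tvideo\t0\tclip1.mp4\t1\t0\t0",
    "1\tquestion\t5\tPORT_ITEM\t1\tQUESTION_A\t1",
    "1\tquestion\t2\tPORT_ITEM\t1\tQUESTION_B\t2",
])

print(responses_by_question(sample))  # {'QUESTION_A': ['5'], 'QUESTION_B': ['2']}
```

Pointing text_column at stimulusitem1 on the same rows lumps every response under the port item, which is the problem seen in the new data.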

Inquisit 4 and 5 actually behave identically as far as the above is concerned. That is: neither "old code.exp" nor "new code.exp" as attached would capture information about the question <text> element, so I suspect that the data in your 1st screenshot stem from a period when you had not yet added the <port> elements to the trials.

As to your final question, no, unfortunately it is not possible to somehow infer or reconstruct the missing information after the fact.
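On the question of seeding: this thread does not document how Inquisit seeds its random number generator, so the sketch below uses Python's own `random` module purely as an illustration, and the millisecond-clock seed is an assumption. It shows two things: a seeded shuffle is fully deterministic, yet if the seed came from a clock finer than the recorded date and time, many different orders fit one timestamp. The labels Q1 to Q5 and the timestamp are hypothetical.

```python
import random

QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]  # hypothetical labels for the five questions


def question_order(seed):
    """Shuffle the question labels with a Python generator seeded by `seed`."""
    rng = random.Random(seed)
    order = QUESTIONS[:]
    rng.shuffle(order)
    return order


# Same seed, same order: a seeded shuffle is fully deterministic.
assert question_order(1584530000123) == question_order(1584530000123)

# Assumption for illustration: if the seed were a millisecond clock reading,
# a start time recorded to the second would be compatible with 1000 seeds.
second = 1584530000  # hypothetical Unix time of a session start
candidate_orders = {tuple(question_order(second * 1000 + ms)) for ms in range(1000)}
print(len(candidate_orders), "distinct orders fit the same recorded second")
```

Replaying an order would also generally require the same generator algorithm and the same sequence of earlier random draws in the session (for example, any randomization of video order), none of which the data file records.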

lukeholmes
Replying to Dave - 3/18/2020:

Thanks so much for the detailed response! I will add extra columns to the stimulusitem output to ensure we capture all the data from now on.

With regard to inferring the old data: how exactly is the RNG seeded? I was hoping you'd say it was seeded by the system time, which I could replicate. I did try looking into this, but all I found was another post saying it's not possible to set the seed manually.

Thanks again
lukeholmes
Replying to Dave - 3/18/2020:

Hi Dave - I just checked again and you are right. The "old code.exp" I attached was a modified version of the old code; the real old code says:

<likert question>
/stimulusframes=[1=question]
/numpoints=7
/anchors=[1="not at all"; 4="average"; 7="very much"]
/anchorwidth=30
/position=(50,75)
</likert>

So in the real old code, question was the only stimulus, and the single column recorded it instead of videoOff. We were just insanely unlucky with the order in which these stimuli were listed, and it never mattered until now because our questions suddenly measure two separate things.

Is the data completely lost then? Is there no way to tell what the RNG seed was, or to replicate it? I found a thread (from you!) from years ago where you say that it cannot be set as part of the code, but you did not mention how it is generated in the first place. If it is derived from the system date and time (as recorded in the Excel sheet), I will spoof that date and time and rerun all 28 participants' sessions to record the question order.

If not, I guess we're out of luck!
