# This file is a template for converting the JSON file that MTurk will return to you into a .csv which you can import into R, Excel, etc. Here we're operating on the assumption that you're using mmturkey (https://github.com/longouyang/mmturkey) or some comparable process in your JavaScript to communicate results to MTurk.
# Note that this script might need further modification, depending on how you recorded your data. If you follow the instructions below in encoding your data in the JavaScript, or use the sample experimental template provided with Submiterator, you can get away with just filling in a few values.
# This code assumes that what was submitted to MTurk was a JavaScript object (dictionary) where each trial was encoded as a separate object (dictionary), like so:
# data = {
#     q1: {
#         ...
#     },
#     q2: {
#         ...
#     },
#     ...,
#     q20: {
#         ...
#     }
# }
# If you know a bit of Python you could easily modify this to handle the case where your data is stored in an array, but it's probably easier to just change your JavaScript code to follow this format.
# MTurk will return a JSON file with one line per participant, with 'workerid', 'Answer.q1', 'Answer.q2', etc. in the header line. The goal is to convert this into a .csv file with one line per trial, where each trial is labeled with all relevant information about the participant (workerid, demographics, etc.) as well as all of the data recorded in the trial.
# Add to the subjectLevelVariables list any information that appears once per participant and that you want access to in data analysis. 'workerid' is always included automatically (see bySubjectVariables below), so it does not need to be listed.
# Values recorded trial-by-trial should not be named in subjectLevelVariables, since they will automatically be added to the byTrialVariables list when the individual trial results are parsed. (It's critical here that exactly the same variables are recorded in each trial. Add dummy variables with NAs in your JavaScript if your experiment isn't set up like this.)
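# As a hedged illustration of the reshaping described above, the sketch below turns one hypothetical participant row into one dictionary per trial. The sample header, the cell contents, and the helper name to_long_rows are assumptions made for this example; they are not part of mmturkey or Submiterator, and the cells are assumed to have been cleaned of quotes and braces already.

```python
def to_long_rows(header, row, subject_columns):
    """Return one dict per trial column, each carrying the subject-level values."""
    subject_values = {name: row[header.index(name)] for name in subject_columns}
    long_rows = []
    for name, cell in zip(header, row):
        # Skip anything that is not a trial column.
        if not name.startswith('Answer.') or name in subject_columns:
            continue
        trial = dict(subject_values)
        # Each trial cell is assumed to look like "key:value,key:value".
        for pair in cell.split(','):
            key, value = pair.split(':', 1)
            trial[key] = value
        long_rows.append(trial)
    return long_rows


# Hypothetical, already-cleaned header and participant row.
sample_header = ['workerid', 'Answer.language', 'Answer.q1', 'Answer.q2']
sample_row = ['w1', 'English', 'rt:512,response:yes', 'rt:430,response:no']
sample_long = to_long_rows(sample_header, sample_row, ['workerid', 'Answer.language'])
```

# Here one participant line with two trial columns becomes two trial-level records, each repeating the participant's workerid and language.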
filename = "YOUR_FILENAME_HERE"
subjectLevelVariables = ['language']
import re
with open(filename, 'r') as fl:
    lines = fl.readlines()
processedlines = [line.split('\"\t\"') for line in lines]
processed = [[re.sub('[\n\"{}]', '', x) for x in line] for line in processedlines]
header = processed[0]
data = processed[1:]
def find_idx(string):
    vals = [i for i, x in enumerate(header) if x == string]
    if len(vals) == 0:
        return -1
    else:
        return vals[0]
bySubjectVariables = ['workerid'] + ['Answer.' + x for x in subjectLevelVariables]
individualTrialNames = [x for x in header if x[:7] == 'Answer.' and x not in bySubjectVariables]
sampletrial = data[0][find_idx(individualTrialNames[0])]
sampletrial_parsed = [x.split(':') for x in sampletrial.split(',')]
byTrialVariables = [x[0] for x in sampletrial_parsed]
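# byTrialVariables above is taken from the first trial of the first participant only, so a later trial that is missing one of those keys would raise a KeyError when the CSV is written, and any extra keys would be silently dropped. A minimal sketch of a consistency check, assuming trials have already been parsed into dictionaries; the helper name find_mismatched_trials and the sample data are hypothetical:

```python
def find_mismatched_trials(trials):
    """Return indices of trials whose keys differ from those of the first trial."""
    expected = set(trials[0])
    return [i for i, trial in enumerate(trials) if set(trial) != expected]


# Hypothetical parsed trials; the third one lacks 'response'.
sample_trials = [
    {'rt': '512', 'response': 'yes'},
    {'rt': '430', 'response': 'no'},
    {'rt': '388'},
]
mismatched = find_mismatched_trials(sample_trials)
```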
d = {}
counter=0
for subjectdata in data:
    counter += 1
    subjname = "subject" + str(counter)
    d[subjname] = {}
    trialnum = 0
    for trialname in individualTrialNames:
        trialnum += 1
        trialindex = find_idx(trialname)
        trialdataraw = {}
        # Each trial cell looks like "key:value,key:value,..."
        keysAndValues = [x.split(":") for x in subjectdata[trialindex].split(",")]
        for keyValuePair in keysAndValues:
            trialdataraw[keyValuePair[0]] = keyValuePair[1]
        # Copy the participant-level values onto every trial
        for item in bySubjectVariables:
            trialdataraw[item] = subjectdata[find_idx(item)]
        d[subjname][trialnum] = trialdataraw
csv=""
# Header row: strip the 'Answer.' prefix from participant-level column names
for x in bySubjectVariables + byTrialVariables:
    if x[:7] == 'Answer.':
        csv = csv + x[7:] + ","
    else:
        csv = csv + x + ","
csv = csv[:-1] + "\n"
# One row per trial; replace the trailing comma with a newline
for subj in d.keys():
    for trial in d[subj].keys():
        for v in bySubjectVariables + byTrialVariables:
            csv += d[subj][trial][v] + ","
        csv = csv[:-1] + "\n"
with open(filename + "-parsed.csv", 'w') as parsed:
    parsed.write(csv)
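# The script above builds the CSV text by joining values with commas, which is fine as long as no value contains a comma, quote, or newline. As a hedged alternative for the writing step only, the sketch below uses Python's standard csv module, which quotes such values automatically. The helper name rows_to_csv_text and the sample rows are assumptions for illustration, and the module is imported under an alias because the script above already uses csv as a string variable.

```python
import csv as csv_module  # aliased to avoid clashing with a string variable named csv
import io


def rows_to_csv_text(rows, columns):
    """Serialize a list of dicts to CSV text, quoting values where needed."""
    buffer = io.StringIO()
    writer = csv_module.DictWriter(buffer, fieldnames=columns, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical trial rows; the second response contains a comma.
example_rows = [
    {'workerid': 'w1', 'rt': '512', 'response': 'yes'},
    {'workerid': 'w1', 'rt': '430', 'response': 'no, not sure'},
]
example_text = rows_to_csv_text(example_rows, ['workerid', 'rt', 'response'])
```

# Note that this only addresses writing: the parsing step above still splits trial cells on commas and colons, so values containing those characters would need separate handling there.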