Danielle Daidone
  • Home
  • Research
    • Publications
    • Current Research Projects
  • Teaching
  • Praat Scripts
  • Resources & Links
  • Blog

Creating stimuli

7/11/2017

Once you've chosen a perception task, it's time to make stimuli for it.  

How many stimuli do I need?
The answer to this question isn't simple.  You'll need to strike a balance between getting a sufficient amount of data and how long you can reasonably expect people to sit and do your experiment.  In our lab, we generally have to recruit participants with extra credit, the promise of snacks, and desperate pleas, so any experiment over an hour or an hour and 15 minutes is unlikely to have many people sign up.  If you can pay people they'll be more willing to do a longer experiment, but that means more money you'll have to shell out for each person.  Since your experiment is likely to be made up of two or more tasks, such as both discrimination and lexical decision plus a background questionnaire, each task in itself shouldn't be longer than about 25 minutes, if possible.  Shorter tasks will also prevent participants' attention from wandering too much, which means more reliable data.  A 20-minute AXB or oddity task is already very boring even with a break, and with difficult contrasts it can also be mentally taxing and demoralizing.  I know some psychology experiments have participants doing one repetitive task for an hour (how?!), but if you don't want participants to constantly time out on trials because they are falling asleep or trying to surreptitiously check their phones, keep it shorter.
For most kinds of tasks, to calculate how long it will take you'll need to take into account the number of trials and how long each trial lasts.  When figuring out how many trials you need, keep in mind that for AX, you should have an equal number of same and different trials, and with lexical decision you should have an equal number of word and nonword trials.  For ABX-like tasks you'll need to balance the order of presentation so that all six possible orders of stimuli are present: ABA, ABB, AAB, BAB, BAA, BBA.  In other words, if your contrast is [e] vs. [i], in one trial the order will be [kek], [kik], [kek] (ABA), in another [kek], [kik], [kik] (ABB), etc.  This ensures that X is equally likely to be A or B and that the order of presentation does not influence the results.  Oddity tasks should also have the six possible orders of stimuli, plus all-the-same trials (AAA, BBB).  You can either balance the number of same vs. odd-one-out trials or balance how often each button is the correct answer (first sound is different, second sound is different, third sound is different, or all the same).  I prefer the second option, since participants are likely to hear difficult contrasts as same trials anyway.  Note that the types of trials don't have to be perfectly balanced; for example, you could do 12 different trials (4, 4, and 4 in each position) and 8 same trials.  Just make sure they aren't too disparate.
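If it helps, you don't have to type out the six odd-one-out orders by hand: they are just the distinct arrangements of two of one stimulus and one of the other.  Here's a quick Python sketch (my own illustration, not something from any particular experiment software):

```python
from itertools import permutations

def oddity_orders():
    # All distinct odd-one-out orders for an oddity/ABX-style task:
    # two A's and one B, plus two B's and one A
    orders = set(permutations("AAB")) | set(permutations("ABB"))
    return sorted("".join(o) for o in orders)

print(oddity_orders())  # ['AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA']
```

To build the full trial list, you'd pair each order with your contrasts and contexts, then add the AAA and BBB same trials separately.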

For the number of conditions, don't forget that you need a control condition to show that your task is working.   In our experiments, we've tested up to 10 contrasts in a discrimination task.  I think this is near the upper limit, since having more data points per condition is always better, especially if you plan to do individual-level analyses, and for each condition you add, the fewer trials per condition you'll be able to fit in.  Around 16-20 trials per condition is a good number, which may be split into a couple different phonetic contexts.  Here's an example from our latest oddity task:


10 contrasts (e.g. /u/ vs. /y/) x 10 trials per contrast (2 AAA, 2 BBB, AAB, ABA, ABB, BAA, BBA, BAB) x 2 contexts ([tVhVt], [kVhVk]) = 200 trials

It's helpful if you map out your trial setup in Excel, like this:
[Image: Excel spreadsheet laying out the trial orders]
You'll also need practice trials so that participants can learn how to do the task.  About 8 or 10 is enough.  For some tasks you'll need filler trials as well, particularly if you are only testing a small number of conditions.  The point is that you don't want participants to figure out what you're testing and start employing an explicit strategy for completing the task.

In order to calculate how long each trial will take, you add the length of the sound files in each trial + interstimulus intervals (pauses between stimuli) + intertrial interval (pause between trials).  For example, in our oddity task each word lasts about 600-750 ms, so let's use 700 ms as an estimate.  With an interstimulus interval (ISI) of 400 ms and an intertrial interval (ITI) of 500 ms, that means each trial takes 700 ms for the first stimulus + 400 ms pause + 700 ms for the second stimulus + 400 ms pause + 700 ms for the third stimulus + 500 ms after the trial.  In other words, 700 ms x 3 + 400 ms x 2 + 500 ms = 3400 ms, or 3.4 seconds per trial.  

Now you can calculate how long all the trials will take combined.  3.4 s x 200 trials = 680 s, or 11.33 min.  With the instructions, practice trials, and a break, that's about 15 minutes for the oddity task, which is totally doable.
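The arithmetic above is easy to script so you can play with the numbers before committing to a design.  A minimal sketch, using the estimates from the oddity task described above (stimulus duration, ISI, and ITI are the values I used; plug in your own):

```python
# Back-of-the-envelope timing for an oddity task (three stimuli per trial)
stim_ms = 700   # estimated duration of one stimulus
isi_ms = 400    # interstimulus interval (pause between stimuli)
iti_ms = 500    # intertrial interval (pause between trials)
trial_ms = stim_ms * 3 + isi_ms * 2 + iti_ms  # 3400 ms per trial

contrasts = 10
trials_per_contrast = 10
contexts = 2
n_trials = contrasts * trials_per_contrast * contexts  # 200 trials

total_min = n_trials * trial_ms / 1000 / 60
print(n_trials, "trials,", round(total_min, 2), "minutes")  # 200 trials, 11.33 minutes
```

Remember this is only the trials themselves; budget extra time for instructions, practice trials, and a break.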

Words or nonwords?
If your task is examining discrimination, similarity, or categorization, it's best to use nonwords. By using nonwords, you won't need to worry about lexical frequency effects, such as the fact that people respond faster to more frequent words. Also, if you're testing learners and a control group of native speakers, the lexical knowledge of each group will likely vary, possibly affecting your results.  For a lexical decision task you'll obviously need words, but you'll also need nonwords both as near-words to test lexical knowledge and as fillers.

Tips for making nonwords:
  • While I personally haven't tried it, Wuggy seems like a useful program for creating nonwords based on words the user inputs.  It can generate nonwords in Basque, Dutch, English, French, Serbian, and Spanish.
  • Choose phonotactically plausible nonwords.  Using phonotactically infrequent patterns will be unrepresentative of the language you're testing, not to mention making it difficult for the speakers to produce usable stimuli.  If the filler nonwords in a lexical decision task are phonotactically implausible, it will be obvious they are nonwords, possibly making participants more likely to incorrectly accept the near-words in the experiment.
  • Control the phonetic contexts surrounding the target segment, since context will affect perception.  For example, American English listeners have more difficulty discriminating [y] and [u] in an alveolar context ([tyt] vs. [tut]) than a velar context ([kyk] vs. [kuk]).  If you're testing discrimination, perceptual similarity, or categorization, you should pick perhaps two or three contexts for your stimuli.  
  • Check to make sure your nonwords are truly nonwords, especially if you're a non-native speaker.  In my work on Spanish, I like to check both WordReference.com and the dictionary of the Real Academia Española, since although the RAE has more words, WordReference includes more slang.  Seeing if Microsoft Word underlines all your stimuli can also be helpful, because if something is not underlined, then it's likely a real word.  Be sure to check that all possible alternate spellings are also nonwords, since what matters is how the stimuli are pronounced.  Once I get my final list, I have my speakers make sure there aren't any real words I missed.
  • Nonwords should not be words in the L1.  This is especially important for lexical decision tasks.

Tips for choosing real words:
  • EsPal for Spanish and the English Lexicon Project for English allow you to input lexical properties like number of syllables and they output words that fit the criteria.  If you need minimal pairs, such as for a lexical decision task with repetition priming, this page is useful for English, though it is for British English so be careful if you work on American English.
  • Avoid cognates unless they are the focus of your study, since participants often react differently to them. If it isn't possible to avoid cognates, you may want to balance the number of cognates and non-cognates and check later if cognate status affected the results.
  • Control lexical frequency.  Frequency from subtitles is the best predictor of participants' reaction times in lexical decision tasks, particularly for shorter words.  Words should all be within a certain range, e.g. log frequency of at least 1.5, both for similar reaction times across stimuli and to ensure that learners will know the words.  It's important to also have learners do a word familiarity questionnaire at the end of the experiment to verify that they knew the words, as frequency in a native-speaker corpus is not always an accurate predictor of what words learners know.

Tips for both words and nonwords:
  • Avoid difficult segments unless they are the focus of your study.  For example, if you are looking at the lexical representations of English vowels by Japanese listeners, but are (cruelly!) including a bunch of /l/ and /r/ words as stimuli, responses to these words are highly likely to be influenced by the presence of the liquids and not necessarily the test vowels.

Copyright © 2023
Danielle Daidone
daidoned AT uncw DOT edu
You can also check out my Academia.edu and ResearchGate pages