Make sure it's easy to read for the speakers (i.e. the font is 12pt or bigger). I like to use 3 double-spaced columns per page. You could also number the words.
If the list is long (more than 2 pages), number the pages to avoid confusion and add a title for each part of the list (e.g. "stimuli for AX", "stimuli for lexical decision"). If the speaker reads the titles aloud, they can help you find your place later when cutting the sound files.
Oftentimes the last word of a list is said with different intonation (a final fall); repeat this word earlier in the list or at least have it be a filler.
Print the recording list one-sided, or make sure the speaker doesn't turn the page while speaking, since the page turn will be audible in the recording.
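The list-building advice above can be sketched in a few lines of Python. This is only an illustration: the function name `build_recording_list`, the `== title ==` header style, and the choice to append a filler after every titled part (rather than only at the very end) are my assumptions, not a fixed recipe.

```python
from typing import Dict, List


def build_recording_list(sections: Dict[str, List[str]],
                         final_filler: str,
                         lines_per_page: int = 30) -> List[str]:
    """Return printable pages of numbered words under section titles.

    Assumption: a filler word is appended to each section so that the
    final fall lands on a throwaway item, not on a real stimulus.
    """
    lines = []
    number = 1
    for title, words in sections.items():
        lines.append(f"== {title} ==")
        for word in list(words) + [final_filler]:
            lines.append(f"{number:>3}. {word}")
            number += 1

    pages = []
    for start in range(0, len(lines), lines_per_page):
        page_no = start // lines_per_page + 1
        chunk = lines[start:start + lines_per_page]
        # Number every page so a long list cannot get shuffled.
        pages.append("\n".join([f"Page {page_no}"] + chunk))
    return pages


if __name__ == "__main__":
    demo = build_recording_list(
        {"stimuli for AX": ["pat", "bat"],
         "stimuli for lexical decision": ["cat"]},
        final_filler="table",
    )
    print("\n\n".join(demo))
```

The output is plain text, so font size, double spacing, and the 3-column layout still have to be set in whatever program you print from.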
The recording space:
Use a sound booth if possible, otherwise pick the quietest place you possibly can.
Avoid large spaces with echoes.
Avoid air conditioners, fridges, heaters, etc. running in the background, since these will often create an audible hum in the recording.
Place the mic about 2 inches from the speaker's mouth and slightly to the side to prevent large spikes in amplitude from their breath hitting the mic.
The recording style:
Model how you want the words read or play a small part of a previous recording if you want to match speed/style of speaking.
You need clear pauses between words. I cannot stress enough how important this is for cutting the stimuli later. Recordings are basically worthless if there is no pause between words, because there will be audible coarticulation across word boundaries, which makes segmenting the words into individual sound files very difficult and causes the words to sound weirdly chopped off when you try.
Have them read with falling intonation on each word so that when the stimuli are cut, they don't sound like questions. This is difficult for speakers since it is natural to read with list intonation, which has rising intonation on each word. If they really struggle with it, you may need to have them repeat after you for each word (making sure they are pausing sufficiently and not overlapping your speech). Another technique is to have them say each word a few times as in a list, so that the final iteration of the word has falling intonation. You can also avoid list intonation by putting the stimuli in sentences. This can help create more natural-sounding stimuli, but recording takes longer and cutting the stimuli is more time-consuming. If you do embed words in a context, place the stimulus between stop consonants or have the speaker repeat it after the sentence (e.g. "Say X again. X.") for ease of cutting later. We've found that sentence-final tokens are clearest for perception experiments (if clear segments are your intention). When the word is repeated at the end of a sentence, it is easier for the speaker to produce it surrounded by pauses and with falling intonation.
If the speaker uses creaky voice, call attention to it and have them try to lessen it. I've noticed people tend to do it more toward the end of the recording when they're bored and tired, so giving them breaks and water might help.
Watch out for speakers' tendency to speed up over time.
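To check whether a recording really has usable pauses between words, something like the following sketch can help. It assumes the audio has already been loaded as a list of samples scaled to the range -1 to 1. The function name `find_pauses`, the 10 ms frame size, the RMS threshold of 0.02, and the 0.3 s minimum pause are illustrative guesses that would need tuning for your room and mic.

```python
import math
from typing import List, Tuple


def find_pauses(samples: List[float], sample_rate: int,
                frame_s: float = 0.01, threshold: float = 0.02,
                min_pause_s: float = 0.3) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of quiet stretches.

    A frame counts as quiet when its RMS amplitude is below `threshold`;
    a pause is a run of quiet frames lasting at least `min_pause_s`.
    """
    frame_len = max(1, round(frame_s * sample_rate))
    quiet = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        quiet.append(rms < threshold)

    pauses = []
    run_start = None
    # The trailing False is a sentinel that closes a pause at the end.
    for i, is_quiet in enumerate(quiet + [False]):
        if is_quiet and run_start is None:
            run_start = i
        elif not is_quiet and run_start is not None:
            begin = run_start * frame_len / sample_rate
            end = i * frame_len / sample_rate
            if end - begin >= min_pause_s:
                pauses.append((begin, end))
            run_start = None
    return pauses
```

If a list of N words comes back with far fewer than N - 1 pauses, the speaker is probably running words together, and it is worth coaching them and re-recording before they leave. Tracking pause lengths across the session is also a rough way to spot a speaker who is speeding up.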
Making the recording:
Have them practice part of the list and listen to/look at the recording to check the mic levels, background noise, reading speed, pauses, etc.
Record three times as many tokens as you need: if you need one good token of each word, record the list 3 times. It may be necessary to record even more times and coach the speaker further if you need particular phonetic properties, such as final released [t] in English or dialectal variants like [h] for /s/ in Spanish. I recommend having the speaker read the whole list multiple times rather than reading each word multiple times in a row, because people tend to say a word the same (possibly erroneous) way when repeating it.
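The mic-level check on the practice recording can be done by eye in a waveform editor, but a rough numeric version looks like the sketch below. It again assumes samples scaled to the range -1 to 1. The function name `check_levels` and the 0.99 clipping cutoff are my own choices for illustration.

```python
import math
from typing import Dict, List


def check_levels(samples: List[float],
                 clip_level: float = 0.99) -> Dict[str, float]:
    """Report the peak level in dBFS and how many samples look clipped."""
    peak = max((abs(x) for x in samples), default=0.0)
    # 0 dBFS is full scale; silence has no finite level.
    peak_dbfs = 20 * math.log10(peak) if peak > 0 else float("-inf")
    clipped = sum(1 for x in samples if abs(x) >= clip_level)
    return {"peak_dbfs": peak_dbfs, "clipped_samples": clipped}
```

Any clipped samples at all mean the gain is too high or the mic is too close. A very low peak suggests the opposite, and boosting the level afterwards will also boost the background noise.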