
Analyzing free classification results: Using an R script to obtain (dis)similarity matrices

2/11/2023

To analyze your data with the R script below, you'll need 2 files:

Your Lookup Matrix file, which details which stimulus corresponds to which number on which slide. How to create this file is explained in the blog post here. This file should be saved as a tab-separated text file. The example Lookup Matrix from our Finnish length experiment is available here.
Your Coded FC Results file, which codes the results from each participant's free classification PowerPoint task. How to create this file is explained in the blog post here. This file should also be saved as a tab-separated text file. Make sure your stimuli are in the same order in every context: the R script outputs results in alphabetical order, so the combined-contexts results will be inaccurate if the order differs across contexts. The example Coded FC Results file from our Finnish length experiment is available here.

You'll use this R Markdown file to analyze your results (if you don't have R and RStudio, download those first). The comments in the script show what you should get if you analyze the example files above. Don't forget to set your working directory to the file path where your files are located and to change the file names to match your Lookup Matrix and Coded FC Results files.

The R code will create various similarity and dissimilarity matrices (by counts, by percentages, by contexts individually and combined, etc.) that can be used to visualize your results and analyze them with multi-dimensional scaling. These will be saved to your working directory as text files with the names specified in each code block. Note that when you open these files, the headers will be one column off: the header row starts in the leftmost column, above the row labels, rather than above the first column of numbers. Let's look at one file as an example. Below we have a screenshot of the tab-separated text file for similarity in percentages for Context 1 (in this case "upu") with all speakers combined.

As you can see, the header cVccV_upu at the top left should be the header for the first column of numbers. If you want to make a table with these results, I recommend opening this file in Excel and moving all of the headers one column to the right. (Note that you do NOT need to do this to use the R script for multi-dimensional scaling described in the following blog post.) The file will now look like this:

This file shows us that, for example, [kuppu] tokens and [kuuppu] tokens were grouped together 19.7% of the time (column B, row 6). Since these are the similarities for all speakers combined, the numbers along the diagonal show how often sound files of the same stimulus spoken by different speakers were grouped together (not very often; English speakers are bad at length).
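If you'd rather not move the headers by hand in Excel, the same shift can be handled in code. Here's a minimal sketch in Python (not part of the R script) that reads a file whose header row has one fewer field than its data rows and lines the headers up with the numbers. The function name `read_shifted_matrix`, the second label `cVVccV_upu`, and all of the values are made up for illustration.

```python
import csv
import io

# Hypothetical miniature of an output file. The header row has one fewer
# field than the data rows, because the row labels have no header of their own.
# The second label and all numbers are invented for this example.
raw = (
    "cVccV_upu\tcVVccV_upu\n"
    "cVccV_upu\t10.0\t20.0\n"
    "cVVccV_upu\t20.0\t30.0\n"
)

def read_shifted_matrix(text):
    """Return {row_label: {column_label: value}} from tab-separated text
    whose header row starts one column too far to the left."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    matrix = {}
    for row in body:
        label, values = row[0], row[1:]
        # Each data row should hold one label plus one value per header.
        if len(values) != len(header):
            raise ValueError(
                f"row {label!r} has {len(values)} values, expected {len(header)}"
            )
        matrix[label] = {col: float(v) for col, v in zip(header, values)}
    return matrix

matrix = read_shifted_matrix(raw)
print(matrix["cVVccV_upu"]["cVccV_upu"])  # value in row cVVccV_upu, column cVccV_upu
```

To use this on a real file, you could pass in the file's contents (for example from `open(path).read()`) instead of the `raw` string.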

The R code can handle up to 4 different contexts. If you have more than 4 contexts, have more than 2 versions (i.e. Version A and Version B for counterbalanced order of presentation), or find any errors with the code, let me know at daidoned AT uncw.edu and I can modify/fix the script.
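To make the similarity percentages a bit more concrete, here's a rough sketch in Python (again, not the R script itself) of how a percentage like this can be computed for one context. It assumes that each participant's result is a mapping from tokens to group labels, and that the percentage is the number of times two tokens were grouped together divided by the number of token pairs that could have been grouped together. The toy data, the speaker names `s1` and `s2`, and the choice of denominator are all assumptions for illustration; the script's exact calculation may differ.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: one dict per participant, mapping each token
# (stimulus, speaker) to the group that participant put it in.
participants = [
    {("kuppu", "s1"): 1, ("kuppu", "s2"): 2, ("kuuppu", "s1"): 1, ("kuuppu", "s2"): 2},
    {("kuppu", "s1"): 1, ("kuppu", "s2"): 1, ("kuuppu", "s1"): 2, ("kuuppu", "s2"): 2},
]

def similarity_percentages(participants):
    """Percentage of token pairs grouped together, per pair of stimuli."""
    together = defaultdict(int)       # times a token pair shared a group
    opportunities = defaultdict(int)  # times a token pair could have
    for groups in participants:
        for tok_a, tok_b in combinations(sorted(groups), 2):
            # Key by stimulus only, so speakers are combined. A pair like
            # ("kuppu", "kuppu") is a diagonal cell: same stimulus,
            # different speakers.
            pair = tuple(sorted((tok_a[0], tok_b[0])))
            opportunities[pair] += 1
            if groups[tok_a] == groups[tok_b]:
                together[pair] += 1
    return {pair: 100.0 * together[pair] / opportunities[pair]
            for pair in opportunities}

similarity = similarity_percentages(participants)
# Assumption: dissimilarity is taken as the complement of similarity.
dissimilarity = {pair: 100.0 - pct for pair, pct in similarity.items()}

print(similarity[("kuppu", "kuuppu")])   # 25.0
print(similarity[("kuppu", "kuppu")])    # 50.0
print(dissimilarity[("kuppu", "kuuppu")])  # 75.0
```

In this toy data, [kuppu] and [kuuppu] tokens share a group in 2 of 8 possible pairings, and the two [kuppu] tokens share a group for 1 of the 2 participants.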
