Who needs dictionaries?
Using Python dictionaries to transform survey responses
7/15/2024 · 2 min read
The only dictionaries that existed when I first learned to code were from Webster and Oxford. I have a Webster’s dictionary. It has a lot of dust on it. It hasn’t been opened in 20 years, maybe longer. Then I took an online Python course, where I learned about a new type of dictionary. It doesn’t look exactly like my Webster’s. No definitions, just a list of names, each one followed by one or more strings, integers, floats, or other data types. I learned the dictionary syntax, but why and where I would use one was not obvious to me. Their value only became clear when I worked on transforming a large medical survey into a format that would allow me to perform statistical analysis. The answers to the survey questions came from a drop-down menu. Each subject’s selections were recorded as strings, but they also had a numerical value. “Strongly disagree” was a one, “Slightly disagree” was a two, etc.
My first attempt to code the choices was to convert the strings to categorical factors and then convert the factors into integers. This technique worked, but it relied on the list of factors being ordered to match their intended numerical values. I was concerned that some future researcher might skip reading my comments and not realize that the order was important. Mix up the order and the numbers will be incorrect, and you will have ruined the survey. I entered my code into ChatGPT and asked it for other ways to convert the answers to numbers. ChatGPT suggested dictionaries. The lightbulb turned on. Dictionaries have a key and a corresponding value. For example, the key “dog” might have the value “labrador retriever”, and “cat” could have the value “fluffy”. For my survey, the drop-down answer choices could be my dictionary keys, and the integer associated with each answer choice would then be the dictionary values. The key “Strongly disagree” would have the value 1, “Slightly disagree” would have the value 2, etc. The order would no longer matter, as long as the key-value pairs are entered correctly.
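To make the contrast concrete, here is a minimal sketch of both approaches. It is not the code from the survey project: the use of `pd.Categorical` for the factor step and the variable names are assumptions, and only the two answer labels named above are used.

```python
import pandas as pd

answers = pd.Series(["Slightly disagree", "Strongly disagree", "Slightly disagree"])

# Order-dependent approach: each code is the label's position in `categories`.
ordered = ["Strongly disagree", "Slightly disagree"]
by_position = (pd.Categorical(answers, categories=ordered).codes + 1).tolist()

# Same call with the list accidentally reversed: the numbers silently change.
swapped = (pd.Categorical(answers, categories=ordered[::-1]).codes + 1).tolist()

# Dictionary approach: each label carries its own number, so the order
# in which the pairs are written does not matter.
scores = {"Slightly disagree": 2, "Strongly disagree": 1}
by_lookup = [scores[a] for a in answers]

print(by_position)  # [2, 1, 2]
print(swapped)      # [1, 2, 1]
print(by_lookup)    # [2, 1, 2]
```

Reversing the category list changes every number without raising an error, which is the failure I was worried about. Writing the dictionary pairs in a different order changes nothing.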
There is one more step: applying the dictionary to the column of answers. In the old days one would write for-loops or while statements to go through each row of each column, followed by conditional statements to see which of the answer choices is a match. None of that is necessary in pandas. Pandas excels at operating on a whole column at once. A pandas Series (a column) has a built-in method called “map”. Map takes a dictionary argument and applies it to each row of the Series. If the row matches one of the keys, the value of the matched key is returned.
import pandas as pd

s_string = pd.Series(['A little', 'A little', 'Extremely'])
choices = {'Very Slightly': 1, 'A little': 2, 'Moderately': 3, 'Extremely': 4}
s_int = s_string.map(choices)  # One line, simple, clean, and elegant.
s_int.tolist()
> [2, 2, 4]
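One thing worth knowing is what happens when a row matches none of the keys: `map` returns NaN for that row instead of raising an error. The sketch below uses made-up answers, one of them deliberately mis-capitalized, to show how those rows can be found.

```python
import pandas as pd

choices = {'Very Slightly': 1, 'A little': 2, 'Moderately': 3, 'Extremely': 4}

# Hypothetical answers; 'extremely' does not match the key 'Extremely'.
answers = pd.Series(['A little', 'extremely', 'Moderately'])

scores = answers.map(choices)        # the unmatched row becomes NaN
unmatched = answers[scores.isna()]   # keep only the rows that failed to map

print(unmatched.tolist())  # ['extremely']
```

Because NaN is a float, the mapped column comes back as floats whenever any row is unmatched. Checking `scores.isna()` before the analysis is a cheap way to catch typos in either the data or the dictionary.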
Dictionaries have other uses. People like to create data frames with dictionaries. Each key becomes a column name, and its value, which can be a list or even another dictionary, supplies that column’s row values. Dictionaries are quite handy. I have to admit, I am now using Python dictionaries a heck of a lot more than I ever used Webster’s!
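For the curious, here is a small sketch of building a data frame from a dictionary. The column names and subject numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each key is a column name, each list holds that column's rows.
data = {
    "subject": [101, 102, 103],
    "answer": ["A little", "A little", "Extremely"],
}
df = pd.DataFrame(data)

print(df.columns.tolist())  # ['subject', 'answer']
print(df.shape)             # (3, 2)
```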