Data visualization is fast and easy for me because I have a solid foundation in spreadsheet skills. I feel totally at home in spreadsheets. Nerd alert: I actually enjoyed doing taxes this year (and FINALLY finishing and getting a refund didn’t hurt, either).

I initially learned about spreadsheets in the early ’90s from my nerdy economist dad. I was approximately 4 feet tall and was supposed to be studying for a 2nd grade spelling test. My dad convinced me that learning formulas early and practicing them often would change my life. As a 2nd grader, that sounded ridiculous. As an adult, I’m so thankful for those lessons. We sat together in the home office, staring at the glowing Lotus 1-2-3 screen, and practiced averages and medians. I didn’t know what a median was. But I was promised that it would be important someday. Turns out, my nerdy dad was right. This is the same guy who got me a toolbox for my 8th birthday that same year. I asked for an American Girl doll. Go figure. But I used that hammer and screwdriver just last weekend, so… thanks, dad.

Nowadays I spend the vast majority of my time teaching others about data visualization. But all too often, my students run into roadblocks because they don’t have a solid foundation in spreadsheet skills. A graph that should take them 5 minutes to design ends up taking an hour. They haven’t been exposed to the time-saving shortcuts yet. They’re not sure how to transform their raw dataset into summary tables (which would then feed into their graphs). Data *visualization* feels overwhelming because the data *analysis* is overwhelming.

Today, let’s relieve some of those data analysis burdens! I’m going to share my favorite two techniques for categorizing values.

You might need to *categorize* values if:

- You have a list of
*zip codes*but you really just care about the*states.* - You have a list of
*states*but you really just care about the*regions*where those states are located. - You have a list of
*countries*but you really just care about*regions*of the world. - You have a list of
*ages*(0, 1, 2, 3, 4, 5, etc.) but you really just care about*age ranges*(0-9, 10-19, 20-29, etc.). - You have a list of
*schools*but you really just care about which*district*the school is located within. - You have a list of
*test scores*(40%, 55%, 70%) but you really just want to focus on students who*passed*or*didn’t pass*the exam. - You have a list of
*body mass indices*(19, 24, 29, 32, etc.) but you want to categorize the raw numbers into*underweight, normal weight, overweight,*and*obese.* - You have a list of
*languages spoken*but you really want to divide people into*those who speak Mandarin*and*those who don’t.* - You have a list of
*countries where people were born*but you really just want to divide people into*born in U.S.*and*not born in U.S.* - … and so on.

The raw dataset you’re given is rarely the one you actually want for your analyses. Clean, re-code, and re-categorize until you’ve got the groupings you need.

# Recategorize Values with =IF()

There are a couple ways to categorize your values: with *if* or *vlookup*.

This first option uses *if*. In this example, we’ll categorize a full list of produce types (apples, bananas, carrots, or watermelon). In the video, I’m demonstrating these formulas on a PC with Excel 2016. The formulas aren’t unique to Excel, though. You can use these formulas in any spreadsheet program (like Google Sheets, which I actually use more often than Excel these days).

First, we’ll categorize the individual values as either being either an *apple* or *other* type of food. I use these simple two-category breakdowns in projects a lot. No, I’m not usually categorizing apples. That would be a fun job though–an apple analyst! But I use this same technique to handle a lot of the transformations in the bullet point list above. For example, you could use this exact approach to categorize a long list of countries into being either *United States* or *other*.

Think of *if* formulas as *if-then statements*. You have to fill in three pieces of the formula:

- A logical argument (a mini statement that includes the greater than symbol, less than symbol, and/or equal to symbol)
- What your computer should do if that mini logic statement is true
- What your computer should do if that mini logic statement is false

Second, we’ll categorize the individual values as being either a *fruit* or a *vegetable*. This formula is a little more complex. It’s called a *nested if* formula because there are if statements inside of if statements inside of if statements inside of if statements! You’ll feel like a million bucks the first time you get a *nested if* formula right. But *nested if* statements are also a lot of work… which brings us to the second option, using a lookup formula like *vlookup*.

# Recategorize Values with =VLOOKUP()

Your second option for categorizing your values is using a lookup function like *vlookup*.

I use lookup functions to join tables together. You can use *vlookup* in all sorts of scenarios. That’s why it’s one of my all-time favorite features of spreadsheets.

In this scenario–using *vlookup* to categorize values–we’ll need to build a separate *lookup table*. In this video, I’ll show you how I have the regular table that I’m trying to fill in. That’s the table where I want to turn my apples, bananas, carrots, and watermelons into *apples* or *others* and then into *apples* or *vegetables*.

Once we’ve built our lookup table, then we can use vlookup to help those tables talk to each other. I show you how in this video.

You can use either *if* or *vlookup* to sort your values into categories. You’ll get the same result either way. *If* tends to work best when you’ve only got a couple of categories. *Vlookup* tends to work best when you’ve got lots and lots of categories. *Vlookup* also tends to work better when your dataset is especially messy (*apple* and *apples* and typos like *appel* and *aple*). But, to some extent, it’s also a matter of personal preference. Both techniques take a while to master, so go easy on yourself if you have to practice a few times before you’re fluent.

# Download the Spreadsheets

Want to practice *if* and *vlookup*? Download the materials and practice along with me.

Download the Spreadsheets ($0)

# Comment and Share Your Ideas

Share some real-life examples. What types of values have you needed to categorize in your own projects? I tried to add a ton of examples to the bullet point list above, but there are infinite ways to apply these techniques.

# Like These Techniques? Then You’ll Love Next Week’s Announcement

I’ve been working on something very special for the past few months. I’ve got an exciting announcement coming your way next week. This is a project I’ve wanted to launch for the entire four years that I’ve been running my own data company. And it’s finally ready!!!

Andrew Fairley

Apr 17, 2018 -

Nested IF statements have a limitation of 7 levels, so once you go past 8 categories you need to combine IF with another approach (VLOOKUP, AND/OR etc.).

Ann K. Emery

Apr 17, 2018 -

Ah, good to know! I don’t think I’ve ever hit the seven level limit before. Usually, I add three or four levels, then pause midway and switch to a vlookup instead.

Jason

Apr 18, 2018 -

Previous to Excel 2007 it was limited to 7.

If using a newer version of Excel the limit is 64 levels.

Ann K. Emery

Apr 18, 2018 -

64 levels?!?!?!?!?!?!?!??! Ahahahahahahahahah I can’t even imagine. What a mess that would be! I’m officially going on the record to say that an IF statement with 64 levels is a bad use of your time. For “more than a handful of categories,” use a lookup function instead of an if statement.

Mark Biegert

Apr 18, 2018 -

Thanks Ann. For those using more recent versions of Excel, you can also try the Switch function.

Ann K. Emery

Apr 18, 2018 -

Hi Mark, I’ve encountered switch and played around with it a few times… but can’t figure out how it’s better than if or vlookup. When do you use it? What are its main advantages? Sell me. 🙂

Nitesh Halai.

Apr 24, 2018 -

I usually prefer a VLOOKUP. I try to avoid using nested IF formulas as much as possible. Just imagine the effort required to debug a 64 level IF formula 🙂