How to Clean Up Raw Data in Excel

welcome to unit 2 cleaning up raw data in this unit we will look at the raw data again and do some basic formatting and formula exercises to clean up the data so it's ready for us to analyze now we're going to be using some of the Excel skills you learn in class one in terms of formulas and functions to clean up a raw data set that isn't exactly perfect yet for analyzing a lot of times you'll get data from a database or from someone else in your company and it still has like extra characters or is not you know filtered correctly and you just have to kind of quickly massage the data a little bit to make sure it's ready for you to analyze because if you're trying to analyze data that's not correctly formatted or contains incorrect values then that's not going to be useful at all right so we're going to do some quick um it's kind of tidying up with the data before we actually analyze it and this is a very common practice because sometimes when you get data from like a database that comes in a CSV format it's not always going to be perfectly formatted and you'll have to do a lot of formatting before the data is even ready to do any kind of crunching or analyzing or doing pivot tables and stuff like that so the first step is if you look at the raw data set you notice that in the campaign field every single kept product has the word as a word Acme and ampersand right Acme ampersand small wallet bowl Acme pepper sign recycled paper basket Acme ampersand hand thrown stoneware so we want to make sure that the campaign the final data doesn't have the Acme ampersand and then the product we just want to have the product so for instance in b2 I only want the words small walnut bowl I don't want the word I don't want all of Acme and small wallet bowl together that makes sense right so if we go back to the lesson plan we want to get rid of the word Acme text in the campaign column right so how do we do this in Excel we're going to use a formula that you'll probably come to use a lot when you're doing this kind of stuff and it's called the mid formula it's similar to left which we talked about a lesson in class one but it's a little more advanced and well I'll show you how to do it right now so our goal again is to find is to get get just the product name and not the word Acme the company name so I'm going to be in column em and sell em too and I'm going to start typing in the mid formula here equals m ID right and then I'm going to hit left parenthesis and you'll notice that it takes in three arguments three kind of input variables it has we have two input a text which is the actual vacuole text cell that we want to pull data from start num which is the point in the cell that we want to start taking values from and number of characters is how many values or characters are going to take from that cell once we've indicated a starting point now what I've said it sounds kind of complicated so let's just walk through how I would do this in Excel so the first input again is the actual cell that we want the data from so text I'm going to select column our b2 excuse me you notice that I think it was a B – I'm going to put in comma so start num so this is where we want to start taking in the characters so what that means is at what point in this cell do we want to strip out the characters right so you'll notice the letter A and Acme is is a like starting point 1 the letter C is starting point to the letter M is starting point 3 so if you notice a pattern actually let's go back to the data here the pattern in all this data is that Acme ampersand is all is prefixes all the products and so when does the product name actually start so if we double click on the cell if I count the number of time number of characters it takes to get to the product name we'll see what the start number is so again we know a is 1 so let's move over to column the product name so 1 2 3 4 5 6 7 so that means that all the product names start in position 7 of every single one of these cells right we know that everything before call position 7 we don't want that data because it's the Acme ampersand text right so let's go back to our mid formula equals M ID left parenthesis I'm going to take this cell again b2 comma and we we found out that 7 is a starting point for where the text starts right so tip number 7 and then write comma again and number of characters this is how many characters you want to pull after the position 7 so you know small walnut bowl it looks like it's about you know 20 or so characters we don't want to we want to be conservative and just try to pull as many characters characters as possible so I'm going to put like you know 50 here just in case because let's say I put the number 10 here I put number 10 I put right parenthesis I'm only going to be pulling the first 10 characters of that cell and I might not pull all the characters that I want so if I press ENTER here you'll see that the value is just small wall and omitted like the walnut bowl text so that's why you want to be conservative and just say you know 50 or even 100 let's 200 just to make sure we pull all the stuff afterwards so you'll see what Excel did it took all the text starting at position 7 of b2 and took everything afterwards and in this case is just the words small walnut bowl and so we have our own it now so let's copy this all the way down to all the cells below in column M because we want to make sure we get the product name for all these cells right not for the first cell first row but for every single product right so I'm just going to drag and drop and we have about 300 or something right there well drag I draw I'm dragging and dropping let go and there you go so if I go back to the list you'll notice that I have all the product names all the product names set up and actually I think we need to make this a number 8 just to be safe yeah so 7 actually is when 7 actually counts the space here so let's see 1 2 3 4 5 6 yeah 7 is a stay so is the number here it should actually be 8 so let's do 8 sorry about that when to 8 I'm going to fill this down we learn about fill down fill formula down in and class 2 so now we have just a product name and we want to replace column B with column M right so I'm going to do a paste special values which is a keyboard shortcut we learned in class 2 and it's also a function we learned in class 1 but I want to replace everything column B so I don't have the Acme name anymore so I'm going to select this data all of it all the way down to row 3 4 7 do a command C and now I have everything selected as you can see with the ants marching around the selection I'm going to move over now to be 2 I'm going to paste special which is again command control V and then command V 4 values enter there you go now our column B has only the product name and no more Acme a company name and now I can just delete all column them because I did that just primarily for using the mid formula so there you go I've correctly cleaned out the data a little bit here now let's go back to the lesson plan and see what else we have to do in terms of formatting we want to add dollar signs to the budget and cost columns and also add a percentage a percentage character to the click-through rate column and so this is again just to make your data look a little better I want to go back to Acme I'm going to select oh it looks like it's already correctly formatted but let's let's make budget have only one one decimal point we do format cells and go number I'm going to go to currency I'm going to do only one decimal place here right sure click the rate I want to have no I don't have any kind of decima percent decimal place excuse me command one for format cells again I'm going to do zero decimal places and then this is just kind of the format your data so it looks right because sometimes you'll get a person's cost per click there's no formatting around it you know sometimes there's one decimal place in letters – let's make this all have a dollar format and also have two decimal places so let's select all this data format cells currency two decimal places okay and there you go now this looks a little easier to play with the data once you have it all nicely formatted so this lesson was all about just cleaning up your data set once you have the data again it's not always when you get raw data it's not always going to be perfect sometimes you have to do text to columns sometimes you have to do the mid formula I talked about and other times it's just a matter of cleaning up the formatting in terms of number formats and adding decimal places etc and so that wraps up lesson two and in lesson three we'll move on to actually doing some analysis

Comments are closed.