By automating data cleaning, you can save hours of manual time and effort to get accurate and complete data sets for future analysis and reporting.
In this tutorial, we'll show you how to leverage ChatGPT to automate your data cleaning.
Many of these actions can be performed within spreadsheets, but working with ChatGPT becomes powerful when you start asking for non-traditional data transformations.
Steps:
- Upload your data file and ask ChatGPT to remove duplicates
- Request blank rows be removed
- Insert placeholder values in blank cells
- Ask for values to be standardized with specific formatting
- Label data based on defined rules
Once you complete this tutorial, you should check out our tutorial on how to analyse sales data with ChatGPT as a next step.
Upload your data file
Start by uploading your data file (e.g., CSV) to ChatGPT. Then, ask ChatGPT to remove any duplicate data. Provide a short description of what constitutes a duplicate in your data. In each step, ask ChatGPT to provide you a preview of the data transformation prior to having a new data file generated. This will save you time.
Suggested prompt:
Here's my data file. Can you remove any duplicate data? Duplicates will have the same name. Please provide a preview before generating an updated data file.
If the preview looks good, instruct ChatGPT to proceed with generating the updated data file.
Request blanks rows be removed
To remove blank rows, instruct ChatGPT to remove them. Make sure to provide detail on what you consider a blank row and ask for a preview for confirmation.
Suggested prompt:
Can you remove any blank rows? I consider any row with more than two columns blank to be a blank row. Please provide a preview before generating an updated data file.
If the preview looks good, instruct ChatGPT to proceed with generating the updated data file.
Insert placeholder values in blank cells
To insert placeholder values in blank cells, instruct ChatGPT to insert specific values for these cells. Make sure to provide detail on what you consider a blank cell, what value(s) you want inserted, and ask for a preview for confirmation.
Suggested prompt:
Can you insert the text “N/A” into any blank cells? I consider a blank cell any cell with NULL or NaN as a value. Please provide a preview before generating an updated data file.
Ask for values to be standardized with specific formatting
To standardize values with specific formatting, tell ChatGPT what columns you want formatted and how you want them formatted.
Example prompt:
Two columns in my data file don’t have consistent or standardized formatting: Age and Income. For the Age column, can you update all cells to just be an integer? For instance, one Age value in the data file is “31.0 years”, I want this to be “31”. For the income column, can you reformat them all into USD?
Label data based on defined rules
To label data, provide clearly defined rules and column names for this new data. In this example, I’m going to have ChatGPT categorize anyone in my data set that is between the ages of 25-40 as a “Millennial”. I’m also going to instruct ChatGPT to categorize anyone with an income >$80K USD as a “Target Customer”.
Example prompt:
I want to create two new columns in my data set: Age Category and Customer Category. Can you create these two new columns and for any row with an Age between 25 - 40, set their Age Category to “Millennial”. For any row with an Income >$80K USD, set their Customer Category to “Target Customer”.
This tutorial was created by Garrett.
More tutorials like this
Start learning today
If you scrolled this far, you must be a little interested...
Start learning ->Join 2,641 professionals already learning