ChatGPT is a must-use tool for Data Analysts and Data Scientists. With recent feature additions like Code Interpreter and Plugins, it's easier to load in and export large amounts of data. It can even perform many tasks on your behalf by executing its own Python code in a sandbox!
Users can import and export files to and from ChatGPT using Code Interpreter, which is available as part of OpenAI's paid ChatGPT Plus tier. Code Interpreter also enables users to prompt ChatGPT to generate and run its own Python code to interact with and modify files.
Because ChatGPT Plus is so useful, we highly recommend it for anyone who is serious about using AI and ChatGPT to improve productivity at work. Along with priority access to OpenAI's latest AI model, GPT-4, subscribers gain access to ChatGPT plugins, which greatly extend ChatGPT's usefulness by allowing it to call and interact with external services.
Let’s look at some ways Data Analysts and Data Scientists can use ChatGPT to make their jobs easier!
How to Use ChatGPT Code Interpreter
Along with ChatGPT plugins, OpenAI is also working on Code Interpreter, an advanced system that lets ChatGPT both write and execute Python code. This feature has the potential to revolutionize the way data analysis is performed.
For example, an analyst could upload a file to ChatGPT and have it work with the data using an analytics library like NumPy, Pandas, or Matplotlib.
Plugins and Code Interpreter promise to fix many of ChatGPT's shortcomings with calculations and numbers while enabling analysts to do more with less.
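To give a rough idea of what that looks like, here is a minimal sketch of the kind of script Code Interpreter might write after you upload a CSV file. To keep it dependency-free, it uses Python's built-in `csv` and `statistics` modules rather than Pandas; the function name and the idea of grouping by one column and summarizing another are our own illustrative assumptions.

```python
import csv
import statistics
from collections import defaultdict


def summarize_csv(path, group_col, value_col):
    """Return per-group count, total, and mean of a numeric column in a CSV file."""
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Skip rows where the value is blank instead of failing on them.
            if row[value_col].strip():
                groups[row[group_col]].append(float(row[value_col]))
    return {
        key: {
            "count": len(values),
            "total": sum(values),
            "mean": statistics.mean(values),
        }
        for key, values in groups.items()
    }
```

For a hypothetical `sales.csv` with `region` and `revenue` columns, `summarize_csv("sales.csv", "region", "revenue")` returns one summary dictionary per region. With Pandas, the same idea is roughly a one-line `groupby`.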
As a simple test of Code Interpreter, we uploaded the PDF file A Beginner's Guide to Data & Analytics from Harvard Business School and asked ChatGPT to summarize the document in bullet points.
When Code Interpreter is enabled, a plus sign appears in the prompt input box; click it to upload your file. Then type your request about the uploaded file in the prompt box.
After you enter the prompt, you can have ChatGPT show its work.
It's somewhat mesmerizing to watch it type out Python code that invokes PyPDF2 to convert the PDF file into searchable text.
The entire process takes several minutes.
Like many other tools, ChatGPT has difficulty converting PDF to text. It can extract most of the text but sometimes struggles, which left it unable to summarize the document on the first attempt.
An interesting side note is that it extracted the table of contents from the document and attempted to use it as a guide to separate out the sections.
After it was unable to parse the document into sections because of special characters or artifacts from the PDF-to-text conversion, it summarized the document as a whole.
One drawback of this approach is that ChatGPT still summarizes the document by using Python to extract the first sentence of each page, rather than ingesting the text into its language model so that you can ask follow-up questions.
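To make that limitation concrete, here is a rough sketch of such a heuristic: take the first sentence of each page's text and turn the results into bullet points. It assumes the page text has already been extracted (for example, with PyPDF2's `PdfReader(path).pages[i].extract_text()`), and the sentence-splitting regex is a deliberately naive assumption, not what ChatGPT necessarily used.

```python
import re


def first_sentence_summary(pages):
    """Build bullet points from the first sentence of each page's text.

    `pages` is a list of strings, one per PDF page, produced by an
    earlier text-extraction step (not shown here).
    """
    bullets = []
    for text in pages:
        # Collapse line breaks and repeated spaces left over from extraction.
        cleaned = re.sub(r"\s+", " ", text).strip()
        if not cleaned:
            continue  # blank or image-only page
        # Naive split: a sentence ends at ., !, or ? followed by whitespace.
        first = re.split(r"(?<=[.!?])\s", cleaned, maxsplit=1)[0]
        bullets.append(f"- {first}")
    return "\n".join(bullets)
```

A heuristic like this is fast, but it only sees one sentence per page, so most of the document's content never makes it into the summary.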
This may be because Code Interpreter executes within a sandboxed environment. Try using Code Interpreter with Excel or CSV files instead, where you can mostly interact with the data as you would with Python scripts.
Use ChatGPT to Parse Strings with RegEx
ChatGPT is capable of helping people extract parts of a complex string using RegEx.
Regular expressions, often abbreviated as RegEx, are used for pattern matching and manipulation of text. A regular expression is a sequence of characters that forms a search pattern, which makes complex searches, substitutions, and parsing much easier.
RegEx is incredibly useful for extracting text when the part you want is not always in the same position, so you can't use a fixed substring to pull it out.
While incredibly useful, regular expressions can also be a huge nuisance to write if you don't work with them all the time.
Here is an example of ChatGPT extracting the tip amount from a long sentence of a pizza order.
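The original pizza-order sentence isn't reproduced here, so the one used below is a hypothetical stand-in. This is a minimal sketch of the kind of pattern ChatGPT might produce: it finds the word "tip" and captures the dollar amount that follows it, wherever it appears in the sentence. It assumes the tip is written as "$" followed by a number.

```python
import re

# "tip" as a whole word (any case), then up to 40 characters that are neither
# digits nor "$", then a dollar amount such as $6 or $6.50 (captured).
TIP_PATTERN = re.compile(r"\btip\b[^$\d]{0,40}\$(\d+(?:\.\d{1,2})?)", re.IGNORECASE)


def extract_tip(order_text):
    """Return the tip amount as a float, or None if no tip is mentioned."""
    match = TIP_PATTERN.search(order_text)
    return float(match.group(1)) if match else None
```

For example, in "Two large pepperoni pizzas for $28.00, delivered to 12 Elm St, and please add a tip of $6.50 for the driver." the pizza price is ignored and `extract_tip` returns `6.5`.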
Using ChatGPT for Writing Python
A core competency of ChatGPT is the ability to write computer code.
ChatGPT has been trained on a wide variety of programming languages, and it continues to improve over time.
If you are writing code, we highly recommend signing up for ChatGPT Plus. It gives users access to GPT-4, OpenAI's most advanced model, which is a lot better than GPT-3.5, the model behind the free version of ChatGPT.
As a quick example, we asked ChatGPT to write a Python script to update all of the columns in the .xlsx files saved in a folder.
It imports several Python libraries and generates code that you would run on your own machine.
Tip: When describing the Python code you want generated, be as descriptive as possible. You can even include specific file paths, and ChatGPT will write them into the code.
We ran a similar test at the beginning of 2023 using GPT-3.5, and the results took much longer to generate and were not nearly as good. It took us quite a few tries to get it to generate the code we wanted, and code that would run correctly.
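We don't reproduce ChatGPT's generated script here. As a hedged illustration of the same pattern, the sketch below loops over every file in a folder and rewrites its header row (trimming whitespace, lowercasing, and replacing spaces with underscores). To stay within the standard library it works on .csv files; for .xlsx files you would swap in a library such as openpyxl. The function names and the header rule are hypothetical.

```python
import csv
from pathlib import Path


def normalize_header(name):
    """Hypothetical rule: ' Order Date ' -> 'order_date'."""
    return "_".join(name.strip().lower().split())


def update_headers_in_folder(folder):
    """Rewrite the header row of every .csv file in `folder` in place.

    Returns the names of the files that were updated.
    """
    updated = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            rows = list(csv.reader(f))
        if not rows:
            continue  # skip empty files
        rows[0] = [normalize_header(col) for col in rows[0]]
        with path.open("w", newline="") as f:
            csv.writer(f).writerows(rows)
        updated.append(path.name)
    return updated
```

In practice you would pass your real folder path, for example the hypothetical `update_headers_in_folder("reports/2023")`.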
The computer programming capabilities of ChatGPT have become much better over time.
Note: ChatGPT does better at converting small pieces of code than large, complex ones. To get the most efficient translation, try to break your code into smaller pieces whenever possible. Also, be as specific as possible when describing the actions you want ChatGPT to take. You may need to ask it several times before the code is exactly the way you want it.
Use ChatGPT to Convert Python to R
Many data analysts learned both Python and R in school, but many of them end up using Python much more often in the workplace because of how common it is.
However, if you do need to convert Python to R try using ChatGPT!
ChatGPT can actually convert code between a wide range of programming languages.
Even if the conversion is not 100% accurate, it will likely be close enough to eliminate most of the work and jog your memory about what the syntax should look like.
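For a sense of what a conversion looks like, here is a small Python function with a roughly equivalent R version shown in comments. This is an illustration we wrote, not ChatGPT output, and the function name is hypothetical.

```python
from statistics import mean


def group_means(groups, values):
    """Return {group: mean of its values}, in first-seen group order."""
    buckets = {}
    for g, v in zip(groups, values):
        buckets.setdefault(g, []).append(v)
    return {g: mean(vs) for g, vs in buckets.items()}

# A roughly equivalent R version:
#   group_means <- function(groups, values) {
#     tapply(values, groups, mean)
#   }
# tapply returns a named vector ordered by group (factor level) name rather
# than a dict in first-seen order, so the result type and order differ slightly.
```

Small differences like the return type and ordering noted in the comments are exactly the kind of detail worth checking after ChatGPT converts code for you.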
Generate Sample Datasets with ChatGPT
When we're doing consulting work, one of our favorite tools for generating datasets is ChatGPT. We use it to make anonymized, fake datasets from the same industry as our clients. This gives proof-of-concept work an air of realism that helps build a connection with the client.
When prompting ChatGPT to generate a sample dataset, be aware of its limitations: keep the number of rows small and describe the columns as well as you can. Try including an instruction to put the output in a table.
Note: ChatGPT can only generate datasets with a limited number of rows. If you need larger datasets, we recommend Mockaroo, which has built-in AI features that users can access for free.
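If you would rather generate a larger fake dataset yourself (or have Code Interpreter do it), a few lines of Python are enough. This is a minimal sketch using the standard `random` module; the retail-style columns, value ranges, and seed are all hypothetical choices, and the values are random rather than realistic.

```python
import random
from datetime import date, timedelta

REGIONS = ["North", "South", "East", "West"]  # hypothetical categories
PRODUCTS = ["Widget", "Gadget", "Gizmo"]


def make_sample_orders(n_rows, seed=42):
    """Return `n_rows` fake order records as a list of dicts."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    start = date(2023, 1, 1)
    rows = []
    for order_id in range(1, n_rows + 1):
        rows.append({
            "order_id": order_id,
            "order_date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "region": rng.choice(REGIONS),
            "product": rng.choice(PRODUCTS),
            "quantity": rng.randint(1, 20),
            "unit_price": round(rng.uniform(5, 100), 2),
        })
    return rows
```

Unlike a chat response, this has no practical row limit: `make_sample_orders(100_000)` works just as well as `make_sample_orders(10)`.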
How to Export Datasets from ChatGPT
After generating a dataset you can prompt ChatGPT to export it to .csv or .xlsx using Code Interpreter and it will create a file that you can download.
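Under the hood, the export step amounts to a short script. Here is a minimal sketch of writing a list of records to a .csv file with the standard `csv` module; the function name is hypothetical. For .xlsx output, pandas' `DataFrame.to_excel` is a common choice, though it needs an Excel engine such as openpyxl installed.

```python
import csv


def export_to_csv(rows, path):
    """Write a list of dicts to `path` as CSV, using the first row's keys as headers."""
    if not rows:
        raise ValueError("nothing to export")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `export_to_csv(rows, "sample_dataset.csv")` produces a file you can open directly in Excel or Google Sheets.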
Other options include copying and pasting it into Excel or Google Sheets. There is also a plugin called Make a Sheet that can export to .csv, but with the advent of Code Interpreter it will likely not see much use.
Plugins are worth checking out, though. ChatGPT Plus subscribers can turn them on and browse OpenAI's plugin marketplace.
Using ChatGPT to Help Write SQL Queries
You can prompt ChatGPT to generate SQL queries on your behalf, but it may not always be the most efficient approach.
For it to be quick, you'll need at least some understanding of the data you are working with. If you provide table names and column names, ChatGPT will work them into the SQL query.
Alternatively, you can describe the system you are working in, and ChatGPT will suggest its own table and column names to use in the SQL query. As an example, we asked it specifically about SAP S/4HANA tables. The returned query would need to be modified for the specific columns relevant to what you're looking for, but it's a great start.
From what we've seen, the more common and well documented a system's tables are on the internet, the better luck you will have asking ChatGPT to write SQL queries for that system.
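As an illustration, suppose your prompt defines two hypothetical tables, `customers(customer_id, name, region)` and `orders(order_id, customer_id, amount)`, and asks for the total order amount per region. The query below is the kind of SQL you might get back. We run it against an in-memory SQLite database through Python's `sqlite3` module only to show that it works with those names; the sample rows are made up.

```python
import sqlite3

# Query written with exactly the table and column names given in the prompt.
QUERY = """
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.region
ORDER BY total_amount DESC
"""


def run_demo():
    """Build tiny hypothetical tables in memory and run QUERY against them."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 80.0);
    """)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result
```

Running `run_demo()` is a cheap way to sanity-check a generated query before pointing it at a real database.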
ChatGPT can also convert SQL queries between syntaxes.
For example, if you’re more familiar with T-SQL but need to query Oracle or Postgres, you can write the query like you would for T-SQL and let ChatGPT do the conversion for you.
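As a small example of the kind of difference involved, limiting the number of rows returned is written differently across dialects. The strings below show a T-SQL query alongside versions for PostgreSQL and for Oracle 12c or later. Only the `LIMIT` form can be executed here, against SQLite via Python's `sqlite3`; the `orders` table and its rows are hypothetical.

```python
import sqlite3

# T-SQL (SQL Server): TOP goes right after SELECT.
TSQL = "SELECT TOP (3) order_id, amount FROM orders ORDER BY amount DESC"

# PostgreSQL (also valid in SQLite): LIMIT goes at the end.
POSTGRES = "SELECT order_id, amount FROM orders ORDER BY amount DESC LIMIT 3"

# Oracle 12c and later: the standard FETCH FIRST clause.
ORACLE = "SELECT order_id, amount FROM orders ORDER BY amount DESC FETCH FIRST 3 ROWS ONLY"


def top_orders_sqlite():
    """Run the LIMIT version against a tiny in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(1, 20.0), (2, 75.5), (3, 10.0), (4, 42.0)])
    rows = con.execute(POSTGRES).fetchall()
    con.close()
    return rows
```

Differences like these are mechanical, which is why ChatGPT tends to handle them well; still, test the converted query against the target database.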
Explain Nuances of Machine Learning Algorithms and Prediction Models
One more use case for data scientists is having ChatGPT explain nuanced details about various ML models.
ChatGPT's training data only extends to September 2021, so it won't have information about newer models, but it can still explain the details of various algorithms and predictive analytics techniques.
Tip: Try asking ChatGPT to explain an ML model at an 8th-grade level, or as it would to a child, and it will greatly simplify the explanation for less technical users.
Current Limitations of ChatGPT for Data Analysis
If you're a data analyst looking for a way to analyze data with cutting-edge AI, ChatGPT in its current state can only assist you in the process rather than do the analysis for you.
ChatGPT has several limitations that prevent it from being able to directly analyze data.
ChatGPT Is Bad at Math
It's been well documented that ChatGPT often gets basic arithmetic wrong. OpenAI, the developer of ChatGPT, has worked on improving its math skills, but math is still one of the biggest limitations of the advanced AI chatbot.
ChatGPT was designed as a Large Language Model and was trained on massive datasets of text from books, the internet, and various archives. Because it was trained primarily on text, it's bad at math.
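This is where Code Interpreter helps: instead of relying on the language model to do arithmetic in its head, you can ask it to write and run code for the calculation. As a minimal sketch of the kind of calculation worth delegating, here is a percentage change computed with Python's `decimal` module so the rounding is explicit; the function name and inputs are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_change(old, new):
    """Percentage change from `old` to `new`, rounded half-up to two decimals.

    Inputs are strings (or ints) so no binary floating-point error creeps in.
    """
    change = (Decimal(new) - Decimal(old)) / Decimal(old) * 100
    return change.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, `percent_change("80", "100")` returns `Decimal("25.00")`, and the result is the same every time you run it, which is not something you can count on when the model answers directly.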
Code Interpreter Shows Its Work (Sort of)
In our earlier example, we attempted to extract text from a PDF file and have ChatGPT summarize it, and it was unable to do so on the first attempt. As Code Interpreter works, it explains what it is doing at each step along the way.
If it determines that a step has failed, it will retry using different methods.
ChatGPT also gives you the option to Show Work.
Unfortunately, the details it shows are limited to the Python code that ChatGPT is attempting to execute.
It would be nice to see the intermediate steps of a partially converted PDF document or a preview of a .csv or .xlsx file that’s being processed so you could more easily identify where the failure point is.
What is the Future of ChatGPT for Data Analysis?
One of the exciting trends around ChatGPT is how many software products are integrating ChatGPT or other Large Language Models directly. Alteryx, Tableau, Power BI, and many other popular tools have already released AI features or will be doing so soon.
As AI tools progress, data analysts will likely be expected to perform an even greater variety of tasks. AI will be able to complete simple tasks without assistance, leaving the hard and complicated tasks to people with specialized data skills.
To get an idea of what AI-powered data analytics looks like, see the demo Microsoft released to highlight some of the upcoming AI features for data analytics in Power BI.
While a successful data scientist or analyst may be expected to have a working knowledge of analysis, ML models, and several BI tools, they will likely be leaned on to help out with an even greater range of tasks if the easy stuff becomes more automated.
That being said, many companies still rely on outdated systems like Microsoft Access and Excel for most of their analysis. Even getting them to use a standard database structure and a business intelligence tool of any sort will be a big lift. We don't expect AI to revolutionize things for these stragglers, who will still need a helping hand to implement modern processes.