Post by account_disabled on Mar 10, 2024 23:56:38 GMT -6
I admit it... coming from a developer background, and knowing what it means to create good software, I have always been quite skeptical about using a language model to generate programming code. So, to face my "demon", I decided to put it to the test in the field as a support for data analysis: processing a fairly large dataset while also generating simple widgets for a potential dashboard. The dataset featured in the analysis is an extraction of electric bicycle trip data from Divvy, the bike sharing system of the Chicago metropolitan area.
I used 5 days of vehicle movement data: each day's data is saved in a CSV file with the following structure.

[Image: the structure of the dataset analyzed using ChatGPT and Python]

The 5 days I used correspond to approximately 1 GB of text data.

The steps for analysis using ChatGPT (GPT-4)

Let's look at the data extraction operations and the creation of widgets for data visualization. I performed the steps sequentially, using ChatGPT (GPT-4) to produce the necessary Python code.

1) The creation of the dataframe

In the first step we create the data structure by transforming a list of CSVs into a single "dataframe". This allows us, first of all, to avoid merging the CSVs manually (Excel cannot open files this large), and it also makes the subsequent operations much faster. So let's start with the following prompt.

Generate a python script to import all the CSV files of a directory and insert them into a single dataframe. The script must also print the first 20 rows of the dataframe. Use the "os" library.

In the prompt I specify the library to use ("os"), but it is not strictly necessary. The following is the output of ChatGPT. As we can see, the script is very simple: to use it, simply replace "path_to_your_directory" with the path of the local directory containing the CSV files. Running the code from a Python console creates the dataframe and prints the first 20 rows of the data structure, as requested in the prompt.
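The generated script is not reproduced in the text above; a minimal sketch of a script matching that prompt might look like the following. The use of pandas and the concat-based merging are assumptions on my part, since the original only names the "os" library and the "path_to_your_directory" placeholder.

```python
import os
import pandas as pd

# Placeholder: replace with the local directory containing the daily CSV files
directory = "path_to_your_directory"

# Read every CSV in the directory and collect the resulting dataframes
frames = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        frames.append(pd.read_csv(os.path.join(directory, filename)))

# Merge everything into a single dataframe
df = pd.concat(frames, ignore_index=True)

# Print the first 20 rows, as requested in the prompt
print(df.head(20))
```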
2) The types of users

With the following prompt we ask ChatGPT to extract the distribution of the different types of users.
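As a rough sketch of what the resulting code could look like: assuming the dataframe built in the previous step and a user-type column named "member_casual" (an assumed name; the real one depends on the dataset structure shown earlier), the distribution and a simple bar-chart widget might be obtained like this.

```python
import matplotlib.pyplot as plt

# "member_casual" is an assumed column name; adjust it to match the dataset
user_counts = df["member_casual"].value_counts()
print(user_counts)

# Same distribution expressed as percentages
print(df["member_casual"].value_counts(normalize=True) * 100)

# A simple bar chart as a possible dashboard widget
user_counts.plot(kind="bar", title="Trips by user type")
plt.tight_layout()
plt.show()
```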