ChatGPT Cheat Sheet for Data Science (100 Prompts)
December 5, 2024
General Coding Workflows
- Debugging Python Code
I want you to be a Python programmer. Here is a piece of Python code containing {problem}: {insert code snippet}. I am getting the following error: {insert error}. What is the reason for the bug?
- Debugging R Code
I want you to be an R programmer. Here is a piece of R code containing {problem}: {insert code snippet}. I am getting the following error: {insert error}. What is the reason for the bug?
- Debugging SQL Code
I want you to be a SQL programmer. Here is a piece of SQL code containing {problem}: {insert code snippet}. I am getting the following error: {insert error}. What is the reason for the bug?
- Python Code Explanation
I want you to act as a code explainer in Python. I don’t understand this function. Can you explain what it does and provide an example? {Insert function}
- R Code Explanation
I want you to act as a code explainer in R. I don’t understand this function. Can you explain what it does and provide an example? {Insert function}
- SQL Code Explanation
I want you to act as a code explainer in SQL. I don’t understand this snippet. Can you explain what it does and provide an example? {Insert SQL query}
- Python Code Optimization
I want you to act as a code optimizer in Python. {Describe problem with current code, if possible}. Can you make the code {more Pythonic/cleaner/more efficient/run faster/more readable}? {Insert Code}
- R Code Optimization
I want you to act as a code optimizer in R. {Describe problem with current code, if possible}. Can you make the code {cleaner/more efficient/run faster/more readable}? {Insert Code}
- SQL Code Optimization
I want you to act as a query optimizer in SQL. {Describe problem with current code, if possible}. Can you suggest ways to make the query {run faster/more readable/simpler}? {Insert Code}
- Python Code Simplification
I want you to act as a programmer in Python. Please simplify this code while ensuring that it is {efficient/easy to read/Pythonic}. {Insert Code}
- R Code Simplification
I want you to act as a programmer in R. Please simplify this code while ensuring that it is {efficient/easy to read}. {Insert Code}
- SQL Code Simplification
I want you to act as a SQL programmer. I am running {PostgreSQL 14/MySQL 8/SQLite 3.4/other versions}. Can you please simplify this query {while ensuring that it is efficient/easy to read/insert any additional requirements}? {Insert query}
- Translate R to Python
I want you to act as a programmer in R. Please translate this code to Python. {Insert code}
- Translate Python to R
I want you to act as a programmer in Python. Please translate this code to R. {Insert code}
- Compare Python Function Speeds
I want you to act as a Python programmer. Can you write code that compares the speed of two functions {functionname} and {functionname}? {Insert functions}
- Write Unit Tests in R
I want you to act as an R programmer. Can you please write unit tests for the function {functionname}? {Insert requirements for unit tests, if any} {Insert code}
- Write Unit Tests in Python
I want you to act as a Python programmer. Can you please write unit tests for the function {functionname}? {Insert requirements for unit tests, if any} {Insert code}
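The curly-brace placeholders in these prompts map naturally onto Python's `str.format` fields, so a prompt can be filled in programmatically before it is pasted into ChatGPT. The following is a minimal sketch for the debugging prompt. The field names (`language`, `problem`, `code`, `error`) and the helper `build_debug_prompt` are hypothetical names chosen for this illustration and are not part of the original cheat sheet.

```python
# Hypothetical template based on the "Debugging Python Code" prompt.
# The placeholder names are assumptions made for this sketch.
DEBUG_PROMPT_TEMPLATE = (
    "I want you to be a {language} programmer. "
    "Here is a piece of {language} code containing {problem}:\n\n"
    "{code}\n\n"
    "I am getting the following error:\n\n"
    "{error}\n\n"
    "What is the reason for the bug?"
)


def build_debug_prompt(language, problem, code, error):
    """Fill the debugging template with a concrete snippet and error message."""
    return DEBUG_PROMPT_TEMPLATE.format(
        language=language,
        problem=problem,
        code=code.strip(),
        error=error.strip(),
    )


# Example values; braces inside the snippet are safe because str.format
# only parses the template, not the substituted values.
prompt = build_debug_prompt(
    language="Python",
    problem="a dictionary lookup",
    code="totals = {'a': 1}\nprint(totals['b'])",
    error="KeyError: 'b'",
)
print(prompt)
```

Putting the snippet and the error on their own lines tends to keep long code readable inside the prompt; the wording itself stays the same as in the list above.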
Data Analysis Workflows
- SQL Data Generation
I want you to act as a data generator. Can you write SQL queries in {database version} that create a table {table name} with the columns {column names}? Include relevant constraints and indexes.
- Common Table Expressions in SQL
I want you to act as a SQL code programmer. I am running {database version}. Can you rewrite this query using a CTE? {Insert query}
- SQL Aggregation Example
I want you to act as a data scientist. {Insert description of tables}. Can you write a query to {count/sum/take the average of} {value} for rows that match {insert filters}?
- 7-Day Running Average in SQL
I want you to act as a data scientist. I am running {PostgreSQL 14/MySQL 8/SQLite 3.4/other versions}. I have the table {table_name}, which is {table description}. It consists of the columns {column names}. Can you please write a query that finds the 7-day running average of {quantity}?
- Window Functions in SQL
I want you to act as a data scientist. I am running {PostgreSQL 14/MySQL 8/SQLite 3.4/other versions}. I have the table {table_name}, which is {table description}. It consists of the columns {column names}. Can you please write a query that finds {required window function}?
- Generate Markdown in Python
I want you to act as a data generator in Python. Can you generate a Markdown file that contains {data requirement}? Save the file to {filename}.
- Generate CSV in Python
I want you to act as a data generator in Python. Can you generate a CSV file that contains {data requirement}? Save the file to {filename}.
- Generate JSON in Python
I want you to act as a data generator in Python. Can you generate a JSON file that contains {data requirement}? Save the file to {filename}.
- Clean Data with Pandas
I want you to act as a data scientist programming in Python Pandas. Given a CSV file that contains data of {dataframe name} with the columns {column names} for {dataset context}, please write code to clean the data. {Insert requirements for data}
- Data Aggregation in Pandas
I want you to act as a data scientist programming in Python Pandas. Given a table {table name} that consists of the columns {column names}, can you please write code that finds {requirement}?
- Merge Data in Pandas
I want you to act as a data scientist programming in Python Pandas. Given a table {table 1 name} that consists of the columns {column names} and another table {table 2 name} with the columns {column names}, please merge the two tables. {Insert additional requirement, if any}
- Data Reshaping in Pandas (Long to Wide)
I want you to act as a data scientist programming in Python Pandas. Given a table {table name} that consists of the columns {column names}, can you aggregate the {value} by {column} and convert it from long to wide format?
- Generate Markdown in R
I want you to act as a data generator in R. Can you generate a Markdown file that contains {data requirement}? Save the file to {filename}.
- Generate CSV in R
I want you to act as a data generator in R. Can you generate a CSV file that contains {data requirement}? Save the file to {filename}.
- Generate JSON in R
I want you to act as a data generator in R. Can you generate a JSON file that contains {data requirement}? Save the file to {filename}.
- Data Cleaning in R (Tidyr)
I want you to act as a data scientist programming in R tidyr. You are given the {dataframe name} dataframe containing the columns {column name}. {Insert requirement}
- Data Aggregation in R (Tidyr)
I want you to act as a data scientist programming in R tidyr. You are given the {dataframe name} dataframe containing the columns {column name}. {Insert requirement}
- Merge Data in R (Tidyr)
I want you to act as a data scientist programming in R tidyr. You are given the {dataframe 1 name} dataframe containing the columns {column name}. You also have a {dataframe 2 name} dataframe containing the columns {column name}. Find the {required output}.
- Reshape Data (Long to Wide) in R (Tidyr)
I want you to act as a data scientist programming in R tidyr. You are given the {dataframe name} dataframe containing the columns {column name}. Please convert the data to wide format.
- Reshape Data (Wide to Long) in R (Tidyr)
I want you to act as a data scientist programming in R tidyr. You are given the {dataframe name} dataframe containing the columns {column name}. Please convert the data to long format.
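To give a sense of what the 7-day running average prompt is asking for, here is a minimal sketch that runs such a query against an in-memory SQLite database from Python. The table name `daily_sales` and the columns `sale_date` and `amount` are hypothetical, and the sample data is made up for the illustration. The sketch assumes exactly one row per day with no gaps, because the window frame counts rows rather than calendar days. It also assumes SQLite 3.25 or later, the release that introduced window functions.

```python
import sqlite3

# Hypothetical table and column names, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales (sale_date TEXT PRIMARY KEY, amount REAL NOT NULL)"
)

# Made-up sample data: ten consecutive days with amounts 10, 20, ..., 100.
rows = [(f"2024-01-{day:02d}", day * 10.0) for day in range(1, 11)]
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)

# The frame covers the current row plus the six rows before it.
# With one row per day and no gaps, that is a 7-day window.
RUNNING_AVG_SQL = """
SELECT
    sale_date,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS avg_7d
FROM daily_sales
ORDER BY sale_date
"""

running_avg = conn.execute(RUNNING_AVG_SQL).fetchall()
for sale_date, avg_7d in running_avg:
    print(sale_date, round(avg_7d, 2))

conn.close()
```

For the first six days the frame holds fewer than seven rows, so the average is taken over the rows available so far. If the data can have missing days, the prompt should say so, since the query would then need a date-based frame or a calendar table.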
Data Visualization Workflows
- Create Plots in ggplot2
I want you to act as a data scientist coding in R. Given a dataframe {dataframe name} containing the columns {column names}, use ggplot2 to plot a {chart type and requirement}.
- Gridplot Visualizations in ggplot2
I want you to act as a data scientist coding in R. Given a dataframe {dataframe name} containing the columns {column names}, please create a gridplot using ggplot2 with {number of rows/columns}.
- Subplot Visualizations in Python
I want you to act as a data scientist programming in Python. I have a dataset {dataset name} with columns {column names}. Please create a subplot with {rows/columns and visualization type}.
- Heatmap Visualization in Python
I want you to act as a data scientist programming in Python. Please generate a heatmap for a dataset {dataset name} containing columns {column names}. Adjust the color palette according to {requirement}.
- Interactive Plots in Python (Plotly)
I want you to act as a data scientist programming in Python. Please generate an interactive plot using {Plotly/Bokeh/other tool} for {data context}.
- Boxplot Visualization in R
I want you to act as a data scientist coding in R. Please create a boxplot visualization for {dataset name} with {parameters/variables for visualization}.
- Barplot Visualization in Python
I want you to act as a data scientist programming in Python. I have a dataset {dataset name} with columns {column names}. Please generate a barplot visualization.
- Pie Chart Visualization in Python
I want you to act as a data scientist programming in Python. I have a dataset {dataset name}. Please generate a pie chart visualization showing the distribution of {column name}.
- Scatter Plot Visualization in R
I want you to act as a data scientist coding in R. Please create a scatter plot with {dataset name} using {x-axis} and {y-axis} for the plot.
- Line Plot Visualization in R
I want you to act as a data scientist coding in R. Please create a line plot using {dataset name} with {x-axis and y-axis}.
- Pairplot Visualization in Python (Seaborn)
I want you to act as a data scientist programming in Python. Given a dataset {dataset name}, please create a pairplot to show relationships between all numerical variables.
- Histogram Visualization in R
I want you to act as a data scientist coding in R. Please create a histogram using {dataset name} with {column name} on the x-axis.
- Violin Plot in Python (Seaborn)
I want you to act as a data scientist programming in Python. Please create a violin plot to show the distribution of {column name} from {dataset name}.
- Facet Grid Visualization in R (ggplot2)
I want you to act as a data scientist coding in R. Please create a facet grid using ggplot2 to visualize {variables} from {dataset name}.
- Time Series Plot in Python
I want you to act as a data scientist programming in Python. Please create a time series plot with {dataset name} showing {time variable} on the x-axis and {data variable} on the y-axis.
- Density Plot Visualization in R
I want you to act as a data scientist coding in R. Please create a density plot using {dataset name} for {column name} to visualize the distribution.
- 3D Plot in Python (Matplotlib)
I want you to act as a data scientist programming in Python. Given a dataset {dataset name}, please create a 3D scatter plot using {x-axis, y-axis, z-axis}.
- Radar Chart in Python (Plotly)
I want you to act as a data scientist programming in Python. Please create a radar chart to visualize the values of {features} from {dataset name}.
- Gantt Chart in Python (Plotly)
I want you to act as a data scientist programming in Python. Please create a Gantt chart to visualize {project/task information}.
- Bubble Plot in Python (Plotly)
I want you to act as a data scientist programming in Python. Please create a bubble plot to visualize {data variables} with size proportional to {another variable}.
- Stacked Bar Chart in Python (Matplotlib)
I want you to act as a data scientist programming in Python. Please create a stacked bar chart using {dataset name} to show {category comparisons over time}.
- Tree Map in Python (Plotly)
I want you to act as a data scientist programming in Python. Please create a tree map using {dataset name} to visualize hierarchical data with {category names}.
- Sunburst Plot in Python (Plotly)
I want you to act as a data scientist programming in Python. Please create a sunburst plot to visualize hierarchical data using {dataset name}.
Machine Learning Workflows
- Train a Linear Regression Model in Python
I want you to act as a machine learning engineer. Please create and train a linear regression model in Python using {dataset name} to predict {target variable}.
- Train a Random Forest Model in Python
I want you to act as a machine learning engineer. Please create and train a random forest model in Python using {dataset name} to predict {target variable}.
- Train a Decision Tree Model in Python
I want you to act as a machine learning engineer. Please create and train a decision tree model in Python using {dataset name} to predict {target variable}.
- Train an SVM Classifier in Python
I want you to act as a machine learning engineer. Please create and train a Support Vector Machine (SVM) classifier in Python using {dataset name} to classify {target variable}.
- Train a K-Nearest Neighbors (KNN) Model in Python
I want you to act as a machine learning engineer. Please create and train a K-Nearest Neighbors (KNN) model in Python using {dataset name} to predict {target variable}.
- Model Evaluation in Python
I want you to act as a machine learning engineer. Please evaluate the performance of the {model type} using {accuracy/precision/recall/F1 score} on {test dataset}.
- Cross-Validation in Python
I want you to act as a machine learning engineer. Please perform cross-validation on the {model type} with {dataset name} using {number of folds}.
- Hyperparameter Tuning with GridSearchCV in Python
I want you to act as a machine learning engineer. Please perform hyperparameter tuning for the {model type} using GridSearchCV on {dataset name} with the parameters {list of parameters}.
- Feature Selection in Python
I want you to act as a data scientist in Python. Given the {dataset name} containing {features}, please perform feature selection and explain which features should be retained for {target variable}.
- Feature Engineering in Python
I want you to act as a data scientist in Python. Given {dataset name} and {target variable}, please suggest and implement relevant feature engineering techniques.
- Ensemble Learning in Python
I want you to act as a machine learning engineer. Please combine the predictions from {model 1}, {model 2}, and {model 3} using an ensemble method (e.g., bagging/boosting) in Python.
- Dimensionality Reduction in Python (PCA)
I want you to act as a machine learning engineer. Given {dataset name}, please apply Principal Component Analysis (PCA) to reduce the dimensionality and explain the result.
- Clustering with K-Means in Python
I want you to act as a data scientist. Please apply the K-Means clustering algorithm to {dataset name} to identify {number of clusters}.
- Clustering with DBSCAN in Python
I want you to act as a data scientist. Please apply the DBSCAN clustering algorithm to {dataset name} to identify clusters and visualize the results.
- Train a Neural Network in Python (Keras)
I want you to act as a machine learning engineer. Please build and train a simple neural network in Python using Keras to classify {target variable} from {dataset name}.
- Train a Convolutional Neural Network (CNN) in Python (Keras)
I want you to act as a machine learning engineer. Please build and train a Convolutional Neural Network (CNN) in Python using Keras to classify images from {image dataset}.
- Train a Recurrent Neural Network (RNN) in Python (Keras)
I want you to act as a machine learning engineer. Please build and train a Recurrent Neural Network (RNN) in Python using Keras to predict {target variable} from {time-series dataset}.
- Train an LSTM Model in Python
I want you to act as a machine learning engineer. Please build and train a Long Short-Term Memory (LSTM) network in Python to predict {target variable} from {time-series dataset}.
- Transfer Learning with Pretrained Models in Python
I want you to act as a machine learning engineer. Please implement transfer learning using a pretrained {model type, e.g., ResNet} to classify {image dataset} in Python.
- Natural Language Processing (NLP) Pipeline in Python
I want you to act as a data scientist. Please create an NLP pipeline in Python that performs {text preprocessing, tokenization, sentiment analysis, etc.} on {text data}.
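When choosing among {accuracy/precision/recall/F1 score} in the model evaluation prompt, it can help to see how the four metrics relate for a binary classifier. The following dependency-free sketch computes them from true and predicted labels. The function name `classification_metrics` and the sample labels are hypothetical; in practice a library such as scikit-learn would normally be used instead.

```python
def classification_metrics(y_true, y_pred, positive_label=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the positive label.
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tn = sum(1 for t, p in pairs if t != positive_label and p != positive_label)

    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 true positives, 1 false positive,
# 1 false negative and 3 true negatives.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy counts all correct predictions, while precision and recall focus on the positive class, which is why naming the metric in the prompt matters for imbalanced datasets.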
Model Deployment and Production
- Deploy a Model with Flask
I want you to act as a machine learning engineer. Please deploy the trained {model type} to a Flask API for predictions with {input format}. Include code for {input preprocessing, output formatting}.
- Deploy a Model with FastAPI
I want you to act as a machine learning engineer. Please deploy the trained {model type} to a FastAPI API for predictions with {input format}. Include code for {input preprocessing, output formatting}.
- Containerize a Model with Docker
I want you to act as a machine learning engineer. Please containerize the {model type} and its dependencies using Docker. Include instructions for running the container.
- Model Monitoring in Production
I want you to act as a machine learning engineer. Please explain how to monitor the performance of {model type} in a production environment, including methods for {error logging, performance tracking, model drift detection}.
- Automated Retraining Pipeline
I want you to act as a machine learning engineer. Please design an automated retraining pipeline that triggers when {data drift/accuracy drop} occurs in production for the {model type}. Include steps for {data collection, model evaluation, retraining}.
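The accuracy-drop trigger mentioned in the retraining prompt can be sketched as a rolling-window check. This is one possible design under stated assumptions, not a prescribed method: the class name `AccuracyDropTrigger`, the `max_drop` tolerance, and the window size are all hypothetical, and the sketch assumes ground-truth labels eventually become available for production predictions.

```python
from collections import deque


class AccuracyDropTrigger:
    """Signal retraining when rolling accuracy falls too far below a baseline."""

    def __init__(self, baseline_accuracy, max_drop=0.05, window_size=100):
        self.baseline_accuracy = baseline_accuracy
        self.max_drop = max_drop
        # Keep only the most recent outcomes; older ones are discarded.
        self.outcomes = deque(maxlen=window_size)

    def record(self, prediction, actual):
        """Store whether a single production prediction was correct."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        """Return True once a full window shows an accuracy drop beyond max_drop."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.rolling_accuracy() < self.baseline_accuracy - self.max_drop


# Made-up example: baseline accuracy 0.90, tolerance 0.05, window of 10.
trigger = AccuracyDropTrigger(baseline_accuracy=0.90, max_drop=0.05, window_size=10)

for correct in [True] * 9 + [False]:
    trigger.record(prediction=1, actual=1 if correct else 0)
healthy = trigger.should_retrain()

for correct in [True] * 7 + [False] * 3:
    trigger.record(prediction=1, actual=1 if correct else 0)
degraded = trigger.should_retrain()

print(healthy, degraded)
```

Waiting for a full window before triggering avoids retraining on a handful of early mistakes. A data-drift trigger would instead compare feature distributions, which this sketch does not cover.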
Big Data Workflows
- Set Up Apache Spark Cluster
I want you to act as a data engineer. Please explain how to set up an Apache Spark cluster for distributed data processing using {cloud platform/technology}.
- Process Big Data with Apache Spark
I want you to act as a data engineer. Given a large dataset {dataset name}, please write a Spark job in Python/Scala to perform {data transformation, aggregation, filtering}.
- Data Pipeline with Apache Kafka
I want you to act as a data engineer. Please create a simple data pipeline using Apache Kafka to stream data from {source} to {destination}.
- Querying Big Data with Apache Hive
I want you to act as a data engineer. Please write a HiveQL query to extract {data fields} from the big data stored in {HDFS} using Apache Hive.
- ETL Process for Big Data
I want you to act as a data engineer. Please explain and implement an ETL process for {dataset name}, including {data extraction, transformation, and loading} into {data warehouse}.
- Data Warehousing with Amazon Redshift
I want you to act as a data engineer. Please create a schema and load {dataset name} into Amazon Redshift. Write an SQL query to analyze {specific data insights}.
- Big Data Processing with Google BigQuery
I want you to act as a data engineer. Please use Google BigQuery to process {dataset name} and write a query to extract {specific data insights}.
- Streaming Data Analysis with Apache Flink
I want you to act as a data engineer. Please write a job in Apache Flink to process streaming data from {source} and output results to {destination}.
- NoSQL Database Querying (MongoDB)
I want you to act as a database engineer. Please write a MongoDB query to extract {data fields} from a collection where {specific conditions} are met.
- Data Lake Implementation on AWS S3
I want you to act as a data engineer. Please explain how to implement a data lake on AWS S3 to store and query large datasets, including {data cataloging with AWS Glue}.
- Distributed Machine Learning with Spark MLlib
I want you to act as a machine learning engineer. Please train a machine learning model using Apache Spark’s MLlib to predict {target variable} from {dataset name}.
- Data Visualization in Tableau
I want you to act as a data analyst. Please create a dashboard in Tableau to visualize {specific data insights} from {dataset name}.
- Interactive Dashboards with Power BI
I want you to act as a data analyst. Please create an interactive dashboard in Power BI to visualize {specific data insights} from {dataset name}.
- Pipeline Orchestration with Apache Airflow
I want you to act as a data engineer. Please design and implement a data pipeline workflow using Apache Airflow to perform {data extraction, transformation, and loading}.
- Scalable Data Processing with AWS Glue
I want you to act as a data engineer. Please explain how to use AWS Glue to automate the ETL process for {dataset name} and prepare data for analysis.