- Published on
25 ChatGPT Prompts for Data Analysts: SQL, Python (2026)
- Authors

- Name
- PromptShelf Editorial
Most ChatGPT prompts for data analysts that you find online are written by people who have never written a window function under a deadline. They give you generic "analyze this data" prompts that miss the actual hard parts of the job: writing SQL that runs on the first try, cleaning a CSV without breaking the schema, and explaining a regression coefficient to a VP who wants the answer in two sentences.
This list covers the prompts I would actually paste into ChatGPT during a normal work week. They are organized by the five places data analysts lose the most time: SQL, Python, analysis framing, stakeholder communication, and the parts of the job that are not actually analysis at all.
You can copy any of them straight into a chat. Each one has the role, the task, the constraints, and the output shape the model needs to do something useful. The example brackets are inputs you fill in.
How to use these ChatGPT prompts for data analysts
A few notes before you start.
ChatGPT is a confident writer of plausible-looking SQL. That confidence is the failure mode. Always run the query against a real (or test) database before you trust the number it produces. Type errors, missing nulls, and join multiplication will not show up until you do.
Do not paste anything into the free tier that you would not paste into a public Reddit thread. That means no production data, no customer identifiers, no internal metric names that double as competitive intelligence. For real work, use a paid plan with chat history disabled, your company's enterprise AI tool, or a local model.
The prompts below assume you are working with the data structure described in the bracketed brief. Replace the brackets with your actual columns, table names, and constraints. Vague inputs give vague outputs.
SQL writing and review
Five prompts that cover most of the SQL tasks an analyst hands to ChatGPT.
1. Write a SQL query from a plain-English question
Prompt: "You are a senior analytics engineer. I am working in Postgres 15. Schema below. Write a single SQL query that answers this question:
[business question, e.g., 'What is the 7-day rolling retention rate for users who signed up between 2026-01-01 and 2026-03-31, broken out by acquisition channel?']. Tables and columns:[paste schema as CREATE TABLE statements or column lists]. Constraints: use window functions where appropriate, no temp tables, name your CTEs descriptively, include a 1-sentence comment above the query explaining the approach. Output: SQL only, in a single code block."
Use this when the analytical question is clear but the SQL is annoying to write. The Postgres version and the schema paste are the two inputs people skip most often, and skipping them is why the model invents column names.
2. Review a SQL query for correctness
Prompt: "You are a senior data engineer reviewing a SQL pull request. Below is a query that is meant to do
[describe intended behavior]. Walk through it and flag (a) any logical errors that would produce wrong numbers, (b) any performance risks at[expected row count], and (c) any places where a null value would silently change the result. Be specific about line or CTE. Do not rewrite the query. Output as three bulleted lists under the headings Correctness, Performance, Nulls. Query:[paste SQL]."
The "do not rewrite the query" instruction is load-bearing. Without it, the model skips the diagnosis and rewrites your code, which is harder to learn from.
3. Translate a query between SQL dialects
Prompt: "Translate the following query from
[source dialect, e.g., Snowflake]to[target dialect, e.g., BigQuery Standard SQL]. Preserve the logic exactly. Where a function does not have a direct equivalent, use the closest semantically correct construct and add a 1-line comment explaining the substitution. Output: translated query in a single code block, then a 2-3 bullet list of substitutions you made. Query:[paste SQL]."
Useful when you switch tools mid-project. The substitution list is what stops you from finding out about the silent difference between DATE_DIFF argument orders three days later.
4. Explain a complex query you inherited
Prompt: "You are a senior analyst explaining a query to a junior teammate. Read the SQL below and explain what it does, CTE by CTE, in plain English. For each CTE: one sentence on the input, one sentence on the transformation, one sentence on the output shape. Then summarize the final result in a single sentence the team's PM would understand. Do not critique the query. Output as a markdown list. Query:
[paste SQL]."
Better than reading line by line. The "do not critique" guardrail keeps the answer focused on comprehension instead of opinion.
5. Write a SQL test for a metric definition
Prompt: "I have a SQL metric defined as:
[paste metric SQL]. Write a SQL test that catches three common failure modes for this metric: (1) duplicate rows in the source table, (2) timestamp timezone drift, (3) a backfill that lands a wrong-dated row in the window. The test should output a row per failure mode with columnsfailure_mode,count_affected,sample_id. Output: SQL only, single code block. Database:[Postgres/BigQuery/Snowflake]."
Metric tests are the unglamorous work that prevents the "why is the number different in two dashboards" thread. Most analysts write them only after an incident.
Python and data cleaning
Five prompts for the part of the job that lives in pandas, polars, or a notebook.
6. Clean a messy CSV without breaking the schema
Prompt: "You are a Python data engineer. I have a CSV with the following sample (first 5 rows):
[paste rows or describe columns and their values]. The known issues are:[e.g., 'mixed date formats in signup_date, currency strings in revenue, leading/trailing whitespace in user_id']. Write a pandas script that: (1) reads the CSV with the correct dtypes, (2) cleans each issue with a separate function, (3) returns a DataFrame ready for analysis. Add type hints. Print the dtype map before and after. Constraints: no inplace operations, no chained assignment warnings, no silent type coercion. Output: a single Python code block."
The "separate function per issue" instruction matters because it lets you fix one thing without rewriting everything when the next CSV shows up with a new edge case.
7. Turn a Jupyter scratch notebook into a reusable script
Prompt: "Below is a Jupyter notebook cell sequence that produces a useful analysis. Refactor it into a single Python script with: a
main()function, separate functions for load / clean / transform / output, type hints on all function signatures, docstrings (one-line) on each function, and aif __name__ == '__main__'block. Preserve all logic exactly. Do not add features. Do not remove logging. Output: complete Python script in a single code block. Notebook:[paste cells]."
This is the prompt that gets the most "wait, that actually saved me an hour" reactions when you share it with another analyst.
8. Profile a DataFrame before analysis
Prompt: "Write a Python function
profile(df: pd.DataFrame) -> dictthat returns a structured profile of any DataFrame: row count, column count, dtype per column, null count and null %, distinct count per column, top 3 values for each categorical column with their frequencies. Constraints: use only pandas + standard library, handle empty DataFrames without error, return a dict that prints cleanly withjson.dumps(profile, indent=2, default=str). Output: single Python code block."
Saves the 10 minutes of exploratory typing you do at the start of every new dataset.
9. Generate a pandas script from a SQL query you already wrote
Prompt: "I have this SQL query that produces the right answer:
[paste SQL]. Write a pandas script that does the same transformation on a pandas DataFrame calleddfwith the same columns. Preserve column names and order. For each SQL operation, add a 1-line comment naming the SQL clause it replaces (e.g.,# WHERE clause,# LEFT JOIN). Constraints: chain operations where readable, but break the chain at any window function or join for clarity. Output: single Python code block."
Useful when the data lives in a CSV instead of a database, or when you want a self-contained reproducible notebook.
10. Debug a pandas error message
Prompt: "I am getting this pandas error:
[paste full error including traceback]. The code that produced it is:[paste relevant function]. The DataFrame I am operating on has these columns and dtypes:[paste df.dtypes output]. Walk through what the error means, what the most likely cause is given my DataFrame schema, and write the fixed code. Output: three sections labeled Diagnosis, Likely Cause, Fix. Code in a single block under Fix."
The dtypes paste is the input people skip, and it is the input that turns a generic answer into a useful one.
Analysis framing and interpretation
Five prompts for the analytical thinking work that is harder to automate.
11. Sanity-check an unexpected number
Prompt: "I just calculated
[metric, e.g., 'a 38% week-over-week drop in DAU']for[business context, e.g., 'our consumer mobile app']. Before I share this with the team, walk me through the top 5 reasons this number could be wrong, ordered by likelihood. For each reason, name (a) what to check, (b) what query or data pull would rule it out, (c) what the false-positive scenario looks like. Do not assume the number is correct. Do not reassure. Output as a numbered list."
The "do not reassure" guardrail is important. Without it, the model tends toward "this is a significant finding" framing instead of "here is what would invalidate it."
12. Generate hypotheses for a metric change
Prompt: "Our
[metric, e.g., 'free-to-paid conversion rate']dropped from[X]to[Y]between[time period]. Other context:[anything you noticed, e.g., 'we shipped a pricing page change on 2026-04-12', 'organic traffic was flat', 'paid traffic was up 20%']. List 8 hypotheses that could explain the drop, ranked by how testable they are with our existing data (assume we have product events, marketing attribution, and Stripe). For each, name the one query or chart that would confirm or rule it out. Output as a numbered list with two sub-bullets each (hypothesis, test)."
Use this before you start pulling data, not after. It changes what you query.
13. Choose the right statistical test
Prompt: "I want to test whether
[hypothesis, e.g., 'our new checkout flow has a higher conversion rate than the old one']. My data:[describe, e.g., '4 weeks of A/B test data, 12,000 users per arm, binary conversion outcome, no obvious time trend']. Which statistical test is appropriate, and which is the second-best alternative if the assumptions of the first fail? For each: name the test, the assumptions you must verify, what tool or function (Python or R) implements it, and the interpretation rule you should write down before you look at the result. Output as a comparison table."
The "write down before you look at the result" line nudges you toward pre-registration, which most analysts have heard of and almost no analyst actually does.
14. Interpret a regression output for a stakeholder
Prompt: "I ran a regression with
[dependent variable]regressed on[list of independent variables]. Output:[paste full regression table including coefficients, standard errors, p-values, R-squared]. Translate this into 3 plain-English findings for a non-technical executive. For each finding: one sentence on what the model says, one sentence on the strength of the evidence (avoid 'significant', use 'we can be more or less confident'), one sentence on what it does not prove. End with one limitation of the model that the executive should hear. Output as a numbered list."
The "avoid 'significant'" instruction is there because once a non-technical reader sees that word, they will treat the finding as proven. The model defaults to using it.
15. Critique your own analysis plan
Prompt: "Below is my analysis plan for answering
[business question]. Read it like a skeptical reviewer. List the top 5 issues you see, ordered by how badly they would damage the conclusion if I missed them. For each issue: name it, explain why it matters in one sentence, and recommend the smallest change to the plan that would fix it. Plan:[paste your bullet-point plan]."
Better than asking for feedback in a Slack channel because the model has no incentive to be nice.
Stakeholder communication
Five prompts for the writing work that data analysts often underestimate the importance of.
16. Draft an executive summary of an analysis
Prompt: "You are a senior analyst writing the executive summary of a longer analysis. The audience is
[role, e.g., 'a VP of Product who has 60 seconds']. The analysis answered[the question]and the answer is[the headline finding]. Other key results:[2-3 supporting findings]. Key caveats:[1-2 things that limit the conclusion]. Recommended action:[what you want them to do]. Write the summary as: (1) one sentence with the answer, (2) one paragraph with the 2-3 supporting findings, (3) one paragraph naming the caveats honestly, (4) one sentence with the recommended action. Total under 200 words. No buzzwords. No filler."
The under-200 ceiling is enforceable and prevents the model from writing the long version it prefers.
17. Translate a metric definition for a non-technical audience
Prompt: "Explain the metric
[name]to a non-technical audience. Technical definition:[paste the SQL or formula]. Business context:[what the metric is used to decide]. Write: (1) a one-sentence definition the audience can repeat, (2) a 2-3 sentence explanation of how it is computed without using the word 'query' or 'database', (3) one example of how a one-unit change in the metric would map to a business outcome, (4) one thing the metric does NOT measure that they might assume it does. Output as four short paragraphs with bold headings."
Item (4) is what stops the metric from being misused in a meeting six weeks from now.
18. Pre-meeting stakeholder summary
Prompt: "You are preparing for a stakeholder meeting about
[analysis topic, e.g., 'Q1 customer churn drivers']. The stakeholders are:[list roles and what they care about]. Your analysis found:[bullet list of 3-5 findings with the actual numbers]. Likely questions or objections:[anything you anticipate]. Write a 1-page meeting brief with: (1) the single most important finding in 25 words or less, (2) the supporting findings as a 5-bullet list with numbers, (3) a 'questions I expect' section with your honest 1-sentence answer to each, (4) a 'what I don't know' section listing 2-3 limitations. Tone: direct, no hedging, no buzzwords."
This is the prompt I tested on free ChatGPT for this post. The verbatim response and the editor's notes are below.
What this prompt actually returns (tested 2026-05-19)
We ran Prompt 18 on free ChatGPT with a concrete brief baked in: [analysis topic: 'Q1 2026 customer churn analysis for our $99/mo B2B SaaS product, n=412 churned accounts'; stakeholders: 'VP of Customer Success who wants retention plays, CFO who wants to know if this changes the 2026 plan, Head of Product who wants to know which features correlate with churn'; findings: 'churn rate ticked up from 2.1% to 2.9% MoM in Feb 2026; 64% of churners did not use the new reporting module shipped Jan 2026; admins who churned had logged in <3 times in the 30 days before cancellation in 71% of cases; pricing is cited as primary reason in only 12% of exit surveys; CSM coverage was reduced for accounts under $1,500/mo in Dec 2025'; expected objections: 'is the reporting module signal causal or correlated, what does the 30-day login pattern mean for accounts on annual plans, did the CSM coverage change actually move the needle']. Reproduced verbatim:
Q1 2026 Churn Analysis Brief — $99/mo B2B SaaS
Most Important Finding
Low product engagement, not pricing, is the clearest churn signal: 71% of churned admins logged in fewer than 3 times in the 30 days before cancellation.
Supporting Findings
- Monthly churn increased from 2.1% to 2.9% in February 2026, reversing the prior flat trend.
- 64% of churned accounts did not use the reporting module launched in January 2026.
- 71% of churned admins logged in fewer than 3 times in the 30 days before cancellation.
- Pricing was the primary churn reason in only 12% of exit surveys.
- CSM coverage for accounts under $1,500 MRR was reduced in December 2025, immediately before the Q1 churn increase.
Questions I Expect
Q: Is the reporting module signal causal or just correlated? We do not have evidence that the reporting module prevents churn, but non-adoption is a strong marker for low engagement and low perceived product value.
Q: Does the "<3 logins in 30 days" pattern matter for annual-plan customers? Yes — even on annual plans, low recent admin engagement is still the strongest observable leading indicator before cancellation or non-renewal.
Q: Did reducing CSM coverage actually increase churn? The timing lines up, especially in lower-revenue accounts, but we have not isolated the impact from other Q1 changes.
Q: Does this change the 2026 financial plan? Not yet, but if churn remains near 2.9% for another quarter, retention assumptions in the current plan will be too optimistic.
Q: Should we respond with pricing changes or discounts? No — pricing appears secondary; the bigger issue is weak engagement and low feature adoption.
Q: What should Customer Success do immediately? Focus retention outreach on low-login admins and accounts that never activated reporting within the first 30–45 days.
What I Don't Know
- We have not quantified whether reporting module adoption independently predicts retention after controlling for account size, tenure, and CSM coverage.
- We do not yet know whether reduced CSM coverage affected all segments equally or primarily smaller accounts.
- Exit survey data may understate pricing pressure because customers often cite operational reasons instead of budget constraints.
Editor's notes. The model held the structure and landed a strong headline (24 words, lead with the 71% data point, names the cause as engagement not price). The five supporting bullets all carry numbers, which is rare for a default ChatGPT response. The four things worth fixing before walking into the meeting: (1) the first supporting bullet adds "reversing the prior flat trend" which the brief never mentioned, a small fabricated context that becomes a real problem if the prior trend was not actually flat; (2) the CSM coverage bullet adds "immediately before the Q1 churn increase" which slides a correlational claim into what should be a neutral data point; (3) the model expanded the three expected objections in the brief to six questions, which is helpful framing but means the answers to the three the brief actually flagged are mixed in with three new ones, so the highest-stakes answer (the causal vs. correlated reporting-module question) is no longer the load-bearing first answer; (4) the "Yes" on annual plans is more confident than the brief data justifies: the brief said the 30-day login pattern was 71% in churned accounts but said nothing about whether annual cohorts behave the same way, so this needs a hedge or a footnote that the data behind it is monthly accounts. Also worth noting the third "What I Don't Know" bullet is a strong observation the brief did not seed: exit-survey pricing pressure being understated is a sharp methodological caveat. Net: usable as a 90% first draft. The flat-trend and immediately-before phrasings need to come out before sharing.
19. Write a Slack update about an in-progress analysis
Prompt: "I am working on
[analysis]and a stakeholder just asked for an update in Slack. Where I am:[bullet list of what is done and what is not]. Expected ETA:[when you will have a real answer]. Write a Slack message of 4-6 lines that (1) shows progress, (2) is honest about what is not yet certain, (3) gives an ETA without overcommitting. No emojis. No 'just checking in' filler. No promises about findings."
The "no promises about findings" line stops the model from writing "early signs point to X" before the analysis is done.
20. Disagree with a stakeholder's interpretation
Prompt: "A stakeholder just shared the following interpretation of my analysis:
[paste their take]. The actual finding was:[your interpretation]. The reason their take is wrong (or partly wrong):[your reasoning]. Write a 3-paragraph reply for a written channel (email or Slack) that (1) restates their interpretation accurately, (2) explains where it diverges from the data, (3) proposes the next step. Tone: collegial, factual, not defensive. Do not use the word 'actually'. Do not soften the disagreement into agreement."
The "do not soften" instruction is necessary because the model defaults to a both-sides framing that erases the disagreement you needed to make.
Workflow, documentation, and career
Five prompts for the parts of the job that are not actually analysis.
21. Write a dbt model docstring from the SQL
Prompt: "Below is the SQL for a dbt model called
[model_name]. Write the YAML.ymldocumentation block for it: a 2-sentence model description, a column-level description for each output column (one sentence each, factual, no fluff), and at least 2 generic dbt tests (not_null,unique,accepted_values,relationships) where they make sense. Output as a single YAML block ready to paste into the project. SQL:[paste]."
Most dbt projects rot at the documentation layer first. This prompt fights that.
22. Generate a runbook for an existing dashboard
Prompt: "I own this dashboard:
[name, audience, link or description]. Write a runbook entry covering: (1) what business question the dashboard answers, (2) the upstream data sources and their refresh cadence, (3) the top 3 failure modes (e.g., 'this metric goes to zero if the events pipeline lags'), (4) the on-call check (where to look first if it breaks), (5) the contact list (who to ping for each upstream system). Output as 5 numbered sections, total under 400 words."
If your team's dashboards do not have runbooks, this is the cheapest 30-minute project on your week.
23. Critique a metric definition before it gets adopted
Prompt: "Our team is about to adopt this metric definition:
[paste]. The intended use is[what decisions it will inform]. Read the definition as a senior analyst and flag (a) ambiguities in the definition (where two people could compute it differently), (b) failure modes (where the metric becomes misleading), (c) what it does NOT measure that the team might assume it does. Be specific. Do not propose alternatives. Output as three bulleted lists."
Easier to fix a metric definition before the company starts making decisions with it.
24. Write a 1-pager proposing an analysis
Prompt: "I want to propose an analysis:
[your idea, e.g., 'investigate whether organic-channel users have lower LTV than paid-channel users']. The team's leadership cares about[what they care about]. Write a 1-page proposal with: (1) the question and why it matters in 2 sentences, (2) the expected output (chart, report, model) in 1 sentence, (3) the data sources needed, (4) the estimated effort in days, (5) the decision the analysis will inform. Tone: direct. No filler. Under 300 words."
Most analysts skip the proposal step and start the work. The proposal is what tells you whether the work is worth doing.
25. Draft a self-review for a performance cycle
Prompt: "You are helping me write a draft of my analyst performance self-review for
[period]. My role:[level, e.g., 'Senior Analyst on the Growth team']. My 3-5 biggest projects this cycle:[list, with the business outcome each one drove]. My biggest learning:[honest answer]. Where I struggled:[honest answer]. Write a self-review draft in 4 sections: Impact, Collaboration, Growth Areas, Next Cycle Goals. Each section: 100-150 words. Tone: confident, specific, no hedging, no false modesty. Quantify outcomes where the input gives a number."
The "no false modesty" instruction matters because the default model output tends toward humility in ways that hurt you at calibration.
Tips for getting better results
Three things that matter more than any specific prompt.
Paste your actual schema. The single biggest determinant of whether ChatGPT writes useful SQL is whether you gave it the column names and types. Pseudo-code descriptions of your tables ("we have a users table and an events table") produce pseudo-code answers.
State your constraints explicitly. "No CTEs," "no inplace operations," "under 200 words." The model honors negative constraints when you put them in writing. It ignores them when you say "keep it clean."
Treat the first answer as a draft. Even with a strong prompt, the first response is usually a 70% answer. The follow-up "now do it again but X" is where the useful version comes from. Budget for it.
FAQ
Is ChatGPT reliable enough to write production SQL?
Not on its own. Treat any SQL the model produces as code that needs review and testing before it lands in a dashboard or report. The failure modes are subtle: a wrong join cardinality, a window function with the wrong partition, a null-handling difference between dialects. The model writes plausible SQL that compiles. It does not write correct SQL on the first try.
Should I paste my company data into ChatGPT?
Not into the free tier. The free tier may use your inputs for model training, and even where it does not, the inputs are stored. For real work, use ChatGPT Team or Enterprise (data is not used for training), an enterprise-licensed tool your company has approved, or a local model. For learning and personal projects, use synthetic data or public datasets.
Can ChatGPT replace a data analyst?
No, and the framing misses the actual question. ChatGPT can take 30 minutes off a 90-minute task for an analyst who already knows what to ask. It cannot replace the analyst's judgment about what is worth asking. The hard parts of the job are framing, sanity-checking, and communicating, and the model is a copilot for all three, not a replacement.
How do I tell whether ChatGPT's SQL is correct?
Run it against a test table where you know the expected output. For window functions specifically, write a 10-row test input by hand, compute the answer on paper, then run the query against the test input and compare. Most "ChatGPT got it wrong" stories I have heard came from analysts who skipped this step.
What is the single most useful prompt for a junior analyst?
Prompt 4: explain a complex query you inherited. Inheriting an analytics codebase is the most common first task on a new job, and the bottleneck is comprehension, not writing. Use that prompt before you write a single new query.
What to do next
If you only try one prompt from this list, try Prompt 18 (the pre-meeting stakeholder summary). Most analysts overweight the analysis work and underweight the communication work, and Prompt 18 is the cheapest fix for that.
If you are early in your career, bookmark Prompts 4 and 25 (inherited query and self-review). They cover the parts of the job that get easier with practice and faster with a good prompt.
Send the list to one teammate who would use it. The team that has shared prompts is the team that ships analyses faster.