AI Tools for Data Scientists

AI tools that help data scientists find research papers, pull economic datasets, generate charts, and build evidence-based models faster.

Get started for free

Works in Chat, Cowork and Code

Papers found
14 results · Semantic Scholar + arXiv
Top result
TabPFN v2 — 2024 · SOTA on 95 classification benchmarks
Runner-up
SAINT+ — self-attention + intersample attention · GitHub available
Key finding
Transformers match or beat XGBoost on <10K row datasets
Open implementations
9 of 14 papers include public GitHub repos

Academic literature search for model research

Search millions of peer-reviewed papers by method, dataset, or benchmark. Find the state-of-the-art paper on a specific architecture before spending a week implementing something that was superseded two months ago.

Find papers on self-supervised learning for anomaly detection in time-series data, published 2023–2025. I want methods that work on short sequences (<100 time steps).

Found 14 papers. Top 3: (1) "TimesNet": 2D variation modeling for time-series, achieves 94.2 F1 on the SMAP dataset. (2) "DCdetector": dual-attention contrastive learning, SOTA on PSM and MSL. (3) "Anomaly Transformer": association discrepancy method, works on sequences as short as 50 steps. All three include GitHub repos.

ToolRouter search_papers
TimesNet
2D variation modeling · 94.2 F1 on SMAP · GitHub available
DCdetector
Dual-attention contrastive · SOTA on PSM and MSL
Anomaly Transformer
Association discrepancy · works on sequences ≥50 steps
Total papers
14 results · 11 with public implementations

Macroeconomic feature engineering data

Pull time-series data for hundreds of economic indicators from the World Bank and FRED — GDP growth, inflation, unemployment, housing starts — to use as features in economic prediction models. Get structured data ready to merge into your training set.

Pull monthly data for US CPI, 10-year Treasury yield, and unemployment rate from 2010 to 2025. I need it for a credit risk feature matrix.

Retrieved 3 time series: US CPI (CPIAUCSL) — 180 monthly observations, 2010–2025. 10Y Treasury (DGS10) — 180 obs. Unemployment (UNRATE) — 180 obs. All aligned to month-end dates. Notable: CPI peaked at 9.1% June 2022, Treasury spiked to 5.0% Oct 2023.

ToolRouter get_series
[Chart: CPI YoY (%), 10Y Treasury (%), and Unemployment (%), 2010 to 2025]
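
A minimal sketch of how series like these could be aligned before merging into a feature matrix, assuming you already have the raw observations as (date, value) pairs. It collapses a daily series to one value per month (last observation) and derives a year-over-year percentage from a monthly index. The function names and the sample numbers are illustrative assumptions, not ToolRouter output or its API.

```python
from datetime import date


def to_month_end(observations):
    """Collapse (date, value) pairs to one value per (year, month).

    Keeps the last observation seen in each month, which is one common way
    to align a daily series with monthly series.
    """
    monthly = {}
    for day, value in sorted(observations):
        monthly[(day.year, day.month)] = value  # later dates overwrite earlier ones
    return monthly


def yoy_percent(levels, lag=12):
    """Year-over-year percent change for a list of monthly index levels.

    The first `lag` entries have no prior-year value and are returned as None.
    """
    return [
        None if i < lag else (levels[i] / levels[i - lag] - 1.0) * 100.0
        for i in range(len(levels))
    ]


# Made-up numbers, for illustration only.
daily_yield = [
    (date(2024, 1, 30), 4.0),
    (date(2024, 1, 31), 4.1),
    (date(2024, 2, 29), 4.3),
]
print(to_month_end(daily_yield))

index_levels = [100.0] * 12 + [103.0]
print(yoy_percent(index_levels)[-1])
```

Keying on (year, month) rather than on an exact date sidesteps mismatches between series that stamp observations on the first of the month and series that stamp them on the last.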

Chart and visualization generation

Turn raw numbers into publication-quality charts for papers, slide decks, and stakeholder presentations. Generate bar, line, scatter, and histogram charts from data without spinning up a Jupyter notebook.

Create a dual-axis line chart: left axis shows monthly revenue ($M), right axis shows customer count, both from Jan to Dec 2024. Revenue: 1.2, 1.4, 1.5, 1.7, 1.9, 2.1, 2.4, 2.6, 2.8, 3.1, 3.4, 3.8 (monthly). Customer count: 450, 490, 510, 560, 600, 640, 700, 740, 790, 840, 900, 960.

Generated dual-axis line chart. Revenue (left, blue line): strong upward trend, 217% growth from Jan to Dec 2024. Customer count (right, green line): 113% growth in the same period. Revenue growing faster than customers indicates strong ARPU expansion.

ToolRouter create_chart
[Chart: Revenue ($M) and Customers (hundreds), Jan to Dec]
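
The growth and ARPU figures in the response can be checked by hand. The sketch below recomputes them from the twelve values in the prompt; the helper names are illustrative, and "ARPU" is taken here to mean monthly revenue divided by customer count, which is an assumption about how the response defines it.

```python
# Values copied from the prompt above.
revenue_musd = [1.2, 1.4, 1.5, 1.7, 1.9, 2.1, 2.4, 2.6, 2.8, 3.1, 3.4, 3.8]
customers = [450, 490, 510, 560, 600, 640, 700, 740, 790, 840, 900, 960]


def percent_growth(series):
    """Percent change from the first to the last value of a series."""
    return (series[-1] / series[0] - 1.0) * 100.0


def revenue_per_customer(revenue_millions, customer_count):
    """Monthly revenue per customer, in dollars."""
    return revenue_millions * 1_000_000 / customer_count


revenue_growth = percent_growth(revenue_musd)
customer_growth = percent_growth(customers)
arpu_first = revenue_per_customer(revenue_musd[0], customers[0])
arpu_last = revenue_per_customer(revenue_musd[-1], customers[-1])

print(f"Revenue growth: {revenue_growth:.1f}%")    # Revenue growth: 216.7%
print(f"Customer growth: {customer_growth:.1f}%")  # Customer growth: 113.3%
print(f"Revenue per customer: ${arpu_first:,.0f} -> ${arpu_last:,.0f}")
```

Revenue per customer rises from roughly $2,667 to roughly $3,958 a month, which is what the ARPU observation rests on.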

Nutrition and health data for dietary models

Look up USDA nutrition data for any food by name — calories, macros, micronutrients, glycemic index — to build food classification models, dietary recommendation systems, or health outcome datasets.

Get nutrition data for these 10 foods: oatmeal, chicken breast, brown rice, broccoli, almonds, Greek yogurt, sweet potato, salmon, black beans, and spinach. I need calories per 100g, protein, carbs, fiber, and glycemic index.

Retrieved 10 entries. Highest protein per 100g: chicken breast (31g), salmon (20g), Greek yogurt (10g). Highest fiber: black beans (8.7g), broccoli (2.6g), oatmeal (2.4g). Lowest glycemic index: spinach (15), broccoli (15), almonds (15). Data sourced from USDA FoodData Central.

ToolRouter lookup_food
Food             Calories   Protein (g)
Chicken breast   165        31
Salmon           208        20
Black beans      132        8.9
Broccoli         34         2.8
Source: USDA FoodData Central · 4 of 10 shown
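
As a rough illustration of turning rows like these into model features, the sketch below derives a protein-density column (grams of protein per 100 kcal) from the four rows shown. The field names and the choice of derived feature are assumptions for the example, not part of the tool's output format.

```python
# Calories and protein per 100 g, copied from the four rows shown above.
foods = {
    "Chicken breast": {"calories": 165, "protein_g": 31},
    "Salmon": {"calories": 208, "protein_g": 20},
    "Black beans": {"calories": 132, "protein_g": 8.9},
    "Broccoli": {"calories": 34, "protein_g": 2.8},
}


def protein_per_100_kcal(calories, protein_g):
    """Grams of protein per 100 kcal, a simple density feature."""
    return protein_g / calories * 100.0


# One dict per food: the raw columns plus the derived feature.
feature_rows = []
for name, values in foods.items():
    row = {"food": name, **values}
    row["protein_per_100_kcal"] = round(protein_per_100_kcal(**values), 2)
    feature_rows.append(row)

# Rank foods by the derived feature, densest first.
ranked = sorted(feature_rows, key=lambda r: r["protein_per_100_kcal"], reverse=True)
for row in ranked:
    print(f'{row["food"]}: {row["protein_per_100_kcal"]}')
```

Normalizing by calories instead of weight changes the ranking: broccoli moves ahead of black beans even though it has far less protein per 100 g.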

Clinical trial data for healthcare models

Search ClinicalTrials.gov for completed trials with outcome data on specific conditions, treatments, and patient populations. Use published trial endpoints as ground truth for validating healthcare prediction models.

Find completed Phase 3 trials for Type 2 diabetes interventions that report HbA1c reduction as a primary endpoint, from 2020–2025.

Found 23 completed Phase 3 trials. Top by enrollment: SURMOUNT-4 (tirzepatide, N=670, HbA1c -2.4%), SURPASS-5 (tirzepatide vs insulin, N=475), PIONEER 1-8 series (semaglutide oral, 8 trials). Average HbA1c reduction across GLP-1 trials: -1.9%. Detailed endpoints and NCT numbers included.

ToolRouter search_trials
Trial        Drug                          N
SURMOUNT-4   Tirzepatide                   670
SURPASS-5    Tirzepatide vs Insulin        475
PIONEER 6    Semaglutide (oral)            3,183
SUSTAIN-7    Semaglutide vs Dulaglutide    1,201
23 total trials · Avg HbA1c reduction -1.9% across GLP-1 class

Earthquake and geospatial data for risk models

Pull seismic event data by region and magnitude range for natural disaster risk models, insurance underwriting features, or geospatial ML datasets. Get historical earthquake records with location, depth, and magnitude.

Pull all magnitude 5.0+ earthquakes in Japan from 2015 to 2024 for a seismic risk prediction model. I need lat/lon, depth, magnitude, and date.

Retrieved 847 events: magnitude 5.0–9.0, Japan region (30°N–45°N, 130°E–145°E). Largest: 2024-01-01 Noto Peninsula, M7.6, depth 10km. Average of about 85 events/year above M5.0 (847 events over 10 years). Depth distribution: 73% shallow (<70km). Dataset includes lat/lon coordinates for feature engineering.

ToolRouter search_earthquakes
1. 2024 Noto Peninsula M7.6 · Depth 10km · Jan 1, 2024
2. 2021 Fukushima M7.3 · Depth 60km · Feb 13, 2021
3. 2019 Chiba M6.1 · Depth 35km · Jun 18, 2019
4. 2018 Hokkaido M6.7 · Depth 35km · Sep 6, 2018

Ready-to-use prompts

Find SOTA papers on tabular ML

Search for papers published 2023–2025 comparing XGBoost, TabNet, and TabTransformer on tabular classification benchmarks. I need accuracy comparisons, dataset sizes used, and whether any outperform gradient boosting on real-world data.

Pull FRED economic time series

Fetch monthly data from 2015 to 2025 for: US CPI (CPIAUCSL), Core PCE (PCEPILFE), 10-year Treasury yield (DGS10), and unemployment rate (UNRATE). Format as time series with date and value columns.

Generate churn analysis chart

Create a line chart with two series over 12 months: monthly churn rate (%) and monthly NPS score. Churn: 5.2, 4.8, 4.5, 5.1, 3.9, 3.7, 3.4, 3.2, 3.0, 2.8, 2.6, 2.4. NPS: 32, 34, 36, 33, 39, 41, 44, 46, 48, 51, 53, 55.
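
Before charting two series like these together, it can help to quantify how closely they move. The sketch below computes a Pearson correlation for the churn and NPS values in the prompt using only the standard library; the function name is illustrative, and a correlation over 12 points is a descriptive summary, not evidence of causation.

```python
from math import sqrt

# Values copied from the prompt above.
churn = [5.2, 4.8, 4.5, 5.1, 3.9, 3.7, 3.4, 3.2, 3.0, 2.8, 2.6, 2.4]
nps = [32, 34, 36, 33, 39, 41, 44, 46, 48, 51, 53, 55]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(churn, nps)
print(f"Pearson r = {r:.2f}")
```

The coefficient comes out strongly negative: churn falls as NPS rises across these 12 months.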

Nutrition data for 20 foods

Get USDA nutrition data per 100g for: oatmeal, white rice, quinoa, lentils, chicken breast, salmon, tofu, eggs, cheddar, whole milk, broccoli, spinach, sweet potato, banana, apple, almonds, olive oil, Greek yogurt, black beans, and avocado.

Clinical trials for a condition

Find completed Phase 3 clinical trials for treatment-resistant depression published 2020–2025. Show primary endpoints, sample sizes, and key efficacy outcomes.

Historical earthquake data

Pull all M4.5+ earthquakes within 200km of Los Angeles (34.05°N, 118.24°W) from 2000 to 2024. Include date, magnitude, depth, and epicenter coordinates for a seismic risk feature dataset.
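
If you want to apply or double-check a radius filter like this on your own records, a great-circle distance is one straightforward way to do it. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the record layout (`lat`, `lon` keys) and the two sample points are hypothetical and are not real earthquake data.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
LOS_ANGELES = (34.05, -118.24)  # west longitude is written as negative


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(events, center, radius_km):
    """Keep events whose epicenter lies within radius_km of center."""
    return [
        e for e in events
        if haversine_km(center[0], center[1], e["lat"], e["lon"]) <= radius_km
    ]


# Hypothetical points for illustration, not real earthquake records.
sample_events = [
    {"id": "near", "lat": 34.20, "lon": -118.50},
    {"id": "far", "lat": 37.77, "lon": -122.42},
]
nearby = within_radius(sample_events, LOS_ANGELES, 200.0)
print([e["id"] for e in nearby])  # ['near']
```

The same distance can also serve as a feature in its own right, for example distance from each event to a site of interest.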

World Bank indicator data

Fetch GDP per capita (constant 2015 USD), life expectancy at birth, and CO2 emissions per capita for the US, China, Germany, India, and Brazil from 2000 to 2023.

Research feature engineering techniques

Search for papers on feature engineering techniques for imbalanced classification in fraud detection, published 2022–2025. Focus on SMOTE alternatives and cost-sensitive learning approaches.

Tools to power your best work

165+ tools.
One conversation.

Everything data scientists need from AI, connected to the assistant you already use. No extra apps, no switching tabs.

Model research and benchmarking

Before building a model, find the current state-of-the-art approaches, gather benchmark data, and pull the training dataset features you need.

1. Academic Research: Find SOTA papers and benchmark comparisons for the task
2. Economic Data: Pull time-series data for model features
3. Generate Chart: Visualize data distributions and feature correlations

Healthcare prediction model data pipeline

Gather clinical trial outcomes, nutrition data, and medical literature to build a training dataset for a health prediction model.

1. Clinical Trials: Find completed trials with outcome data for the condition
2. Nutrition Data: Build nutrition feature matrix for dietary variables
3. Academic Research: Research published prediction models for validation approach

Stakeholder presentation prep

After model results are in, generate charts for the presentation and research context for the findings.

1. Generate Chart: Generate performance metric charts (precision-recall, ROC)
2. Economic Data: Pull macroeconomic context data for the business narrative
3. Academic Research: Find citations to benchmark your model against published work

Frequently Asked Questions

How many papers does Academic Research cover?

Academic Research searches across major databases including Semantic Scholar, arXiv, PubMed, and CrossRef — covering hundreds of millions of papers across computer science, medicine, economics, and natural sciences. Results include citation counts and links to full text where available.

What economic indicators are available in the Economic Data tool?

Economic Data provides access to 800,000+ time series from FRED and US Census sources — including CPI, unemployment, GDP components, housing data, and demographics at zip-code level. Specify the FRED series ID or describe the indicator you need.

Can Generate Chart export images suitable for academic papers?

Yes. Generate Chart produces high-resolution PNG images suitable for papers, reports, and presentations. Specify axis labels, titles, and color schemes in your prompt for publication-ready output.

How current is the clinical trials data?

Clinical Trials searches ClinicalTrials.gov in real time, covering all registered trials including recently completed and updated status. Results include primary endpoints, sample sizes, and completion dates.

Can I pull World Bank data for custom country groups?

Yes. World Economy supports any combination of countries and indicators. Specify country names or ISO codes and the indicator you need. It covers 200+ countries across 16,000+ World Bank indicators from 1960 to present.
