I can't attend the webinar. Can I get the video recording?

Yes. We will send a video recording to everyone who registers once the webinar is over.

When will I get the video recording?

We will send you the video recording by 5pm GMT on 17 March, one day after the webinar. Please email us about the recording only if you haven't received it by then.

From Raw Data to Published Paper Blueprint

By Mushtaq Bilal, PhD

Online event

Monday, Mar 16 from 6 pm to 9:30 pm CET

Overview

Learn a structured, iterative framework to transform raw data into publishable outputs.

From Raw Data to Published Paper Blueprint

This blueprint is a structured, iterative framework for transforming raw data into publishable outputs. It integrates scientific rigor, exploration, modern analytical workflows, and the communication of research findings through five metaphorical roles.

For each role, we will cover the foundational logic and scientific principles, and immediately apply them with a real dataset to move from theory to practice.

By the end of the session, you will have completed a full end-to-end workflow, taking one dataset from its raw state to a submission-ready manuscript.

1. The Scientist: Science Before Statistics

Focus: Conceptualization and The Data-Question Fit

A. Defining the Research Question: Is the data a good match for the research question?

B. Defining the Estimand: Being precise about what you want to measure (e.g., the average effect of a treatment vs. a descriptive trend).

C. Causal Inference vs. Correlation: Moving beyond "associations" to understand the mechanism. Why does X lead to Y?

2. The Janitor: The Master of the Pipeline

Focus: Reproducible Organization.

A. Tidy Data Principles: Ensuring each variable has its own column, each observation has its own row, and each value has its own cell.

B. Standardized Folder Structure: Organizing your project into /data, /scripts, /outputs, and /figs to ensure you never lose a file again.

C. Literate Programming with Quarto: Combining prose and code in one document. We will use LLMs to write the "engine" (the code) while you provide the "steering" (the logic).
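
As a rough illustration of points A and B above, here is a minimal Python sketch using only the standard library. The sample records, the column names (`subject`, `year`, `score`), and the helper names `to_tidy` and `make_project` are hypothetical and not taken from the session materials.

```python
from pathlib import Path
import tempfile

# Hypothetical wide-format records: one row per subject, one column per year.
wide_rows = [
    {"subject": "s1", "2022": 4.1, "2023": 4.8},
    {"subject": "s2", "2022": 3.7, "2023": 3.9},
]

def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows so that each output row holds one observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy

def make_project(root):
    """Create data, scripts, outputs, and figs folders under root."""
    root = Path(root)
    for name in ("data", "scripts", "outputs", "figs"):
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())

tidy_rows = to_tidy(wide_rows, "subject", "year", "score")

# Build the folder layout in a throwaway directory for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    created = make_project(tmp)
```

Each entry in `tidy_rows` now describes one subject in one year, and `created` lists the four project folders.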

3. The Explorer: Mapping the Terrain

Focus: Exploratory Data Analysis (EDA)

A. Visualizing Distributions: Using histograms and density plots to check for normality, skewness, and multi-modality.

B. Pattern Recognition: Using scatterplots and correlation heatmaps to see how variables interact.

C. Outlier & Missing Data Audit: Identifying "weird" data points and deciding—scientifically—how to handle them (Imputation vs. Exclusion).
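
A small audit of the kind described in point C might look like the sketch below. The measurements are invented, and the 1.5 × IQR rule is only one common convention for flagging candidates; it is an assumption here, not necessarily the rule used in the session.

```python
from statistics import quantiles

# Hypothetical measurements; None marks a missing value.
values = [4.2, 4.5, None, 4.8, 5.1, 4.9, 12.7, None, 4.4, 5.0]

# Separate observed values from missing ones and count the gaps.
observed = [v for v in values if v is not None]
n_missing = len(values) - len(observed)

# Quartiles of the observed values (statistics.quantiles, Python 3.8+).
q1, _median, q3 = quantiles(observed, n=4)
iqr = q3 - q1

# Flag points more than 1.5 IQRs outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in observed if v < lower or v > upper]

print(f"missing: {n_missing}, flagged for review: {outliers}")
```

Flagged points are candidates for review, not automatic deletions. Whether to impute, exclude, or keep them remains a scientific decision.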

4. The Engineer: The Modeling Workflow

Focus: AI-Assisted Implementation of statistical frameworks

A. Regression and Linear Modeling: Understanding the "lightsaber" of statistics.

B. Prompt Engineering for Analysis: How to use LLMs (ChatGPT/Claude) to generate R or Python code for your specific model.

C. Model Diagnostics: Validating the "Engineer's" work—checking residuals and ensuring the model assumptions haven't been violated.
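
To make points A and C concrete, the following is a minimal sketch of a one-predictor least-squares fit and its residuals, written in plain Python. The data are invented and the function names `fit_line` and `residuals` are hypothetical. A real analysis would normally use a statistics library.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data with a roughly linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)

# List residuals next to fitted values to look for leftover structure.
for xi, ri in zip(x, res):
    print(f"fitted={intercept + slope * xi:6.2f}  residual={ri:+.2f}")
```

With an intercept in the model, the residuals always sum to zero, so that alone says nothing about fit quality. The useful check is whether they show a trend or a changing spread when plotted against the fitted values.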

5. The Storyteller: From Numbers to Narrative

Focus: High-Impact Communication.

A. Publication-Ready Figures: Designing charts—optimizing colors, fonts, and labels for peer-reviewed journals.

B. Reporting Statistical Results: Interpreting outputs in plain language: what the estimate means, how certain you are, and why it matters.

C. Rendering the Final Manuscript: Using Quarto to compile your entire analysis (prose, code, tables, and figures) into a Word, PDF, or HTML document in a single step, with outputs that update automatically when the data changes.

Lineup

Headliner

Ruben Dario Palacio, PhD

Mushtaq Bilal, PhD

Good to know

Highlights

3 hours 30 minutes
Online

Refund Policy

Refunds up to 7 days before event

Location

Online event

Agenda

05:00 PM - 05:55 PM

Session 1 (Times in GMT)

1. The Scientist: Science Before Statistics

2. The Janitor: The Master of the Pipeline

06:00 PM - 06:55 PM

Session 2 (Times in GMT)

3. The Explorer: Mapping the Terrain

4. The Engineer: The Modeling Workflow

07:00 PM - 08:00 PM

Session 3 (Times in GMT)

5. The Storyteller: From Numbers to Narrative

Q&A

Organized by

Mushtaq Bilal, PhD

Events: 49

Hosting: 3 years
