Data Science for Biologists

This website contains all materials used in BIOL 01301 Spring 2020 at Rowan University with Dr. Spielman. This course now meets remotely, but used to meet Monday 12:30-1:45 in SCI 239, Wednesday 12:30-1:45 in SCI 226, and Thursday 11-1:45 in SCI 239.

All grading and assignment submissions (unless otherwise stated) will be hosted on Blackboard, but course materials will be posted on this site and/or within the rstudio.cloud BIOL01301 space.



Online Books and Resources


Post-Spring Break Class Schedule and Materials

Materials are subject to change up until the Monday of the given week. Below is a ROUGH OUTLINE of what we’ll be doing.

UPDATED DEADLINES:

  • tidyr assignment due Sunday March 29th 11:59 pm on Blackboard
  • Midterm Project due as Pull Request Sunday March 29th at 11:59 pm
    • 10% deduction deadline will be Wednesday April 1st 11:59 pm
    • Additional 5% deduction for each additional late day (e.g., a Thursday 4/2 submission by 11:59 pm has a 15% deduction)

Week of 3/30/20

Week of 4/6/20

  • Lecture videos and associated materials
    • Linear regression II and concepts in model selection: Tutorial
      • See Slack #general for some lecture clarifications: one and two
    • Introduction to model evaluation: Slides
  • Assignment due Thursday 4/16/20 at 11:59 pm on Blackboard

Week of 4/13/20


Week of 4/20/20


Week of 4/27/20


Final Assignment (Fork to your github account!)

  • Counts as TWO assignment grades (aka if you don’t do this, you are still stuck with a 0 grade on an assignment. Get it?)
  • Due Tuesday 5/12/20 at 11:59 pm as github PR. Absolutely NO extensions are possible for this submission due to University grade deadlines.
    • BONUS DEADLINE! Submit by Friday 5/8/20 at 11:59 pm as github PR for an automatic extra 10%!!
  • Attend the live Thursday session on 4/30/20 from 11:30 am - 1:45 pm - we will do about 1/3 of the final assignment TOGETHER! You don’t want to miss this!!!!!
  • During the normal final exam time, Thursday 5/6/20 from 12:30pm - 2:30pm, there will be an optional ZOOM SESSION (see class calendar) to come ask questions for completing this assignment. REALLY REALLY HIGHLY RECOMMENDED.

Pre-Spring Break Class Schedule and Materials

Materials are subject to change up until the day of class based on student progress. Any change in deadlines will only be to postpone.

Link to midterm project: https://github.com/sjspielman/datascience_midterm/

No. Day/Date Topic and materials Background Reading Assignment DUE
1 W 1/22 Introduction/Syllabus Day None None
2 R 1/23 Introduction to R and RStudio None None
3 M 1/27 Types of data and visualizations, I Two blog posts on types of data:
1. Towards Data Science
2. Scary Scientist
None
4 W 1/29 Types of data and visualizations, II 1. Read chapters 5, 6, 7, and 10 from “Fundamentals of Data Visualization”
2. Blog post on comparing plot types
5 R 1/30 Introduction to ggplot2

R script template
None
6 M 2/3 Communicating with (in)effective visualizations Watch “Lecture 6: Data Visualization” Videos Evaluating Data Visualization Assignment
7 W 2/5 Introduction to RMarkdown None None
8 R 2/6 More practice with ggplot2 Read chapter 21 from “Fundamentals of Data Visualization” Introduction to ggplot2 assignment due on Blackboard as an R script.
9 M 2/10 Reading, writing, and creating datasets None None
10 W 2/12 Manipulating data with dplyr I Background resources:
1. dplyr vignette
2. dplyr Intro from STAT545
3. GET MORE PRACTICE HERE!
None
11 R 2/13 Data manipulation with dplyr None More practice with ggplot2 assignment due on Blackboard as an RMarkdown file (NOT the knitted HTML and/or PDF!!)
12 M 2/17 Best practices and other miscellany Watch rstudio::conf 2020 talk on debugging with “cheatsheet” None
13 W 2/19 Manipulating data with dplyr II Background resources
1. dplyr two-table vignette
2. Ch 14-15 from STAT545
3. GET MORE PRACTICE HERE
None
14 R 2/20 Data manipulation with dplyr II None Data manipulation with dplyr assignment due on Blackboard as an Rmd file.
15 M 2/24 Regroup and Project Introduction None None
16 W 2/26 Tidying data with tidyr 1. Read Tidy Data Paper
2. Look over tidyr vignette (MAKE SURE IT'S tidyr >= 1.0.0)
17 R 2/27 Working with tidy and untidy data
Link to full materials
HINT: use read_csv2() for the planets question!!
None Data manipulation with dplyr II assignment due on Blackboard
18 M 3/2 Introduction to Version Control with GitHub, and reproducibility 0. What is VC? ; Tutorial - IGNORE BRANCHES 1. Data sharing editorial from NEJM
2. One of many rebuttals to NEJM
3. Another rebuttal
4. Concepts in reproducibility and peer review
5. Reflections on sci-hub
6. The state of affairs is grim
7. But could be hopefully improving!
8. Michael Hoffman has good thoughts
9. Two Twitter threads that showcase what bothers Dr. Spielman the most: one and two
19 W 3/4 Introduction to UNIX Cheatsheet
Exercises
None
20 R 3/5 Conducting reproducible analyses with git/github Cheatsheet None
21 M 3/9 Working with strings with stringr 1. stringr reference
2. stringr vignette
None

END OF IN-PERSON INSTRUCTION