The Alt-Ac Job Beat Newsletter Post 11

Hi Everyone,

My advice this week is not to get too hung up on specific software or tools. The fad in data science right now is knowing about LLMs; a few years ago it was "big data" (scare quotes because very few orgs really have data so large it needs to be distributed, and most Hadoop systems I have seen in the wild are bad ideas compared to a more typical SQL database). No doubt a year from now there will be something new and shiny that I know nothing about at the moment.

Focus on doing real projects for things you care about -- you can (and will need to) learn new tools all the time. If you know a few tools well (stats + R|python + SQL), you can get to a pretty advanced place at the majority of companies.

JOBS

Job board link

For some of the recent gigs:

I interviewed at Toyota (in Plano, TX, in a fraud role) before taking my current gig. I have seen other companies do similar philanthropic type work (CVS and some banks in the Dallas area), although I am not familiar with any scientists who have landed those gigs. (So let me know if you get a role like that! Would be cool to hear experiences.)

EXAMPLE SCIENTIST

Robert Fornango has a PhD in criminology from UMSL. He currently runs his own company, but you can check out his LinkedIn profile to see the journey: from professor, to a healthcare group where he advanced to a director role, to going out on his own.

CRIME De-Coder is really just a public face for the consulting work I always did while in grad school and then as a professor. I do not know for certain, but I suspect Rob's story is similar -- after a while he was able to go out and do it full time on his own. You can start small and build up more consulting work over time. Just having a public blog has netted me quite a few consulting gigs over the years via cold emails.

TECH ADVICE

Part of writing more professional code is knowing how typical projects are set up. For python projects, the layout often looks like:

├── README.md           <- High level overview of project + any special notes
├── requirements.txt    <- info to set up environment, may use renv for R
├── main.py             <- main script at root of project
├── /reports            <- outputs that are not code
└── /src                <- functions in src folder
    ├── func1.py
    └── func2.py
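
As an example of what requirements.txt can hold, it is often just a plain list of packages, one per line (the specific packages and version pins below are hypothetical placeholders, swap in whatever your project actually uses):

```
pandas>=2.0
matplotlib
scikit-learn
```

Then `pip install -r requirements.txt` recreates the environment on another machine.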

And then in main.py, your code will look something like:

# import functions from the src folder
from src import func1, func2
# ... do stuff ...
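
To make that concrete, here is a minimal sketch of what one of those src files and its use from main.py might look like (clean_names is a hypothetical example function, not something from a real project):

```python
# src/func1.py -- a small, well-named function that does one thing
def clean_names(names):
    """Strip whitespace and lowercase a list of strings."""
    return [n.strip().lower() for n in names]

# main.py -- at the project root, import it and call it
# from src.func1 import clean_names
# tidy = clean_names([" Alice ", "BOB"])  # ['alice', 'bob']
```

The payoff is that any other script (or a notebook) can reuse clean_names without copy-pasting it.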

If you have R functions, your R code might look like:

# can source functions in R
source("./src/func1.R")
source("./src/func2.R")
# ... do stuff ...

The important thing here is to have functions in a nice isolated location that can be repeatedly called from other scripts. So even if you have, say, a Quarto report or a Jupyter notebook instead of main.py, it can load those same functions.
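
One wrinkle with notebooks: if the notebook lives in a subfolder (say under /reports), `from src import ...` may fail because python looks for src relative to the notebook's working directory. A common fix (a sketch, assuming the notebook's working directory is that one-level-down subfolder) is to put the project root on sys.path first:

```python
import sys
from pathlib import Path

# assume the notebook runs from a subfolder like /reports;
# step up one level to the project root so "from src import ..." resolves
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# now imports work the same as in main.py:
# from src import func1, func2
```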

Isolating regularly used functions makes code much simpler to understand than having 100% of the code dumped into a single file. Many projects, if you go check out their github pages, are much more complicated, but this is the minimal (and it is good to start minimal!) layout that most code projects follow (whether python, R, or other languages).

Best, Andy Wheeler