Blog
Notes on statistical modeling, machine learning systems, and the tools I am learning from.
Data Science
- Understanding GLM Coefficients
  How GLM coefficients are interpreted, and why correlated predictors can make that interpretation unstable.
  Tags: GLM, Coefficients, Multicollinearity
- Controls, Offsets, and Omitted Variable Bias in GLMs
  When to estimate a variable as a control, when to use an offset, and why omitted variables can bias insurance models.
  Tags: GLM, Controls, Offsets
- Variance Inflation Factor: Why Adding a Constant Matters
  How VIF diagnoses multicollinearity, why it affects standard errors, and why the auxiliary regression should include an intercept.
  Tags: VIF, Multicollinearity, Regression
- Model Interpretability: SHAP, LIME, PDP, and ICE Plots
  How I think about local explanations, global explanations, marginal effects, and individual-level model behavior.
  Tags: Interpretability, SHAP, LIME
- A Practical Guide to GLM Distributions
  How binomial, Poisson, negative binomial, gamma, inverse Gaussian, chi-square, and Tweedie distributions show up in modeling.
  Tags: GLM, Distributions, Statistics
- Double Lift Charts for Insurance Model Evaluation
  How double lift charts compare actual loss cost, current pricing, and a challenger model.
  Tags: Insurance, Model Evaluation, Lift
- Why Tweedie GLMs Matter in Insurance
  Why Tweedie models are useful for pure premium, loss ratio, and nonnegative outcomes with many zeros.
  Tags: Tweedie, GLM, Insurance
- Calibration vs. Model Power
  Why a model can rank risks well but still produce unreliable probabilities.
  Tags: Calibration, Model Evaluation, Risk
- How LLMs Work
  A narrative note on next-token prediction, attention, training, inference, hallucination, nondeterminism, and agents.
  Tags: LLM, Attention, AI Agents
ML Systems
- How SageMaker Runs a Training Job
  A practical walkthrough of Docker images, ECR, SageMaker estimators, S3 inputs, and model artifacts.
  Tags: AWS, SageMaker, Docker
- Data Lake vs. Data Warehouse
  Why cheap storage is not the whole story when designing analytical data platforms.
  Tags: Data Engineering, Data Lake, Warehouse
- ML System Design, Part 1: From Business Problem to Training Data
  Notes on production priorities, business metrics, data systems, labels, sampling, and class imbalance.
  Tags: ML System Design, Training Data, Data Engineering
- ML System Design, Part 2: Features, Evaluation, and Deployment
  Notes on feature engineering, leakage, model selection, experiment tracking, batch prediction, online prediction, and model compression.
  Tags: ML System Design, Deployment, Evaluation
- ML System Design, Part 3: Monitoring, Continual Learning, and MLOps Tools
  Notes on distribution shift, production testing, continual learning, model stores, feature stores, Docker, Kubernetes, and orchestration.
  Tags: ML System Design, Monitoring, MLOps