Blog
Notes on statistical modeling, machine learning systems, and the tools I am learning from.
Data Science
- How LASSO Works
  A practical walkthrough of LASSO regularization.
  Tags: GLM, Statistics, Modeling
- How LLMs Work
  Notes on next-token prediction, tokenization, embeddings, attention, MLPs, training, inference, hallucination, and nondeterminism.
  Tags: LLM, Attention, Transformers
- Model Interpretability: SHAP, LIME, PDP, and ICE Plots
  How I think about local explanations, global explanations, marginal effects, and individual-level model behavior.
  Tags: Interpretability, SHAP, LIME
- Double Lift Charts for Insurance Model Evaluation
  How double lift charts compare actual loss cost, current pricing, and a challenger model.
  Tags: Insurance, Model Evaluation, Lift
- Calibration vs. Model Power
  Why a model can rank risks well but still produce unreliable probabilities.
  Tags: Calibration, Model Evaluation, Risk
- Controls, Offsets, and Omitted Variable Bias in GLMs
  When to estimate a variable as a control, when to use an offset, and why omitted variables can bias insurance models.
  Tags: GLM, Controls, Offsets
- Variance Inflation Factor: Why Adding a Constant Matters
  How VIF diagnoses multicollinearity, why it affects standard errors, and why the auxiliary regression should include an intercept.
  Tags: VIF, Multicollinearity, Regression
- Understanding GLM Coefficients
  How GLM coefficients are interpreted, and why correlated predictors can make that interpretation unstable.
  Tags: GLM, Coefficients, Multicollinearity
- How GLMs Work
  A practical walkthrough of GLM likelihood, link functions, mean-variance relationships, deviance, dispersion, and optimization.
  Tags: GLM, Statistics, Modeling
ML Systems
- How SageMaker Runs a Training Job
  A practical walkthrough of Docker images, ECR, SageMaker estimators, S3 inputs, and model artifacts.
  Tags: AWS, SageMaker, Docker
- Data Lake vs. Data Warehouse
  Why cheap storage is not the whole story when designing analytical data platforms.
  Tags: Data Engineering, Data Lake, Warehouse
- ML System Design, Part 3: Deployment, Monitoring, and MLOps Tools
  Notes on batch and online prediction, model compression, cloud and edge deployment, distribution shift, production testing, continual learning, feature stores, and MLOps tools.
  Tags: ML System Design, Deployment, Monitoring, MLOps
- ML System Design, Part 2: Features, Model Selection, and Evaluation
  Notes on feature engineering, leakage, model selection, experiment tracking, evaluation, and distributed training.
  Tags: ML System Design, Features, Evaluation
- ML System Design, Part 1: From Business Problem to Training Data
  Notes on production priorities, business metrics, data systems, labels, sampling, and class imbalance.
  Tags: ML System Design, Training Data, Data Engineering