Video Highlights: Ultimate Guide To Scaling ML Models – Megatron-LM | ZeRO | DeepSpeed | Mixed Precision
In this video presentation, Aleksa Gordić explains what it takes to scale ML models to trillions of parameters. He covers the fundamental ideas behind recent large ML models such as Meta’s OPT-175B, BigScience’s BLOOM-176B, EleutherAI’s GPT-NeoX-20B and GPT-J, OpenAI’s GPT-3, Google’s PaLM, and DeepMind’s Chinchilla and Gopher.
Book Review: The Kaggle Book/Workbook
Kaggle is an incredible resource for all data scientists. I advise my Intro to Data Science students at UCLA to take advantage of Kaggle by first completing the venerable Titanic Getting Started Prediction Challenge and then moving on to active challenges. Kaggle is a great way to gain valuable experience with data science and machine learning. Now there are two books to guide you through the Kaggle process: The Kaggle Book by Konrad Banachewicz and Luca Massaron, published in 2022, and The Kaggle Workbook by the same authors, published in 2023. Both come from UK-based Packt Publishing, and both are excellent learning resources.
Video Highlights: Fine Tune GPT-J 6B in Under 3 Hours on IPUs
Did you know you can run GPT-J 6B on Graphcore IPUs in the cloud? Following the now infamous leaked Google memo, there has been a real storm in the AI world around smaller, open-source language models like GPT-J, which are cheaper and faster to fine-tune and run, and which perform just as well as larger models on many language tasks.
Video Highlights: Introduction to Explainable AI
Responsible AI is getting more attention than ever. Companies have started exploring Explainable AI as a way to better explain model results to senior leadership and to increase leadership’s trust in AI algorithms. In this workshop presentation, Supreet Kaur, Assistant Vice President at Morgan Stanley, gives an overview of the field, explains why it matters today, and covers practical techniques you can use to implement it.
Why a Data-driven Culture is Important to the Success of your SaaS Business
In this contributed article, Joseph “OG” Meyers argues that one of the best ways for SaaS businesses to gain a competitive advantage is to foster a data-driven culture. Doing so lays the groundwork for employees at all levels to make sound business decisions that lead to success. He explains what a data-driven culture means and why it is so important to the success of a SaaS business.
Data Science: U-M Partners with Google to Offer Job-ready Tech Skills Program
A new, flexible online training program in data science will prepare job seekers in Michigan and beyond to quickly enter one of the fastest-growing labor markets and advance their careers. The University of Michigan’s Center for Academic Innovation created the program, “Data Analytics in the Public Sector with R,” for data science and other professionals interested in how public data sets can drive decisions and policymaking in the public sector. The course complements the current Google Career Certificates, flexible online job-training programs for high-demand fields offered through “Grow with Google.”
Cloudera Shines Educational Spotlight on Data and AI with Children’s Book for 8- to 12-year-olds
Cloudera, Inc., the enterprise data cloud company, has announced “A Fresh Squeeze on Data,” a downloadable children’s book that explains, in terms kids can understand, simple ways to solve problems with data. The book was created in partnership with the education company ReadyAI, with the goal of making data and AI more interesting and accessible to 8- to 12-year-olds.
Book Review: Mathematics for Machine Learning
“Mathematics for Machine Learning” by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, published by Cambridge University Press, is an excellent way to learn the math behind the models. This review highlights the ways this book stands out from the competition. Of all the books I’ve reviewed so far, this is my favorite. Read on to learn why.
Video Highlights: FeatureTerminatoR Package for R
FeatureTerminatoR is an R package that automatically removes unimportant variables from statistical and machine learning models. The motivation for the package is simple: while many packages do similar things, few of them automatically remove the features from your models. In the accompanying video presentation, the author walks through how the package works.
Global Data Science Competition Gathered Brilliant Minds to Solve Social Problems
For over two months, 50 teams representing 34 nationalities competed for a spot in the top ten of the World Data League (WDL), a competition that uses data to find long-lasting solutions to social problems.